# Higgsfield Canvas — Node-Based Movie Workflow Builder

- **Date:** 2026-06-09
- **Status:** Approved (design) — pending spec review
- **Author:** Oded (with Claude)

## 1. Goal

Build a standalone, dark-themed **node-canvas web app** that mirrors the Higgsfield
Canvas: the user drag-creates image/video "boxes," configures each (prompt, reference
images, model, aspect ratio, resolution, length), wires boxes together, and runs either a
single box or the whole pipeline. When two boxes are connected, the **last frame of the
upstream box is fed into the downstream box as its first reference image**, enabling
shot-to-shot visual continuity for building a movie.

## 2. Background & key constraint

The Higgsfield **MCP** (connected in the Claude chat) cannot be called from a standalone
web page: those tools exist only inside the Claude session. The web app must therefore
talk to Higgsfield through its **own** path: the public **Higgsfield Cloud API**
(`api.higgsfield.ai`; API keys are issued from the `cloud.higgsfield.ai` dashboard),
authenticated with a Bearer API key.

Higgsfield ships official SDKs we will use instead of hand-rolling HTTP:
- Node/TS SDK: https://github.com/higgsfield-ai/higgsfield-js
- Python SDK: https://github.com/higgsfield-ai/higgsfield-client

Observed endpoint shapes (to be confirmed from the SDK README during planning):
- `POST /v1/text2image/soul` — text-to-image (params: prompt, width_and_height, quality, batch_size)
- `POST /v1/image2video/dop` — image-to-video (params: model, prompt, input_images, motions)
- Status polling via request id; statuses: `Queued`, `In progress`, `Completed`, `Failed`, `NSFW`, `Cancelled`
- The Node SDK supports a `withPolling` option that polls automatically.
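
The status strings above have to be folded into the app's own per-node run states. A minimal sketch of that mapping follows, assuming the status strings are exactly as observed (still to be confirmed from the SDK) and that `NSFW` and `Cancelled` are both shown to the user as `failed`; the function names are illustrative, not part of any SDK.

```typescript
// Status strings as observed in this section; to be confirmed against the SDK.
type HiggsfieldStatus =
  | 'Queued'
  | 'In progress'
  | 'Completed'
  | 'Failed'
  | 'NSFW'
  | 'Cancelled';

// The app's own per-node run states (same union as in the data model).
type RunStatus = 'idle' | 'queued' | 'running' | 'done' | 'failed';

const TERMINAL_STATUSES: ReadonlySet<HiggsfieldStatus> = new Set<HiggsfieldStatus>([
  'Completed',
  'Failed',
  'NSFW',
  'Cancelled',
]);

// True once polling can stop for this job.
function isTerminal(status: HiggsfieldStatus): boolean {
  return TERMINAL_STATUSES.has(status);
}

// Assumption: NSFW and Cancelled are surfaced as 'failed' in the UI.
function toRunStatus(status: HiggsfieldStatus): RunStatus {
  switch (status) {
    case 'Queued':
      return 'queued';
    case 'In progress':
      return 'running';
    case 'Completed':
      return 'done';
    case 'Failed':
    case 'NSFW':
    case 'Cancelled':
      return 'failed';
  }
}
```

A polling loop would call `isTerminal` after each status fetch and write `toRunStatus(status)` into the node's runtime state.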

> **Cost note:** The Cloud API bills on **Cloud credits**, separate from the user's Ultra
> plan (109 credits visible via MCP). Cost preview on the Run button is out of MVP scope.

## 3. Architecture

```
┌──────────────────────────────┐        ┌───────────────────────────┐       ┌────────────────────┐
│ Browser (React + React Flow)  │  HTTP  │ Local backend (Node/Express)│  SDK │ Higgsfield Cloud   │
│ • canvas, nodes, edges        │ ─────► │ • holds API key (.env)     │ ────► │ api.higgsfield.ai  │
│ • per-node settings & run     │        │ • proxy generate / status  │       │                    │
│ • pipeline runner (topo order)│ ◄───── │ • upload media → HF handle │ ◄──── │                    │
│                               │        │ • ffmpeg last-frame extract│       │                    │
└──────────────────────────────┘        └───────────────────────────┘       └────────────────────┘
```

A thin backend is **required** so that (a) the API key never reaches the browser, (b) CORS
is avoided, (c) ffmpeg can extract a video's last frame, and (d) local images can be
uploaded to Higgsfield's storage to obtain handles/URLs Higgsfield's servers can read.

### 3a. Critical integration detail — image inputs must be Higgsfield-reachable
Higgsfield's `input_images` reference media by URL/handle. A `localhost` URL is **not**
reachable by Higgsfield's servers. Therefore both **user-uploaded reference images** and
**ffmpeg-extracted last frames** must be **uploaded to Higgsfield's own media storage**
(the SDK's upload/media API, analogous to the MCP `media_upload`/`media_confirm` flow) to
obtain a handle/URL usable in a generation request. The backend `/api/upload` endpoint
forwards the bytes to Higgsfield and returns that handle. (The exact upload API is to be
confirmed during planning; see section 11.)

## 4. Components

### Frontend (React + TypeScript + Vite)
- **Canvas** — React Flow (`@xyflow/react`): pan/zoom, node placement, edges with handles.
- **Toolbar** — "Add Image box", "Add Video box", "Run pipeline", save/load/export.
- **BoxNode component** — renders prompt, reference-image strip, settings, status badge,
  output preview (img or `<video>`), per-node Run/download/delete.
- **Graph store** — Zustand: nodes, edges, per-node runtime state.
- **Pipeline runner** — client orchestration: topo-sort, drive backend calls in order.
- **Styling** — Tailwind, dark theme matching the Higgsfield Canvas look.

### Backend (Node + Express + Higgsfield SDK)
- `POST /api/upload` — multipart bytes → uploads to Higgsfield → returns media handle/URL.
- `POST /api/generate` — `{ kind, model, prompt, params, inputImages[] }` → submits via SDK
  → returns `{ jobId }`.
- `GET /api/status/:jobId` — returns `{ status, outputUrl?, error? }`.
- `POST /api/extract-frame` — `{ videoUrl }` → ffmpeg extracts last frame → uploads to
  Higgsfield → returns `{ handle/url }`. (`ffmpeg-static` + `fluent-ffmpeg`.)
- API key read from `.env` (`HIGGSFIELD_API_KEY`); never sent to the browser.
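
For the `/api/extract-frame` step, one common ffmpeg approach is to seek close to the end of the file and let each decoded frame overwrite a single output image, so the file left on disk is the last frame. The sketch below only builds the argument list; it assumes the video has already been downloaded to a local temporary file, and `lastFrameArgs` is a hypothetical helper name.

```typescript
// Builds ffmpeg arguments that leave the final frame of `inputPath` in `outputPath`.
// Assumption: `inputPath` is a local, seekable file (download the output video first).
function lastFrameArgs(inputPath: string, outputPath: string, tailSeconds = 1): string[] {
  if (!(tailSeconds > 0)) {
    throw new RangeError('tailSeconds must be positive');
  }
  return [
    '-y',                       // overwrite the output file if it already exists
    '-sseof', `-${tailSeconds}`, // input option: start this many seconds before end of file
    '-i', inputPath,
    '-update', '1',             // keep rewriting one image; the last frame written wins
    '-q:v', '2',                // high JPEG quality (has no effect on PNG output)
    outputPath,
  ];
}
```

The resulting array could be handed to `child_process.spawn` together with the binary path exported by `ffmpeg-static`, or split across `fluent-ffmpeg`'s `inputOptions` and `outputOptions`. The extracted image is then uploaded to Higgsfield as described in section 3a.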

## 5. Data model

```ts
type NodeKind = 'image' | 'video';
type RunStatus = 'idle' | 'queued' | 'running' | 'done' | 'failed';

interface MediaRef {
  source: 'upload' | 'piped' | 'url';  // 'piped' = came from an upstream box
  url: string;                          // Higgsfield handle/URL (uploaded) or remote URL
  id?: string;
}

interface BoxNodeData {
  kind: NodeKind;
  prompt: string;
  referenceImages: MediaRef[];          // index 0 is the "first reference" / piped slot
  model: string;
  aspectRatio: string;                  // e.g. '16:9', '9:16', '1:1'
  resolution: string;                   // e.g. '720p', '1080p'
  duration?: number;                    // seconds — video only
  // runtime
  status: RunStatus;
  jobId?: string;
  output?: { type: 'image' | 'video'; url: string; lastFrameUrl?: string };
  error?: string;
}
// React Flow node holds { id, position, data: BoxNodeData }
// React Flow edge holds { id, source, target } (standard)
```

## 6. Connections & the last-frame mechanic (core feature)

- Drag from a box's **output handle** to another box's **input handle** to create an edge.
- **Resolve-to-image rule:** at run time the upstream result becomes a single image —
  - image box → its `output.url`
  - video box → its `output.lastFrameUrl` (ffmpeg-extracted then uploaded to Higgsfield)
- That resolved image is injected as the downstream box's **first reference image**
  (`referenceImages[0]`, `source: 'piped'`). User-uploaded refs are preserved: the piped
  frame takes the first slot, replacing only a previously piped frame, and the uploaded
  refs follow it.
- **Multiple incoming edges (MVP rule):** the earliest-created incoming edge supplies the
  first reference slot; any additional upstreams are appended as further references.
- **Cycles:** the graph must be acyclic; a detected cycle blocks "Run pipeline" with an error.
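
The resolve-to-image and injection rules above can be sketched as two small pure functions. This is one reading of the rules, not a settled implementation: it assumes stale piped refs are dropped on every run, that uploaded refs keep their relative order after the piped first slot, and that `upstreamOutputs` is already ordered by edge creation time. The type and function names are illustrative.

```typescript
type NodeKind = 'image' | 'video';

interface MediaRef {
  source: 'upload' | 'piped' | 'url';
  url: string;
  id?: string;
}

interface NodeOutput {
  type: NodeKind;
  url: string;
  lastFrameUrl?: string;
}

// Resolve-to-image rule: an image box yields its output, a video box its last frame.
function resolveToImage(output: NodeOutput): string {
  if (output.type === 'image') return output.url;
  if (output.lastFrameUrl === undefined) {
    throw new Error('Video output has no extracted last frame yet');
  }
  return output.lastFrameUrl;
}

// `upstreamOutputs` must be ordered by edge creation time, earliest first.
function applyPipedInputs(current: MediaRef[], upstreamOutputs: NodeOutput[]): MediaRef[] {
  // Drop piped refs left over from an earlier run; keep what the user supplied.
  const userRefs = current.filter((ref) => ref.source !== 'piped');
  const piped = upstreamOutputs.map(
    (output): MediaRef => ({ source: 'piped', url: resolveToImage(output) }),
  );
  // Earliest edge fills slot 0, user refs follow, extra upstreams are appended.
  return [...piped.slice(0, 1), ...userRefs, ...piped.slice(1)];
}
```

Because the function returns a new array instead of mutating `referenceImages`, it can be called on every run without piped frames accumulating.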

## 7. Run modes

- **Run this box** — runs a single node with its current inputs (uploaded refs + any piped
  frame already resolved). Does not trigger upstream/downstream.
- **Run pipeline** — Kahn topological sort of nodes by edges, then run in dependency order:
  1. For each node, ensure all upstream nodes are `done`.
  2. Resolve piped input image(s) from upstream outputs (extract+upload frame if upstream is video).
  3. Submit generation; poll status until it reaches a terminal state (`Completed`,
     `Failed`, `NSFW`, or `Cancelled`).
  4. On `Failed`, `NSFW`, or `Cancelled`, mark node `failed` and **skip its descendants**;
     sibling branches continue.
  - **MVP executes sequentially** in topo order (simple, predictable). Parallelizing
    independent branches is a later optimization.
- Live per-node status badges update throughout.
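
The pipeline mode described above can be sketched as Kahn's algorithm plus a sequential loop that skips any node whose upstream did not finish. To keep the example short, `runNode` is synchronous here and simply reports success or failure; the real runner would await submit-and-poll. The `'skipped'` outcome and the function names are assumptions made for illustration.

```typescript
interface Edge {
  source: string;
  target: string;
}

type Outcome = 'done' | 'failed' | 'skipped';

// Kahn's algorithm: repeatedly emit nodes that have no unprocessed upstream.
function topoOrder(nodeIds: string[], edges: Edge[]): string[] {
  const indegree = new Map<string, number>();
  const children = new Map<string, string[]>();
  for (const id of nodeIds) {
    indegree.set(id, 0);
    children.set(id, []);
  }
  for (const { source, target } of edges) {
    const out = children.get(source);
    const degree = indegree.get(target);
    if (out === undefined || degree === undefined) {
      throw new Error(`Edge ${source} -> ${target} references an unknown node`);
    }
    out.push(target);
    indegree.set(target, degree + 1);
  }
  const ready = nodeIds.filter((id) => indegree.get(id) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift() as string;
    order.push(id);
    for (const child of children.get(id) ?? []) {
      const remaining = (indegree.get(child) ?? 0) - 1;
      indegree.set(child, remaining);
      if (remaining === 0) ready.push(child);
    }
  }
  // Nodes on a cycle never reach indegree 0, so they are missing from `order`.
  if (order.length !== nodeIds.length) {
    throw new Error('Cycle detected: pipeline cannot run');
  }
  return order;
}

// Runs nodes one at a time in topological order. A node is skipped when any
// direct upstream is not 'done'; since skipped nodes are not 'done' either,
// the skip propagates to all descendants while sibling branches continue.
function runSequential(
  nodeIds: string[],
  edges: Edge[],
  runNode: (id: string) => boolean,
): Map<string, Outcome> {
  const outcomes = new Map<string, Outcome>();
  for (const id of topoOrder(nodeIds, edges)) {
    const blocked = edges.some(
      (edge) => edge.target === id && outcomes.get(edge.source) !== 'done',
    );
    outcomes.set(id, blocked ? 'skipped' : (runNode(id) ? 'done' : 'failed'));
  }
  return outcomes;
}
```

For a graph `a -> b -> c` with a second branch `a -> d`, a failure in `b` leaves `c` skipped while `d` still runs.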

## 8. Persistence

- Workflow graph (nodes + edges + settings, excluding large binaries) saved to
  **localStorage** with autosave.
- **Export / Import JSON** for sharing/backup. Output media are referenced by URL/handle,
  not embedded.
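
A minimal sketch of the save/load and import path, assuming a hypothetical `version` field and storage key, and a small `KeyValueStore` interface that the browser's `localStorage` satisfies structurally. Validation here is deliberately shallow (top-level shape plus dangling edges); per-node settings would need their own checks.

```typescript
interface PersistedNode {
  id: string;
  position: { x: number; y: number };
  data: Record<string, unknown>; // box settings; media appear only as URLs/handles
}

interface PersistedEdge {
  id: string;
  source: string;
  target: string;
}

interface Workflow {
  version: 1; // hypothetical schema version, not part of the spec
  nodes: PersistedNode[];
  edges: PersistedEdge[];
}

// Subset of the Web Storage API used here; `localStorage` fits this shape.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const STORAGE_KEY = 'higgsfield-canvas:workflow'; // hypothetical key name

// Parses exported JSON and rejects files that cannot form a valid graph.
function parseWorkflow(json: string): Workflow {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== 'object' || raw === null) {
    throw new Error('Workflow must be a JSON object');
  }
  const candidate = raw as Partial<Workflow>;
  if (candidate.version !== 1 || !Array.isArray(candidate.nodes) || !Array.isArray(candidate.edges)) {
    throw new Error('Unsupported or malformed workflow file');
  }
  const ids = new Set(candidate.nodes.map((node) => node.id));
  for (const edge of candidate.edges) {
    if (!ids.has(edge.source) || !ids.has(edge.target)) {
      throw new Error(`Edge ${edge.id} references a missing node`);
    }
  }
  return { version: 1, nodes: candidate.nodes, edges: candidate.edges };
}

function saveWorkflow(store: KeyValueStore, workflow: Workflow): void {
  store.setItem(STORAGE_KEY, JSON.stringify(workflow));
}

function loadWorkflow(store: KeyValueStore): Workflow | null {
  const json = store.getItem(STORAGE_KEY);
  return json === null ? null : parseWorkflow(json);
}
```

In the browser, autosave could call `saveWorkflow(localStorage, workflow)` from a debounced store subscription, and JSON import could reuse `parseWorkflow` on the uploaded file's text.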

## 9. Tech stack

React + TypeScript + Vite · `@xyflow/react` (React Flow) · Zustand · Tailwind CSS ·
Node + Express · official Higgsfield Node SDK · `ffmpeg-static` + `fluent-ffmpeg`.
Repo layout: `/client` (Vite app) and `/server` (Express), with a Vite dev proxy to the
backend.

## 10. Scope

**MVP (this build):**
- Create/configure Image and Video boxes (prompt, reference upload, model, aspect ratio,
  resolution, duration).
- Connect boxes; last-frame → first-reference passing.
- Run single box and Run pipeline (sequential, topo order, status, failure handling).
- Output previews (image/video) per box.
- Save/load (localStorage) + JSON export/import.

**Later (not now):**
- ⚡ Credit-cost preview on the Run button (needs a cost endpoint).
- Extra left-toolbar tools (text/notes nodes), like/lock per-node actions.
- Multi-workflow management; user accounts/multi-user.
- Parallel branch execution.

## 11. To confirm during planning (from the official SDK)
- Exact SDK install name, client init, and method signatures for text2image, image2video,
  status/polling, and **media upload**.
- The **media upload** API (to host user images & extracted frames as Higgsfield handles).
- Supported **model list** and each model's allowed aspect ratios / resolutions / duration
  ranges (to drive the settings dropdowns). MVP may seed a small curated config and make it
  dynamic later.
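
One possible shape for the curated config mentioned above is sketched here. Every entry is a placeholder: the model ids, allowed ratios, resolutions, and durations are invented for illustration and must be replaced with values confirmed from the SDK or API documentation.

```typescript
type NodeKind = 'image' | 'video';

interface ModelCapabilities {
  kind: NodeKind;
  aspectRatios: string[];
  resolutions: string[];
  durations?: number[]; // seconds; video models only
}

interface BoxSettings {
  model: string;
  aspectRatio: string;
  resolution: string;
  duration?: number;
}

// Placeholder catalog: ids and allowed values are hypothetical, not real Higgsfield data.
const MODEL_CATALOG: Record<string, ModelCapabilities> = {
  'example-image-model': {
    kind: 'image',
    aspectRatios: ['16:9', '9:16', '1:1'],
    resolutions: ['720p', '1080p'],
  },
  'example-video-model': {
    kind: 'video',
    aspectRatios: ['16:9', '9:16'],
    resolutions: ['720p'],
    durations: [5, 10],
  },
};

// Returns human-readable problems; an empty array means the settings are allowed.
function validateSettings(kind: NodeKind, settings: BoxSettings): string[] {
  const caps = MODEL_CATALOG[settings.model];
  if (caps === undefined || caps.kind !== kind) {
    return [`Model "${settings.model}" is not available for ${kind} boxes`];
  }
  const problems: string[] = [];
  if (!caps.aspectRatios.includes(settings.aspectRatio)) {
    problems.push(`Aspect ratio ${settings.aspectRatio} is not supported`);
  }
  if (!caps.resolutions.includes(settings.resolution)) {
    problems.push(`Resolution ${settings.resolution} is not supported`);
  }
  if (
    caps.durations !== undefined &&
    (settings.duration === undefined || !caps.durations.includes(settings.duration))
  ) {
    problems.push('Duration is missing or not supported');
  }
  return problems;
}
```

The same catalog can populate the settings dropdowns, so the UI only ever offers combinations that `validateSettings` would accept.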

## 12. Prerequisites
- A **Higgsfield Cloud API key** from `cloud.higgsfield.ai`, placed in `server/.env` as
  `HIGGSFIELD_API_KEY`. (User confirmed they have one.)
- Node 24 + npm (present). ffmpeg provided via `ffmpeg-static` (no system install needed).

## 13. Success criteria
1. User can add ≥2 boxes, connect them, configure prompts/settings, and click Run pipeline.
2. Upstream box generates; its last frame is extracted and used as the downstream box's
   first reference; downstream box generates a visually continuous result.
3. "Run this box" runs exactly one node.
4. Workflow survives a page reload (localStorage) and can be exported/imported as JSON.
5. API key is never exposed to the browser.