# Atlas Modes, Dynamic Models & Pricing — Design

- **Date:** 2026-06-09
- **Status:** Approved (design) — pending spec review
- **Builds on:** `2026-06-09-higgsfield-canvas-design.md` (the node canvas) + the Atlas Cloud provider swap.

## 1. Goal

Expand each canvas box from a fixed image/video kind into a **mode → model → dynamic-settings** picker covering **all** Atlas Cloud models, and show **live pricing** per box and for the whole pipeline.

The five modes:

| Mode | Label | Output | Required input |
|------|-------|--------|----------------|
| `t2i` | Text → Image | image | none |
| `i2i` | Image → Image | image | image |
| `t2v` | Text → Video | video | none |
| `i2v` | Image → Video | video | image |
| `v2v` | Video → Video | video | video |

## 2. Key facts established from the live Atlas API

- **Pricing is flat per generation per model variant.** Each model exposes
  `price.actual.base_price` (a string dollar amount, discount already applied) and
  `price.discount`. There is **no per-second / per-resolution formula** in the API.
  → Per-box price = the selected model's `base_price`. It changes with mode/model, **not**
  with duration/resolution sliders.
- **~235 console-visible image/video models.** Modes map by `type` (Image/Video) + the
  model-id suffix, e.g. `/text-to-image`, `/edit`, `/text-to-video`, `/image-to-video`,
  `/video-to-video`. ~40 video variants (e.g. `/infinite-image-to-video`, `/animate-*`,
  `/start-end-to-video`, `/video-extend`) need the schema's `required` fields as a tiebreaker.
- **Per-model schemas are fetchable** (`/api/v1/models` → each entry's `schema` URL → an
  OpenAPI doc; read `components.schemas.Input.properties` + `.required`).

## 3. Box data model (replaces the fixed `kind`)

```ts
export type Mode = 't2i' | 'i2i' | 't2v' | 'i2v' | 'v2v';

export interface BoxNodeData {
  mode: Mode;
  model: string;                     // exact Atlas model id, or '' until chosen
  params: Record<string, unknown>;   // schema-driven settings (NOT media inputs)
  mediaInputs: MediaRef[];           // user-uploaded refs; chaining fills the primary slot at run time
  // runtime
  status: RunStatus;
  jobId?: string;
  output?: NodeOutput;               // { type: 'image'|'video'; url; lastFrameUrl? }
  error?: string;
  // display-only
  price?: number;                    // resolved model base price, for the chip
}
```

Output type is derived from mode: `t2i|i2i → image`, else `video`. Input requirement is
derived from mode: `t2i|t2v → none`, `i2i|i2v → image`, `v2v → video`.

Changing `mode` clears `model` and `params`. Changing `model` resets `params` to the new
schema's defaults (so stale params from a previous model never leak into a request).
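
The derivation and reset rules above can be sketched as small pure helpers. This is a minimal illustration, not the store's actual code: `BoxSelection`, `withMode`, and `withModel` are hypothetical names, and treating a re-selection of the same mode as a no-op is an assumption.

```ts
type Mode = 't2i' | 'i2i' | 't2v' | 'i2v' | 'v2v';
type MediaKind = 'image' | 'video';

// Output type follows from the mode alone.
function outputTypeFor(mode: Mode): MediaKind {
  return mode === 't2i' || mode === 'i2i' ? 'image' : 'video';
}

// Required upstream input, or null when the mode takes none.
function requiredInputFor(mode: Mode): MediaKind | null {
  switch (mode) {
    case 't2i':
    case 't2v':
      return null;
    case 'i2i':
    case 'i2v':
      return 'image';
    case 'v2v':
      return 'video';
  }
}

// Hypothetical slice of BoxNodeData that the pickers touch.
interface BoxSelection {
  mode: Mode;
  model: string;
  params: Record<string, unknown>;
}

// Changing mode clears model and params (assumption: same mode is a no-op).
function withMode(sel: BoxSelection, mode: Mode): BoxSelection {
  return mode === sel.mode ? sel : { mode, model: '', params: {} };
}

// Changing model replaces params wholesale with the new schema's defaults,
// so nothing from the previous model survives.
function withModel(
  sel: BoxSelection,
  model: string,
  defaults: Record<string, unknown>,
): BoxSelection {
  return { ...sel, model, params: { ...defaults } };
}
```

Replacing `params` rather than merging into it is what guarantees stale keys cannot reach a request.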

## 4. Backend

Extends the Atlas layer with new modules and endpoints. `atlas.ts`, `jobRegistry.ts`, `frameExtractor.ts`, and
`index.ts` are unchanged.

### 4a. `catalog.ts` — model list + classification + pricing
- Fetches `https://api.atlascloud.ai/api/v1/models` once, caches in-memory with a ~10 min TTL.
- Keeps only `display_console === true` and `type` ∈ {Image, Video}.
- **Classifier** `classify(model): Mode | null`:
  - Image: id matches `/(edit|reference-to-image)/` → `i2i`; else `t2i`.
  - Video: id matches `/(video-to-video|video-edit|video-extend)/` → `v2v`;
    else matches `/(image-to-video|start-end-to-video|reference-to-video|infinite-image-to-video|animate)/` → `i2v`;
    else matches `/text-to-video/` → `t2v`;
    else **fallback**: fetch the schema once; a required field named `video*` → `v2v`, a required `image*` field → `i2v`, else `t2v`.
  - Unclassifiable → omitted.
- **Price** `parsePrice(model): number` = `Number(price.actual.base_price)` (fallback `origin.base_price`, else 0).
- Returns `{ [mode]: Array<{ id, label, price, outputType }> }`, each mode sorted by price asc.
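
A rough sketch of the classifier and price parser described above. Several things here are assumptions: the `AtlasModel` shape (including that `origin` sits under `price`), passing the schema's `required` list in as a parameter instead of fetching it inside `classify`, reading `video*` / `image*` as a name prefix, and mapping an unparseable price to 0.

```ts
type Mode = 't2i' | 'i2i' | 't2v' | 'i2v' | 'v2v';

// Hypothetical minimal view of one catalog entry.
interface AtlasModel {
  id: string;
  type: string; // 'Image', 'Video', or anything else
  price?: {
    actual?: { base_price?: string };
    origin?: { base_price?: string };
  };
}

const I2I = /(edit|reference-to-image)/;
const V2V = /(video-to-video|video-edit|video-extend)/;
const I2V = /(image-to-video|start-end-to-video|reference-to-video|infinite-image-to-video|animate)/;
const T2V = /text-to-video/;

// requiredFields is only consulted when no id pattern matches.
function classify(model: AtlasModel, requiredFields: string[] = []): Mode | null {
  if (model.type === 'Image') return I2I.test(model.id) ? 'i2i' : 't2i';
  if (model.type !== 'Video') return null;
  if (V2V.test(model.id)) return 'v2v';
  if (I2V.test(model.id)) return 'i2v';
  if (T2V.test(model.id)) return 't2v';
  // Fallback: let the schema's required media fields break the tie.
  if (requiredFields.some((f) => f.startsWith('video'))) return 'v2v';
  if (requiredFields.some((f) => f.startsWith('image'))) return 'i2v';
  return 't2v';
}

// base_price arrives as a string dollar amount; missing or invalid becomes 0.
function parsePrice(model: AtlasModel): number {
  const raw = model.price?.actual?.base_price ?? model.price?.origin?.base_price;
  const value = Number(raw);
  return Number.isFinite(value) ? value : 0;
}
```

The `v2v` patterns are tested before the `i2v` ones so that an id containing both a video and an image hint resolves the same way as the ordered rules in the list.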

### 4b. `schema.ts` — per-model input schema
- `getSchema(modelId): Promise<SchemaParam[]>` — fetch the model's `schema` URL (cached per process),
  read `components.schemas.Input.properties` + `required`.
- `SchemaParam = { name, type: 'string'|'integer'|'number'|'boolean'|'array', enum?: (string|number)[],
  default?: unknown, required: boolean, title?: string, description?: string,
  isMedia: boolean, mediaKind?: 'image'|'video' }`.
- Excludes `model` and internal params (`enable_base64_output`).
- Marks media params via `mediaInput.ts`.
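
The extraction step can be sketched as a pure function over an already-fetched OpenAPI document. The `OpenApiDoc` and `OpenApiProperty` shapes are hypothetical minimal views, the media-name lists are inlined here only to keep the example self-contained, and falling back to `'string'` for an unknown or missing `type` is an assumption.

```ts
type ParamType = 'string' | 'integer' | 'number' | 'boolean' | 'array';
type MediaKind = 'image' | 'video';

interface SchemaParam {
  name: string;
  type: ParamType;
  enum?: (string | number)[];
  default?: unknown;
  required: boolean;
  title?: string;
  description?: string;
  isMedia: boolean;
  mediaKind?: MediaKind;
}

// Hypothetical minimal view of the fetched document: only the fields read here.
interface OpenApiProperty {
  type?: string;
  enum?: (string | number)[];
  default?: unknown;
  title?: string;
  description?: string;
}
interface OpenApiDoc {
  components?: {
    schemas?: {
      Input?: { properties?: Record<string, OpenApiProperty>; required?: string[] };
    };
  };
}

const EXCLUDED = ['model', 'enable_base64_output'];
const IMAGE_NAMES = ['images', 'image', 'image_url', 'start_image', 'first_frame', 'input_image', 'subject_image'];
const VIDEO_NAMES = ['video', 'video_url', 'input_video'];
const PARAM_TYPES: ParamType[] = ['string', 'integer', 'number', 'boolean', 'array'];

function mediaKindOf(name: string): MediaKind | undefined {
  if (IMAGE_NAMES.includes(name)) return 'image';
  if (VIDEO_NAMES.includes(name)) return 'video';
  return undefined;
}

// Assumption: unknown or missing types are treated as strings.
function toParamType(raw: string | undefined): ParamType {
  return PARAM_TYPES.find((t) => t === raw) ?? 'string';
}

function extractParams(doc: OpenApiDoc): SchemaParam[] {
  const input = doc.components?.schemas?.Input;
  const required = input?.required ?? [];
  return Object.entries(input?.properties ?? {})
    .filter(([name]) => !EXCLUDED.includes(name))
    .map(([name, prop]): SchemaParam => {
      const kind = mediaKindOf(name);
      return {
        name,
        type: toParamType(prop.type),
        enum: prop.enum,
        default: prop.default,
        required: required.includes(name),
        title: prop.title,
        description: prop.description,
        isMedia: kind !== undefined,
        mediaKind: kind,
      };
    });
}
```

Keeping the fetch and cache outside this function makes the extraction testable against a plain object, which is what the "mock fetch" test in section 10 needs.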

### 4c. `mediaInput.ts` — find the primary media param(s)
- `primaryImageParam(params)`: first present of
  `['images','image','image_url','start_image','first_frame','input_image','subject_image']`
  (prefer `required`). `primaryVideoParam(params)`: first of `['video','video_url','input_video']`.
- `isMediaParam(name)` / `mediaKind(name)` used by `schema.ts` to flag media fields.
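
One possible reading of these lookups, as a sketch. `ParamInfo` and `pickPrimary` are hypothetical, and "prefer `required`" is interpreted as: the first required candidate wins, otherwise the first candidate present. The sketch applies that preference to images only, since the text states it only for the image list.

```ts
// Hypothetical minimal view of a schema param.
interface ParamInfo {
  name: string;
  required: boolean;
}

const IMAGE_PARAM_NAMES = ['images', 'image', 'image_url', 'start_image', 'first_frame', 'input_image', 'subject_image'];
const VIDEO_PARAM_NAMES = ['video', 'video_url', 'input_video'];

// Walk the candidates in priority order and keep those the schema declares.
function pickPrimary(
  params: ParamInfo[],
  candidates: string[],
  preferRequired: boolean,
): string | undefined {
  const present = candidates.filter((c) => params.some((p) => p.name === c));
  if (preferRequired) {
    const required = present.find((c) => params.some((p) => p.name === c && p.required));
    if (required !== undefined) return required;
  }
  return present[0];
}

function primaryImageParam(params: ParamInfo[]): string | undefined {
  return pickPrimary(params, IMAGE_PARAM_NAMES, true);
}

function primaryVideoParam(params: ParamInfo[]): string | undefined {
  return pickPrimary(params, VIDEO_PARAM_NAMES, false);
}

function mediaKind(name: string): 'image' | 'video' | undefined {
  if (IMAGE_PARAM_NAMES.includes(name)) return 'image';
  if (VIDEO_PARAM_NAMES.includes(name)) return 'video';
  return undefined;
}

function isMediaParam(name: string): boolean {
  return mediaKind(name) !== undefined;
}
```

For a schema that declares an optional `image` and a required `start_image`, this reading picks `start_image`; with no required candidate it falls back to list order.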

### 4d. Routes
- `GET /api/models` → the classified catalog (4a).
- `GET /api/schema?model=<id>` → `SchemaParam[]` (4b).
- `POST /api/generate` → body `{ model, outputType, params, imageInputs?: string[], videoInputs?: string[] }`.
  Backend loads the schema, injects `imageInputs`/`videoInputs` into the primary media param
  (array param → set the array; string param → set the first url), merges `{ model, ...params, <media> }`,
  submits to `generateVideo` if `outputType==='video'` else `generateImage`, polls, stores result.
- `GET /api/status/:id`, `POST /api/upload`, `POST /api/extract-frame` — unchanged (Atlas `uploadMedia`).
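
The media-injection step of `POST /api/generate` might look roughly like this. `PrimarySlot`, `injectMedia`, and `buildAtlasPayload` are hypothetical names, and the slots are assumed to have been resolved from the schema beforehand; the merge order follows `{ model, ...params, <media> }` as written above.

```ts
// Hypothetical description of the primary media param found in the schema.
interface PrimarySlot {
  name: string;
  type: 'string' | 'array';
}

interface GenerateBody {
  model: string;
  params: Record<string, unknown>;
  imageInputs?: string[];
  videoInputs?: string[];
}

// Array param gets every url; string param gets only the first.
function injectMedia(
  slot: PrimarySlot | undefined,
  urls: string[] | undefined,
): Record<string, unknown> {
  if (!slot || !urls || urls.length === 0) return {};
  return { [slot.name]: slot.type === 'array' ? urls : urls[0] };
}

// Media keys are spread last so they win over any same-named key in params.
function buildAtlasPayload(
  body: GenerateBody,
  imageSlot?: PrimarySlot,
  videoSlot?: PrimarySlot,
): Record<string, unknown> {
  return {
    model: body.model,
    ...body.params,
    ...injectMedia(imageSlot, body.imageInputs),
    ...injectMedia(videoSlot, body.videoInputs),
  };
}
```

With a string-typed slot, any extra urls after the first are dropped silently in this sketch; whether that should instead be an error is left open.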

## 5. Dynamic settings form (`SchemaForm.tsx`)
Given `SchemaParam[]` and current `params`, render one control per **non-media** param:
- `enum` → `<select>`; `boolean` → checkbox; `integer`/`number` → number input (or select if enum);
  `string` named `prompt` → `<textarea>`, else text input.
- Defaults seeded from `default`. Required fields marked. Emits `onChange(name, value)` → updates box `params`.

Media params are rendered by `BoxNode` as upload/preview widgets (chaining auto-fills the primary one).
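
The control-selection rules can be captured in a pure function that the component maps over, kept free of JSX here so it stands alone. `ControlKind`, `FormParam`, `controlFor`, and `seedDefaults` are hypothetical names; rendering a non-media `array` param as a text input is an assumption, since no control is specified for it.

```ts
type ControlKind = 'select' | 'checkbox' | 'number' | 'textarea' | 'text';

// Hypothetical subset of SchemaParam that the form needs.
interface FormParam {
  name: string;
  type: 'string' | 'integer' | 'number' | 'boolean' | 'array';
  enum?: (string | number)[];
  default?: unknown;
  isMedia: boolean;
}

// Returns null for media params, which BoxNode renders instead.
function controlFor(p: FormParam): ControlKind | null {
  if (p.isMedia) return null;
  if (p.enum && p.enum.length > 0) return 'select'; // enum wins over the base type
  if (p.type === 'boolean') return 'checkbox';
  if (p.type === 'integer' || p.type === 'number') return 'number';
  if (p.type === 'string' && p.name === 'prompt') return 'textarea';
  return 'text'; // plain strings, and (by assumption) arrays
}

// Initial params for a freshly selected model: every declared default.
function seedDefaults(params: FormParam[]): Record<string, unknown> {
  const seeded: Record<string, unknown> = {};
  for (const p of params) {
    if (!p.isMedia && p.default !== undefined) seeded[p.name] = p.default;
  }
  return seeded;
}
```

Checking `enum` first covers the "or select if enum" case for numeric params without a separate branch.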

## 6. Chaining across modes (`runner.ts`)
At run time, for a downstream box, resolve its required input from `mode`:
- **needs image** (`i2i`,`i2v`): upstream image output → its url; upstream video output → extracted last frame
  (via `/api/extract-frame`). Passed as `imageInputs` (piped first, then uploaded refs).
- **needs video** (`v2v`): upstream **must** be a video output → its url as `videoInputs`; an image upstream → fail
  with "Video→Video needs a video input".
- **needs none** (`t2i`,`t2v`): incoming edges ignored.

`generate` is called with `{ model, outputType, params, imageInputs?, videoInputs? }`. Topological order,
failure-skips-descendants, and polling are unchanged.
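
A simplified, synchronous sketch of the per-mode input resolution. It assumes the last frame has already been extracted into `lastFrameUrl`, that uploaded refs are plain URL strings, and that a `v2v` box with no upstream may fall back to uploaded videos; the error messages other than the Video→Video one are hypothetical.

```ts
type Mode = 't2i' | 'i2i' | 't2v' | 'i2v' | 'v2v';

interface NodeOutput {
  type: 'image' | 'video';
  url: string;
  lastFrameUrl?: string;
}

interface ResolvedInputs {
  imageInputs?: string[];
  videoInputs?: string[];
}

function resolveInputs(
  mode: Mode,
  upstream: NodeOutput | undefined,
  uploaded: string[] = [],
): ResolvedInputs {
  // Text-driven modes ignore incoming edges entirely.
  if (mode === 't2i' || mode === 't2v') return {};

  if (mode === 'v2v') {
    if (upstream && upstream.type !== 'video') {
      throw new Error('Video→Video needs a video input');
    }
    const videos = upstream ? [upstream.url, ...uploaded] : [...uploaded];
    if (videos.length === 0) throw new Error('Video→Video needs a video input');
    return { videoInputs: videos };
  }

  // i2i / i2v: an upstream image is used directly, an upstream video
  // contributes its last frame. The piped url goes first, then uploads.
  const piped: string[] = [];
  if (upstream) {
    const url = upstream.type === 'image' ? upstream.url : upstream.lastFrameUrl;
    if (!url) throw new Error('Upstream video has no extracted last frame');
    piped.push(url);
  }
  const images = [...piped, ...uploaded];
  if (images.length === 0) throw new Error('This mode needs an image input');
  return { imageInputs: images };
}
```

In the real runner the frame extraction is an async call to `/api/extract-frame`, so this function would either be async or run after that step.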

## 7. Pricing display
- **Per box:** `BoxNode` shows the selected model's price chip (e.g. `$0.096`), from the catalog.
- **Pipeline total:** `Toolbar` sums every box's model price and shows it on the Run button
  (e.g. `Run pipeline · $0.42`), mirroring the screenshot's credit chip.
- Formatting: `$` + up to 3 decimals; `$0` when no model chosen.
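
A small sketch of the formatting and total rules. `formatPrice`, `pipelineTotal`, and `runLabel` are hypothetical names, and trimming trailing zeros is how "up to 3 decimals" is read here, matching the `$0.096` and `$0.42` examples.

```ts
// '$' plus at most 3 decimals, trailing zeros trimmed; '$0' for no price.
function formatPrice(price: number | undefined): string {
  if (price === undefined || !Number.isFinite(price) || price <= 0) return '$0';
  // toFixed rounds to 3 places; Number() then drops trailing zeros.
  return '$' + Number(price.toFixed(3)).toString();
}

// Boxes without a chosen model contribute nothing.
function pipelineTotal(boxes: { price?: number }[]): number {
  return boxes.reduce((sum, box) => sum + (box.price ?? 0), 0);
}

function runLabel(boxes: { price?: number }[]): string {
  return `Run pipeline · ${formatPrice(pipelineTotal(boxes))}`;
}
```

Rounding only at display time keeps the sum itself unrounded, so floating-point noise in the total (for example from adding `0.096` and `0.324`) never shows up in the label.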

## 8. Files affected
- **Backend (new):** `catalog.ts`, `schema.ts`, `mediaInput.ts`.
  **(changed):** `routes.ts` (+`/models`, +`/schema`, generalized `/generate`).
  **(unchanged):** `atlas.ts`, `jobRegistry.ts`, `frameExtractor.ts`, `index.ts`. Remove the old static `models.ts`.
- **Frontend (new):** `SchemaForm.tsx`, `catalog.ts` (fetch + cache hook).
  **(changed):** `BoxNode.tsx` (mode→model→form→price + media widgets), `store.ts`/`types.ts` (new box data),
  `runner.ts` (input resolution by mode), `Toolbar.tsx` (mode-aware "Add box" + pipeline total), `api.ts`.

## 9. Error handling
- Catalog/schema fetch failures → backend returns 502 with a message; UI shows "couldn't load models/settings, retry".
- Unknown model on generate → 400. Missing required input for a mode → the box fails with a clear message.
- Atlas errors (401/402/429) surface verbatim into the box's `error`.
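
The backend half of this mapping could be sketched as a discriminated union turned into an HTTP reply. The `BackendError` type, `toHttpReply`, and the message wording are all hypothetical; only the 502 and 400 status codes come from the list above.

```ts
// Hypothetical error shapes raised inside the route handlers.
type BackendError =
  | { kind: 'upstream-fetch'; target: 'models' | 'settings'; detail: string }
  | { kind: 'unknown-model'; model: string };

interface HttpReply {
  status: number;
  message: string;
}

function toHttpReply(err: BackendError): HttpReply {
  switch (err.kind) {
    case 'upstream-fetch':
      // Catalog or schema could not be fetched from Atlas.
      return { status: 502, message: `Couldn't load ${err.target}: ${err.detail}` };
    case 'unknown-model':
      // The requested model id is not in the classified catalog.
      return { status: 400, message: `Unknown model: ${err.model}` };
  }
}
```

A tagged union keeps the switch exhaustive, so adding a new error kind without a status mapping becomes a compile error.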

## 10. Testing
- **Backend:** `classify` (id→mode incl. fallback), `parsePrice`, `primaryImageParam`/`primaryVideoParam`,
  `schema` param extraction (mock fetch), `/generate` media injection (mock atlas + schema).
- **Frontend:** `SchemaForm` renders the right control per param type; store mode/model/param updates +
  resetting params on model change; `runner` input-resolution per mode (image vs video vs none);
  pipeline-total sum.

## 11. Scope
**In:** 5 modes, all console-visible image/video models, schema-driven settings, per-box + pipeline pricing.

**Out (now):** LLM/text models; the full multi-input `reference-to-video` flow (audio + many refs), which is
treated as plain `i2v` with one image; persisting price history; per-second price estimation.

## 12. Success criteria
1. A box can select any of the 5 modes; the model dropdown lists every console-visible Atlas model classified into that mode, with its price.
2. Selecting a model renders its real settings (from schema); edits flow through to the Atlas request.
3. Per-box price chip shows; the Run button shows the summed pipeline total.
4. Chaining: an `i2v` box consumes an upstream image (or a video's last frame); a `v2v` box consumes an upstream video.
5. Live generation succeeds for a representative model in each mode.