
embed

Generate, refresh, or clean embeddings in a JSONL file.

omnigraph embed is an offline JSONL pipeline that fills, refreshes, or strips embedding columns in a .jsonl data file. It operates on files rather than on a live graph. This lets you prepare data for load / ingest without round-tripping through the engine.

The compiler-side embedder (text-embedding-3-small by default) generates query-time normalized vectors. The engine-side embedder (gemini-embedding-2-preview) runs automatically at write time for @embed-annotated vector columns; you only need omnigraph embed when you want to prebake embeddings outside the engine.

Modes

The three modes are mutually exclusive:

| Flag | Behavior |
|---|---|
| (default) fill_missing | Only embed rows whose target field is empty |
| --reembed-all | Recompute every row's embedding, overwriting existing values |
| --clean | Strip embedding columns from every row |
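The three modes can be thought of as a per-row transformation over a JSONL stream. The sketch below is a hypothetical illustration, not Omnigraph's implementation. It rests on several assumptions:

- Each row is a flat JSON object with a `type` key.
- An absent, null, or empty-list target counts as "empty".
- Source fields are joined with newlines.
- The toy `fake_embed` function stands in for a real embedding model.

The names `process`, `fake_embed`, and the mode strings are invented for this example.

```python
import json
from typing import Callable, Iterable, Iterator

# Hypothetical spec slice: entity type -> (target column, source fields).
Spec = dict[str, tuple[str, list[str]]]


def fake_embed(text: str) -> list[float]:
    # Toy stand-in for a real model: a 2-dim "vector" derived from the text.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def is_empty(value) -> bool:
    # Assumption: missing, null, or an empty list all count as "empty".
    return value is None or value == []


def process(lines: Iterable[str], spec: Spec, mode: str = "fill_missing",
            embed: Callable[[str], list[float]] = fake_embed) -> Iterator[str]:
    """Yield rewritten JSONL lines according to one of the three modes."""
    if mode not in ("fill_missing", "reembed_all", "clean"):
        raise ValueError(f"unknown mode: {mode}")
    for line in lines:
        row = json.loads(line)
        entry = spec.get(row.get("type"))
        if entry is not None:  # rows of types not in the spec pass through
            target, fields = entry
            if mode == "clean":
                row.pop(target, None)  # strip the embedding column
            elif mode == "reembed_all" or is_empty(row.get(target)):
                # Assumption: source fields are concatenated with newlines.
                text = "\n".join(str(row.get(f, "")) for f in fields)
                row[target] = embed(text)
        yield json.dumps(row)
```

The default mode skips rows that already carry a vector, so re-running it is cheap. The other two modes touch every row of a configured type.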

Usage

Two input shapes are supported. They are mutually exclusive:

```bash
# Driven by a seed manifest YAML (defines sources, artifacts, and the embed spec inline)
omnigraph embed --seed ./seed.yaml [--reembed-all|--clean]

# Direct: explicit input / output JSONL plus a standalone embed spec JSON file
omnigraph embed --input data.jsonl --output data.embedded.jsonl --spec embed-spec.json
```

--seed is incompatible with --input / --output / --spec; pick one input shape per invocation.
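The input-shape and mode rules on this page can be expressed as a small validation function. This is a hypothetical sketch of the documented constraints and does not reproduce the CLI's actual argument parser. The function name `resolve_invocation` and its return values are invented for illustration.

```python
from typing import Optional


def resolve_invocation(seed: Optional[str] = None,
                       input_path: Optional[str] = None,
                       output_path: Optional[str] = None,
                       spec_path: Optional[str] = None,
                       reembed_all: bool = False,
                       clean: bool = False) -> tuple[str, str]:
    """Return (input_shape, mode), or raise ValueError for invalid combinations."""
    direct = [input_path, output_path, spec_path]
    if seed is not None:
        # --seed excludes every member of the direct trio.
        if any(p is not None for p in direct):
            raise ValueError("--seed is incompatible with --input/--output/--spec")
        shape = "seed"
    elif all(p is not None for p in direct):
        shape = "direct"
    else:
        raise ValueError("need --seed, or all of --input, --output and --spec")
    # The three modes are mutually exclusive; no flag means fill_missing.
    if reembed_all and clean:
        raise ValueError("--reembed-all and --clean are mutually exclusive")
    mode = "reembed_all" if reembed_all else "clean" if clean else "fill_missing"
    return shape, mode
```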

Options

| Option | Required | Description |
|---|---|---|
| --input | with --output + --spec | Source JSONL file |
| --output | with --input + --spec | Destination JSONL file |
| --spec | with --input + --output | Path to an embed-spec JSON file (see below) |
| --seed | alternative to the trio above | Path to a seed manifest YAML describing inputs and embed specs inline |
| --type | no | Repeatable. Embed only rows of the given node / edge type |
| --select | no | Repeatable filter of the form T:field=value or field=value |
| --reembed-all | no | Overwrite existing embeddings |
| --clean | no | Strip embedding columns instead of writing new ones |
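The sketch below shows one way to read the two --select forms. The optional type prefix is split off only from the left-hand side of the first "=", so values may themselves contain ":" or "=". This is an illustrative assumption, not Omnigraph's parser. The sketch also assumes the following:

- Rows are flat objects with a `type` key.
- Values are compared as strings.
- How repeated filters combine (AND vs. OR) is not stated on this page, so only single-filter matching is shown.

`Select`, `parse_select`, and `matches` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Select:
    type_name: Optional[str]  # None when the filter has no "T:" prefix
    field: str
    value: str


def parse_select(expr: str) -> Select:
    """Parse 'T:field=value' or 'field=value'."""
    lhs, sep, value = expr.partition("=")
    if not sep or not lhs:
        raise ValueError(f"expected T:field=value or field=value, got {expr!r}")
    type_name, colon, field = lhs.partition(":")
    if not colon:
        type_name, field = None, lhs  # no type prefix
    if not field or (colon and not type_name):
        raise ValueError(f"empty type or field in {expr!r}")
    return Select(type_name, field, value)


def matches(row: dict, sel: Select) -> bool:
    # Assumption: flat rows with a "type" key; values compared as strings.
    if sel.type_name is not None and row.get("type") != sel.type_name:
        return False
    return sel.field in row and str(row[sel.field]) == sel.value
```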

Embed-spec JSON shape

The spec file describes the embedding model, dimension, and the mapping of each entity type to its target embedding column and the source fields whose text gets embedded:

```json
{
  "model": "gemini-embedding-2-preview",
  "dimension": 1536,
  "types": {
    "Document": { "target": "embedding", "fields": ["body"] },
    "Person":   { "target": "embedding", "fields": ["bio"] }
  }
}
```

The model key is optional and defaults to gemini-embedding-2-preview. The same shape can also live under an embeddings: key inside a seed manifest YAML.
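A minimal loader for this spec shape is sketched below. It fills in the documented default model and checks each type entry. The page only says that model is optional, so the sketch assumes dimension and types are required. It also assumes that dimension must be a positive integer and that each type's fields list must be non-empty. These checks are illustrative assumptions, and `load_spec`, `EmbedSpec`, and `TypeSpec` are hypothetical names.

```python
import json
from dataclasses import dataclass

DEFAULT_MODEL = "gemini-embedding-2-preview"  # default stated on this page


@dataclass
class TypeSpec:
    target: str        # column that receives the vector
    fields: list[str]  # source fields whose text gets embedded


@dataclass
class EmbedSpec:
    model: str
    dimension: int
    types: dict[str, TypeSpec]


def load_spec(text: str) -> EmbedSpec:
    raw = json.loads(text)
    dimension = raw["dimension"]  # assumption: required
    if not isinstance(dimension, int) or dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    types = {}
    for name, entry in raw["types"].items():  # assumption: required
        fields = entry["fields"]
        if not fields:
            raise ValueError(f"{name}: fields must be non-empty")
        types[name] = TypeSpec(target=entry["target"], fields=list(fields))
    return EmbedSpec(model=raw.get("model", DEFAULT_MODEL),
                     dimension=dimension, types=types)
```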

Environment

The compiler-side embedder reads the following environment variables:

| Variable | Default | Purpose |
|---|---|---|
| NANOGRAPH_EMBED_MODEL | text-embedding-3-small | Model identifier |
| OPENAI_API_KEY | (none) | API credentials |
| OPENAI_BASE_URL | https://api.openai.com/v1 | API endpoint |
| NANOGRAPH_EMBED_TIMEOUT_MS | 30000 | Per-request timeout, in milliseconds |
| NANOGRAPH_EMBED_RETRY_ATTEMPTS | 4 | Retry budget for transient errors |
| NANOGRAPH_EMBEDDINGS_MOCK | unset | Deterministic mock embedder for tests |
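The sketch below collects these variables into one config object using the defaults from the table. It is a hypothetical illustration of how the variables relate, not the embedder's own code. The page does not say how the value of NANOGRAPH_EMBEDDINGS_MOCK is interpreted, so the sketch treats any non-empty value as "enabled". `EmbedderConfig` and `config_from_env` are invented names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EmbedderConfig:
    model: str
    api_key: Optional[str]
    base_url: str
    timeout_ms: int
    retry_attempts: int
    mock: bool


def config_from_env(env: Mapping[str, str] = os.environ) -> EmbedderConfig:
    """Read the documented variables, falling back to the documented defaults."""
    return EmbedderConfig(
        model=env.get("NANOGRAPH_EMBED_MODEL", "text-embedding-3-small"),
        api_key=env.get("OPENAI_API_KEY"),  # no default
        base_url=env.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        timeout_ms=int(env.get("NANOGRAPH_EMBED_TIMEOUT_MS", "30000")),
        retry_attempts=int(env.get("NANOGRAPH_EMBED_RETRY_ATTEMPTS", "4")),
        # Assumption: any non-empty value enables the mock embedder.
        mock=bool(env.get("NANOGRAPH_EMBEDDINGS_MOCK")),
    )
```

Passing a plain dict instead of os.environ keeps tests independent of the real shell environment.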

Example

Bake embeddings into a seed file before initial load:

```bash
omnigraph embed --input ./seed.jsonl \
    --output ./seed.embedded.jsonl \
    --spec ./embed-spec.json \
    --type Document

omnigraph init --schema ./schema.pg ./graph.omni
omnigraph load ./graph.omni --data ./seed.embedded.jsonl --mode overwrite
```
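Before running load, it can be worth confirming that every baked row carries a vector of the spec's dimension. The check below is a hypothetical pre-flight sketch, not part of the CLI. It assumes flat rows with a `type` key and an embedding stored as a JSON array of numbers under the target column. `check_embedded` is an invented name.

```python
import json
from typing import Iterable


def check_embedded(lines: Iterable[str], type_name: str, target: str,
                   dimension: int) -> list[int]:
    """Return 1-based line numbers of `type_name` rows whose `target` is not a
    numeric vector of length `dimension`."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        row = json.loads(line)
        if row.get("type") != type_name:
            continue  # other types are out of scope (cf. --type)
        vec = row.get(target)
        ok = (isinstance(vec, list) and len(vec) == dimension
              and all(isinstance(x, (int, float)) for x in vec))
        if not ok:
            bad.append(lineno)
    return bad
```

For the example above, you would open ./seed.embedded.jsonl and call `check_embedded(f, "Document", "embedding", 1536)`. An empty list means every Document row has a vector of the expected size.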

Refresh every existing embedding (e.g. after switching models):

```bash
omnigraph embed --input ./data.jsonl \
    --output ./data.embedded.jsonl \
    --spec ./embed-spec.json \
    --reembed-all
```
