Documentation

EigenVertex guide: WikiLLM, Retrieval, and Graph-LLM

This is the practical guide for operators, developers, and product builders. It explains how to choose between WikiLLM and Retrieval, what each of the three strategies does, how the console behaves in each mode, and how to integrate the native or OpenAI-compatible API without inheriting the old hybrid confusion.

What this guide covers

  • When to choose WikiLLM versus Retrieval
  • What Wiki-LLM, Agentic RAG, and Graph-LLM actually mean
  • Why raw sources, wiki pages, and evidence layers must stay separated
  • Which chat options remain in each mode and which ones disappear
  • How to lint and maintain a WikiLLM workspace
  • curl, Python, TypeScript, and OpenAI-compatible examples

Get Started

Recommended first workflow

Start with one clean workspace and a small but representative corpus. The first decision is not “which clever question should I ask?” It is “should this workspace behave like a persistent wiki or like a retrieval engine?” Once that is clear, the rest of the system becomes much easier to operate and explain.

  1. Choose the workspace mode first: WikiLLM for durable memory, Retrieval for broad document recall.
  2. Create one clean workspace for one topic, client, project, or research corpus.
  3. Import a small but representative set of documents before scaling the corpus.
  4. Process documents so EigenVertex can extract text, run OCR and transcription, and derive semantic metadata.
  5. If the workspace is Retrieval, chunk and index the corpus for lexical, vector, and graph search.
  6. If the workspace is WikiLLM, let the backend compile source, topic, entity, and concept pages.
  7. Use Chat only after the workspace has become readable: wiki pages for WikiLLM, indexed evidence for Retrieval.
  8. Run wiki lint and maintain in WikiLLM, or inspect evidence quality in Retrieval, before large-scale usage.

Good first test

Create two workspaces with the same corpus: one in WikiLLM and one in Retrieval. Ask the same question in both. That reveals far more than importing hundreds of documents into one ambiguous environment.
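
The two-workspace comparison can be prepared with a small helper like the one below. This is a minimal sketch that only builds the request bodies for the workspace creation endpoint shown later in this guide. The value "wiki_llm" appears in this guide; the value "retrieval" and the example names and slugs are assumptions, so check them against your API reference.

```python
import json


def workspace_payload(name: str, slug: str, mode: str) -> dict:
    """Build the JSON body for POST /v1/workspaces."""
    return {
        "name": name,
        "slug": slug,
        "visibility": "private",
        "workspace_mode": mode,
    }


# "wiki_llm" is taken from this guide; "retrieval" is an assumed mode value.
payloads = [
    workspace_payload("Piano Research (wiki)", "piano-research-wiki", "wiki_llm"),
    workspace_payload("Piano Research (retrieval)", "piano-research-retrieval", "retrieval"),
]

for body in payloads:
    print(json.dumps(body))
```

Import the same documents into both workspaces, then send the same question to a conversation in each one and compare the answers and their sources.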

Workspace Modes

Choose the mode before you ingest

A workspace now has a primary mode. This is not a cosmetic setting. It decides how ingestion, query, maintenance, and the console itself should behave.

WikiLLM

Recommended for small to medium curated corpora and long-lived knowledge work. The system reads raw sources once, compiles a persistent markdown wiki, maintains AGENTS.md, index.md, and log.md, then answers from that memory before rereading raw evidence.

Retrieval

Recommended for larger corpora, exploratory search, and source-first recall. The system processes documents into evidence layers, then answers through lexical, vector, and graph-oriented retrieval without pretending that a compiled wiki already exists.

Rule of thumb

If the value comes from cumulative understanding, choose WikiLLM. If the value comes from broad evidence recall across a larger corpus, choose Retrieval.

Product Architecture

The three strategies

EigenVertex should not be described as a single “RAG stack”. It exposes three distinct strategies with different jobs.

Wiki-LLM

Persistent compiled memory. The wiki is the durable artifact. It accumulates useful knowledge over time, receives writeback after good answers, and becomes more valuable as the workspace matures.

Agentic RAG

Evidence-first retrieval and synthesis. The retrieval engine rereads the corpus, finds precise supporting passages, and synthesizes grounded answers when the question depends on raw evidence rather than previously compiled memory.

Graph-LLM

Relational reasoning over the corpus. The graph connects pages, concepts, claims, methods, and tensions so the system can navigate neighborhoods, surface contradictions, and explain why documents belong together.

Rule of thumb

Wiki-LLM answers “what do we know persistently?”, Agentic RAG answers “which passages support this right now?”, and Graph-LLM answers “how do these ideas connect?”.

Ingestion

Supported source types and ingestion flow

EigenVertex is designed for heterogeneous corpora. The same product can ingest written documents, media, expert notes, and generated artifacts while preserving provenance.

  • PDF research papers and books
  • DOCX, PPTX and office documents
  • TXT, Markdown, CSV, XLSX, HTML and URLs with archived snapshots
  • YouTube URLs with transcript-first ingestion and audio fallback when configured
  • Images, scans and photo captures
  • Audio and video with transcription
  • Inline notes, expert interviews and meeting excerpts
  • Generated or system-authored artifacts

WikiLLM ingestion

After processing, the backend reads the source, updates durable wiki pages, refreshes index.md, and appends to log.md. It does not auto-chunk or auto-index into Qdrant.

Retrieval ingestion

After processing, the backend builds evidence layers: chunks, lexical indexes, vector indexes, and graph material. This is the right path when recall and source-first search matter more than durable wiki maintenance.
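
To make the split concrete, the two ingestion paths can be pictured as a shared processing prefix followed by mode-specific steps. The sketch below is illustrative only: the step names and the "retrieval" mode value are hypothetical labels for the stages described above, not backend identifiers.

```python
from typing import List

# Hypothetical step names summarizing the processing stage shared by both modes.
SHARED_STEPS = ["extract_text", "ocr", "transcribe", "semantic_metadata"]

# Hypothetical step names for what happens after processing in each mode.
MODE_STEPS = {
    "wiki_llm": ["update_wiki_pages", "refresh_index_md", "append_log_md"],
    "retrieval": ["chunk", "lexical_index", "vector_index", "graph_material"],
}


def ingestion_plan(mode: str) -> List[str]:
    """Return the ordered steps for one document in the given workspace mode."""
    if mode not in MODE_STEPS:
        raise ValueError(f"unknown workspace mode: {mode!r}")
    return SHARED_STEPS + MODE_STEPS[mode]


for mode in MODE_STEPS:
    print(mode, "->", ", ".join(ingestion_plan(mode)))
```

The point of the sketch is that a WikiLLM plan never contains chunking or vector indexing, which matches the statement that WikiLLM does not auto-index into Qdrant.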

Transcript-first YouTube

For YouTube URLs, EigenVertex now prefers captions and transcripts before falling back to audio transcription, which makes ingestion faster, cheaper, and easier to compile into usable wiki pages.
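
The transcript-first preference is a simple fallback chain. The sketch below shows the control flow only, assuming two hypothetical callables (`fetch_captions` and `transcribe_audio`) that stand in for whatever the backend uses internally.

```python
from typing import Callable, Optional, Tuple


def transcribe_youtube(
    url: str,
    fetch_captions: Callable[[str], Optional[str]],
    transcribe_audio: Optional[Callable[[str], str]] = None,
) -> Tuple[str, str]:
    """Return (text, origin), preferring captions over audio transcription.

    Both callables are hypothetical stand-ins, not EigenVertex functions.
    """
    captions = fetch_captions(url)
    if captions and captions.strip():
        return captions, "captions"
    if transcribe_audio is None:
        # The audio fallback only runs when it has been configured.
        raise RuntimeError("no captions found and audio fallback is not configured")
    return transcribe_audio(url), "audio"


# Stub callables to exercise both branches without any network access.
text, origin = transcribe_youtube("https://example.com/watch", lambda u: "caption text")
print(origin)
text, origin = transcribe_youtube(
    "https://example.com/watch", lambda u: None, lambda u: "audio text"
)
print(origin)
```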

Large imports

A dry run is still recommended before importing hundreds of documents. URL imports are archived as snapshots when possible, and GitHub gists prefer the raw source view so the resulting document is cleaner and easier to retrieve.

Search & Chat

Chat options only make sense inside the right mode

The console should not present the same controls everywhere. WikiLLM and Retrieval do not have the same knobs.

WikiLLM chat

WikiLLM keeps the path intentionally simple: the system reads the wiki, synthesizes from persistent pages, and can write back durable answers. There is no meaningful choice between vector, hybrid, or graph retrieval in this mode.

Retrieval chat

Retrieval mode is where evidence strategies matter. The current console surface should expose retrieval-oriented choices such as Vector and Graph, not wiki memory shortcuts.

Retrieval strategies

Vector

Default evidence-first retrieval. Use it when you want reliable citations, precise source passages, and predictable latency from indexed chunks.

Graph

Relational navigation and graph-aware synthesis. Use it when you want concept neighborhoods, relationship inspection, or a graph-shaped answer path. Graph-LLM complements, rather than replaces, vector retrieval.
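
A client can enforce the same separation by only attaching retrieval knobs when the workspace is in Retrieval mode. The helper below is a client-side sketch: the field names come from the chat-turn examples in this guide, while the function itself, the "retrieval" mode value, and the decision to raise on misplaced knobs are assumptions.

```python
from typing import Optional


def chat_turn_payload(
    mode: str,
    question: str,
    profile: str = "balanced",
    answer_mode: str = "research",
    retrieval_strategy: Optional[str] = None,
    top_k: Optional[int] = None,
) -> dict:
    """Build a chat-turn body; retrieval knobs are only added in Retrieval mode."""
    payload = {
        "question": question,
        "include_sources": True,
        "save_messages": True,
        "query_profile": profile,
        "answer_mode": answer_mode,
    }
    if mode == "retrieval":
        strategy = retrieval_strategy or "vector"  # Vector is the default strategy
        payload["retrieval_strategy"] = strategy
        payload["retrieval_layers"] = [strategy]
        if top_k is not None:
            payload["top_k"] = top_k
    elif retrieval_strategy is not None or top_k is not None:
        # Client-side guard (an assumption): fail early rather than send
        # knobs that have no meaning in WikiLLM mode.
        raise ValueError("retrieval knobs are not meaningful in WikiLLM mode")
    return payload


print(chat_turn_payload("wiki_llm", "Summarize EigenVertex in five points.", profile="fast"))
print(chat_turn_payload("retrieval", "How do these methods relate?", retrieval_strategy="graph"))
```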

Speed profile

Fast
API usage, product flows, and quick checks. Keeps the path short and grounded. In WikiLLM it means concise wiki synthesis. In Retrieval it means a shorter evidence path with lower orchestration cost.
Balanced
Default human chat and most serious corpus questions. The normal “work carefully” profile. It preserves grounding while allowing a fuller synthesis path and better coverage.
Thorough
Ambiguous, comparative, or high-value research questions. Expands coverage and depth. Use it when latency matters less than reading breadth, comparison quality, and reasoning depth.

Answer behavior

Grounded (`strict` in API)
Audits, compliance, factual checks, and corpus-only answers. The model should stay inside the retrieved or compiled evidence. If the workspace cannot support an answer, it should say so.
RAG (`research` in API)
Serious Q&A, exploration, and synthesis. The answer can derive a fuller synthesis from the evidence, including assumptions, limitations, and follow-up questions, while staying grounded.
Builder (`build` in API)
Code, algorithms, protocols, implementation planning, and constructive outputs. The model may propose concrete artifacts guided by the workspace, clearly separating what is supported from what is derived.

  • WikiLLM default: Fast · Research · no retrieval knobs
  • Retrieval quick answer: Fast · Research · Vector
  • Retrieval concept map: Balanced · Research · Graph
  • Corpus audit: Balanced · Strict
  • Algorithms/code: Balanced · Build
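
These presets translate directly into API field values. The mapping below is a sketch: the preset keys are hypothetical names, and the field names and values (`query_profile`, `answer_mode`, `retrieval_strategy`) follow the API examples in this guide.

```python
# Hypothetical preset keys; values mirror the presets listed above.
PRESETS = {
    "wikillm_default": {"query_profile": "fast", "answer_mode": "research"},
    "retrieval_quick_answer": {
        "query_profile": "fast",
        "answer_mode": "research",
        "retrieval_strategy": "vector",
    },
    "retrieval_concept_map": {
        "query_profile": "balanced",
        "answer_mode": "research",
        "retrieval_strategy": "graph",
    },
    "corpus_audit": {"query_profile": "balanced", "answer_mode": "strict"},
    "algorithms_code": {"query_profile": "balanced", "answer_mode": "build"},
}

# Console label for each API answer mode, as described above.
CONSOLE_LABELS = {"strict": "Grounded", "research": "RAG", "build": "Builder"}


def apply_preset(question: str, preset: str) -> dict:
    """Merge a preset into a chat-turn body for the given question."""
    if preset not in PRESETS:
        raise KeyError(f"unknown preset: {preset!r}")
    return {"question": question, "include_sources": True, **PRESETS[preset]}


body = apply_preset("Does the corpus support this claim?", "corpus_audit")
print(body, CONSOLE_LABELS[body["answer_mode"]])
```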

Native API

Use EigenVertex as an application backend

The native API is the richest integration surface. Use it when you need workspaces, documents, connectors, ingestion status, SSE progress, conversations, wiki maintenance, or retrieval controls.

Create a workspace with an explicit mode

export EVTX_BASE_URL="https://api.eigenvertex.com"
export EVTX_API_KEY="evtx_..."

curl -X POST "$EVTX_BASE_URL/v1/workspaces" \
  -H "Authorization: Bearer $EVTX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Piano Research",
    "slug": "piano-research",
    "visibility": "private",
    "workspace_mode": "wiki_llm"
  }'

WikiLLM chat turn

curl -X POST "$EVTX_BASE_URL/v1/conversations/CONVERSATION_ID/chat-turn" \
  -H "Authorization: Bearer $EVTX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Summarize EigenVertex in five points.",
    "include_sources": true,
    "save_messages": true,
    "query_profile": "fast",
    "answer_mode": "research"
  }'

Retrieval chat turn

curl -X POST "$EVTX_BASE_URL/v1/conversations/CONVERSATION_ID/chat-turn" \
  -H "Authorization: Bearer $EVTX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Which algorithms are robust for detecting piano f0?",
    "top_k": 8,
    "include_sources": true,
    "save_messages": true,
    "query_profile": "balanced",
    "answer_mode": "research",
    "retrieval_strategy": "vector",
    "retrieval_layers": ["vector"]
  }'

Wiki lint and maintain

curl -X POST "$EVTX_BASE_URL/v1/wiki/workspaces/WORKSPACE_ID/lint" \
  -H "Authorization: Bearer $EVTX_API_KEY" \
  -H "Content-Type: application/json"

curl -X POST "$EVTX_BASE_URL/v1/wiki/workspaces/WORKSPACE_ID/maintain" \
  -H "Authorization: Bearer $EVTX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "apply_safe_fixes": true
  }'
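
The same two calls can be prepared from Python. The sketch below uses only the standard library and builds the requests without sending them, so it can be inspected safely; the helper name `wiki_request` is hypothetical, and the paths and the `apply_safe_fixes` field are taken from the curl commands above.

```python
import json
import urllib.request
from typing import Optional


def wiki_request(
    base_url: str,
    api_key: str,
    workspace_id: str,
    action: str,
    body: Optional[dict] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for wiki lint or maintain."""
    if action not in ("lint", "maintain"):
        raise ValueError(f"unsupported wiki action: {action!r}")
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        f"{base_url}/v1/wiki/workspaces/{workspace_id}/{action}",
        data=data,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


base = "https://api.eigenvertex.com"
lint = wiki_request(base, "evtx_...", "WORKSPACE_ID", "lint")
maintain = wiki_request(base, "evtx_...", "WORKSPACE_ID", "maintain", {"apply_safe_fixes": True})

for req in (lint, maintain):
    print(req.get_method(), req.full_url)

# To actually send a request:
#   with urllib.request.urlopen(lint, timeout=120) as resp:
#       report = json.load(resp)
```

Running lint first and reviewing its report before calling maintain keeps the repair step deliberate.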

Python client example

import os
import requests

base_url = os.environ["EVTX_BASE_URL"]
api_key = os.environ["EVTX_API_KEY"]

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}

payload = {
    "question": "Compare YIN, SWIPE and parametric methods for piano f0 detection.",
    "top_k": 10,
    "include_sources": True,
    "query_profile": "balanced",
    "answer_mode": "research",
    "retrieval_strategy": "vector",
    "retrieval_layers": ["vector"],
}

response = requests.post(
    f"{base_url}/v1/conversations/{os.environ['EVTX_CONVERSATION_ID']}/chat-turn",
    headers=headers,
    json=payload,
    timeout=120,
)
response.raise_for_status()

data = response.json()
print(data["assistant_message"]["content"])
print(data["query_result"]["diagnostics"])

TypeScript client example

type QueryProfile = "fast" | "balanced" | "thorough";
type AnswerMode = "strict" | "research" | "build";
type RetrievalStrategy = "vector" | "graph";

export async function askEigenVertex(params: {
  baseUrl: string;
  apiKey: string;
  conversationId: string;
  question: string;
  profile?: QueryProfile;
  mode?: AnswerMode;
  retrieval?: RetrievalStrategy;
}) {
  const response = await fetch(
    `${params.baseUrl}/v1/conversations/${params.conversationId}/chat-turn`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${params.apiKey}`,
        "Content-Type": "application/json"
      },
      body: JSON.stringify({
        question: params.question,
        top_k: 8,
        include_sources: true,
        save_messages: true,
        query_profile: params.profile ?? "fast",
        answer_mode: params.mode ?? "research",
        retrieval_strategy: params.retrieval ?? "vector",
        retrieval_layers: [params.retrieval ?? "vector"]
      })
    }
  );

  if (!response.ok) {
    throw new Error(await response.text());
  }

  return response.json();
}

OpenAI-compatible API

Consume EigenVertex like a model when that is easier

The OpenAI-compatible facade is useful when an existing application already uses OpenAI SDKs. The request still reaches EigenVertex, but the client can speak the familiar /v1/chat/completions or /v1/responses language.

from openai import OpenAI

client = OpenAI(
    api_key="evtx_...",
    base_url="https://api.eigenvertex.com/v1",
)

response = client.chat.completions.create(
    model="eigenvertex-grounded",
    messages=[
        {"role": "system", "content": "Answer in French with citations when available."},
        {"role": "user", "content": "What does the corpus say about tuning stability?"}
    ],
    extra_body={
        "eigenvertex": {
            "workspace_id": "WORKSPACE_ID",
            "include_sources": True,
            "answer_mode": "research",
            "query_profile": "balanced"
        }
    }
)

print(response.choices[0].message.content)
print(response.model_extra.get("eigenvertex"))

Native API vs OpenAI-compatible API

Use the native API for ingestion, workspace mode control, wiki maintenance, and operational workflows. Use the OpenAI-compatible API when your product wants EigenVertex to look like a chat model with extra grounding options.

Operations

What to monitor and maintain

EigenVertex exposes operational surfaces because large or valuable corpora are never “fire and forget”. The exact maintenance shape depends on the workspace mode.

Ingestion SSE
Watch active documents, current steps, progress percent, and recent workspace updates.
WikiLLM operations
Run lint to diagnose the wiki and maintain to apply safe repairs. This is where AGENTS.md, index.md, log.md, provenance, cross-links, and contradiction notes are kept healthy.
Retrieval evidence quality
Inspect chunking, index readiness, and citation quality. Retrieval workspaces depend on the evidence layer staying readable and trustworthy.
Deletion and cleanup
Workspace and document deletion must clean database rows, object storage, wiki pages, and retrieval artifacts according to the workspace mode.
Authors and provenance
When available, authors and provenance should be preserved and displayed. This matters especially for scientific corpora where the value of a claim depends on who made it and where.
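
For the ingestion SSE stream mentioned above, a client needs to split the stream into events before it can read progress. The sketch below parses the standard Server-Sent Events line format from any iterable of text lines. The event name and JSON fields in the sample (`document_progress`, `document_id`, `step`, `progress_percent`) are hypothetical; this guide does not specify the event schema.

```python
import json
from typing import Dict, Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per SSE event; a blank line ends an event."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


# Hypothetical sample stream; real event names and fields may differ.
sample = [
    ": keep-alive",
    "event: document_progress",
    'data: {"document_id": "doc_1", "step": "ocr", "progress_percent": 40}',
    "",
    'data: {"document_id": "doc_1", "step": "done", "progress_percent": 100}',
    "",
]

events = list(iter_sse_events(sample))
for item in events:
    payload = json.loads(item["data"])
    print(item["event"], payload["step"], payload["progress_percent"])
```

In a live client, the lines would come from a streaming HTTP response rather than a list.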

Bootstrap

Every WikiLLM workspace starts with AGENTS.md, index.md, and log.md. They define the contract, the catalog, and the chronological journal.

Ingest

Each processed source creates or updates durable pages such as source, topic, entity, and concept pages. The system refreshes index.md and appends to log.md.

Query writeback

Good durable answers can be filed back into the wiki as question or analysis pages instead of disappearing into chat history.

Lint and maintain

The wiki can be health-checked and then repaired through safe maintenance actions such as restoring missing sections, provenance, cross-links, and contradiction notes.
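
As a rough picture of what a health check might look for, the sketch below runs two simple checks over an in-memory wiki: the three bootstrap files must exist, and every other page should be mentioned in index.md. It is not the backend's lint implementation; the page paths and the substring-based catalog check are assumptions made for illustration.

```python
from typing import Dict, List

# The three files every WikiLLM workspace starts with, per this guide.
REQUIRED_FILES = ("AGENTS.md", "index.md", "log.md")


def lint_wiki(pages: Dict[str, str]) -> List[str]:
    """Return human-readable findings for a wiki given as {path: markdown}."""
    findings = []
    for name in REQUIRED_FILES:
        if name not in pages:
            findings.append(f"missing required file: {name}")
    index_text = pages.get("index.md", "")
    for name in sorted(pages):
        if name in REQUIRED_FILES:
            continue
        # Assumption: the catalog mentions each page by its path.
        if name not in index_text:
            findings.append(f"page not listed in index.md: {name}")
    return findings


# Hypothetical wiki contents.
wiki = {
    "AGENTS.md": "Contract for this workspace.",
    "index.md": "- topics/tuning.md",
    "topics/tuning.md": "Notes on tuning stability.",
    "entities/yin.md": "Notes on the YIN algorithm.",
}

for finding in lint_wiki(wiki):
    print(finding)
```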

Deployment Models

Cloud EigenVertex now, on-prem path next

The production posture is to keep the backend stateless and keep Postgres, Qdrant, and S3-compatible storage as separate services. This keeps cloud and self-hosted deployment cleaner and makes the split between WikiLLM and Retrieval easier to operate.

Cloud EigenVertex

EigenVertex operates the backend and service dependencies. Teams consume the console and API through API keys, CORS configuration, and provider-level controls.

On-premises / self-host

The client deploys the backend on its own infrastructure and connects it to its own Postgres, Qdrant, and object storage. WikiLLM can stay Postgres-centered, while Retrieval keeps its larger evidence stack.

APP_ENV=production
APP_REQUIRE_API_KEY=true
APP_CORS_ALLOW_ORIGINS=https://labs.eigenvertex.com

DATABASE_URL=postgresql+psycopg://eigenvertex:***@postgres:5432/eigenvertex
QDRANT_URL=http://qdrant:6333
S3_ENDPOINT_URL=http://minio:9000
S3_BUCKET=eigenvertex-documents

OPENAI_API_KEY=...
GEMINI_API_KEY=...
MISTRAL_API_KEY=...
LLAMA_API_KEY=...

Design principle

Do not bake Postgres, Qdrant, or object storage into the backend image. They are persistent data services with their own backup, upgrade, and security lifecycle.
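
Because these services live outside the backend image, a deployment script can check that their connection settings are present before starting the backend. The sketch below is one way to do that, using the variable names from the example above. Treating QDRANT_URL as required only when Retrieval workspaces are enabled is an assumption based on the note that WikiLLM can stay Postgres-centered.

```python
import os
from typing import List, Mapping

# Variable names come from the environment example above.
CORE_SETTINGS = ("DATABASE_URL", "S3_ENDPOINT_URL", "S3_BUCKET")
RETRIEVAL_SETTINGS = ("QDRANT_URL",)  # assumed optional for WikiLLM-only setups


def missing_settings(env: Mapping[str, str], retrieval_enabled: bool = True) -> List[str]:
    """Return the names of required settings that are unset or blank."""
    required = list(CORE_SETTINGS)
    if retrieval_enabled:
        required += RETRIEVAL_SETTINGS
    return [key for key in required if not env.get(key, "").strip()]


missing = missing_settings(os.environ)
if missing:
    print("missing settings:", ", ".join(missing))
else:
    print("all external service settings are present")
```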