Melian

Knowledge

Overview

The knowledge base lets Melian ingest documents (PDFs, images, plain text, web clippings), split them into chunks for vector storage, and search them semantically. With OCR, physical documents scanned on a reMarkable tablet or photographed on a phone also become searchable. Embeddings are stored in Qdrant; document metadata lives in SQLite.

Interfaces

export interface KnowledgeDocument {
  id: string;
  title: string;
  source_type: "upload" | "paste" | "chat" | "url";
  mime_type: string;
  category: "reference" | "note" | "clipping";
  user_tags: string[];
  auto_tags: string[];   // generated by LLM at ingest time
  chunk_count: number;
}

export interface KnowledgeSearchResult {
  document_id: string;
  document_title: string;
  chunk_text: string;
  chunk_index: number;
  score: number;         // cosine similarity 0–1
  tags: string[];
}
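
The `score` field is described as a cosine similarity. As a minimal sketch of what that number means, the function below computes cosine similarity for two embedding vectors held as plain arrays. It is illustrative only: in this setup the vector store would normally compute the score itself, and the name `cosineSimilarity` is an assumption, not part of Melian's code.

```typescript
// Illustrative helper (hypothetical name): the vector store normally computes this score.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat similarity with a zero vector as 0 rather than dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Vectors pointing in the same direction score 1 regardless of length, and orthogonal vectors score 0.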

Ingest Pipeline

Input (file / paste / URL)
  ├── extract text
  │     ├── PDF → pdfparse
  │     ├── image → OCR (Google Cloud Vision)
  │     └── text/HTML → strip + normalize
  ├── auto-tag via LLM (category + subject tags)
  ├── chunk (2000 chars, 200-char overlap)
  ├── embed (text-embedding-3-small)
  └── store chunks in Qdrant + metadata in SQLite
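
The extraction branch of the pipeline above could be selected from the document's `mime_type`. The sketch below shows one way to do that dispatch. The `ExtractorKind` labels and the `pickExtractor` function are hypothetical names introduced for illustration; the text does not say how Melian actually routes inputs.

```typescript
// Hypothetical labels for the three extraction paths in the diagram.
type ExtractorKind = "pdf" | "ocr" | "text";

// Choose an extraction path from a MIME type such as "text/html; charset=utf-8".
function pickExtractor(mimeType: string): ExtractorKind {
  // Drop parameters like "; charset=utf-8" and normalize case.
  const mime = mimeType.toLowerCase().split(";")[0].trim();
  if (mime === "application/pdf") return "pdf";
  if (mime.startsWith("image/")) return "ocr";
  if (mime.startsWith("text/")) return "text";
  throw new Error(`unsupported MIME type: ${mimeType}`);
}
```

Rejecting unknown types up front keeps unsupported files from reaching the tagging and embedding steps.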

Chunking uses a 2000-character window with a 200-character overlap. Consecutive chunks therefore start 1800 characters apart, and the shared 200 characters preserve semantic context at chunk boundaries.
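
A minimal sketch of fixed-window chunking with overlap, using the 2000/200 values stated here as defaults. The function name `chunkText` is an assumption. It also counts UTF-16 code units rather than user-visible characters, which may differ from what the real pipeline does.

```typescript
// Split text into windows of `size` characters, each sharing `overlap`
// characters with the previous window. Lengths are UTF-16 code units,
// so a surrogate pair at a boundary could be split (not handled here).
function chunkText(text: string, size = 2000, overlap = 200): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const stride = size - overlap; // distance between consecutive chunk starts
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += stride) {
    chunks.push(text.slice(start, start + size));
    // Stop once a window has reached the end, to avoid a redundant tail chunk.
    if (start + size >= text.length) break;
  }
  return chunks;
}
```

With the defaults, a 4000-character input yields three chunks starting at offsets 0, 1800, and 3600.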

Tools

| Tool | Parameters | Description |
| --- | --- | --- |
| knowledge_save | text: string, title?: string, tags?: string[] | Ingest text directly into the knowledge base |
| knowledge_search | query: string, category?: string, tags?: string[], limit?: number | Semantic search returning ranked chunks |
| knowledge_list | category?: string, tags?: string[], limit?: number | List stored documents with metadata |
| knowledge_delete | document_id: string | Delete a document and all its chunks |
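
To illustrate how the optional `category`, `tags`, and `limit` parameters might combine, here is a hedged in-memory sketch of the filtering a tool like `knowledge_list` could apply. The matching rules are assumptions: a document is taken to match only if it carries every requested tag, counting both `user_tags` and `auto_tags`. The `DocSummary`, `ListFilter`, and `filterDocuments` names are hypothetical.

```typescript
// Reduced, hypothetical view of a stored document for filtering purposes.
interface DocSummary {
  id: string;
  category: "reference" | "note" | "clipping";
  user_tags: string[];
  auto_tags: string[];
}

// Mirrors the optional parameters listed for knowledge_list.
interface ListFilter {
  category?: string;
  tags?: string[];
  limit?: number;
}

function filterDocuments(docs: DocSummary[], filter: ListFilter = {}): DocSummary[] {
  const wanted = filter.tags ?? [];
  const matches = docs.filter((doc) => {
    if (filter.category !== undefined && doc.category !== filter.category) return false;
    // Assumption: every requested tag must be present (AND semantics).
    const docTags = new Set([...doc.user_tags, ...doc.auto_tags]);
    return wanted.every((tag) => docTags.has(tag));
  });
  return filter.limit === undefined ? matches : matches.slice(0, Math.max(0, filter.limit));
}
```

If the real tools treat tags as "any of" rather than "all of", `every` would become `some`.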

API Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /knowledge | List all documents |
| POST | /knowledge | Ingest text as a new document |
| POST | /knowledge/upload | Upload a file (PDF, image, txt) for ingestion |
| POST | /knowledge/search | Semantic search (JSON body: { query, category?, tags?, limit? }) |
| DELETE | /knowledge/:id | Delete a document and its vectors |
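
As a sketch of calling the search endpoint, the helper below builds the URL and request options for `POST /knowledge/search` from the documented JSON body. The base URL, the `SearchParams` and `buildSearchRequest` names, and the `Content-Type` header are assumptions; the response format is not specified in the text, so the helper stops short of parsing one.

```typescript
// Body fields as documented for POST /knowledge/search.
interface SearchParams {
  query: string;
  category?: string;
  tags?: string[];
  limit?: number;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build (but do not send) a search request. `baseUrl` is wherever the API is served.
function buildSearchRequest(baseUrl: string, params: SearchParams): PreparedRequest {
  // Strip trailing slashes so the path is joined with exactly one "/".
  const url = `${baseUrl.replace(/\/+$/, "")}/knowledge/search`;
  return {
    url,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(params), // undefined optional fields are omitted
    },
  };
}
```

The result can be passed straight to `fetch(url, init)`. For example, `buildSearchRequest("http://localhost:3000", { query: "tax receipts", limit: 5 })` prepares a request against a hypothetical local server.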