Knowledge
Overview
The knowledge base lets Melian ingest documents (PDFs, images, plain text, web clippings), chunk and embed them for vector storage, and search them semantically. With OCR, physical documents scanned on a reMarkable tablet or photographed on a phone become searchable as well. Embeddings are stored in Qdrant; document metadata lives in SQLite.
Interfaces
export interface KnowledgeDocument {
  id: string;
  title: string;
  source_type: "upload" | "paste" | "chat" | "url";
  mime_type: string;
  category: "reference" | "note" | "clipping";
  user_tags: string[];
  auto_tags: string[]; // generated by LLM at ingest time
  chunk_count: number;
}
export interface KnowledgeSearchResult {
  document_id: string;
  document_title: string;
  chunk_text: string;
  chunk_index: number;
  score: number; // cosine similarity 0–1
  tags: string[];
}

Ingest Pipeline
Input (file / paste / URL)
  └── extract text
        ├── PDF → pdfparse
        ├── image → OCR (Google Cloud Vision)
        └── text/HTML → strip + normalize
  └── auto-tag via LLM (category + subject tags)
  └── chunk (2000 chars, 200 char overlap)
  └── embed (text-embedding-3-small)
  └── store chunks in Qdrant + metadata in SQLite

Chunking uses a 2000-character window with a 200-character overlap so semantic context is preserved at chunk boundaries.
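
The sliding-window chunking described here can be sketched as follows. This is a minimal illustration, not Melian's actual implementation: the function name `chunkText` and its parameter names are hypothetical, and it assumes "characters" means UTF-16 code units, which is what `String.prototype.slice` counts.

```typescript
// Hypothetical helper: splits text into fixed-size windows that overlap.
// Defaults mirror the documented values (2000 chars, 200 char overlap).
function chunkText(text: string, size = 2000, overlap = 200): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("require size > 0 and 0 <= overlap < size");
  }
  // Each window starts (size - overlap) code units after the previous one,
  // so the last `overlap` units of one chunk repeat at the start of the next.
  const step = size - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once a window has reached the end of the text, so no
    // trailing chunk is emitted that contains only overlap.
    if (start + size >= text.length) break;
  }
  return chunks;
}

// Usage: a 5000-character input yields windows starting at 0, 1800 and 3600.
const sample = "a".repeat(1800) + "b".repeat(1800) + "c".repeat(1400);
const pieces = chunkText(sample);
console.log(pieces.map((p) => p.length));
```

With the default settings, consecutive chunks share 200 characters, so a sentence that straddles a boundary appears whole in at least one chunk as long as it is shorter than the overlap.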
Tools
| Tool | Parameters | Description |
|---|---|---|
| `knowledge_save` | `text: string, title?: string, tags?: string[]` | Ingest text directly into the knowledge base |
| `knowledge_search` | `query: string, category?: string, tags?: string[], limit?: number` | Semantic search returning ranked chunks |
| `knowledge_list` | `category?: string, tags?: string[], limit?: number` | List stored documents with metadata |
| `knowledge_delete` | `document_id: string` | Delete a document and all its chunks |
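
The parameter lists in the table map naturally onto TypeScript types. The sketch below is illustrative only: the parameter fields come from the table, but the `{ tool, params }` envelope, the type names, and `summarizeCall` are assumptions, not part of Melian's documented API.

```typescript
// Parameter shapes taken from the Tools table.
interface KnowledgeSaveParams { text: string; title?: string; tags?: string[] }
interface KnowledgeSearchParams { query: string; category?: string; tags?: string[]; limit?: number }
interface KnowledgeListParams { category?: string; tags?: string[]; limit?: number }
interface KnowledgeDeleteParams { document_id: string }

// Hypothetical call envelope: a discriminated union keyed on the tool name,
// so the compiler checks that each tool receives its own parameters.
type KnowledgeToolCall =
  | { tool: "knowledge_save"; params: KnowledgeSaveParams }
  | { tool: "knowledge_search"; params: KnowledgeSearchParams }
  | { tool: "knowledge_list"; params: KnowledgeListParams }
  | { tool: "knowledge_delete"; params: KnowledgeDeleteParams };

// Hypothetical helper that produces a one-line description of a call,
// e.g. for logging. The switch narrows `call.params` per tool.
function summarizeCall(call: KnowledgeToolCall): string {
  switch (call.tool) {
    case "knowledge_save":
      return `save ${call.params.text.length} chars`;
    case "knowledge_search":
      return `search "${call.params.query}" (limit ${call.params.limit ?? "default"})`;
    case "knowledge_list":
      return `list ${call.params.category ?? "all categories"}`;
    case "knowledge_delete":
      return `delete ${call.params.document_id}`;
  }
}

console.log(summarizeCall({ tool: "knowledge_search", params: { query: "tax receipts", limit: 5 } }));
```

Modelling the calls as a union means a `knowledge_delete` call without `document_id`, or a `knowledge_search` call without `query`, fails at compile time rather than at ingest or query time.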
API Endpoints
| Method | Path | Description |
|---|---|---|
| GET | `/knowledge` | List all documents |
| POST | `/knowledge` | Ingest text as a new document |
| POST | `/knowledge/upload` | Upload a file (PDF, image, txt) for ingestion |
| POST | `/knowledge/search` | Semantic search (JSON body: `{ query, category?, tags?, limit? }`) |
| DELETE | `/knowledge/:id` | Delete a document and its vectors |
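
As a sketch of how a client might call the search endpoint, the helper below builds the request for `POST /knowledge/search`. Only the path and the body fields come from the table; the base URL, the `Content-Type` header, the names `SearchBody` and `buildSearchRequest`, and the absence of authentication are assumptions, and the response shape is not specified here.

```typescript
// Body fields as listed for POST /knowledge/search.
interface SearchBody {
  query: string;
  category?: string;
  tags?: string[];
  limit?: number;
}

interface SearchRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: assembles the URL and fetch options without sending
// anything, which keeps the request shape easy to inspect and test.
function buildSearchRequest(baseUrl: string, body: SearchBody): SearchRequest {
  // Trim trailing slashes so "http://host/" and "http://host" behave the same.
  const url = baseUrl.replace(/\/+$/, "") + "/knowledge/search";
  return {
    url,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" }, // assumed, since the body is JSON
      body: JSON.stringify(body),
    },
  };
}

// The base URL below is a placeholder for wherever the server is running.
const req = buildSearchRequest("http://localhost:3000/", { query: "boiler warranty", limit: 3 });
console.log(req.url, req.init.body);
```

The returned `url` and `init` can be passed directly to `fetch(url, init)` in any runtime that provides it; error handling and response parsing depend on details this section does not specify.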