Vision

Overview

Melian can extract text from images using the Google Cloud Vision API. This powers the Remarkable tablet integration (converting handwritten notebook pages to text), document scanning, and any workflow that starts with a photo of text.

Backend

Backend	Use case
Google Cloud Vision	Handwriting, low-quality scans, complex layouts, production accuracy

Tools

Tool	Parameters	Description
`ocr_image`	`url?: string`, `file_path?: string`, `base64?: string`	Extract text from an image by URL, local file path, or base64 data
`remarkable_ocr`	`document: string`, `page?: number`	OCR a specific page of a Remarkable document
`remarkable_view`	`document: string`, `page?: number`	Return the raw image of a Remarkable document page without OCR

`ocr_image` Parameter Details

Parameter	Type	Required	Notes
`url`	`string`	one of url/file_path/base64	Publicly accessible image URL
`file_path`	`string`	one of url/file_path/base64	Absolute path to a local image file
`base64`	`string`	one of url/file_path/base64	Base64-encoded image data
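
The "one of url/file_path/base64" rule can be checked on the caller's side before a request is sent. The following is a minimal sketch, not Melian's own validation. It assumes that exactly one source is expected per call (the table does not say what happens when several are supplied) and that absolute paths are POSIX-style. `OcrImageParams`, `OcrSource`, and `validateOcrImageParams` are hypothetical names introduced for illustration.

```typescript
// Hypothetical parameter shape mirroring the ocr_image table.
interface OcrImageParams {
  url?: string;
  file_path?: string;
  base64?: string;
}

type OcrSource = "url" | "file_path" | "base64";

// Returns the name of the single image source that was supplied, or throws.
// Assumption: exactly one source per call; how Melian treats multiple
// sources is not documented here.
function validateOcrImageParams(params: OcrImageParams): OcrSource {
  const candidates: OcrSource[] = ["url", "file_path", "base64"];
  let source: OcrSource | undefined;
  let count = 0;
  for (const key of candidates) {
    const value = params[key];
    if (typeof value === "string" && value.length > 0) {
      source = key;
      count += 1;
    }
  }
  if (count !== 1 || source === undefined) {
    throw new Error(
      "ocr_image expects exactly one of url, file_path, base64; got " + count
    );
  }
  // Assumption: POSIX-style absolute paths, which start with "/".
  if (source === "file_path" && (params.file_path || "").charAt(0) !== "/") {
    throw new Error("file_path must be an absolute path");
  }
  return source;
}
```

For example, `validateOcrImageParams({ file_path: "/tmp/scan.png" })` returns `"file_path"`, while passing both `url` and `base64` throws.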

`remarkable_ocr` / `remarkable_view` Parameter Details

Parameter	Type	Required	Notes
`document`	`string`	yes	Document name or ID as synced via Remarkable Connect
`page`	`number`	no	1-indexed page number (default 1)
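
The default and the 1-indexed convention can be made explicit with a small normalization step. This is a sketch under assumptions: `RemarkablePageParams` and `resolveRemarkablePage` are hypothetical names, and rejecting non-integer or non-positive page numbers is a guess at sensible behaviour rather than something this document specifies.

```typescript
// Hypothetical parameter shape shared by remarkable_ocr and remarkable_view.
interface RemarkablePageParams {
  document: string;
  page?: number;
}

// Applies the documented default (page 1) and rejects values that cannot
// be a 1-indexed page number.
function resolveRemarkablePage(
  params: RemarkablePageParams
): { document: string; page: number } {
  if (params.document.trim().length === 0) {
    throw new Error("document is required");
  }
  const page = params.page === undefined ? 1 : params.page;
  // Assumption: pages must be whole numbers starting at 1.
  if (!isFinite(page) || Math.floor(page) !== page || page < 1) {
    throw new Error("page must be a positive integer (1-indexed)");
  }
  return { document: params.document, page: page };
}
```

Calling `resolveRemarkablePage({ document: "Meeting Notes" })` yields page 1, and a `page` of 0 is rejected because numbering starts at 1.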

Remarkable Integration

Remarkable documents are synced via Remarkable Connect. The vision tools read pages from the local sync directory and pass them to Google Cloud Vision for OCR. This enables workflows such as writing notes on the tablet, asking Melian to read them, and saving the result to the knowledge base.
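
The notes-to-knowledge-base workflow can be sketched as two pure steps: planning one `remarkable_ocr` call per page, then joining the returned text into a single note. This is an illustration only. How tools are actually invoked, how the page count is obtained, and how notes are saved are not covered in this document, so the sketch stops short of all three. `ToolCall`, `planNotebookOcr`, and `assembleNote` are hypothetical names, and the page count is assumed to be supplied by the caller.

```typescript
// Hypothetical description of a single tool invocation.
interface ToolCall {
  tool: "remarkable_ocr";
  params: { document: string; page: number };
}

// Builds one remarkable_ocr call per page, using 1-indexed page numbers.
// Assumption: the caller already knows how many pages the document has.
function planNotebookOcr(document: string, pageCount: number): ToolCall[] {
  const calls: ToolCall[] = [];
  for (let page = 1; page <= pageCount; page++) {
    calls.push({ tool: "remarkable_ocr", params: { document: document, page: page } });
  }
  return calls;
}

// Joins per-page OCR text into one note body, titled with the document
// name and labelled by page, ready to hand to whatever stores the note.
function assembleNote(document: string, pageTexts: string[]): string {
  const sections = pageTexts.map(
    (text, index) => "Page " + (index + 1) + "\n" + text.trim()
  );
  return document + "\n\n" + sections.join("\n\n");
}
```

Keeping the planning and assembly steps free of I/O means they can be checked without a tablet or a Cloud Vision account; the actual tool calls and the save step would sit between them.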