SDKs
The Knowhere SDKs provide high-level, type-safe clients for the Knowhere document parsing API. They handle the full parsing workflow — job creation, file upload, polling, download, and result extraction — in a single method call.
Installation
- Python
- Node.js
Quick Start
Create an API Key
Before you begin, create an API key in the dashboard. Store the key securely and export it as an environment variable:
export KNOWHERE_API_KEY="sk_..."
Both SDKs read the key from KNOWHERE_API_KEY automatically, so you don't need to pass it in code.
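Conceptually this is a fallback: a key passed explicitly to the client wins, and otherwise KNOWHERE_API_KEY is used. The sketch below shows that precedence. The `resolve_api_key` helper is hypothetical and is not part of either SDK. Raising `ValueError` when no key is found is also an assumption, because this page does not document what the SDKs do in that case.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Hypothetical helper: an explicit key wins, else fall back to KNOWHERE_API_KEY."""
    if api_key:
        return api_key
    key = environ.get("KNOWHERE_API_KEY")
    if not key:
        # Assumption: the real SDKs' behavior for a missing key is not documented here.
        raise ValueError("no API key: pass one explicitly or set KNOWHERE_API_KEY")
    return key


print(resolve_api_key("sk_explicit", {"KNOWHERE_API_KEY": "sk_env"}))  # sk_explicit
print(resolve_api_key(environ={"KNOWHERE_API_KEY": "sk_env"}))  # sk_env
```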
Parse a Document
Python:

import knowhere

client = knowhere.Knowhere()
result = client.parse(url="https://example.com/report.pdf")
print(result.statistics)  # chunk counts, page count
print(result.full_markdown)  # complete markdown output

Node.js:

import Knowhere from "@knowhere-ai/sdk";

// Top-level await requires an ES module (a .mjs file or "type": "module").
const client = new Knowhere();
const result = await client.parse({ url: "https://example.com/report.pdf" });
console.log(result.statistics); // chunk counts, page count
console.log(result.fullMarkdown); // complete markdown output
Parse a Local File
Python:

from pathlib import Path

# Reuses the `client` created in the previous example.
result = client.parse(file=Path("quarterly-report.pdf"))
for chunk in result.text_chunks:
    print(chunk.content[:200])  # first 200 characters of each text chunk

Node.js:

import { readFileSync } from "fs";

// Reuses the `client` created in the previous example.
const result = await client.parse({
  file: readFileSync("quarterly-report.pdf"),
  fileName: "quarterly-report.pdf",
});
for (const chunk of result.textChunks) {
  console.log(chunk.content.slice(0, 200)); // first 200 characters of each text chunk
}
Async Support
Python: every method has an async counterpart on AsyncKnowhere.

import asyncio

import knowhere


async def main():
    async with knowhere.AsyncKnowhere() as client:
        result = await client.parse(url="https://example.com/report.pdf")
        print(result.full_markdown)


asyncio.run(main())

Node.js: all methods in the Node.js SDK are async by default.

import Knowhere from "@knowhere-ai/sdk";

const client = new Knowhere();
const result = await client.parse({ url: "https://example.com/report.pdf" });
console.log(result.fullMarkdown);
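Because every call is awaitable, you can parse several documents concurrently. The sketch below shows one common pattern: `asyncio.gather` combined with a semaphore that caps how many requests are in flight. It is an illustration only. `fake_parse` is a hypothetical stand-in for `await client.parse(url=...)`, and the cap of 3 is an arbitrary assumption that you would tune to your rate limit.

```python
import asyncio
from typing import List

MAX_CONCURRENCY = 3  # assumption: tune to your account's rate limit


async def fake_parse(url: str) -> str:
    """Hypothetical stand-in for `await client.parse(url=url)`."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"# parsed {url}"


async def parse_all(urls: List[str]) -> List[str]:
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)

    async def parse_one(url: str) -> str:
        # At most MAX_CONCURRENCY parses run at the same time.
        async with semaphore:
            return await fake_parse(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(parse_one(u) for u in urls))


urls = [f"https://example.com/doc{i}.pdf" for i in range(5)]
results = asyncio.run(parse_all(urls))
print(results[0])  # # parsed https://example.com/doc0.pdf
```

The semaphore keeps a large batch from tripping rate limits. The RateLimitError described under Error Handling is the failure mode it helps you avoid.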
Capabilities
One-Call Parsing
client.parse() orchestrates the full workflow in one call:
- Create a parsing job
- Upload the file (if local)
- Poll until the job completes
- Download and parse the results
Python:

from pathlib import Path

# From URL: no upload step needed
result = client.parse(url="https://example.com/doc.pdf")

# From file: upload handled automatically
result = client.parse(
    file=Path("report.pdf"),
    parsing_params={"model": "advanced", "ocr_enabled": True},
)

Node.js:

import { readFileSync } from "fs";

// From URL: no upload step needed
const urlResult = await client.parse({ url: "https://example.com/doc.pdf" });

// From file: upload handled automatically
const fileResult = await client.parse({
  file: readFileSync("report.pdf"),
  fileName: "report.pdf",
  parsingParams: { model: "advanced", ocrEnabled: true },
});
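The four steps listed above can be sketched as ordinary control flow. The sketch makes no claim about the SDK's source. `FakeJobs` is a hypothetical in-memory stand-in for `client.jobs`. Its method names mirror the step-by-step API shown under Step-by-Step Control (create, upload, wait, load), but its signatures and return values are simplified assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    job_id: str
    source_type: str
    uploaded: bool = False


@dataclass
class FakeJobs:
    """Hypothetical in-memory stand-in for client.jobs; records each call made."""
    calls: List[str] = field(default_factory=list)

    def create(self, source_type: str, **_: object) -> Job:
        self.calls.append("create")
        return Job(job_id="job_1", source_type=source_type)

    def upload(self, job: Job, data: bytes) -> None:
        self.calls.append("upload")
        job.uploaded = True

    def wait(self, job_id: str) -> str:
        self.calls.append("wait")
        return job_id  # a real client would poll here until the job completes

    def load(self, job_result: str) -> str:
        self.calls.append("load")
        return f"result of {job_result}"


def parse(jobs: FakeJobs, url: Optional[str] = None, file: Optional[bytes] = None) -> str:
    # 1. Create a parsing job for the matching source type.
    if url is not None:
        job = jobs.create(source_type="url", source_url=url)
    elif file is not None:
        job = jobs.create(source_type="file")
        # 2. Upload only when the source is a local file.
        jobs.upload(job, file)
    else:
        raise ValueError("pass either url or file")
    # 3. Poll until the job completes.
    job_result = jobs.wait(job.job_id)
    # 4. Download and parse the results.
    return jobs.load(job_result)


url_jobs = FakeJobs()
parse(url_jobs, url="https://example.com/doc.pdf")
print(url_jobs.calls)  # ['create', 'wait', 'load']
```

A URL source skips the upload step entirely, which is why `client.parse(url=...)` needs no file handling.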
Rich Result Types
The ParseResult object provides typed access to all extracted content:
Python:

result = client.parse(url="https://example.com/report.pdf")

# Full markdown output
print(result.full_markdown)

# Typed chunk access
for chunk in result.text_chunks:
    print(chunk.content)

for img in result.image_chunks:
    img.save("./output/images")  # save image bytes to disk

for tbl in result.table_chunks:
    print(tbl.html)  # raw HTML table
    tbl.save("./output/tables")  # save as .html file

# Statistics
stats = result.statistics
print(f"{stats.total_chunks} chunks, {stats.total_pages} pages")

# Save everything at once
result.save("./output")  # creates full.md, images/, tables/, result.zip

Node.js:

const result = await client.parse({ url: "https://example.com/report.pdf" });

// Full markdown output
console.log(result.fullMarkdown);

// Typed chunk access
for (const chunk of result.textChunks) {
  console.log(chunk.content);
}

for (const img of result.imageChunks) {
  await img.save("./output/images"); // save image bytes to disk
}

for (const tbl of result.tableChunks) {
  console.log(tbl.html); // raw HTML table
  await tbl.save("./output/tables"); // save as .html file
}

// Statistics
const stats = result.statistics;
console.log(`${stats.totalChunks} chunks, ${stats.totalPages} pages`);

// Save everything at once
await result.save("./output"); // creates full.md, images/, tables/, result.zip
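The comment on `save()` lists what ends up on disk: full.md, images/, tables/ and result.zip. The stdlib-only sketch below builds the same layout so you can see its shape. `save_layout` is a hypothetical helper, not the SDK's code. The assumption that result.zip bundles the other files is also unverified, since this page does not describe the archive's contents.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Dict


def save_layout(out_dir: Path, markdown: str, images: Dict[str, bytes], tables: Dict[str, str]) -> None:
    """Hypothetical helper mimicking the full.md / images/ / tables/ / result.zip layout."""
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "full.md").write_text(markdown, encoding="utf-8")
    (out_dir / "images").mkdir(exist_ok=True)
    for name, data in images.items():
        (out_dir / "images" / name).write_bytes(data)
    (out_dir / "tables").mkdir(exist_ok=True)
    for name, html in tables.items():
        (out_dir / "tables" / name).write_text(html, encoding="utf-8")
    # Assumption: the archive bundles every file written above.
    with zipfile.ZipFile(out_dir / "result.zip", "w") as zf:
        for path in sorted(out_dir.rglob("*")):
            if path.is_file() and path.name != "result.zip":
                zf.write(path, str(path.relative_to(out_dir)))


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "output"
    save_layout(out, "# Report", {"fig1.png": b"\x89PNG"}, {"table1.html": "<table></table>"})
    print(sorted(p.name for p in out.iterdir()))  # ['full.md', 'images', 'result.zip', 'tables']
```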
Step-by-Step Control
For advanced use cases, use the jobs namespace to control each step individually:
Python:

# 1. Create job
job = client.jobs.create(source_type="url", source_url="https://example.com/doc.pdf")

# 2. Poll for completion, printing each status update
job_result = client.jobs.wait(
    job.job_id,
    poll_interval=5.0,  # seconds between status checks
    on_progress=lambda jr: print(f"Status: {jr.status}"),
)

# 3. Download and parse result
result = client.jobs.load(job_result)

For file uploads:

from pathlib import Path

# 1. Create job with file source
job = client.jobs.create(source_type="file", file_name="report.pdf")

# 2. Upload the file
client.jobs.upload(job, Path("report.pdf"))

# 3. Poll + download
job_result = client.jobs.wait(job.job_id)
result = client.jobs.load(job_result)

Node.js:

// 1. Create job
const job = await client.jobs.create({
  sourceType: "url",
  sourceUrl: "https://example.com/doc.pdf",
});

// 2. Poll for completion, logging each status update
const jobResult = await client.jobs.wait(job.jobId, {
  pollInterval: 5000, // milliseconds between status checks
  onProgress: (jr) => console.log(`Status: ${jr.status}`),
});

// 3. Download and parse result
const result = await client.jobs.load(jobResult.jobId);

For file uploads:

import { readFileSync } from "fs";

// 1. Create job with file source
const job = await client.jobs.create({
  sourceType: "file",
  fileName: "report.pdf",
});

// 2. Upload the file
await client.jobs.upload(job.jobId, { file: readFileSync("report.pdf") });

// 3. Poll + download
const jobResult = await client.jobs.wait(job.jobId);
const result = await client.jobs.load(jobResult.jobId);
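`jobs.wait()` checks the job status every `poll_interval` and reports each update through `on_progress`. If it gives up, it raises PollingTimeoutError with `job_id` and `elapsed`, as shown under Error Handling. The sketch below shows a polling loop of that kind under several assumptions. The `get_status` callable, the terminal status names "done" and "failed", the 600 s default timeout and the `PollingTimeout` class are all hypothetical. The callback also receives a plain status string here rather than the SDK's job-result object.

```python
import time
from typing import Callable, Optional


class PollingTimeout(Exception):
    """Hypothetical analogue of the SDK's PollingTimeoutError."""

    def __init__(self, job_id: str, elapsed: float) -> None:
        super().__init__(f"job {job_id} still running after {elapsed:.1f}s")
        self.job_id = job_id
        self.elapsed = elapsed


TERMINAL = {"done", "failed"}  # assumption: the real status names may differ


def wait_for_job(
    job_id: str,
    get_status: Callable[[str], str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    on_progress: Optional[Callable[[str], None]] = None,
) -> str:
    start = time.monotonic()
    while True:
        status = get_status(job_id)
        if on_progress is not None:
            on_progress(status)  # report every observed status
        if status in TERMINAL:
            return status
        elapsed = time.monotonic() - start
        # Give up rather than sleep past the deadline.
        if elapsed + poll_interval > timeout:
            raise PollingTimeout(job_id, elapsed)
        time.sleep(poll_interval)


# Simulated job that finishes on the third poll.
statuses = iter(["pending", "running", "done"])
final = wait_for_job(
    "job_1",
    lambda _id: next(statuses),
    poll_interval=0.01,
    on_progress=lambda s: print(f"Status: {s}"),
)
print(final)
```

A longer `poll_interval` sends fewer status requests but adds up to one interval of latency after the job actually finishes.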
Configuration
Environment Variables
| Variable | Description |
|---|---|
| KNOWHERE_API_KEY | API key, used when none is passed to the client (api_key in Python, apiKey in Node.js) |
| KNOWHERE_BASE_URL | Overrides the API base URL |
| KNOWHERE_LOG_LEVEL | SDK logging verbosity (DEBUG, INFO, WARNING) |
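The sketch below shows how the two non-key variables might map onto client settings. It makes several assumptions. An explicit `base_url` argument is assumed to take precedence over KNOWHERE_BASE_URL, which in turn overrides the default from Client Options. KNOWHERE_LOG_LEVEL is assumed to name one of the three listed levels and to fall back to WARNING. The `load_env_config` helper and the "knowhere" logger name are hypothetical.

```python
import logging
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://api.knowhereto.ai"  # default listed under Client Options
LEVELS = {"DEBUG": logging.DEBUG, "INFO": logging.INFO, "WARNING": logging.WARNING}


@dataclass
class EnvConfig:
    base_url: str
    log_level: int


def load_env_config(
    environ: Mapping[str, str] = os.environ,
    base_url: Optional[str] = None,
) -> EnvConfig:
    """Hypothetical: explicit base_url beats KNOWHERE_BASE_URL, which beats the default."""
    # Assumption: unset or unrecognized level names fall back to WARNING.
    level_name = environ.get("KNOWHERE_LOG_LEVEL", "WARNING").upper()
    return EnvConfig(
        base_url=base_url or environ.get("KNOWHERE_BASE_URL", DEFAULT_BASE_URL),
        log_level=LEVELS.get(level_name, logging.WARNING),
    )


cfg = load_env_config({"KNOWHERE_LOG_LEVEL": "debug"})
logging.getLogger("knowhere").setLevel(cfg.log_level)  # logger name is an assumption
print(cfg.base_url, logging.getLevelName(cfg.log_level))  # https://api.knowhereto.ai DEBUG
```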
Client Options
Python:

import knowhere

client = knowhere.Knowhere(
    api_key="sk_...",  # or set KNOWHERE_API_KEY
    base_url="https://api.knowhereto.ai",  # default
    timeout=60.0,  # HTTP timeout (seconds)
    upload_timeout=600.0,  # file upload timeout (seconds)
    max_retries=5,  # retries for retryable errors
    default_headers={"X-Custom": "value"},  # extra headers
)

Node.js:

import Knowhere from "@knowhere-ai/sdk";

const client = new Knowhere({
  apiKey: "sk_...", // or set KNOWHERE_API_KEY
  baseUrl: "https://api.knowhereto.ai", // default
  timeout: 60_000, // HTTP timeout (ms)
  uploadTimeout: 600_000, // file upload timeout (ms)
  maxRetries: 5, // retries for retryable errors
  defaultHeaders: { "X-Custom": "value" }, // extra headers
});
Error Handling
Each SDK raises (Python) or throws (Node.js) a distinct error type for each failure condition:
Python:

from knowhere import Knowhere
from knowhere._exceptions import (
    AuthenticationError,
    RateLimitError,
    JobFailedError,
    PollingTimeoutError,
)

client = Knowhere()

try:
    result = client.parse(url="https://example.com/doc.pdf")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limited: retry after {e.retry_after}s")
except JobFailedError as e:
    print(f"Job failed: [{e.code}] {e.message}")
except PollingTimeoutError as e:
    print(f"Polling timed out for job {e.job_id} after {e.elapsed:.0f}s")

Node.js:

import Knowhere from "@knowhere-ai/sdk";
import {
  AuthenticationError,
  RateLimitError,
  JobFailedError,
  PollingTimeoutError,
} from "knowhere/errors";

const client = new Knowhere();

try {
  const result = await client.parse({ url: "https://example.com/doc.pdf" });
} catch (error) {
  if (error instanceof AuthenticationError) {
    console.error("Invalid API key");
  } else if (error instanceof RateLimitError) {
    console.error(`Rate limited: retry after ${error.retryAfter}s`);
  } else if (error instanceof JobFailedError) {
    console.error(`Job failed: [${error.code}] ${error.message}`);
  } else if (error instanceof PollingTimeoutError) {
    console.error(`Polling timed out for job ${error.jobId}`);
  } else {
    throw error; // re-throw anything not handled above instead of swallowing it
  }
}
Retryable errors (429 responses that include retry_after, plus 502, 503, and 504) are retried automatically with exponential backoff, up to max_retries (Python) or maxRetries (Node.js) times.
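The sketch below implements that policy so its rules are concrete. A 429 is retried only when it carries retry_after, and that hint is used as the delay. 502, 503 and 504 are always retried with a doubling delay. Everything else is raised immediately. The `HTTPError` class and the 0.5 s base delay are hypothetical, and the SDKs' real delays and any jitter are not documented here. The `sleep` parameter is injectable so the example can record delays instead of waiting.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class HTTPError(Exception):
    """Hypothetical error carrying a status code and an optional retry_after (seconds)."""

    def __init__(self, status: int, retry_after: Optional[float] = None) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after


def is_retryable(err: HTTPError) -> bool:
    # 429 is retried only when the server supplies retry_after; 502/503/504 always are.
    if err.status == 429:
        return err.retry_after is not None
    return err.status in {502, 503, 504}


def with_retries(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 0.5,  # assumption: the SDKs' real base delay is not documented
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    attempt = 0
    while True:
        try:
            return call()
        except HTTPError as err:
            if not is_retryable(err) or attempt >= max_retries:
                raise
            # Prefer the server's hint; otherwise double the delay on each attempt.
            if err.retry_after is not None:
                delay = err.retry_after
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)
            attempt += 1


# Simulate two 503s followed by success, recording delays instead of sleeping.
outcomes = iter([HTTPError(503), HTTPError(503), "ok"])


def flaky() -> str:
    item = next(outcomes)
    if isinstance(item, Exception):
        raise item
    return item


delays: List[float] = []
print(with_retries(flaky, sleep=delays.append), delays)  # ok [0.5, 1.0]
```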
→ Full error hierarchy and retry semantics: Error Handling Guide
SDK References
- Python SDK Reference — Complete API reference with all classes, methods, and types
- Node.js SDK Reference — Coming soon