SDKs

The Knowhere SDKs provide high-level, type-safe clients for the Knowhere document parsing API. They handle the full parsing workflow — job creation, file upload, polling, download, and result extraction — in a single method call.

Installation

Python

```bash
pip install knowhere-python-sdk
```

Or with uv:

```bash
uv add knowhere-python-sdk
```

Requirements

Python 3.9+ · httpx · pydantic v2

Node.js

```bash
npm install @knowhere-ai/sdk
```

Or with pnpm:

```bash
pnpm add @knowhere-ai/sdk
```

Requirements

Node.js 18+ · TypeScript 5.0+ (optional)

Quick Start

Create an API Key

Before you begin, create an API key in the dashboard. Store the key securely and export it as an environment variable:

```bash
export KNOWHERE_API_KEY="sk_..."
```

Both SDKs read the key from `KNOWHERE_API_KEY` automatically whenever you don't pass one to the client explicitly.
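As a rough illustration of that precedence (a key passed to the client wins; otherwise `KNOWHERE_API_KEY` is used), here is a minimal Python sketch. The helper `resolve_api_key` is hypothetical and not part of either SDK, and raising `ValueError` when no key is found is an assumption about how a missing key might surface.

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the documented API key lookup order."""
    # 1. A key passed directly to the client takes precedence.
    if explicit:
        return explicit
    # 2. Otherwise fall back to the KNOWHERE_API_KEY environment variable.
    key = os.environ.get("KNOWHERE_API_KEY")
    if key:
        return key
    # 3. Assumption: with no key available, fail early with a clear message.
    raise ValueError("No API key: pass one explicitly or set KNOWHERE_API_KEY")
```

Failing at client construction time, rather than on the first request, keeps a misconfigured key from surfacing as a confusing authentication error later.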

Parse a Document

Python

```python
import knowhere

client = knowhere.Knowhere()

result = client.parse(url="https://example.com/report.pdf")

print(result.statistics)        # chunk counts, page count
print(result.full_markdown)     # complete markdown output
```

Node.js

```typescript
import Knowhere from "@knowhere-ai/sdk";

const client = new Knowhere();

const result = await client.parse({ url: "https://example.com/report.pdf" });

console.log(result.statistics);     // chunk counts, page count
console.log(result.fullMarkdown);   // complete markdown output
```

Parse a Local File

Python

```python
from pathlib import Path

# Reuses the `client` created above
result = client.parse(file=Path("quarterly-report.pdf"))

for chunk in result.text_chunks:
    print(chunk.content[:200])
```

Node.js

```typescript
import { readFileSync } from "fs";

// Reuses the `client` created above
const result = await client.parse({
  file: readFileSync("quarterly-report.pdf"),
  fileName: "quarterly-report.pdf",
});

for (const chunk of result.textChunks) {
  console.log(chunk.content.slice(0, 200));
}
```

Async Support

Python

Every method has an async counterpart on `AsyncKnowhere`:

```python
import asyncio
import knowhere

async def main():
    async with knowhere.AsyncKnowhere() as client:
        result = await client.parse(url="https://example.com/report.pdf")
        print(result.full_markdown)

asyncio.run(main())
```

Node.js

All methods in the Node.js SDK are async by default:

```typescript
import Knowhere from "@knowhere-ai/sdk";

const client = new Knowhere();
const result = await client.parse({ url: "https://example.com/report.pdf" });
console.log(result.fullMarkdown);
```
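Because the async client returns awaitables, several documents can be parsed concurrently with `asyncio.gather`. The sketch below uses a stand-in coroutine, `fake_parse`, in place of `AsyncKnowhere.parse` so it runs without network access or an API key; with the real client you would await `client.parse(url=...)` in the same place. Capping concurrency with a semaphore is a choice made for this sketch, not a documented SDK feature.

```python
import asyncio
from typing import List


async def fake_parse(url: str) -> str:
    """Stand-in for `await client.parse(url=url)`; returns fake markdown."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"# parsed {url}"


async def parse_many(urls: List[str], max_concurrency: int = 4) -> List[str]:
    # Limit how many parse calls are in flight at once (gentler on rate limits).
    semaphore = asyncio.Semaphore(max_concurrency)

    async def parse_one(url: str) -> str:
        async with semaphore:
            return await fake_parse(url)

    # gather returns results in the same order as the input URLs.
    return list(await asyncio.gather(*(parse_one(u) for u in urls)))
```

Usage: `asyncio.run(parse_many(["https://example.com/a.pdf", "https://example.com/b.pdf"]))` returns one markdown string per URL, in input order.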

Capabilities

One-Call Parsing

`client.parse()` orchestrates the full workflow in one call:

1. Create a parsing job.
2. Upload the file (local files only).
3. Poll until the job completes.
4. Download and parse the results.

Python

```python
from pathlib import Path

# From URL: no upload step needed
result = client.parse(url="https://example.com/doc.pdf")

# From a local file: upload handled automatically
result = client.parse(
    file=Path("report.pdf"),
    parsing_params={"model": "advanced", "ocr_enabled": True},
)
```

Node.js

```typescript
import { readFileSync } from "fs";

// From URL: no upload step needed
const urlResult = await client.parse({ url: "https://example.com/doc.pdf" });

// From a local file: upload handled automatically
// (a distinct name avoids redeclaring `const` in the same scope)
const fileResult = await client.parse({
  file: readFileSync("report.pdf"),
  fileName: "report.pdf",
  parsingParams: { model: "advanced", ocrEnabled: true },
});
```
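Conceptually, `parse()` chains the four steps listed above. The sketch below shows that control flow against a hypothetical in-memory `FakeJobs` stub. Its method names echo the `jobs` namespace described under Step-by-Step Control, but the stub, its return values, and the status strings `"running"` and `"done"` are assumptions for illustration, not the SDK's implementation.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Job:
    job_id: str
    status: str = "pending"
    uploaded: bool = False
    remaining_polls: int = 2  # stub: job finishes after this many status checks


@dataclass
class FakeJobs:
    """Hypothetical in-memory stand-in for the SDK's jobs namespace."""
    jobs: Dict[str, Job] = field(default_factory=dict)

    def create(self, source_type: str) -> Job:
        job = Job(job_id=f"job_{len(self.jobs) + 1}")
        self.jobs[job.job_id] = job
        return job

    def upload(self, job: Job, data: bytes) -> None:
        job.uploaded = True

    def get(self, job_id: str) -> Job:
        job = self.jobs[job_id]
        job.remaining_polls -= 1
        job.status = "done" if job.remaining_polls <= 0 else "running"
        return job

    def load(self, job: Job) -> str:
        return f"markdown for {job.job_id}"


def parse(jobs: FakeJobs, file_bytes: Optional[bytes] = None, poll_interval: float = 0.0) -> str:
    # 1. Create a parsing job; the source type depends on whether we have local bytes.
    job = jobs.create(source_type="file" if file_bytes is not None else "url")
    # 2. Upload only for local files; URL sources need no upload.
    if file_bytes is not None:
        jobs.upload(job, file_bytes)
    # 3. Poll until the job completes (failure handling omitted in this sketch).
    while jobs.get(job.job_id).status != "done":
        time.sleep(poll_interval)
    # 4. Download and parse the results.
    return jobs.load(job)
```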

Rich Result Types

The `ParseResult` object provides typed access to all extracted content:

Python

```python
result = client.parse(url="https://example.com/report.pdf")

# Full markdown output
print(result.full_markdown)

# Typed chunk access
for chunk in result.text_chunks:
    print(chunk.content)

for img in result.image_chunks:
    img.save("./output/images")     # save image bytes to disk

for tbl in result.table_chunks:
    print(tbl.html)                 # raw HTML table
    tbl.save("./output/tables")     # save as .html file

# Statistics
stats = result.statistics
print(f"{stats.total_chunks} chunks, {stats.total_pages} pages")

# Save everything at once
result.save("./output")  # creates full.md, images/, tables/, result.zip
```

Node.js

```typescript
const result = await client.parse({ url: "https://example.com/report.pdf" });

// Full markdown output
console.log(result.fullMarkdown);

// Typed chunk access
for (const chunk of result.textChunks) {
  console.log(chunk.content);
}

for (const img of result.imageChunks) {
  await img.save("./output/images");     // save image bytes to disk
}

for (const tbl of result.tableChunks) {
  console.log(tbl.html);                 // raw HTML table
  await tbl.save("./output/tables");     // save as .html file
}

// Statistics
const stats = result.statistics;
console.log(`${stats.totalChunks} chunks, ${stats.totalPages} pages`);

// Save everything at once
await result.save("./output");  // creates full.md, images/, tables/, result.zip
```
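To make the `save()` layout concrete, here is a minimal Python sketch that writes a similar tree: `full.md` plus `images/` and `tables/`. The `ImageChunk` and `TableChunk` dataclasses, their `name` fields, and the file naming are hypothetical stand-ins for the SDK's chunk types, and the sketch skips `result.zip`, whose contents are not described here.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class ImageChunk:
    name: str    # hypothetical file name, e.g. "fig-1.png"
    data: bytes  # raw image bytes


@dataclass
class TableChunk:
    name: str    # hypothetical base name, e.g. "table-1"
    html: str    # raw HTML table


def save_result(out_dir: Path, full_markdown: str,
                images: List[ImageChunk], tables: List[TableChunk]) -> None:
    """Write a layout similar to result.save(): full.md, images/, tables/."""
    (out_dir / "images").mkdir(parents=True, exist_ok=True)
    (out_dir / "tables").mkdir(parents=True, exist_ok=True)
    (out_dir / "full.md").write_text(full_markdown, encoding="utf-8")
    for img in images:
        (out_dir / "images" / img.name).write_bytes(img.data)
    for tbl in tables:
        # Each table becomes a standalone .html file.
        (out_dir / "tables" / f"{tbl.name}.html").write_text(tbl.html, encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        save_result(Path(tmp), "# Report", [ImageChunk("fig-1.png", b"\x89PNG")],
                    [TableChunk("table-1", "<table></table>")])
        print(sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*")))
```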

Step-by-Step Control

For advanced use cases, use the `jobs` namespace to run each step individually:

Python

```python
# 1. Create job
job = client.jobs.create(source_type="url", source_url="https://example.com/doc.pdf")

# 2. Poll for completion
job_result = client.jobs.wait(
    job.job_id,
    poll_interval=5.0,                                    # seconds
    on_progress=lambda jr: print(f"Status: {jr.status}"),
)

# 3. Download and parse result
result = client.jobs.load(job_result)
```

For file uploads:

```python
from pathlib import Path

# 1. Create job with file source
job = client.jobs.create(source_type="file", file_name="report.pdf")

# 2. Upload the file
client.jobs.upload(job, Path("report.pdf"))

# 3. Poll + download
job_result = client.jobs.wait(job.job_id)
result = client.jobs.load(job_result)
```

Node.js

```typescript
// 1. Create job
const job = await client.jobs.create({
  sourceType: "url",
  sourceUrl: "https://example.com/doc.pdf",
});

// 2. Poll for completion
const jobResult = await client.jobs.wait(job.jobId, {
  pollInterval: 5000,                                 // ms
  onProgress: (jr) => console.log(`Status: ${jr.status}`),
});

// 3. Download and parse result
const result = await client.jobs.load(jobResult.jobId);
```

For file uploads:

```typescript
import { readFileSync } from "fs";

// 1. Create job with file source
const job = await client.jobs.create({
  sourceType: "file",
  fileName: "report.pdf",
});

// 2. Upload the file
await client.jobs.upload(job.jobId, { file: readFileSync("report.pdf") });

// 3. Poll + download
const jobResult = await client.jobs.wait(job.jobId);
const result = await client.jobs.load(jobResult.jobId);
```
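`jobs.wait()` behaves like a polling loop. The Python sketch below shows the general shape: check status every `poll_interval` seconds, report each snapshot through `on_progress`, and give up after a deadline. The `fetch_status` callable, the terminal status names, the `timeout` parameter, and the local `PollingTimeout` exception are assumptions for the sketch; in real code you would catch the SDK's own `PollingTimeoutError` instead.

```python
import time
from typing import Callable, Optional

TERMINAL_STATES = {"done", "failed"}  # assumed status names


class PollingTimeout(Exception):
    """Local stand-in for the SDK's PollingTimeoutError."""

    def __init__(self, job_id: str, elapsed: float) -> None:
        super().__init__(f"job {job_id} still running after {elapsed:.1f}s")
        self.job_id = job_id
        self.elapsed = elapsed


def wait_for_job(
    job_id: str,
    fetch_status: Callable[[str], str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    on_progress: Optional[Callable[[str], None]] = None,
) -> str:
    """Poll fetch_status(job_id) until a terminal state or the timeout elapses."""
    start = time.monotonic()
    while True:
        status = fetch_status(job_id)
        if on_progress is not None:
            on_progress(status)
        if status in TERMINAL_STATES:
            return status
        elapsed = time.monotonic() - start
        if elapsed >= timeout:
            raise PollingTimeout(job_id, elapsed)
        # Never sleep past the deadline.
        time.sleep(min(poll_interval, timeout - elapsed))
```

Using a monotonic clock keeps the deadline correct even if the system clock is adjusted while a long job is running.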

Configuration

Environment Variables

| Variable | Description |
|---|---|
| `KNOWHERE_API_KEY` | API key (used when `api_key` / `apiKey` is not passed to the client) |
| `KNOWHERE_BASE_URL` | Override the API base URL |
| `KNOWHERE_LOG_LEVEL` | SDK logging verbosity (`DEBUG`, `INFO`, `WARNING`) |
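A minimal sketch of how the base URL and log level variables might be resolved, using the default base URL shown in the client options example. The `load_env_config` helper, the `knowhere` logger name, and the fallback to `WARNING` for an unset or unrecognized level are assumptions for illustration; the SDKs may resolve these differently.

```python
import logging
import os
from typing import Dict, Mapping, Optional

DEFAULT_BASE_URL = "https://api.knowhereto.ai"
_LEVELS = {"DEBUG": logging.DEBUG, "INFO": logging.INFO, "WARNING": logging.WARNING}


def load_env_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Hypothetical helper: read KNOWHERE_BASE_URL and KNOWHERE_LOG_LEVEL."""
    env = os.environ if env is None else env
    level_name = env.get("KNOWHERE_LOG_LEVEL", "WARNING").upper()
    level = _LEVELS.get(level_name, logging.WARNING)  # assumed fallback
    logging.getLogger("knowhere").setLevel(level)     # assumed logger name
    return {
        "base_url": env.get("KNOWHERE_BASE_URL", DEFAULT_BASE_URL),
        "log_level": level,
    }
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the helper easy to test.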

Client Options

Python

```python
client = knowhere.Knowhere(
    api_key="sk_...",                            # or set KNOWHERE_API_KEY
    base_url="https://api.knowhereto.ai",        # default
    timeout=60.0,                                # HTTP timeout (seconds)
    upload_timeout=600.0,                        # file upload timeout (seconds)
    max_retries=5,                               # retries for retryable errors
    default_headers={"X-Custom": "value"},       # extra headers
)
```

Node.js

```typescript
const client = new Knowhere({
  apiKey: "sk_...",                              // or set KNOWHERE_API_KEY
  baseUrl: "https://api.knowhereto.ai",          // default
  timeout: 60_000,                               // HTTP timeout (ms)
  uploadTimeout: 600_000,                        // file upload timeout (ms)
  maxRetries: 5,                                 // retries for retryable errors
  defaultHeaders: { "X-Custom": "value" },       // extra headers
});
```

Error Handling

The SDKs raise specific exception types for different error conditions:

Python

```python
from knowhere import Knowhere
from knowhere._exceptions import (
    AuthenticationError,
    RateLimitError,
    JobFailedError,
    PollingTimeoutError,
)

client = Knowhere()

try:
    result = client.parse(url="https://example.com/doc.pdf")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limited; retry after {e.retry_after}s")
except JobFailedError as e:
    print(f"Job failed: [{e.code}] {e.message}")
except PollingTimeoutError as e:
    print(f"Polling timed out for job {e.job_id} after {e.elapsed:.0f}s")
```

Node.js

```typescript
import Knowhere from "@knowhere-ai/sdk";
import {
  AuthenticationError,
  RateLimitError,
  JobFailedError,
  PollingTimeoutError,
} from "knowhere/errors";

const client = new Knowhere();

try {
  const result = await client.parse({ url: "https://example.com/doc.pdf" });
} catch (error) {
  if (error instanceof AuthenticationError) {
    console.error("Invalid API key");
  } else if (error instanceof RateLimitError) {
    console.error(`Rate limited; retry after ${error.retryAfter}s`);
  } else if (error instanceof JobFailedError) {
    console.error(`Job failed: [${error.code}] ${error.message}`);
  } else if (error instanceof PollingTimeoutError) {
    console.error(`Polling timed out for job ${error.jobId}`);
  } else {
    throw error; // don't silently swallow unexpected errors
  }
}
```

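Retryable errors (HTTP 429 responses that carry a `retry_after` value, plus 502, 503, and 504) are retried automatically with exponential backoff, up to `max_retries` (Python) or `maxRetries` (Node.js) times.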
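The retry rule can be sketched as a small policy function. This is a minimal illustration, assuming a base delay of 0.5 s that doubles on each attempt up to a cap of 8 s, and treating a 429 without a `retry_after` value as non-retryable; the SDKs' actual delays, jitter, and caps are not specified here.

```python
from typing import Optional

RETRYABLE_STATUSES = {502, 503, 504}


def retry_delay(
    status: int,
    attempt: int,
    retry_after: Optional[float] = None,
    max_retries: int = 5,
    base: float = 0.5,
    cap: float = 8.0,
) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None to stop."""
    if attempt >= max_retries:
        return None  # retry budget exhausted
    if status == 429:
        # Rate limits are retried only when the server says how long to wait.
        return retry_after
    if status in RETRYABLE_STATUSES:
        # Exponential backoff: base, 2*base, 4*base, ... capped at `cap`.
        return min(cap, base * (2 ** attempt))
    return None  # everything else (e.g. 400, 401) fails immediately
```

A caller would send the request, and whenever `retry_delay` returns a number, `time.sleep` for that long and try again; a `None` means the error should be raised to the application.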

→ Full error hierarchy and retry semantics: Error Handling Guide

SDK References

Python SDK Reference — Complete API reference with all classes, methods, and types
Node.js SDK Reference — Coming soon
