SDKs

The Knowhere SDKs provide high-level, type-safe clients for the Knowhere document parsing API. They handle the full parsing workflow — job creation, file upload, polling, download, and result extraction — in a single method call.

Installation

pip install knowhere-python-sdk

Or with uv:

uv add knowhere-python-sdk

Requirements

Python 3.9+ · httpx · pydantic v2

Quick Start

Create an API Key

Before you begin, create an API key in the dashboard. Store the key securely and export it as an environment variable:

export KNOWHERE_API_KEY="sk_..."

The SDKs automatically read your API key from the environment.

Parse a Document

import knowhere

client = knowhere.Knowhere()

result = client.parse(url="https://example.com/report.pdf")

print(result.statistics) # chunk counts, page count
print(result.full_markdown) # complete markdown output

Parse a Local File

from pathlib import Path

result = client.parse(file=Path("quarterly-report.pdf"))

for chunk in result.text_chunks:
    print(chunk.content[:200])

Async Support

Every method has an async counterpart on AsyncKnowhere:

import asyncio
import knowhere

async def main():
    async with knowhere.AsyncKnowhere() as client:
        result = await client.parse(url="https://example.com/report.pdf")
        print(result.full_markdown)

asyncio.run(main())
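
Because the async client exposes parse() as a coroutine, several documents can be parsed concurrently. The following is a minimal sketch, not part of the SDK: parse_many and max_concurrency are hypothetical names, and it assumes a single client can safely be shared across tasks, which this page does not confirm. It accepts any object with an async parse(url=...) method, so it can be exercised with a stub.

```python
import asyncio
from typing import Any, List, Sequence


async def parse_many(client: Any, urls: Sequence[str], max_concurrency: int = 4) -> List[Any]:
    """Parse several URLs concurrently, at most `max_concurrency` at a time.

    `client` is any object with an async `parse(url=...)` method, such as
    the AsyncKnowhere client shown above. Results keep the order of `urls`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def parse_one(url: str) -> Any:
        async with semaphore:  # cap the number of in-flight parse calls
            return await client.parse(url=url)

    # gather() returns results in the same order as the awaitables passed in
    return await asyncio.gather(*(parse_one(u) for u in urls))
```

Inside an `async with knowhere.AsyncKnowhere() as client:` block, this would be called as `results = await parse_many(client, urls)`. The semaphore is there to avoid submitting every job at once; a suitable limit depends on your account's rate limits.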

Capabilities

One-Call Parsing

client.parse() orchestrates the full workflow in one call:

  1. Create a parsing job
  2. Upload the file (if local)
  3. Poll until the job completes
  4. Download and parse the results

# From URL: no upload step needed
result = client.parse(url="https://example.com/doc.pdf")

# From file: upload handled automatically
result = client.parse(
    file=Path("report.pdf"),
    parsing_params={"model": "advanced", "ocr_enabled": True},
)

Rich Result Types

The ParseResult object provides typed access to all extracted content:

result = client.parse(url="https://example.com/report.pdf")

# Full markdown output
print(result.full_markdown)

# Typed chunk access
for chunk in result.text_chunks:
    print(chunk.content)

for img in result.image_chunks:
    img.save("./output/images")  # save image bytes to disk

for tbl in result.table_chunks:
    print(tbl.html)  # raw HTML table
    tbl.save("./output/tables")  # save as .html file

# Statistics
stats = result.statistics
print(f"{stats.total_chunks} chunks, {stats.total_pages} pages")

# Save everything at once
result.save("./output") # creates full.md, images/, tables/, result.zip

Step-by-Step Control

For advanced use cases, use the jobs namespace to control each step individually:

# 1. Create job
job = client.jobs.create(source_type="url", source_url="https://example.com/doc.pdf")

# 2. Poll for completion
job_result = client.jobs.wait(
    job.job_id,
    poll_interval=5.0,
    on_progress=lambda jr: print(f"Status: {jr.status}"),
)

# 3. Download and parse result
result = client.jobs.load(job_result)
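
To illustrate what a wait step with poll_interval and on_progress does conceptually, here is a generic polling loop. It is a sketch under stated assumptions, not the SDK's implementation: wait_until_done, the status names "done" and "failed", and the 600-second default timeout are all hypothetical. The sleep and clock parameters exist only so the loop can be tested without real delays.

```python
import time
from typing import Callable, Optional


def wait_until_done(
    fetch_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,  # assumed default, not taken from the SDK
    on_progress: Optional[Callable[[str], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll `fetch_status` until it returns a terminal status.

    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    """
    terminal = {"done", "failed"}  # hypothetical terminal status names
    started = clock()
    while True:
        status = fetch_status()
        if on_progress is not None:
            on_progress(status)  # report every observed status
        if status in terminal:
            return status
        if clock() - started >= timeout:
            raise TimeoutError(f"still {status!r} after {timeout:.0f}s")
        sleep(poll_interval)
```

A shorter poll_interval surfaces completion sooner at the cost of more requests; the on_progress callback is invoked once per poll, including the final one.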

For file uploads:

# 1. Create job with file source
job = client.jobs.create(source_type="file", file_name="report.pdf")

# 2. Upload the file
client.jobs.upload(job, Path("report.pdf"))

# 3. Poll + download
job_result = client.jobs.wait(job.job_id)
result = client.jobs.load(job_result)
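
The URL flow and the file flow above differ only in the create and upload steps, so they can be combined into one helper that resembles what client.parse() is described as doing. This is an illustrative sketch: parse_via_jobs is a hypothetical name, and the code uses only the jobs calls shown on this page, with the same arguments.

```python
from pathlib import Path
from typing import Any, Optional


def parse_via_jobs(client: Any, *, url: Optional[str] = None, file: Optional[Path] = None) -> Any:
    """Create a job from a URL or a local file, wait for it, and load the result.

    `client` must expose the `jobs` namespace (create, upload, wait, load).
    """
    if (url is None) == (file is None):
        raise ValueError("pass exactly one of url or file")

    if url is not None:
        # URL source: the service fetches the document, so there is no upload step
        job = client.jobs.create(source_type="url", source_url=url)
    else:
        # File source: register the job first, then send the file
        job = client.jobs.create(source_type="file", file_name=file.name)
        client.jobs.upload(job, file)

    job_result = client.jobs.wait(job.job_id)  # block until the job finishes
    return client.jobs.load(job_result)        # download and parse the output
```

Writing the steps out this way is mainly useful when you need to add behavior between them, such as persisting job.job_id before waiting so that an interrupted process can resume polling later.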

Configuration

Environment Variables

| Variable | Description |
| --- | --- |
| KNOWHERE_API_KEY | API key (used when api_key is not passed) |
| KNOWHERE_BASE_URL | Override API base URL |
| KNOWHERE_LOG_LEVEL | SDK logging verbosity (DEBUG, INFO, WARNING) |
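
The precedence implied by these variables (explicit argument first, then environment, then default) can be sketched as follows. This is not the SDK's code: Settings and resolve_settings are hypothetical names, and the WARNING fallback for the log level is an assumption. The default base URL is the one listed under Client Options.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://api.knowhereto.ai"  # default listed under Client Options


@dataclass(frozen=True)
class Settings:
    api_key: str
    base_url: str
    log_level: str


def resolve_settings(
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Settings:
    """Resolve client settings: explicit arguments win over environment variables."""
    key = api_key or env.get("KNOWHERE_API_KEY")
    if not key:
        raise ValueError("no API key: pass api_key or set KNOWHERE_API_KEY")
    return Settings(
        api_key=key,
        base_url=base_url or env.get("KNOWHERE_BASE_URL") or DEFAULT_BASE_URL,
        # Falling back to WARNING is an assumption, not documented behavior
        log_level=env.get("KNOWHERE_LOG_LEVEL", "WARNING").upper(),
    )
```

Passing env explicitly, instead of reading os.environ inside the function, keeps the resolution logic easy to test and makes a missing key fail at startup and not on the first request.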

Client Options

client = knowhere.Knowhere(
    api_key="sk_...",  # or set KNOWHERE_API_KEY
    base_url="https://api.knowhereto.ai",  # default
    timeout=60.0,  # HTTP timeout (seconds)
    upload_timeout=600.0,  # file upload timeout (seconds)
    max_retries=5,  # retries for retryable errors
    default_headers={"X-Custom": "value"},  # extra headers
)

Error Handling

The SDKs raise specific exception types for different error conditions:

from knowhere import Knowhere
from knowhere._exceptions import (
    AuthenticationError,
    RateLimitError,
    JobFailedError,
    PollingTimeoutError,
)

client = Knowhere()

try:
    result = client.parse(url="https://example.com/doc.pdf")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limited: retry after {e.retry_after}s")
except JobFailedError as e:
    print(f"Job failed: [{e.code}] {e.message}")
except PollingTimeoutError as e:
    print(f"Polling timed out for job {e.job_id} after {e.elapsed:.0f}s")

Retryable errors (429 with retry_after, 502, 503, 504) are automatically retried with exponential backoff.
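
To make the retry rule concrete, here is one way such a delay could be computed. It is a sketch only: retry_delay, the 0.5-second base, the 30-second cap, and the jitter strategy are assumptions, and the SDK's actual schedule may differ. It reads "429 with retry_after" as meaning a 429 is retried only when the response carries a retry hint.

```python
import random
from typing import Optional

RETRYABLE_STATUS = {429, 502, 503, 504}  # status codes named in the text above


def retry_delay(
    status: int,
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 0.5,   # assumed base delay in seconds
    cap: float = 30.0,   # assumed upper bound in seconds
    jitter: bool = True,
) -> Optional[float]:
    """Return seconds to wait before retry number `attempt` (0-based).

    Returns None when the response should not be retried.
    """
    if status not in RETRYABLE_STATUS:
        return None
    if status == 429:
        # Assumption: a 429 without a retry hint is not retried
        return float(retry_after) if retry_after is not None else None
    delay = min(cap, base * (2 ** attempt))  # exponential growth, capped
    if jitter:
        delay = random.uniform(0.0, delay)   # randomize to spread out retries
    return delay
```

With jitter disabled, attempts 0 to 3 on a 503 wait 0.5, 1, 2, and 4 seconds. Jitter keeps many clients that failed at the same moment from retrying in lockstep.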

→ Full error hierarchy and retry semantics: Error Handling Guide

SDK References

  • Python SDK Reference — Complete API reference with all classes, methods, and types
  • Node.js SDK Reference — Coming soon