Asynchronous Processing Model

The Knowhere API uses an asynchronous processing model for document parsing. This design ensures reliability, scalability, and a better developer experience when handling documents of varying sizes and complexity.

Why Asynchronous?

Document parsing can take anywhere from a few seconds to several minutes depending on:

  • Document size (pages, file size)
  • Document complexity (tables, images, formatting)
  • Selected processing options (OCR, model type)

An asynchronous model allows you to:

  1. Submit and forget: Start processing without waiting
  2. Handle long jobs: Process large documents without timeout issues
  3. Scale efficiently: Submit multiple jobs in parallel
  4. Build resilient apps: Retry and recover from transient failures

The Job Model

Every document you submit creates a Job. A job represents a single parsing task and tracks its progress through the system.

{
  "job_id": "job_abc123def456",
  "status": "running",
  "source_type": "file",
  "created_at": "2025-01-15T10:30:00Z",
  "progress": {
    "total_pages": 50,
    "processed_pages": 23
  }
}
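
As a rough illustration of how a client might consume this payload, the sketch below maps it onto a small Python type and derives a completion percentage. The Job class and its percent_complete method are hypothetical helpers, not part of the API, and treating progress as optional is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    """Client-side view of a job payload (hypothetical helper type)."""
    job_id: str
    status: str
    total_pages: Optional[int] = None
    processed_pages: Optional[int] = None

    @classmethod
    def from_dict(cls, data: dict) -> "Job":
        # Assumption: "progress" may be absent, so fall back to an empty dict.
        progress = data.get("progress") or {}
        return cls(
            job_id=data["job_id"],
            status=data["status"],
            total_pages=progress.get("total_pages"),
            processed_pages=progress.get("processed_pages"),
        )

    def percent_complete(self) -> Optional[float]:
        """Return progress as a percentage, or None when it is unknown."""
        if not self.total_pages or self.processed_pages is None:
            return None
        return 100.0 * self.processed_pages / self.total_pages


job = Job.from_dict({
    "job_id": "job_abc123def456",
    "status": "running",
    "source_type": "file",
    "created_at": "2025-01-15T10:30:00Z",
    "progress": {"total_pages": 50, "processed_pages": 23},
})
print(job.percent_complete())  # 46.0
```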

Workflow Overview

1. Submit a Job

POST /v1/jobs

You receive a job_id immediately. The actual processing happens in the background.
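
A submission could look roughly like the following sketch, which uses only the Python standard library. The host in BASE_URL, the Bearer authorization header, and the JSON request body are all assumptions made for illustration; the payload is passed in by the caller because the request fields are not described here. Only the POST /v1/jobs path and the job_id field in the response come from the text above.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host, not the real one


def build_submit_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/jobs request."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/jobs",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


def submit_job(payload: dict, api_key: str) -> str:
    """Send the request and return the job_id from the response."""
    request = build_submit_request(payload, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["job_id"]
```

Separating request construction from sending keeps the network call in one place, which makes the rest of the client easy to test offline.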

2. Track Progress

GET /v1/jobs/{job_id}

Poll this endpoint to check the job's status. For large documents, the response includes progress information.
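
A single status check might be sketched as below. As before, BASE_URL and the Bearer header are placeholders, and fetch_job and describe are hypothetical helper names. The progress fields follow the job example shown earlier and are treated as optional.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.example.com"  # placeholder host, not the real one


def job_url(job_id: str) -> str:
    """Return the status URL for a job, escaping the ID for use in a path."""
    return f"{BASE_URL}/v1/jobs/{quote(job_id, safe='')}"


def fetch_job(job_id: str, api_key: str) -> dict:
    """Perform one GET /v1/jobs/{job_id} call and return the parsed JSON."""
    request = urllib.request.Request(
        job_url(job_id),
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def describe(job: dict) -> str:
    """Format a one-line summary, including page progress when present."""
    progress = job.get("progress")
    if progress:
        done, total = progress["processed_pages"], progress["total_pages"]
        return f"{job['status']}: {done}/{total} pages"
    return job["status"]
```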

3. Retrieve Results

When status becomes done, the response includes a result_url to download your results.
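
Downloading the results could then be as simple as the sketch below. It assumes that result_url can be fetched with a plain GET and no extra headers, and it writes the raw bytes to disk without interpreting them, since the result format is not described here. Both function names are hypothetical.

```python
import urllib.request


def result_url_for(job: dict) -> str:
    """Return result_url, refusing to proceed unless the job is done."""
    if job["status"] != "done":
        raise ValueError(f"job is not done yet (status: {job['status']})")
    return job["result_url"]


def download_result(job: dict, dest_path: str) -> None:
    """Save the results to dest_path as raw bytes."""
    # Assumption: the URL needs no additional authentication headers.
    with urllib.request.urlopen(result_url_for(job), timeout=60) as response:
        with open(dest_path, "wb") as out:
            out.write(response.read())
```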

Request-Response Flow

Best Practices

Implement Exponential Backoff

Don't poll too frequently. Use increasing delays between requests:

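import time

# Delays grow, then cap at 10 seconds; extend the list for longer jobs.
delays = [2, 5, 10, 10, 10]  # seconds

for delay in delays:
    # get_job_status is a placeholder for your GET /v1/jobs/{job_id} call
    job = get_job_status(job_id)

    # Stop as soon as the job reaches a terminal state
    if job["status"] in ("done", "failed"):
        break

    time.sleep(delay)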

Set Reasonable Timeouts

For most documents, processing completes within:

Document Type              Typical Time
Small PDF (1-10 pages)     10-30 seconds
Medium PDF (10-50 pages)   30-120 seconds
Large PDF (50+ pages)      2-5 minutes
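
One way to turn these figures into code is to poll with a capped exponential backoff under an overall deadline, as in the sketch below. The wait_for_job function and its parameters are hypothetical. The 600-second default is an assumption chosen to leave headroom above the 2-5 minute range, not a documented limit. The fetch argument is any callable that returns the job as a dict, and sleep and clock are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_for_job(
    fetch: Callable[[str], dict],
    job_id: str,
    timeout_s: float = 600.0,   # assumed overall deadline
    initial_delay: float = 2.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the job is done or failed, or raise TimeoutError."""
    deadline = clock() + timeout_s
    delay = initial_delay
    while True:
        job = fetch(job_id)
        if job["status"] in ("done", "failed"):
            return job
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"job {job_id} not finished after {timeout_s} seconds")
        # Never sleep past the deadline; double the delay up to the cap.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

With the defaults, the delays run 2, 4, 8, 10, 10, ... seconds, so short jobs are picked up quickly while long jobs are polled at most once every 10 seconds.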

Handle All Terminal States

Jobs can end in one of two states:

  • done: Success; results are available
  • failed: An error occurred; check the error field for details

Always handle both cases in your code.
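
A minimal sketch of that branching is shown below. JobFailedError and handle_finished_job are hypothetical names, and the contents of the error field are passed through as-is because its shape is not described here.

```python
class JobFailedError(Exception):
    """Raised when a job ends in the failed state (hypothetical helper)."""

    def __init__(self, job_id: str, error: object) -> None:
        super().__init__(f"job {job_id} failed: {error}")
        self.job_id = job_id
        self.error = error


def handle_finished_job(job: dict) -> str:
    """Return result_url on success; raise for failed or unfinished jobs."""
    status = job["status"]
    if status == "done":
        return job["result_url"]
    if status == "failed":
        # The error field is treated as opaque and surfaced to the caller.
        raise JobFailedError(job["job_id"], job.get("error"))
    # Any other status means the job has not finished yet.
    raise ValueError(f"job {job['job_id']} is not in a terminal state: {status}")
```

Raising on any unexpected status, rather than silently returning, guards against treating a still-running job as finished.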