Job Lifecycle
Every parsing job goes through a defined set of states from creation to completion. Understanding these states helps you build robust integrations.
State Machine
State Descriptions
waiting-file
Only for source_type: "file"
The job has been created and is waiting for you to upload the file using the provided upload_url.
{
  "job_id": "job_abc123",
  "status": "waiting-file",
  "upload_url": "https://storage.knowhereto.ai/...",
  "upload_headers": {"Content-Type": "application/pdf"}
}
What to do: Upload your file to upload_url using HTTP PUT.
Timeout: If no file is uploaded within 1 hour, the job moves to failed.
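The upload step can be sketched with the Python standard library alone. This is a minimal sketch, assuming the job response has the shape shown above and that every entry in upload_headers should be sent unchanged with the PUT request. The helper names build_upload_request and upload_file are hypothetical, not part of the API.

```python
import urllib.request


def build_upload_request(job: dict, data: bytes) -> urllib.request.Request:
    """Build the HTTP PUT request for a job in the waiting-file state."""
    if job["status"] != "waiting-file":
        raise ValueError(f"Job is not waiting for a file: {job['status']}")
    return urllib.request.Request(
        job["upload_url"],
        data=data,
        # Assumption: the headers returned with the job are sent unchanged.
        headers=job.get("upload_headers", {}),
        method="PUT",
    )


def upload_file(job: dict, path: str) -> int:
    """Upload the file at `path` and return the HTTP status code."""
    with open(path, "rb") as f:
        request = build_upload_request(job, f.read())
    with urllib.request.urlopen(request) as response:
        return response.status
```

Splitting request construction from sending keeps the state check and header handling testable without network access.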
pending
The job is queued and waiting to be processed.
{
  "job_id": "job_abc123",
  "status": "pending",
  "created_at": "2025-01-15T10:30:00Z"
}
What to do: Wait and poll periodically.
converting
The document is being converted to a processable format. This happens for certain file types that need preprocessing.
{
  "job_id": "job_abc123",
  "status": "converting",
  "created_at": "2025-01-15T10:30:00Z"
}
What to do: Wait and poll periodically.
running
The document is actively being parsed. For multi-page documents, progress information is available.
{
  "job_id": "job_abc123",
  "status": "running",
  "created_at": "2025-01-15T10:30:00Z",
  "progress": {
    "total_pages": 50,
    "processed_pages": 23
  }
}
What to do: Monitor progress, continue polling.
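To turn the progress object into something displayable, a small helper is enough. This sketch assumes progress may be absent (for example, for single-page documents) and that total_pages may be missing or zero early in processing; progress_percent is a hypothetical helper name.

```python
from typing import Optional


def progress_percent(job: dict) -> Optional[float]:
    """Return completion as a percentage, or None if progress is unavailable."""
    progress = job.get("progress")
    if not progress or not progress.get("total_pages"):
        return None  # no progress object, or total not known yet
    return 100.0 * progress["processed_pages"] / progress["total_pages"]


job = {"status": "running", "progress": {"total_pages": 50, "processed_pages": 23}}
print(f"{progress_percent(job):.0f}% complete")  # prints "46% complete"
```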
done
Processing completed successfully. Results are available for download.
{
  "job_id": "job_abc123",
  "status": "done",
  "created_at": "2025-01-15T10:30:00Z",
  "completed_at": "2025-01-15T10:32:45Z",
  "result_url": "https://results.knowhereto.ai/...",
  "result_url_expires_at": "2025-01-16T10:32:45Z",
  "result_checksum": {
    "algorithm": "sha256",
    "value": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
  }
}
What to do: Download results from result_url.
Note: The result_url is a presigned URL that expires after 1 hour. If expired, call GET /v1/jobs/{job_id} again to get a fresh URL. Result files are retained for 30 days after job completion.
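The result_checksum and result_url_expires_at fields can be used to guard the download. The sketch below assumes the checksum is computed over the raw bytes of the downloaded result file and that timestamps are always UTC in the format shown above; verify_checksum and result_url_expired are hypothetical helper names.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def verify_checksum(data: bytes, checksum: dict) -> bool:
    """Compare the digest of `data` with the job's result_checksum."""
    # hashlib.new accepts algorithm names such as "sha256".
    digest = hashlib.new(checksum["algorithm"], data).hexdigest()
    return digest == checksum["value"].lower()


def result_url_expired(job: dict, now: Optional[datetime] = None) -> bool:
    """Return True if result_url_expires_at is in the past."""
    now = now or datetime.now(timezone.utc)
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    expires_at = datetime.fromisoformat(
        job["result_url_expires_at"].replace("Z", "+00:00")
    )
    return now >= expires_at
```

A caller would check result_url_expired before downloading, fetch the job again if it returns True, and reject the downloaded bytes if verify_checksum returns False.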
failed
Processing failed due to an error.
{
  "job_id": "job_abc123",
  "status": "failed",
  "created_at": "2025-01-15T10:30:00Z",
  "failed_at": "2025-01-15T10:31:15Z",
  "error": {
    "code": "INVALID_ARGUMENT",
    "message": "Unsupported file format: .xyz"
  }
}
What to do: Check the error object for details. Some errors are recoverable (for example, by retrying with different options); others are not.
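One way to keep the error details available to calling code is a dedicated exception type. This is a sketch only: JobFailedError and raise_if_failed are hypothetical names, and "UNKNOWN" is a local fallback for a missing code, not a value defined by the API.

```python
class JobFailedError(Exception):
    """Raised for a job in the failed state; keeps the error code and message."""

    def __init__(self, job: dict):
        error = job.get("error", {})
        self.job_id = job.get("job_id")
        # "UNKNOWN" is a local placeholder used when no code is present.
        self.code = error.get("code", "UNKNOWN")
        self.message = error.get("message", "")
        super().__init__(f"Job {self.job_id} failed [{self.code}]: {self.message}")


def raise_if_failed(job: dict) -> None:
    """Raise JobFailedError when the job status is "failed"."""
    if job["status"] == "failed":
        raise JobFailedError(job)
```

Callers can then branch on the code attribute to decide whether a retry with different options is worthwhile.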
State Transition Summary
| From State | To State | Trigger |
|---|---|---|
| (new) | waiting-file | Job created with source_type: "file" |
| (new) | pending | Job created with source_type: "url" |
| waiting-file | pending | File uploaded successfully |
| waiting-file | failed | Upload timeout or invalid file |
| pending | converting | Worker starts processing |
| pending | running | Worker starts processing (no conversion needed) |
| converting | running | Conversion complete |
| converting | failed | Conversion error |
| running | done | Processing complete |
| running | failed | Processing error |
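The table above can be encoded as a lookup structure, which is handy for sanity-checking observed status changes in client code or tests. This is a sketch transcribed from the table; TRANSITIONS, is_valid_transition, and is_terminal are hypothetical names, and the server may allow transitions the table does not list.

```python
# Allowed transitions, transcribed from the table above.
TRANSITIONS = {
    "(new)": {"waiting-file", "pending"},
    "waiting-file": {"pending", "failed"},
    "pending": {"converting", "running"},
    "converting": {"running", "failed"},
    "running": {"done", "failed"},
    "done": set(),
    "failed": set(),
}


def is_valid_transition(old: str, new: str) -> bool:
    """Return True if the table lists a transition from `old` to `new`."""
    return new in TRANSITIONS.get(old, set())


def is_terminal(status: str) -> bool:
    """A known state is terminal when it has no outgoing transitions."""
    return status in TRANSITIONS and not TRANSITIONS[status]
```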
Terminal States
Jobs end in one of two terminal states:
- done: Success - results available
- failed: Error - check error details
Once a job reaches a terminal state, it will not change.
Handling States in Code
Python
import time


# get_job(job_id) is assumed to return the job object from GET /v1/jobs/{job_id}.
def wait_for_job(job_id: str, timeout: int = 300) -> dict:
    """Wait for a job to complete, handling all states."""
    start_time = time.time()
    poll_interval = 2  # seconds
    while time.time() - start_time < timeout:
        job = get_job(job_id)
        status = job["status"]
        if status == "done":
            return job
        elif status == "failed":
            raise RuntimeError(f"Job failed: {job['error']['message']}")
        elif status == "waiting-file":
            raise RuntimeError("File not uploaded. Upload file first.")
        elif status in ["pending", "converting", "running"]:
            # Show progress if available
            if "progress" in job:
                p = job["progress"]
                print(f"Progress: {p['processed_pages']}/{p['total_pages']} pages")
            time.sleep(poll_interval)
            poll_interval = min(poll_interval + 1, 10)  # Linear backoff, capped at 10 s
        else:
            raise RuntimeError(f"Unknown status: {status}")
    raise TimeoutError(f"Job did not complete within {timeout} seconds")
Node.js
async function waitForJob(jobId, timeout = 300000) {
  const startTime = Date.now();
  let pollInterval = 2000;
  while (Date.now() - startTime < timeout) {
    const job = await getJob(jobId);
    const { status } = job;
    if (status === 'done') {
      return job;
    }
    if (status === 'failed') {
      throw new Error(`Job failed: ${job.error.message}`);
    }
    if (status === 'waiting-file') {
      throw new Error('File not uploaded. Upload file first.');
    }
    if (['pending', 'converting', 'running'].includes(status)) {
      // Show progress if available
      if (job.progress) {
        const p = job.progress;
        console.log(`Progress: ${p.processed_pages}/${p.total_pages} pages`);
      }
      await new Promise(r => setTimeout(r, pollInterval));
      pollInterval = Math.min(pollInterval + 1000, 10000); // Backoff
    } else {
      throw new Error(`Unknown status: ${status}`);
    }
  }
  throw new Error(`Job did not complete within ${timeout}ms`);
}
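The polling functions above rely on a get_job helper that this page does not define. A minimal Python sketch follows, assuming the GET /v1/jobs/{job_id} endpoint mentioned earlier. The base URL, the API_KEY environment variable, and the bearer-token Authorization header are all hypothetical placeholders and should be checked against the API reference.

```python
import json
import os
import urllib.parse
import urllib.request

# Hypothetical placeholder; substitute the real API base URL.
API_BASE_URL = "https://api.example.com"


def job_url(job_id: str, base_url: str = API_BASE_URL) -> str:
    """Build the URL for GET /v1/jobs/{job_id}."""
    return f"{base_url}/v1/jobs/{urllib.parse.quote(job_id, safe='')}"


def get_job(job_id: str) -> dict:
    """Fetch the current job object as a dict."""
    request = urllib.request.Request(
        job_url(job_id),
        headers={
            # Assumption: bearer-token auth with a key from the environment.
            "Authorization": f"Bearer {os.environ['API_KEY']}",
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```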
Next Steps
- Result Delivery - Understanding the result format
- Polling Guide - Best practices for polling
- Error Handling - Handling failure states