
Job Lifecycle

Every parsing job goes through a defined set of states from creation to completion. Understanding these states helps you build robust integrations.

State Machine

State Descriptions

waiting-file

This state applies only to jobs created with source_type: "file".

The job has been created and is waiting for you to upload the file using the provided upload_url.

{
  "job_id": "job_abc123",
  "status": "waiting-file",
  "upload_url": "https://storage.knowhereto.ai/...",
  "upload_headers": {"Content-Type": "application/pdf"}
}

What to do: Upload your file to upload_url using HTTP PUT.

Timeout: Jobs in this state expire after 1 hour if no file is uploaded.
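
The upload step can be sketched as follows, using only the Python standard library. The helper names build_upload_request and upload_file are hypothetical, and the sketch assumes that the upload_headers from the job response should be sent unchanged with the PUT request, which is common for presigned upload URLs but is not stated above.

```python
import urllib.request


def build_upload_request(job: dict, data: bytes) -> urllib.request.Request:
    """Build the HTTP PUT request for a job in the waiting-file state."""
    return urllib.request.Request(
        job["upload_url"],
        data=data,
        # Assumption: the headers returned with the job must accompany the upload.
        headers=dict(job.get("upload_headers", {})),
        method="PUT",
    )


def upload_file(job: dict, path: str) -> int:
    """Read the file at `path`, PUT it to upload_url, and return the HTTP status."""
    with open(path, "rb") as f:
        request = build_upload_request(job, f.read())
    with urllib.request.urlopen(request) as response:
        return response.status
```

Separating request construction from the network call keeps the PUT method and headers easy to inspect or unit-test without touching the storage service.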


pending

The job is queued and waiting to be processed.

{
  "job_id": "job_abc123",
  "status": "pending",
  "created_at": "2025-01-15T10:30:00Z"
}

What to do: Wait and poll periodically.


converting

The document is being converted to a processable format. This happens for certain file types that need preprocessing.

{
  "job_id": "job_abc123",
  "status": "converting",
  "created_at": "2025-01-15T10:30:00Z"
}

What to do: Wait and poll periodically.


running

The document is actively being parsed. For multi-page documents, progress information is available.

{
  "job_id": "job_abc123",
  "status": "running",
  "created_at": "2025-01-15T10:30:00Z",
  "progress": {
    "total_pages": 50,
    "processed_pages": 23
  }
}

What to do: Monitor progress, continue polling.
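
As a small illustration of monitoring progress, the sketch below turns the progress object into a percentage. The function name progress_percent is hypothetical, and the code assumes that progress may be absent (for example, for single-page documents or before parsing starts), in which case it returns None.

```python
from typing import Optional


def progress_percent(job: dict) -> Optional[float]:
    """Return completion as a percentage, or None if it cannot be computed."""
    progress = job.get("progress")
    if not progress:
        return None
    total = progress.get("total_pages", 0)
    if total <= 0:
        # Avoid dividing by zero when the page count is not known yet.
        return None
    return 100.0 * progress.get("processed_pages", 0) / total


job = {"status": "running", "progress": {"total_pages": 50, "processed_pages": 23}}
print(progress_percent(job))  # → 46.0
```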


done

Processing completed successfully. Results are available for download.

{
  "job_id": "job_abc123",
  "status": "done",
  "created_at": "2025-01-15T10:30:00Z",
  "completed_at": "2025-01-15T10:32:45Z",
  "result_url": "https://results.knowhereto.ai/...",
  "result_url_expires_at": "2025-01-15T11:32:45Z",
  "result_checksum": {
    "algorithm": "sha256",
    "value": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
  }
}

What to do: Download results from result_url.

Note: The result_url is a presigned URL that expires after 1 hour. If expired, call GET /v1/jobs/{job_id} again to get a fresh URL. Result files are retained for 30 days after job completion.
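
A minimal sketch of two checks a client might run before and after downloading results. Both function names are hypothetical. The code assumes that result_checksum is computed over the raw bytes of the downloaded result file and that result_url_expires_at is an ISO 8601 UTC timestamp, as in the example above; neither detail is confirmed here.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def verify_checksum(data: bytes, checksum: dict) -> bool:
    """Compare the SHA-256 digest of `data` with the result_checksum object."""
    if checksum["algorithm"] != "sha256":
        raise ValueError(f"Unsupported checksum algorithm: {checksum['algorithm']}")
    return hashlib.sha256(data).hexdigest() == checksum["value"].lower()


def is_result_url_expired(job: dict, now: Optional[datetime] = None) -> bool:
    """Return True if result_url_expires_at is at or before `now` (UTC)."""
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    expires_at = datetime.fromisoformat(
        job["result_url_expires_at"].replace("Z", "+00:00")
    )
    now = now or datetime.now(timezone.utc)
    return now >= expires_at
```

If is_result_url_expired returns True, fetching the job again yields a fresh URL, as the note above describes.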


failed

Processing failed due to an error.

{
  "job_id": "job_abc123",
  "status": "failed",
  "created_at": "2025-01-15T10:30:00Z",
  "failed_at": "2025-01-15T10:31:15Z",
  "error": {
    "code": "INVALID_ARGUMENT",
    "message": "Unsupported file format: .xyz"
  }
}

What to do: Check the error object for details. Some errors are recoverable (for example, by retrying with different options); others are not.
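
One way to surface the error object to calling code is a dedicated exception that keeps code and message as attributes, so callers can branch on the code instead of parsing a string. JobFailedError and raise_if_failed are hypothetical names, and the "UNKNOWN" fallback is a local placeholder, not a documented error code.

```python
class JobFailedError(Exception):
    """Raised when a job reports status "failed"."""

    def __init__(self, job_id: str, code: str, message: str):
        super().__init__(f"Job {job_id} failed [{code}]: {message}")
        self.job_id = job_id
        self.code = code
        self.message = message


def raise_if_failed(job: dict) -> None:
    """Raise JobFailedError if the job is in the failed state; otherwise do nothing."""
    if job.get("status") != "failed":
        return
    error = job.get("error", {})
    raise JobFailedError(
        job["job_id"],
        error.get("code", "UNKNOWN"),  # placeholder when no code is present
        error.get("message", ""),
    )
```

A caller can then catch JobFailedError and decide, based on exc.code, whether resubmitting with different options makes sense.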

State Transition Summary

From State     To State       Trigger
(new)          waiting-file   Job created with source_type: "file"
(new)          pending        Job created with source_type: "url"
waiting-file   pending        File uploaded successfully
waiting-file   failed         Upload timeout or invalid file
pending        converting     Worker starts processing
pending        running        Worker starts processing (no conversion needed)
converting     running        Conversion complete
converting     failed         Conversion error
running        done           Processing complete
running        failed         Processing error
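
The transitions above can be encoded as a lookup table, which is handy in tests or when reasoning about client logic. This is a sketch built only from the rows listed here; the names ALLOWED_TRANSITIONS, is_valid_transition, and is_terminal are hypothetical. Because a polling client can miss intermediate states (for example, seeing pending and then done), the table describes the service's state machine rather than every pair of statuses a client might observe.

```python
# Each state maps to the set of states it can move to directly.
ALLOWED_TRANSITIONS = {
    "waiting-file": {"pending", "failed"},
    "pending": {"converting", "running"},
    "converting": {"running", "failed"},
    "running": {"done", "failed"},
    "done": set(),
    "failed": set(),
}

# States a newly created job can start in, depending on source_type.
INITIAL_STATES = {"waiting-file", "pending"}


def is_valid_transition(old: str, new: str) -> bool:
    """Return True if the table lists a direct transition from old to new."""
    return new in ALLOWED_TRANSITIONS.get(old, set())


def is_terminal(status: str) -> bool:
    """Return True for known states that have no outgoing transitions."""
    return status in ALLOWED_TRANSITIONS and not ALLOWED_TRANSITIONS[status]
```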

Terminal States

Jobs end in one of two terminal states:

  • done: Success - results available
  • failed: Error - check error details

Once a job reaches a terminal state, its status will not change.

Handling States in Code

import time


def wait_for_job(job_id: str, timeout: int = 300) -> dict:
    """Wait for a job to reach a terminal state, handling every status.

    Assumes get_job(job_id) is defined elsewhere and returns the job as a dict.
    """
    start_time = time.monotonic()
    poll_interval = 2  # seconds between polls

    while time.monotonic() - start_time < timeout:
        job = get_job(job_id)
        status = job["status"]

        if status == "done":
            return job

        elif status == "failed":
            raise RuntimeError(f"Job failed: {job['error']['message']}")

        elif status == "waiting-file":
            raise RuntimeError("File not uploaded. Upload file first.")

        elif status in ("pending", "converting", "running"):
            # Show progress if available
            if "progress" in job:
                p = job["progress"]
                print(f"Progress: {p['processed_pages']}/{p['total_pages']} pages")

            time.sleep(poll_interval)
            poll_interval = min(poll_interval + 1, 10)  # linear backoff, capped at 10 s

        else:
            raise RuntimeError(f"Unknown status: {status}")

    raise TimeoutError(f"Job did not complete within {timeout} seconds")
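
The polling function relies on a get_job helper that is not shown. A minimal sketch using the standard library might look like the following. It assumes the GET /v1/jobs/{job_id} endpoint mentioned earlier; the base_url and headers parameters are hypothetical placeholders, since the API host and authentication scheme are not given in this section.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote


def job_url(base_url: str, job_id: str) -> str:
    """Build the job status URL; base_url is supplied by the caller."""
    return f"{base_url.rstrip('/')}/v1/jobs/{quote(job_id, safe='')}"


def get_job(job_id: str, base_url: str, headers: Optional[dict] = None) -> dict:
    """Fetch the current job object as a dict.

    `headers` is where any required authentication would go (assumption:
    the API uses header-based auth; check the API reference).
    """
    request = urllib.request.Request(
        job_url(base_url, job_id), headers=headers or {}, method="GET"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To use it with a single-argument caller such as wait_for_job, bind the extra arguments first, for example with functools.partial(get_job, base_url=..., headers=...).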
