Get Job

Retrieve the status and results of a parsing job.

GET /v1/jobs/{job_id}

Path Parameters

| Parameter | Type | Description |
|---|---|---|
| job_id | string | The unique identifier of the job |

Response

The response structure varies based on the job's current status.

Response Headers

HTTP/1.1 200 OK
Content-Type: application/json
RateLimit-Limit: 60
RateLimit-Remaining: 59
RateLimit-Reset: 1672531200
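
The rate-limit headers above can drive client-side throttling. The following is a minimal sketch, not an official client: `seconds_until_reset` is a hypothetical helper, and it assumes that `RateLimit-Reset` is a Unix timestamp in seconds. The example value 1672531200 is consistent with that reading, but this page does not state it.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before sending the next request.

    Returns 0.0 while requests remain in the current window.
    ASSUMPTION: RateLimit-Reset is a Unix timestamp in seconds.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("RateLimit-Reset", now))
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, reset_at - now)
```

A caller could sleep for the returned number of seconds before retrying after a 429 response.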

Status: waiting-file

The job is waiting for the file to be uploaded. This status occurs only when source_type is "file".

{
  "job_id": "job_abc123",
  "status": "waiting-file",
  "source_type": "file",
  "data_id": "my_document",
  "upload_url": "https://storage.knowhereto.ai/...",
  "upload_headers": {
    "Content-Type": "application/pdf"
  },
  "created_at": "2025-01-15T10:30:00Z"
}
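
A waiting-file response carries everything needed to build the upload request. The sketch below only constructs the request and does not send it. `build_upload_request` is a hypothetical helper, and the use of HTTP PUT with the raw file bytes as the body is an assumption: this page does not specify the upload method, so check it before relying on this.

```python
import urllib.request


def build_upload_request(job: dict, file_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) the upload request for a waiting-file job.

    ASSUMPTION: the presigned URL expects an HTTP PUT whose body is the
    raw file content. The method is not documented on this page.
    """
    if job["status"] != "waiting-file":
        raise ValueError(f"Job is not waiting for a file: {job['status']}")
    return urllib.request.Request(
        job["upload_url"],
        data=file_bytes,
        # Send exactly the headers the API returned in upload_headers.
        headers=dict(job.get("upload_headers", {})),
        method="PUT",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the upload.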

Status: pending

Job is queued for processing.

{
  "job_id": "job_abc123",
  "status": "pending",
  "source_type": "url",
  "data_id": "my_document",
  "created_at": "2025-01-15T10:30:00Z"
}

Status: converting

Document format conversion in progress.

{
  "job_id": "job_abc123",
  "status": "converting",
  "source_type": "file",
  "data_id": "my_document",
  "created_at": "2025-01-15T10:30:00Z"
}

Status: running

Document parsing is in progress. For multi-page documents, the response includes a progress object.

{
  "job_id": "job_abc123",
  "status": "running",
  "source_type": "file",
  "data_id": "my_document",
  "created_at": "2025-01-15T10:30:00Z",
  "progress": {
    "total_pages": 100,
    "processed_pages": 42
  }
}
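
The progress object can be turned into a completion percentage for display. This is an illustrative sketch; `progress_percent` is a hypothetical helper, and it returns None when the response carries no progress object, which appears to be possible since progress is described as being included for multi-page documents.

```python
from typing import Optional


def progress_percent(job: dict) -> Optional[float]:
    """Return completion as a percentage, or None if no progress is reported."""
    progress = job.get("progress")
    if not progress or not progress.get("total_pages"):
        return None  # No progress object, or the page total is zero or missing
    return 100.0 * progress["processed_pages"] / progress["total_pages"]
```

For the example response above, the function returns 42.0.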

Status: done

Processing completed successfully.

{
  "job_id": "job_abc123",
  "status": "done",
  "source_type": "file",
  "data_id": "my_document",
  "created_at": "2025-01-15T10:30:00Z",
  "completed_at": "2025-01-15T10:35:30Z",
  "result_url": "https://results.knowhereto.ai/result_job_abc123.zip?X-Amz-...",
  "result_url_expires_at": "2025-01-16T10:35:30Z",
  "result_checksum": {
    "algorithm": "sha256",
    "value": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
  }
}
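
After downloading the ZIP from result_url, the result_checksum object lets a client confirm the file arrived intact. A minimal sketch using only the standard library follows; `verify_result_checksum` is a hypothetical helper name. Note that the checksum value in the example response is the SHA-256 of empty input, so it is a placeholder rather than the hash of a real archive.

```python
import hashlib
import hmac


def verify_result_checksum(zip_bytes: bytes, checksum: dict) -> bool:
    """Check downloaded ZIP bytes against the job's result_checksum object."""
    if checksum["algorithm"] != "sha256":
        raise ValueError(f"Unsupported algorithm: {checksum['algorithm']}")
    actual = hashlib.sha256(zip_bytes).hexdigest()
    # compare_digest avoids short-circuiting on the first differing character.
    return hmac.compare_digest(actual, checksum["value"].lower())
```

For large archives, feeding the file to `hashlib.sha256()` in chunks via `update()` avoids holding the whole download in memory.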

Status: failed

Processing failed.

{
  "job_id": "job_abc123",
  "status": "failed",
  "source_type": "url",
  "data_id": "my_document",
  "created_at": "2025-01-15T10:30:00Z",
  "failed_at": "2025-01-15T10:31:15Z",
  "error": {
    "code": "INVALID_ARGUMENT",
    "message": "Unable to parse document: file appears to be corrupted"
  }
}

Response Fields

Common Fields (all statuses)

| Field | Type | Description |
|---|---|---|
| job_id | string | Unique job identifier |
| status | string | Current job status |
| source_type | string | "url" or "file" |
| data_id | string or null | Your custom identifier (if provided) |
| created_at | string | Job creation timestamp (ISO 8601) |

waiting-file Status Fields

| Field | Type | Description |
|---|---|---|
| upload_url | string | Presigned URL for file upload |
| upload_headers | object | Headers to use when uploading |

running Status Fields

| Field | Type | Description |
|---|---|---|
| progress | object | Processing progress |
| progress.total_pages | integer | Total pages in document |
| progress.processed_pages | integer | Pages processed so far |

done Status Fields

| Field | Type | Description |
|---|---|---|
| completed_at | string | Completion timestamp (ISO 8601) |
| result_url | string | Presigned URL to download results |
| result_url_expires_at | string | URL expiration time (ISO 8601) |
| result_checksum | object | ZIP file integrity checksum |
| result_checksum.algorithm | string | Hash algorithm ("sha256") |
| result_checksum.value | string | Hex-encoded hash (64 characters) |
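
Because result_url is presigned and expires, a client may want to check result_url_expires_at before attempting a download. The sketch below assumes the timestamp is UTC with a trailing "Z", as in the examples on this page; `result_url_expired` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional


def result_url_expired(job: dict, now: Optional[datetime] = None) -> bool:
    """Return True if the job's result_url has passed its expiry time."""
    now = now or datetime.now(timezone.utc)
    # Swap the trailing "Z" for an explicit offset so fromisoformat()
    # also accepts the value on Python versions before 3.11.
    expires_at = datetime.fromisoformat(
        job["result_url_expires_at"].replace("Z", "+00:00")
    )
    return now >= expires_at
```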

failed Status Fields

| Field | Type | Description |
|---|---|---|
| failed_at | string | Failure timestamp (ISO 8601) |
| error | object | Error details |
| error.code | string | Error code |
| error.message | string | Human-readable error message |

Examples

Basic Request

curl https://api.knowhereto.ai/v1/jobs/job_abc123 \
  -H "Authorization: Bearer $KNOWHERE_API_KEY"

Polling Until Complete

import os
import time
import requests

# Read the key from the same environment variable the curl example uses
API_KEY = os.environ["KNOWHERE_API_KEY"]

def wait_for_job(job_id: str, timeout: int = 300) -> dict:
    """Poll until the job completes or fails."""
    start = time.time()
    delay = 2

    while time.time() - start < timeout:
        response = requests.get(
            f"https://api.knowhereto.ai/v1/jobs/{job_id}",
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        )
        # Surface 401/404/429 responses instead of a KeyError on "status"
        response.raise_for_status()
        job = response.json()

        if job["status"] == "done":
            return job
        if job["status"] == "failed":
            raise RuntimeError(f"Job failed: {job['error']['message']}")

        # Show progress
        if "progress" in job:
            p = job["progress"]
            print(f"Progress: {p['processed_pages']}/{p['total_pages']}")

        time.sleep(delay)
        delay = min(delay + 2, 10)  # Linear backoff: +2s per poll, max 10s

    raise TimeoutError("Job did not complete in time")

# Usage
job = wait_for_job("job_abc123")
print(f"Download results: {job['result_url']}")

Errors

| Code | HTTP Status | Description |
|---|---|---|
| NOT_FOUND | 404 | Job does not exist |
| UNAUTHENTICATED | 401 | Invalid API key |
| RESOURCE_EXHAUSTED | 429 | Rate limit exceeded |

Example: Job Not Found

{
  "success": false,
  "error": {
    "code": "NOT_FOUND",
    "message": "Job not found",
    "request_id": "req_xyz789",
    "details": {
      "resource": "Job",
      "id": "job_nonexistent"
    }
  }
}
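
An error body like the one above can be mapped to an exception so callers can branch on the error code. This is a sketch under one assumption: that every error response uses the same envelope as the Job Not Found example. `JobApiError` and `error_from_body` are hypothetical names, not part of any published SDK.

```python
from typing import Optional


class JobApiError(Exception):
    """Hypothetical exception carrying the fields of the error envelope."""

    def __init__(self, http_status: int, code: str, message: str,
                 request_id: Optional[str] = None,
                 details: Optional[dict] = None):
        super().__init__(f"{code} (HTTP {http_status}): {message}")
        self.http_status = http_status
        self.code = code
        self.message = message
        self.request_id = request_id
        self.details = details or {}

    @property
    def retryable(self) -> bool:
        # Only RESOURCE_EXHAUSTED (429) is treated as worth retrying here.
        return self.code == "RESOURCE_EXHAUSTED"


def error_from_body(http_status: int, body: dict) -> JobApiError:
    """Convert an error response body into a JobApiError.

    ASSUMPTION: all errors share the envelope shown in the example above.
    """
    err = body.get("error", {})
    return JobApiError(
        http_status,
        err.get("code", "UNKNOWN"),
        err.get("message", ""),
        err.get("request_id"),
        err.get("details"),
    )
```

Keeping request_id on the exception makes it easy to include in logs or support requests.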