Get Job

Retrieve the status and results of a parsing job.

GET /v1/jobs/{job_id}

Path Parameters

Parameter	Type	Description
`job_id`	string	The unique identifier of the job
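A minimal sketch of building the request URL for a given `job_id`. The base URL is taken from the examples on this page. Percent-encoding the ID is a defensive assumption, because the job ID format is not specified beyond the `job_abc123` example. `job_url` is a hypothetical helper name.

```python
from urllib.parse import quote

# Base URL as used in the examples on this page.
BASE_URL = "https://api.knowhereto.ai/v1"


def job_url(job_id: str) -> str:
    """Build the GET /v1/jobs/{job_id} URL (hypothetical helper)."""
    if not job_id:
        raise ValueError("job_id must be a non-empty string")
    # safe="" also encodes "/", so the ID always stays a single path segment.
    return f"{BASE_URL}/jobs/{quote(job_id, safe='')}"


print(job_url("job_abc123"))
```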

Response

The response structure varies based on the job's current status.

Response Headers

HTTP/1.1 200 OK
Content-Type: application/json
RateLimit-Limit: 60
RateLimit-Remaining: 59
RateLimit-Reset: 1672531200
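The rate-limit headers can be used to pace requests. A minimal sketch, assuming `RateLimit-Remaining` is the number of requests left in the current window and `RateLimit-Reset` is a Unix timestamp in seconds (the example value `1672531200` is 2023-01-01T00:00:00Z). This page does not define either header formally. `seconds_until_reset` is a hypothetical helper.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Seconds to wait before sending another request (hypothetical helper).

    ASSUMPTION: RateLimit-Reset is a Unix timestamp in seconds, as the
    example value suggests. Returns 0.0 while requests remain in the window.
    """
    now = time.time() if now is None else now
    # Treat a missing RateLimit-Remaining header as "not rate limited".
    remaining = int(headers.get("RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("RateLimit-Reset", "0"))
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, reset_at - now)
```

For example, after a `requests` call you could run `time.sleep(seconds_until_reset(response.headers))`. `requests` exposes response headers as a case-insensitive mapping, so header capitalization does not matter there.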

Status: `waiting-file`

Waiting for the file to be uploaded. This status occurs only when `source_type` is `"file"`.

{
  "job_id": "job_abc123",
  "status": "waiting-file",
  "source_type": "file",
  "data_id": "my_document",
  "upload_url": "https://storage.knowhereto.ai/...",
  "upload_headers": {
    "Content-Type": "application/pdf"
  },
  "created_at": "2025-01-15T10:30:00Z"
}

Status: `pending`

Job is queued for processing.

{
  "job_id": "job_abc123",
  "status": "pending",
  "source_type": "url",
  "data_id": "my_document",
  "created_at": "2025-01-15T10:30:00Z"
}

Status: `converting`

Document format conversion in progress.

{
  "job_id": "job_abc123",
  "status": "converting",
  "source_type": "file",
  "data_id": "my_document",
  "created_at": "2025-01-15T10:30:00Z"
}

Status: `running`

Document parsing in progress. For multi-page documents, the response includes a `progress` object.

{
  "job_id": "job_abc123",
  "status": "running",
  "source_type": "file",
  "data_id": "my_document",
  "created_at": "2025-01-15T10:30:00Z",
  "progress": {
    "total_pages": 100,
    "processed_pages": 42
  }
}

Status: `done`

Processing completed successfully.

{
  "job_id": "job_abc123",
  "status": "done",
  "source_type": "file",
  "data_id": "my_document",
  "created_at": "2025-01-15T10:30:00Z",
  "completed_at": "2025-01-15T10:35:30Z",
  "result_url": "https://results.knowhereto.ai/result_job_abc123.zip?X-Amz-...",
  "result_url_expires_at": "2025-01-16T10:35:30Z",
  "result_checksum": {
    "algorithm": "sha256",
    "value": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
  }
}
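`result_url` is a presigned URL that expires at `result_url_expires_at`. A minimal sketch for checking whether the URL is still usable before starting a download. The five-minute safety margin is an arbitrary illustrative choice, and `parse_timestamp` and `result_url_is_valid` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as "2025-01-16T10:35:30Z"."""
    # datetime.fromisoformat() accepts a trailing "Z" only on Python 3.11+,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def result_url_is_valid(job: dict, margin: timedelta = timedelta(minutes=5),
                        now: Optional[datetime] = None) -> bool:
    """True if the job is done and its result_url stays valid for at least `margin`."""
    if job.get("status") != "done":
        return False
    now = now or datetime.now(timezone.utc)
    expires_at = parse_timestamp(job["result_url_expires_at"])
    return now + margin < expires_at
```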

Status: `failed`

Processing failed.

{
  "job_id": "job_abc123",
  "status": "failed",
  "source_type": "url",
  "data_id": "my_document",
  "created_at": "2025-01-15T10:30:00Z",
  "failed_at": "2025-01-15T10:31:15Z",
  "error": {
    "code": "INVALID_ARGUMENT",
    "message": "Unable to parse document: file appears to be corrupted"
  }
}

Response Fields

Common Fields (all statuses)

Field	Type	Description
`job_id`	string	Unique job identifier
`status`	string	Current job status: `waiting-file`, `pending`, `converting`, `running`, `done`, or `failed`
`source_type`	string	`"url"` or `"file"`
`data_id`	string | null	Your custom identifier (if provided)
`created_at`	string	Job creation timestamp (ISO 8601)
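A client can model the documented statuses as an enum. A minimal sketch; `JobStatus` is a hypothetical client-side type, and treating only `done` and `failed` as terminal follows the status descriptions above.

```python
from enum import Enum


class JobStatus(str, Enum):
    """Job statuses documented on this page (hypothetical client-side type)."""

    WAITING_FILE = "waiting-file"
    PENDING = "pending"
    CONVERTING = "converting"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"

    @property
    def is_terminal(self) -> bool:
        # Only "done" and "failed" end a job. Every other status means the job
        # is still in progress or, for "waiting-file", awaiting your upload.
        return self in (JobStatus.DONE, JobStatus.FAILED)
```

Usage: `JobStatus(job["status"]).is_terminal`. An unrecognized status string raises `ValueError`, which makes a new, undocumented status fail loudly instead of being silently treated as in progress.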

`waiting-file` Status Fields

Field	Type	Description
`upload_url`	string	Presigned URL for file upload
`upload_headers`	object	Headers to use when uploading
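A minimal sketch of uploading the file while the job is in `waiting-file`, using only the standard library. This page does not state the HTTP method; `PUT` is an assumption based on common presigned-URL practice, so confirm it against the upload documentation. `upload_file` is a hypothetical helper.

```python
import urllib.request


def upload_file(job: dict, path: str) -> int:
    """Upload a local file to a job's presigned upload_url (hypothetical helper).

    Returns the HTTP status code; urlopen raises HTTPError on 4xx/5xx.
    """
    if job.get("status") != "waiting-file":
        raise ValueError(f"job is in status {job.get('status')!r}, expected 'waiting-file'")
    with open(path, "rb") as f:
        body = f.read()
    request = urllib.request.Request(
        job["upload_url"],
        data=body,
        # Pass through the headers the API returned. No API key is sent: a
        # presigned URL carries its own authorization in its query string.
        headers=job.get("upload_headers", {}),
        method="PUT",  # ASSUMPTION: the method is not documented on this page
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return response.status
```

If `upload_headers` lacks a `Content-Type`, `urllib` adds a form-encoded one by default, which may not match what the presigned URL expects.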

`running` Status Fields

Field	Type	Description
`progress`	object	Processing progress
`progress.total_pages`	integer	Total pages in document
`progress.processed_pages`	integer	Pages processed so far
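A small sketch that turns `progress` into a percentage for display. Treating a missing object or a zero `total_pages` as "no progress available" is an assumption; the page only says progress is included for multi-page documents. `progress_percent` is a hypothetical helper.

```python
from typing import Optional


def progress_percent(job: dict) -> Optional[float]:
    """Return parsing progress as a percentage, or None if it is not reported."""
    progress = job.get("progress")
    # Guard against a missing object and against dividing by zero.
    if not progress or not progress.get("total_pages"):
        return None
    return 100.0 * progress["processed_pages"] / progress["total_pages"]
```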

`done` Status Fields

Field	Type	Description
`completed_at`	string	Completion timestamp (ISO 8601)
`result_url`	string	Presigned URL to download results
`result_url_expires_at`	string	URL expiration time (ISO 8601)
`result_checksum`	object	ZIP file integrity checksum
`result_checksum.algorithm`	string	Hash algorithm (`"sha256"`)
`result_checksum.value`	string	Hex-encoded hash (64 characters)
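After downloading the ZIP from `result_url`, compare its SHA-256 digest with `result_checksum.value`. A minimal sketch; `verify_result_checksum` is a hypothetical helper, and it rejects any algorithm other than `sha256` because that is the only one documented here.

```python
import hashlib


def verify_result_checksum(data: bytes, checksum: dict) -> bool:
    """Check downloaded ZIP bytes against a job's result_checksum object."""
    algorithm = checksum["algorithm"]
    if algorithm != "sha256":
        raise ValueError(f"unsupported checksum algorithm: {algorithm!r}")
    digest = hashlib.sha256(data).hexdigest()
    # hexdigest() is lowercase; normalize the expected value to match.
    return digest == checksum["value"].lower()
```

For example, read the archive with `urllib.request.urlopen(job["result_url"]).read()` and pass the bytes together with `job["result_checksum"]`. For very large archives, feeding chunks into `hashlib.sha256().update()` avoids holding the whole file in memory.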

`failed` Status Fields

Field	Type	Description
`failed_at`	string	Failure timestamp (ISO 8601)
`error`	object	Error details
`error.code`	string	Error code
`error.message`	string	Human-readable error message
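As the examples above show, a failed job is reported in the response body through `status` and `error`, not as an HTTP error. A minimal sketch that converts it into an exception carrying `error.code`; `JobFailedError` and `check_job` are hypothetical names.

```python
class JobFailedError(Exception):
    """Raised when a job reaches the "failed" status (hypothetical client-side type)."""

    def __init__(self, job: dict):
        error = job.get("error") or {}
        self.job_id = job.get("job_id")
        self.code = error.get("code")
        self.failed_at = job.get("failed_at")
        super().__init__(f"{self.code}: {error.get('message')} (job {self.job_id})")


def check_job(job: dict) -> dict:
    """Return the job unchanged unless it has failed."""
    if job["status"] == "failed":
        raise JobFailedError(job)
    return job
```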

Examples

Basic Request

cURL

curl https://api.knowhereto.ai/v1/jobs/job_abc123 \
  -H "Authorization: Bearer $KNOWHERE_API_KEY"

Python

import os
import requests

API_KEY = os.environ["KNOWHERE_API_KEY"]

response = requests.get(
    "https://api.knowhereto.ai/v1/jobs/job_abc123",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
response.raise_for_status()  # Raises requests.HTTPError on 4xx/5xx, e.g. NOT_FOUND (404)

job = response.json()
print(f"Status: {job['status']}")

if job["status"] == "done":
    print(f"Result URL: {job['result_url']}")
elif job["status"] == "failed":
    print(f"Error: {job['error']['message']}")

Node.js

const API_KEY = process.env.KNOWHERE_API_KEY;

const response = await fetch('https://api.knowhereto.ai/v1/jobs/job_abc123', {
  headers: {
    'Authorization': `Bearer ${API_KEY}`
  }
});

// fetch does not reject on 4xx/5xx, so check the status explicitly
if (!response.ok) {
  throw new Error(`Request failed with HTTP ${response.status}`);
}

const job = await response.json();
console.log(`Status: ${job.status}`);

if (job.status === 'done') {
  console.log(`Result URL: ${job.result_url}`);
} else if (job.status === 'failed') {
  console.log(`Error: ${job.error.message}`);
}

Polling Until Complete

Python

import os
import time
import requests

API_KEY = os.environ["KNOWHERE_API_KEY"]

def wait_for_job(job_id: str, timeout: int = 300) -> dict:
    """Poll until the job completes or fails, or until `timeout` seconds elapse."""
    start = time.time()
    delay = 2

    while time.time() - start < timeout:
        response = requests.get(
            f"https://api.knowhereto.ai/v1/jobs/{job_id}",
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        )
        response.raise_for_status()  # Surface NOT_FOUND, UNAUTHENTICATED, etc.
        job = response.json()

        if job["status"] == "done":
            return job
        elif job["status"] == "failed":
            raise RuntimeError(f"Job failed: {job['error']['message']}")

        # Show progress (reported while the job is running)
        if "progress" in job:
            p = job["progress"]
            print(f"Progress: {p['processed_pages']}/{p['total_pages']}")

        time.sleep(delay)
        delay = min(delay + 2, 10)  # Linear backoff: 2s, 4s, 6s, 8s, then 10s max

    raise TimeoutError("Job did not complete in time")

# Usage
job = wait_for_job("job_abc123")
print(f"Download results: {job['result_url']}")

Node.js

const API_KEY = process.env.KNOWHERE_API_KEY;

// timeout is in milliseconds
async function waitForJob(jobId, timeout = 300000) {
  const start = Date.now();
  let delay = 2000;

  while (Date.now() - start < timeout) {
    const response = await fetch(
      `https://api.knowhereto.ai/v1/jobs/${jobId}`,
      { headers: { 'Authorization': `Bearer ${API_KEY}` } }
    );
    if (!response.ok) {
      throw new Error(`Request failed with HTTP ${response.status}`);
    }
    const job = await response.json();

    if (job.status === 'done') {
      return job;
    } else if (job.status === 'failed') {
      throw new Error(`Job failed: ${job.error.message}`);
    }

    // Show progress (reported while the job is running)
    if (job.progress) {
      console.log(`Progress: ${job.progress.processed_pages}/${job.progress.total_pages}`);
    }

    await new Promise(r => setTimeout(r, delay));
    delay = Math.min(delay + 2000, 10000); // Linear backoff: 2s, 4s, 6s, 8s, then 10s max
  }

  throw new Error('Job did not complete in time');
}

// Usage
const job = await waitForJob('job_abc123');
console.log(`Download results: ${job.result_url}`);
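The polling loops grow the delay linearly. If you prefer exponential backoff, the delays can be generated separately. A minimal sketch with optional "full jitter" (each delay drawn uniformly between 0 and the exponential value), a common way to keep many clients from polling in lockstep. `backoff_delays` is a hypothetical helper, and its defaults are illustrative, not values recommended by the API.

```python
import random
from typing import Iterator, Optional


def backoff_delays(initial: float = 2.0, factor: float = 2.0, cap: float = 10.0,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield initial, initial*factor, initial*factor**2, ... capped at `cap`.

    If `rng` is given, apply full jitter: each delay is drawn uniformly
    from [0, computed delay].
    """
    delay = initial
    while True:
        yield rng.uniform(0, delay) if rng is not None else delay
        delay = min(delay * factor, cap)
```

In a loop like `wait_for_job`, create `delays = backoff_delays()` before polling and call `time.sleep(next(delays))` after each non-terminal response.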

Errors

Code	HTTP Status	Description
`NOT_FOUND`	404	Job does not exist
`UNAUTHENTICATED`	401	Invalid API key
`RESOURCE_EXHAUSTED`	429	Rate limit exceeded

Example: Job Not Found

{
  "success": false,
  "error": {
    "code": "NOT_FOUND",
    "message": "Job not found",
    "request_id": "req_xyz789",
    "details": {
      "resource": "Job",
      "id": "job_nonexistent"
    }
  }
}
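Request-level errors use the envelope shown above, unlike job failures, which arrive inside a normal job response. A minimal sketch that turns the envelope into an exception and keeps `request_id`, which is useful when reporting a problem. `KnowhereAPIError` and `raise_for_api_error` are hypothetical names; with `requests` you would call `raise_for_api_error(response.status_code, response.json())`.

```python
class KnowhereAPIError(Exception):
    """Request-level error parsed from the error envelope (hypothetical type)."""

    def __init__(self, http_status: int, body: dict):
        error = body.get("error") or {}
        self.http_status = http_status
        self.code = error.get("code")
        self.request_id = error.get("request_id")
        self.details = error.get("details") or {}
        super().__init__(f"HTTP {http_status} {self.code}: {error.get('message')}")


def raise_for_api_error(http_status: int, body: dict) -> dict:
    """Return the body of a successful response; raise KnowhereAPIError otherwise."""
    # Job responses carry no "success" field; only the error envelope sets it to false.
    if http_status >= 400 or body.get("success") is False:
        raise KnowhereAPIError(http_status, body)
    return body
```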

Next Steps

Result Handling Guide - How to process results
Polling Best Practices - Efficient polling strategies
Error Handling - Handle all error cases
