File Sources

The Knowhere API supports two ways to submit documents for parsing: URL source and file upload. Choose the method that best fits your use case.

Source Type Comparison

Feature	URL Source	File Upload
Use case	Publicly accessible documents	Local or private files
Initial response	`202 Accepted`	`200 OK` with upload URL
Extra step needed	No	Yes (upload file)
Best for	Web scraping, CDN files	User uploads, local processing

Option 1: URL Source

Use source_type: "url" when your document is publicly accessible via HTTP/HTTPS.

Request

curl -X POST https://api.knowhereto.ai/v1/jobs \
  -H "Authorization: Bearer $KNOWHERE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "url",
    "source_url": "https://example.com/report.pdf"
  }'

Response

{
  "job_id": "job_abc123",
  "status": "pending",
  "source_type": "url",
  "created_at": "2025-01-15T10:30:00Z"
}

Requirements

URL must be publicly accessible (no authentication required)
URL must point directly to the file (not a download page)
Supported protocols: http://, https://
File must have a recognized extension or correct Content-Type header
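
The requirements above can be partly checked on the client before a job is submitted. The following is a minimal sketch, not part of the Knowhere API: `check_source_url` is a hypothetical helper, and because this page does not list which extensions the service recognizes, it only checks that some extension is present. It cannot verify that the URL is reachable without authentication or that the server sends a correct Content-Type header.

```python
import posixpath
from urllib.parse import urlparse


def check_source_url(url: str) -> list[str]:
    """Return the problems found by local checks; an empty list means none were found.

    Hypothetical helper: it mirrors the documented requirements, not the
    service's actual validation logic.
    """
    problems = []
    parsed = urlparse(url)

    # Supported protocols are http:// and https://
    if parsed.scheme.lower() not in ("http", "https"):
        problems.append("unsupported protocol")
    if not parsed.netloc:
        problems.append("missing host")

    # Credentials embedded in the URL suggest it is not publicly accessible
    if parsed.username or parsed.password:
        problems.append("embedded credentials")

    # Without an extension, the job depends on the server's Content-Type header
    _, extension = posixpath.splitext(parsed.path)
    if not extension:
        problems.append("no file extension; relies on Content-Type header")

    return problems
```

A call such as `check_source_url("https://example.com/report.pdf")` returns an empty list, while an `ftp://` URL or a path without an extension produces one entry per problem.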

Option 2: File Upload (Presigned URL)

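Use source_type: "file" for local files or when the document isn't publicly accessible. This uses a secure two-step process we call the "Slot Model": you first request an upload slot, then upload the file to the presigned URL returned for that slot.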

How It Works

Step 1: Request Upload Slot

curl -X POST https://api.knowhereto.ai/v1/jobs \
  -H "Authorization: Bearer $KNOWHERE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "file",
    "file_name": "report.pdf"
  }'

Step 2: Receive Upload URL

{
  "job_id": "job_xyz789",
  "status": "waiting-file",
  "source_type": "file",
  "upload_url": "https://storage.knowhereto.ai/uploads/...",
  "upload_headers": {
    "Content-Type": "application/pdf"
  },
  "created_at": "2025-01-15T10:30:00Z"
}

Step 3: Upload the File

Use the upload_url and upload_headers from the response:

curl -X PUT "https://storage.knowhereto.ai/uploads/..." \
  -H "Content-Type: application/pdf" \
  --data-binary @report.pdf

Important

You must use the exact headers provided in upload_headers. The presigned URL is configured to expect specific headers, and mismatches will cause upload failures.
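
One way to avoid header mismatches is to copy upload_headers from the job response programmatically instead of typing them by hand. The sketch below uses Python's standard `urllib.request` so the request can be built and inspected without being sent. `build_upload_request` is a hypothetical helper name, and the job dictionary is assumed to have the shape shown in Step 2.

```python
import urllib.request


def build_upload_request(job: dict, data: bytes) -> urllib.request.Request:
    """Build (but do not send) the PUT request for an upload slot.

    Every header in job["upload_headers"] is copied unchanged, so the
    request carries exactly the values the presigned URL expects.
    """
    request = urllib.request.Request(job["upload_url"], data=data, method="PUT")
    for name, value in job.get("upload_headers", {}).items():
        request.add_header(name, value)
    return request
```

Passing the result to `urllib.request.urlopen(request)` would perform the upload. Note that `urllib` normalizes the capitalization of header names (for example `Content-type`); HTTP header names are case-insensitive, so the value sent is unchanged.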

Step 4: Processing Begins Automatically

Once the upload completes successfully, our system automatically detects the file and begins processing. The job status transitions from waiting-file to pending.
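
A client that wants to confirm this transition can poll the job status until it is no longer waiting-file. This page does not show how to read a job's status, so the sketch below takes a caller-supplied `get_status` callable (a hypothetical stand-in for whatever status lookup you use) and only implements the waiting logic. The timeout and interval defaults are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_for_processing(
    get_status: Callable[[], str],
    timeout: float = 30.0,
    interval: float = 1.0,
) -> str:
    """Poll until the job leaves "waiting-file" and return the new status.

    get_status is supplied by the caller and must return the job's
    current status string each time it is called.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status != "waiting-file":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job is still in waiting-file")
        time.sleep(interval)
```

Keeping the status lookup outside the function means the same loop works with any HTTP client and can be exercised with a fake in tests.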

Choosing the Right Method

Use URL Source When:

Document is on a public CDN or website
You're processing documents from known sources
You want the simplest integration
File is already hosted somewhere accessible

Use File Upload When:

User uploads files through your application
Documents are stored privately
You're processing local files
Security requires files not be publicly accessible
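
When an application accepts both kinds of input, the choice can be made in code. The sketch below is a simplified illustration: `build_job_payload` is a hypothetical helper that treats any http:// or https:// string as a URL source and everything else as a local path. It cannot tell whether a URL is actually public; a URL that requires authentication would still need to be downloaded and sent through the file upload flow.

```python
import os
from urllib.parse import urlparse


def build_job_payload(source: str) -> dict:
    """Return the JSON body for POST /v1/jobs based on the kind of source."""
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        # Hosted document: the service fetches it directly
        return {"source_type": "url", "source_url": source}
    # Local path: request an upload slot using only the file name
    return {"source_type": "file", "file_name": os.path.basename(source)}
```

The returned dictionary uses only the fields shown in the request examples on this page (source_type, source_url, file_name).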

Code Examples

Complete File Upload Flow

Python

import os

import requests

API_KEY = "your_api_key"
BASE_URL = "https://api.knowhereto.ai"

def parse_local_file(file_path: str) -> dict:
    """Parse a local file using the upload flow."""

    # Step 1: Create job and get upload URL
    # os.path.basename handles the platform's path separator
    file_name = os.path.basename(file_path)

    response = requests.post(
        f"{BASE_URL}/v1/jobs",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json"
        },
        json={
            "source_type": "file",
            "file_name": file_name
        }
    )
    # Fail early if the job could not be created
    response.raise_for_status()
    job = response.json()

    # Step 2: Upload the file using the exact headers from the response
    with open(file_path, "rb") as f:
        upload_response = requests.put(
            job["upload_url"],
            headers=job.get("upload_headers", {}),
            data=f.read()
        )

    if upload_response.status_code not in [200, 204]:
        raise Exception(f"Upload failed: {upload_response.status_code}")

    # Processing starts automatically once the upload succeeds
    return job

Node.js

// Uses the global fetch available in Node.js 18+
import fs from 'fs';
import path from 'path';

const API_KEY = 'your_api_key';
const BASE_URL = 'https://api.knowhereto.ai';

async function parseLocalFile(filePath) {
  // Step 1: Create job and get upload URL
  const fileName = path.basename(filePath);
  
  const response = await fetch(`${BASE_URL}/v1/jobs`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      source_type: 'file',
      file_name: fileName
    })
  });
  
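  // Fail early if the job could not be created
  if (!response.ok) {
    throw new Error(`Job creation failed: ${response.status}`);
  }

  const job = await response.json();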
  
  // Step 2: Upload the file
  const fileBuffer = fs.readFileSync(filePath);
  const uploadResponse = await fetch(job.upload_url, {
    method: 'PUT',
    headers: job.upload_headers || {},
    body: fileBuffer
  });
  
  if (!uploadResponse.ok) {
    throw new Error(`Upload failed: ${uploadResponse.status}`);
  }
  
  return job;
}

Next Steps

Job Lifecycle - Understand all job states
File Upload Guide - Detailed upload instructions
Create Job API - Full API reference
