
File Sources

The Knowhere API supports two ways to submit documents for parsing: by URL or by file upload. Choose the method that best fits your use case.

Source Type Comparison

| Feature | URL Source | File Upload |
| --- | --- | --- |
| Use case | Publicly accessible documents | Local or private files |
| Initial response | 202 Accepted | 200 OK with upload URL |
| Extra step needed | No | Yes (upload file) |
| Best for | Web scraping, CDN files | User uploads, local processing |

Option 1: URL Source

Use source_type: "url" when your document is publicly accessible via HTTP/HTTPS.

Request

curl -X POST https://api.knowhereto.ai/v1/jobs \
-H "Authorization: Bearer $KNOWHERE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"source_type": "url",
"source_url": "https://example.com/report.pdf"
}'

Response

{
"job_id": "job_abc123",
"status": "pending",
"source_type": "url",
"created_at": "2025-01-15T10:30:00Z"
}
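The same request can be built from Python. The following is a minimal sketch using only the standard library; the helper name `build_url_job_request` is hypothetical, while the endpoint, headers, and field names are taken from the curl example above. It constructs the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://api.knowhereto.ai"


def build_url_job_request(api_key: str, source_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/jobs request for a URL source."""
    # Body mirrors the JSON payload from the curl example.
    body = json.dumps({"source_type": "url", "source_url": source_url})
    return urllib.request.Request(
        f"{BASE_URL}/v1/jobs",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and decode the JSON response to read `job_id` and `status`.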

Requirements

  • URL must be publicly accessible (no authentication required)
  • URL must point directly to the file (not a download page)
  • Supported protocols: http://, https://
  • File must have a recognized extension or correct Content-Type header
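Some of these requirements can be checked locally before submitting a job. The sketch below is illustrative only: `check_source_url` and `ASSUMED_EXTENSIONS` are hypothetical names, and the extension set is an assumption based on the `.pdf` examples here, not a list of formats the API accepts. It cannot tell whether a URL is publicly reachable or points at a download page.

```python
from urllib.parse import urlparse

# Assumption: only .pdf appears in the examples here; extend as needed.
ASSUMED_EXTENSIONS = {".pdf"}


def check_source_url(url: str) -> list[str]:
    """Return a list of issues found; an empty list means the local checks passed."""
    issues = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        issues.append("unsupported protocol")
    if not parsed.netloc:
        issues.append("missing host")
    path = parsed.path.lower()
    if not any(path.endswith(ext) for ext in ASSUMED_EXTENSIONS):
        # Not necessarily fatal: a correct Content-Type header may be enough.
        issues.append("no recognized extension")
    return issues
```

For example, `check_source_url("ftp://example.com/report.pdf")` reports the unsupported protocol, while a URL without an extension is only flagged so that you can verify the server's Content-Type header.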

Option 2: File Upload (Presigned URL)

Use source_type: "file" for local files or when the document isn't publicly accessible. This uses a secure two-step process we call the "Slot Model": you first request an upload slot, then upload the file to the presigned URL returned for it.

How It Works

Step 1: Request Upload Slot

curl -X POST https://api.knowhereto.ai/v1/jobs \
-H "Authorization: Bearer $KNOWHERE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"source_type": "file",
"file_name": "report.pdf"
}'

Step 2: Receive Upload URL

{
"job_id": "job_xyz789",
"status": "waiting-file",
"source_type": "file",
"upload_url": "https://storage.knowhereto.ai/uploads/...",
"upload_headers": {
"Content-Type": "application/pdf"
},
"created_at": "2025-01-15T10:30:00Z"
}
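Before uploading, it can help to validate this response and pull out the two fields the upload needs. A minimal sketch, assuming the response has already been decoded into a dict; `extract_upload_target` is a hypothetical helper name, not part of the API.

```python
def extract_upload_target(job: dict) -> tuple[str, dict]:
    """Return (upload_url, upload_headers) from a file-job response."""
    # A freshly created file job is expected to be waiting for its file.
    if job.get("status") != "waiting-file":
        raise ValueError(f"unexpected job status: {job.get('status')!r}")
    upload_url = job.get("upload_url")
    if not upload_url:
        raise ValueError("response has no upload_url")
    # Copy the headers so later changes do not mutate the original response.
    return upload_url, dict(job.get("upload_headers", {}))
```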

Step 3: Upload the File

Use the upload_url and upload_headers from the response:

curl -X PUT "https://storage.knowhereto.ai/uploads/..." \
-H "Content-Type: application/pdf" \
--data-binary @report.pdf

Important: You must use the exact headers provided in upload_headers. The presigned URL is configured to expect specific headers, and mismatches will cause upload failures.
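One way to avoid header mismatches is to pass upload_headers through unchanged instead of hard-coding them. The sketch below uses the standard library to build the PUT request without sending it; `build_upload_request` is a hypothetical helper name.

```python
import urllib.request


def build_upload_request(
    upload_url: str, upload_headers: dict, payload: bytes
) -> urllib.request.Request:
    """Build (but do not send) the PUT request for a presigned upload URL."""
    # Pass the provided headers through as-is rather than guessing their values.
    return urllib.request.Request(
        upload_url,
        data=payload,
        headers=dict(upload_headers),
        method="PUT",
    )
```

Sending the request with `urllib.request.urlopen` raises `urllib.error.HTTPError` on a non-2xx response, which is a convenient place to surface upload failures.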

Step 4: Processing Begins Automatically

Once the upload completes successfully, our system automatically detects the file and begins processing. The job status transitions from waiting-file to pending.

Choosing the Right Method

Use URL Source When:

  • Document is on a public CDN or website
  • You're processing documents from known sources
  • You want the simplest integration
  • File is already hosted somewhere accessible

Use File Upload When:

  • User uploads files through your application
  • Documents are stored privately
  • You're processing local files
  • Security requires files not be publicly accessible
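For callers that accept either kind of input, the choice can be made in code. The following sketch picks the request body from the shape of the input string; `build_job_payload` is a hypothetical helper, and it assumes every http(s) URL it receives is publicly accessible, which your application would need to guarantee.

```python
from pathlib import PurePath
from urllib.parse import urlparse


def build_job_payload(source: str) -> dict:
    """Build the POST /v1/jobs body for an http(s) URL or a local file path."""
    if urlparse(source).scheme in ("http", "https"):
        # Assumed to be publicly reachable: submit it as a URL source.
        return {"source_type": "url", "source_url": source}
    # Anything else is treated as a local path: request an upload slot.
    return {"source_type": "file", "file_name": PurePath(source).name}
```

A privately hosted document that happens to have an HTTP URL should still go through the file upload path, so this check is a starting point rather than a complete policy.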

Code Examples

Complete File Upload Flow

import os

import requests

API_KEY = "your_api_key"
BASE_URL = "https://api.knowhereto.ai"


def parse_local_file(file_path: str) -> dict:
    """Parse a local file using the upload flow."""

    # Step 1: Create the job and get the upload URL
    file_name = os.path.basename(file_path)

    response = requests.post(
        f"{BASE_URL}/v1/jobs",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "source_type": "file",
            "file_name": file_name,
        },
    )
    # Fail early if the job could not be created
    response.raise_for_status()
    job = response.json()

    # Step 2: Upload the file with the exact headers from the response
    with open(file_path, "rb") as f:
        upload_response = requests.put(
            job["upload_url"],
            headers=job.get("upload_headers", {}),
            data=f.read(),
        )

    if upload_response.status_code not in (200, 204):
        raise RuntimeError(f"Upload failed: {upload_response.status_code}")

    # Processing starts automatically once the upload succeeds
    return job

Next Steps