File Sources
The Knowhere API supports two ways to submit documents for parsing: URL source and file upload. Choose the method that best fits your use case.
Source Type Comparison
| Feature | URL Source | File Upload |
|---|---|---|
| Use case | Publicly accessible documents | Local or private files |
| Initial response | 202 Accepted | 200 OK with upload URL |
| Extra step needed | No | Yes (upload file) |
| Best for | Web scraping, CDN files | User uploads, local processing |
Option 1: URL Source
Use source_type: "url" when your document is publicly accessible via HTTP/HTTPS.
Request
curl -X POST https://api.knowhereto.ai/v1/jobs \
  -H "Authorization: Bearer $KNOWHERE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "url",
    "source_url": "https://example.com/report.pdf"
  }'
Response
{
  "job_id": "job_abc123",
  "status": "pending",
  "source_type": "url",
  "created_at": "2025-01-15T10:30:00Z"
}
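The request above can also be sketched in Python using only the standard library. The helper names `build_url_job_request` and `submit_url_job` are illustrative and not part of any Knowhere SDK, and the sketch assumes the response body is the JSON object shown above.

```python
import json
import urllib.request

BASE_URL = "https://api.knowhereto.ai"


def build_url_job_request(source_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/jobs request for a URL source."""
    body = json.dumps({"source_type": "url", "source_url": source_url})
    return urllib.request.Request(
        f"{BASE_URL}/v1/jobs",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def submit_url_job(source_url: str, api_key: str) -> dict:
    """Send the request and return the decoded job object."""
    request = build_url_job_request(source_url, api_key)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `submit_url_job("https://example.com/report.pdf", api_key)` should return a dictionary with the `job_id` and `status` fields from the response above.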
Requirements
- URL must be publicly accessible (no authentication required)
- URL must point directly to the file (not a download page)
- Supported protocols: http://, https://
- File must have a recognized extension or correct Content-Type header
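Some of these requirements can be checked on the client before a job is created. The following is a minimal sketch, not an official validator: `check_source_url` is a hypothetical helper, and `ASSUMED_EXTENSIONS` is an illustrative list rather than the set of formats the API actually recognizes.

```python
import os
from urllib.parse import urlparse

# Assumption: illustrative extensions only, not the API's real supported set.
ASSUMED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def check_source_url(url: str) -> list:
    """Return a list of problems found; an empty list means no obvious issue."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append("scheme must be http or https")
    if not parsed.netloc:
        problems.append("URL has no host")
    if parsed.username or parsed.password:
        problems.append("URL embeds credentials; it must be publicly accessible")
    extension = os.path.splitext(parsed.path)[1].lower()
    if extension not in ASSUMED_EXTENSIONS:
        # Not fatal by itself: the server may still send a correct Content-Type.
        problems.append("no recognized extension; relies on the Content-Type header")
    return problems
```

A check like this cannot confirm that the URL is reachable without authentication or that it is not a download page; it only catches problems visible in the URL string itself.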
Option 2: File Upload (Presigned URL)
Use source_type: "file" for local files or when the document isn't publicly accessible. This uses a secure two-step process we call the "Slot Model": you first request an upload slot, then upload the file to the presigned URL returned with it.
How It Works
Step 1: Request Upload Slot
curl -X POST https://api.knowhereto.ai/v1/jobs \
  -H "Authorization: Bearer $KNOWHERE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "file",
    "file_name": "report.pdf"
  }'
Step 2: Receive Upload URL
{
  "job_id": "job_xyz789",
  "status": "waiting-file",
  "source_type": "file",
  "upload_url": "https://storage.knowhereto.ai/uploads/...",
  "upload_headers": {
    "Content-Type": "application/pdf"
  },
  "created_at": "2025-01-15T10:30:00Z"
}
Step 3: Upload the File
Use the upload_url and upload_headers from the response:
curl -X PUT "https://storage.knowhereto.ai/uploads/..." \
  -H "Content-Type: application/pdf" \
  --data-binary @report.pdf
Important
You must use the exact headers provided in upload_headers. The presigned URL is configured to expect specific headers, and mismatches will cause upload failures.
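One way to avoid header mismatches is to pass upload_headers through untouched instead of hard-coding them. The sketch below uses the standard library; `build_upload_request` and `upload_file` are hypothetical helper names, and `job` is assumed to be the decoded response from Step 2.

```python
import urllib.request


def build_upload_request(job: dict, file_bytes: bytes) -> urllib.request.Request:
    """Build the PUT request for a job created with source_type "file"."""
    # Copy the headers verbatim: do not add, rename, or override any of them.
    headers = dict(job.get("upload_headers", {}))
    return urllib.request.Request(
        job["upload_url"], data=file_bytes, method="PUT", headers=headers
    )


def upload_file(job: dict, file_path: str) -> int:
    """Upload the file and return the HTTP status code of the storage response."""
    with open(file_path, "rb") as f:
        request = build_upload_request(job, f.read())
    # urlopen raises urllib.error.HTTPError if the storage service rejects the upload.
    with urllib.request.urlopen(request) as response:
        return response.status
```

Note that urllib adds a default Content-Type to requests with a body when none is supplied, so this sketch assumes upload_headers always includes one, as in the response above.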
Step 4: Processing Begins Automatically
Once the upload completes successfully, our system automatically detects the file and begins processing. The job status transitions from waiting-file to pending.
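If your application needs to confirm that the upload was detected, it can poll until the job leaves waiting-file. This page does not describe how to read a job's status, so the sketch below takes a caller-supplied `fetch_status` function (a hypothetical callable that returns the current status string) rather than assuming an endpoint; the timeout and interval values are arbitrary.

```python
import time
from typing import Callable


def wait_for_upload_detection(
    fetch_status: Callable[[], str],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
) -> str:
    """Poll until the job leaves "waiting-file"; return the new status."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status != "waiting-file":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job is still in waiting-file")
        time.sleep(interval_s)
```

The function returns whatever status follows waiting-file, which per the description above is expected to be pending.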
Choosing the Right Method
Use URL Source When:
- Document is on a public CDN or website
- You're processing documents from known sources
- You want the simplest integration
- File is already hosted somewhere accessible
Use File Upload When:
- User uploads files through your application
- Documents are stored privately
- You're processing local files
- Security requires files not be publicly accessible
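In code, the choice can often be made from the input itself. The following sketch treats anything with an http or https scheme as a URL source and everything else as a local file; `build_job_payload` is a hypothetical helper, not part of the API.

```python
import os
from urllib.parse import urlparse


def build_job_payload(source: str) -> dict:
    """Return the POST /v1/jobs body for a URL or a local file path."""
    if urlparse(source).scheme in ("http", "https"):
        return {"source_type": "url", "source_url": source}
    # Anything else is treated as a local path and sent via the upload flow.
    return {"source_type": "file", "file_name": os.path.basename(source)}
```

This heuristic cannot tell whether an HTTP URL is actually public. A document behind authentication would still need to be downloaded by your application and submitted through the file upload flow.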
Code Examples
Complete File Upload Flow
Python
import os
import requests

API_KEY = "your_api_key"
BASE_URL = "https://api.knowhereto.ai"


def parse_local_file(file_path: str) -> dict:
    """Parse a local file using the upload flow."""
    # Step 1: Create job and get upload URL
    file_name = os.path.basename(file_path)
    response = requests.post(
        f"{BASE_URL}/v1/jobs",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json"
        },
        json={
            "source_type": "file",
            "file_name": file_name
        }
    )
    response.raise_for_status()  # Fail early if the job was not created
    job = response.json()

    # Step 2: Upload the file using the exact headers from the response
    with open(file_path, "rb") as f:
        upload_response = requests.put(
            job["upload_url"],
            headers=job.get("upload_headers", {}),
            data=f.read()
        )
    if upload_response.status_code not in [200, 204]:
        raise Exception(f"Upload failed: {upload_response.status_code}")
    return job
Node.js
import fs from 'fs';
import path from 'path';

const API_KEY = 'your_api_key';
const BASE_URL = 'https://api.knowhereto.ai';

async function parseLocalFile(filePath) {
  // Step 1: Create job and get upload URL
  const fileName = path.basename(filePath);
  const response = await fetch(`${BASE_URL}/v1/jobs`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      source_type: 'file',
      file_name: fileName
    })
  });
  // Fail early if the job was not created
  if (!response.ok) {
    throw new Error(`Job creation failed: ${response.status}`);
  }
  const job = await response.json();

  // Step 2: Upload the file using the exact headers from the response
  const fileBuffer = fs.readFileSync(filePath);
  const uploadResponse = await fetch(job.upload_url, {
    method: 'PUT',
    headers: job.upload_headers || {},
    body: fileBuffer
  });
  if (!uploadResponse.ok) {
    throw new Error(`Upload failed: ${uploadResponse.status}`);
  }
  return job;
}
Next Steps
- Job Lifecycle - Understand all job states
- File Upload Guide - Detailed upload instructions
- Create Job API - Full API reference