Result Delivery

When a job completes successfully, the Knowhere API delivers its results as a ZIP package containing structured data, extracted images, and tables.

Result URL

The response for a completed job includes a result_url field, along with its expiration time and a checksum:

{
  "job_id": "job_abc123",
  "status": "done",
  "result_url": "https://results.knowhereto.ai/result_job_abc123.zip?...",
  "result_url_expires_at": "2025-01-16T10:32:45Z",
  "result_checksum": {
    "algorithm": "sha256",
    "value": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
  }
}

Field                  Description
result_url             Presigned URL to download the ZIP package
result_url_expires_at  URL expiration time (1 hour from generation)
result_checksum        SHA-256 hash for integrity verification

URL Expired?

The result_url is a presigned URL that expires 1 hour after it is generated. If it has expired, call GET /v1/jobs/{job_id} again to get a fresh URL.

Note: The result files themselves are retained for 30 days after job completion. After that, the files are permanently deleted.
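One way to avoid starting a download with a stale URL is to check result_url_expires_at first and re-fetch the job when the URL is expired or close to expiring. The sketch below shows only that check. The helper names and the 5-minute safety margin are illustrative assumptions, not part of the API, and the actual GET /v1/jobs/{job_id} call is left to your HTTP client.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_expiry(job: dict) -> datetime:
    """Parse result_url_expires_at (ISO 8601 with a 'Z' suffix) into an aware datetime."""
    raw = job["result_url_expires_at"]
    # datetime.fromisoformat() accepts a trailing "Z" only on Python 3.11+,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))


def needs_refresh(
    job: dict,
    now: Optional[datetime] = None,
    margin: timedelta = timedelta(minutes=5),  # hypothetical safety margin
) -> bool:
    """Return True if the presigned URL has expired or will expire within `margin`."""
    now = now or datetime.now(timezone.utc)
    return now >= parse_expiry(job) - margin
```

If needs_refresh returns True, request the job again and use the result_url from the new response.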

ZIP Package Structure

result_job_abc123.zip
├── manifest.json    # Package metadata and file index
├── chunks.json      # All chunks with content and metadata (CORE FILE)
├── content.md       # Full document as Markdown (optional)
├── images/          # Extracted images
│   ├── IMAGE_a8e1bf14-6ac4-5072-ae7d-fb47cd7dd948.jpg
│   └── IMAGE_1a9df66f-21c6-5138-996b-ada0165f37db.png
└── tables/          # Extracted tables as HTML
    ├── TABLE_fa7e0e7f-e815-5dc0-a998-593b8fc6d283.html
    └── TABLE_5d5ae092-afe7-524b-b8ba-1195b5432b98.html

Core Files

manifest.json

The package index containing metadata and file listings:

{
  "version": "1.0",
  "job_id": "job_abc123",
  "data_id": "my_custom_id",
  "source_file_name": "annual_report.pdf",
  "processing_date": "2025-01-15T10:32:45Z",
  "checksum": {
    "algorithm": "sha256",
    "value": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
  },
  "statistics": {
    "total_pages": 45,
    "total_chunks": 152,
    "text_chunks": 125,
    "image_chunks": 15,
    "table_chunks": 12
  },
  "files": {
    "chunks": "chunks.json",
    "markdown": "content.md",
    "images": [...],
    "tables": [...]
  }
}
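In the sample manifest, the per-type counts add up to total_chunks (125 + 15 + 12 = 152). A quick sanity check along those lines might look like the sketch below. It assumes that text, image, and table are the only chunk types counted, which the sample suggests but the documentation does not state; the function name is illustrative.

```python
def statistics_are_consistent(manifest: dict) -> bool:
    """Check that the per-type chunk counts sum to total_chunks.

    Assumption: text, image, and table are the only chunk types counted.
    """
    stats = manifest["statistics"]
    per_type_total = (
        stats["text_chunks"] + stats["image_chunks"] + stats["table_chunks"]
    )
    return per_type_total == stats["total_chunks"]
```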

chunks.json

The core data file containing all parsed chunks:

{
  "chunks": [
    {
      "chunk_id": "b0a1a904-6a50-5509-84ae-460509428111",
      "type": "text",
      "content": "The company achieved record revenue in Q3 2025...",
      "path": "Financial Report/Executive Summary",
      "metadata": {
        "length": 782,
        "tokens": 450,
        "keywords": ["revenue", "growth", "Q3"],
        "summary": "Executive summary of Q3 financial performance"
      }
    },
    {
      "chunk_id": "IMAGE_a8e1bf14-6ac4-5072-ae7d-fb47cd7dd948",
      "type": "image",
      "content": "Chart showing quarterly revenue growth from 2023-2025",
      "path": "Financial Report/Revenue Charts",
      "metadata": {
        "file_path": "images/IMAGE_a8e1bf14-6ac4-5072-ae7d-fb47cd7dd948.jpg",
        "original_name": "revenue-chart.jpg",
        "alt_text": "Revenue growth chart"
      }
    },
    {
      "chunk_id": "TABLE_fa7e0e7f-e815-5dc0-a998-593b8fc6d283",
      "type": "table",
      "content": "Revenue by region: Americas $5.2B, EMEA $3.1B, APAC $2.8B",
      "path": "Financial Report/Regional Breakdown",
      "metadata": {
        "file_path": "tables/TABLE_fa7e0e7f-e815-5dc0-a998-593b8fc6d283.html",
        "original_name": "regional-revenue.html",
        "table_type": "financial_data"
      }
    }
  ]
}
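Reading the two core files from a downloaded package can be done with the standard library alone. The following is a minimal sketch, not an official client: load_package is a hypothetical helper name, and falling back to "chunks.json" when the manifest omits files.chunks is an assumption.

```python
import json
import zipfile
from typing import BinaryIO, List, Tuple, Union


def load_package(source: Union[str, BinaryIO]) -> Tuple[dict, List[dict]]:
    """Read manifest.json, then the chunk file it points to, from a result ZIP."""
    with zipfile.ZipFile(source) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        # Assumption: use the documented default name if the index omits it.
        chunks_name = manifest.get("files", {}).get("chunks", "chunks.json")
        chunks = json.loads(zf.read(chunks_name))["chunks"]
    return manifest, chunks
```

The source argument can be a path to the downloaded ZIP or any binary file-like object, so the package does not have to be written to disk first.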

Chunk Types

Text Chunks

Standard text content from the document:

{
  "chunk_id": "uuid",
  "type": "text",
  "content": "The actual text content...",
  "path": "Section/Subsection",
  "metadata": {
    "length": 500,
    "tokens": 120,
    "keywords": ["keyword1", "keyword2"],
    "summary": "AI-generated summary",
    "relationships": ["IMAGE_xxx", "TABLE_yyy"]
  }
}
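The relationships entries in the example look like the chunk_id values of image and table chunks. Assuming that is what they are (the documentation does not say so explicitly), a text chunk's related chunks could be looked up as in this sketch; related_chunks is an illustrative name.

```python
from typing import List


def related_chunks(chunk: dict, chunks: List[dict]) -> List[dict]:
    """Return the chunks listed in a text chunk's metadata.relationships.

    Assumption: each relationships entry is the chunk_id of another chunk.
    """
    by_id = {c["chunk_id"]: c for c in chunks}
    ids = chunk.get("metadata", {}).get("relationships", [])
    # IDs that do not match any chunk are skipped rather than raising.
    return [by_id[i] for i in ids if i in by_id]
```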

Image Chunks

Extracted images with descriptions:

{
  "chunk_id": "IMAGE_uuid",
  "type": "image",
  "content": "Description of the image content",
  "path": "Section/Images",
  "metadata": {
    "file_path": "images/IMAGE_uuid.jpg",
    "original_name": "descriptive-name.jpg",
    "alt_text": "Alt text for accessibility"
  }
}

Table Chunks

Extracted tables with text representation:

{
  "chunk_id": "TABLE_uuid",
  "type": "table",
  "content": "Text representation of table data",
  "path": "Section/Tables",
  "metadata": {
    "file_path": "tables/TABLE_uuid.html",
    "original_name": "table-name.html",
    "table_type": "financial_data"
  }
}
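Image and table chunks carry a metadata.file_path that, in the examples, matches an entry inside the ZIP package. Under the assumption that file_path is always relative to the package root, the underlying file can be read as sketched here; read_asset is a hypothetical helper name.

```python
import zipfile
from typing import BinaryIO, Optional, Union


def read_asset(source: Union[str, BinaryIO], chunk: dict) -> Optional[bytes]:
    """Return the raw bytes behind an image or table chunk.

    Returns None for chunks without metadata.file_path (for example, text chunks).
    Assumption: file_path is relative to the root of the ZIP package.
    """
    file_path = chunk.get("metadata", {}).get("file_path")
    if file_path is None:
        return None
    with zipfile.ZipFile(source) as zf:
        return zf.read(file_path)
```

For a table chunk the returned bytes are the HTML file; for an image chunk they are the image data.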

The path Field

The path field represents the document's logical structure:

path: "Chapter 1/Introduction/Background"
       └───────┘ └──────────┘ └────────┘
        Level 1    Level 2     Level 3

Use this for:

  • Reconstructing document hierarchy
  • Filtering chunks by section
  • Building navigation trees
  • Contextual RAG retrieval
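Two of the uses above, filtering by section and building a navigation tree, can be sketched as follows. The function names are illustrative, and the code assumes that "/" only ever appears as the level separator, so a section title that itself contains a slash would be split incorrectly.

```python
from typing import List


def in_section(chunk: dict, section: str) -> bool:
    """True if the chunk's path is `section` or is nested below it."""
    path = chunk.get("path", "")
    # Appending "/" keeps "Chapter 1" from matching "Chapter 10/...".
    return path == section or path.startswith(section + "/")


def build_tree(chunks: List[dict]) -> dict:
    """Nest path segments into dicts, one level per path component."""
    tree: dict = {}
    for chunk in chunks:
        node = tree
        for segment in chunk.get("path", "").split("/"):
            if segment:  # ignore empty segments from missing or malformed paths
                node = node.setdefault(segment, {})
    return tree
```

For example, [c for c in chunks if in_section(c, "Financial Report")] keeps every chunk under that top-level section.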

Integrity Verification

Always verify the integrity of the download using the SHA-256 checksum:

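import hashlib

def verify_download(file_path: str, expected_checksum: str) -> bool:
    """Return True if the file's SHA-256 hex digest matches expected_checksum."""
    sha256 = hashlib.sha256()
    with open(file_path, "rb") as f:
        # Hash in 4 KiB blocks so large packages are never fully loaded into memory.
        for chunk in iter(lambda: f.read(4096), b""):
            sha256.update(chunk)
    return sha256.hexdigest() == expected_checksum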
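To connect the check to the job response, the expected value can be taken from result_checksum. The sketch below is self-contained and assumes that result_checksum.value is the hex digest of the ZIP file itself; verify_against_job is a hypothetical name, and rejecting any algorithm other than sha256 is a defensive choice rather than documented behavior.

```python
import hashlib


def verify_against_job(file_path: str, job: dict) -> bool:
    """Compare a downloaded file's SHA-256 digest with job["result_checksum"]."""
    checksum = job["result_checksum"]
    if checksum["algorithm"].lower() != "sha256":
        raise ValueError(f"unsupported algorithm: {checksum['algorithm']}")
    digest = hashlib.sha256()
    with open(file_path, "rb") as f:
        # Read in 1 MiB blocks to keep memory use flat for large packages.
        for block in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(block)
    # hexdigest() is lowercase, so normalise the expected value before comparing.
    return digest.hexdigest() == checksum["value"].lower()
```

If the check fails, discard the file and download it again rather than unpacking it.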