What is Content Ingestion in Kadal?
Content ingestion in Kadal is the process of enriching uploaded content by processing it, chunking it, and adding relevant metadata, which makes it available for enhanced search, AI-powered interactions, repository management, and agent-based processing. Once a file is uploaded, it is processed to ensure it is correctly stored, indexed, and ready for use. The file types currently supported for ingestion are Documents (.docx), Spreadsheets (.csv, .xls, .xlsx), txt, PDF, Presentations (.pptx), HTML, and epub.
What Happens When You Upload a File?
When a file is uploaded, it goes through the following stages:
- Processing
  - Indication:
  - The file is being analyzed, indexed, and ingested into the system.
  - Note:
    - Users can continue to interact with AI agents while files are being processed, but responses that refer to these files may not be fully accurate until processing is complete.
    - Users cannot publish an agent while any of its associated files are still processing.
    - When a group is created, files that are still processing become visible to users once processing is complete.
- Not Processed
  - Indication: No Icon
  - The file has not been processed. This includes unsupported file types, which are not accepted for ingestion.
- Processed & Available
  - Indication: No Icon
  - The file has been successfully ingested and is ready for AI interactions.
  - Note: The file is fully functional in AI Agents.
- Failed
  - Indication:
  - The file could not be processed because of an error and should be re-uploaded.
  - Action: Re-upload a valid version of the file to restart ingestion.
  - Note:
    - The system will attempt an automatic retry before notifying the user of failure.
    - Users cannot select failed files when creating an agent or assigning files to a group.
    - Failed files can still be downloaded or deleted, but they cannot be renamed while in a failed state.
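The stage transitions described above, including the automatic retry before a file is reported as Failed, can be sketched as follows. This is a minimal illustration and not Kadal's implementation: `FileStatus`, `ingest_with_retry`, and the single-retry default are assumptions made for the example.

```python
from enum import Enum
from typing import Callable


class FileStatus(Enum):
    """Stages named in this article (hypothetical representation)."""
    PROCESSING = "Processing"
    PROCESSED = "Processed & Available"
    FAILED = "Failed"


def ingest_with_retry(process: Callable[[], object], max_retries: int = 1) -> FileStatus:
    """Run `process`; retry on error and report FAILED only once retries run out.

    `max_retries=1` is an assumption: the article only says that an
    automatic retry happens before the user is notified of failure.
    """
    attempts = 1 + max_retries  # first attempt plus automatic retries
    for _ in range(attempts):
        try:
            process()
            return FileStatus.PROCESSED
        except Exception:
            continue  # file stays in Processing while the next attempt runs
    return FileStatus.FAILED
```

In this sketch the user only ever sees Failed after every attempt has raised an error, which mirrors the "retry first, then notify" behavior noted above.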
What to Do If Your File Fails to Process?
- Check File Format: Ensure the file is in a supported format. The file types currently supported for ingestion are Documents (.docx), Spreadsheet (.csv, .xls, .xlsx), txt, PDF, Presentation (.pptx), HTML, and epub.
- Re-upload the File: Re-upload the file to restart ingestion.
- Check Error Messages: If a specific issue is mentioned, follow the suggested resolution.
- Contact Support: If the file repeatedly fails or remains in the Processing state for an extended period, reach out to kadalsupport@learningmate.com for assistance.
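The format check in the first step can be done locally before uploading. The sketch below is a hedged example: the function name `is_supported` is hypothetical, and the mapping of "txt", "PDF", "HTML", and "epub" to the extensions `.txt`, `.pdf`, `.html`, and `.epub` is an assumption, since the article names those types without listing extensions.

```python
from pathlib import Path

# Extensions derived from the supported file types listed in this article.
# .txt, .pdf, .html, and .epub are assumed spellings of the named types.
SUPPORTED_EXTENSIONS = {
    ".docx",                  # Documents
    ".csv", ".xls", ".xlsx",  # Spreadsheets
    ".txt",
    ".pdf",
    ".pptx",                  # Presentations
    ".html",
    ".epub",
}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is in the supported set (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported("Report.PDF")` returns `True`, while `is_supported("photo.png")` returns `False`.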
Best Practices for Uploading Files
To optimize AI responses and ensure efficient file processing, follow these best practices:
- Use Input Files for Full-Document Queries: If a query requires scanning the entire document rather than retrieving information from specific sections, upload the document as an Input File.
- Character Limit for Chat-Time Uploads
  - The total text across all uploaded files in a single query should not exceed 350,000 characters (approximately 100,000 tokens).
  - You can upload up to three files at a time.
- File Replacement Rule
  - Only the three most recently uploaded files will be considered for query responses.
  - If a new file is uploaded beyond this limit, the oldest file will be removed from consideration.
- Avoid Large Non-Selectable PDFs
  - Scanned PDFs exceeding 200 pages can be processed but will take significantly more time.
  - Whenever possible, use selectable text PDFs instead of image-based PDFs to improve processing speed.
By following these best practices, you can ensure that your files are ingested efficiently and that AI responses remain accurate and relevant.
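The character limit and the file replacement rule can be modeled in a few lines. This is an illustrative sketch only: `ChatUploads` and `estimate_tokens` are hypothetical names, and the ratio of 3.5 characters per token is simply derived from the figures above (350,000 characters for roughly 100,000 tokens), not an exact tokenizer count.

```python
from collections import deque

MAX_TOTAL_CHARS = 350_000  # limit stated in this article
MAX_FILES = 3              # only the three most recent files are considered
CHARS_PER_TOKEN = 3.5      # rough ratio implied by 350,000 chars ~ 100,000 tokens


def estimate_tokens(char_count: int) -> int:
    """Approximate the token count from a character count."""
    return round(char_count / CHARS_PER_TOKEN)


class ChatUploads:
    """Tracks chat-time uploads; the oldest file drops out beyond three."""

    def __init__(self) -> None:
        # A bounded deque discards the oldest entry automatically.
        self._files = deque(maxlen=MAX_FILES)

    def add(self, name: str, text: str) -> None:
        self._files.append((name, text))

    def active_names(self) -> list:
        return [name for name, _ in self._files]

    def total_chars(self) -> int:
        return sum(len(text) for _, text in self._files)

    def within_limit(self) -> bool:
        return self.total_chars() <= MAX_TOTAL_CHARS
```

Uploading a fourth file in this model pushes the first one out of `active_names()`, and `within_limit()` gives a quick check before sending a query.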
Permissions & Restrictions Based on File Status
Kadal enforces specific rules based on the file's processing state to ensure a smooth user experience and prevent errors.
| File Status | Select for Agent Creation | Attach in Threads/Agents | Download | Rename | Delete |
|---|---|---|---|---|---|
| Processing | ✅ Yes | ❌ No | ✅ Yes | ❌ No | ✅ Yes |
| Processed | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| Failed | ❌ No | ❌ No | ✅ Yes | ❌ No | ✅ Yes |
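For client-side checks, the table above can be expressed as a simple lookup. The values below are copied from the table; the names `PERMISSIONS`, `is_allowed`, and the action keys are hypothetical and do not correspond to a Kadal API.

```python
# Permission matrix transcribed from the table above.
# Keys are hypothetical identifiers chosen for this example.
PERMISSIONS = {
    "Processing": {
        "select_for_agent": True,
        "attach": False,
        "download": True,
        "rename": False,
        "delete": True,
    },
    "Processed": {
        "select_for_agent": True,
        "attach": True,
        "download": True,
        "rename": True,
        "delete": True,
    },
    "Failed": {
        "select_for_agent": False,
        "attach": False,
        "download": True,
        "rename": False,
        "delete": True,
    },
}


def is_allowed(status: str, action: str) -> bool:
    """Look up whether `action` is permitted for a file in `status`."""
    return PERMISSIONS[status][action]
```

A call such as `is_allowed("Failed", "rename")` returns `False`, matching the Failed row of the table.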
Mistral OCR for Content Ingestion
Kadal offers Mistral OCR, a specialized Large Language Model (LLM) that enhances the platform's content ingestion capabilities.
- While Docling is the default processing model used for content ingestion, Mistral OCR is now available for processing complex content that contains images, tables, mathematical formulas, and similar elements.
- Efficient Handling of Unstructured Data: Mistral OCR provides more intelligent and accurate ingestion of unstructured data, resulting in improved AI responses during conversations.
- Supported File Types: Mistral OCR supports ingestion of:
- PDF files
- DOCX files
- PPTX files
- Limitations: Files can have up to 1,000 pages or slides and a maximum size of 50 MB.
How to Enable Mistral OCR:
- Mistral OCR is not enabled by default. To use this feature, users must submit a request to Kadal support (kadalsupport@learningmate.com) to activate Mistral OCR for their tenant.
- For other supported file types or simpler content, Docling continues to be the default processing model.
- Adding Mistral OCR enhances the efficiency and accuracy of content ingestion, particularly for advanced and complex content.
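The conditions above (tenant activation, supported file types, and the page and size limits) can be combined into a single eligibility check. This is a sketch under stated assumptions, not Kadal's routing logic: `choose_processor` and its parameters are hypothetical, 1 MB is taken as 1024 × 1024 bytes, and falling back to Docling when a file exceeds the Mistral OCR limits is an assumption the article does not confirm.

```python
from pathlib import Path

MISTRAL_OCR_EXTENSIONS = {".pdf", ".docx", ".pptx"}  # types listed in this article
MAX_PAGES = 1000                                      # pages or slides
MAX_SIZE_BYTES = 50 * 1024 * 1024                     # 50 MB, assuming binary megabytes


def choose_processor(filename: str, pages: int, size_bytes: int,
                     mistral_enabled: bool) -> str:
    """Return "mistral_ocr" when every stated condition holds, else "docling"."""
    eligible = (
        mistral_enabled  # must be activated for the tenant on request
        and Path(filename).suffix.lower() in MISTRAL_OCR_EXTENSIONS
        and pages <= MAX_PAGES
        and size_bytes <= MAX_SIZE_BYTES
    )
    # Assumed fallback: anything not eligible goes to the default processor.
    return "mistral_ocr" if eligible else "docling"
```

With these assumptions, a 40-slide `.pptx` on a tenant with Mistral OCR activated resolves to `"mistral_ocr"`, while a `.csv` file always resolves to `"docling"`.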