Automated Document Encoding/Type Classification (SCDF APIs ML)
The MARS SCDF (Stare and Compare Document Forensics) APIs use Machine Learning (ML) models to analyze and classify documents automatically, based on their intrinsic characteristics. The same analysis applies across diverse repositories and data feeds.
The primary ML-driven capabilities offered through these APIs are:
- Encoding Format Detection: Programmatically identifies the underlying file encoding or format of a document object, whether it's a print stream (AFP, EBCDIC, DJDE), an image (TIFF, JPEG), a standard document type (PDF, Office formats), or other enterprise formats. This ensures the correct processing path can be selected.
- Business Document Type Classification: Extends beyond format identification to determine the functional type of the document (e.g., invoice, purchase order, bank statement, or insurance policy). This classification relies on ML models trained to recognize patterns in layout, structure, keywords, and other features characteristic of specific document types within a business context.
- System-Agnostic Application: This automated classification can be applied consistently to documents regardless of their source system, repository, or how they were ingested (e.g., legacy archives, active ECMs, new data feeds). The resulting classification metadata can then be used to drive workflows, apply retention policies, enhance search, or enforce compliance rules.
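
The SCDF request and response shapes are not described in this section, so the sketch below does not call the SCDF APIs and is not the SCDF ML model. It is a minimal rule-based stand-in that shows what encoding format detection produces and how that result could select a processing path. The signatures are the well-known leading bytes of each format; the names `detect_format`, `select_processing_path`, and the entries in `PROCESSING_PATHS` are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Tuple

# Well-known leading byte signatures ("magic numbers").
# Assumption: a rule-based check stands in here for the ML-based detection
# described above; print streams such as AFP or DJDE are not covered.
SIGNATURES: List[Tuple[bytes, str]] = [
    (b"%PDF-", "pdf"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"II*\x00", "tiff"),                                      # little-endian TIFF
    (b"MM\x00*", "tiff"),                                      # big-endian TIFF
    (b"PK\x03\x04", "zip-container"),                          # e.g. modern Office files
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2-container"),   # e.g. legacy Office files
]

# Hypothetical mapping from detected format to a downstream processing path.
PROCESSING_PATHS: Dict[str, str] = {
    "pdf": "pdf-text-extraction",
    "jpeg": "image-ocr",
    "tiff": "image-ocr",
    "zip-container": "office-conversion",
    "ole2-container": "office-conversion",
}


def detect_format(data: bytes) -> str:
    """Return a format label based on the document's leading bytes."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def select_processing_path(data: bytes) -> str:
    """Pick a processing path from the detected format, with a safe fallback."""
    return PROCESSING_PATHS.get(detect_format(data), "manual-review")
```

Because the routing depends only on the document bytes, the same function can be applied to objects from any repository or feed. Unrecognized content falls through to a review path rather than being processed incorrectly.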