MARS Scanner Document Separation Methods

When a batch contains multiple documents scanned together, the Helix MARS Scanner can apply several automated methods to split the image stream into distinct, logical documents before classification and extraction.

Separation Techniques:

Barcode Recognition: The system can detect specific barcodes (using integrated ZBarCode technology or similar) placed on the first or last page of a document, or on dedicated separator sheets. The barcode value then signals where a document starts or ends.
Separator Sheets/Lists: Predefined blank pages or pages with specific markings (sometimes called patch codes or listed values) can be inserted between documents before scanning. The scanner application recognizes these sheets and treats them as delimiters between adjacent documents in the batch.
Keyword/Content Analysis: The application can analyze the text content (obtained via OCR) of each page, looking for specific keywords, phrases, or patterns defined by regular expressions that consistently indicate the start or end of a particular document type. For example, recognizing "Invoice Number:" might signal the start of a new invoice document.
Machine Learning (ML) Classification: Advanced configurations can use machine learning models (such as Support Vector Machines) trained to recognize the visual layout and structural characteristics of different document types. The model classifies each page and identifies where one document class ends and another begins, enabling separation even without explicit barcodes or keywords.
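
As a rough illustration of the keyword/content analysis technique, the Python sketch below groups per-page OCR text into documents, opening a new document whenever a page matches a start-of-document pattern. It is not the MARS Scanner's actual implementation: the function name split_by_keywords, the START_PATTERNS list, and the sample page text are all hypothetical, and the OCR step is assumed to have already produced one text string per page.

```python
import re

# Hypothetical start-of-document patterns; a real deployment would define its own.
START_PATTERNS = [
    re.compile(r"Invoice Number:", re.IGNORECASE),
]


def split_by_keywords(pages, patterns=START_PATTERNS):
    """Group per-page OCR text into documents.

    A new document is opened whenever a page matches any start pattern.
    Pages before the first match are kept together as a leading document.
    """
    documents = []
    for text in pages:
        starts_new = any(p.search(text) for p in patterns)
        if starts_new or not documents:
            documents.append([])
        documents[-1].append(text)
    return documents


# Assumed sample input: one OCR text string per scanned page.
pages = [
    "Invoice Number: 1001\nAcme Corp",
    "Line items continued",
    "Invoice Number: 1002\nGlobex",
]
docs = split_by_keywords(pages)
print([len(d) for d in docs])
```

With this sample input the first two pages form one document and the third page starts another. A pattern that can also appear on continuation pages would produce false splits, which is why the keyword needs to consistently mark the first page of a document type.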

The choice of method often depends on how consistent the scanned documents are and how much preparation is feasible before scanning. Methods can also be combined to improve accuracy.
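
One way to picture a combined setup is sketched below in Python, under stated assumptions: each page is a hypothetical record holding a decoded barcode value (or None) and OCR text, separator sheets carry an assumed barcode value "DOC-SEP", and a keyword match acts as a fallback when no separator sheet was inserted. None of these names or values come from the MARS Scanner itself.

```python
import re
from typing import Dict, List, Optional

# Hypothetical page record: {"barcode": <decoded value or None>, "text": <OCR text>}
Page = Dict[str, Optional[str]]

SEPARATOR_BARCODE = "DOC-SEP"  # assumed value printed on separator sheets
INVOICE_START = re.compile(r"Invoice Number:", re.IGNORECASE)


def is_separator_sheet(page: Page) -> bool:
    """True when the page carries the assumed separator barcode."""
    return page.get("barcode") == SEPARATOR_BARCODE


def starts_with_keyword(page: Page) -> bool:
    """True when the page's OCR text contains the start-of-document keyword."""
    return bool(INVOICE_START.search(page.get("text") or ""))


def separate(pages: List[Page]) -> List[List[Page]]:
    """Split a batch using separator sheets first, keywords as a fallback."""
    documents: List[List[Page]] = []
    current: List[Page] = []
    for page in pages:
        if is_separator_sheet(page):
            # A separator sheet closes the current document and is discarded.
            if current:
                documents.append(current)
                current = []
            continue
        if starts_with_keyword(page) and current:
            # A keyword hit on a later page opens a new document.
            documents.append(current)
            current = []
        current.append(page)
    if current:
        documents.append(current)
    return documents


batch = [
    {"barcode": None, "text": "Invoice Number: 1"},
    {"barcode": None, "text": "continued"},
    {"barcode": "DOC-SEP", "text": ""},
    {"barcode": None, "text": "Cover letter"},
    {"barcode": None, "text": "Invoice Number: 2"},
]
print([len(doc) for doc in separate(batch)])
```

In this sketch the separator sheet is treated as authoritative and dropped from the output, while the keyword check only catches boundaries where no sheet was inserted. Other policies, such as requiring two signals to agree before splitting, are equally plausible.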