HSS Storage Data Handling (Blobs, Byte Offsets)
The MARS HSS (High Speed Storage) format organizes archived data for both storage efficiency and rapid retrieval. Instead of storing millions of small individual files, it consolidates content into a smaller number of large storage objects.
Core Data Handling Concepts:
- Storage Objects (Blobs): Content, particularly from print streams or large reports, is encoded and consolidated into larger binary objects, often referred to as blobs. A single storage object can contain data corresponding to numerous individual documents or report pages.
- Resource Separation: Common resources (like fonts, form definitions, images, overlays used repeatedly within a print job or report set) are typically stored separately, either within dedicated resource datasets (RDS) or within the object header, rather than being duplicated for every document instance.
- Byte Offset Indexing: Crucially, HSS uses byte offsets to locate specific documents or pages within a large storage object. When data is ingested:
  - Metadata describing the content (e.g., document boundaries, index values) is captured.
  - This metadata, along with the starting byte position (offset) and length of each individual document/page within the larger storage object, is stored in segmentation tables, usually within an associated database.
  - Metadata might also be stored within the header of the storage object itself for redundancy or recovery purposes.
- Rapid Retrieval: When a user requests a specific document, the system looks up its byte offset and length in the segmentation tables of the index database, which also identify the storage object that holds it. It then reads only that byte range from the blob, without reading or processing the rest of the object. This mechanism enables retrieval times often measured in milliseconds, even from highly compressed archives.
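The ingestion and retrieval steps in the list above can be illustrated with a minimal Python sketch. It is not the actual HSS implementation: the Segment record, the ingest and retrieve functions, and the in-memory dictionary standing in for the segmentation tables are all hypothetical names chosen for illustration, and encoding and compression are left out.

```python
from __future__ import annotations

import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One row of a hypothetical segmentation table."""
    doc_id: str
    offset: int  # starting byte position inside the blob
    length: int  # number of bytes that belong to this document


def ingest(documents: dict[str, bytes]) -> tuple[bytes, dict[str, Segment]]:
    """Concatenate documents into one blob and record where each one lands."""
    blob = io.BytesIO()
    table: dict[str, Segment] = {}
    for doc_id, payload in documents.items():
        offset = blob.tell()  # current end of the blob is the start of this document
        blob.write(payload)
        table[doc_id] = Segment(doc_id, offset, len(payload))
    return blob.getvalue(), table


def retrieve(blob_file, table: dict[str, Segment], doc_id: str) -> bytes:
    """Read a single document by seeking to its offset; the rest of the blob is not read."""
    seg = table[doc_id]
    blob_file.seek(seg.offset)
    return blob_file.read(seg.length)


# Assumed sample data: three small "documents" consolidated into one blob.
docs = {
    "INV-001": b"first invoice",
    "INV-002": b"second invoice, longer",
    "INV-003": b"third",
}
blob, table = ingest(docs)
result = retrieve(io.BytesIO(blob), table, "INV-002")
```

In a real archive the blob would live on disk or in object storage and the table in a database, but the access pattern is the same: one index lookup followed by one bounded read.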
This combination of consolidation into blobs and precise byte-offset indexing allows HSS to achieve both high storage density and fast random access to individual archived items.
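The redundancy option mentioned earlier, where metadata is also kept in the header of the storage object, can be sketched as follows. The layout is an assumption made purely for illustration (a 4-byte big-endian header length, a JSON header, then the document data); the source does not describe the real HSS header format, and build_blob and rebuild_index are hypothetical helpers.

```python
import json
import struct


def build_blob(documents):
    """Pack documents into one blob whose header lists every segment."""
    segments = []
    body = bytearray()
    for doc_id, payload in documents.items():
        # Offsets are relative to the start of the data area, not the whole blob.
        segments.append({"id": doc_id, "offset": len(body), "length": len(payload)})
        body += payload
    header = json.dumps({"segments": segments}).encode("utf-8")
    # Assumed layout: [4-byte header length][JSON header][document data]
    return struct.pack(">I", len(header)) + header + bytes(body)


def rebuild_index(blob):
    """Recover absolute (offset, length) pairs from the blob alone."""
    (header_len,) = struct.unpack_from(">I", blob, 0)
    header = json.loads(blob[4:4 + header_len].decode("utf-8"))
    data_start = 4 + header_len
    return {
        seg["id"]: (data_start + seg["offset"], seg["length"])
        for seg in header["segments"]
    }


blob = build_blob({"A": b"alpha", "B": b"bravo-bravo"})
index = rebuild_index(blob)  # works even if the external index database is lost
offset, length = index["B"]
recovered = blob[offset:offset + length]
```

Storing offsets relative to the data area keeps the header independent of its own size, so the header can be written after the segments are known without recalculating any positions.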