We present AMES (Approximate Multimodal Enterprise Search), a unified, backend-agnostic multimodal late-interaction retrieval architecture. AMES demonstrates that fine-grained multimodal late-interaction retrieval can be deployed inside a production-grade enterprise search engine without architectural redesign. Text tokens, image patches, and video frames are embedded into a shared representation space using multi-vector encoders, enabling cross-modal retrieval without modality-specific retrieval logic. AMES employs a two-stage pipeline: parallel token-level ANN search with per-document Top-M MaxSim approximation, followed by accelerator-optimized Exact MaxSim re-ranking. Experiments on the ViDoRe V3 benchmark show that AMES achieves competitive ranking performance within a scalable, production-ready Solr-based system.
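To make the two-stage scoring concrete, the following is a minimal sketch of late-interaction MaxSim scoring and one plausible Top-M approximation. The function names (`maxsim_score`, `top_m_maxsim`) and the exact form of the Top-M surrogate are illustrative assumptions, not AMES's actual implementation; only the MaxSim operator itself (per-query-token maximum similarity, summed) follows the standard late-interaction formulation.

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Exact late-interaction MaxSim: for each query token vector, take
    the maximum dot-product similarity over all document vectors, then
    sum. Assumes rows are L2-normalized embeddings."""
    # (num_query_tokens, num_doc_tokens) similarity matrix
    sims = query_vecs @ doc_vecs.T
    return float(sims.max(axis=1).sum())

def top_m_maxsim(query_vecs: np.ndarray, doc_vecs: np.ndarray, m: int) -> float:
    """Hypothetical Top-M approximation: keep only the M largest
    per-token maxima as a cheap first-stage score, before the exact
    re-ranking pass. Lower-bounds the exact MaxSim score."""
    per_token_max = (query_vecs @ doc_vecs.T).max(axis=1)
    return float(np.sort(per_token_max)[-m:].sum())
```

In a pipeline like the one described, the approximate score would rank candidates retrieved by token-level ANN search, and only the surviving top candidates would be re-scored with the exact function on an accelerator.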







