Carrara F., Vadicamo L., Gennaro C., Amato G.
Surrogate text representation, Inverted index, Approximate search, High-dimensional indexing, Very large databases
Approximate search for high-dimensional vectors is commonly addressed using dedicated techniques often combined with hardware acceleration provided by GPUs, FPGAs, and other custom in-memory silicon. Despite their effectiveness, harmonizing those optimized solutions with other types of searches often poses technological difficulties. For example, to implement a combined text+image multimodal search, we are forced first to query the index of high-dimensional image descriptors and then filter the results based on the textual query or vice versa. This paper proposes a text surrogate technique to translate real-valued vectors into text and index them with a standard textual search engine such as Elasticsearch or Apache Lucene. This technique allows us to perform approximate kNN searches of high-dimensional vectors alongside classical full-text searches natively on a single textual search engine, enabling multimedia queries without sacrificing scalability. Our proposal exploits a combination of vector quantization and scalar quantization. We compared our approach to the existing literature in this field of research, demonstrating a significant improvement in performance through preliminary experimentation.
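The abstract does not spell out the encoding, so the following is only a minimal sketch of the general idea behind a scalar-quantization surrogate text, not the authors' implementation. It assumes non-negative vectors and repeats a synthetic token for each dimension in proportion to the quantized component value, so that a term-frequency match in a text engine approximates a dot product. The function names, the `f{i}` token scheme, and the `scale` parameter are hypothetical, and the vector quantization part of the proposal is not shown.

```python
from collections import Counter

import numpy as np


def to_surrogate_text(vec, scale=4):
    """Encode a non-negative vector as a space-separated token string.

    Dimension i is represented by the hypothetical token "f{i}", repeated
    floor(scale * vec[i]) times (a simple scalar quantization).
    """
    vec = np.asarray(vec, dtype=float)
    if np.any(vec < 0):
        # Assumption of this sketch: inputs are non-negative.
        raise ValueError("this sketch assumes non-negative components")
    counts = np.floor(vec * scale).astype(int)
    tokens = []
    for i, count in enumerate(counts):
        tokens.extend([f"f{i}"] * int(count))
    return " ".join(tokens)


def term_frequency_score(doc_text, query_text):
    """Sum of products of term frequencies over shared tokens.

    This mimics, in plain Python, the dot product that a term-frequency
    based text ranking would approximate on the surrogate strings.
    """
    doc_tf = Counter(doc_text.split())
    query_tf = Counter(query_text.split())
    return sum(tf * doc_tf[token] for token, tf in query_tf.items())


doc = to_surrogate_text([0.5, 0.0, 0.25])
query = to_surrogate_text([0.25, 0.5, 0.25])
score = term_frequency_score(doc, query)
```

In a real deployment, strings such as `doc` would be stored in an ordinary text field of the search engine and `query` would be issued as a regular full-text query, which is what lets the vector search be combined with classical keyword clauses in a single request.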
Source: SISAP 2022 - 15th International Conference on Similarity Search and Applications, pp. 214–221, Bologna, Italy, 7-9/10/2022
@inproceedings{oai:it.cnr:prodotti:471829,
  title = {Approximate nearest neighbor search on standard search engines},
  author = {Carrara F. and Vadicamo L. and Gennaro C. and Amato G.},
  doi = {10.1007/978-3-031-17849-8_17},
  booktitle = {SISAP 2022 - 15th International Conference on Similarity Search and Applications, pp. 214–221, Bologna, Italy, 7-9/10/2022},
  year = {2022}
}