Proposal and Evaluation of Efficient Pruning Approaches for Multi-Vector Representation in Passage Retrieval

Publisher

Universidade Federal de Goiás

Abstract

Multi-vector retrieval models employ bi-encoders to generate contextualized embeddings for queries and passages, and have proven highly effective at capturing fine-grained, token-level interactions. Models such as ColBERT, ColBERTv2, and PLAID use all token-level output vectors from the encoder to model query-passage relationships accurately. However, storing a dense vector for every token of every passage incurs substantial memory overhead. In addition, query latency is significantly affected by the cost of computing inner products between each query token and all passage tokens to obtain similarity scores. In this work, we explore pruning techniques applied to the passage vectors produced by PLAID, aiming to remove less important token vectors in order to improve memory efficiency and reduce query processing time with minimal impact on retrieval effectiveness. We propose two novel pruning methods: MLM Max with Token Reordering (MMTR) and TF-IDF pruning. We conducted extensive experiments on both in-domain and zero-shot (out-of-domain) datasets, following best-practice evaluation protocols. Our results show that MMTR consistently yields the smallest drop in effectiveness relative to the original, unpruned PLAID model. We observe that retaining 50% of the passage token embeddings provides the best trade-off among effectiveness, index size, and latency on most datasets. Interestingly, on certain out-of-domain datasets, pruning acts as a form of noise reduction: retaining only 25% of the token embeddings improves retrieval performance over the full, unpruned index.
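The abstract does not describe how MMTR or TF-IDF pruning are implemented. As a rough illustration only, the Python sketch below (an assumption, not taken from the dissertation) shows ColBERT-style late-interaction (MaxSim) scoring, where each query token keeps its maximum inner product over the passage tokens, together with a generic TF-IDF token-pruning step that keeps a fixed fraction of a passage's token vectors. The names `maxsim_score`, `idf_table`, and `tfidf_prune`, the smoothed IDF formula, and the toy 2-dimensional vectors are hypothetical. PLAID itself stores compressed embeddings rather than raw float lists, and MMTR is not reproduced here.

```python
import math
from collections import Counter


def maxsim_score(query_vecs, passage_vecs):
    """Late-interaction score: for each query token vector, take the maximum
    inner product over all passage token vectors, then sum over query tokens."""
    if not passage_vecs:
        return 0.0
    total = 0.0
    for q in query_vecs:
        total += max(sum(a * b for a, b in zip(q, p)) for p in passage_vecs)
    return total


def idf_table(corpus_tokens):
    """Smoothed inverse document frequency per token (hypothetical formula)
    computed over a corpus given as a list of tokenized passages."""
    n_docs = len(corpus_tokens)
    df = Counter()
    for toks in corpus_tokens:
        df.update(set(toks))  # count each token at most once per passage
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_prune(tokens, vecs, idf, keep_ratio=0.5):
    """Hypothetical TF-IDF pruning: keep the top `keep_ratio` fraction of
    token vectors ranked by TF-IDF weight, preserving original token order."""
    tf = Counter(tokens)
    n_keep = max(1, math.ceil(keep_ratio * len(tokens)))
    weights = [tf[t] * idf.get(t, 0.0) for t in tokens]
    # Stable sort: tokens with equal weight keep their original relative order.
    ranked = sorted(range(len(tokens)), key=lambda i: weights[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [tokens[i] for i in kept], [vecs[i] for i in kept]


if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
    idf = idf_table(corpus)
    passage = ["the", "cat", "sat"]
    passage_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    query_vecs = [[1.0, 0.0], [0.0, 1.0]]

    kept_tokens, kept_vecs = tfidf_prune(passage, passage_vecs, idf, keep_ratio=0.5)
    print(kept_tokens)
    print(maxsim_score(query_vecs, passage_vecs), maxsim_score(query_vecs, kept_vecs))
```

In this toy run the frequent, low-IDF token "the" is dropped, while the MaxSim score is unchanged because the remaining vectors still cover both query directions. Storing half of the token vectors halves the per-passage vector count, which is the memory and latency lever the abstract refers to; whether effectiveness holds up on real data is exactly what the dissertation's experiments measure.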

Citation

CHIHURURU, A. M. Proposal and Evaluation of Efficient Pruning Approaches for Multi-Vector Representation in Passage Retrieval = Proposta e Avaliação de Abordagens Eficientes de Poda para Representações Multi-Vetoriais na Recuperação de Passagens. 2025. 125 leaves. Master's dissertation (Computer Science), Instituto de Informática, Universidade Federal de Goiás, Goiânia, 2025.