Master's in Computer Science (INF)
Browsing Master's in Computer Science (INF) by CNPq Area "CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::TEORIA DA COMPUTACAO"
Item: Estudo comparativo de comitês de sub-redes neurais para o problema de aprender a ranquear [Comparative study of neural sub-network ensembles for the learning-to-rank problem] (Universidade Federal de Goiás, 2023-09-01)
Ribeiro, Diogo de Freitas; Sousa, Daniel Xavier de; http://lattes.cnpq.br/4603724338719739; Rosa, Thierson Couto; http://lattes.cnpq.br/4414718560764818; Rosa, Thierson Couto; Sousa, Daniel Xavier de; Canuto, Sérgio Daniel Carvalho; Martins, Wellington Santos
Learning to Rank (L2R) is a sub-area of Information Retrieval that uses machine learning to place the most relevant documents at the top of the ranking returned for a given query. Until recently, LambdaMART, an ensemble of regression trees, was considered the state of the art in L2R. However, AllRank, a deep learning method that incorporates self-attention mechanisms, has overtaken LambdaMART as the most effective approach for L2R tasks. This study explored the effectiveness and efficiency of sub-network ensembles as a complement to the self-attention mechanism used in AllRank, in order to push ranking effectiveness further. Different methods for forming sub-network ensembles, namely Multi-Sample Dropout, Multi-Sample Dropout (Training and Testing), BatchEnsemble and Masksembles, were implemented and tested on two standard data collections: MSLR-WEB10K and YAHOO!. The experimental results indicated that some of these ensemble approaches, specifically Masksembles and BatchEnsemble, outperformed the original AllRank on NDCG@1, NDCG@5 and NDCG@10, although they were more costly in training and testing time. In conclusion, the research shows that applying sub-network ensembles to L2R models is a promising strategy, especially in scenarios where latency is not critical.
Thus, this work not only advances the state of the art in L2R, but also opens up new possibilities for improvements in effectiveness and efficiency, inspiring future research into the use of sub-network ensembles in L2R.

Item: Junções por similaridade aproximadas em espaços vetoriais densos [Approximate similarity joins in dense vector spaces] (Universidade Federal de Goiás, 2023-08-24)
Santana, Douglas Rolins de; Ribeiro, Leonardo Andrade; http://lattes.cnpq.br/4036932351063584; Ribeiro, Leonardo Andrade; Bedo, Marcos Vinicius Naves; Martins, Wellington Santos
Similarity Join is an operation that returns pairs of objects whose similarity is greater than or equal to a specified threshold, and is essential for tasks such as data cleaning, data mining, and data integration. A common approach is to represent the data as vectors, for example with the TF-IDF method, and to measure the similarity between vectors using the cosine function. However, computing the similarity for all pairs of vectors can be computationally prohibitive on large data sets. Traditional algorithms exploit the sparsity of vectors and apply filters to reduce the comparison space. Recently, advances in natural language processing have produced semantically richer vectors, improving result quality. However, these vectors differ from those generated by traditional methods in being dense and high-dimensional. Preliminary experiments demonstrated that L2AP, the best-known algorithm for similarity join, is not efficient for dense vector spaces. Due to the intrinsic characteristics of such vectors, approximate solutions based on specialized indices are predominant for dealing with large datasets. In this context, we investigate how to perform similarity joins using Hierarchical Navigable Small World (HNSW), a state-of-the-art graph-based index designed for approximate k-nearest neighbor (kNN) queries.
We explored the design space of possible solutions, ranging from alternatives built on top of HNSW to deeper integration of similarity join processing into this framework. The experiments demonstrated speedups of up to 2.48 and 3.47 orders of magnitude relative to the exact method and the baseline approach, respectively, while maintaining recall close to 100%.
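
To make the operation described in this abstract concrete, the following is a minimal sketch of an exact cosine similarity join over dense vectors, together with the recall measure used to judge an approximate result against it. It is not the dissertation's HNSW-based method: it is the quadratic all-pairs computation that the abstract describes as prohibitive on large data sets. The function names (`cosine`, `exact_similarity_join`, `recall`) and the toy vectors are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def exact_similarity_join(vectors, threshold):
    """Return all index pairs (i, j), i < j, with cosine similarity >= threshold.

    Every pair is compared, so the cost grows quadratically with the
    number of vectors.
    """
    pairs = []
    for (i, u), (j, v) in combinations(enumerate(vectors), 2):
        if cosine(u, v) >= threshold:
            pairs.append((i, j))
    return pairs


def recall(approx_pairs, exact_pairs):
    """Fraction of the exact join result that an approximate join recovered."""
    exact = set(exact_pairs)
    if not exact:
        return 1.0  # assumption: nothing to recover counts as full recall
    return len(exact & set(approx_pairs)) / len(exact)


# Toy data (hypothetical): the first two vectors point in nearly the same direction.
vectors = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
result = exact_similarity_join(vectors, threshold=0.9)
print(result)
```

An approximate method, such as one built on a kNN index, would return a set of pairs that can be passed to `recall` alongside the output of `exact_similarity_join` to check how much of the exact result it recovers.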