Junções por similaridade com expressões complexas em ambientes distribuídos (Similarity joins with complex expressions in distributed environments)
Date
2018-08-31
Publisher
Universidade Federal de Goiás
Abstract
A recurrent problem that degrades the quality of information in databases is the presence
of duplicates, i.e., multiple representations of the same real-world entity. Although
computationally expensive, similarity operations are fundamental to identifying
duplicates. Furthermore, real-world data is typically composed of several attributes, each
representing a distinct type of information. Complex similarity expressions are important
in this context because they allow the importance of each attribute to be taken into
account in the similarity evaluation. However, given the large volumes of data in
Big Data applications, it has become crucial to perform these operations in parallel and
distributed processing environments. To address these problems, which are highly relevant
to organizations, this work proposes a novel strategy for identifying duplicates in textual
data by using similarity joins with complex expressions in a distributed environment.
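As a rough illustration of what a complex similarity expression over several attributes can look like, the sketch below combines per-attribute Jaccard similarities with weights and keeps the record pairs whose combined score reaches a threshold. It is not the strategy proposed in the dissertation: the attribute names, the weights, the threshold, and the helper names (`jaccard`, `weighted_similarity`, `similarity_self_join`) are all hypothetical, and the join is a naive single-machine comparison of all pairs rather than a distributed algorithm.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the whitespace-token sets of two strings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty values are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical attribute weights (assumption): the title counts more than the venue.
WEIGHTS = {"title": 0.7, "venue": 0.3}


def weighted_similarity(r: dict, s: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of per-attribute similarities, one simple kind of complex expression."""
    return sum(w * jaccard(r[attr], s[attr]) for attr, w in weights.items())


def similarity_self_join(records: list, threshold: float, weights: dict = WEIGHTS) -> list:
    """Naive self-join: compare every pair of records and keep those at or above the threshold."""
    return [
        (i, j)
        for (i, r), (j, s) in combinations(enumerate(records), 2)
        if weighted_similarity(r, s, weights) >= threshold
    ]


# Toy data, invented for illustration only.
records = [
    {"title": "similarity joins in distributed environments", "venue": "ufg"},
    {"title": "similarity joins in distributed environment", "venue": "ufg"},
    {"title": "graph databases", "venue": "other"},
]

pairs = similarity_self_join(records, threshold=0.6)
print(pairs)  # [(0, 1)]
```

Comparing all pairs costs quadratic time in the number of records, which is the kind of expense that motivates running such joins in parallel and distributed environments.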
Citation
OLIVEIRA, D. J. C. Junções por similaridade com expressões complexas em ambientes distribuídos. 2018. 61 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Goiás, Goiânia, 2018.