Algoritmos de junção por similaridade sobre fluxo de dados
Date
2020-07-21
Publisher
Universidade Federal de Goiás
Abstract
In today's Big Data era, data is generated and collected at high speed, which imposes strict performance and
memory requirements on its processing. In addition, the presence of heterogeneous data demands the use of
similarity operations, which are computationally more expensive than exact-match operations. In this context,
the present work investigates the problem of performing similarity joins over a continuous stream of data
represented by sets. The concept of temporal similarity is employed, where the similarity between two data
items decreases with the distance between their arrival times. The proposed algorithms directly incorporate
this concept to reduce both the comparison space and memory consumption. Moreover, a new technique based on
the partial frequency of the data elements is presented to substantially reduce processing cost. The
experimental evaluation shows that the presented techniques provide substantial performance gains and
good memory usage.
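To make the idea of temporal similarity concrete, here is a minimal Python sketch. It is not the dissertation's algorithm: it assumes Jaccard similarity between sets, an exponential decay exp(-decay * |t1 - t2|) over arrival times, and a stream ordered by arrival time. The names jaccard, temporal_similarity, stream_join, threshold and decay are hypothetical and chosen only for illustration. Because Jaccard similarity never exceeds 1, any pair whose arrival times differ by more than ln(1/threshold)/decay cannot reach the threshold, so older sets can be dropped from memory. This is one way a time-dependent similarity can bound both the comparison space and memory use.

```python
import math
from collections import deque


def jaccard(a, b):
    """Plain Jaccard similarity between two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def temporal_similarity(a, ta, b, tb, decay):
    """Jaccard similarity damped by the arrival-time distance.

    The exponential decay is an assumption made for this sketch.
    """
    return jaccard(a, b) * math.exp(-decay * abs(ta - tb))


def stream_join(stream, threshold, decay):
    """Yield (older_id, newer_id, similarity) for every qualifying pair.

    `stream` is an iterable of (arrival_time, set) pairs in arrival order;
    ids are positions in the stream. Requires 0 < threshold <= 1 and decay > 0.
    """
    # Beyond this time distance even identical sets fall below the threshold.
    horizon = math.log(1.0 / threshold) / decay
    window = deque()  # (id, arrival_time, set), oldest first

    for new_id, (t, items) in enumerate(stream):
        # Evict sets that can no longer match anything arriving from now on.
        while window and t - window[0][1] > horizon:
            window.popleft()
        # Compare the new set only against the sets still in the window.
        for old_id, old_t, old_items in window:
            sim = temporal_similarity(old_items, old_t, items, t, decay)
            if sim >= threshold:
                yield old_id, new_id, sim
        window.append((new_id, t, items))


if __name__ == "__main__":
    example = [(0.0, {1, 2, 3}), (1.0, {1, 2, 3}), (100.0, {1, 2, 3})]
    for pair in stream_join(example, threshold=0.5, decay=0.1):
        print(pair)
```

In this example the third set is identical to the first two, but it arrives so much later that both earlier sets have already been evicted, so it is never compared against them.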
Citation
PACÍFICO, L. O. Algoritmos de junção por similaridade sobre fluxo de dados. 2020. 51 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Goiás, Goiânia, 2020.