Efficient processing of multiway spatial join queries in distributed systems
Date
2017-11-29
Publisher
Universidade Federal de Goiás
Abstract
Multiway spatial join is an important type of query in spatial data processing, and its
efficient execution is a requirement to move spatial data analysis to scalable platforms,
as has already happened with relational and unstructured data. In this thesis, we provide
a set of comprehensive models and methods to efficiently execute multiway spatial join
queries in distributed systems. We introduce a cost-based optimizer that is able to select a
good execution plan for processing such queries in distributed systems taking into account:
the partitioning of data based on the spatial attributes of datasets; the intra-operator level
of parallelism, which enables high scalability; and the economy of cluster resources by
appropriately scheduling the queries before execution. We propose a cost model based on
relevant metadata about the spatial datasets and the data distribution, which identifies the
pattern of costs incurred when processing a query in this environment. We formalize the
distributed multiway spatial join plan scheduling problem as a bi-objective linear integer
model, considering the minimization of both the makespan and the communication cost
as objectives. Three methods are proposed to compute schedules based on this model
that significantly reduce the resource consumption required to process a query. Although
targeting multiway spatial join query scheduling, these methods can be applied to other
kinds of problems in distributed systems, notably problems that require both the alignment
of data partitions and the assignment of jobs to machines. Additionally, we propose a
method to control the usage of resources and increase system throughput in the presence
of constraints on the network or processing capacity. The proposed cost-based optimizer
was able to select good execution plans for all queries in our experiments, using public
datasets with a significant range of sizes and complex spatial objects. We also present an
execution engine that is capable of performing the queries with near-linear scalability with
respect to execution time.
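
To make the two scheduling objectives named in the abstract concrete, the following is a minimal sketch, not the thesis's model or any of its three methods. It assumes a toy instance in which each job reads some data partitions, each partition lives on one machine, and a partition read from another machine is charged its full size. All names (`evaluate`, `pareto_schedules`, the instance dictionaries) are hypothetical, and exhaustive enumeration stands in for a real solver, so it only suits very small inputs.

```python
from itertools import product


def evaluate(assignment, job_cost, partition_home, partition_size, job_inputs):
    """Return (makespan, communication_cost) for one job-to-machine assignment."""
    load = {}
    comm = 0
    for job, machine in assignment.items():
        # Makespan side: accumulate processing cost per machine.
        load[machine] = load.get(machine, 0) + job_cost[job]
        # Communication side: pay for every input partition stored elsewhere
        # (simplification: a partition is charged once per job that reads it).
        for p in job_inputs[job]:
            if partition_home[p] != machine:
                comm += partition_size[p]
    return max(load.values()), comm


def pareto_schedules(jobs, machines, **instance):
    """Enumerate all assignments and keep the non-dominated ones."""
    scored = []
    for choice in product(machines, repeat=len(jobs)):
        assignment = dict(zip(jobs, choice))
        scored.append((evaluate(assignment, **instance), assignment))
    front = []
    for score, assignment in scored:
        dominated = any(
            other[0] <= score[0] and other[1] <= score[1] and other != score
            for other, _ in scored
        )
        if not dominated:
            front.append((score, assignment))
    return front


# Hypothetical toy instance: two jobs, two machines, three partitions.
front = pareto_schedules(
    jobs=["j1", "j2"],
    machines=["m1", "m2"],
    job_cost={"j1": 4, "j2": 3},
    partition_home={"a": "m1", "b": "m1", "c": "m2"},
    partition_size={"a": 2, "b": 5, "c": 1},
    job_inputs={"j1": ["a", "b"], "j2": ["b", "c"]},
)
for (makespan, comm), assignment in front:
    print(makespan, comm, assignment)
```

In this toy instance, running both jobs on `m1` minimizes data movement but serializes the work, while splitting them across machines shortens the makespan at the price of moving partition `b`. Neither schedule dominates the other, which is the trade-off a bi-objective formulation has to expose.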
Citation
OLIVEIRA, Thiago Borges de. Efficient processing of multiway spatial join queries in distributed systems. 2017. 156 f. Tese (Doutorado em Ciência da Computação em Rede) - Universidade Federal de Goiás, Goiânia, 2017.