Mestrado em Ciência da Computação (INF)

Permanent URI for this collection


Recent Submissions

Now showing 1 - 20 of 302
  • Item
    Avaliação de Grandes Modelos de Linguagem para Classificação de Documentos Jurídicos em Português
    (Universidade Federal de Goiás, 2024-11-26) Santos, Willgnner Ferreira; Oliveira, Sávio Salvarino Teles de; http://lattes.cnpq.br/1905829499839846; Galvão Filho, Arlindo Rodrigues; http://lattes.cnpq.br/7744765287200890; Galvão Filho, Arlindo Rodrigues; Oliveira, Sávio Salvarino Teles de; Fanucchi, Rodrigo Zempulski; Soares, Anderson da Silva
    The increasing procedural demand in judicial institutions has caused a workload overload, impacting the efficiency of the legal system. This scenario, exacerbated by limited human resources, highlights the need for technological solutions to streamline the processing and analysis of documents. In light of this reality, this work proposes a pipeline for automating the classification of these documents, evaluating four methods of representing legal texts at the pipeline’s input: original text, summaries, centroids, and document descriptions. The pipeline was developed and tested at the Public Defender’s Office of the State of Goiás (DPE-GO). Each approach implements a specific strategy to structure the input texts, aiming to enhance the models’ ability to interpret and classify legal documents. A new Portuguese dataset was introduced, specifically designed for this application, and the performance of Large Language Models (LLMs) was evaluated in classification tasks. The analysis results demonstrate that the use of summaries improves classification accuracy and maximizes the F1-score, optimizing the use of LLMs by reducing the number of tokens processed without compromising precision. These findings highlight the impact of textual representations of documents and the potential of LLMs for the automatic classification of legal documents, as in the case of DPE-GO. The contributions of this work indicate that the application of LLMs, combined with optimized textual representations, can significantly increase the productivity and quality of services provided by judicial institutions, promoting advancements in the overall efficiency of the legal system.
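    The token-saving effect of classifying from summaries rather than full texts can be sketched as below. All names (`build_prompt`, `summarize`, the label set) are illustrative stand-ins, not the actual DPE-GO pipeline or its LLM calls.

```python
# Sketch: a summary-based representation shrinks the classification prompt
# (fewer tokens) while the label set stays the same. Illustrative names only.

def word_count(text):
    # Crude proxy for the token count billed by an LLM API.
    return len(text.split())

def build_prompt(representation, labels):
    return ("Classify the following legal document as one of: "
            + ", ".join(labels) + "\n\n" + representation)

def summarize(text, max_words=25):
    # Stand-in for an abstractive summarizer: keep only the first words.
    return " ".join(text.split()[:max_words])

document = " ".join(["clause"] * 400)  # long synthetic legal document
labels = ["petition", "ruling", "power of attorney"]

full_prompt = build_prompt(document, labels)
short_prompt = build_prompt(summarize(document), labels)
```

    In the real pipeline the summary would come from an LLM as well, but the amortized cost is lower because every downstream classification call processes far fewer tokens.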
  • Item
    Em Busca do Estado da Arte e da Prática sobre Schema Matching na Indústria Brasileira - Resultados Preliminares de uma Revisão de Literatura e uma Pesquisa de Opinião
    (Universidade Federal de Goiás, 2024-09-02) Borges, Ricardo Henricki Dias; Ribeiro, Leonardo Andrade; http://lattes.cnpq.br/4036932351063584; Graciano Neto, Valdemar Vicente; http://lattes.cnpq.br/9864803557706493; Graciano Neto, Valdemar Vicente; Ribeiro, Leonardo Andrade; Frantz, Rafael Zancan; Galvao Filho, Arlindo Rodrigues
    The integration of systems and interoperability between different databases are critical challenges in information technology, mainly due to the diversity of data schemas. The schema matching technique is essential for unifying these schemas, facilitating research, analysis, and knowledge discovery. This dissertation investigates the application of schema matching in the Brazilian software industry, focusing on understanding the reasons for its low adoption. The research included a systematic mapping of the use of Artificial Intelligence (AI) algorithms and similarity techniques in schema matching, as well as a survey with 35 professionals in the field. The results indicate that, although schema matching offers significant improvements in data integration processes, such as reducing time and increasing accuracy, most professionals are unfamiliar with the term, even among those who use similar tools. The low adoption of these techniques can be attributed to the lack of free or open source tools and the absence of implementation plans within companies. The dissertation highlights the need for initiatives that overcome these barriers, empower professionals, and promote broader use of schema matching in the Brazilian industry.
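    As a rough illustration of the similarity techniques the mapping surveys, a minimal name-based schema matcher can be written with Python's standard library. Real matchers also exploit data types, instances, and structure; this is only a sketch with made-up column names.

```python
from difflib import SequenceMatcher

def match_schemas(cols_a, cols_b, threshold=0.6):
    """Greedy name-based schema matching: pair each column of schema A with
    its most similar column of schema B when similarity exceeds a threshold."""
    matches = {}
    for a in cols_a:
        best, score = None, threshold
        for b in cols_b:
            s = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if s > score:
                best, score = b, s
        if best is not None:
            matches[a] = best
    return matches
```

    For example, `match_schemas(["customer_name"], ["cust_name", "address"])` pairs `customer_name` with `cust_name`, while dissimilar names are left unmatched for a human to review.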
  • Item
    Legal Domain Adaptation in Portuguese Language Models - Developing and Evaluating RoBERTa-based Models on Legal Corpora
    (Universidade Federal de Goiás, 2024-05-28) Garcia, Eduardo Augusto Santos; Lima, Eliomar Araújo de; http://lattes.cnpq.br/1362170231777201; Silva, Nádia Félix Felipe da; http://lattes.cnpq.br/7864834001694765; Silva, Nádia Félix Felipe da; Lima, Eliomar Araújo de; Soares, Anderson da Silva; Placca, José Avelino
    This research investigates the application of Natural Language Processing (NLP) within the legal domain for the Portuguese language, emphasizing the importance of domain adaptation for pre-trained language models, such as RoBERTa, using specialized legal corpora. We compiled and pre-processed a Portuguese legal corpus, named LegalPT, addressing the challenges of high near-duplicate document rates in legal corpora and conducting a comparison with generic web-scraped corpora. Experiments with these corpora revealed that pre-training on a combined dataset of legal and general data resulted in a more effective model for legal tasks. Our model, called RoBERTaLexPT, outperformed larger models trained solely on generic corpora, such as BERTimbau and Albertina-PT-*, and other legal models from similar works. For evaluating the performance of these models, we propose in this Master’s dissertation a legal benchmark composed of several datasets, including LeNER-Br, RRI, FGV, UlyssesNER-Br, CEIA-Entidades, and CEIA-Frases. This study contributes to the improvement of NLP solutions in the Brazilian legal context by openly providing enhanced models, a specialized corpus, and a rigorous benchmark suite.
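    The near-duplicate problem mentioned above is commonly handled by comparing character shingles of documents. A minimal Jaccard-based sketch (not the deduplication actually used for LegalPT, which operates at much larger scale) looks like this:

```python
def shingles(text, k=5):
    """Set of overlapping k-character substrings, after whitespace folding."""
    t = " ".join(text.lower().split())
    return {t[i:i + k] for i in range(max(1, len(t) - k + 1))}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

def near_duplicates(docs, threshold=0.8):
    """All-pairs scan flagging document pairs above a Jaccard threshold.
    Quadratic; large corpora would use MinHash/LSH instead."""
    sh = [shingles(d) for d in docs]
    return [(i, j)
            for i in range(len(docs))
            for j in range(i + 1, len(docs))
            if jaccard(sh[i], sh[j]) >= threshold]
```

    Legal corpora hit the threshold often because boilerplate clauses repeat across documents, which is exactly why deduplication matters before pre-training.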
  • Item
    Análise de um Fluxo Completo Automatizado de Etapas Voltado ao Reconhecimento de Texto em Imagens de Prescrições Médicas Manuscritas
    (Universidade Federal de Goiás, 2024-01-10) Corrêa, André Pires; Lima, Eliomar Araújo de; http://lattes.cnpq.br/1362170231777201; Nascimento, Hugo Alexandre Dantas do; http://lattes.cnpq.br/2920005922426876; Nascimento, Hugo Alexandre Dantas do; Costa, Ronaldo Martins da; Pedrini, Hélio; Lima, Eliomar Araújo de
    Compounding pharmacies deal with large volumes of medical prescriptions on a daily basis, whose data needs to be manually inputted into information management systems to properly process their customers’ orders. A considerable portion of these prescriptions tend to be written by doctors with poorly legible handwriting, which can make decoding them an arduous and time-consuming process. Previous works have investigated the use of machine learning for medical prescription recognition. However, the accuracy rates in these works are still fairly low and their approaches tend to be rather limited, as they typically utilize small datasets, focus only on specific steps of the automated analysis pipeline or use proprietary tools, which makes it difficult to replicate and analyse their results. The present work contributes towards filling this gap by presenting an end-to-end process for automated data extraction from handwritten medical prescriptions, from text segmentation to recognition and post-processing. The approach was built based on an evaluation and adaptation of multiple existing methods for each step of the pipeline. The methods were evaluated on a dataset of 993 images of medical prescriptions with 27,933 annotated words, produced with the support of a compounding pharmacy that participated in the project. The results obtained by the best performing methods indicate that the developed approach is reasonably effective, reaching an accuracy of 68% in the segmentation step, and a character accuracy rate of 86.8% in the text recognition step.
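    The character accuracy rate reported for the recognition step is conventionally derived from the Levenshtein (edit) distance between predicted and reference transcriptions. A self-contained sketch of that metric:

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions to turn a into b
    (standard dynamic-programming edit distance, O(len(a)*len(b)))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def char_accuracy(reference, hypothesis):
    """Character accuracy rate = 1 - CER, clipped at zero."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return max(0.0, 1 - levenshtein(reference, hypothesis) / len(reference))
```

    For instance, recognizing "amoxicillin 500mg" as "amoxicilin 500mg" (one dropped letter) gives a character accuracy of 16/17, about 94%.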
  • Item
    Análise de técnicas de ajuste fino em classificação de texto
    (Universidade Federal de Goiás, 2024-09-23) Pires, Tobias Gonçalves; Soares, Anderson da Silva; Soares, Anderson da Silva; Fanucchi, Rodrigo Zempulski; Galvão Filho, Arlindo Rodrigues
    Natural Language Processing (NLP) aims to develop models that enable computers to understand, interpret, process and generate text in a way similar to human communication. The last decade has seen significant advances in the field, with the introduction of deep neural network models, and the subsequent evolution of the architecture of these models such as the attention mechanism and the Transformers architecture, culminating in language models such as ELMo, BERT and GPT. Later, models called Large Language Models (LLMs) improved the ability to understand and generate texts in a sophisticated way. Pre-trained models offer the advantage of reusing knowledge accumulated from vast datasets, although specific fine-tuning is required for individual tasks. However, training and tuning these models consumes a lot of processing resources, making it unfeasible for many organizations due to high costs. In resource-constrained environments, efficient fine-tuning techniques such as LoRA (Low-Rank Adaptation) were developed to optimize the model adaptation process, minimizing the number of adjustable parameters and avoiding overfitting. These techniques allow for faster and more economical training, while maintaining the robustness and generalization of the models. This work evaluates three efficient fine-tuning techniques (LoRA, AdaLoRA and IA3), in addition to full fine-tuning, in terms of memory consumption, training time and accuracy, using the DistilBERT, RoBERTa-base and TinyLlama models on different datasets (AG News, IMDb and SNLI).
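    At its core, LoRA freezes a pre-trained weight matrix W and trains only a low-rank update, so the effective weight is W + (alpha/r)·BA, with r·(d_in + d_out) trainable parameters instead of d_in·d_out. A dependency-free sketch of that update (illustrative only, not the PEFT library API used in practice):

```python
def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_effective_weight(W, A, B, alpha, r):
    """W + (alpha/r) * B @ A — the LoRA update.
    W is frozen (out x in); only A (r x in) and B (out x r) are trained."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(wr, dr)]
            for wr, dr in zip(W, delta)]
```

    With, say, d_in = d_out = 768 and r = 8, the adapter trains 12,288 parameters per matrix instead of 589,824, which is where the memory savings measured in the dissertation come from.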
  • Item
    Transcrição automática de sons polifônicos de guitarra na notação de tablaturas utilizando classificação temporal conexionista
    (Universidade Federal de Goiás, 2024-09-23) Gris, Lucas Rafael Stefanel; Soares, Anderson da Silva; http://lattes.cnpq.br/1096941114079527; Soares, Anderson da Silva; Laureano, Gustavo Teodoro; Barbosa, Yuri de Almeida Malheiros
    Automatic Guitar Transcription, a branch of Automatic Musical Transcription, is a task with great applicability for musicians of fretted instruments such as the electric guitar and acoustic guitar. Often, musicians on these instruments transcribe or read songs and musical pieces in tablature format, a notation widely used for this type of instrument. Despite its relevance, this annotation is still done manually, making it a very complex process, even for experienced musicians. In this context, this work proposes the use of artificial intelligence to develop models capable of performing the task of transcribing polyphonic guitar sounds automatically. In particular, this work investigates the use of a specific method called Connectionist Temporal Classification (CTC), an algorithm that can be used to train sequence classification models without the need for alignment, a fundamental aspect for training more robust models, as there are few openly available datasets. Additionally, this work investigates multi-task learning for note prediction alongside tablature prediction, achieving significant improvements over conventional learning. Overall, the results indicate that the use of CTC is very promising for tablature transcription, showing only a 14.28% relative decrease compared to the result obtained with aligned data.
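    CTC sidesteps frame-level alignment by introducing a blank symbol and allowing repeated labels, which are collapsed at decoding time. The standard best-path decoding step can be sketched as follows (integer labels stand in for string/fret tokens):

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """CTC best-path decoding: collapse consecutive repeats, then drop blanks.
    A blank between two equal labels (e.g. 3, blank, 3) keeps both notes."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out
```

    This is why CTC-trained models need no per-frame annotation: any frame sequence that collapses to the target label sequence counts as correct during training.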
  • Item
    Uma abordagem de alinhamento de minimapa usando informações de observação de pontos intrínsecos aos sistemas de SLAM Visual baseado em características
    (Universidade Federal de Goiás, 2024-10-04) Saraiva, Felipe Pires; Laureano, Gustavo Teodoro; http://lattes.cnpq.br/4418446095942420; Laureano, Gustavo Teodoro; Tarallo, André de Souza; Costa, Ronaldo Martins da
    This work proposes a map alignment approach based on a modification of the Iterative Closest Point (ICP) algorithm to consider point estimation confidence metrics already available in feature-based Visual SLAM systems. Mini-map alignment, in the context of a hierarchical map composed of several local representations of the environment, is an important task to allow the relationship of metric information between them, usually performed through the registration of the point clouds of each map. ICP is a widely used method in the literature for point cloud registration, but in the originally proposed form it does not consider the uncertainty of the points, and can be sensitive to noise, outliers and the initial estimate of the transformation. Feature-based Visual SLAM methods produce information intrinsic to the way they are modeled, which can represent the confidence of the map points and can be used to improve the alignment process. This research enumerates three possible SLAM metrics that can be used to represent the confidence of a map landmark, and investigates the potential of using these metrics to improve the ICP algorithm. The confidence metrics are incorporated into the ICP through a simple change in the correspondence estimation step to find the point with the highest confidence in a neighborhood of k nearest points. Experiments are conducted in different cases of initial misalignment to evaluate the influence of the confidence information suggested in this work, comparing the error of the final alignment of the point clouds and the number of iterations to achieve this alignment. The results show evidence that the use of confidence can help to improve the convergence of the ICP, both in the error of the final configuration and in the number of iterations required to achieve it.
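    The modification described above changes only the correspondence step of ICP: instead of taking the single nearest neighbor, the point with the highest confidence among the k nearest is chosen. A minimal sketch of that step (hypothetical function name, plain Python; a real system would use a k-d tree):

```python
def correspond(point, targets, confidences, k=3):
    """Confidence-aware ICP correspondence: among the k nearest target
    points, return the index of the one with the highest confidence."""
    dist2 = [(sum((p - q) ** 2 for p, q in zip(point, t)), i)
             for i, t in enumerate(targets)]
    nearest = sorted(dist2)[:k]                      # k closest candidates
    return max(nearest, key=lambda di: confidences[di[1]])[1]
```

    With k = 1 this degenerates to standard ICP; larger k trades a little locality for robustness against low-confidence (noisy) landmarks.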
  • Item
    Integração do modelo de referência Multi-Access Edge Computing (MEC) com o Núcleo 5G
    (Universidade Federal de Goiás, 2022-10-21) Xavier, Rúben França; Freitas, Leandro Alexandre; http://lattes.cnpq.br/7450982711522425; Oliveira Junior, Antonio Carlos de; http://lattes.cnpq.br/3148813459575445; Oliveira Junior, Antonio Carlos de; Freitas, Leandro Alexandre; Venâncio Neto, Augusto José; Figueiredo, Fabricio Lira
    Multi-access Edge Computing (MEC) is the key concept for developing new applications and services that bring the benefits of edge computing to networks and users. With applications and services at the edge, that is, closer to users and devices, it will be possible to use features such as ultra-low latency, high bandwidth and reduced resource consumption. However, for this to be possible, it is necessary to integrate the MEC framework and the 5G core. To this end, this work proposes the specification of a service that extends the Multi-Access Traffic Steering (MTS) API as a bridge for the connection between MEC and the 5G Core. Free5GC and a Kubernetes cluster were used to simulate an end-to-end environment. The proposed service was validated using the aforementioned tools, with end-to-end scenarios and also in scenarios with a large volume of users. The validation demonstrated that the service solves the presented problem, in addition to demonstrating its viability in use cases.
  • Item
    Um modelo de alocação de recursos de rede e cache na borda para vídeo em fluxo contínuo armazenado e 360º
    (Universidade Federal de Goiás, 2024-09-04) Oliveira, Gustavo Dias de; Cardoso, Kleber Vieira; http://lattes.cnpq.br/0268732896111424; Correa, Sand Luz; http://lattes.cnpq.br/3386409577930822; Cardoso, Kleber Vieira; Cerqueira, Eduardo Coelho; Oliveira Júnior, Antonio Carlos de
    The advancement of immersive technologies, such as Augmented Reality (AR) and Virtual Reality (VR), has introduced significant challenges in the transmission of 360-degree videos, due to the increasing bandwidth and low latency requirements resulting from the large size of video frames used in these technologies. At the same time, video streaming consumption has grown exponentially, driven by technological advances and the widespread use of Internet-connected devices. Efficient transmission of 360-degree videos faces challenges such as the need for up to five times more bandwidth than that required for conventional high-definition video transmissions, as well as stricter latency constraints. Strategies such as video projection slicing and transmitting only the user’s field of view, along with efficient network resource allocation, have been explored to overcome these limitations. To address these challenges, we propose DTMCash, which stands out by using dynamic tiles and combining users’ viewports, effectively tackling transmission in multi-user scenarios. The goal of this work is to develop a model for network and edge cache resource allocation for 360-degree video transmission, focusing on the optimization of these resources. To validate the proposed model, we initially conducted comparative experiments with 6 users, later expanding to 30 users. We also tested performance with different cache sizes and experiments varying user entry times, in addition to evaluating the transmission of different video content. Compared to a state-of-the-art solution, our proposal reduced the aggregate bandwidth consumption of the Internet link by at least 48.2%, while maintaining the same consumption on the wireless link and providing greater efficiency in cache usage.
  • Item
    Um catálogo de padrões de requisitos de privacidade baseado na lei geral de proteção de dados pessoais
    (Universidade Federal de Goiás, 2024-03-18) Carneiro, Cinara Gomes de Melo; Kudo, Taciana Novo; http://lattes.cnpq.br/7044035224784132; Bulcão Neto, Renato de Freitas; http://lattes.cnpq.br/5627556088346425; Bulcão Neto, Renato de Freitas; Vincenzi, Auri Marcelo Rizzo; Alencar, Wanderley de Souza
    [Context] Currently, Brazilian companies are concerned about protecting the personal data of their customers and employees to ensure the privacy of these individuals. This concern arises from the fact that personal data protection is an obligation imposed by the General Data Protection Law (LGPD). Since most organizations store this data digitally to carry out various operations, software must comply with the current legislation. [Problem] According to recent research, a large portion of professionals in the software industry do not have comprehensive knowledge of privacy requirements or the LGPD. [Objective] The objective of this work is to construct and evaluate a Catalog of Privacy Requirement Patterns (CPRP) based on the LGPD. [Method] A method for syntactic analysis of the articles composing the LGPD was defined to extract privacy requirements. These requirements were converted into requirement patterns (RP) using a method for constructing RP catalogs based on the grammar of the Software Pattern Metamodel (SoPaMM), with the support of the Terminal Model Editor (TMed) tool. Finally, two experts in LGPD and Software Engineering evaluated the completeness and correctness of the developed catalog concerning the legislation. [Contributions] The conversion of legal requirements into privacy RPs can assist professionals in eliciting and specifying requirements, as privacy requirements can be reused in various contexts with minor or no modifications.
  • Item
    Classificação de documentos da administração pública utilizando inteligência artificial
    (Universidade Federal de Goiás, 2024-04-30) Carvalho, Rogerio Rodrigues; Costa, Ronaldo Martins da; http://lattes.cnpq.br/7080590204832262; Costa, Ronaldo Martins da; Souza, Rodrigo Gonçalves de; Silva, Nádia Félix Felipe da
    Public organizations face difficulties in classifying and promoting transparency of the numerous documents produced during the execution of their activities. Correct classification of documents is critical to prevent public access to sensitive information and protect individuals and organizations from malicious use. This work proposes two approaches to perform the task of classifying sensitive documents, using state-of-the-art artificial intelligence techniques and best practices found in the literature: a conventional method, which uses artificial intelligence techniques and regular expressions to analyze the textual content of documents, and an alternative method, which employs the CBIR technique to classify documents when text extraction is not viable. Using real data from the Electronic Information System (SEI) of the Federal University of Goiás (UFG), the results achieved demonstrated that the application of regular expressions as a preliminary check can improve the computational efficiency of the classification process, despite showing a modest increase in classification precision. The conventional method proved to be effective in document classification, with the BERT model standing out for its performance with an accuracy rate of 94%. The alternative method, in turn, offered a viable solution for challenging scenarios, showing promising results with an accuracy rate of 87% in classifying public documents.
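    A regex pre-check of the kind described above can be illustrated with a couple of patterns for personally identifying strings. The patterns below are illustrative examples (a CPF-shaped number and an e-mail address), not the expressions actually used in the work:

```python
import re

# Illustrative sensitive-data patterns; the work's actual expressions
# are not reproduced here.
SENSITIVE_PATTERNS = {
    "cpf": re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b"),   # Brazilian CPF format
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def regex_precheck(text):
    """Cheap first pass: list the pattern names found in a document before
    invoking the (much costlier) neural classifier."""
    return [name for name, pat in SENSITIVE_PATTERNS.items() if pat.search(text)]
```

    Documents that trigger no pattern can still be sent to the model; the point is that many obviously sensitive documents are caught without running inference at all.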
  • Item
    Detecção de posicionamento do cidadão em Projetos de Lei
    (Universidade Federal de Goiás, 2024-03-22) Maia, Dyonnatan Ferreira; Silva, Nádia Félix Felipe da; http://lattes.cnpq.br/7864834001694765; Silva, Nádia Félix Felipe da; Pereira, Fabíola Souza Fernande; Fernandes, Deborah Silva Alves
    Background: Comments on political projects on the internet reflect the aspirations of a significant portion of the population. The automatic stance detection of these comments regarding specific topics can help better understand public opinion. This study aims to develop a computational model with supervised learning capable of estimating the stance of comments on legislative propositions, considering the challenge of diversity and the constant emergence of new bills. Method: For the domain studied, a specific corpus was constructed by collecting comments from surveys available on the Chamber of Deputies website. The experiments included the evaluation of classic machine learning models, such as Logistic Regression, Naive Bayes, Support Vector Machine, Random Forest, and Multilayer Perceptron, in addition to the fine-tuning of BERT language models. Automatic data annotation was also performed using the zero-shot approach based on prompts from the generative GPT-3.5 model, aiming to overcome the difficulties related to human annotation and the scarcity of annotated data, generating approximately three times the size of the manually annotated corpus. Results: The results indicate that the adjusted BERTimbau model surpassed the classic approaches, achieving an average F1-score of 70.4% on unseen topics. Moreover, the application of automatically annotated data in the initial stage of BERTimbau fine-tuning resulted in performance improvement, reaching an F1-score of 73.3%. The results present deep learning models as options with positive performance for the task under the conditions of this domain. Conclusion: It was observed that the ability to generate contextualized representations, along with the number of topics and comments trained, can directly interfere with performance. This makes automatic annotation and the exploration of topic diversity with Transformer architectures promising approaches for the task.
  • Item
    A comparative study of text classification techniques for hate speech detection
    (Universidade Federal de Goiás, 2022-01-27) Silva, Rodolfo Costa Cezar da; Rosa, Thierson Couto; http://lattes.cnpq.br/4414718560764818; Rosa, Thierson Couto; Moura, Edleno Silva de; Silva, Nádia Félix Felipe da
    The dissemination of hate speech on the Internet, especially on social media platforms, has been a serious and recurrent problem. In the present study, we compare eleven methods for classifying hate speech, including traditional machine learning methods, neural network-based approaches and transformers, as well as their combination with eight techniques to address the class imbalance problem, which is a recurrent issue in hate speech classification. The data transformation techniques we investigated include data resampling techniques and a modification of a technique based on compound features (c_features). All models have been tested on seven datasets with varying specificity, following a rigorous experimentation protocol that includes cross-validation and the use of appropriate evaluation metrics, as well as validation of the results through appropriate statistical tests for multiple comparisons. To our knowledge, there is no broader comparative study in data enhancing techniques for hate speech detection, nor any work that combines data resampling techniques with transformers. Our extensive experimentation, based on over 2,900 measurements, reveals that most data resampling techniques are ineffective at enhancing the effectiveness of classifiers, with the exception of ROS, which improves most classification methods, including the transformers. For the smallest dataset, ROS provided gains of 60.43% and 33.47% for BERT and RoBERTa, respectively. The experiments revealed that c_features improved all classification methods that they could be combined with. The compound features technique provided satisfactory gains of up to 7.8% for SVM. Finally, we investigate cost-effectiveness for a few of the best classification methods. This analysis provided confirmation that the traditional method Logistic Regression (LR) combined with the use of c_features can provide great effectiveness with low overhead in all datasets considered.
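    ROS (random oversampling) simply duplicates minority-class examples at random until class sizes match. A seed-controlled sketch of the idea (analogous to imbalanced-learn's RandomOverSampler, but not its API):

```python
import random

def random_oversample(texts, labels, seed=42):
    """Plain ROS: duplicate randomly chosen minority-class examples until
    every class has as many examples as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for t, y in zip(texts, labels):
        by_class.setdefault(y, []).append(t)
    target = max(len(v) for v in by_class.values())
    out_texts, out_labels = [], []
    for y, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for t in items + extra:
            out_texts.append(t)
            out_labels.append(y)
    return out_texts, out_labels
```

    Because ROS only repeats existing texts, it is applicable to transformers as-is, unlike synthetic techniques (e.g. SMOTE) that interpolate feature vectors.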
  • Item
    Implementação de princípios de gamificação adaptativa em uma aplicação mHealth
    (Universidade Federal de Goiás, 2023-08-25) Anjos, Filipe Maciel de Souza dos; Carvalho, Sergio Teixeira de; http://lattes.cnpq.br/2721053239592051; Carvalho, Sergio Teixeira de; Mata, Luciana Regina Ferreira da; Berretta, Luciana de Oliveira
    This work describes the implementation of a gamified mHealth application called IUProst for the treatment of urinary incontinence through the performance of pelvic exercises for men who have undergone prostate removal surgery. The development of the application followed the guidelines of Framework L, designed to guide the creation of gamified mHealth applications. The initial version of IUProst was exclusively focused on the self-care dimension of Framework L and was released in November 2022. It was used by hundreds of users seeking the treatment provided by the application. Subsequently, the Gamification dimension of Framework L was employed to gamify IUProst. During the process of implementing game elements, it was noted that there were no clear definitions of how to implement the components to allow for gamification adaptation based on user profiles. To address this gap, an implementation model for gamification components was developed to guide developers in creating gamification that could adapt to the user profile dynamics proposed by the adaptive gamification of Framework L. Therefore, the contributions of this research include delivering a gamified mHealth application, analyzing usage data generated by the gamified application, and providing an implementation model for game components that were incorporated into Framework L, enabling the use of components in the context of adaptive gamification. The gamified version of IUProst was published in July 2023 and was used for 30 days until the writing of this dissertation. The results obtained demonstrate that during the gamified month, patients performed approximately 2/3 more exercises compared to the previous two months, reaching 61% of the total exercises performed during the three months analyzed. The data confirmed the hypothesis that game components indeed contribute to patient engagement with the application and also highlighted areas for improvement in the mHealth application.
  • Item
    Uma arquitetura para integração de dispositivos e coleta de dados em jogos sérios multimodais
    (Universidade Federal de Goiás, 2023-08-25) Zacca, Flavio Augusto Glapinski; Carvalho, Sérgio Teixeira de; http://lattes.cnpq.br/2721053239592051; Carvalho, Sérgio Teixeira De; Silvestre, Bruno Oliveira; Copetti, Alessandro
    This dissertation addresses the development of an architecture for integrating devices and collecting data in multimodal serious games. However, integrating devices and collecting data in multimodal serious games present technical and scientific challenges to overcome. The research problem consists of identifying and analyzing difficulties in sharing information among heterogeneous devices, the absence of common communication protocols, limitations in utilizing data in other applications, and the need for an architecture enabling the collection, filtering, processing, and provisioning of treated data for serious games. The general objective of the research is to develop a solution allowing the integration of devices and data collection in multimodal serious games, aiming for efficient and transparent provisioning of treated data. The research is grounded in theoretical studies, analysis of related works, requirements gathering, and the implementation of this architecture. Practical studies and experiments were conducted to assess the efficiency and viability of the proposed architecture, using multimodal games developed by the research group, such as Salus Cyber Ludens and Cicloexergame, as use cases. The obtained results demonstrated the effectiveness of the architecture in handling and leveraging data from devices, contributing to the advancement of the field of multimodal serious games. In conclusion, the proposed architecture represents a promising solution for integrating devices and collecting data in multimodal serious games. Its application can benefit areas such as health, education, and industry, expanding interaction possibilities and promoting advancements in the field of multimodal serious games.
  • Item
    Estudo comparativo de comitês de sub-redes neurais para o problema de aprender a ranquear
    (Universidade Federal de Goiás, 2023-09-01) Ribeiro, Diogo de Freitas; Sousa, Daniel Xavier de; http://lattes.cnpq.br/4603724338719739; Rosa, Thierson Couto; http://lattes.cnpq.br/4414718560764818; Rosa, Thierson Couto; Sousa, Daniel Xavier de; Canuto, Sérgio Daniel Carvalho; Martins, Wellington Santos
    Learning to Rank (L2R) is a sub-area of Information Retrieval that aims to use machine learning to optimize the positioning of the most relevant documents in the answer ranking to a specific query. Until recently, the LambdaMART method, which corresponds to an ensemble of regression trees, was considered state-of-the-art in L2R. However, the introduction of AllRank, a deep learning method that incorporates self-attention mechanisms, has overtaken LambdaMART as the most effective approach for L2R tasks. This study explored the effectiveness and efficiency of sub-network ensembles as a method complementary to the self-attention used in AllRank. Different methods for forming sub-network ensembles, such as Multi-Sample Dropout, Multi-Sample Dropout (Training and Testing), BatchEnsemble and Masksembles, were implemented and tested on two standard data collections: MSLR-WEB10K and YAHOO!. The results of the experiments indicated that some of these ensemble approaches, specifically Masksembles and BatchEnsemble, outperformed the original AllRank in metrics such as NDCG@1, NDCG@5 and NDCG@10, although they were more costly in terms of training and testing time. In conclusion, the research reveals that the application of sub-network ensembles in L2R models is a promising strategy, especially in scenarios where latency time is not critical. Thus, this work not only advances the state of the art in L2R, but also opens up new possibilities for improvements in effectiveness and efficiency, inspiring future research into the use of sub-network ensembles in L2R.
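    Multi-Sample Dropout, one of the ensemble techniques compared, applies several independent dropout masks to the same representation and averages the resulting outputs. A framework-free toy version of that averaging step (illustrative only; the dissertation's models are attention-based networks, not plain vectors):

```python
import random

def multi_sample_dropout_average(values, n_samples=4, p=0.5, seed=0):
    """Apply n_samples independent inverted-dropout masks to the same
    vector and average the masked outputs (the Multi-Sample Dropout idea)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        # Inverted dropout: surviving units are scaled by 1/(1-p).
        mask = [0.0 if rng.random() < p else 1.0 / (1.0 - p) for _ in values]
        samples.append([v * m for v, m in zip(values, mask)])
    return [sum(col) / n_samples for col in zip(*samples)]
```

    At training time each masked copy contributes its own loss term; at test time the averaging yields an implicit ensemble of sub-networks at the cost of a few extra forward passes through the final layers.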
  • Item
    Interpretabilidade de modelos de aprendizado de máquina: uma abordagem baseada em árvores de decisão
    (Universidade Federal de Goiás, 2023-09-22) Silva, Jurandir Junior de Deus da; Salvini, Rogerio Lopes; http://lattes.cnpq.br/5009392667450875; Salvini, Rogerio Lopes; Silva, Nadia Félix Felipe da; Alonso, Eduardo José Aguilar
    Interpretability is defined as the ability of a human to understand why an AI model makes certain decisions. Interpretability can be achieved through the use of interpretable models, such as linear regression and decision trees, and through model-agnostic interpretation methods, which treat any predictive model as a "black box". Another concept related to interpretability is that of Counterfactual Explanations, which show the minimal changes in inputs that would lead to different results, providing a deeper understanding of the model’s decisions. The approach proposed in this work exploits the explanatory power of Decision Trees to create a method that offers more concise explanations and counterfactual explanations. The results of the study indicate that Decision Trees not only explain the “why” of model decisions, but also show how different attribute values could result in alternative outputs.
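    The counterfactual idea described above can be sketched directly on a fitted scikit-learn decision tree: enumerate the root-to-leaf paths, keep only leaves that predict a different class, and compute the minimal feature edits that push an instance onto one of those paths. This is a simplified toy illustration (hypothetical two-feature data, edit cost counted as number of changed features), not the method proposed in the dissertation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy dataset: two numeric features, binary label (hypothetical example).
X = np.array([[1, 1], [2, 1], [1, 2], [8, 8], [9, 8], [8, 9]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
tree = clf.tree_

def leaf_paths(node=0, conds=()):
    """Yield (leaf_id, conditions); each condition is (feature, op, threshold)."""
    if tree.children_left[node] == -1:  # no children: this node is a leaf
        yield node, conds
        return
    f, t = tree.feature[node], tree.threshold[node]
    yield from leaf_paths(tree.children_left[node], conds + ((f, "<=", t),))
    yield from leaf_paths(tree.children_right[node], conds + ((f, ">", t),))

def counterfactual(x, eps=1e-3):
    """Return a minimally edited copy of x that the tree classifies differently."""
    orig = clf.predict([x])[0]
    best, best_cost = None, np.inf
    for leaf, conds in leaf_paths():
        if np.argmax(tree.value[leaf]) == orig:
            continue  # leaf predicts the same class: not a counterfactual
        cf = np.array(x, dtype=float)
        for f, op, t in conds:  # force the instance onto this leaf's path
            if op == "<=" and cf[f] > t:
                cf[f] = t
            elif op == ">" and cf[f] <= t:
                cf[f] = t + eps
        cost = np.sum(cf != np.array(x, dtype=float))  # edited features
        if cost < best_cost:
            best, best_cost = cf, cost
    return best

x = [1.0, 1.0]
cf = counterfactual(x)  # smallest edit that flips the tree's prediction
```

    The decision path that `cf` satisfies also doubles as the concise rule-style explanation: the conjunction of threshold conditions along that path.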
  • Item
    Uma estratégia de pós-processamento para seleção de regras de associação para descoberta de conhecimento
    (Universidade Federal de Goiás, 2023-08-22) Cintra, Luiz Fernando da Cunha; Salvini, Rogerio Lopes; http://lattes.cnpq.br/5009392667450875; Salvini, Rogerio Lopes; Rosa, Thierson Couto; Aguilar Alonso, Eduardo José
    Association rule mining (ARM) is a traditional data mining method that provides information about associations between items in transactional databases. A known problem of ARM is the large number of rules generated, which requires approaches to post-process these rules so that a human expert can analyze the associations found. In some contexts, the domain expert is interested in investigating only one item of interest; in these cases, a search guided by that item can help mitigate the problem. For an exploratory analysis, this means looking for associations in which the item of interest appears in any part of the rule. Few methods focus on post-processing the generated rules around an item of interest. The present work seeks to highlight the relevant associations of a given item in order to reveal its role through its interactions and the relationships it shares with the other items. To this end, this work proposes a post-processing strategy for association rules that selects and groups rules oriented to a certain item of interest provided by a domain expert. In addition, a graphical representation is presented so that the associations between rules and the groupings of rules found can be more easily visualized and interpreted. Four case studies show that the proposed method is viable and reduces the number of relevant rules to a manageable amount, allowing analysis by domain experts. Graphs showing the relationships between the groups were generated in all case studies and facilitate their analysis.
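    The selection-and-grouping step described above can be sketched in a few lines: filter mined rules to those mentioning the item of interest on either side, then group the survivors by the set of co-occurring items. The rules below are hypothetical hand-written triples (antecedent, consequent, confidence), standing in for the output of an ARM algorithm such as Apriori; the grouping criterion is one simple choice, not necessarily the one used in the dissertation.

```python
from collections import defaultdict

# Hypothetical mined rules: (antecedent, consequent, confidence).
rules = [
    ({"bread"}, {"butter"}, 0.85),
    ({"butter", "jam"}, {"bread"}, 0.70),
    ({"milk"}, {"cereal"}, 0.90),
    ({"bread", "milk"}, {"butter"}, 0.65),
]

def select_and_group(rules, item, min_conf=0.6):
    """Keep rules mentioning `item` on either side; group by co-occurring items."""
    groups = defaultdict(list)
    for ant, cons, conf in rules:
        if conf < min_conf or item not in ant | cons:
            continue  # discard weak rules and rules without the item of interest
        partners = frozenset((ant | cons) - {item})  # everything except the item
        groups[partners].append((ant, cons, conf))
    return dict(groups)

groups = select_and_group(rules, "bread")
# Each key is a set of items that interact with "bread" in at least one rule;
# rendering the keys as graph nodes gives the visualization mentioned above.
```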
  • Item
    Secure D2D caching framework based on trust management and blockchain for mobile edge caching - a multi-domain approach
    (Universidade Federal de Goiás, 2023-08-18) Rocha, Acquila Santos; Pinheiro, Billy Anderson; http://lattes.cnpq.br/1882589984835011; Borges, Vinicius da Cunha Martins; http://lattes.cnpq.br/6904676677900593; Borges, Vinicius da Cunha Martins; Pinheiro, Billy Anderson; Cordeiro, Weverton Luis da Costa; Carvalho, Sérgio Teixeira de
    Device-to-Device (D2D) communication, combined with edge caching and mobile edge computing, is a promising approach for offloading data from the wireless mobile network. However, user security is still an open issue in D2D communication: vulnerabilities remain possible owing to easy, direct, and spontaneous interactions between untrustworthy users with different degrees of mobility. This dissertation designs a multi-layer framework that combines diverse blockchain-inspired technologies into a secure multi-domain D2D caching framework. On the intra-domain side, we establish the Secure D2D Caching framework based on trUst management and Blockchain (SeCDUB) to improve the security of D2D communication in video caching through the combination of direct and indirect observations. In addition, blockchain concepts were adapted to the dynamic and constrained scenario of D2D networks to prevent data interception and the alteration of indirect observations. This adaptation involved developing a Clustering Approach (CA) that enables a scalable and lightweight blockchain for D2D networks. Two mathematical models of uncertainty were used to infer direct and indirect trust values: Bayesian inference and Dempster-Shafer Theory (TDS), respectively. On the inter-domain side, we developed the Trust in Multiple Domains (TrustMD) framework. This approach combines edge trust storage with blockchain for distributed storage management in a multi-layer architecture designed to efficiently store trust control data at the edge across different domains. To collect results, we performed simulations of SeCDUB's intra-domain approach; the proposed clustering approach plays a key role in mitigating SeCDUB's overhead as well as its consensus time.
TrustMD results demonstrated a significant enhancement in goodput, reaching at best 95% of the total network throughput, while SeCDUB achieved approximately 80%. Even though there was a 7% increase in D2D overhead, TrustMD effectively kept latency levels under control, resulting in a slight decrease of 1.3 seconds. Hence, the achieved results indicate that TrustMD manages security efficiently without compromising network performance, reducing the false negative rate by up to 31% in the best-case scenario. Indeed, the combination of SeCDUB and TrustMD offers a scalable and effective security solution that boosts network performance and ensures robust protection.
  • Item
    Classificação das despesas com pessoal no contexto dos Tribunais de Contas
    (Universidade Federal de Goiás, 2023-08-22) Teixeira, Pedro Henrique; Silva, Nadia Félix Felipe da; http://lattes.cnpq.br/7864834001694765; Salvini, Rogerio Lopes; http://lattes.cnpq.br/5009392667450875; Salvini, Rogerio Lopes; Silva, Nadia Félix Felipe da; Fernandes, Deborah Silva Alves; Costa, Nattane Luíza da
    The Court of Accounts of the Municipalities of the State of Goiás (TCMGO) uses the expenditure data received monthly from the municipalities of Goiás to check spending related to personnel expenses, as determined by the Fiscal Responsibility Law (LRF). However, there are indications that the expense classifications sent by municipal managers may contain inconsistencies arising from fiscal tricks, creative accounting, or material errors, leading TCMGO to make decisions based on incorrect reports, with serious consequences for the inspection process. To address this problem, this work used text classification techniques to identify the class of a personnel expense from its textual description, instead of relying on the code provided by the municipality. To this end, a corpus of 17,116 expense records labeled by domain experts was built, using both binary and multi-class approaches. Data processing procedures were applied to extract attributes from the textual descriptions and to assign numerical values to each instance of the dataset with the TF-IDF algorithm. In the modeling stage, the Multinomial Naïve Bayes, Logistic Regression, and Support Vector Machine (SVM) algorithms were used for supervised classification. SVM proved to be the best algorithm, with F-scores of 0.92 and 0.97 on the multi-class and binary corpora, respectively. However, the labeling process carried out by human experts proved complex, time-consuming, and expensive. Therefore, this work also developed a method to classify personnel expenses using only 235 labeled samples, improved by unlabeled instances, based on an adaptation of the Self-Training algorithm, producing very promising results with an average F-score between 0.86 and 0.89.
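    The supervised pipeline described above (TF-IDF vectorization followed by an SVM) maps directly onto a few lines of scikit-learn. The expense descriptions and labels below are invented placeholders (1 = personnel expense, 0 = other), and the tiny corpus is only for illustration; the dissertation's corpus has 17,116 expert-labeled records.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical expense descriptions (1 = personnel expense, 0 = other).
docs = [
    "pagamento de salario servidor efetivo",
    "vencimentos e vantagens fixas pessoal civil",
    "gratificacao natalina decimo terceiro",
    "aquisicao de material de escritorio",
    "compra de combustivel para frota",
    "servicos de manutencao predial",
]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF turns each description into a weighted term vector;
# a linear SVM then separates the two classes in that space.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(docs, labels)

pred = model.predict(["pagamento de gratificacao a servidor"])[0]
```

    For the low-supervision setting mentioned at the end of the abstract, scikit-learn also ships a `SelfTrainingClassifier` wrapper that iteratively pseudo-labels unlabeled instances, which is the same general idea as the Self-Training adaptation described, though not necessarily the same implementation.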