Master's Program in Computer Science (INF)


Recent Submissions

Now showing 1 - 20 of 308
  • Item
    Modelo de linguagem para o mercado de ações Brasileiro: Uma abordagem baseada em análise de sentimentos usando o modelo BERTimbau
    (Universidade Federal de Goiás, 2024-10-03) Araujo, Leandro dos Santos; Fernandes, Deborah Silva Alves; http://lattes.cnpq.br/0380764911708235; Fernandes, Deborah Silva Alves; Santos, Adam Dreyton Ferreira dos; Soares, Fabrízzio Alphonsus Alves de Melo Nunes
    Embargoed.
  • Item
    Reconhecimento de entidades nomeadas em editais de licitação
    (Universidade Federal de Goiás, 2024-11-29) Souza Filho, Ricardo Pereira de; Silva, Nádia Félix Felipe da; http://lattes.cnpq.br/7864834001694765; Silva, Nádia Félix Felipe da; Fernandes, Deborah Silva Alves; Souza, Ellen Polliana Ramos
    This work explores the use of large language models (LLMs) for information extraction in public procurement notices, focusing on the Named Entity Recognition (NER) task. Given the diverse and unstandardized nature of these documents, the study proposes a methodology that integrates semantic selection techniques with Zero-Shot and Few-Shot scenarios, aiming to optimize the annotation and entity extraction process, reduce manual intervention, and improve accuracy. The first step involved building an annotated corpus containing named entities from procurement notices. Subsequently, the BERTimbau, BERTikal, and mDeBERTa models were trained in a supervised manner using this annotated dataset. Experiments showed that BERTimbau achieved the best overall performance, with an F1-score above 0.80. In the Zero-Shot and Few-Shot scenarios, various prompt templates and example selection strategies were tested. Models such as GPT-4 and LLaMA achieved performance comparable to supervised models when aided by semantically relevant examples, despite modest results in the absence of examples. The results indicate that combining enriched prompts with examples and the pre-selection of relevant sentences during the annotation phase contributes to greater accuracy and efficiency in the NER process for procurement notices. The proposed methodology can be applied to information extraction, with potential impacts on transparency and auditing in public procurement.
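    As a rough illustration of the Few-Shot setup described above, the sketch below selects semantically similar annotated sentences to include in a prompt. It is not the dissertation's code: the embedding model, the entity labels, and the example pool are illustrative assumptions.

      # Sketch: semantic selection of few-shot examples for NER prompting.
      # Model name, labels, and pool below are illustrative assumptions.
      from sentence_transformers import SentenceTransformer, util

      pool = [  # (sentence, annotation) pairs from an annotated corpus
          ("Pregão Eletrônico nº 12/2023 da Prefeitura de Goiânia.",
           "ORG: Prefeitura de Goiânia"),
          ("O valor estimado é de R$ 150.000,00.", "VALOR: R$ 150.000,00"),
          ("A abertura ocorrerá em 05/04/2023.", "DATA: 05/04/2023"),
      ]

      model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

      def build_few_shot_prompt(target: str, k: int = 2) -> str:
          """Pick the k pool sentences most similar to the target and format a prompt."""
          target_emb = model.encode(target, convert_to_tensor=True)
          pool_embs = model.encode([s for s, _ in pool], convert_to_tensor=True)
          scores = util.cos_sim(target_emb, pool_embs)[0]
          top = scores.argsort(descending=True)[:k]
          examples = "\n".join(f"Sentence: {pool[int(i)][0]}\nEntities: {pool[int(i)][1]}"
                               for i in top)
          return ("Extract the named entities (ORG, VALOR, DATA) from the sentence.\n"
                  f"{examples}\nSentence: {target}\nEntities:")

      print(build_few_shot_prompt("O pregão da Universidade Federal de Goiás abre em 10/06/2024."))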
  • Item
    Avaliação de grandes modelos de linguagem na simplificação de texto de decisões jurídicas utilizando pontuações de legibilidade como alvo
    (Universidade Federal de Goiás, 2024-11-29) Paula, Antônio Flávio Castro Torres de; Camilo Junior, Celso Gonçalves; http://lattes.cnpq.br/6776569904919279; Camilo Júnior, Celso Gonçalves; Oliveira, Sávio Salvarino Teles de; Naves, Eduardo Lázaro Martins
    The complexity of language used in legal documents, such as technical terms and legal jargon, hinders access to and understanding of the Brazilian justice system for laypeople. This work presents text simplification approaches and assesses the state-of-the-art by considering large language models with readability scoring as a parameter for simplification. Due to limited resources for text simplification in Portuguese, especially within the legal domain, the application of a methodology based on text modification using readability scoring enables experiments that leverage the knowledge acquired during the training of these large language models, while also allowing for automatic evaluation without the need for labeled data. This study evaluates the simplification capabilities of large language models by using eleven models as case studies. Additionally, a real corpus was developed, based on legal rulings from the Brazilian justice system.
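    To make the readability-as-target idea concrete, here is a minimal sketch (not the dissertation's implementation): a classic Flesch-style score with a naive vowel-group syllable counter is computed for the original text and embedded in a simplification prompt, so the model's output can be re-scored automatically without labeled data.

      import re

      def flesch_reading_ease(text: str) -> float:
          """Classic Flesch formula; higher scores mean easier text."""
          sentences = max(1, len(re.findall(r"[.!?]+", text)))
          words = re.findall(r"\w+", text)
          # Naive syllable estimate: runs of (accented) vowels per word.
          syllables = sum(max(1, len(re.findall(r"[aeiouáéíóúâêôãõ]+", w.lower())))
                          for w in words)
          n = max(1, len(words))
          return 206.835 - 1.015 * (n / sentences) - 84.6 * (syllables / n)

      def simplification_prompt(ruling: str, target_score: float) -> str:
          current = flesch_reading_ease(ruling)
          return (f"Rewrite the legal text below for a lay reader. Its readability "
                  f"score is {current:.0f}; aim for a score near {target_score:.0f} "
                  f"while preserving the legal meaning.\n\nText: {ruling}")

      # The LLM's output is then re-scored with flesch_reading_ease, closing
      # the evaluation loop without human annotation.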
  • Item
    Avaliação de Grandes Modelos de Linguagem para Raciocínio em Direito Tributário
    (Universidade Federal de Goiás, 2024-11-22) Presa, João Paulo Cavalcante; Oliveira, Sávio Salvarino Teles de; http://lattes.cnpq.br/1905829499839846; Camilo Junior, Celso Gonçalves; http://lattes.cnpq.br/6776569904919279; Camilo Júnior, Celso Gonçalves; Oliveira, Sávio Salvarino Teles de; Silva, Nádia Felix Felipe da; Leite, Karla Tereza Figueiredo
    Tax law is essential for regulating relationships between the State and taxpayers, being crucial for tax collection and maintaining public functions. The complexity and constant evolution of tax laws make their interpretation an ongoing challenge for legal professionals. Although Natural Language Processing (NLP) has become a promising technology in the legal field, its application in Brazilian tax law, especially for legal entities, remains a relatively unexplored area. This work evaluates the use of Large Language Models (LLMs) in Brazilian tax law covering federal tax aspects, analyzing their ability to process questions and generate answers in Portuguese for legal entities’ queries. For this purpose, we built an original dataset composed of real questions and answers provided by experts, allowing us to evaluate the ability of both proprietary and open-source LLMs to generate legally valid answers. The research uses quantitative and qualitative metrics to measure the accuracy and relevance of generated answers, capturing aspects of legal reasoning and semantic coherence. As contributions, this work presents a dataset specific to the tax law domain, a detailed evaluation of different LLMs’ performance in legal reasoning tasks, and an evaluation approach that combines quantitative and qualitative metrics, thus advancing the application of artificial intelligence in the analysis of tax laws and regulations.
  • Item
    Integração de uma aplicação de realidade aumentada com sistemas 5G seguindo o padrão 3GPP
    (Universidade Federal de Goiás, 2024-12-04) Cardoso, Pabllo Borges; Cardoso, Kleber Vieira; http://lattes.cnpq.br/0268732896111424; Corrêa, Sand Luz; http://lattes.cnpq.br/3386409577930822; Cardoso, Kleber Vieira; Freitas, Leandro Alexandre; Oliveira Junior, Antonio Carlos de
    Based on the standards defined by the 3rd Generation Partnership Project (3GPP), this work validates the 5G Media Streaming (5GMS) model, using the MR-Leo prototype as a case study. MR-Leo is a Mixed Reality (MR) application designed to explore the potential of these technologies in high-demand computational environments. The study begins with a review of advancements enabled by 5G networks, emphasizing their ability to provide low-latency connectivity, high bandwidth, and support for heterogeneous devices at scale. Additionally, the frameworks CAPIF and SEAL are discussed as tools to facilitate interoperability and API management in the 5G architecture, though recognized for their technical complexity and limited practical adoption. Edge computing is then investigated as a strategic component capable of bringing computational resources closer to end users, reducing latencies and enhancing the performance of intensive algorithms critical for MR applications. The validation of the proposed study was carried out in three distinct scenarios: a local controlled environment, an emulated 5G network, and a real 5G callbox. Experimental evaluation demonstrated the superiority of the protocol combined with video compression, achieving consistent metrics that meet the key performance indicators (KPIs) defined in the literature. The comparative qualitative analysis highlighted significant compatibilities as well as gaps, such as the absence of a functional component equivalent to the 5GMS Application Function (AF). In this regard, this work makes important contributions by demonstrating the technical feasibility of delivering MR services on 5G networks through edge computing.
  • Item
    Arquitetura holística de redes 6G: integração das camadas de comunicação espacial, aérea, terrestre, marinha e submarina com gêmeos digitais e inteligência artificial
    (Universidade Federal de Goiás, 2024-11-22) Araújo, Antonia Vanessa Dias; Oliveira Júnior, Antonio Carlos de; http://lattes.cnpq.br/3148813459575445; Oliveira Júnior, Antonio Carlos de; Moreira, Rodrigo; Freitas, Leandro Alexandre
    This dissertation proposes a holistic architecture for 6G networks, aiming at the integration of space, aerial, terrestrial, maritime, and submarine communication networks, targeting global and continuous connectivity. The integration of these networks, especially non-terrestrial networks (NTN), with terrestrial infrastructure presents significant technical and architectural challenges. The study focuses on modeling a unified architecture that fosters interaction between these network layers, with an emphasis on extreme and ubiquitous coverage. The methodology involves a detailed analysis of technological challenges and key enablers, such as digital twins, artificial intelligence (AI), and network orchestration, which facilitate the integration and efficient operation of 6G networks. The proposal is evaluated through simulations, highlighting the synergy between the different network components and their ability to provide ubiquitous and transparent communication to the user. It concludes that the proposed architecture provides a promising foundation for the implementation of innovative use cases, such as emergency communications, environmental monitoring, telemedicine, and smart agriculture, emphasizing the importance of extreme global coverage as one of the architectural cornerstones.
  • Item
    Avaliação de Grandes Modelos de Linguagem para Classificação de Documentos Jurídicos em Português
    (Universidade Federal de Goiás, 2024-11-26) Santos, Willgnner Ferreira; Oliveira, Sávio Salvarino Teles de; http://lattes.cnpq.br/1905829499839846; Galvão Filho, Arlindo Rodrigues; http://lattes.cnpq.br/7744765287200890; Galvão Filho, Arlindo Rodrigues; Oliveira, Sávio Salvarino Teles de; Fanucchi, Rodrigo Zempulski; Soares, Anderson da Silva
    The increasing procedural demand in judicial institutions has caused a workload overload, impacting the efficiency of the legal system. This scenario, exacerbated by limited human resources, highlights the need for technological solutions to streamline the processing and analysis of documents. In light of this reality, this work proposes a pipeline for automating the classification of these documents, evaluating four methods of representing legal texts at the pipeline’s input: original text, summaries, centroids, and document descriptions. The pipeline was developed and tested at the Public Defender’s Office of the State of Goiás (DPE-GO). Each approach implements a specific strategy to structure the input texts, aiming to enhance the models’ ability to interpret and classify legal documents. A new Portuguese dataset was introduced, specifically designed for this application, and the performance of Large Language Models (LLMs) was evaluated in classification tasks. The analysis results demonstrate that the use of summaries improves classification accuracy and maximizes the F1-score, optimizing the use of LLMs by reducing the number of tokens processed without compromising precision. These findings highlight the impact of textual representations of documents and the potential of LLMs for the automatic classification of legal documents, as in the case of DPE-GO. The contributions of this work indicate that the application of LLMs, combined with optimized textual representations, can significantly increase the productivity and quality of services provided by judicial institutions, promoting advancements in the overall efficiency of the legal system.
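    Of the four input representations, the centroid is the easiest to show in a few lines. A minimal sketch, assuming a multilingual sentence-embedding model (the dissertation's actual encoder is not specified here):

      import numpy as np
      from sentence_transformers import SentenceTransformer

      model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
      sentences = ["Trata-se de ação de alimentos.",
                   "Requer a parte autora a citação do réu."]
      # Centroid representation: the mean of the sentence embeddings, giving a
      # fixed-size input regardless of document length.
      centroid = np.mean(model.encode(sentences), axis=0)
      print(centroid.shape)  # (384,)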
  • Item
    Em Busca do Estado da Arte e da Prática sobre Schema Matching na Indústria Brasileira - Resultados Preliminares de uma Revisão de Literatura e uma Pesquisa de Opinião
    (Universidade Federal de Goiás, 2024-09-02) Borges, Ricardo Henricki Dias; Ribeiro, Leonardo Andrade; http://lattes.cnpq.br/4036932351063584; Graciano Neto, Valdemar Vicente; http://lattes.cnpq.br/9864803557706493; Graciano Neto, Valdemar Vicente; Ribeiro, Leonardo Andrade; Frantz, Rafael Zancan; Galvao Filho, Arlindo Rodrigues
    The integration of systems and interoperability between different databases are critical challenges in information technology, mainly due to the diversity of data schemas. The schema matching technique is essential for unifying these schemas, facilitating research, analysis, and knowledge discovery. This dissertation investigates the application of schema matching in the Brazilian software industry, focusing on understanding the reasons for its low adoption. The research included a systematic mapping of the use of Artificial Intelligence (AI) algorithms and similarity techniques in schema matching, as well as a survey with 35 professionals in the field. The results indicate that, although schema matching offers significant improvements in data integration processes, such as reducing time and increasing accuracy, most professionals are unfamiliar with the term, even among those who use similar tools. The low adoption of these techniques can be attributed to the lack of free or open source tools and the absence of implementation plans within companies. The dissertation highlights the need for initiatives that overcome these barriers, empower professionals, and promote broader use of schema matching in the Brazilian industry.
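    For readers unfamiliar with the term, the toy sketch below shows the simplest flavor of schema matching surveyed here: matching columns by name similarity. It also hints at why naive string matching breaks down across languages, part of the case for the AI-based matchers covered in the mapping (all names and data are made up).

      from difflib import SequenceMatcher

      source = ["customer_id", "customer_name", "birth_date"]
      target = ["id_cliente", "nome_cliente", "data_nascimento", "email"]

      def best_match(col, candidates):
          scored = [(c, SequenceMatcher(None, col, c).ratio()) for c in candidates]
          return max(scored, key=lambda pair: pair[1])

      for col in source:
          match, score = best_match(col, target)
          print(f"{col:14s} -> {match:16s} (similarity {score:.2f})")
      # Cross-language column names score poorly, motivating matchers that also
      # use data types, instances, or learned embeddings.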
  • Item
    Legal Domain Adaptation in Portuguese Language Models - Developing and Evaluating RoBERTa-based Models on Legal Corpora
    (Universidade Federal de Goiás, 2024-05-28) Garcia, Eduardo Augusto Santos; Lima, Eliomar Araújo de; http://lattes.cnpq.br/1362170231777201; Silva, Nádia Félix Felipe da; http://lattes.cnpq.br/7864834001694765; Silva, Nádia Félix Felipe da; Lima, Eliomar Araújo de; Soares, Anderson da Silva; Placca, José Avelino
    This research investigates the application of Natural Language Processing (NLP) within the legal domain for the Portuguese language, emphasizing the importance of domain adaptation for pre-trained language models, such as RoBERTa, using specialized legal corpora. We compiled and pre-processed a Portuguese legal corpus, named LegalPT, addressing the challenges of high near-duplicate document rates in legal corpora and conducting a comparison with generic web-scraped corpora. Experiments with these corpora revealed that pre-training on a combined dataset of legal and general data resulted in a more effective model for legal tasks. Our model, called RoBERTaLexPT, outperformed larger models trained solely on generic corpora, such as BERTimbau and Albertina-PT-*, and other legal models from similar works. For evaluating the performance of these models, we propose in this Master’s dissertation a legal benchmark composed of several datasets, including LeNER-Br, RRI, FGV, UlyssesNER-Br, CEIA-Entidades, and CEIA-Frases. This study contributes to the improvement of NLP solutions in the Brazilian legal context by openly providing enhanced models, a specialized corpus, and a rigorous benchmark suite.
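    One standard way to handle the near-duplicate problem mentioned above is MinHash-based locality-sensitive hashing; the sketch below is illustrative only (the corpus, shingle size, and threshold are assumptions, not the dissertation's exact deduplication setup).

      from datasketch import MinHash, MinHashLSH

      def minhash(text: str, num_perm: int = 128) -> MinHash:
          m = MinHash(num_perm=num_perm)
          for shingle in {text[i:i + 5] for i in range(max(1, len(text) - 4))}:
              m.update(shingle.encode("utf-8"))
          return m

      lsh = MinHashLSH(threshold=0.8, num_perm=128)  # estimated Jaccard cutoff
      corpus = {"doc1": "Acórdão sobre ICMS em operações interestaduais...",
                "doc2": "Acórdão sobre ICMS em operações interestaduais...",
                "doc3": "Petição inicial em ação de cobrança..."}
      kept = []
      for doc_id, text in corpus.items():
          m = minhash(text)
          if not lsh.query(m):       # no near-duplicate kept so far
              lsh.insert(doc_id, m)
              kept.append(doc_id)
      print(kept)  # ['doc1', 'doc3']: doc2 dropped as a near-duplicate of doc1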
  • Item
    Análise de um Fluxo Completo Automatizado de Etapas Voltado ao Reconhecimento de Texto em Imagens de Prescrições Médicas Manuscritas
    (Universidade Federal de Goiás, 2024-01-10) Corrêa, André Pires; Lima, Eliomar Araújo de; http://lattes.cnpq.br/1362170231777201; Nascimento, Hugo Alexandre Dantas do; http://lattes.cnpq.br/2920005922426876; Nascimento, Hugo Alexandre Dantas do; Costa, Ronaldo Martins da; Pedrini, Hélio; Lima, Eliomar Araújo de
    Compounding pharmacies deal with large volumes of medical prescriptions on a daily basis, whose data needs to be manually inputted into information management systems to properly process their customers’ orders. A considerable portion of these prescriptions tend to be written by doctors with poorly legible handwriting, which can make decoding them an arduous and time-consuming process. Previous works have investigated the use of machine learning for medical prescription recognition. However, the accuracy rates in these works are still fairly low and their approaches tend to be rather limited, as they typically utilize small datasets, focus only on specific steps of the automated analysis pipeline or use proprietary tools, which makes it difficult to replicate and analyse their results. The present work contributes towards filling this gap by presenting an end-to-end process for automated data extraction from handwritten medical prescriptions, from text segmentation to recognition and post-processing. The approach was built based on an evaluation and adaptation of multiple existing methods for each step of the pipeline. The methods were evaluated on a dataset of 993 images of medical prescriptions with 27,933 annotated words, produced with the support of a compounding pharmacy that participated in the project. The results obtained by the best performing methods indicate that the developed approach is reasonably effective, reaching an accuracy of 68% in the segmentation step, and a character accuracy rate of 86.8% in the text recognition step.
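    Of the pipeline steps, post-processing is the simplest to sketch: snapping noisy recognizer output to a medication lexicon by string similarity. The lexicon and the recognizer outputs below are invented for illustration, not taken from the dissertation.

      from difflib import get_close_matches

      lexicon = ["amoxicilina", "dipirona", "omeprazol", "losartana"]
      recognized = ["amoxicilna", "dpirona", "omeprazol"]  # raw OCR guesses

      for word in recognized:
          # Snap each guess to the closest lexicon entry above the cutoff.
          candidates = get_close_matches(word, lexicon, n=1, cutoff=0.6)
          print(f"{word:12s} -> {candidates[0] if candidates else word}")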
  • Item
    Análise de técnicas de ajuste fino em classificação de texto
    (Universidade Federal de Goiás, 2024-09-23) Pires, Tobias Gonçalves; Soares, Anderson da Silva; Soares, Anderson da Silva; Fanucchi, Rodrigo Zempulski; Galvão Filho, Arlindo Rodrigues
    Natural Language Processing (NLP) aims to develop models that enable computers to understand, interpret, process, and generate text in a way similar to human communication. The last decade has seen significant advances in the field, with the introduction of deep neural network models and the subsequent evolution of their architectures, such as the attention mechanism and the Transformer architecture, culminating in language models such as ELMo, BERT, and GPT. Later models, called Large Language Models (LLMs), further improved the ability to understand and generate text in a sophisticated way. Pre-trained models offer the advantage of reusing knowledge accumulated from vast datasets, although task-specific fine-tuning is still required. However, training and tuning these models consumes substantial processing resources, making it unfeasible for many organizations due to high costs. For resource-constrained environments, efficient fine-tuning techniques such as LoRA (Low-Rank Adaptation) were developed to optimize the model adaptation process, minimizing the number of adjustable parameters and avoiding overfitting. These techniques allow for faster and more economical training while maintaining the robustness and generalization of the models. This work evaluates three efficient fine-tuning techniques (LoRA, AdaLoRA, and IA3), in addition to full fine-tuning, in terms of memory consumption, training time, and accuracy, using the DistilBERT, RoBERTa-base, and TinyLlama models on different datasets (AG News, IMDb, and SNLI).
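    A minimal sketch of one of the compared techniques, LoRA, using the PEFT library with DistilBERT; the rank, scaling, and target modules below are illustrative choices, not necessarily those used in the experiments.

      from transformers import AutoModelForSequenceClassification
      from peft import LoraConfig, TaskType, get_peft_model

      base = AutoModelForSequenceClassification.from_pretrained(
          "distilbert-base-uncased", num_labels=4)  # e.g., the four AG News classes

      config = LoraConfig(
          task_type=TaskType.SEQ_CLS,
          r=8, lora_alpha=16, lora_dropout=0.1,
          target_modules=["q_lin", "v_lin"],  # DistilBERT attention projections
      )
      model = get_peft_model(base, config)
      model.print_trainable_parameters()  # only a small fraction is trainable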
  • Item
    Transcrição automática de sons polifônicos de guitarra na notação de tablaturas utilizando classificação temporal conexionista
    (Universidade Federal de Goiás, 2024-09-23) Gris, Lucas Rafael Stefanel; Soares, Anderson da Silva; http://lattes.cnpq.br/1096941114079527; Soares, Anderson da Silva; Laureano, Gustavo Teodoro; Barbosa, Yuri de Almeida Malheiros
    Automatic Guitar Transcription, a branch of Automatic Musical Transcription, is a task with great applicability for musicians of fretted instruments such as the electric guitar and acoustic guitar. Often, musicians on these instruments transcribe or read songs and musical pieces in tablature format, a notation widely used for this type of instrument. Despite its relevance, this annotation is still done manually, making it a very complex process, even for experienced musicians. In this context, this work proposes the use of artificial intelligence to develop models capable of performing the task of transcribing polyphonic guitar sounds automatically. In particular, this work investigates the use of a specific method called Connectionist Temporal Classification (CTC), an algorithm that can be used to train sequence classification models without the need for alignment, a fundamental aspect for training more robust models, as there are few openly available datasets. Additionally, this work investigates multi-task learning for note prediction alongside tablature prediction, achieving significant improvements over conventional learning. Overall, the results indicate that the use of CTC is very promising for tablature transcription, showing only a 14.28% relative decrease compared to the result obtained with aligned data.
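    The core of the alignment-free training signal is the CTC loss itself. A minimal PyTorch sketch, with an invented vocabulary (one string, fret tokens plus a blank) and random logits standing in for the network:

      import torch
      import torch.nn as nn

      T, N, C = 100, 1, 23   # frames, batch size, classes (blank + 22 fret tokens)
      logits = torch.randn(T, N, C, requires_grad=True)  # stand-in for the model
      log_probs = logits.log_softmax(dim=2)

      targets = torch.tensor([[5, 7, 5, 3]])   # fret sequence with no timing info
      ctc = nn.CTCLoss(blank=0)
      loss = ctc(log_probs, targets,
                 input_lengths=torch.tensor([T]),
                 target_lengths=torch.tensor([4]))
      loss.backward()   # gradients flow without frame-level alignments
      print(loss.item())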
  • Item
    Uma abordagem de alinhamento de minimapa usando informações de observação de pontos intrínsecos aos sistemas de SLAM Visual baseado em características
    (Universidade Federal de Goiás, 2024-10-04) Saraiva, Felipe Pires; Laureano, Gustavo Teodoro; http://lattes.cnpq.br/4418446095942420; Laureano, Gustavo Teodoro; Tarallo, André de Souza; Costa, Ronaldo Martins da
    This work proposes a map alignment approach based on a modification of the Iterative Closest Point (ICP) algorithm to consider point estimation confidence metrics already available in feature-based Visual SLAM systems. Mini-map alignment, in the context of a hierarchical map composed of several local representations of the environment, is an important task to allow the relationship of metric information between them, usually performed through the registration of the point clouds of each map. ICP is a widely used method in the literature for point cloud registration, but in the originally proposed form it does not consider the uncertainty of the points, and can be sensitive to noise, outliers and the initial estimate of the transformation. Feature-based Visual SLAM methods produce information intrinsic to the way they are modeled, which can represent the confidence of the map points and can be used to improve the alignment process. This research enumerates three possible SLAM metrics that can be used to represent the confidence of a map landmark, and investigates the potential of using these metrics to improve the ICP algorithm. The confidence metrics are incorporated into the ICP through a simple change in the correspondence estimation step to find the point with the highest confidence in a neighborhood of k nearest points. Experiments are conducted in different cases of initial misalignment to evaluate the influence of the confidence information suggested in this work, comparing the error of the final alignment of the point clouds and the number of iterations to achieve this alignment. The results show evidence that the use of confidence can help to improve the convergence of the ICP, both in the error of the final configuration and in the number of iterations required to achieve it.
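    The modification is easiest to see in code. A minimal sketch of the confidence-aware correspondence step, on synthetic data; in practice the confidence values would come from the SLAM system (e.g., observation counts of each landmark).

      import numpy as np
      from scipy.spatial import cKDTree

      def confident_correspondences(src, dst, confidence, k=5):
          """For each source point, return the most confident of its k nearest dst points."""
          tree = cKDTree(dst)
          _, idx = tree.query(src, k=k)            # idx: (n, k) neighbor indices
          best = idx[np.arange(len(src)), np.argmax(confidence[idx], axis=1)]
          return dst[best]

      rng = np.random.default_rng(0)
      dst = rng.random((200, 3))                     # mini-map point cloud
      src = dst[:50] + rng.normal(0, 0.01, (50, 3))  # noisy overlapping cloud
      conf = rng.random(200)                         # stand-in SLAM confidences
      matches = confident_correspondences(src, dst, conf)
      print(matches.shape)  # (50, 3); feeds the usual ICP transform estimation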
  • Item
    Integração do modelo de referência Multi-Access Edge Computing (MEC) com o Núcleo 5G
    (Universidade Federal de Goiás, 2022-10-21) Xavier, Rúben França; Freitas, Leandro Alexandre; http://lattes.cnpq.br/7450982711522425; Oliveira Junior, Antonio Carlos de; http://lattes.cnpq.br/3148813459575445; Oliveira Junior, Antonio Carlos de; Freitas, Leandro Alexandre; Venâncio Neto, Augusto José; Figueiredo, Fabricio Lira
    Multi-access Edge Computing (MEC) is the key concept for developing new applications and services that bring the benefits of edge computing to networks and users. With applications and services at the edge, that is, closer to users and devices, it becomes possible to exploit features such as ultra-low latency, high bandwidth, and efficient resource consumption. However, for this to be possible, the MEC framework must be integrated with the 5G core. To this end, this work proposes the specification of a service that extends the Multi-Access Traffic Steering (MTS) API as a bridge between MEC and the 5G core. Free5GC and a Kubernetes cluster were used to simulate an end-to-end environment. The proposed service was validated in this environment, both in end-to-end scenarios and in scenarios with a large volume of users. The validation demonstrated that the service solves the presented problem and is viable in the considered use cases.
  • Item
    Um modelo de alocação de recursos de rede e cache na borda para vídeo em fluxo contínuo armazenado e 360º
    (Universidade Federal de Goiás, 2024-09-04) Oliveira, Gustavo Dias de; Cardoso, Kleber Vieira; http://lattes.cnpq.br/0268732896111424; Correa, Sand Luz; http://lattes.cnpq.br/3386409577930822; Cardoso, Kleber Vieira; Cerqueira, Eduardo Coelho; Oliveira Júnior, Antonio Carlos de
    The advancement of immersive technologies, such as Augmented Reality (AR) and Virtual Reality (VR), has introduced significant challenges in the transmission of 360-degree videos, due to the increasing bandwidth and low latency requirements resulting from the large size of video frames used in these technologies. At the same time, video streaming consumption has grown exponentially, driven by technological advances and the widespread use of Internet-connected devices. Efficient transmission of 360-degree videos faces challenges such as the need for up to five times more bandwidth than that required for conventional high-definition video transmissions, as well as stricter latency constraints. Strategies such as video projection slicing and transmitting only the user’s field of view, along with efficient network resource allocation, have been explored to overcome these limitations. To address these challenges, we propose DTMCash, which stands out by using dynamic tiles and combining users’ viewports, effectively tackling transmission in multi-user scenarios. The goal of this work is to develop a model for network and edge cache resource allocation for 360-degree video transmission, focusing on the optimization of these resources. To validate the proposed model, we initially conducted comparative experiments with 6 users, later expanding to 30 users. We also tested performance with different cache sizes and experiments varying user entry times, in addition to evaluating the transmission of different video content. Compared to a state-of-the-art solution, our proposal reduced the aggregate bandwidth consumption of the Internet link by at least 48.2%, while maintaining the same consumption on the wireless link and providing greater efficiency in cache usage.
  • Item
    Um catálogo de padrões de requisitos de privacidade baseado na lei geral de proteção de dados pessoais
    (Universidade Federal de Goiás, 2024-03-18) Carneiro, Cinara Gomes de Melo; Kudo, Taciana Novo; http://lattes.cnpq.br/7044035224784132; Bulcão Neto, Renato de Freitas; http://lattes.cnpq.br/5627556088346425; Bulcão Neto, Renato de Freitas; Vincenzi, Auri Marcelo Rizzo; Alencar, Wanderley de Souza
    [Context] Currently, Brazilian companies are concerned about protecting the personal data of their customers and employees to ensure the privacy of these individuals. This concern arises from the fact that personal data protection is an obligation imposed by the General Data Protection Law (LGPD). Since most organizations store this data digitally to carry out various operations, software must comply with the current legislation. [Problem] According to recent research, a large portion of professionals in the software industry do not have comprehensive knowledge of privacy requirements or the LGPD. [Objective] The objective of this work is to construct and evaluate a Catalog of Privacy Requirement Patterns (CPRP) based on the LGPD. [Method] A method for syntactic analysis of the articles composing the LGPD was defined to extract privacy requirements. These requirements were converted into requirement patterns (RP) using a method for constructing RP catalogs based on the grammar of the Software Pattern Metamodel (SoPaMM), with the support of the Terminal Model Editor (TMed) tool. Finally, two experts in LGPD and Software Engineering evaluated the completeness and correctness of the developed catalog concerning the legislation. [Contributions] The conversion of legal requirements into privacy RPs can assist professionals in eliciting and specifying requirements, as privacy requirements can be reused in various contexts with minor or no modifications.
  • Item
    Classificação de documentos da administração pública utilizando inteligência artificial
    (Universidade Federal de Goiás, 2024-04-30) Carvalho, Rogerio Rodrigues; Costa, Ronaldo Martins da; http://lattes.cnpq.br/7080590204832262; Costa, Ronaldo Martins da; Souza, Rodrigo Gonçalves de; Silva, Nádia Félix Felipe da
    Public organizations face difficulties in classifying and promoting transparency of the numerous documents produced during the execution of their activities. Correct classification of documents is critical to prevent public access to sensitive information and protect individuals and organizations from malicious use. This work proposes two approaches to perform the task of classifying sensitive documents, using state-of-the-art artificial intelligence techniques and best practices found in the literature: a conventional method, which uses artificial intelligence techniques and regular expressions to analyze the textual content of documents, and an alternative method, which employs the CBIR technique to classify documents when text extraction is not viable. Using real data from the Electronic Information System (SEI) of the Federal University of Goiás (UFG), the results achieved demonstrated that the application of regular expressions as a preliminary check can improve the computational efficiency of the classification process, despite showing a modest increase in classification precision. The conventional method proved to be effective in document classification, with the BERT model standing out for its performance with an accuracy rate of 94%. The alternative method, in turn, offered a viable solution for challenging scenarios, showing promising results with an accuracy rate of 87% in classifying public documents.
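    A minimal sketch of the two-stage idea: cheap regular expressions catch documents with obviously sensitive identifiers before the model runs. The patterns are examples, and the BERT classifier is represented by a stub rather than the fine-tuned model from the study.

      import re

      CPF = re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b")  # Brazilian personal ID
      PHONE = re.compile(r"\(\d{2}\)\s?\d{4,5}-\d{4}")

      def bert_classify(document: str) -> str:
          """Stand-in for the fine-tuned BERT classifier described above."""
          return "public"

      def classify(document: str) -> str:
          if CPF.search(document) or PHONE.search(document):
              return "restricted"         # regex hit: skip the model entirely
          return bert_classify(document)  # fall back to the expensive model

      print(classify("Requerente: João Silva, CPF 123.456.789-00"))   # restricted
      print(classify("Edital de convocação para reunião do conselho."))  # public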
  • Item
    Detecção de posicionamento do cidadão em Projetos de Lei
    (Universidade Federal de Goiás, 2024-03-22) Maia, Dyonnatan Ferreira; Silva, Nádia Félix Felipe da; http://lattes.cnpq.br/7864834001694765; Silva, Nádia Félix Felipe da; Pereira, Fabíola Souza Fernande; Fernandes, Deborah Silva Alves
    Background: Comments on political projects on the internet reflect the aspirations of a significant portion of the population. The automatic stance detection of these comments regarding specific topics can help better understand public opinion. This study aims to develop a computational model with supervised learning capable of estimating the stance of comments on legislative propositions, considering the challenge of diversity and the constant emergence of new bills. Method: For the domain studied, a specific corpus was constructed by collecting comments from surveys available on the Chamber of Deputies website. The experiments included the evaluation of classic machine learning models, such as Logistic Regression, Naive Bayes, Support Vector Machine, Random Forest, and Multilayer Perceptron, in addition to the fine-tuning of BERT language models. Automatic data annotation was also performed using the zero-shot approach based on prompts from the generative GPT-3.5 model, aiming to overcome the difficulties related to human annotation and the scarcity of annotated data, producing a corpus approximately three times the size of the manually annotated one. Results: The results indicate that the adjusted BERTimbau model surpassed the classic approaches, achieving an average F1-score of 70.4% on unseen topics. Moreover, the application of automatically annotated data in the initial stage of BERTimbau fine-tuning resulted in performance improvement, reaching an F1-score of 73.3%. The results present deep learning models as options with positive performance for the task under the conditions of this domain. Conclusion: It was observed that the ability to generate contextualized representations, along with the number of topics and comments used in training, can directly affect performance. This makes automatic annotation and the exploration of topic diversity with Transformer architectures promising approaches for the task.
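    The zero-shot annotation step can be pictured as below; the prompt wording and model name are assumptions for illustration, not the study's exact setup, and an API key is assumed to be configured in the environment.

      from openai import OpenAI

      client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

      def annotate_stance(bill_summary: str, comment: str) -> str:
          # Zero-shot prompt: no labeled examples, just the task description.
          prompt = (f"Bill: {bill_summary}\n"
                    f"Comment: {comment}\n"
                    "Does the comment support or oppose the bill? "
                    "Answer with exactly one word: 'favor' or 'against'.")
          reply = client.chat.completions.create(
              model="gpt-3.5-turbo",
              messages=[{"role": "user", "content": prompt}],
              temperature=0,
          )
          return reply.choices[0].message.content.strip().lower()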
  • Item
    A comparative study of text classification techniques for hate speech detection
    (Universidade Federal de Goiás, 2022-01-27) Silva, Rodolfo Costa Cezar da; Rosa, Thierson Couto; http://lattes.cnpq.br/4414718560764818; Rosa, Thierson Couto; Moura, Edleno Silva de; Silva, Nádia Félix Felipe da
    The dissemination of hate speech on the Internet, especially on social media platforms, has been a serious and recurrent problem. In the present study, we compare eleven methods for classifying hate speech, including traditional machine learning methods, neural network-based approaches and transformers, as well as their combination with eight techniques to address the class imbalance problem, which is a recurrent issue in hate speech classification. The data transformation techniques we investigated include data resampling techniques and a modification of a technique based on compound features (c_features). All models have been tested on seven datasets with varying specificity, following a rigorous experimentation protocol that includes cross-validation and the use of appropriate evaluation metrics, as well as validation of the results through appropriate statistical tests for multiple comparisons. To our knowledge, there is no broader comparative study of data enhancement techniques for hate speech detection, nor any work that combines data resampling techniques with transformers. Our extensive experimentation, based on over 2,900 measurements, reveals that most data resampling techniques are ineffective at enhancing classifier effectiveness, with the exception of ROS, which improves most classification methods, including the transformers. For the smallest dataset, ROS provided gains of 60.43% and 33.47% for BERT and RoBERTa, respectively. The experiments revealed that c_features improved all classification methods that they could be combined with. The compound features technique provided satisfactory gains of up to 7.8% for SVM. Finally, we investigate cost-effectiveness for a few of the best classification methods. This analysis provided confirmation that the traditional method Logistic Regression (LR) combined with the use of c_features can provide great effectiveness with low overhead in all datasets considered.
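    ROS (random oversampling) simply duplicates minority-class examples until the classes balance. A toy sketch with the imbalanced-learn library (the data is a stand-in):

      import numpy as np
      from imblearn.over_sampling import RandomOverSampler

      texts = np.array(["hateful msg 1", "hateful msg 2", "neutral 1",
                        "neutral 2", "neutral 3", "neutral 4"]).reshape(-1, 1)
      labels = np.array([1, 1, 0, 0, 0, 0])

      ros = RandomOverSampler(random_state=42)
      texts_res, labels_res = ros.fit_resample(texts, labels)
      print(np.bincount(labels_res))  # [4 4]: minority class duplicated to parity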
  • Item
    Implementação de princípios de gamificação adaptativa em uma aplicação mHealth
    (Universidade Federal de Goiás, 2023-08-25) Anjos, Filipe Maciel de Souza dos; Carvalho, Sergio Teixeira de; http://lattes.cnpq.br/2721053239592051; Carvalho, Sergio Teixeira de; Mata, Luciana Regina Ferreira da; Berretta, Luciana de Oliveira
    This work describes the implementation of a gamified mHealth application called IUProst for the treatment of urinary incontinence through the performance of pelvic exercises for men who have undergone prostate removal surgery. The development of the application followed the guidelines of Framework L, designed to guide the creation of gamified mHealth applications. The initial version of IUProst was exclusively focused on the self-care dimension of Framework L and was released in November 2022. It was used by hundreds of users seeking the treatment provided by the application. Subsequently, the Gamification dimension of Framework L was employed to gamify IUProst. During the process of implementing game elements, it was noted that there were no clear definitions of how to implement the components to allow for gamification adaptation based on user profiles. To address this gap, an implementation model for gamification components was developed to guide developers in creating gamification that could adapt to the user profile dynamics proposed by the adaptive gamification of Framework L. Therefore, the contributions of this research include delivering a gamified mHealth application, analyzing usage data generated by the gamified application, and providing an implementation model for game components that were incorporated into Framework L, enabling the use of components in the context of adaptive gamification. The gamified version of IUProst was published in July 2023 and was used for 30 days until the writing of this dissertation. The results obtained demonstrate that during the gamified month, patients performed approximately 2/3 more exercises compared to the previous two months, reaching 61% of the total exercises performed during the three months analyzed. The data confirmed the hypothesis that game components indeed contribute to patient engagement with the application and also highlighted areas for improvement in the mHealth application.