INF - Instituto de Informática
Browsing INF - Instituto de Informática by program "Programa de Pós-graduação em Ciência da Computação (INF)"
Now showing 1 - 20 of 21
Item Aplicação de técnicas de visualização de informações para os problemas de agendamento de horários educacionais (Universidade Federal de Goiás, 2023-10-20) Alencar, Wanderley de Souza; Jradi, Walid Abdala Rfaei; http://lattes.cnpq.br/6868170610194494; Nascimento, Hugo Alexandre Dantas do; http://lattes.cnpq.br/2920005922426876; Nascimento, Hugo Alexandre Dantas do; Jradi, Walid Abdala Rfaei; Bueno, Elivelton Ferreira; Gondim, Halley Wesley Alexandre Silva; Carvalho, Cedric Luiz de

An important class of combinatorial optimization problems is that of Educational Timetabling Problems (Ed-TTPs). Broadly, this category includes problems in which it is necessary to allocate teachers, subjects (lectures) and, eventually, rooms in order to build a timetable of classes or examinations to be used in a certain academic period at an educational institution (school, college, university, etc.). The timetable must observe a set of constraints in order to satisfy, as much as possible, a set of desirable goals. The current research proposes the use of methods and techniques from the Information Visualization (IV) area to help non-technical users, through an interactive approach, better understand and solve problem instances in the scope of their educational institutions. In the proposed approach, human actions and those performed by a computational system interact symbiotically toward solving the problem, with the interaction carried out through a graphical user interface that implements ideas originating from the User Hints framework [Nas03].
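A hard constraint of the kind described above can be sketched as a small checker (a hypothetical illustration only; the teachers, timeslots, and lectures are invented, and this is not the tool developed in the work):

```python
# Minimal sketch of a hard-constraint check for an educational timetable:
# a teacher must not be assigned two different lectures in the same timeslot.

def clashes(assignments):
    """Return (teacher, timeslot) pairs assigned more than one lecture.

    assignments: list of (teacher, timeslot, lecture) tuples.
    """
    seen = {}
    conflicts = []
    for teacher, slot, lecture in assignments:
        key = (teacher, slot)
        if key in seen and seen[key] != lecture:
            conflicts.append(key)
        seen[key] = lecture
    return conflicts

timetable = [
    ("ana", "mon-8h", "calculus"),
    ("ana", "mon-8h", "algebra"),   # clash: same teacher, same slot
    ("bia", "mon-8h", "physics"),
]
print(clashes(timetable))  # [('ana', 'mon-8h')]
```

In an interactive setting such as the one proposed, a checker like this would run after every user edit, so violated constraints can be highlighted immediately in the interface.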
Among the main contributions are: (1) recognition and characterization of the most widely used techniques for the presentation and/or visualization of Ed-TTP solutions; (2) conception of a mathematical notation to formalize the problem specification, including the introduction of a new idea, called flexibility, applied to the entities involved in the timetable; (3) proposition of visualizations able to contribute to a better understanding of a problem instance; (4) availability of a computational tool that provides interactive resolution of Ed-TTPs, together with an entity-relationship model specific to this kind of problem; and, finally, (5) the proposal of a methodology to evaluate visualizations applied to the problem in focus.

Item Preditor híbrido de estruturas terciárias de proteínas (Universidade Federal de Goiás, 2023-08-10) Almeida, Alexandre Barbosa de; Soares, Telma Woerle de Lima; http://lattes.cnpq.br/6296363436468330; Soares, Telma Woerle de Lima; Camilo Junior, Celso Gonçalves; Vieira, Flávio Henrique Teles; Delbem, Alexandre Cláudio Botazzo; Faccioli, Rodrigo Antônio

Proteins are organic molecules composed of chains of amino acids and perform a variety of essential biological functions in the body. The native structure of a protein is the result of the folding process of its amino acids, with their spatial orientation primarily determined by two dihedral angles (φ, ψ). This work proposes a new hybrid method for predicting the tertiary structures of proteins, called hyPROT, which combines Multi-objective Evolutionary Algorithm (MOEA) optimization, Molecular Dynamics, and Recurrent Neural Networks (RNNs). The proposed approach investigates the evolutionary profile of the dihedral angles (φ, ψ) obtained by different MOEAs during dominance-based minimization of the objective function and energy minimization by molecular dynamics. This proposal is unprecedented in the protein prediction literature.
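Conformations represented as (φ, ψ) series, as above, can be compared numerically. A toy sketch follows (the angle values are invented, and this angular RMSD is an illustration, not the structural RMSD over atomic coordinates usually reported in prediction work):

```python
import math

# A conformation as a series of backbone dihedral angles (phi, psi) per
# residue, and a simple RMSD over angles between two conformations.

def angular_rmsd(conf_a, conf_b):
    """RMSD over paired (phi, psi) angles, in degrees, with wrap-around."""
    assert len(conf_a) == len(conf_b)
    sq, n = 0.0, 0
    for (phi_a, psi_a), (phi_b, psi_b) in zip(conf_a, conf_b):
        for d in (phi_a - phi_b, psi_a - psi_b):
            d = (d + 180.0) % 360.0 - 180.0  # wrap difference to [-180, 180)
            sq += d * d
            n += 1
    return math.sqrt(sq / n)

native    = [(-57.0, -47.0), (-139.0, 135.0)]  # alpha-helix- and beta-sheet-like
predicted = [(-60.0, -45.0), (-120.0, 120.0)]
print(round(angular_rmsd(native, predicted), 2))  # 12.24
```

The wrap-around step matters because dihedrals are periodic: 179° and -179° are only 2° apart, not 358°.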
The premise under investigation is that the evolutionary profile of the dihedrals may conceal relevant patterns about folding mechanisms. To analyze the evolutionary profile of the angles (φ, ψ), RNNs were used to abstract and generalize the specific biases of each MOEA. The selected MOEAs were NSGA-II, BRKGA, and GDE3, and the objective function investigated combines the potential energy from non-covalent interactions and the solvation energy. The results show that hyPROT was able to reduce the RMSD value of the best prediction generated by the MOEAs individually by at least 33%. Predicting new series of dihedral angles allowed the formation of histograms, indicating a possible statistical ensemble responsible for the distribution of the dihedrals (φ, ψ) during the folding process.

Item Alocação de recursos e posicionamento de funções virtualizadas em redes de acesso por rádio desagregadas (Universidade Federal de Goiás, 2023-08-30) Almeida, Gabriel Matheus Faria de; Pinto, Leizer de Lima; http://lattes.cnpq.br/0611031507120144; Cardoso, Kleber Vieira; http://lattes.cnpq.br/0268732896111424; Cardoso, Kleber Vieira; Pinto, Leizer de Lima; Klautau Júnior, Aldebaro Barreto da Rocha; Silva, Luiz Antonio Pereira da

Jointly choosing a functional split of the protocol stack and the placement of network functions in a virtualized RAN is critical to efficiently using the access network resources. This problem is a current research topic in 5G and post-5G networks and involves the challenge of simultaneously choosing the placement of virtualized functions, the routes for traffic, and the management of available computing resources. In this work, we present three approaches to solve this problem in the planning scenario and two approaches for the network operation scenario. The first result is a Mixed Integer Linear Programming (MILP) model that considers a generic set of processing nodes and multipath routing.
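The centralization-versus-cost trade-off underlying this placement problem can be illustrated with a toy greedy stand-in (not the MILP of the work; node names, capacities, and centralization scores are invented):

```python
# Toy greedy sketch of virtualized-function placement: put each function on
# the most centralized node that still has CPU capacity. The actual work
# formulates this jointly with routing as a MILP; this is illustration only.

def greedy_place(functions, nodes):
    """functions: dict name -> CPU demand.
    nodes: dict name -> {"cap": CPU capacity, "central": centralization score}.
    Returns a dict function -> node, preferring more central nodes."""
    remaining = {n: spec["cap"] for n, spec in nodes.items()}
    order = sorted(nodes, key=lambda n: -nodes[n]["central"])
    placement = {}
    for f, demand in functions.items():
        for n in order:
            if remaining[n] >= demand:
                placement[f] = n
                remaining[n] -= demand
                break
    return placement

nodes = {"CU-pool": {"cap": 4, "central": 3},
         "DU-site": {"cap": 2, "central": 1}}
functions = {"PDCP": 2, "RLC": 2, "MAC": 2}
print(greedy_place(functions, nodes))
# {'PDCP': 'CU-pool', 'RLC': 'CU-pool', 'MAC': 'DU-site'}
```

A greedy pass like this ignores routing and multipath constraints entirely, which is exactly why exact MILP solutions and learned policies are worth the extra cost in the scenarios the work studies.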
The second approach uses artificial intelligence and machine learning concepts, in which we formulate a deep reinforcement learning agent. The third approach is based on search metaheuristics, using a genetic algorithm. The last two approaches are Markov Decision Process (MDP) formulations that consider dynamic demand on radio units. In all formulations, the objective is to maximize the centralization of network functions while minimizing the positioning cost. Analysis of the solutions and comparison of their results show that exact approaches such as MILP naturally provide the best solution. However, in terms of efficiency, the genetic algorithm has the best search time, finding a high-quality solution in a few seconds. The deep reinforcement learning agent converges well, finding high-quality solutions and showing the capacity to generalize across different topologies. Finally, the formulations considering the network operation scenario with dynamic demand are highly complex due to the size of the action space.

Item Implementação de princípios de gamificação adaptativa em uma aplicação mHealth (Universidade Federal de Goiás, 2023-08-25) Anjos, Filipe Maciel de Souza dos; Carvalho, Sergio Teixeira de; http://lattes.cnpq.br/2721053239592051; Carvalho, Sergio Teixeira de; Mata, Luciana Regina Ferreira da; Berretta, Luciana de Oliveira

This work describes the implementation of a gamified mHealth application called IUProst for the treatment of urinary incontinence, through pelvic exercises, in men who have undergone prostate removal surgery. The development of the application followed the guidelines of Framework L, designed to guide the creation of gamified mHealth applications. The initial version of IUProst focused exclusively on the self-care dimension of Framework L and was released in November 2022. It was used by hundreds of users seeking the treatment provided by the application.
Subsequently, the Gamification dimension of Framework L was employed to gamify IUProst. During the process of implementing game elements, it was noted that there were no clear definitions of how to implement the components so as to allow gamification to adapt to user profiles. To address this gap, an implementation model for gamification components was developed to guide developers in creating gamification that can adapt to the user profile dynamics proposed by the adaptive gamification of Framework L. The contributions of this research therefore include delivering a gamified mHealth application, analyzing usage data generated by the gamified application, and providing an implementation model for game components, incorporated into Framework L, that enables their use in the context of adaptive gamification. The gamified version of IUProst was published in July 2023 and had been in use for 30 days by the time of writing of this dissertation. The results show that during the gamified month patients performed approximately 2/3 more exercises than in the previous two months, accounting for 61% of all exercises performed during the three months analyzed. The data confirmed the hypothesis that game components indeed contribute to patient engagement with the application and also highlighted areas for improvement in the mHealth application.

Item Future-Shot: Few-Shot Learning to tackle new labels on high-dimensional classification problems (Universidade Federal de Goiás, 2024-02-23) Camargo, Fernando Henrique Fernandes de; Soares, Anderson da Silva; http://lattes.cnpq.br/1096941114079527; Soares, Anderson da Silva; Galvão Filho, Arlindo Rodrigues; Vieira, Flávio Henrique Teles; Gomes, Herman Martins; Lotufo, Roberto de Alencar

This thesis introduces a novel approach to high-dimensional multiclass classification challenges, particularly in dynamic environments where new classes emerge.
Named Future-Shot, the method employs metric learning, specifically triplet learning, to train a model capable of generating embeddings for both data points and classes within a shared vector space. This facilitates efficient similarity comparisons using techniques like k-nearest neighbors (kNN), enabling seamless integration of new classes without extensive retraining. Tested on lab-of-origin prediction tasks using the Addgene dataset, Future-Shot achieves a top-10 accuracy of 90.39%, surpassing existing methods. Notably, in few-shot learning scenarios, it achieves an average top-10 accuracy of 81.2% with just 30% of the data for new classes, demonstrating robustness and efficiency in adapting to evolving class structures.

Item Classificação de documentos da administração pública utilizando inteligência artificial (Universidade Federal de Goiás, 2024-04-30) Carvalho, Rogerio Rodrigues; Costa, Ronaldo Martins da; http://lattes.cnpq.br/7080590204832262; Costa, Ronaldo Martins da; Souza, Rodrigo Gonçalves de; Silva, Nádia Félix Felipe da

Public organizations face difficulties in classifying, and promoting transparency of, the numerous documents produced during the execution of their activities. Correct classification of documents is critical to prevent public access to sensitive information and to protect individuals and organizations from malicious use. This work proposes two approaches to the task of classifying sensitive documents, using state-of-the-art artificial intelligence techniques and best practices found in the literature: a conventional method, which uses artificial intelligence techniques and regular expressions to analyze the textual content of documents, and an alternative method, which employs the CBIR (content-based image retrieval) technique to classify documents when text extraction is not viable.
Using real data from the Electronic Information System (SEI) of the Federal University of Goiás (UFG), the results demonstrated that applying regular expressions as a preliminary check can improve the computational efficiency of the classification process, although it yields only a modest increase in classification precision. The conventional method proved effective in document classification, with the BERT model standing out with an accuracy of 94%. The alternative method, in turn, offered a viable solution for challenging scenarios, showing promising results with an accuracy of 87% in classifying public documents.

Item Uma estratégia de pós-processamento para seleção de regras de associação para descoberta de conhecimento (Universidade Federal de Goiás, 2023-08-22) Cintra, Luiz Fernando da Cunha; Salvini, Rogerio Lopes; http://lattes.cnpq.br/5009392667450875; Salvini, Rogerio Lopes; Rosa, Thierson Couto; Aguilar Alonso, Eduardo José

Association rule mining (ARM) is a traditional data mining method that provides information about associations between items in transactional databases. A known problem of ARM is the large number of rules generated, requiring approaches to post-process these rules so that a human expert is able to analyze the associations found. In some contexts the domain expert is interested in investigating only one item of interest; in these cases a search guided by that item can help mitigate the problem. For an exploratory analysis, this implies looking for associations in which the item of interest appears in any part of the rule. Few methods focus on post-processing the generated rules targeting an item of interest. The present work seeks to highlight the relevant associations of a given item in order to bring knowledge about its role through its interactions and the relationships it has in common with the other items.
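The item-guided selection just described, keeping only rules in which the item of interest appears on either side, can be sketched as follows (the rules and the item are invented; this is not the strategy proposed in the work, only the basic filtering idea):

```python
# Sketch of item-guided post-processing of mined association rules: keep
# only rules mentioning the item of interest, grouped by the side (antecedent
# or consequent) on which the item appears.

def select_rules(rules, item):
    """rules: list of (antecedent_set, consequent_set, confidence)."""
    groups = {"antecedent": [], "consequent": []}
    for ant, cons, conf in rules:
        if item in ant:
            groups["antecedent"].append((ant, cons, conf))
        elif item in cons:
            groups["consequent"].append((ant, cons, conf))
    return groups

rules = [
    ({"bread"}, {"butter"}, 0.8),
    ({"butter", "jam"}, {"bread"}, 0.6),
    ({"beer"}, {"chips"}, 0.7),
]
g = select_rules(rules, "bread")
print(len(g["antecedent"]), len(g["consequent"]))  # 1 1
```

Grouping by the side the item occupies already separates rules that explain what the item leads to from rules that explain what leads to it, which is the kind of role information the work aims to surface.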
To this end, this work proposes a post-processing strategy for association rules that selects and groups rules oriented to a certain item of interest provided by a domain expert. In addition, a graphical representation is presented so that the associations between rules, and the groupings of rules found, are more easily visualized and interpreted. Four case studies show that the proposed method is feasible and reduces the number of relevant rules to a manageable amount, allowing analysis by domain experts. Graphs showing the relationships between the groups were generated in all case studies and facilitate their analysis.

Item Análise multirresolução de imagens gigapixel para detecção de faces e pedestres (Universidade Federal de Goiás, 2023-09-27) Ferreira, Cristiane Bastos Rocha; Pedrini, Hélio; http://lattes.cnpq.br/9600140904712115; Soares, Fabrízzio Alphonsus Alves de Melo Nunes; http://lattes.cnpq.br/7206645857721831; Soares, Fabrízzio Alphonsus Alves de Melo Nunes; Pedrini, Hélio; Santos, Edimilson Batista dos; Borges, Díbio Leandro; Fernandes, Deborah Silva Alves

Gigapixel images, also known as gigaimages, can be formed by merging a sequence of individual images obtained from a scene-scanning process. Such images can be understood as a mosaic built from a large number of high-resolution digital images. A gigapixel image provides a powerful way to observe minimal details that are very far from the observer, enabling research in areas such as pedestrian detection, surveillance, and security. Because this image category involves a high volume of sequentially captured data, generating and analyzing such images raises many problems, and directly applying conventional algorithms designed for non-gigapixel images can become unfeasible in this context.
Thus, this work proposes a method for scanning, manipulating, and analyzing multiresolution gigapixel images for pedestrian- and face-identification applications using traditional algorithms. The approach is evaluated on gigapixel images with both low and high densities of people and faces, presenting promising results.

Item Reconhecimento de padrões em imagens radiográficas de tórax: apoiando o diagnóstico de doenças pulmonares infecciosas (Universidade Federal de Goiás, 2023-09-29) Fonseca, Afonso Ueslei da; Soares, Fabrízzio Alphonsus Alves de Melo Nunes; http://lattes.cnpq.br/7206645857721831; Soares, Fabrízzio Alphonsus Alves de Melo Nunes; Laureano, Gustavo Teodoro; Pedrini, Hélio; Rabahi, Marcelo Fouad; Salvini, Rogerio Lopes

Pattern Recognition (PR) is a field of computer science that aims to develop techniques and algorithms capable of identifying regularities in complex data, enabling intelligent systems to perform complicated tasks with precision. In the context of disease, PR plays a crucial role in diagnosis and detection, revealing patterns hidden from human eyes, assisting doctors in making decisions, and identifying correlations. Infectious pulmonary diseases (IPDs), such as pneumonia, tuberculosis, and COVID-19, challenge global public health, causing thousands of deaths annually, straining healthcare systems, and demanding substantial financial resources. Diagnosing them can be challenging due to the vagueness of symptoms, similarities with other conditions, and subjectivity in clinical assessment. For instance, reading chest X-ray (CXR) examinations is a tedious and specialized process with significant variation among observers, leading to failures and delays in diagnosis and treatment, especially in underdeveloped countries with a scarcity of radiologists. In this thesis, we investigate PR and Artificial Intelligence (AI) techniques to support the diagnosis of IPDs in CXRs.
We follow the guidelines of the World Health Organization (WHO) in support of the goals of the 2030 Agenda, which include combating infectious diseases. The research questions involve selecting the best techniques, acquiring data, and creating intelligent models. As objectives, we propose low-cost, efficient, and effective PR and AI methods that range from preprocessing to supporting the diagnosis of IPDs in CXRs. The results so far align with the state of the art, and we believe they can contribute to the development of computer-assisted IPD diagnostic systems.

Item Detecção de posicionamento do cidadão em Projetos de Lei (Universidade Federal de Goiás, 2024-03-22) Maia, Dyonnatan Ferreira; Silva, Nádia Félix Felipe da; http://lattes.cnpq.br/7864834001694765; Silva, Nádia Félix Felipe da; Pereira, Fabíola Souza Fernandes; Fernandes, Deborah Silva Alves

Background: Comments on political projects on the internet reflect the aspirations of a significant portion of the population. Automatic stance detection of these comments with respect to specific topics can help better understand public opinion. This study aims to develop a supervised learning model capable of estimating the stance of comments on legislative propositions, considering the challenge posed by the diversity and constant emergence of new bills. Method: For the domain studied, a specific corpus was constructed by collecting comments from surveys available on the Chamber of Deputies website. The experiments included the evaluation of classic machine learning models, such as Logistic Regression, Naive Bayes, Support Vector Machine, Random Forest, and Multilayer Perceptron, in addition to the fine-tuning of BERT language models.
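One of the classic baselines listed above, Naive Bayes over bag-of-words features, can be sketched in a few lines (a toy illustration only: the comments and labels are invented, and the study itself uses scikit-learn models and fine-tuned BERTimbau):

```python
import math
from collections import Counter

# Minimal multinomial Naive Bayes stance classifier with Laplace smoothing.

def train_nb(docs):
    """docs: list of (text, label). Returns priors, per-label word counts, vocab."""
    labels = Counter(lbl for _, lbl in docs)
    words = {lbl: Counter() for lbl in labels}
    for text, lbl in docs:
        words[lbl].update(text.lower().split())
    vocab = {w for c in words.values() for w in c}
    return labels, words, vocab

def predict_nb(model, text):
    labels, words, vocab = model
    total = sum(labels.values())
    best, best_lp = None, float("-inf")
    for lbl, n in labels.items():
        lp = math.log(n / total)  # log prior
        denom = sum(words[lbl].values()) + len(vocab)
        for w in text.lower().split():
            lp += math.log((words[lbl][w] + 1) / denom)  # Laplace smoothing
        if lp > best_lp:
            best, best_lp = lbl, lp
    return best

corpus = [
    ("apoio totalmente o projeto", "favor"),
    ("projeto excelente apoio", "favor"),
    ("sou contra esse projeto", "contra"),
    ("projeto ruim sou contra", "contra"),
]
model = train_nb(corpus)
print(predict_nb(model, "apoio o projeto"))  # favor
```

Baselines of this kind have no notion of context, which is precisely the gap the contextualized BERT-based models in the study are meant to close on unseen topics.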
Automatic data annotation was also performed using a zero-shot, prompt-based approach with the generative GPT-3.5 model, aiming to overcome the difficulties of human annotation and the scarcity of annotated data, and yielding a corpus approximately three times the size of the manually annotated one. Results: The results indicate that the fine-tuned BERTimbau model surpassed the classic approaches, achieving an average F1-score of 70.4% on unseen topics. Moreover, applying the automatically annotated data in the initial stage of BERTimbau fine-tuning improved performance, reaching an F1-score of 73.3%. These results establish deep learning models as strong options for the task under the conditions of this domain. Conclusion: The ability to generate contextualized representations, along with the number of topics and comments used in training, can directly affect performance. This makes automatic annotation and the exploration of topic diversity with Transformer architectures promising approaches for the task.

Item Recomendação de conteúdo ciente de recursos como estratégia para cache na borda da rede em sistemas 5G (Universidade Federal de Goiás, 2023-10-03) Monção, Ana Claudia Bastos Loureiro; Corrêa, Sand Luz; http://lattes.cnpq.br/3386409577930822; Cardoso, Kleber Vieira; http://lattes.cnpq.br/0268732896111424; Cardoso, Kleber Vieira; Corrêa, Sand Luz; Soares, Telma Woerle de Lima; Rosa, Thierson Couto; Fonseca, Anelise Munaretto

Recently, the coupling of content caching at the wireless network edge with video recommendation systems has shown promising results in optimizing the cache hit ratio and improving the user experience. However, the quality of the UE wireless link and the resource capabilities of the UE are aspects that impact the user experience and have been neglected in the literature.
In this work, we present a resource-aware optimization model for the joint task of caching and recommending videos to mobile users, along with a heuristic created to solve the problem more quickly. The goal is to maximize the cache hit ratio and the user QoE (concerning content preferences and video representations) under the constraints of UE capabilities and the network resources available at the time of the recommendation. We evaluate the proposed model using a video catalog derived from a real-world video content dataset (from the MovieLens project), real-world video representations, and actual historical records of Channel Quality Indicators (CQIs) representing user mobility. We compare the performance of our proposal with a state-of-the-art caching and recommendation method that is unaware of computing and network resources. Results show that our approach significantly increases the user's QoE while still improving the effective cache hit rate.

Item Acelerando florestas de decisão paralelas em processadores gráficos para a classificação de texto (Universidade Federal de Goiás, 2022-09-12) Pires, Julio Cesar Batista; Martins, Wellington Santos; http://lattes.cnpq.br/3041686206689904; Martins, Wellington Santos; Lima, Junio César de; Gaioso, Roussian Di Ramos Alves; Franco, Ricardo Augusto Pereira; Soares, Fabrízzio Alphonsus Alves de Melo Nunes

The amount of readily available online text has grown exponentially, requiring efficient methods to automatically manage and sort the data. Automatic text classification provides a means to organize this data by associating documents with classes. However, the use of more data and sophisticated machine learning algorithms has demanded increasing computing power. In this work we accelerate a novel Random Forest-based classifier that has been shown to outperform state-of-the-art classifiers for textual data.
The classifier is obtained by applying boosting to bags of extremely randomized trees (forests) that are built in parallel to improve performance. Experimental results using standard textual datasets show that the GPU-based implementation is able to reduce the execution time by up to 20 times compared to an equivalent sequential implementation.

Item Estudo comparativo de comitês de sub-redes neurais para o problema de aprender a ranquear (Universidade Federal de Goiás, 2023-09-01) Ribeiro, Diogo de Freitas; Sousa, Daniel Xavier de; http://lattes.cnpq.br/4603724338719739; Rosa, Thierson Couto; http://lattes.cnpq.br/4414718560764818; Rosa, Thierson Couto; Sousa, Daniel Xavier de; Canuto, Sérgio Daniel Carvalho; Martins, Wellington Santos

Learning to Rank (L2R) is a sub-area of Information Retrieval that aims to use machine learning to optimize the positioning of the most relevant documents in the ranking of answers to a specific query. Until recently, the LambdaMART method, an ensemble of regression trees, was considered the state of the art in L2R. However, the introduction of AllRank, a deep learning method that incorporates self-attention mechanisms, has overtaken LambdaMART as the most effective approach for L2R tasks. This study explored the effectiveness and efficiency of sub-network ensembles as a complement to the self-attention used in AllRank, aiming to establish a new level of effectiveness in the field of ranking. Different methods for forming sub-network ensembles, such as Multi-Sample Dropout, Multi-Sample Dropout (Training and Testing), BatchEnsemble, and Masksembles, were implemented and tested on two standard data collections: MSLR-WEB10K and YAHOO!.
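The core idea behind multi-sample dropout, averaging the scores produced by several independently drawn dropout masks over the same features, can be shown with a toy numeric sketch (weights, features, and masks are invented; in the study this happens inside AllRank-style neural rankers):

```python
import random

# Toy multi-sample dropout: average a linear score over several dropout
# masks instead of using a single mask (p = 0.5, inverted-dropout scaling).

def dropout_scores(weights, features, masks):
    """Average the masked linear score over the given 0/1 masks."""
    scores = []
    for mask in masks:
        kept = [w * x * m * 2.0 for w, x, m in zip(weights, features, mask)]
        scores.append(sum(kept))  # *2.0 rescales for keep probability 0.5
    return sum(scores) / len(scores)

random.seed(0)
weights, features = [0.4, -0.2, 0.7], [1.0, 2.0, 1.0]
masks = [[random.randint(0, 1) for _ in weights] for _ in range(8)]
print(dropout_scores(weights, features, masks[:1]))  # one mask: high variance
print(dropout_scores(weights, features, masks))      # eight masks: averaged
```

Averaging over masks reduces the variance of the prediction at a cost that grows with the number of masks, which mirrors the effectiveness-versus-time trade-off reported for these ensembles.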
The results of the experiments indicated that some of these ensemble approaches, specifically Masksembles and BatchEnsemble, outperformed the original AllRank in metrics such as NDCG@1, NDCG@5, and NDCG@10, although they were more costly in terms of training and testing time. In conclusion, the research reveals that applying sub-network ensembles in L2R models is a promising strategy, especially in scenarios where latency is not critical. Thus, this work not only advances the state of the art in L2R, but also opens up new possibilities for improvements in effectiveness and efficiency, inspiring future research into the use of sub-network ensembles in L2R.

Item Abordagem de seleção de características baseada em AUC com estimativa de probabilidade combinada a técnica de suavização de La Place (Universidade Federal de Goiás, 2024-09-28) Ribeiro, Guilherme Alberto Sousa; Costa, Nattane Luíza da; http://lattes.cnpq.br/9968129748669015; Barbosa, Rommel Melgaço; http://lattes.cnpq.br/6228227125338610; Barbosa, Rommel Melgaço; Lima, Marcio Dias de; Oliveira, Alexandre César Muniz de; Gonçalves, Christiane; Rodrigues, Diego de Castro

The high dimensionality of many datasets has led to the need for dimensionality reduction algorithms that increase performance, reduce computational effort, and simplify data processing in applications focused on machine learning or pattern recognition. Given the need for and importance of reduced data, this work investigates feature selection methods, focusing on those that use the AUC (Area Under the ROC Curve). Trends in the use of feature selection methods, both in general and for methods using the AUC as an estimator, applied to microarray data, were evaluated. A new feature selection algorithm, the AUC-based feature selection method with probability estimation and Laplace smoothing (AUC-EPS), was then developed.
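Scoring a single feature by the AUC it achieves when used directly as a ranking of the class labels, the building block of AUC-based selection, can be sketched as follows (illustrative only; AUC-EPS itself additionally incorporates probability estimation and Laplace smoothing, and the feature values here are invented):

```python
# Per-feature AUC via the rank-sum (Mann-Whitney) formulation: the fraction
# of (positive, negative) sample pairs the feature orders correctly, with
# ties counting 0.5. Features are then ranked by this score.

def feature_auc(values, labels):
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented 4-sample microarray-style dataset: two features, binary labels.
X = {"gene_a": [0.9, 0.8, 0.2, 0.1],   # separates the classes perfectly
     "gene_b": [0.5, 0.1, 0.6, 0.2]}   # mostly uninformative
y = [1, 1, 0, 0]
ranking = sorted(X, key=lambda f: feature_auc(X[f], y), reverse=True)
print(ranking)  # ['gene_a', 'gene_b']
```

An AUC of 1.0 means the feature alone perfectly separates the classes; 0.5 means it ranks no better than chance, which is why per-feature AUC is a natural filter criterion on high-dimensional microarray data.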
The proposed method calculates the AUC considering all possible values of each feature, combined with probability estimation and Laplace smoothing. Experiments were conducted to compare the proposed technique with the FAST (Feature Assessment by Sliding Thresholds) and ARCO (AUC and Rank Correlation coefficient Optimization) algorithms. Eight gene expression microarray datasets were used, all of them in the cross-validation experiment and four in the bootstrap experiment. The results showed that the proposed method helped improve the performance of some classifiers, in most cases with a completely different set of features from those selected by the other techniques, some of which proved critical for disease identification. The work concluded that the proposed method, AUC-EPS, selects features different from those chosen by FAST and ARCO, helping to improve the performance of some classifiers and identifying features that are crucial for discriminating cancer.

Item Secure D2D caching framework based on trust management and blockchain for mobile edge caching - a multi domain approach (Universidade Federal de Goiás, 2023-08-18) Rocha, Acquila Santos; Pinheiro, Billy Anderson; http://lattes.cnpq.br/1882589984835011; Borges, Vinicius da Cunha Martins; http://lattes.cnpq.br/6904676677900593; Borges, Vinicius da Cunha Martins; Pinheiro, Billy Anderson; Cordeiro, Weverton Luis da Costa; Carvalho, Sérgio Teixeira de

Device-to-Device (D2D) communication, combined with edge caching and mobile edge computing, is a promising approach for offloading data from the wireless mobile network.
However, user security is still an open issue in D2D communication. Security vulnerabilities remain possible owing to easy, direct, and spontaneous interactions between untrustworthy users and to the different degrees of mobility. This dissertation designs a multi-layer framework that combines diverse blockchain-inspired technologies into a secure multi-domain D2D caching framework. On the intra-domain side, we establish the Secure D2D Caching framework based on trUst management and Blockchain (SeCDUB) to improve the security of D2D communication in video caching through the combination of direct and indirect observations. In addition, blockchain concepts were adapted to the dynamic and constrained scenario of D2D networks to prevent the interception and alteration of indirect observations. This adaptation involved the development of a Clustering Approach (CA) that enables a scalable and lightweight blockchain for D2D networks. Two different mathematical models of uncertainty were used to infer direct and indirect trust values: Bayesian inference and the Dempster-Shafer Theory (TDS), respectively. On the inter-domain side, we developed the Trust in Multiple Domains (TrustMD) framework, which combines edge trust storage with blockchain for distributed storage management in a multi-layer architecture designed to efficiently store trust control data at the edge across different domains. We performed simulations to test the intra-domain approach of SeCDUB; the proposed clustering approach plays a key role in mitigating SeCDUB's overhead as well as the consensus time. The TrustMD results demonstrated a significant enhancement in goodput, reaching at best 95% of the total network throughput, while SeCDUB achieved approximately 80%. Even though there was a 7% increase in D2D overhead, TrustMD effectively kept latency levels under control, resulting in a slight decrease of 1.3 seconds.
Hence, the achieved results indicate that TrustMD efficiently manages security without compromising network performance, reducing the false negative rate by up to 31% in the best-case scenario. Overall, the combination of SeCDUB and TrustMD offers a scalable and effective security solution that boosts network performance and ensures robust protection.Item Junções por similaridade aproximadas em espaços vetoriais densos(Universidade Federal de Goiás, 2023-08-24) Santana, Douglas Rolins de; Ribeiro, Leonardo Andrade; http://lattes.cnpq.br/4036932351063584; Ribeiro, Leonardo Andrade; Bedo, Marcos Vinicius Naves; Martins, Wellington SantosSimilarity join is an operation that returns pairs of objects whose similarity is greater than or equal to a specified threshold, and is essential for tasks such as data cleaning, mining, and integration. A common approach is to use vector representations of the data, such as those produced by the TF-IDF method, and measure the similarity between vectors using the cosine function. However, computing the similarity for all pairs of vectors can be computationally prohibitive on large datasets. Traditional algorithms exploit the sparsity of the vectors and apply filters to reduce the comparison space. Recently, advances in natural language processing have produced semantically richer vectors, improving result quality. However, these vectors have different characteristics from those generated by traditional methods, being dense and of high dimensionality. Preliminary experiments demonstrated that L2AP, the best-known algorithm for similarity joins, is not efficient for dense vector spaces. Due to the intrinsic characteristics of such vectors, approximate solutions based on specialized indexes are predominant for dealing with large datasets. In this context, we investigate how to perform similarity joins using the Hierarchical Navigable Small World (HNSW) index, a state-of-the-art graph-based index designed for approximate k-nearest neighbor (kNN) queries.
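A threshold similarity join driven by kNN queries, the strategy investigated above, can be sketched as follows. In practice an HNSW index (e.g., via the hnswlib library) would answer the kNN queries; the exact brute-force `knn` below is a stand-in assumed for illustration only.

```python
import math

def cosine(u, v):
    # cosine similarity between two dense vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def knn(query, data, k):
    # stand-in for an HNSW kNN query: ids of the k most similar vectors
    ranked = sorted(range(len(data)), key=lambda i: -cosine(query, data[i]))
    return ranked[:k]

def similarity_join(data, threshold, k=3):
    # probe each vector's k nearest neighbours, keep pairs above threshold
    pairs = set()
    for i, q in enumerate(data):
        for j in knn(q, data, k):
            if i < j and cosine(q, data[j]) >= threshold:
                pairs.add((i, j))
    return pairs

vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(similarity_join(vecs, 0.9))   # {(0, 1)}
```

With an approximate index the join inherits the index's recall: a neighbor missed by the graph search is a pair missed by the join.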
We explored the design space of possible solutions, ranging from approaches built on top of HNSW to a deeper integration of similarity join processing into this framework. The experiments carried out demonstrated speedups of up to 2.48 and 3.47 orders of magnitude relative to the exact method and the baseline approach, respectively, while maintaining recall close to 100%.Item Interpretabilidade de modelos de aprendizado de máquina: uma abordagem baseada em árvores de decisão(Universidade Federal de Goiás, 2023-09-22) Silva, Jurandir Junior de Deus da; Salvini, Rogerio Lopes; http://lattes.cnpq.br/5009392667450875; Salvini, Rogerio Lopes; Silva, Nadia Félix Felipe da; Alonso, Eduardo José AguilarInterpretability is defined as the ability of a human to understand why an AI model makes certain decisions. Interpretability can be achieved through the use of inherently interpretable models, such as linear regression and decision trees, or through model-agnostic interpretation methods, which treat any predictive model as a "black box". Another concept related to interpretability is that of counterfactual explanations, which show the minimal changes to the inputs that would lead to different results, providing a deeper understanding of the model's decisions. The approach proposed in this work exploits the explanatory power of decision trees to create a method that offers more concise explanations as well as counterfactual explanations.
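A counterfactual explanation read off a decision tree, in the spirit of the interpretability item above, can be illustrated with a single-split tree (a decision stump): the minimal edit is the one that pushes an attribute just past the split threshold. The stump, the patient record and the `eps` margin are toy assumptions, not the thesis method.

```python
def stump_predict(x, feature, threshold):
    # one-split decision tree: positive class iff the attribute exceeds the threshold
    return 1 if x[feature] > threshold else 0

def counterfactual(x, feature, threshold, eps=0.01):
    # minimal edit: move the attribute just past the split threshold,
    # leaving every other attribute untouched
    cf = dict(x)
    if stump_predict(x, feature, threshold) == 1:
        cf[feature] = threshold - eps
    else:
        cf[feature] = threshold + eps
    return cf

patient = {"age": 40, "glucose": 150}
cf = counterfactual(patient, "glucose", 126)
print(cf)   # glucose lowered to 125.99, which flips the prediction to 0
```

In a full tree the same idea walks root-to-leaf paths that end in the desired class and picks the path requiring the fewest and smallest attribute changes.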
The results of the study indicate that decision trees not only explain the "why" of model decisions, but also show how different attribute values could result in alternative outputs.Item A comparative study of text classification techniques for hate speech detection(Universidade Federal de Goiás, 2022-01-27) Silva, Rodolfo Costa Cezar da; Rosa, Thierson Couto; http://lattes.cnpq.br/4414718560764818; Rosa, Thierson Couto; Moura, Edleno Silva de; Silva, Nádia Félix Felipe daThe dissemination of hate speech on the Internet, especially on social media platforms, has been a serious and recurrent problem. In the present study, we compare eleven methods for classifying hate speech, including traditional machine learning methods, neural network-based approaches and transformers, as well as their combination with eight techniques to address the class imbalance problem, a recurrent issue in hate speech classification. The data transformation techniques we investigated include data resampling techniques and a modification of a technique based on compound features (c_features). All models have been tested on seven datasets of varying specificity, following a rigorous experimentation protocol that includes cross-validation, the use of appropriate evaluation metrics, and validation of the results through statistical tests suited to multiple comparisons. To our knowledge, there is no broader comparative study of data-enhancing techniques for hate speech detection, nor any work that combines data resampling techniques with transformers. Our extensive experimentation, based on over 2,900 measurements, reveals that most data resampling techniques are ineffective at enhancing classifier effectiveness, with the exception of ROS (random oversampling), which improves most classification methods, including the transformers. For the smallest dataset, ROS provided gains of 60.43% and 33.47% for BERT and RoBERTa, respectively.
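Random oversampling (ROS), the one resampling technique above that consistently helped, simply duplicates minority-class examples (sampling with replacement) until the classes are balanced. A minimal sketch, with toy data assumed for illustration:

```python
import random

def random_oversample(X, y, seed=0):
    # group samples by class label
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    # duplicate minority samples (with replacement) up to the majority size
    target = max(len(items) for items in by_class.values())
    Xr, yr = [], []
    for label, items in by_class.items():
        resampled = items + [rng.choice(items) for _ in range(target - len(items))]
        Xr.extend(resampled)
        yr.extend([label] * len(resampled))
    return Xr, yr

X = ["hate1", "hate2", "ok1", "ok2", "ok3", "ok4"]
y = [1, 1, 0, 0, 0, 0]
Xr, yr = random_oversample(X, y)
print(yr.count(1), yr.count(0))   # 4 4
```

Because duplicates carry no new information, ROS mainly reweights the loss toward the minority class, which is why it can be combined with transformers as easily as with traditional classifiers.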
The experiments revealed that c_features improved all classification methods with which they could be combined, providing satisfactory gains of up to 7.8% for SVM. Finally, we investigated cost-effectiveness for a few of the best classification methods. This analysis confirmed that the traditional Logistic Regression (LR) method combined with c_features can provide high effectiveness with low overhead on all datasets considered.Item Classificação das despesas com pessoal no contexto dos Tribunais de Contas(Universidade Federal de Goiás, 2023-08-22) Teixeira, Pedro Henrique; Silva, Nadia Félix Felipe da; http://lattes.cnpq.br/7864834001694765; Salvini, Rogerio Lopes; http://lattes.cnpq.br/5009392667450875; Salvini, Rogerio Lopes; Silva, Nadia Félix Felipe da; Fernandes, Deborah Silva Alves; Costa, Nattane Luíza daThe Court of Accounts of the Municipalities of the State of Goiás (TCMGO) uses the expenditure data received monthly from the municipalities of Goiás to check spending related to personnel expenses, as determined by the Fiscal Responsibility Law (LRF). However, there are indications that the expense classifications sent by municipal managers may contain inconsistencies arising from fiscal tricks, creative accounting or material errors, leading TCMGO to make decisions based on incorrect reports, with serious consequences for the inspection process. To deal with this problem, this work used text classification techniques to identify the class of a personnel expense from its textual description rather than from the code provided by the municipality. To this end, a corpus was built with 17,116 expense records labeled by domain experts, using binary and multi-class approaches. Data processing procedures were applied to extract attributes from the textual description and to assign numerical values to each instance of the dataset with the TF-IDF algorithm.
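The TF-IDF weighting step described above (term frequency scaled by inverse document frequency) can be sketched as follows; smoothing conventions vary between implementations, so this exact formula is an assumption rather than the thesis configuration.

```python
import math

def tf_idf(docs):
    # docs: list of tokenized documents; returns one term-weight dict per document
    n = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1   # document frequency
    weights = []
    for doc in docs:
        w = {}
        for term in doc:
            tf = doc.count(term) / len(doc)       # term frequency
            idf = math.log(n / df[term])          # inverse document frequency
            w[term] = tf * idf
        weights.append(w)
    return weights

# toy expense descriptions; with this idf, a term present in every
# document (here "salario") gets weight zero
docs = [["salario", "servidor"], ["salario", "obra"]]
w = tf_idf(docs)
print(w[0])
```

Libraries such as scikit-learn's TfidfVectorizer add smoothing and normalization on top of this basic scheme.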
In the modeling stage, the Multinomial Naïve Bayes, Logistic Regression and Support Vector Machine (SVM) algorithms were used for supervised classification. SVM proved to be the best algorithm, with F-scores of 0.92 and 0.97 on the multi-class and binary corpora, respectively. However, the labeling process carried out by human experts proved complex, time-consuming and expensive. Therefore, this work also developed a method to classify personnel expenses using only 235 labeled samples, augmented with unlabeled instances, based on an adaptation of the Self-Training algorithm, producing very promising results, with an average F-score between 0.86 and 0.89.
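The Self-Training loop adapted in the final item above works by letting a model trained on the few labeled samples pseudo-label the unlabeled pool, keeping only high-confidence predictions. The 1-NN base learner and the distance-based confidence rule below are simplifications assumed for illustration, not the thesis configuration.

```python
def nearest_label(x, labeled):
    # 1-NN on absolute distance; returns (label, confidence)
    dist, label = min((abs(x - xi), yi) for xi, yi in labeled)
    return label, 1 / (1 + dist)

def self_train(labeled, unlabeled, threshold=0.5):
    # repeatedly pseudo-label the pool, accepting only confident predictions
    labeled = list(labeled)
    pool = list(unlabeled)
    changed = True
    while changed and pool:
        changed = False
        for x in list(pool):
            label, conf = nearest_label(x, labeled)
            if conf >= threshold:
                labeled.append((x, label))   # accept the pseudo-label
                pool.remove(x)
                changed = True
    return labeled

seed = [(0.0, "A"), (10.0, "B")]
result = self_train(seed, [0.5, 9.5, 1.0])
print(sorted(result))
```

Note how 1.0 is only labeled confidently after 0.5 joins the labeled set: accepted pseudo-labels expand the reach of the next iteration, which is the core of the technique.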