Analysis of long non-coding RNAs in the genome of Arabidopsis thaliana (L.) Heynh

Date

2017-03-07

Publisher

Universidade Federal de Goiás

Abstract

Large-scale sequencing of transcripts via RNA-Seq has been changing paradigms by demonstrating that transcription is prevalent throughout eukaryotic genomes. In these organisms, the vast majority of transcripts are non-coding RNAs (ncRNAs). One class that has aroused great interest, given its prevalence, is the long non-coding RNAs (lncRNAs), which are ncRNAs longer than 200 nucleotides. However, little is known about the role and prevalence of lncRNAs in plant genomes, even in model species such as Arabidopsis thaliana (L.) Heynh. The objective of this work was to identify lncRNAs in the Arabidopsis genome and to characterize their size, structure and nucleotide diversity. The sequences were obtained from a previous study that sequenced total RNA from A. thaliana grown under different light regimes, using the Illumina HiSeq 2000 platform. These reads were mapped to the reference genome with TopHat and assembled with Cufflinks. The assembled transcripts were compared with the genome annotation using Cuffcompare to identify non-annotated transcripts. A total of 4,305 putative lncRNAs were obtained: 314 (7%) sense in relation to coding transcripts (mRNAs), 392 (9%) intergenic, 2,216 (52%) intronic and 1,383 (32%) antisense to mRNAs. These putative lncRNAs were filtered to eliminate those with coding potential, as well as those related to rRNA, tRNA and miRNA synthesis. A total of 3,710 high-confidence lncRNAs (HC-lncRNAs) were obtained, of which 58.6% had not been previously annotated. These HC-lncRNAs span only a small proportion (~1%) of the Arabidopsis thaliana genome. A functional enrichment analysis of Gene Ontology (GO) categories showed that genes containing lncRNAs are enriched in categories linked to the localization and transport of proteins within the cell, as well as to nucleic acid binding.
A gene expression analysis identified only 22 lncRNAs that were differentially expressed across the light conditions to which the samples were exposed. Using SNP data from the 1001 Genomes Project, we identified high nucleotide diversity within lncRNA regions, indicating low conservation of the primary structure of these transcripts. Nucleotide diversity in lncRNA regions is higher than in coding regions, but lower than the diversity observed in neutral regions such as pseudogenes.

Citation

ARAÚJO, V. C. S. Análise de RNAs longos não codificantes do genoma de Arabidopsis thaliana (L.) Heynh. 2017. 80 f. Dissertação (Mestrado em Genética e Biologia Molecular) - Universidade Federal de Goiás, Goiânia, 2017.