Analysis of long non-coding RNAs in the genome of Arabidopsis thaliana (L.) Heynh
Date
2017-03-07
Publisher
Universidade Federal de Goiás
Abstract
Large-scale sequencing of transcripts via RNA-Seq has been changing paradigms by
demonstrating that transcription is prevalent throughout the eukaryotic genome. In these
organisms, the vast majority of transcripts are non-coding (ncRNAs). One class of RNA that
has aroused great interest, given its prevalence, is the long non-coding RNAs (lncRNAs),
which are ncRNAs longer than 200 nucleotides. However, little is known about the role
and prevalence of these lncRNAs in plant genomes, even in model species such as
Arabidopsis thaliana (L.) Heynh. The objective of this work was to identify lncRNAs in the
Arabidopsis genome and to characterize their size, structure and nucleotide diversity. The
sequences were obtained from previous work that sequenced total RNA from A. thaliana,
grown under different light regimes, using the Illumina HiSeq 2000 platform. These sequences
were mapped to the reference genome with TopHat and assembled with Cufflinks. The
assembled transcripts were compared with the genome annotation with Cuffcompare, to
identify non-annotated transcripts. A total of 4,305 putative long RNAs were obtained, of which
314 (7%) were sense in relation to coding transcripts (mRNAs), 392 (9%) intergenic, 2,216
(52%) intronic and 1,383 (32%) antisense to mRNAs. The lncRNAs obtained were filtered to
eliminate those with coding potential, as well as those related to rRNA, tRNA and miRNA
synthesis. A total of 3,710 high-confidence lncRNAs (HC-lncRNAs) were obtained, of which
58.6% were not previously annotated. These HC-lncRNAs encompass a low proportion (~1%)
of the Arabidopsis thaliana genome. A functional enrichment analysis of
Gene Ontology (GO) categories demonstrated that among genes containing lncRNAs
there is a high proportion of categories linked to the localization and transport of proteins
within the cell, as well as to nucleic acid binding. A gene expression analysis identified
only 22 differentially expressed lncRNAs across the different light conditions to which the samples were exposed. Using SNP data from the 1001 Genomes Project, we identified
high nucleotide diversity within lncRNA regions, indicating low conservation of the
primary structure of these transcripts. The nucleotide diversity in regions of long non-coding
RNAs is higher than in coding regions, but lower than the diversity observed in neutral
regions such as pseudogenes.
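
The abstract compares nucleotide diversity between lncRNA, coding and pseudogene regions without stating how it was computed. As a rough illustration only, the sketch below uses the standard per-site estimator n/(n-1) * (1 - sum of squared allele frequencies), averaged over the length of a region. It is not taken from the dissertation: the function names `site_diversity` and `nucleotide_diversity` and the toy allele columns are hypothetical, and reading real 1001 Genomes variant files is left out.

```python
from collections import Counter


def site_diversity(alleles):
    """Unbiased expected heterozygosity at one site.

    `alleles` is a sequence of allele calls, one per sampled accession
    (hypothetical input format chosen for this sketch).
    """
    n = len(alleles)
    if n < 2:
        return 0.0
    counts = Counter(alleles)
    # Probability that two draws with replacement carry the same allele
    homozygosity = sum((c / n) ** 2 for c in counts.values())
    # The n / (n - 1) factor corrects for the finite sample size
    return n / (n - 1) * (1.0 - homozygosity)


def nucleotide_diversity(snp_columns, region_length):
    """Average per-site diversity over a region.

    `snp_columns` holds only the variable sites; invariant sites
    contribute zero, so the sum is divided by the full region length.
    """
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return sum(site_diversity(col) for col in snp_columns) / region_length


if __name__ == "__main__":
    # Toy example: two SNPs in a 10 bp region, four accessions each
    toy_snps = ["AAAT", "CCGG"]
    print(nucleotide_diversity(toy_snps, 10))
```

Under these assumptions, applying the same function to the SNPs falling in each class of region (lncRNA, coding, pseudogene) would give directly comparable values.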
Citation
ARAÚJO, V. C. S. Análise de RNAs longos não codificantes do genoma de Arabidopsis thaliana (L.) Heynh. 2017. 80 f. Dissertação (Mestrado em Genética e Biologia Molecular) - Universidade Federal de Goiás, Goiânia, 2017.