Phylogenetic analysis of mRNA polyadenylation sites reveals a role of transposable elements in evolution of the 3′-end of genes
Ju Youn Lee,
1 Graduate School of Biomedical Sciences and 2 Department of Biochemistry and Molecular Biology, New Jersey Medical School, University of Medicine and Dentistry of New Jersey, Newark, NJ 07103, USA
*To whom correspondence should be addressed. Tel: +1 973 972 3615; Fax: +1 973 972 5594; E-mail: btian@umdnj.edu
Revision received:
05 August 2008
Published:
30 August 2008
Abstract
mRNA polyadenylation is an essential step for the maturation of almost all eukaryotic mRNAs, and is tightly coupled with termination of transcription in defining the 3′-end of genes. Large numbers of human and mouse genes harbor alternative polyadenylation sites [poly(A) sites] that lead to mRNA variants containing different 3′-untranslated regions (UTRs) and/or encoding distinct protein sequences. Here, we examined the conservation and divergence of different types of alternative poly(A) sites across human, mouse, rat and chicken. We found that the 3′-most poly(A) sites tend to be more conserved than upstream ones, whereas poly(A) sites located upstream of the 3′-most exon, also termed intronic poly(A) sites, tend to be much less conserved. Genes with longer evolutionary history are more likely to have alternative polyadenylation, suggesting gain of poly(A) sites through evolution. We also found that nonconserved poly(A) sites are associated with transposable elements (TEs) to a much greater extent than conserved ones, albeit less frequently utilized. Different classes of TEs have different characteristics in their association with poly(A) sites via exaptation of TE sequences into polyadenylation elements. Our results establish a conservation pattern for alternative poly(A) sites in several vertebrate species, and indicate that the 3′-end of genes can be dynamically modified by TEs through evolution.
INTRODUCTION
mRNA polyadenylation is an essential step for the maturation of almost all eukaryotic mRNAs (1), and is tightly coupled with termination of transcription (2) and other steps of pre-mRNA processing (3, 4). It involves an endonucleolytic cleavage at a polyadenylation site [poly(A) site], followed by polymerization of an adenosine tail at the 3′-end of the cleaved RNA (5). Poly(A) tails are critical for virtually every aspect of mRNA metabolism, including mRNA transport, translation and mRNA stability (6–8). Malfunction of polyadenylation has been implicated in several human diseases (9, 10).
The genomic sequence surrounding a poly(A) site is referred to as the poly(A) site region. Most cis-elements involved in polyadenylation are located in the −100 to +100 nt region, with the poly(A) site set at position 0 (11). Signals located in the −40 to +40 nt region are usually essential for polyadenylation, and can be considered as core elements, whereas signals located between 41 and 100 nt in upstream or downstream regions have been implicated in the modulation of polyadenylation, and can be considered as auxiliary elements (11). The nucleotide composition of human poly(A) site regions is generally T-rich, with an A-rich sequence located right before the poly(A) site (12, 13). A hexamer AATAAA or ATTAAA or a close variant, usually referred to as the polyadenylation signal (PAS), is typically located in the −40 to −1 nt region (13, 14). A T-rich element and a TGTG element and its variants are typically located in the +1 to +40 nt region (11). In addition, TGTA, TATA, G-rich and C-rich elements in various upstream or downstream regions have been implicated in regulation of polyadenylation by experimental and/or bioinformatic studies (11, 15, 16). Phylogenetic analyses have indicated that the cis-element structure of the poly(A) site is essentially conserved across amniotes, from human to chicken, but divergent in lower vertebrates, such as fish (17, Lee,J.Y. and Tian,B., unpublished data).
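To make the coordinate convention concrete, the following minimal Python sketch scans the −40 to −1 nt region for the two canonical PAS hexamers. It is an illustration only and is not taken from the study; the function name `find_pas`, the 0-based `site` index (the nucleotide at position 0) and the restriction to AATAAA and ATTAAA (close variants are ignored) are assumptions made here.

```python
PAS_HEXAMERS = ("AATAAA", "ATTAAA")  # canonical hexamers only; variants omitted


def find_pas(seq, site, window=40):
    """Return (offset, hexamer) pairs for PAS hexamers lying entirely within
    the -window..-1 region, where `site` is the 0-based index of the
    poly(A) site (position 0) in `seq` (gene-strand sequence)."""
    seq = seq.upper()
    start = max(0, site - window)
    hits = []
    # A hexamer starting at i occupies i..i+5, so it must start at or
    # before site - 6 to end at position -1.
    for i in range(start, site - 5):
        hexamer = seq[i:i + 6]
        if hexamer in PAS_HEXAMERS:
            hits.append((i - site, hexamer))
    return hits


# Hypothetical sequence: one AATAAA placed 21 nt upstream of the site.
example = "C" * 30 + "AATAAA" + "C" * 15 + "A" + "TGTGTT"
print(find_pas(example, site=51))
```

Offsets are reported relative to the poly(A) site, so a hit at −21 means the hexamer starts 21 nt upstream of position 0.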
Over half of all human genes have multiple poly(A) sites (13, 18), leading to alternative gene products and contributing to the complexity of the mRNA pool in human cells. Multiple poly(A) sites can be located downstream of the stop codon in the 3′-most exon (Figure 1), leading to transcripts with variable 3′-untranslated regions (UTRs), or in internal exons, leading to transcripts with variable protein products and 3′-UTRs. The latter case is also referred to as intronic polyadenylation, as poly(A) site usage competes with splicing (19). The selection of alternative poly(A) sites has been shown to be related to biological factors, such as developmental stage and cell condition, for a number of genes (20–24). Both the level of polyadenylation factors and tissue-specific usage of cis-elements have been implicated in alternative polyadenylation in different tissues (21, 25, 26).
Figure 1.
Schematic of alternative polyadenylation and different types of poly(A) site. Poly(A) sites are classified and named according to their location in a gene. The one-letter code for each type is shown in parentheses. (A) Single poly(A) sites (S). (B) Sites located in the 3′-most exon are classified into 5′-most site (F), middle site (M) and 3′-most site (L). (C) Sites located upstream of the 3′-most exon are considered intronic, and named composite terminal exon site (C), and skipped or hidden terminal exon sites (H), based on the gene splicing pattern. pA, poly(A) site; 5′ ss, 5′ splice site; AAA, poly(A) tail.
Transposable elements (TEs) account for at least 45% of the human genome, and play important roles in shaping the genome structure through evolution (27, 28). TEs can also regulate gene expression (29, 30), by providing cis-elements at promoter regions (31), giving rise to new exons (32–34) or modulating transcription (35–37). Major TE classes in the human genome are DNA transposons (DNAs), long interspersed elements (LINEs), long terminal repeat retrotransposons (LTRs) and short interspersed elements (SINEs). Each class has a number of families and subfamilies with distinct structures and consensus sequences, which were active in transposition in different periods of evolution in different species (38). While most TEs in the human genome have lost transposition activity, some are still active, including the L1 family of LINE, the Alu family of SINE and the SVA element (39), leading to genetic variation and causing diseases (40, 41). Both L1 and Alu have also been implicated in creating poly(A) sites for certain genes (42, 43).
Here, by using whole genome alignments of several amniotes, including human, mouse, rat and chicken, we set out to systematically address (i) the general trend of conservation for poly(A) sites at different locations of a gene and (ii) the roles which different classes of TEs play in the evolution of poly(A) sites.
MATERIALS AND METHODS
Data sets
We used poly(A) sites from the PolyA_DB 2 database (44). These poly(A) sites were mapped by aligning poly(A/T)-tailed cDNA/ESTs with genome sequences using BLAT (45) and in-house Perl scripts (46). Briefly, the UniGene database was used to group cDNA/ESTs into genes, and NCBI RefSeq and UCSC Known Gene sequences were used to identify the intron/exon structure of a gene. Adjacent poly(A) sites (<24 nt from one another) were clustered together. Poly(A) sites were classified according to their locations in the gene. The RepeatMasker program (version 3.1.8) and the RepBase database (version October 2006) were used with default settings to identify TEs in poly(A) site regions.
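The clustering step can be sketched as follows. This is a hedged illustration in Python rather than the in-house Perl scripts, which are not reproduced here; the function name `cluster_sites` and the single-linkage rule (a site joins the current cluster when it lies <24 nt from the previous site) are assumptions, since the text does not specify how chains of nearby sites were resolved.

```python
def cluster_sites(positions, max_gap=24):
    """Group cleavage positions on one strand of one gene into clusters.

    Assumed rule: sort the positions and start a new cluster whenever the
    distance to the previous position is max_gap or more.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] < max_gap:
            clusters[-1].append(pos)   # within 24 nt of the previous site
        else:
            clusters.append([pos])     # gap too large: open a new cluster
    return clusters


# Hypothetical coordinates for illustration.
print(cluster_sites([100, 110, 130, 200, 223, 300]))
```

With this rule a chain of closely spaced sites can span more than 24 nt in total; a fixed-width binning would behave differently at cluster edges.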
Mapping of orthologous poly(A) sites
To identify orthologous poly(A) sites between two species, we used pair-wise genome alignment files downloaded from the UCSC Genome Bioinformatics Site. We required reciprocal best matches for a pair of orthologous poly(A) sites according to the distance from one site to the other in the genome alignment, and that the two sites are located within a 24 nt window as depicted in Supplementary Figure 1A. We found that changing the window size did not lead to significant change of the number of mapped orthologous sites (Supplementary Figure 1B), suggesting robustness of this method. In addition, almost none of the mapped orthologous poly(A) sites belonged to genes that were in different NCBI HomoloGene orthologous groups (data not shown), suggesting high accuracy.
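A simplified sketch of the reciprocal-best-match criterion is given below. It assumes that the sites of species A have already been projected onto the coordinates of species B through the pair-wise alignment, so that distances can be measured on a single axis; the function name, the dictionary inputs and the choice of `<=` for the 24 nt window are all assumptions made for illustration, not the procedure actually used.

```python
def reciprocal_best_pairs(projected_a, sites_b, window=24):
    """Pair sites that are each other's nearest neighbour within `window` nt.

    projected_a: dict of site id -> coordinate in species B (after projection)
    sites_b:     dict of site id -> coordinate in species B
    """
    pairs = []
    if not projected_a or not sites_b:
        return pairs
    for a_id, a_pos in projected_a.items():
        # Best match for this A site among the B sites.
        b_id = min(sites_b, key=lambda b: abs(sites_b[b] - a_pos))
        # Best match for that B site among the A sites.
        back = min(projected_a, key=lambda a: abs(projected_a[a] - sites_b[b_id]))
        if back == a_id and abs(sites_b[b_id] - a_pos) <= window:
            pairs.append((a_id, b_id))
    return pairs


# Hypothetical identifiers and coordinates.
human = {"h1": 1000, "h2": 1010, "h3": 5000}
mouse = {"m1": 1008, "m2": 5100}
print(reciprocal_best_pairs(human, mouse))
```

In this toy input, h1 is closest to m1, but m1 is closer to h2, so only the h2/m1 pair is reciprocal; h3 and m2 are reciprocal but fall outside the window.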
RESULTS
Conservation patterns of poly(A) sites in human, mouse, rat and chicken
Alternative polyadenylation is a widespread mechanism for genes to produce transcript variants (13, 47). Poly(A) sites can be classified into different types based on their locations in a gene (Figure 1). For simplicity, we also use a one-letter code to refer to a type in this study. A poly(A) site located in a 3′-most exon that contains only one poly(A) site is named single or constitutive site (S type); poly(A) sites located in 3′-most exons containing multiple poly(A) sites are named F type (the first or 5′-most), L type (the last, or 3′-most) or M type (middle ones between F and L). In addition, poly(A) sites located upstream of 3′-most exons are considered as intronic sites, which include composite terminal exon sites (C) and skipped or hidden terminal exon sites (H).
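The naming scheme for sites in the 3′-most exon can be written down as a short Python sketch. It covers only the S, F, M and L types; the intronic C and H types depend on the splicing pattern and are not modelled. The function name and the `strand` argument are illustrative assumptions.

```python
def classify_last_exon_sites(positions, strand="+"):
    """Label poly(A) sites within one 3'-most exon as S, F, M or L.

    positions: genomic coordinates of the (already clustered) sites
    strand:    "+" or "-"; on the minus strand the 5'-most site has the
               largest coordinate, so the order is reversed.
    """
    ordered = sorted(positions, reverse=(strand == "-"))
    if len(ordered) == 1:
        return {ordered[0]: "S"}       # single (constitutive) site
    labels = {}
    for i, pos in enumerate(ordered):
        if i == 0:
            labels[pos] = "F"          # first, 5'-most
        elif i == len(ordered) - 1:
            labels[pos] = "L"          # last, 3'-most
        else:
            labels[pos] = "M"          # middle
    return labels


print(classify_last_exon_sites([100, 300, 200]))
```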
To understand how poly(A) sites have evolved, we mapped orthologous poly(A) sites using human, mouse, rat and chicken poly(A) sites and pair-wise genome alignments between these organisms (see Materials and Methods and Supplementary Figure 1 for detail). We focused on these amniotes because there are a large number of poly(A/T)-tailed cDNA/ESTs available for mapping poly(A) sites in their genomes and previous bioinformatic studies have indicated that the cis-element structure of the poly(A) site is essentially the same across amniotes (17, Lee,J.Y. and Tian,B., unpublished data). Of 37 591 human sites, 11 255 (30%) were found to be conserved in mouse, 10 526 (28%) in rat and 922 (2%) in chicken. As shown in Figure 2A, human versus mouse and human versus rat conservation patterns are largely identical. The S type sites are the most conserved among all types, the L type sites are significantly more conserved than F or M type sites and intronic sites are the least conserved ones (Figure 2A). Of the intronic sites, H type sites are more conserved than C type sites. For conserved sites in 3′-most exons, conservation of poly(A) site type is statistically significant (P = 2.2 × 10^−16, Chi-squared test, Figure 2B), although some human sites are mapped to a different type than their mouse orthologs and vice versa. The same conclusions can be drawn from analyses of mouse versus human and mouse versus rat sites (Supplementary Figure 2).
Figure 2.
Conservation of human poly(A) sites in mouse, rat and chicken. (A) Percent of human poly(A) sites of different types that are conserved in mouse and rat. P-values (Chi-squared test) for difference in conservation between F and L types are 3.52 × 10^−67 for human versus mouse, and 3.72 × 10^−56 for human versus rat. Error bars are standard deviation. (B) Conservation of poly(A) site type between human and mouse orthologous poly(A) sites (P < 2.2 × 10^−16, Chi-squared test). (C) Percent of human and mouse poly(A) sites conserved in chicken. P-values (Chi-squared test) for difference in conservation between F and L types are 1.34 × 10^−16 for human versus chicken and 4.39 × 10^−7 for mouse versus chicken. (D) Percent of genes with alternative poly(A) sites for genes with orthologs in chicken (named 'old', 8140 in total) and genes without orthologs in chicken (named 'new', 4284 in total). P-value (Chi-squared test) for the difference is 8.39 × 10^−145.
The fact that L type sites are more conserved than F or M type sites indicates that downstream poly(A) sites are better preserved in evolution and that gain or loss of poly(A) sites is more likely to take place at upstream poly(A) sites. To further explore this with a broader evolutionary perspective, we carried out human versus chicken and mouse versus chicken poly(A) site comparisons. As shown in Figure 2C, both comparisons had the same conservation pattern. Interestingly, the difference between L and F is more conspicuous than those from comparisons of mammals, suggesting that conservation of 3′-most poly(A) sites is more discernible in genes with longer evolutionary history. Furthermore, human and mouse S type sites are relatively less conserved in chicken than in mammals, suggesting that longer evolution may bring about more poly(A) sites. To explore this hypothesis, we divided human genes into two groups, ones with orthologs in chicken (named 'old' genes) and ones without (named 'new' genes), and examined the frequency of alternative polyadenylation in each group. As shown in Figure 2D, a significantly higher proportion of old genes have alternative poly(A) sites than new genes (P = 8.39 × 10^−145, Chi-squared test), indicating that genes, in general, gain poly(A) sites through evolution.
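For readers unfamiliar with the test, a Chi-squared comparison of two proportions (for example, old versus new genes with or without alternative poly(A) sites) can be sketched in plain Python as below. The counts in the usage line are hypothetical placeholders, not the data behind Figure 2D, and the absence of a continuity correction is an assumption, since the text does not say which variant of the test was applied.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson Chi-squared test on the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom, without
    continuity correction.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 d.f. the upper tail of the Chi-squared distribution equals
    # erfc(sqrt(x / 2)), because the statistic is a squared standard normal.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: rows are gene groups, columns are with/without
# alternative poly(A) sites.
stat, p = chi2_2x2(10, 20, 20, 10)
print(stat, p)
```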
TEs and poly(A) sites
A large number of human poly(A) sites are not conserved in mouse, a sizable fraction of which is due to lack of genome alignments (data not shown). Since TEs have been implicated in giving rise to new exon sequences in evolution, we wanted to know how TEs might be responsible for species-specific poly(A) sites. Using the RepeatMasker program and the RepBase database, we examined poly(A) sites that are associated with four classes of TEs, i.e. DNAs, LINEs, LTRs and SINEs. A TE can contain a poly(A) site or contribute cis-elements to a poly(A) site. For the latter case, we required the distance between a poly(A) site and a TE to be within 40 nt, as essential cis-elements involved in polyadenylation are typically located in the −40 to +40 nt core region (11). In sum, 3188 human poly(A) sites from 2565 genes, corresponding to ∼8% of all poly(A) sites and ∼16% of all genes surveyed, were found to be associated with TEs. As shown in Figure 3A, we found that human poly(A) sites that are not conserved in mouse are associated with TEs to a much greater extent than conserved ones. In fact, ∼94% of TE-associated sites are nonconserved in mouse. Conversely, ∼5% of mouse poly(A) sites from ∼7% of genes surveyed are associated with TEs, of which ∼93% are not conserved in human (data not shown). This result indicates that TEs can significantly contribute to creation or modulation of poly(A) sites in evolution, and are responsible for species-specific poly(A) sites.
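The 40 nt association criterion amounts to a point-to-interval distance check, sketched below in Python. The function names and the inclusive `(start, end)` representation of a TE annotation are assumptions for illustration; parsing of RepeatMasker output is not shown.

```python
def distance_to_interval(site, start, end):
    """Distance in nt from a poly(A) site to a TE spanning start..end
    (inclusive); 0 if the site lies inside the TE."""
    if site < start:
        return start - site
    if site > end:
        return site - end
    return 0


def is_te_associated(site, te_intervals, max_dist=40):
    """True if any TE contains the site or lies within max_dist nt of it."""
    return any(distance_to_interval(site, s, e) <= max_dist
               for s, e in te_intervals)


# Hypothetical coordinates: the TE starts 40 nt downstream of the site.
print(is_te_associated(100, [(140, 300)]))
```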
Figure 3.
Poly(A) sites and TEs. (A) Percent of human poly(A) sites associated with TEs for different types of conserved and nonconserved sites. Both TEs overlapping with poly(A) site regions in the auxiliary regions (−100 to −41 nt and +41 to +100 nt) and the core region are shown. (B) Usage of different types of poly(A) sites. Percent of poly(A) site usage is based on the number of supporting ESTs for a poly(A) site compared with the number of ESTs for all poly(A) sites of the same gene. (C) Schematic of three types of association between TE and poly(A) site. The top horizontal line represents a poly(A) site region with the arrow pointing to a poly(A) site. TEs are represented by horizontal bars. Three types of placement of a TE in a poly(A) site region are shown. In type 1, a TE contains a poly(A) site and adjacent upstream and downstream regions; in types 2 and 3, only the upstream or downstream region of a poly(A) site is contained in a TE. The type number is indicated in the graph. (D) Number of poly(A) sites associated with four classes of TEs. The three types of association and TE strand are indicated.
As shown in Figure 3A, nonconserved intronic poly(A) sites are associated with TEs more often than nonconserved sites in 3′-most exons, with the H type sites being associated with TEs to the greatest extent. Interestingly, nonconserved S and L type sites are associated with TEs more often than F and M type sites. Since these sites are the 3′-most sites for genes, this finding indicates that TEs can play a significant role in defining the 3′-end boundary of a gene. Similar trends can be discerned for poly(A) sites overlapping with TEs in the −100 to −41 nt and +41 to +100 nt auxiliary regions (Figure 3A), which generally contain regulatory elements for polyadenylation. Some conserved poly(A) sites are also associated with TEs, indicating selection for their function through evolution.
To understand how TE-associated poly(A) sites are utilized, we examined the usage of different types of poly(A) sites using the number of EST sequences supporting each poly(A) site. While this method is not considered quantitative enough for assessing the usage of individual poly(A) sites, it can reveal the general usage trend for a set of sites (21). As shown in Figure 3B, nonconserved sites are much less often used than conserved sites, both for poly(A) sites associated with TEs and for those that are not. TE-associated poly(A) sites appear to be slightly less frequently used than other sites in both conserved and nonconserved groups. Since conserved TE-associated poly(A) sites have longer evolutionary histories than nonconserved ones, this result suggests that TEs are gradually fixed in evolution for their role in polyadenylation, presumably undergoing optimization of polyadenylation activity by mutation.
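The usage measure described here (supporting ESTs for one site relative to all ESTs for the gene's poly(A) sites) reduces to a simple proportion. The sketch below is illustrative; the function name and the site labels are hypothetical, and the handling of genes with no supporting ESTs is an assumption.

```python
def site_usage(est_counts):
    """Percent usage of each poly(A) site of one gene.

    est_counts: dict of site id -> number of supporting ESTs
    """
    total = sum(est_counts.values())
    if total == 0:
        # Assumed convention: no ESTs means 0% usage for every site.
        return {site: 0.0 for site in est_counts}
    return {site: 100.0 * n / total for site, n in est_counts.items()}


# Hypothetical gene with two poly(A) sites.
print(site_usage({"pA1": 3, "pA2": 1}))
```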
For the four major classes of TEs in the human genome, the number of TEs associated with poly(A) sites follows the order LINE > SINE > LTR > DNA (Table 1), which approximately correlates with their occurrence in the human genome (27). We further examined three types of association based on the location of the TE in the poly(A) site region, including the whole −40 to +40 nt core region, the −40 to −1 nt core upstream region and the +1 to +40 nt core downstream region, as illustrated in Figure 3C. As shown in Figure 3D, different TE classes are associated with poly(A) sites differently. While most DNAs and LTRs tend to contain the whole poly(A) site region, a large fraction of LINEs and SINEs are located either upstream or downstream of poly(A) sites, suggesting contribution of cis-elements, with SINEs being more conspicuous for this tendency. In addition, strong strand biases can be discerned for LTRs, LINEs and SINEs.
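One way to read the three association types is sketched below in Python. It is an interpretation of the description in the text and the Figure 3 legend, not the authors' code: type 1 is taken to mean that the TE contains the poly(A) site itself, type 2 that the TE ends in the core upstream region, and type 3 that the TE starts in the core downstream region. Coordinates are assumed to be on the plus strand of the gene; for a minus-strand gene, types 2 and 3 would be swapped.

```python
def association_type(site, te_start, te_end, core=40):
    """Return 1, 2 or 3 for the assumed association type, or None.

    site:              coordinate of the poly(A) site (position 0)
    te_start, te_end:  inclusive TE coordinates, gene on the plus strand
    """
    if te_start <= site <= te_end:
        return 1                                  # TE contains the site
    if te_end < site and site - te_end <= core:
        return 2                                  # TE covers upstream region only
    if te_start > site and te_start - site <= core:
        return 3                                  # TE covers downstream region only
    return None                                   # farther than the core region


# Hypothetical coordinates.
print(association_type(1000, 800, 990))
```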
Table 1.
Human poly(A) sites are associated with different classes of TEs
TE class | No. of TE families | No. of TE subfamilies | No. of poly(A) sites | Top families | No. of conserved poly(A) sites | No. of nonconserved poly(A) sites |
---|---|---|---|---|---|---|
DNA | 11 | 116 | 572 | MER1_type | 31 | 272 |
 | | | | MER2_type | 4 | 141 |
LTR | 6 | 215 | 639 | MaLR | 12 | 280 |
 | | | | ERV1 | 3 | 216 |
 | | | | ERVL | 11 | 77 |
 | | | | ERVK | 0 | 33 |
LINE | 4 | 88 | 1257 | L1 | 30 | 827 |
 | | | | L2 | 50 | 302 |
SINE | 3 | 28 | 783 | MIR | 37 | 407 |
 | | | | Alu | 0 | 338 |
Conservation is based on the human and mouse comparison. Top families are those accounting for >5% of poly(A) sites that are associated with a TE class.
Poly(A) sites in terminal regions of DNAs and LTRs can be adopted by human genes
We found that poly(A) sites associated with DNAs and LTRs are primarily located in terminal regions of these elements, namely the terminal inverted repeats (TIRs) in DNAs and terminal LTR sequences in LTRs. However, as shown in Figure 3D, while the plus and minus strands of TIR are associated with poly(A) sites with similar frequencies, a strong bias to the plus strand of LTR can be discerned. This result is consistent with PAS occurrence and poly(A) site prediction by polyA_SVM (48) and polyadq (49) for TIR and LTR sequences, as shown in Supplementary Figure 3A–D, in which top DNA and LTR families and subfamilies with respect to poly(A) site association are analyzed (MER33 subfamily of MER1_type and Tigger1 subfamily of MER2_type for DNA, and MLT1C subfamily of MaLR and MER21C subfamily of ERV1 for LTR). Thus, poly(A) sites in human genes that are associated with DNAs and LTRs are generally endogenous poly(A) sites in these TEs that have been adopted through evolution.
A large number of poly(A) sites are derived from both strands of L1
The L1 family of LINE accounts for ∼17% of the human genome, the highest among all TE families, and has been active for the last ∼170 million years (MYR) (50). Not surprisingly, L1 is associated with poly(A) sites with the highest frequency among all TE families. Many internal poly(A) sites of L1 have been reported, which have been implicated in the modulation of its retrotransposition activity (42). A full-length L1 is composed of 5′-UTR, ORF1, ORF2 and 3′-UTR. However, L1 sequences in the human genome are often truncated at the 5′-end due to inefficient reverse transcription during retrotransposition (51). Consistently, the number of poly(A) sites associated with these sequences follows the order: 3′-end (3′-UTR) > ORF2 > 5′-end (5′-UTR + ORF1) (Figure 4A). As shown for the examples of top L1 subfamilies, ORF2 of L1M5 and the 3′-end region of L1ME4a, poly(A) sites in ORF2 and the 3′-end region are diffusely distributed (Figure 4B and C), except for several 'hot spots' on the minus strand of the 3′-end region. Interestingly, while ORF2 and the 3′-end region contain many more AATAAA/ATTAAA and other PAS hexamers on the plus strand than on the minus strand (Supplementary Figure 3E and F), presumably due to their A-rich content, more poly(A) sites are associated with minus strands than plus strands, with a ratio of 2 : 1 (Figure 4A). This bias is in good agreement with previous reports that indicated preferential placement of L1 sequences in antisense orientation of host genes with a ratio of ∼2 (52). We further analyzed ORF2 and 3′-end sequences by PolyA_SVM, which uses 15 cis-elements surrounding the poly(A) site for prediction (48). We found that more poly(A) sites can actually be predicted on the minus strand than on the plus strand (7 versus 3) for ORF2, and the same number of sites for the 3′-end region (Supplementary Figure 3E and F). Thus, other cis-elements may exist on the minus strand that lead to higher occurrence of poly(A) sites than on the plus strand, despite fewer PAS hexamers. Further experimental analysis is needed to confirm this hypothesis. In addition, several regions of L1 do not contain PAS or predicted poly(A) sites, but are associated with poly(A) sites with high frequency, suggesting that they may contain favorable sequences that can give rise to cis-elements for polyadenylation through mutations.
Figure 4.
Poly(A) sites and L1. (A) Number of poly(A) sites associated with plus and minus strands of three L1 regions, i.e. 5′-end, ORF2 and 3′-end. (B) Distribution of poly(A) sites in ORF2 of L1M5 subfamily. The poly(A) sites are indicated by vertical bars and also shown in a profile, which is essentially a smoothed histogram of poly(A) site occurrence. The profile is smoothed by an 11 nt window, i.e. the value of a position is the average of the 11 nt surrounding the position. Three association types (illustrated in Figure 3C) are represented by different colors, as indicated in the graph. The poly(A) site position for type 1 is the actual poly(A) site location, whereas the position for types 2 or 3 is the location of the closest nucleotide in the TE to its associated poly(A) site. An additional 40 nt are added to both 5′- and 3′-ends to illustrate poly(A) sites located upstream or downstream of the TE. Vertical dotted lines are the start and end of the TE. (C) Distribution of poly(A) sites in the 3′-end of L1ME4a subfamily.
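The 11 nt smoothing described in the legend corresponds to a centered moving average, which can be sketched as follows. The function name is hypothetical, and the treatment of positions near the ends (averaging over however many positions fall inside the sequence) is an assumption, since the legend does not describe edge handling.

```python
def smooth_profile(counts, window=11):
    """Centered moving average of per-position poly(A) site counts.

    Each value is the mean of the `window` positions centered on it;
    near the ends the window is truncated to the available positions.
    """
    half = window // 2
    smoothed = []
    for i in range(len(counts)):
        lo = max(0, i - half)
        hi = min(len(counts), i + half + 1)
        smoothed.append(sum(counts[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical profile: 11 sites mapped to a single position.
profile = smooth_profile([0] * 5 + [11] + [0] * 5)
print(profile[5])
```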
The homologous 3′-end regions of L2 and MIR contain cis-elements for polyadenylation
L2 is the second top LINE associated with poly(A) sites. Most associated poly(A) sites are located within or near its 3′-end region, as shown for L2a, the top subfamily of L2 (Figure 5A). Interestingly, the last 50 nt region of its plus strand tends to be located upstream of the poly(A) site, whereas the minus strand of this region tends to be located downstream of the poly(A) site (Figure 5A). Consistent with this observation, this region contains an AATAAA PAS and a TGTA element on the plus strand and a TGTG element on the minus strand (Figure 5B). Since the 3′-end region of L2 is highly homologous to the 3′-end region of the mammalian-wide interspersed repeat (MIR), a tRNA-derived SINE that is thought to have been active ∼130 MYR ago, in the same period as L2 (53, 54), it is not surprising to see that MIR has a similar trend for poly(A) site association (Figure 5C). For example, MIRb, the top MIR subfamily, contains both ATTAAA and AATAAA PAS and a TGTA element on the plus strand and two TGTG elements on the minus strand (Figure 5B). Thus, MIR and L2 can bring either upstream or downstream cis-elements for polyadenylation to the genome, and give rise to new poly(A) sites. Notably, consistent with their evolutionary history, MIR and L2 together account for about half of the conserved TE-associated poly(A) sites, indicating their significant contribution to poly(A) site evolution in mammals.
Figure 5.
Poly(A) sites and L2 and MIR. (A) Distribution of poly(A) sites in the 3′-end region of L2a subfamily of L2. (B) Alignment of the 3′-end region of L2a with MIRb. AATAAA, ATTAAA, TGTA are shown in green. Identical nucleotides are indicated by asterisks. (C) Distribution of poly(A) sites in MIRb subfamily of MIR. See the legend of Figure 4B for description of (A) and (C).
Four modes of poly(A) site association for Alu
Alu has the highest copy number in the human genome among all TE families, and is the second top SINE associated with poly(A) sites, after MIR. Alu sequences are derived from 7SL RNA elements, and are composed of two related monomers separated by a middle A-rich region. An Alu sequence has an RNA polymerase III promoter located at the 5′-end, and a poly(A) sequence at the 3′-end that is required for retrotransposition (55). For the top subfamily, AluSx, four hot spots can be discerned (Figure 6A). The 5′-end region of AluSx tends to be located downstream of poly(A) sites. This region is rich in CG. Further examination of poly(A) sites associated with this region indicated that this region tends to give rise to TG elements via transition of C to T. Interestingly, CG dinucleotides in Alu were found to have about 10 times higher mutation rate than other dinucleotides in the sequence (56, 57). Thus, although the consensus sequence of the 5′-end region does not have apparent cis-elements for polyadenylation, it has a propensity to mutate to poly(A) site downstream elements. A second hot spot is located in the middle region of the plus strand. This region contains the middle A-rich sequence followed by a CG-rich sequence that is highly similar to the 5′-end region described above. Further examination indicated that the middle A-rich sequence tends to mutate to PAS and the CG-rich sequence tends to mutate to TG elements. Consistent with these findings, poly(A) sites associated with this region are completely encoded by Alu sequences. The third and fourth hot spots correspond to the plus strand and minus strand of the 3′-end poly(A) tail sequence, respectively. Not surprisingly, this poly(A) tail sequence can give rise to upstream PAS hexamers when in the sense orientation, or downstream T-rich elements when in the antisense orientation.
Thus, despite the lack of cis-elements for polyadenylation in its consensus, Alu sequences provide a favorable breeding ground for new poly(A) sites by four mechanisms through mutations, as illustrated in Figure 6B. Its contribution to the 3′-end definition of human genes can be highly significant due to its widespread nature in the human genome.
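The CG-to-TG route described above can be made concrete with a small sketch that applies C-to-T transitions at CG dinucleotides and counts the TG elements that result. Everything here is illustrative: the function names and the toy sequence are hypothetical (it is not the AluSx consensus), and only transitions on the given strand are modeled, which is a simplification.

```python
def cpg_transition_variants(seq):
    """Yield (position, mutated_sequence) for each single C->T transition at a CG."""
    seq = seq.upper()
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "CG":
            yield i, seq[:i] + "T" + seq[i + 1:]


def count_motif(seq, motif):
    """Count occurrences of `motif` in `seq`, allowing overlaps."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


# Hypothetical CG-rich toy sequence.
toy = "GGCCGGGCGCGGTGG"

# TG elements gained by each single transition, relative to the starting sequence.
variants = list(cpg_transition_variants(toy))
gains = [(pos, count_motif(mut, "TG") - count_motif(toy, "TG")) for pos, mut in variants]

# Extreme case: every CG on this strand has undergone the C->T transition.
fully_converted = toy.replace("CG", "TG")
```

In this toy case each single transition adds one TG, and converting the adjacent CGCG stretch yields a TGTG element, the kind of downstream element discussed in the text.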
Figure 6.
Poly(A) sites and Alu. (A) Distribution of poly(A) sites in the AluSx subfamily of Alu. See the legend of Figure 4B for description of the graph. (B) Schematic of mechanisms by which different regions of Alu give rise to cis-elements for polyadenylation.
Discussion
In this study, we used whole genome alignments to identify conserved poly(A) sites across species. The high sensitivity and selectivity of this approach are supported by the results that using different window sizes for mapping orthologous sites only made small differences, and the mapping result was in good agreement with the gene ortholog information based on coding sequences. We found that single poly(A) sites are much more conserved than alternative poly(A) sites, which agrees with what was reported by Ara et al. (58). However, our finding that the 3′-most poly(A) sites are more conserved than upstream ones is inconsistent with what was reported by Ara et al., in which poly(A) sites distal to the stop codon were found to be less conserved than proximal ones. This can be partly attributable to differences in mapping conserved sites. While both methods use a window for finding conserved poly(A) sites (30 nt in their case, and 24 nt in our case), Ara et al. additionally required the PAS to be perfectly aligned. This can make conserved poly(A) sites proximal to the stop codon more easily detected, as sequence conservation in the 3′-UTR is generally better in the 5′-region than in the 3′-region. This bias can be further exacerbated by the fact that PAS are located in an AT-rich low complexity region, for which sequence alignment tools may not perform well in aligning short fragments, for example, PAS hexamers. By contrast, our method is not bound by this restriction, and does not have the bias to 5′-poly(A) sites. As such, for comparable numbers of poly(A) sites, Ara et al. found ∼13% but we found ∼30% are conserved between human and mouse. In addition, Ara et al. divided alternative poly(A) sites into proximal and distal groups, which correspond to F+M and L+M types, respectively, in this study.
Thus, the discrepancy between our results and theirs can also be caused by M type poly(A) sites, which are less conserved than F or L.
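The window-based notion of conservation discussed above can be sketched as follows. This is a simplified illustration rather than the pipeline used in the study: it assumes that each human poly(A) site has already been mapped through the genome alignment to a coordinate on a single mouse sequence (`None` when no alignment exists), and it treats the window as a distance of `window` nt on either side of the mapped position, which is an assumption because the exact window definition is not given in this section. The function names and coordinates are hypothetical.

```python
from bisect import bisect_left


def is_conserved(mapped_pos, target_sites, window=24):
    """True if a site in the sorted list `target_sites` lies within `window` nt."""
    i = bisect_left(target_sites, mapped_pos - window)
    return i < len(target_sites) and target_sites[i] <= mapped_pos + window


def conserved_fraction(mapped_positions, target_sites, window=24):
    """Fraction of query sites with a target-species site inside the window.

    Sites that could not be mapped (None) count as not conserved.
    """
    if not mapped_positions:
        return 0.0
    target_sites = sorted(target_sites)
    hits = sum(
        1
        for pos in mapped_positions
        if pos is not None and is_conserved(pos, target_sites, window)
    )
    return hits / len(mapped_positions)


# Hypothetical coordinates on one mouse sequence.
mouse_sites = [100, 500, 900]
human_mapped = [110, 530, None, 890]

frac_24 = conserved_fraction(human_mapped, mouse_sites, window=24)
frac_30 = conserved_fraction(human_mapped, mouse_sites, window=30)
```

In this toy data set, widening the window from 24 to 30 nt changes the call for one site.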
Previous studies have implicated a number of TEs in bringing poly(A) sites to endogenous genes (42, 43). Our comprehensive analysis in this study establishes poly(A) site association patterns for four classes of TEs. Three modes of TE-mediated poly(A) site creation were detected: (i) some poly(A) sites are encoded by TEs and utilized by endogenous genes, such as poly(A) sites in the TIR region of DNAs, the LTR region of LTRs and various regions of L1; (ii) some poly(A) sites were created by combining cis-elements from TEs with those in the genome, such as the 3′-end regions of L2 and MIR and (iii) some poly(A) sites were derived from TE regions that have high propensity to give rise to poly(A) sites by mutations, such as the 5′-end, middle and 3′-end regions of Alu. The diverse pathways to create poly(A) sites suggest that the 3′-end of genes can be dynamically modified in evolution. Conceivably, this can have a significant impact on the evolution of 3′-UTRs and their cis-elements. On this note, TEs in 3′-UTRs have been linked to microRNA target sites and AU-rich elements (59, 60), and have been involved in regulation of RNA localization via RNA editing (61).
TEs are associated with nonconserved poly(A) sites more often than with conserved ones, indicating that they play important roles in setting lineage specific polyadenylation patterns. However, it is notable that only those TEs that have sufficient degree of similarity to their consensus sequences can be examined in this study, and ancient TEs, which have diverged beyond recognition by current computational methods, are not detected. In this regard, the fact that all TE classes analyzed in this study have some level of association with poly(A) sites makes it plausible that many conserved poly(A) sites are also associated with TEs, but their sequence divergence has made them not recognizable by the RepeatMasker program. Given the widespread nature of TEs and their extensive roles in shaping the genomes through evolution, it is conceivable that TEs have played a significant role in poly(A) site evolution and defining the 3′-end of genes.
FUNDING
National Institutes of Health (R01 GM084089 to B.T.). Funding for open access charge: R01 GM084089.
Conflict of interest statement. None declared.
ACKNOWLEDGEMENT
We thank Michael Tsai for technical help at the early stage of this project.
REFERENCES
1. A history of poly A sequences: from formation to factors to function, Prog. Nucleic Acid Res. Mol. Biol., 2002, vol. 71 (pg. 285-389)
2. Connections between mRNA 3′ end processing and transcription termination, Curr. Opin. Cell Biol., 2005, vol. 17 (pg. 257-261)
3. Integrating mRNA processing with transcription, Cell, 2002, vol. 108 (pg. 501-512)
4. Rules of engagement: co-transcriptional recruitment of pre-mRNA processing factors, Curr. Opin. Cell Biol., 2005, vol. 17 (pg. 251-256)
5. Mechanism and regulation of mRNA polyadenylation, Genes Dev., 1997, vol. 11 (pg. 2755-2766)
6. Interrelationships of the pathways of mRNA decay and translation in eukaryotic cells, Annu. Rev. Biochem., 1996, vol. 65 (pg. 693-739)
7. Starting at the beginning, middle and end: translation initiation in eukaryotes, Cell, 1997, vol. 89 (pg. 831-838)
8. Life and death in the cytoplasm: messages from the 3′ end, Curr. Opin. Genet. Dev., 1997, vol. 7 (pg. 220-232)
9. A systematic analysis of disease-associated variants in the 3′ regulatory regions of human protein-coding genes I: general principles and overview, Hum. Genet., 2006, vol. 120 (pg. 1-21)
10. 3′ end mRNA processing: molecular mechanisms and implications for health and disease, EMBO J., 2008, vol. 27 (pg. 482-498)
11. Bioinformatic identification of candidate cis-regulatory elements involved in human mRNA polyadenylation, RNA, 2005, vol. 11 (pg. 1485-1493)
12. Sequence determinants in human polyadenylation site selection, BMC Genomics, 2003, vol. 4 pg. 7
13. A large-scale analysis of mRNA polyadenylation of human and mouse genes, Nucleic Acids Res., 2005, vol. 33 (pg. 201-212)
14. Patterns of variant polyadenylation signal usage in human genes, Genome Res., 2000, vol. 10 (pg. 1001-1010)
15. Eukaryotic mRNA 3′ processing: a common means to different ends, Genes Dev., 2005, vol. 19 (pg. 2517-2521)
16. Formation of mRNA 3′ ends in eukaryotes: mechanism, regulation and interrelationships with other steps in mRNA synthesis, Microbiol. Mol. Biol. Rev., 1999, vol. 63 (pg. 405-445)
17. A multispecies comparison of the metazoan 3′-processing downstream elements and the CstF-64 RNA recognition motif, BMC Genomics, 2006, vol. 7 pg. 55
18. Computational analysis of 3′-ends of ESTs shows four classes of alternative polyadenylation in human, mouse and rat, Genome Res., 2005, vol. 15 (pg. 369-375)
19. Widespread mRNA polyadenylation events in introns indicate dynamic interplay between polyadenylation and splicing, Genome Res., 2007, vol. 17 (pg. 156-165)
20. Mechanisms controlling production of membrane and secreted immunoglobulin during B cell development, Immunol. Res., 2007, vol. 37 (pg. 33-46)
21. Biased alternative polyadenylation in human tissues, Genome Biol., 2005, vol. 6 pg. R100
22. Hu proteins regulate polyadenylation by blocking sites containing U-rich sequences, J. Biol. Chem., 2007, vol. 282 (pg. 2203-2210)
23. Alternative polyadenylation of cyclooxygenase-2, Nucleic Acids Res., 2005, vol. 33 (pg. 2565-2579)
24. Regulation of nuclear poly(A) addition controls the expression of immunoglobulin M secretory mRNA, EMBO J., 2001, vol. 20 (pg. 6443-6452)
25. Alternative poly(A) site selection in complex transcription units: means to an end?, Nucleic Acids Res., 1997, vol. 25 (pg. 2547-2561)
26. Differences in polyadenylation site choice between somatic and male germ cells, BMC Mol. Biol., 2006, vol. 7 pg. 35
27. Initial sequencing and analysis of the human genome, Nature, 2001, vol. 409 (pg. 860-921)
28. Interspersed repeats and other mementos of transposable elements in mammalian genomes, Curr. Opin. Genet. Dev., 1999, vol. 9 (pg. 657-663)
29. Impact of transposable elements on the evolution of mammalian gene regulation, Cytogenet. Genome Res., 2005, vol. 110 (pg. 342-352)
30. Transposable elements in mammals promote regulatory variation and diversification of genes with specialized functions, Trends Genet., 2003, vol. 19 (pg. 530-536)
31. Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53, Proc. Natl Acad. Sci. USA, 2007, vol. 104 (pg. 18613-18618)
32. Comparison of multiple vertebrate genomes reveals the birth and evolution of human exons, Proc. Natl Acad. Sci. USA, 2006, vol. 103 (pg. 13427-13432)
33. The birth of new exons: mechanisms and evolutionary consequences, RNA, 2007, vol. 13 (pg. 1603-1608)
34. Comparative analysis of transposed element insertion within human and mouse genomes reveals Alu's unique role in shaping the human transcriptome, Genome Biol., 2007, vol. 8 pg. R127
35. Gene function and expression level influence the insertion/fixation dynamics of distinct transposon families in mammalian introns, Genome Biol., 2006, vol. 7 pg. R120
36. Human Alu RNA is a modular transacting repressor of mRNA transcription during heat shock, Mol. Cell, 2008, vol. 29 (pg. 499-509)
37. Transcriptional disruption by the L1 retrotransposon and implications for mammalian transcriptomes, Nature, 2004, vol. 429 (pg. 268-274)
38. A unified classification system for eukaryotic transposable elements, Nat. Rev. Genet., 2007, vol. 8 (pg. 973-982)
39. Which transposable elements are active in the human genome?, Trends Genet., 2007, vol. 23 (pg. 183-191)
40. Natural genetic variation caused by transposable elements in humans, Genetics, 2004, vol. 168 (pg. 933-951)
41. Mammalian non-LTR retrotransposons: for better or worse, in sickness and in health, Genome Res., 2008, vol. 18 (pg. 343-358)
42. RNA truncation by premature polyadenylation attenuates human mobile element activity, Nat. Genet., 2003, vol. 35 (pg. 363-366)
43. Human retroelements may introduce intragenic polyadenylation signals, Cytogenet. Genome Res., 2005, vol. 110 (pg. 365-371)
44. PolyA_DB 2: mRNA polyadenylation sites in vertebrate genes, Nucleic Acids Res., 2007, vol. 35 (pg. D165-168)
45. BLAT--the BLAST-like alignment tool, Genome Res., 2002, vol. 12 (pg. 656-664)
46. Identification of mRNA polyadenylation sites in genomes using cDNA sequences, expressed sequence tags and trace, Methods Mol. Biol., 2008, vol. 419 (pg. 23-37)
47. Identification of alternate polyadenylation sites and analysis of their tissue distribution using EST data, Genome Res., 2001, vol. 11 (pg. 1520-1526)
48. Prediction of mRNA polyadenylation sites by support vector machine, Bioinformatics, 2006, vol. 22 (pg. 2320-2325)
49. Detection of polyadenylation signals in human DNA sequences, Gene, 1999, vol. 231 (pg. 77-86)
50. Molecular evolution and tempo of amplification of human LINE-1 retrotransposons since the origin of primates, Genome Res., 2006, vol. 16 (pg. 78-87)
51. Progress in understanding the biology of the human mutagen LINE-1, Hum. Mutat., 2007, vol. 28 (pg. 527-539)
52. Molecular archeology of L1 insertions in the human genome, Genome Biol., 2002, vol. 3 research0052
53. Distribution of the mammalian-wide interspersed repeats (MIRs) in the isochores of the human genome, FEBS Lett., 1998, vol. 439 (pg. 63-65)
54. MIRs are classic, tRNA-derived SINEs that amplified before the mammalian radiation, Nucleic Acids Res., 1995, vol. 23 (pg. 98-102)
55. Role of poly(A) tail length in Alu retrotransposition, Genomics, 2005, vol. 86 (pg. 378-381)
56. Sequence conservation in Alu evolution, Nucleic Acids Res., 1989, vol. 17 (pg. 2477-2491)
57. Structure and variability of recently inserted Alu family members, Nucleic Acids Res., 1990, vol. 18 (pg. 6793-6798)
58. Conservation of alternative polyadenylation patterns in mammalian genes, BMC Genomics, 2006, vol. 7 pg. 189
59. Mammalian microRNAs derived from genomic repeats, Trends Genet., 2005, vol. 21 (pg. 322-326)
60. The association of Alu repeats with the generation of potential AU-rich elements (ARE) at 3′ untranslated regions, BMC Genomics, 2004, vol. 5 pg. 97
61. Alu element-mediated gene silencing, EMBO J., 2008, vol. 27 (pg. 1694-1705)
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Source: https://academic.oup.com/nar/article/36/17/5581/2410640