Genomic analysis of human microRNA transcripts.

PNAS 104(45):17719 (2007) PMID 17965236 PMCID PMC2077053

MicroRNAs (miRNAs) are important genetic regulators of development, differentiation, growth, and metabolism. The mammalian genome encodes approximately 500 known miRNA genes. Approximately 50% are expressed from non-protein-coding transcripts, whereas the rest are located mostly in the introns of coding genes. Intronic miRNAs are generally transcribed coincidentally with their host genes. However, the nature of the primary transcript of intergenic miRNAs is largely unknown. We have performed a large-scale analysis of transcription start sites, polyadenylation signals, CpG islands, EST data, transcription factor-binding sites, and expression ditag data surrounding intergenic miRNAs in the human genome to improve our understanding of the structure of their primary transcripts. We show that a significant fraction of primary transcripts of intergenic miRNAs are 3-4 kb in length, with clearly defined 5' and 3' boundaries. We provide strong evidence for the complete transcript structure of a small number of human miRNAs.
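
As an illustration of the kind of boundary analysis summarized above, the following minimal Python sketch pairs an intergenic miRNA with the nearest upstream transcription start site and the nearest downstream polyadenylation signal on the same strand. It is not the authors' pipeline, which this abstract does not describe. The names `Site`, `MiRNA`, `putative_transcript`, and the 50 kb search window `max_dist` are all hypothetical choices made for the example, and the coordinates in the usage note are invented.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Site:
    """A single-position annotation (e.g. a TSS or a poly(A) signal)."""
    chrom: str
    strand: str  # "+" or "-"
    pos: int


@dataclass(frozen=True)
class MiRNA:
    """A miRNA hairpin as a half-open genomic interval [start, end)."""
    chrom: str
    strand: str
    start: int
    end: int


def putative_transcript(
    mirna: MiRNA,
    tss_sites: Sequence[Site],
    polya_sites: Sequence[Site],
    max_dist: int = 50_000,  # hypothetical search window, not from the paper
) -> Optional[Tuple[int, int]]:
    """Return (low, high) genomic bounds of a candidate primary transcript,
    or None if no TSS or no poly(A) signal lies within max_dist."""

    def same(s: Site) -> bool:
        return s.chrom == mirna.chrom and s.strand == mirna.strand

    if mirna.strand == "+":
        # Upstream means lower coordinates; downstream means higher.
        tss = [s.pos for s in tss_sites
               if same(s) and 0 <= mirna.start - s.pos <= max_dist]
        pa = [s.pos for s in polya_sites
              if same(s) and 0 <= s.pos - mirna.end <= max_dist]
        if not tss or not pa:
            return None
        return (max(tss), min(pa))  # closest TSS, closest poly(A) signal

    # Minus strand: upstream means higher coordinates.
    tss = [s.pos for s in tss_sites
           if same(s) and 0 <= s.pos - mirna.end <= max_dist]
    pa = [s.pos for s in polya_sites
          if same(s) and 0 <= mirna.start - s.pos <= max_dist]
    if not tss or not pa:
        return None
    return (max(pa), min(tss))
```

For example, with an invented plus-strand miRNA at chr1:10000-10080, TSS candidates at 2000 and 8000, and poly(A) signals at 11500 and 20000, the function returns the interval (8000, 11500), a 3.5 kb candidate transcript. A real analysis would also weigh supporting evidence such as CpG islands, ESTs, and ditags rather than relying on proximity alone.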