
CHINA BIOTECHNOLOGY (中国生物工程杂志)
China Biotechnology, 2011, Vol. 31, Issue 7: 45-53
Research Report
De Novo Assembly of Allotetraploid Arabidopsis suecica Transcriptome Using Short Reads for Gene Discovery and Marker Identification
(Chinese title: 基于短序列测序数据的四倍体拟南芥转录组研究, "Transcriptome Study of Tetraploid Arabidopsis Based on Short-Read Sequencing Data")
LIU Xin-xing (刘新星), CHEN Chao (陈超)
School of Resources Processing and Bioengineering, Central South University, Changsha 410083, China
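Abstract (translated from the Chinese):

To advance research on the tetraploid Arabidopsis suecica (A. suecica), to clarify how genetic material changes during chromosome doubling in polyploid plants, and thereby to explain at the molecular level how polyploid plants adapt to their environment and evolve, this paper describes a method for short-read transcriptome assembly and bioinformatic analysis based on second-generation sequencing. A total of 23 000 000 reads from the Illumina sequencing platform were assembled with SOAPdenovo, followed by TGICL clustering and Phrap assembly, yielding 125 953 non-redundant transcript sequences with an N50 of 550 bp and a mean length of 331 bp. BLASTX alignment showed that 96 057 (76.3%) of the transcripts were highly homologous to plant protein sequences in the Nr database (e-value < 10^-5). GO (gene ontology) annotation, COG (clusters of orthologous groups) classification, and metabolic pathway analysis of the transcripts also indicated that many genes in A. suecica have important protein functions. In addition, the GC content of the A. suecica transcriptome was compared with that of closely related species, and simple sequence repeats (SSRs) were identified. The results show that multiple-kmer assembly of short-read sequencing data is feasible for transcriptome analysis and provide a useful reference for transcriptome assembly and gene expression analysis in other related species.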

Keywords: Arabidopsis suecica    Transcriptome assembly    SOAPdenovo    Second-generation sequencing
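The abstract mentions two sequence-level analyses, GC content and SSR identification, without naming the tool or thresholds used. The following is a minimal Python sketch of how such analyses are commonly done. The function names (`gc_content`, `find_ssrs`) and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen for illustration, not values taken from the paper, and mononucleotide repeats are deliberately left out.

```python
import re

# Assumed thresholds (not from the paper): minimum number of tandem
# repeats required for each motif length, di- to hexanucleotide.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def gc_content(seq):
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect tandem repeat."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One captured motif followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "ATAT" is really the dinucleotide "AT".
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


# Toy sequence with one dinucleotide and one trinucleotide repeat.
toy = "GGC" + "AT" * 7 + "CCT" + "CAG" * 5 + "T"
print(gc_content(toy))
print(find_ssrs(toy))
```

Only perfect repeats are reported here; dedicated SSR tools usually also handle compound and interrupted repeats, which this sketch does not attempt.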
Abstract:

To facilitate research on Arabidopsis suecica (A. suecica), a method is presented for de novo assembly of the A. suecica transcriptome using short reads produced by the Illumina sequencing platform. A total of 23 million sequencing reads were assembled into 125 953 unique sequences with an N50 length of 550 bp and a mean size of 331 bp. At the protein level, 96 057 (76.3%) A. suecica transcripts showed significant similarity to proteins from other plants in the Nr database. Functional categorization revealed the conservation of genes involved in various biological processes in A. suecica. In addition, simple sequence repeat (SSR) motifs in the A. suecica transcriptome were identified. The data provide a comprehensive sequence resource for A. suecica studies and demonstrate that short paired-end read sequencing allows de novo transcriptome assembly in an allotetraploid species lacking genome information. Next generation sequencing (NGS) technologies are expected to significantly accelerate transcriptome research in both model and non-model organisms. The strategy for de novo assembly of transcriptome data presented here should also be helpful in other similar transcriptome studies.

Key words: Arabidopsis suecica    Transcriptome assembly    SOAPdenovo    NGS(next generation sequencing)
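The abstract summarizes assembly quality with an N50 of 550 bp and a mean size of 331 bp. As a minimal sketch of how these two statistics are usually computed from a list of sequence lengths (the function names `n50` and `mean_length` and the toy lengths are illustrative assumptions, not the authors' code or data):

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


def mean_length(lengths):
    """Return the arithmetic mean of the sequence lengths."""
    lengths = list(lengths)
    return sum(lengths) / len(lengths)


# Toy example: five hypothetical transcript lengths in bp.
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths), mean_length(toy_lengths))
```

Because N50 weights sequences by the bases they contribute, it is typically larger than the mean when an assembly contains many short sequences, which is consistent with the reported 550 bp versus 331 bp.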
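Received: 2011-03-14    Published: 2011-07-25
ZTFLH (Chinese Library Classification): Q75
Funding:

Supported by the National Natural Science Foundation of China (50774102) and the National "973" Program (2010CB630900)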


Cite this article:

LIU Xin-xing, CHEN Chao. De Novo Assembly of Allotetraploid Arabidopsis suecica Transcriptome using Short Reads for Gene Discovery and Marker Identification. China Biotechnology, 2011, 31(7): 45-53.

Link to this article:

https://manu60.magtech.com.cn/biotech/CN/
https://manu60.magtech.com.cn/biotech/CN/Y2011/V31/I7/45


