Review

Signal Processing for Nanopore Sequencing and Its Application in DNA Data Storage
GE Qi1, ZHANG Peng1, HAN Ming-zhe2,3, YANG Jin-sheng1, ZHANG Da-lu4,*, CHEN Wei-gang1,3
1 School of Microelectronics, Tianjin University, Tianjin 300072, China
2 School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
3 Frontiers Science Center for Synthetic Biology (MOE), Tianjin University, Tianjin 300072, China
4 China National Center for Biotechnology Development, Beijing 100039, China
Cite this article:
GE Qi, ZHANG Peng, HAN Ming-zhe, YANG Jin-sheng, ZHANG Da-lu, CHEN Wei-gang. Signal Processing for Nanopore Sequencing and Its Application in DNA Data Storage. China Biotechnology, 2021, 41(8): 75-89.
Link to this article:
https://manu60.magtech.com.cn/biotech/CN/10.13523/j.cb.2104018
or
https://manu60.magtech.com.cn/biotech/CN/Y2021/V41/I8/75