This website is a portal for the Japanese reference genome at the Tohoku Medical Megabank Organization (ToMMo). Since June 2017, the Japanese Reference Genome (JRG) v2 and its decoy assembly for GRCh38, decoyJRGv2, have been available for download.
Since 2012, Tohoku University and Iwate Medical University have carried out the Tohoku Medical Megabank Project as a reconstruction project following the Great East Japan Earthquake, launching the Tohoku Medical Megabank Organization (ToMMo) and the Iwate Tohoku Medical Megabank Organization, respectively. Both organizations have been conducting a cohort study of 150,000 residents of Miyagi and Iwate Prefectures, and recruitment reached 150,000 participants in March 2017. Toward the realization of precision medicine and prevention in Japan, the Department of Integrative Genomics of ToMMo extracted DNA from blood samples provided by 1,070 participants of the cohort study and performed whole-genome analysis with a short-read next-generation sequencer, the HiSeq 2500 (Illumina), which can sequence 324 bases at a time, to construct the Japanese Reference Panel (1KJPN). The results were released on August 29, 2014 at iJGVD (http://ijgvd.megabank.tohoku.ac.jp/), one of the portal sites of ToMMo, and more than 10,000 researchers from 100 countries had accessed the site by April 1, 2016. On August 21, 2015, we reported in the international journal Nature Communications that the newly found rare SNVs may affect diseases and traits. The Japanese Reference Panel has since been extended to 2,049 people (2KJPN), and information on all SNVs, approximately 28.0 million in total, is now available at the website (http://ijgvd.megabank.tohoku.ac.jp).
However, with a short-read next-generation sequencer, each sequenced genome fragment is only several hundred bases long (324 bases on average at ToMMo), even though hundreds of millions of fragments are sequenced at once. It was therefore difficult to identify structural variants, including repeated sequences, in a human genome.
The goal is to construct the Japanese Reference Genome and to release it with annotation information (e.g., frequency in the Japanese population).
Identifying structural variants shared among Japanese people enables us to accurately identify differences among individuals. Recently, the long-read next-generation sequencer PacBio RSII (Pacific Biosciences) has made it possible to sequence more than 10,000 contiguous bases on average. However, this sequencer has a high sequencing error rate. A human genome consists of 3 billion base pairs. We therefore repeatedly sequenced Japanese genomic DNA to obtain 300 billion bases of sequence information for one individual (equivalent to sequencing the genome 100 times over). We overcame the high error rate of the sequencer by acquiring this massive amount of sequence information and applying an information science method called de novo assembly, which assembles the sequences from scratch with high accuracy. We finally succeeded in constructing and releasing the present Japanese Reference Genome. This analysis took several months of running on the ToMMo supercomputer system. The Genome Reference Consortium, an international organization, regularly maintains and revises the international human reference genome sequence; as of April 2016, the latest version is GRCh38. By exhaustively comparing GRCh38 with the present de novo assembly, we identified about 3,500 new insertion sequences (approximately 2.5 million bases in total) relative to GRCh38.
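The sequencing depth described above can be checked with a short calculation. The following is a minimal sketch, not part of the ToMMo analysis pipeline: the function names are hypothetical, and the mean read length of 10,000 bases is an assumption taken from the "more than 10,000 bases on average" figure, so the read count it gives is only a rough upper estimate.

```python
import math


def sequencing_depth(total_bases: int, genome_size: int) -> float:
    """Average depth of coverage: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def reads_needed(genome_size: int, depth: float, mean_read_length: int) -> int:
    """Approximate number of reads required to reach a given average depth."""
    if mean_read_length <= 0:
        raise ValueError("mean_read_length must be positive")
    return math.ceil(genome_size * depth / mean_read_length)


# Figures quoted in the text.
GENOME_SIZE = 3_000_000_000      # about 3 billion base pairs in a human genome
TOTAL_BASES = 300_000_000_000    # about 300 billion bases sequenced per individual

# Assumption for illustration: the text says "more than 10,000 bases on average".
MEAN_READ_LENGTH = 10_000

depth = sequencing_depth(TOTAL_BASES, GENOME_SIZE)
n_reads = reads_needed(GENOME_SIZE, depth, MEAN_READ_LENGTH)

print(f"Average depth: {depth:.0f}x")
print(f"Approximate long reads required: {n_reads:,}")
```

Because the actual mean read length is stated only as a lower bound, the real number of reads would be somewhat smaller than this estimate.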
Previously, we identified about 3,500 new insertion sequences (approximately 2.5 million bases in total) relative to GRCh38 by exhaustively comparing GRCh38 with the de novo assembly, and released the results as decoyJRGv1 and JRGv1 in June and August 2016, respectively. In addition, we sequenced the genomes of two more individuals (three in total) and, by comparison with GRCh38, obtained about 9,600 insertion sequences (approximately 6.2 million bases in total), which we released as decoyJRGv2 and JRGv2.
The Whole Genome Reference Panel of 2,049 Japanese individuals (2KJPN) released by ToMMo was constructed using GRCh37, which was the latest international reference genome sequence when the project started. We are planning to reconstruct and release the Reference Panel with JRG and decoyJRG so that it can be compared with the results of disease analyses that use these Japanese Reference Genome sequences.
Information on the Japanese Whole Genome Reference Panel is provided at the sister website, iJGVD.