This website is the portal for the Japanese reference genome of the Tohoku Medical Megabank Organization (ToMMo). Since April 2016, the Japanese Reference Genome (JRG) v1 and its decoy assembly for GRCh38, decoyJRG v1, have been available from the download page.
Since 2012, Tohoku University and Iwate Medical University have carried out the Tohoku Medical Megabank Project as part of the reconstruction from the Great East Japan Earthquake, launching the Tohoku Medical Megabank Organization (ToMMo) and the Iwate Tohoku Medical Megabank Organization, respectively. Both organizations have been conducting a cohort study of 150,000 residents of Miyagi and Iwate Prefectures and had registered approximately 130,000 participants as of April 2016. Toward the realization of precision medicine and prevention in Japan, the Department of Integrative Genomics of ToMMo extracted DNA from blood samples provided by the cohort participants and performed whole-genome analysis with a short-read next-generation sequencer, the HiSeq 2500 (Illumina), which reads about 324 bases per fragment, to construct the Japanese Reference Panel (1KJPN). The outcome was released at iJGVD, one of the portal sites of ToMMo, on August 29, 2014, and more than 10,000 researchers from 100 countries had accessed the site by April 1, 2016. We reported in the international journal Nature Communications on August 21, 2015 that the newly found rare SNVs may affect diseases and traits. On December 15th, we also released information on all the SNVs, approximately 21.2 million in total, at the website.
However, with a short-read next-generation sequencer, each sequenced genome fragment is only several hundred bases long (324 bases on average at ToMMo), even though hundreds of millions of fragments are sequenced in a single run. It was therefore difficult to identify structural variants, including repeated sequences, in a human genome.
Our goal is to construct the Japanese Reference Genome and to release it with annotation (e.g. frequency in the Japanese population).
Identifying the structural variants shared among Japanese individuals enables us to identify the differences among individuals accurately. With a recent long-read next-generation sequencer, the PacBio RSII (Pacific Biosciences), we can sequence more than 10,000 contiguous bases on average. The drawback of this sequencer, however, was its high sequencing error rate. A human genome consists of 3 billion base pairs, so we sequenced Japanese genomic DNA repeatedly to obtain 300 billion bases of sequence information (equivalent to sequencing the genome 100 times over). We overcame the high error rate of the sequencer by acquiring this massive amount of sequence information and applying a method from information science called de novo assembly, which assembles the sequences from scratch with high accuracy. In this way we succeeded in constructing and releasing the present Japanese Reference Genome. The analysis required running the supercomputer system of ToMMo for several months. An international organization, the Genome Reference Consortium, regularly maintains and revises the international human reference genome sequence, and the latest version as of April 2016 is GRCh38. By exhaustively comparing GRCh38 with the present de novo assembly, we identified about 3,500 new insertion sequences (approximately 2.5 million bases in total) relative to GRCh38.
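The figures quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch, not part of the ToMMo analysis pipeline. The constant and function names are hypothetical, the input numbers are the rounded values given in the text, and the read count is a rough upper bound because the text gives only a lower bound on the mean read length.

```python
# Illustrative arithmetic only; all names here are hypothetical and the
# inputs are the rounded figures quoted in the text.

GENOME_SIZE_BP = 3_000_000_000     # approximate size of a human genome
SEQUENCED_BP = 300_000_000_000     # total long-read bases obtained
MEAN_READ_LENGTH_BP = 10_000       # "more than 10,000 bases" (lower bound)
INSERTION_COUNT = 3_500            # new insertions relative to GRCh38
INSERTION_TOTAL_BP = 2_500_000     # total length of those insertions


def coverage_depth(total_bases: int, genome_size: int) -> float:
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size


def approx_read_count(total_bases: int, mean_read_length: int) -> int:
    """Rough number of reads needed to reach the given total of bases."""
    return total_bases // mean_read_length


depth = coverage_depth(SEQUENCED_BP, GENOME_SIZE_BP)
reads = approx_read_count(SEQUENCED_BP, MEAN_READ_LENGTH_BP)
mean_insertion_bp = INSERTION_TOTAL_BP / INSERTION_COUNT

print(f"coverage depth: {depth:.0f}x")
print(f"reads (upper bound): {reads:,}")
print(f"mean insertion length: {mean_insertion_bp:.0f} bases")
```

Under these rounded inputs, the 100-fold figure in the text follows directly from dividing the total sequenced bases by the genome size, and the new insertions average roughly 700 bases each.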
The Whole Genome Reference Panel of 1,070 Japanese individuals released by ToMMo was constructed using GRCh37, which was the latest international reference genome sequence at that time. We are planning to reconstruct and release the Reference Panel with JRGv1 and decoyJRGv1, so that it can be compared with the results of disease analyses that use these Japanese Reference Genome sequences.
Information on the Japanese Whole Genome Reference Panel is provided at the sister website, iJGVD.