Iranome Genomic Variation Database
Access to clinical genetic testing has grown steadily around the globe since next-generation sequencing was introduced to the field of genetics about a decade ago. Widespread access to genetic testing will have a remarkable impact on realizing the vision of precision medicine: improving the prevention, diagnosis and treatment of human disorders, many of which have a genetic etiology. However, many ethnic groups are not represented in current human genome variation databases, and the benefits of precision medicine may not be realized for these groups unless this gap is addressed. With this in mind, and considering ethical, cultural and social aspects, we established the Iranome database (www.iranome.com) in collaboration with Dr. Hossein Najmabadi at Social Welfare and Rehabilitation University, Tehran, Iran, by performing whole exome sequencing on 800 individuals from eight major ethnic groups living in Iran. The cohort comprises 100 healthy individuals from each of the following groups: Iranian Arabs, Azeris (Turk), Balochs, Kurds, Lurs, Persians, Persian Gulf Islanders and Turkmen. Together these groups represent over 80 million Iranians and, to some degree, half a billion individuals who live in the Middle East. They are among the most underrepresented populations in currently available human genome variation databases. Principal component analysis indicates that, except for the Iranian Baloch and Persian Gulf Islander populations, which form their own clusters, the populations are genetically very close to each other. Combined analysis of this dataset with the 1000 Genomes Project data showed that these ethnic groups form a sixth super population that is genetically distinct from the five previously known super populations of the 1000 Genomes Project.
In total, we identified 1,575,702 variants that passed the QC filter within the protein-coding regions captured by the SureSelect Human All Exon V6 kit. These high-quality variants included 1,332,298 SNPs and 243,404 insertions/deletions (indels), corresponding to one variant in every ~38 bp of the captured 60 Mbp exome interval.
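The figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check on the reported numbers; the variable names are illustrative, and the captured interval is assumed to be exactly 60,000,000 bp.

```python
# Reported counts from the text; variable names are illustrative.
snps = 1_332_298
indels = 243_404
total_variants = snps + indels

# Assumption: the captured exome interval is treated as exactly 60 Mbp.
captured_bp = 60_000_000

# Average spacing between variants across the captured interval.
bp_per_variant = captured_bp / total_variants

print(total_variants)            # 1575702
print(round(bp_per_variant, 1))  # 38.1
```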
Among these 1,575,702 variants, 52.5% were singletons, and 308,311 variants (240,256 SNPs and 68,055 indels) had no record in any of the following public databases: the dbSNP catalogue (version 149), the ExAC database, the gnomAD database, the NHLBI ESP6500 database, 1000 Genomes Project Phase 3, the Avon Longitudinal Study of Parents and Children (ALSPAC) dataset, the UK10K Twins dataset and the TOPMed dataset. These variants were therefore considered novel and account for 19.6% of all detected variants. As expected, the majority of the novel variants (81%) were singletons.
In addition, a further 50% (793,806) of the detected variants were present in public databases but with an allele frequency below 0.01 (rare variants). Taken together, about 70% of the variants identified in this dataset are rare or novel. However, among these 1,102,117 rare or novel variants, we identified 37,384 with an alternate allele frequency above 1% in the Iranome dataset. Thus, in addition to introducing 308,311 novel variants to the catalogue of human genomic variation, the Iranome database can improve the power of molecular diagnosis by showing an alternate allele frequency above 1% for 37,384 variants that are either novel or previously known only as rare.
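The proportions quoted for novel and rare variants follow directly from the reported counts. The sketch below recomputes them as a consistency check; the helper `pct` and all variable names are hypothetical and used only for illustration.

```python
# Reported counts from the text; names are illustrative.
total_variants = 1_575_702
novel = 240_256 + 68_055          # novel SNPs + novel indels
rare_known = 793_806              # in public databases, allele frequency < 0.01
rare_or_novel = novel + rare_known

def pct(part, whole):
    """Percentage of `whole` made up by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)

print(novel)                               # 308311
print(rare_or_novel)                       # 1102117
print(pct(novel, total_variants))          # 19.6
print(pct(rare_known, total_variants))     # 50.4
print(pct(rare_or_novel, total_variants))  # 69.9
```

The recomputed values agree with the rounded figures in the text (19.6%, about 50% and about 70%).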
The Iranome database website: www.iranome.com