In 2019, GeneStory launched with a single, audacious goal: build Southeast Asia's most comprehensive ethnically specific genome reference database. Five years later, that database has grown to over 10,000 Vietnamese genomes, and it has fundamentally changed how we interpret DNA for millions of people.
Why Vietnam Needed Its Own Genome Reference
Genetic analysis is only as accurate as its reference population. When GeneStory's founders assessed the genomic landscape in 2018, they found a glaring gap: virtually all major genome reference databases — the Human Reference Genome, gnomAD, 1000 Genomes Project — were built predominantly from European and East Asian (primarily Chinese, Japanese, Korean) samples. Vietnamese genetic diversity, which reflects a complex history of migration, trade, and cultural exchange, was severely underrepresented.
This gap wasn't merely academic. When clinicians in Vietnam used international genomic tools to interpret patient results, the "benign" and "pathogenic" labels those tools assigned rested on allele frequencies from populations that do not genetically resemble Vietnamese individuals. The result was potential misclassification of clinically important variants: a patient's truly pathogenic mutation dismissed as a common variant, or a rare but benign SNP mistaken for a disease risk marker.
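To make the mechanism concrete, here is a minimal Python sketch of a frequency-based filter applied to the same variant under two reference panels. The 5% cutoff mirrors the stand-alone benign criterion (BA1) in the ACMG/AMP variant interpretation guidelines. The function name, panel names, and both frequencies are invented for illustration and do not come from GeneStory's data or tooling.

```python
# Hypothetical illustration: one variant judged against two reference panels.
# The 5% cutoff follows the ACMG/AMP "BA1" stand-alone benign criterion.
# All frequencies below are made up for the example.

BA1_THRESHOLD = 0.05


def frequency_flag(allele_frequency: float, threshold: float = BA1_THRESHOLD) -> str:
    """Return a coarse label based only on population allele frequency."""
    if allele_frequency > threshold:
        return "likely benign (too common to cause rare disease)"
    return "needs review (rare in this reference)"


# Invented frequencies for a single variant in two reference panels.
variant_frequencies = {
    "european_reference": 0.001,   # rare in a mostly European panel
    "vietnamese_reference": 0.08,  # common in a Vietnamese panel
}

for panel, freq in variant_frequencies.items():
    print(f"{panel}: AF={freq:.3f} -> {frequency_flag(freq)}")
```

With these invented numbers, the variant is sent for review under the first panel and set aside as likely benign under the second. The label depends on the reference population as much as on the variant itself.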
Phase 1: The First 1,000 Genomes (2019–2020)
GeneStory's genome database project launched in partnership with VinBigData, the research arm of Vingroup, Vietnam's largest conglomerate. The initial phase set an ambitious target: sequence 1,000 Vietnamese individuals representing the country's major ethnic and geographic groups — Kinh (the majority group), Tay, Thai, Muong, Khmer, Cham, and several highland minority groups.
Recruitment was conducted through a network of partner hospitals and clinics across five provinces, combined with community outreach programs that explained the research goals in accessible language. Participants provided informed consent, gave a blood sample, and completed a detailed health and lifestyle questionnaire. All samples were processed at GeneStory's ISO-certified laboratory in Ho Chi Minh City.
Whole-genome sequencing was performed at 30x coverage — the gold standard for clinical-grade genomic analysis — using Illumina NovaSeq 6000 platforms. By the end of 2020, GeneStory had sequenced 1,247 genomes, discovering over 15 million variant positions, of which nearly 2.4 million had not been previously reported in any international database.
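For readers who want a feel for the numbers, the sketch below works through the standard coverage relation (coverage = reads × read length / genome size) and the share of variants that were new. The genome size of about 3.1 Gb and the 150 bp read length are assumptions for illustration; the article does not state the read configuration used. The helper names are hypothetical.

```python
# Back-of-the-envelope sequencing arithmetic.
# Assumptions not taken from the article: ~3.1 Gb genome, 150 bp reads.

GENOME_SIZE_BP = 3_100_000_000  # approximate human genome length (assumption)
READ_LENGTH_BP = 150            # assumed read length


def reads_needed(target_coverage: float,
                 genome_size: int = GENOME_SIZE_BP,
                 read_length: int = READ_LENGTH_BP) -> int:
    """Reads required so that reads * read_length / genome_size equals the target."""
    return round(target_coverage * genome_size / read_length)


def novel_fraction(novel: float, total: float) -> float:
    """Share of variant positions not seen in existing databases."""
    return novel / total


print(f"Reads per genome for 30x: {reads_needed(30):,}")        # 620,000,000
print(f"Novel share: {novel_fraction(2.4e6, 15e6):.0%}")        # 16%
```

Under these assumptions, each 30x genome takes roughly 620 million reads, and about one in six of the variant positions found in Phase 1 was new to international databases.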
Phase 2: Scale and Diversity (2021–2023)
The second phase expanded both sample size and ethnic diversity. Working with the Ministry of Health of Vietnam, GeneStory established partnerships with 28 provincial health centers, enabling recruitment from every geographic region of the country. Priority was given to ethnic minority groups, whose genetic distinctiveness from the Kinh majority was poorly characterized.
Key scientific milestones during this phase included:
- Identification of 47 novel disease-associated variants specific to the Vietnamese population
- Characterization of pharmacogenomic allele frequencies for all major CYP450 enzymes in Vietnamese sub-populations
- Publication of the first population-structure analysis of Vietnamese genetic diversity in the journal Nature Communications
- Establishment of a Vietnamese-specific variant frequency database (VietGenDB) that is now accessible to clinical researchers nationwide
By 2023, the database had reached 6,800 individuals — the largest collection of Vietnamese whole-genome sequences in the world.
Phase 3: Reaching 10,000 and Beyond (2024–Present)
The final push to 10,000 genomes focused on three priorities: increasing representation of northern and central highland ethnic groups, adding longitudinal health data to create a prospective cohort, and integrating the database with clinical electronic health records (EHRs) from partner hospitals.
The longitudinal component is particularly significant. Unlike static genome databases, GeneStory's cohort follows participants over time, linking their genetic data to actual health outcomes. This genotype-phenotype correlation data — currently including over 45,000 clinical data points — enables GeneStory's AI models to validate risk predictions against real disease incidence in Vietnamese individuals, continuously improving the accuracy of health reports.
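One common way to check risk predictions against observed outcomes is a calibration table: group people by predicted risk and compare the average prediction in each group with how many actually developed the condition. The Python sketch below shows the idea on invented data. It is an illustration of the general technique, not a description of GeneStory's models or pipeline, and every name in it is hypothetical.

```python
# Hypothetical calibration check: mean predicted risk vs. observed incidence
# within each risk bin. All data are invented for illustration.
from collections import defaultdict


def calibration_table(predicted, observed, n_bins=2):
    """Return {bin_index: (mean_predicted_risk, observed_incidence, count)}."""
    bins = defaultdict(list)
    for p, y in zip(predicted, observed):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    table = {}
    for idx, rows in sorted(bins.items()):
        mean_pred = sum(p for p, _ in rows) / len(rows)
        incidence = sum(y for _, y in rows) / len(rows)
        table[idx] = (mean_pred, incidence, len(rows))
    return table


predicted_risk = [0.10, 0.20, 0.30, 0.60, 0.70, 0.80]  # model outputs (invented)
developed_disease = [0, 0, 1, 1, 0, 1]                 # follow-up outcomes (invented)

table = calibration_table(predicted_risk, developed_disease)
for idx, (mean_pred, incidence, n) in table.items():
    print(f"bin {idx}: predicted={mean_pred:.2f} observed={incidence:.2f} n={n}")
```

When the predicted and observed columns track each other across bins, the model is well calibrated for that population. Large gaps point to where the risk estimates need adjusting.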
What 10,000 Genomes Makes Possible
The practical impact of a Vietnam-specific reference database is profound. Every GeneStory health report is now interpreted against allele frequencies from individuals who share the same genetic background as our customers. This means:
- Variant classifications are calibrated to Vietnamese population frequencies, reducing both false-positive and false-negative risk assessments
- Pharmacogenomic reports reflect enzyme activity predictions validated in Vietnamese metabolism studies
- Ancestry analyses use a reference panel that accurately represents Southeast Asian population structure
- Disease risk scores are validated against outcomes observed in Vietnamese patients — not European clinical trial cohorts
The Road Ahead
GeneStory's database goals extend well beyond Vietnam. Our next phase targets 50,000 genomes by 2028, encompassing additional Southeast Asian populations in partnership with research institutions in Thailand, Indonesia, and Singapore. The scientific vision is a Southeast Asian Genome Consortium — a shared resource that gives the region's 680 million people access to genomic medicine calibrated for their own populations.
For a country where genomic medicine is still emerging, Vietnam's 10,000-genome milestone represents something extraordinary: the scientific infrastructure to bring precision health to an entire nation — on its own genetic terms.