How GeneStory's AI Turns 3 Billion Base Pairs Into a Personalized Health Report

February 18, 2026  |  10 min read  |  AI & Technology

Your DNA contains approximately 3.2 billion base pairs in each copy of the genome, and you carry two copies, one from each parent. Stored at 2 bits per base, that diploid genome amounts to roughly 1.5 gigabytes of raw data. Nearly every cell in your body carries this complete genomic blueprint, yet any two people share about 99.9% of it. What makes you uniquely you (medically, ancestrally, metabolically) lies in the roughly 4–5 million positions where your genome differs from the standard reference. Turning those differences into a meaningful, actionable health report in 21 days requires a machine learning pipeline of remarkable sophistication.
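Where does the 1.5-gigabyte figure come from? A minimal back-of-envelope sketch, assuming each base (A, C, G or T) is packed into 2 bits with no compression or metadata: one 3.2-billion-base copy needs about 0.8 GB, while the two copies a person carries come to 1.6 billion bytes, which is about 1.49 binary gigabytes (GiB).

```python
# Back-of-envelope storage estimate for a human genome, assuming each base
# (A, C, G or T) is packed into 2 bits with no compression or metadata.

HAPLOID_BASES = 3_200_000_000  # approximate bases in one copy of the genome
BITS_PER_BASE = 2              # 4 possible bases -> log2(4) = 2 bits


def genome_size_bytes(bases: int, bits_per_base: int = BITS_PER_BASE) -> int:
    """Raw size in bytes of a sequence stored at a fixed number of bits per base."""
    return bases * bits_per_base // 8


haploid = genome_size_bytes(HAPLOID_BASES)      # one parental copy
diploid = genome_size_bytes(2 * HAPLOID_BASES)  # maternal + paternal copies

print(f"haploid: {haploid / 1e9:.1f} GB")                              # haploid: 0.8 GB
print(f"diploid: {diploid / 1e9:.1f} GB = {diploid / 2**30:.2f} GiB")  # diploid: 1.6 GB = 1.49 GiB
```

Real sequencing files are far larger than this, because FASTQ and BAM files store every read (30 or more per position) plus a quality score for each base.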

Step 1: Sample Collection and DNA Extraction

The genomic journey begins with a saliva collection kit. Customers provide a small saliva sample (approximately 2 mL) containing millions of cells, including cheek epithelial cells and white blood cells, each carrying a complete copy of their genome. The sample is stabilized in a proprietary buffer solution and shipped to GeneStory's ISO 15189-accredited laboratory in Ho Chi Minh City.

In the lab, DNA is extracted from the cellular material using automated liquid handling systems. The extraction process must yield high-quality, high-molecular-weight genomic DNA — fragmented or degraded DNA produces unreliable sequencing results. Quality control checkpoints at this stage assess DNA concentration, purity (A260/A280 ratio), and fragment length distribution. Only samples passing all quality thresholds proceed to sequencing.
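Conceptually, a QC checkpoint is a gate that admits a sample only if every metric falls within its acceptance range. A minimal sketch of that logic: the `ExtractionQC` structure and all numeric thresholds are hypothetical illustrations, not GeneStory's actual limits, except that an A260/A280 ratio of about 1.8 is the conventional benchmark for protein-free DNA.

```python
from dataclasses import dataclass

# Hypothetical acceptance thresholds for illustration only; real labs set
# their own limits per extraction chemistry and sequencing platform.
MIN_CONCENTRATION_NG_PER_UL = 10.0
A260_A280_RANGE = (1.7, 2.0)    # ~1.8 is typical of pure DNA
MIN_MEDIAN_FRAGMENT_BP = 10_000  # hypothetical cutoff for high-molecular-weight DNA


@dataclass
class ExtractionQC:
    sample_id: str
    concentration_ng_per_ul: float
    a260_a280: float
    median_fragment_bp: int


def qc_failures(qc: ExtractionQC) -> list[str]:
    """Return the list of failed checks; an empty list means the sample passes."""
    failures = []
    if qc.concentration_ng_per_ul < MIN_CONCENTRATION_NG_PER_UL:
        failures.append("low concentration")
    low, high = A260_A280_RANGE
    if not low <= qc.a260_a280 <= high:
        failures.append("purity out of range")
    if qc.median_fragment_bp < MIN_MEDIAN_FRAGMENT_BP:
        failures.append("degraded DNA")
    return failures


samples = [
    ExtractionQC("S1", 45.0, 1.82, 25_000),
    ExtractionQC("S2", 30.0, 1.55, 22_000),  # low ratio suggests protein contamination
    ExtractionQC("S3", 8.0, 1.85, 4_000),
]
for s in samples:
    failed = qc_failures(s)
    print(s.sample_id, "PASS" if not failed else "FAIL: " + ", ".join(failed))
```

Returning every failed check, rather than stopping at the first, tells the lab whether a sample needs re-extraction, re-quantification, or a fresh collection kit.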

Step 2: Genome Sequencing

GeneStory uses two sequencing approaches depending on the product tier. The flagship whole-genome sequencing (WGS) service uses Illumina NovaSeq 6000 platforms to sequence the complete genome at 30x coverage — meaning each position in the genome is read an average of 30 times. This level of coverage achieves >99.9% genotype accuracy across the genome.
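The "average of 30 times" is the Lander–Waterman mean coverage: total sequenced bases divided by genome size. A sketch, assuming 150 bp reads and the 3.2 billion base genome size used above; the Poisson estimates of poorly covered positions are idealized, since real coverage is uneven (GC bias, repeats), so actual shortfalls are larger.

```python
import math

GENOME_SIZE_BP = 3_200_000_000  # approximate haploid genome size used in this article
READ_LENGTH_BP = 150            # assumed read length (2 x 150 bp paired-end is common)


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Lander-Waterman mean coverage: total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def reads_for_coverage(target: float, read_length: int, genome_size: int) -> int:
    """Number of reads needed to reach a target mean coverage."""
    return math.ceil(target * genome_size / read_length)


def fraction_uncovered(coverage: float) -> float:
    """Poisson estimate of the fraction of positions receiving zero reads."""
    return math.exp(-coverage)


def fraction_below(min_depth: int, coverage: float) -> float:
    """Poisson estimate of the fraction of positions with depth < min_depth."""
    return sum(
        math.exp(-coverage) * coverage**k / math.factorial(k)
        for k in range(min_depth)
    )


n = reads_for_coverage(30, READ_LENGTH_BP, GENOME_SIZE_BP)
print(f"reads needed for 30x: {n:,}")  # reads needed for 30x: 640,000,000
print(f"positions with <10 reads at 30x (Poisson): {fraction_below(10, 30.0):.2e}")
```

With paired-end sequencing, 640 million reads correspond to 320 million read pairs.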

The genotyping array service uses high-density SNP arrays (approximately 700,000 variant positions) as a more accessible entry point, offering excellent coverage of clinically relevant known variants at substantially lower cost. The VietGenDB reference panel enables accurate imputation of ungenotyped variants from array data, effectively extending coverage to over 10 million variant positions.
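To illustrate the idea behind imputation, here is a deliberately simplified "nearest haplotype pair" scheme: find the two reference haplotypes whose alleles best explain the sample's genotypes at the array sites, then copy their alleles at the untyped sites. This is not how production tools work (they use hidden Markov models that allow switching between haplotypes), and the tiny reference panel below is invented; it only stands in for a real panel such as VietGenDB.

```python
from itertools import combinations_with_replacement

# Toy reference panel: each row is one haplotype, alleles coded 0 (ref) / 1 (alt)
# at five sites. Entirely made up for illustration.
REFERENCE_HAPLOTYPES = [
    [0, 0, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
]
TYPED_SITES = [0, 2, 4]  # positions present on the hypothetical array
UNTYPED_SITES = [1, 3]   # positions to impute


def impute(genotypes: dict[int, int]) -> dict[int, int]:
    """Impute alt-allele counts (0/1/2) at untyped sites.

    Picks the pair of reference haplotypes whose summed alleles best match the
    sample's genotypes at the typed sites, then copies that pair's alleles at
    the untyped sites.
    """
    best_pair, best_mismatch = None, None
    for h1, h2 in combinations_with_replacement(REFERENCE_HAPLOTYPES, 2):
        mismatch = sum(abs(h1[s] + h2[s] - genotypes[s]) for s in TYPED_SITES)
        if best_mismatch is None or mismatch < best_mismatch:
            best_pair, best_mismatch = (h1, h2), mismatch
    h1, h2 = best_pair
    return {s: h1[s] + h2[s] for s in UNTYPED_SITES}


# Sample genotyped only at the array sites (alt-allele counts).
print(impute({0: 1, 2: 1, 4: 1}))  # {1: 1, 3: 1}
```

The approach only works when the reference panel contains haplotypes resembling the sample's, which is why a population-matched panel matters for imputation accuracy.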

Step 3: Bioinformatics Processing

Raw sequencing data (FASTQ files for WGS, intensity files for arrays) enters GeneStory's automated bioinformatics pipeline. For WGS data, this pipeline follows GATK (Genome Analysis Toolkit) best practice workflows:

  • Read alignment — Sequencing reads are mapped to the GRCh38 reference genome using BWA-MEM2, a fast and accurate aligner optimized for short-read sequencing data
  • Duplicate marking — PCR-amplified duplicates are identified with Picard MarkDuplicates and flagged (by default, not deleted) so that downstream tools ignore them
  • Base quality score recalibration — Systematic biases in the per-base quality scores reported by the sequencer are corrected using empirical recalibration models
  • Variant calling — HaplotypeCaller identifies SNPs and small insertions/deletions (indels) with high sensitivity and specificity
  • Variant filtration — Variant quality score recalibration (VQSR) models distinguish true variants from sequencing artifacts
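To make the duplicate-marking step concrete, here is a simplified sketch of the idea behind it, not Picard's implementation: single-end reads that start at the same position on the same strand are assumed to be PCR copies of one original molecule, and all but the highest-quality copy are flagged. The `Read` structure and the scoring rule are illustrative assumptions; Picard's actual scoring and its paired-end handling are more involved.

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    chrom: str
    five_prime_pos: int        # unclipped 5' alignment position
    reverse: bool              # strand orientation
    base_qualities: list[int]  # Phred quality per base


def mark_duplicates(reads: list[Read]) -> set[str]:
    """Return the names of reads flagged as duplicates.

    Reads sharing chromosome, 5' position and orientation are grouped as copies
    of one molecule; the copy with the highest summed base quality is kept.
    """
    groups: dict[tuple[str, int, bool], list[Read]] = {}
    for read in reads:
        key = (read.chrom, read.five_prime_pos, read.reverse)
        groups.setdefault(key, []).append(read)

    duplicates = set()
    for group in groups.values():
        best = max(group, key=lambda r: sum(r.base_qualities))
        duplicates.update(r.name for r in group if r is not best)
    return duplicates


reads = [
    Read("r1", "chr1", 1000, False, [30, 30, 30]),
    Read("r2", "chr1", 1000, False, [20, 20, 20]),  # same start and strand as r1
    Read("r3", "chr1", 1000, True, [30, 30, 30]),   # opposite strand: distinct molecule
    Read("r4", "chr1", 2000, False, [25, 25, 25]),
]
print(mark_duplicates(reads))  # {'r2'}
```

The function only reports names rather than dropping reads, mirroring the flag-don't-delete behavior described above.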

Step 4: Vietnamese-Calibrated Variant Annotation

Raw variant calls have limited clinical meaning without context. GeneStory's proprietary annotation engine adds layers of biological and clinical meaning to each variant:

  • VietGenDB annotation — variant frequencies are reported relative to GeneStory's Vietnamese reference population (not gnomAD European frequencies)
  • Clinical databases — ClinVar, OMIM, and PharmGKB classifications are integrated for known clinically significant variants
  • Functional annotation — gene impact, protein consequence, evolutionary conservation, and deleteriousness prediction scores (CADD, REVEL) help assess likely functional effect
  • Pathway annotation — variants are mapped to biological pathways (KEGG, Reactome) to identify systemic patterns in individual genetic profiles
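Population-specific frequencies matter because frequency thresholds can change how a variant is interpreted. A minimal sketch with invented genotype counts: the alternate-allele frequency is computed from diploid genotype counts and compared against the 5% stand-alone benign cutoff (criterion BA1) from the ACMG/AMP variant classification guidelines. The cohort names and counts are hypothetical; the point is that the same variant can look common in one population and rare in another.

```python
def allele_frequency(hom_ref: int, het: int, hom_alt: int) -> float:
    """Alternate-allele frequency from diploid genotype counts."""
    n_individuals = hom_ref + het + hom_alt
    if n_individuals == 0:
        raise ValueError("no genotyped individuals")
    # Heterozygotes carry one alt allele, alt homozygotes carry two.
    return (het + 2 * hom_alt) / (2 * n_individuals)


BA1_THRESHOLD = 0.05  # ACMG/AMP stand-alone benign criterion: allele frequency > 5%

# Made-up (hom_ref, het, hom_alt) counts for one variant in two cohorts.
cohorts = {
    "cohort_A": (880, 115, 5),
    "cohort_B": (990, 10, 0),
}
for name, counts in cohorts.items():
    af = allele_frequency(*counts)
    status = "common (BA1 applies)" if af > BA1_THRESHOLD else "rare"
    print(f"{name}: AF = {af:.4f} -> {status}")
# cohort_A: AF = 0.0625 -> common (BA1 applies)
# cohort_B: AF = 0.0050 -> rare
```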

Step 5: AI-Powered Risk Modeling

This is where GeneStory's machine learning infrastructure takes over. Rather than applying simple look-up tables of known variants, GeneStory's AI models combine multiple classes of genetic signal to compute personalized risk scores across 300+ health traits.

For complex traits such as diabetes, cardiovascular disease and depression, genetic risk is distributed across thousands of common variants, each with a tiny individual effect size. Polygenic risk scores (PRS) aggregate these signals into a single number representing an individual's genetic predisposition. GeneStory has developed Vietnam-specific PRS models for 35 complex diseases, trained and validated on Vietnamese cohort data. This is a critical distinction: PRS models built from predominantly European genome-wide association studies (GWAS) typically lose much of their predictive accuracy when applied to Asian populations.
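At its simplest, a polygenic score is a weighted sum: each variant's effect size (beta, estimated from GWAS) multiplied by the number of effect-allele copies the person carries (0, 1 or 2), summed across variants, then ranked against a reference population. A minimal sketch; the variant IDs, weights and reference scores are invented, and this is not GeneStory's model, which the article does not describe at this level.

```python
from bisect import bisect_left


def polygenic_score(weights: dict[str, float], dosages: dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0-2) over the model's variants.

    Variants missing from the sample contribute nothing in this sketch;
    production tools usually impute them or use the population mean dosage.
    """
    return sum(beta * dosages.get(variant, 0.0) for variant, beta in weights.items())


def percentile(score: float, reference_scores: list[float]) -> float:
    """Percentage of reference individuals scoring strictly below `score`."""
    ranked = sorted(reference_scores)
    return 100.0 * bisect_left(ranked, score) / len(ranked)


# Hypothetical three-variant model and one person's dosages.
weights = {"var1": 0.12, "var2": -0.05, "var3": 0.30}
person = {"var1": 2, "var2": 1, "var3": 1}
# Hypothetical scores from a reference cohort.
reference = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

score = polygenic_score(weights, person)
print(f"score = {score:.2f}, percentile = {percentile(score, reference):.0f}")
```

The reference cohort is where population calibration enters: the same raw score can land at a different percentile depending on whose score distribution it is ranked against.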

The AI models also incorporate gene-gene interactions (epistasis), gene-environment interaction estimates, and protective variant identification — building a three-dimensional picture of genetic risk that goes well beyond simple variant-disease look-up.
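One standard way to represent a gene-gene interaction is a product term in a logistic regression: carrying risk alleles at both loci shifts the log-odds by more than the sum of their separate effects. A sketch with invented coefficients; the article does not say which model class GeneStory uses, so treat this as an illustration of the concept rather than its implementation.

```python
import math


def risk_probability(g1: int, g2: int, coef: dict[str, float]) -> float:
    """Logistic model with an interaction (epistasis) term.

    g1, g2 are effect-allele counts (0, 1, 2) at two loci; coefficient b12
    multiplies their product, so it only matters when both are non-zero.
    """
    logit = (coef["intercept"] + coef["b1"] * g1 + coef["b2"] * g2
             + coef["b12"] * g1 * g2)
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical coefficients; the additive model is identical except b12 = 0.
interaction = {"intercept": -3.0, "b1": 0.2, "b2": 0.2, "b12": 0.5}
additive = dict(interaction, b12=0.0)

for g1, g2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    p_add = risk_probability(g1, g2, additive)
    p_int = risk_probability(g1, g2, interaction)
    print(f"g1={g1} g2={g2}  additive={p_add:.3f}  with interaction={p_int:.3f}")
```

A gene-environment interaction has the same form, with one genotype replaced by an environmental exposure variable.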

Step 6: Clinical Curation and Report Generation

All algorithmically generated findings pass through a clinical curation layer staffed by GeneStory's team of board-certified clinical geneticists, genetic counselors, and medical reviewers. Findings classified as clinically significant — particularly hereditary disease risk variants — undergo expert review before inclusion in the customer report.
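The routing rule described above can be sketched as a filter over the standard five-tier ACMG/AMP classification. A minimal illustration: the `Finding` structure, the gene placeholders and the choice of which tiers require sign-off are hypothetical, and in practice curation involves far more context than a single label.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    """The five-tier ACMG/AMP variant classification."""
    PATHOGENIC = "Pathogenic"
    LIKELY_PATHOGENIC = "Likely pathogenic"
    UNCERTAIN = "Uncertain significance"
    LIKELY_BENIGN = "Likely benign"
    BENIGN = "Benign"


# Hypothetical routing rule: these tiers need clinical geneticist sign-off
# before they can appear in a customer report.
REQUIRES_EXPERT_REVIEW = {Classification.PATHOGENIC, Classification.LIKELY_PATHOGENIC}


@dataclass
class Finding:
    gene: str
    variant: str
    classification: Classification


def route(findings: list[Finding]) -> tuple[list[Finding], list[Finding]]:
    """Split findings into (expert_review, standard) queues."""
    expert = [f for f in findings if f.classification in REQUIRES_EXPERT_REVIEW]
    standard = [f for f in findings if f.classification not in REQUIRES_EXPERT_REVIEW]
    return expert, standard


findings = [
    Finding("GENE_A", "var_a", Classification.PATHOGENIC),
    Finding("GENE_B", "var_b", Classification.LIKELY_PATHOGENIC),
    Finding("GENE_C", "var_c", Classification.LIKELY_BENIGN),
]
expert, standard = route(findings)
print([f.gene for f in expert], [f.gene for f in standard])
```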

The final report is generated by GeneStory's report rendering engine, which translates technical genomic findings into clear, jargon-free language calibrated for a health-aware but non-specialist audience. Visualizations, risk percentile charts, and actionable recommendations are automatically generated and then reviewed for clinical accuracy by the curation team.

The 21-Day Commitment

From sample receipt to report delivery, GeneStory guarantees a 21-day turnaround. This timeline reflects careful balancing of scientific rigor — particularly the curation review stage — with the urgency of health information. The 21-day window is among the fastest in the industry for a clinically validated, expert-curated genomic health report.

This pipeline — combining cutting-edge sequencing technology, Vietnamese-specific bioinformatics, sophisticated AI risk modeling, and expert clinical curation — is what separates a GeneStory health report from a simple novelty DNA test. It's precision medicine infrastructure built for the genetic reality of Vietnam.

AI & Technology Machine Learning Bioinformatics Whole Genome Sequencing
