Your DNA contains approximately 3.2 billion base pairs. At two bits per base, that is roughly 0.8 gigabytes of raw data per copy of the genome, or about 1.6 gigabytes counting both parental copies. Nearly every cell in your body carries this complete genomic blueprint, yet more than 99% of it is identical to that of any other human being. What makes you uniquely you (medically, ancestrally, metabolically) lies in the roughly 4–5 million positions where your genome differs from the standard reference. Turning those differences into a meaningful, actionable health report in 21 days requires a sophisticated machine learning pipeline.
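As a rough check on the storage figure above, the arithmetic can be sketched in a few lines. This assumes each base is encoded in two bits (four possible letters) and ignores quality scores, metadata, and compression; the constant and function names are illustrative only.

```python
# Back-of-the-envelope genome storage estimate.
# Assumption: each base (A, C, G, T) is stored in 2 bits, with no
# quality scores, headers, or compression.

HAPLOID_BASE_PAIRS = 3_200_000_000  # approximate figure quoted in the text
BITS_PER_BASE = 2                   # 4 possible bases -> log2(4) = 2 bits


def genome_size_gigabytes(base_pairs: int, copies: int = 1) -> float:
    """Return the raw storage size in gigabytes (10**9 bytes)."""
    total_bits = base_pairs * BITS_PER_BASE * copies
    return total_bits / 8 / 1e9


haploid_gb = genome_size_gigabytes(HAPLOID_BASE_PAIRS)            # one copy
diploid_gb = genome_size_gigabytes(HAPLOID_BASE_PAIRS, copies=2)  # both parental copies
print(f"haploid: {haploid_gb:.1f} GB, diploid: {diploid_gb:.1f} GB")
```

Under these assumptions a single copy of the genome comes to about 0.8 GB and both parental copies to about 1.6 GB, which is the same order of magnitude as the commonly quoted figure.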
Step 1: Sample Collection and DNA Extraction
The genomic journey begins with a saliva collection kit. Customers provide a small saliva sample of approximately 2 mL, containing millions of cells (cheek epithelial cells and white blood cells), each carrying a complete copy of their genome. The sample is stabilized in a proprietary buffer solution and shipped to GeneStory's ISO 15189-accredited laboratory in Ho Chi Minh City.
In the lab, DNA is extracted from the cellular material using automated liquid handling systems. The extraction process must yield high-quality, high-molecular-weight genomic DNA — fragmented or degraded DNA produces unreliable sequencing results. Quality control checkpoints at this stage assess DNA concentration, purity (A260/A280 ratio), and fragment length distribution. Only samples passing all quality thresholds proceed to sequencing.
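The quality gate described above can be pictured as a simple pass/fail check over the three measurements. The sketch below is illustrative only: the class, function, and threshold values are hypothetical placeholders rather than GeneStory's actual limits, although an A260/A280 ratio of about 1.8 is the conventional benchmark for clean DNA.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DnaQcMeasurement:
    sample_id: str
    concentration_ng_per_ul: float  # DNA concentration
    a260_a280: float                # purity ratio; ~1.8 is typical for clean DNA
    median_fragment_bp: int         # simple proxy for fragment length distribution


# Hypothetical thresholds for illustration only, not GeneStory's actual limits.
MIN_CONCENTRATION_NG_PER_UL = 20.0
PURITY_RANGE = (1.7, 2.0)
MIN_MEDIAN_FRAGMENT_BP = 10_000


def qc_failures(m: DnaQcMeasurement) -> List[str]:
    """Return the names of failed checks; an empty list means the sample passes."""
    failures = []
    if m.concentration_ng_per_ul < MIN_CONCENTRATION_NG_PER_UL:
        failures.append("concentration")
    if not (PURITY_RANGE[0] <= m.a260_a280 <= PURITY_RANGE[1]):
        failures.append("purity")
    if m.median_fragment_bp < MIN_MEDIAN_FRAGMENT_BP:
        failures.append("fragment_length")
    return failures


good = DnaQcMeasurement("S001", 45.0, 1.82, 25_000)
degraded = DnaQcMeasurement("S002", 45.0, 1.55, 3_000)
print(qc_failures(good))      # no failed checks, so the sample proceeds
print(qc_failures(degraded))  # fails purity and fragment length
```

Returning the list of failed checks, rather than a single boolean, makes it easy to record why a sample was held back.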
Step 2: Genome Sequencing
GeneStory uses two sequencing approaches depending on the product tier. The flagship whole-genome sequencing (WGS) service uses Illumina NovaSeq 6000 platforms to sequence the complete genome at 30x coverage, meaning each position in the genome is read an average of 30 times. At this depth, genotype accuracy exceeds 99.9% across most of the genome, with lower confidence in repetitive or otherwise hard-to-map regions.
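To make "30x coverage" concrete, mean coverage is the total number of sequenced bases divided by the genome size. The sketch below assumes a read length of 150 bp, which is a common short-read configuration but is not stated in the text; the function names are illustrative.

```python
import math

GENOME_SIZE_BP = 3_200_000_000  # approximate haploid genome size


def mean_coverage(num_reads: int, read_length_bp: int,
                  genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Mean depth = total sequenced bases / genome size."""
    return num_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage: float, read_length_bp: int,
                 genome_size_bp: int = GENOME_SIZE_BP) -> int:
    """Number of reads required to reach the target mean depth."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


# Assumption: 150 bp reads.
n = reads_needed(30, 150)
print(n)                      # 640000000
print(mean_coverage(n, 150))  # 30.0
```

Under that assumption, 30x coverage corresponds to about 640 million reads. Because this is an average, individual positions will be covered more or less deeply than 30 times.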
The genotyping array service uses high-density SNP arrays (approximately 700,000 variant positions) as a more accessible entry point, offering excellent coverage of clinically relevant known variants at substantially lower cost. The VietGenDB reference panel enables accurate imputation of ungenotyped variants from array data, effectively extending coverage to over 10 million variant positions.
Step 3: Bioinformatics Processing
Raw sequencing data (FASTQ files for WGS, intensity files for arrays) enters GeneStory's automated bioinformatics pipeline. For WGS data, this pipeline follows GATK (Genome Analysis Toolkit) best practice workflows:
- Read alignment — Sequencing reads are mapped to the GRCh38 reference genome using BWA-MEM2, a fast and accurate aligner optimized for short-read sequencing data
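- Duplicate marking — PCR and optical duplicates are identified and flagged using Picard MarkDuplicates so that downstream tools can ignore them
- Base quality score recalibration — Per-base quality scores are adjusted using empirical models of systematic sequencing error, so that reported confidence matches observed error rates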
- Variant calling — HaplotypeCaller identifies SNPs and small insertions/deletions (indels) with high sensitivity and specificity
- Variant filtration — Variant quality score recalibration (VQSR) models distinguish true variants from sequencing artifacts
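The steps above can be sketched as a sequence of command lines. The example below only builds and prints the commands; it does not run them. File names, thread count, and read-group fields are hypothetical, and the tools (`bwa-mem2`, `samtools`, `gatk`) are assumed to be installed with the reference already indexed. The VQSR step is omitted because it depends on training resources that the text does not specify.

```python
from typing import List


def build_wgs_commands(sample: str, ref: str, fq1: str, fq2: str,
                       known_sites: str, threads: int = 8) -> List[str]:
    """Return shell command strings for alignment through variant calling.

    All output file names are hypothetical and derived from the sample name.
    """
    # Literal backslash-t sequences; the aligner expands them into tabs.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        # 1. Align reads and sort the output by coordinate.
        f"bwa-mem2 mem -t {threads} -R '{read_group}' {ref} {fq1} {fq2} "
        f"| samtools sort -@ {threads} -o {sample}.sorted.bam -",
        # 2. Flag duplicate reads and write an index for the output BAM.
        f"gatk MarkDuplicates -I {sample}.sorted.bam -O {sample}.dedup.bam "
        f"-M {sample}.dup_metrics.txt --CREATE_INDEX true",
        # 3. Build the base quality recalibration model from known variant sites.
        f"gatk BaseRecalibrator -R {ref} -I {sample}.dedup.bam "
        f"--known-sites {known_sites} -O {sample}.recal.table",
        # 4. Apply the recalibrated base qualities.
        f"gatk ApplyBQSR -R {ref} -I {sample}.dedup.bam "
        f"--bqsr-recal-file {sample}.recal.table -O {sample}.recal.bam",
        # 5. Call SNPs and indels, emitting a per-sample GVCF.
        f"gatk HaplotypeCaller -R {ref} -I {sample}.recal.bam "
        f"-O {sample}.g.vcf.gz -ERC GVCF",
    ]


commands = build_wgs_commands("S001", "GRCh38.fa", "S001_R1.fastq.gz",
                              "S001_R2.fastq.gz", "known_sites.vcf.gz")
for command in commands:
    print(command)
```

A production pipeline would normally run these steps under a workflow manager with logging and retries; the sketch is only meant to show how each stage consumes the previous stage's output.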
Step 4: Vietnamese-Calibrated Variant Annotation
Raw variant calls have limited clinical meaning without context. GeneStory's proprietary annotation engine adds layers of biological and clinical meaning to each variant:
- VietGenDB annotation — variant frequencies are reported relative to GeneStory's Vietnamese reference population (not gnomAD European frequencies)
- Clinical databases — ClinVar, OMIM, and PharmGKB classifications are integrated for known clinically significant variants
- Functional annotation — gene impact, protein consequence, and conservation scores (CADD, REVEL) help assess likely functional effect
- Pathway annotation — variants are mapped to biological pathways (KEGG, Reactome) to identify systemic patterns in individual genetic profiles
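Conceptually, annotation is a series of lookups keyed on the variant. The sketch below uses two in-memory dictionaries as stand-ins for a population frequency resource and a clinical database. The variant coordinates, values, and all names are made up for illustration and do not reflect the real schema of VietGenDB, ClinVar, or GeneStory's engine.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# (chromosome, position, reference allele, alternate allele)
VariantKey = Tuple[str, int, str, str]

# Toy stand-ins for the real resources; every value here is invented.
POPULATION_FREQ: Dict[VariantKey, float] = {("1", 12345, "A", "G"): 0.12}
CLINICAL_SIGNIFICANCE: Dict[VariantKey, str] = {("2", 67890, "C", "T"): "pathogenic"}


@dataclass
class AnnotatedVariant:
    key: VariantKey
    population_af: Optional[float] = None          # allele frequency, if known
    clinical_significance: Optional[str] = None    # clinical classification, if known


def annotate(key: VariantKey) -> AnnotatedVariant:
    """Attach whatever each source knows about the variant; None means no record."""
    return AnnotatedVariant(
        key=key,
        population_af=POPULATION_FREQ.get(key),
        clinical_significance=CLINICAL_SIGNIFICANCE.get(key),
    )


common = annotate(("1", 12345, "A", "G"))
flagged = annotate(("2", 67890, "C", "T"))
print(common)
print(flagged)
```

Keeping each source as a separate optional field makes it clear which layer contributed which piece of information, and a missing record stays distinct from a frequency of zero.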
Step 5: AI-Powered Risk Modeling
This is where GeneStory's machine learning infrastructure takes over. Rather than applying simple look-up tables of known variants, GeneStory's AI models combine multiple classes of genetic signal to compute personalized risk scores across 300+ health traits.
For complex traits such as diabetes, cardiovascular disease, and depression, genetic risk is distributed across thousands of common variants, each with a tiny individual effect size. Polygenic risk scores (PRS) aggregate these signals into a single number representing an individual's genetic predisposition. GeneStory has developed Vietnam-specific PRS models for 35 complex diseases, trained and validated on Vietnamese cohort data. This is a critical distinction from PRS models trained on European-ancestry GWAS data, which lose much of their predictive accuracy when applied to Asian populations.
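In its simplest additive form, a polygenic risk score is a weighted sum: each variant's effect weight multiplied by the number of risk alleles the person carries (0, 1, or 2). The sketch below shows that calculation and a percentile against a reference cohort. The variant IDs, weights, and reference scores are invented, and this is a generic illustration rather than GeneStory's actual model.

```python
from bisect import bisect_right
from typing import Dict, List


def polygenic_risk_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Sum of (effect weight x risk-allele dosage) over all scored variants.

    Simplification: variants missing from `dosages` are treated as dosage 0.
    """
    return sum(beta * dosages.get(variant_id, 0) for variant_id, beta in weights.items())


def percentile(score: float, reference_scores: List[float]) -> float:
    """Percentage of reference scores less than or equal to `score`."""
    ordered = sorted(reference_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


# Invented weights and genotypes, for illustration only.
weights = {"var_a": 0.25, "var_b": -0.5, "var_c": 0.125}
dosages = {"var_a": 2, "var_b": 1, "var_c": 0}

score = polygenic_risk_score(dosages, weights)  # 0.25*2 + (-0.5)*1 + 0.125*0 = 0.0
reference = [-1.0, -0.5, 0.0, 0.5, 1.0]         # scores from a hypothetical reference cohort
print(score, percentile(score, reference))      # 0.0 60.0
```

The choice of reference cohort matters as much as the weights: the same raw score maps to different percentiles depending on the population it is compared against, which is why population-matched reference data is emphasized above.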
The AI models also incorporate gene-gene interactions (epistasis), gene-environment interaction estimates, and protective variant identification, building a multi-layered picture of genetic risk that goes well beyond simple variant-disease look-up.
Step 6: Clinical Curation and Report Generation
All algorithmically generated findings pass through a clinical curation layer staffed by GeneStory's team of board-certified clinical geneticists, genetic counselors, and medical reviewers. Findings classified as clinically significant — particularly hereditary disease risk variants — undergo expert review before inclusion in the customer report.
The final report is generated by GeneStory's report rendering engine, which translates technical genomic findings into clear, jargon-free language calibrated for a health-aware but non-specialist audience. Visualizations, risk percentile charts, and actionable recommendations are automatically generated and then reviewed for clinical accuracy by the curation team.
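One small piece of such a rendering step is turning a numeric percentile into plain language. The sketch below is a hypothetical illustration: the function name, the 10th and 90th percentile cut-offs, and the wording are assumptions, not GeneStory's actual report logic.

```python
def describe_risk(trait: str, percentile: float) -> str:
    """Translate a risk percentile into a short, non-technical sentence."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    # Hypothetical cut-offs chosen for illustration only.
    if percentile >= 90:
        band = "higher than average"
    elif percentile <= 10:
        band = "lower than average"
    else:
        band = "about average"
    return f"Your genetic predisposition to {trait} is {band} (percentile {percentile:.0f})."


print(describe_risk("type 2 diabetes", 95))
print(describe_risk("type 2 diabetes", 50))
```

In practice the wording and cut-offs for each trait would be set and reviewed by the clinical curation team rather than hard-coded.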
The 21-Day Commitment
From sample receipt to report delivery, GeneStory guarantees a 21-day turnaround. This timeline reflects careful balancing of scientific rigor — particularly the curation review stage — with the urgency of health information. The 21-day window is among the fastest in the industry for a clinically validated, expert-curated genomic health report.
This pipeline — combining cutting-edge sequencing technology, Vietnamese-specific bioinformatics, sophisticated AI risk modeling, and expert clinical curation — is what separates a GeneStory health report from a simple novelty DNA test. It's precision medicine infrastructure built for the genetic reality of Vietnam.