NIH Records: 100,000
Model Accuracy: 67.94%
Margin of Error
The estimated ±13.02% margin of error reflects clinical bias, historical misdiagnoses, label noise, and environmental variability across patient profiles.
Global Context
According to the World Health Organization, approximately 40% of the population will face a cancer diagnosis at least once during their lifetime.
Cancer arises when mutations disable DNA-repair and growth-control genes, leading to unchecked cell proliferation and malignant tumor development.
Disease Genetics
Unlike monogenic disorders, most forms of cancer are polygenic and multifactorial.
Accurate prediction therefore requires analyzing how genetic context interacts with environmental exposure, stress, lifestyle, and triggering conditions.
Methodology & Workflow
HealthRegen is designed as a calibrated preventive pipeline: it learns from large clinical datasets, detects early statistical signals of predisposition, requests missing data when needed, and improves through validated outcomes.
Training on Large-Scale Data
The model was trained on 100,000 anonymized records from the National Institutes of Health (NIH), covering patients aged 21 to 65 and balanced across sex and race.
Pattern Detection
The neural network detects both broad and fine-grained patterns that may signal cancer predisposition, including statistically relevant relationships between related features and temporal trends across datasets.
Patient Input & Risk Computation
The system receives patient records, timing, exposures, lifestyle, and family history, then computes a calibrated probabilistic risk estimate instead of a rigid binary answer.
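As a rough illustration of a probabilistic output versus a binary one, the sketch below maps a few patient features to a risk between 0 and 1 with a logistic function. This is a simplified stand-in, not the neural network described here: the feature names, weights, and intercept are all hypothetical and chosen only to make the example run.

```python
import math
from typing import Mapping

# Hypothetical weights and intercept, for illustration only; not fitted values.
WEIGHTS = {
    "age_decades": 0.30,
    "smoker": 0.90,
    "family_history": 0.70,
    "high_exposure": 0.50,
}
INTERCEPT = -4.0  # hypothetical baseline log-odds


def risk_estimate(features: Mapping[str, float]) -> float:
    """Return a probability in (0, 1) from a logistic model over the features."""
    log_odds = INTERCEPT + sum(
        weight * features.get(name, 0.0) for name, weight in WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


low = risk_estimate({"age_decades": 3.0})
high = risk_estimate({"age_decades": 6.0, "smoker": 1.0, "family_history": 1.0})
print(f"lower-risk profile: {low:.3f}, higher-risk profile: {high:.3f}")
```

The output is a graded probability, so two patients who would both fall on the same side of a yes/no cutoff can still be told apart.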
Missing Data Recovery
If the profile is incomplete, the platform requests additional fields and redirects the process toward the next relevant medical checks instead of forcing an unreliable assessment.
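A minimal sketch of that completeness check might look like the following. The list of required fields and the returned messages are assumptions made for the example; the platform's actual schema is not given here.

```python
from typing import Any, List, Mapping

# Hypothetical required fields, loosely following the inputs named in the text.
REQUIRED_FIELDS = ("age", "sex", "exposures", "lifestyle", "family_history")


def missing_fields(profile: Mapping[str, Any]) -> List[str]:
    """List required fields that are absent or empty in the profile."""
    return [
        name
        for name in REQUIRED_FIELDS
        if profile.get(name) in (None, "", [], {})
    ]


def next_step(profile: Mapping[str, Any]) -> str:
    """Ask for more data when the profile is incomplete; otherwise proceed."""
    gaps = missing_fields(profile)
    if gaps:
        return "request: " + ", ".join(gaps)
    return "compute risk"


print(next_step({"age": 52, "sex": "F"}))
```

Returning the list of gaps, instead of a default risk value, keeps an incomplete profile from producing a number that looks more reliable than it is.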
Validation & Model Update
When real outcomes become available, the system learns from confirmed results, analyzes bias and label noise, and updates the model to improve calibration and stability.
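One way to check calibration once outcomes are confirmed is to compare predicted risks with what actually happened. The sketch below uses two standard measures, the Brier score and the gap between mean predicted risk and the observed event rate. Whether HealthRegen uses these particular measures is an assumption, and the sample values are invented for the example.

```python
from typing import Sequence


def brier_score(predicted: Sequence[float], observed: Sequence[int]) -> float:
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)


def calibration_gap(predicted: Sequence[float], observed: Sequence[int]) -> float:
    """Mean predicted risk minus observed event rate; 0 means no overall bias."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(predicted) / len(predicted) - sum(observed) / len(observed)


# Invented example: four predictions and their confirmed outcomes.
risks = [0.1, 0.2, 0.8, 0.9]
outcomes = [0, 0, 1, 1]
print(f"Brier score: {brier_score(risks, outcomes):.3f}")
print(f"Calibration gap: {calibration_gap(risks, outcomes):+.3f}")
```

A lower Brier score indicates predictions closer to the outcomes, and a positive gap means the model overestimates risk on average.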
Observed Results
Patients in the Test Cohort: 50,000
Accuracy in Predicting Cancer Predisposition: 67.94%
Potential Improvement from Raw Digitized Records: at least 7%
On a 50,000-patient test cohort, the model reached 67.94% accuracy in predicting cancer predisposition. The estimated ±13.02% margin of error reflects clinical bias, historical misdiagnoses, label noise, and environmental variation. Integrating raw digitized medical records is expected to reduce this error and to improve calibration and stability by at least 7%.
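To make the reported figures concrete, the short calculation below derives the implied accuracy range and the approximate number of correctly classified patients. It assumes the ±13.02% margin is expressed in percentage points and applies symmetrically around the reported accuracy; the text does not state this explicitly.

```python
COHORT_SIZE = 50_000   # test cohort size reported in the text
ACCURACY_PCT = 67.94   # reported accuracy, in percent
MARGIN_PCT = 13.02     # reported margin, assumed to be percentage points

lower = ACCURACY_PCT - MARGIN_PCT
upper = ACCURACY_PCT + MARGIN_PCT
correct = round(COHORT_SIZE * ACCURACY_PCT / 100)

print(f"Implied accuracy range: {lower:.2f}% to {upper:.2f}%")
print(f"Correct predictions in the cohort: about {correct:,}")
```

Under that assumption, the range runs from 54.92% to 80.96%, and 67.94% of 50,000 corresponds to about 33,970 patients.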
Clinical Integration
The platform is designed to support medical decision-making, not replace it. In practice, it can:
- Request missing fields in a patient profile.
- Flag possible predispositions that justify immediate screening.
- Guide clinicians toward the next relevant medical checks.
- Provide feedback on modifiable risk factors such as stress, lifestyle, and environmental exposure.
Limitations & Ethics
Prediction is not diagnosis. Output quality depends directly on the quality, completeness, and representativeness of the input data.
Models must be continuously monitored for demographic bias, label noise, dataset imbalance, and environmental mismatch between training populations and real-world patients.
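A simple form of demographic monitoring is to compute accuracy separately for each group and watch the spread between the best and worst group. The sketch below assumes each record carries a group label, a predicted label, and a confirmed label; the group names and sample rows are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_group(rows: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Compute accuracy per group from (group, predicted, actual) rows."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for group, predicted, actual in rows:
        totals[group] += 1
        hits[group] += int(predicted == actual)
    return {group: hits[group] / totals[group] for group in totals}


def accuracy_spread(per_group: Dict[str, float]) -> float:
    """Difference between the best and worst group accuracy."""
    return max(per_group.values()) - min(per_group.values())


# Invented rows: (group, predicted label, confirmed label).
sample = [("A", 1, 1), ("A", 0, 1), ("B", 1, 1), ("B", 0, 0)]
per_group = accuracy_by_group(sample)
print(per_group, accuracy_spread(per_group))
```

A growing spread between groups is one signal that the training population and the real-world patients no longer match.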
Model Performance Over Time
Evolution of HealthRegen model accuracy over time, with upper and lower uncertainty bounds.
Programs Through Which the Project Was Developed
July 1 – August 9, 2024
June 22 – July 4, 2025
ING Hubs Invention Fair (Sep 24, 2025)
ReCoNnect Consortium Innovation Prize (Sep 25, 2025)
$100,000/year Scholarship
Conclusion
Calibrated cancer-risk prediction based on routine medical data is a safer and more actionable approach than irreversible genetic intervention for diffuse, context-dependent risk scenarios.