CSB – The Human Phenotype Project: Building the Infrastructure for Preventative Medicine

2026.02.27

The Human Phenotype Project: Building the Infrastructure for Preventative Medicine

By Rotem Gura Sadovsky and Eran Segal

This month, scientists, technologists, and investors from around the world gathered in Abu Dhabi for the HPP Global conference, organized by Nature Conferences. The group discussed a transformation: the future of medicine will be predictive, not reactive. At the center of that conversation was the Human Phenotype Project (HPP), a long-term project that could fundamentally reshape how we manage our health.

HPP and the World’s Deepest Multi-Omic Datasets

The HPP is, at its core, a data initiative. However, unlike most human cohorts, it makes a deliberate trade-off. Instead of collecting minimal data from millions of people, it collects extraordinarily deep data from 100,000 individuals and follows them for 25 years.

Most well-known cohorts, such as the UK Biobank, focus on scale. They gather a few data types such as clinical measurements and genomic data from very large populations. HPP takes a different approach. It emphasizes depth over breadth, collecting dozens of data modalities at synchronized time points.

Every two years, participants undergo snapshot measurements that include genomics, proteomics, metabolomics, microbiome analysis, imaging, and a wide range of physiological assessments — from gait analysis to hand grip strength. Between these visits, continuous measurement tools capture life as it is actually lived: sleep tracking, continuous glucose monitoring, and detailed diet logs collected over a period of two weeks.

This deep, multimodal and longitudinal structure is what makes HPP unique.

Importantly, HPP is not a disease cohort. It enrolls healthy individuals aged 40 to 70 and follows them prospectively. The goal is to predict disease before it emerges. The central question is: can we detect the early biological signals that precede clinical symptoms, sometimes by years?

HPP began in Israel under the leadership of Professor Eran Segal at the Weizmann Institute and is operated by Pheno.AI, a group of data and AI experts. It has since expanded to Japan, will soon expand to the UAE, and has inspired other organizations to start similar cohorts, such as the University of Copenhagen’s DELPHI cohort.

At the Intersection of Emerging Technologies

HPP Global made clear that HPP does not exist in isolation. It sits at the intersection of several advancing domains.

Several speakers described the development of AI systems capable of integrating multimodal biomedical data. Foundation models are emerging. Some learn the patterns of individual data modalities, and others unify many modalities, including electronic health records, imaging, molecular measurements, and lifestyle data.

One example presented was Gluformer, a generative model developed by Eran Segal’s lab and Pheno.AI. Trained on continuous glucose monitoring data, it learns the temporal dynamics of blood glucose. By modeling patterns such as meal responses and circadian rhythms, Gluformer can simulate future glucose trajectories. Such simulations can help predict how individuals may respond to different GLP-1 agonists, drugs that have grown in popularity in recent years but that don’t work in all patients.

While Gluformer is trained primarily on a single data modality, Healthformer, created by the same groups, aims to learn a unified representation of human health by integrating molecular, clinical, and wearable data. It predicts the next clinical event on a patient’s journey, much as a large language model predicts the next word in a sentence. The promise of such a model is the prediction of disease course and the detection of presymptomatic signals that may indicate upcoming disease.

Shifting from preventative health to drug prescription, Marinka Zitnik from Harvard University presented TxAgent, a system that integrates hundreds of computational tools alongside FDA drug information to reason about drug interactions and contraindications and to optimize prescriptions and medication strategies.

Meanwhile, at the level of scientific operation and discovery itself, Hiroaki Kitano from Japan’s Okinawa Institute of Science and Technology (OIST) described a vision of an AI scientist capable of autonomously designing and executing high-throughput experiments.

Yet these modern AI systems, like traditional computational tools, depend fundamentally on high-quality data and on integration with existing systems.

From Models to Medicine

Carlos Bustamante from Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) emphasized the importance of representative genomic data to ensure models generalize beyond European ancestry populations. Anna Goldenberg from the University of Toronto highlighted the operational difficulty of integrating AI tools into clinical workflows. Jenna Wiens from the University of Michigan discussed the promise and pitfalls of AI-driven clinical decision support: superior pattern recognition must be balanced with human judgment, and interface design can determine whether AI improves or degrades outcomes.

Wearables were another major theme. Laurent Servais from the University of Oxford described how leg motion tracking in children enabled the creation of a new FDA-approved clinical endpoint in muscular disease. Ruth Loos from the University of Copenhagen presented a rich data collection protocol that includes multiple wearables for sweat-based biomarker sensing, blood pressure, and glucose levels, along with several accelerometers that capture different types of physical activity, including cycling, which most smartwatches do not capture.

Presentations from Google Research, Google DeepMind, and Microsoft Research underscored the growing interest of major tech companies in the future of health and biomedicine. Each showcased ambitious efforts to apply large-scale AI systems to medical data, from diagnostic imaging and clinical decision support to multimodal foundation models trained on diverse health signals.

Stay tuned for more insights from the HPP Global Conference in the coming weeks.

About the Authors

Rotem Gura Sadovsky is a data science leader and the head of data strategy at Corundum Systems Biology. Previously, Rotem worked in computational biology and product management roles at early-stage biotech startups. As one of the first data scientists at Finch Therapeutics, he played a leading role in developing live bacterial products for clinical use. Rotem’s domain expertise spans biomolecular omics technologies, clinical data from human cohorts, and biomarker discovery. He holds a PhD in computational and systems biology from MIT.

Eran Segal is a professor at the Weizmann Institute of Science, leading a laboratory recognized for its work in machine learning, computational biology, and the analysis of heterogeneous high-throughput genomic data. His research focuses on the microbiome, nutrition, genetics, and their impact on health and disease. Leveraging large-scale cohort data, his work aims to enable personalized medicine. He has authored over 150 publications and received multiple awards, including the Overton Prize. He previously conducted research at The Rockefeller University. He holds a B.Sc. from Tel Aviv University and a Ph.D. from Stanford University.