Insight Hub
2026.03.13
Mapping Populations vs. Decoding Individuals: Cohort Design for Preventative Medicine

By Rieka Chijiiwa, PhD

The inaugural HPP Global 2026 conference in Abu Dhabi signaled a paradigm shift in medical research: “scale” is no longer defined simply by the number of participants, but by the intersection of population breadth and deep biological resolution. For years, the dominant methodology was simply to maximize participant numbers, achieving the statistical power necessary for generalizable epidemiological insights. The discussions at the conference, however, revealed that scale is now multidimensional.

This evolution is driven by the critical realization that “gold standard” clinical tests often fail to predict individual health outcomes years into the future. While traditional snapshots often miss the early signs of transition from health to disease, new AI models trained on deep biological data are beginning to bridge this gap by establishing personal baselines.

Evaluating health trajectories against an individual’s own biological norm, rather than a statistical average, is now essential for moving from reactive treatment to proactive prevention. We are seeing a divergence into two distinct but complementary cohort designs, each optimized for different types of scientific discovery.
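The difference between the two reference frames can be made concrete with a toy calculation: the same measurement can look unremarkable against a population distribution while being a large deviation from that individual’s own baseline. A minimal Python sketch (all numbers hypothetical, for illustration only):

```python
import statistics

# Hypothetical fasting-glucose readings (mg/dL).
population = [78, 82, 85, 88, 90, 92, 95, 99, 104, 110]  # cross-sectional sample
personal_history = [80, 81, 79, 82, 80, 81]              # one individual's past readings

new_reading = 92.0

def z_score(value, reference):
    """Standard score of `value` against a reference distribution."""
    mean = statistics.mean(reference)
    sd = statistics.stdev(reference)
    return (value - mean) / sd

pop_z = z_score(new_reading, population)           # vs. the statistical average
personal_z = z_score(new_reading, personal_history)  # vs. the personal baseline

# 92 mg/dL sits near the population mean, yet is many standard
# deviations above this individual's own historical baseline.
print(f"population z-score: {pop_z:.2f}")
print(f"personal z-score:   {personal_z:.2f}")
```

The same reading that a population-referenced cutoff would wave through is flagged immediately once it is judged against the person’s own biological norm, which is the core argument for longitudinal, individual-referenced monitoring.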

Population-Scale Cohorts: Identifying Generalizable Risk

The most visible projects today are the ultra-large, population-scale cohorts. Initiatives like the UK’s Our Future Health (aiming for 5 million participants) or China’s CHAI (China Healthy Aging Investigation) project [1] (managing data from approximately 9.7 million individuals) act as long-term national health resources integrated with existing healthcare systems.

The primary design principle here is breadth: maximizing participant numbers is essential for achieving the statistical power needed to uncover rare genetic variants and subtle lifestyle effects across diverse groups. To minimize costs, these studies often rely on “light phenotyping,” drawing primarily on DNA samples, questionnaires, and direct linkage to Electronic Health Records (EHRs) generated through participants’ routine contact with the local healthcare system. In the UK, this allows for a retrospective evaluation of potentially decades of medical history through the National Health Service (NHS).

Predictive Value

These cohorts are built for statistical power. They excel at identifying rare genetic variants or subtle lifestyle factors that contribute to common diseases. Their scale allows researchers to investigate subgroups that smaller studies miss, such as specific ethnic minorities or rare disease populations.

The primary output here is general risk, and that generalizability has limits. Illustrating the challenges involved, conference speaker Prof. Carlos Bustamante cautioned against overstating it. He noted that because historical genomic data has been extraordinarily lopsided toward European ancestries, existing models often perform well for those populations but can lose significant accuracy when applied to individuals of East Asian or African descent [2].

Deep Phenotyping Cohorts: Tracking the Narrative of Biology

While population cohorts look at the “who,” deep phenotyping cohorts focus on the “how” and “when.” These studies, including the Human Phenotype Project (HPP) and Denmark’s DELPHI, prioritize biological resolution over sheer participant numbers. Instead of millions, they follow smaller groups, typically 10,000 to 100,000 people, but collect an extraordinary amount of data per person.

Participants are profiled across dozens of data modalities, including proteomics and microbiome analysis, to move beyond the limitations of traditional “snapshot” medicine. In conventional studies, a single test in a clinic can be easily skewed by temporary factors, such as a patient performing poorly simply because they are exhausted from travel or a busy day of activity. Deep phenotyping avoids this by using continuous data to capture “life as it is actually lived,” ensuring that we see a person’s true baseline rather than a momentary fluctuation.

Predictive Value

The strength of this model is personalized trajectories. Because the data is longitudinal and granular, it can detect pre-symptomatic shifts. For example, GluFormer identifies pre-diabetic individuals up to 12 years in advance by detecting baseline shifts that the traditional HbA1c test fails to capture [3]. While population cohorts tell us who might get sick, deep phenotyping aims to show us when and why an individual is beginning to transition from health to disease.
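The underlying idea of flagging a pre-symptomatic baseline shift can be illustrated, in a drastically simplified form that bears no relation to GluFormer’s actual architecture, by comparing a recent window of measurements against an individual’s established baseline:

```python
from statistics import mean, stdev

def detect_baseline_shift(readings, baseline_n=30, recent_n=7, threshold=2.0):
    """Flag a shift when the recent-window mean deviates from the baseline
    mean by more than `threshold` baseline standard deviations.
    A toy drift detector, not a validated clinical method."""
    if len(readings) < baseline_n + recent_n:
        return False  # not enough history to establish a baseline
    baseline = readings[:baseline_n]
    recent = readings[-recent_n:]
    b_mean, b_sd = mean(baseline), stdev(baseline)
    if b_sd == 0:
        return mean(recent) != b_mean
    return abs(mean(recent) - b_mean) / b_sd > threshold

# Hypothetical daily-average glucose values (mg/dL):
stable = [90 + (i % 3) for i in range(40)]                   # hovers around 91
drifting = stable[:30] + [97 + (i % 3) for i in range(10)]   # recent upward drift

print(detect_baseline_shift(stable))    # no shift detected
print(detect_baseline_shift(drifting))  # shift detected
```

The point of the sketch is the reference frame: the alarm threshold is defined by the individual’s own variability, so a drift that would still fall inside population-level “normal” ranges is surfaced years before a conventional cutoff would trigger.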

The Trade-off: Statistical Power vs. Biological Resolution

The choice between these two designs involves a clear trade-off in resources and objectives.

  1. Cost and Complexity: Population cohorts are expensive due to their sheer size and the logistical challenge of managing millions of people. Deep phenotyping is expensive because of the high cost per participant for complex molecular analysis.
  2. Broad Association vs. Biological Resolution: Large cohorts are ideal for identifying what is associated with a disease across a diverse population. Deep cohorts provide the high-resolution data needed to explore how those diseases develop within an individual, moving from general associations toward simulating biological outcomes.
  3. Periodic Snapshots vs. Continuous Monitoring: While both models are longitudinal, population studies often rely on periodic clinical snapshots anchored in medical history. Deep phenotyping incorporates continuous monitoring through wearables to capture the dynamic variability of life as it is actually lived.

Designing a Complementary Ecosystem

The insights from HPP Global 2026 suggest that we should not choose one model over the other. Instead, the future of health requires both approaches, combining different cohort strategies so that each addresses the limitations of the other.

Population-scale cohorts identify the high-risk groups within a society, while deep phenotyping cohorts allow us to zoom in on the specific biological events that drive those risks for the individual. By integrating these two models, we move beyond simple statistical associations and toward a system of health intelligence that can truly simulate and prevent future illness.

Works Cited

  1. Chen XM, Zhang K, Zhou HY, Gao Y, Liu X, Xu S, Jin S, Sun Z, Yin Y, Zhang J, et al. A full life cycle biological clock based on routine clinical data and its impact in health and diseases. Nature Medicine 31:4225–4235, 2025.
  2. Martin AR, Gignoux CR, Walters RK, Wojcik GL, Neale BM, Gravel S, Daly MJ, Bustamante CD, Kenny EE. Human demographic history impacts genetic risk prediction across diverse populations. American Journal of Human Genetics 100(4):635–649, 2017.
  3. Lutsker G, Sapir G, Shilo S, Merino J, Godneva A, Greenfield JR, Samocha-Bonet D, Dhir R, Gude F, Mannor S, Meirom E, Xing EP, Chechik G, Rossman H, Segal E. A foundation model for continuous glucose monitoring data. Nature 637:347–354, 2026.