flowchart LR
A[Research Objective] --> B[Search Strategy]
B --> C[Study Discovery]
C --> D[Study Evaluation]
Public repositories contain millions of samples and thousands of studies spanning a wide range of biological questions, populations, technologies, and experimental designs. The challenge is rarely a lack of data; it is identifying studies that are relevant, trustworthy, and suitable for a specific objective.
Study discovery is the process of transforming a research objective into a set of candidate studies that can support downstream analysis. Before any data are downloaded, researchers must determine which studies are most appropriate for their intended purpose.
Effective data acquisition begins with selecting the right studies.
For example, a project focused on constructing a healthy reference microbiome cannot rely on a simple keyword search alone. Researchers must determine:
The quality of study discovery directly influences the quality of the resulting dataset.
Every study discovery process begins with a clearly defined objective.
flowchart LR
A[Research Objective] --> B[Search Strategy]
B --> C[Study Discovery]
C --> D[Study Evaluation]
Include:
Examples of exclusion criteria include:
| Repository | Typical Use |
|---|---|
| NCBI | Broad study discovery |
| GEO | Functional genomics |
| ENA | Sequence archives |
| MGnify | Microbiome studies |
| GWAS Catalog | Variant studies |
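As an illustration of turning an objective into a repository query, the sketch below builds a search URL for the NCBI E-utilities ESearch endpoint against the `bioproject` database. The helper name `build_esearch_url` and the search terms are assumptions chosen for this example, and no request is sent; in practice the terms would be refined iteratively against what the repository actually returns.

```python
from urllib.parse import urlencode

# Base endpoint of the NCBI E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(terms, db="bioproject", retmax=50):
    """Join search terms with AND and return an ESearch URL.

    Hypothetical helper for illustration: it only constructs the URL
    and does not contact NCBI.
    """
    query = " AND ".join(f"({term})" for term in terms)
    params = {"db": db, "term": query, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Illustrative terms only; a real search strategy needs tuning.
url = build_esearch_url(["gut microbiome", "healthy", "human"])
print(url)
```

Keeping URL construction separate from the network call makes the search strategy easy to record and review alongside the study shortlist.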
Relevant metadata may include:
Build a healthy reference gut microbiome.
flowchart TD
A[Healthy Reference Goal] --> B[Search Public Repositories]
B --> C[Identify Candidate Studies]
C --> D[Review Metadata]
D --> E[Apply Inclusion and Exclusion Criteria]
E --> F[Shortlist Studies]
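The review and filtering steps in this workflow can be sketched as a small screening function. Everything in the example is an assumption made for illustration: the `Study` fields, the placeholder accessions, and the threshold of 20 samples are not taken from any repository schema or from this chapter's criteria.

```python
from dataclasses import dataclass


@dataclass
class Study:
    accession: str             # placeholder identifier, not a real accession
    body_site: str
    health_status: str
    n_samples: int
    has_sample_metadata: bool


def is_eligible(study, min_samples=20):
    """Return True if a study passes inclusion and avoids exclusion criteria."""
    # Inclusion: gut samples from healthy participants (assumed criteria).
    included = study.body_site == "gut" and study.health_status == "healthy"
    # Exclusion: too few samples or no per-sample metadata (assumed criteria).
    excluded = study.n_samples < min_samples or not study.has_sample_metadata
    return included and not excluded


# Hypothetical candidate studies gathered during discovery.
candidates = [
    Study("STUDY_A", "gut", "healthy", 120, True),
    Study("STUDY_B", "gut", "disease", 300, True),
    Study("STUDY_C", "gut", "healthy", 8, True),
    Study("STUDY_D", "oral", "healthy", 90, True),
    Study("STUDY_E", "gut", "healthy", 45, False),
]

shortlist = [s.accession for s in candidates if is_eligible(s)]
print(shortlist)
```

Writing the criteria as explicit code keeps the screening decision reproducible: the same rules can be re-applied when new candidate studies appear.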
In the next chapter, we examine accession systems and the relationships between BioProjects, BioSamples, Experiments, and Runs.