Accession Systems

Published: Jun 2026

  • ID: DAS-003
  • Type: Foundations
  • Audience: Omics Data Scientists, Bioinformaticians, and Research Teams
  • Theme: Understanding How Public Data Are Organized

Public repositories contain enormous amounts of biological data, but locating the correct files often requires navigating multiple layers of identifiers and accession systems.

A single study may contain hundreds or thousands of samples, multiple experiments, and numerous sequencing runs. Understanding how these entities relate to one another is essential for efficient data acquisition and reproducible dataset assembly.

This chapter introduces the accession systems commonly encountered in public omics repositories and explains how they connect different components of a study.

Why Accession Systems Matter

Accession identifiers provide a structured way to organize, track, retrieve, and reference biological data.

They help researchers answer questions such as:

  • Which project generated these data?
  • Which sample produced these sequences?
  • Which experiment was performed?
  • Which files should be downloaded?

Without accession systems, large-scale public repositories would be difficult to navigate and maintain.

The Study Hierarchy

A common hierarchy used by major sequence repositories is:

flowchart TD
    A[BioProject] --> B[BioSample]
    B --> C[Experiment]
    C --> D[Run]

Each level describes a different aspect of the data acquisition process.
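One way to picture this hierarchy is as nested records, where each level holds a list of the level below it. The following is a minimal sketch, not a repository API: the class names, the `iter_runs` helper, and the accessions are illustrative placeholders in the formats used in this chapter.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Run:
    accession: str  # e.g. "SRR12345678"


@dataclass
class Experiment:
    accession: str  # e.g. "SRX1234567"
    runs: List[Run] = field(default_factory=list)


@dataclass
class BioSample:
    accession: str  # e.g. "SAMN12345678"
    experiments: List[Experiment] = field(default_factory=list)


@dataclass
class BioProject:
    accession: str  # e.g. "PRJNA123456"
    samples: List[BioSample] = field(default_factory=list)

    def iter_runs(self) -> Iterator[str]:
        """Walk project -> sample -> experiment -> run, yielding run accessions."""
        for sample in self.samples:
            for experiment in sample.experiments:
                for run in experiment.runs:
                    yield run.accession


# Placeholder accessions for illustration only, not real records.
project = BioProject(
    "PRJNA123456",
    samples=[
        BioSample(
            "SAMN12345678",
            experiments=[
                Experiment(
                    "SRX1234567",
                    runs=[Run("SRR12345678"), Run("SRR12345679")],
                )
            ],
        )
    ],
)

print(list(project.iter_runs()))
```

Walking the structure from the top down mirrors how a researcher moves from a project to the runs that will eventually be downloaded.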

Common Accession Prefixes

| Level      | Description           | Example Prefix |
|------------|-----------------------|----------------|
| BioProject | Research project      | PRJNA          |
| BioSample  | Biological specimen   | SAMN           |
| Experiment | Sequencing experiment | SRX            |
| Run        | Sequencing output     | SRR            |
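Because each level has a recognizable prefix, an accession string can be sorted into its level with a simple pattern match. The sketch below covers only the four prefixes listed in the table above and assumes an accession is a prefix followed by digits; `classify_accession` is a hypothetical helper name, not part of any repository toolkit.

```python
import re
from typing import Optional

# Prefixes taken from the table above; other repositories use other prefixes.
LEVEL_BY_PREFIX = {
    "PRJNA": "BioProject",
    "SAMN": "BioSample",
    "SRX": "Experiment",
    "SRR": "Run",
}

# Assumption: a known prefix followed by one or more digits, nothing else.
_ACCESSION_RE = re.compile(r"(PRJNA|SAMN|SRX|SRR)\d+")


def classify_accession(accession: str) -> Optional[str]:
    """Return the hierarchy level for an accession, or None if unrecognized."""
    match = _ACCESSION_RE.fullmatch(accession.strip().upper())
    return LEVEL_BY_PREFIX[match.group(1)] if match else None


for acc in ["PRJNA123456", "SAMN12345678", "SRX1234567", "SRR12345678"]:
    print(acc, "->", classify_accession(acc))
```

Returning `None` for anything unrecognized, rather than guessing, keeps unexpected identifiers visible when a list of accessions is being checked.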

BioProject

A BioProject represents the overall research initiative.

Examples:

  • Healthy gut microbiome study
  • RNA-Seq cancer study
  • Agricultural genomics project

Example accession format (common prefix PRJNA):

PRJNA123456

BioSample

A BioSample describes an individual biological specimen.

Examples:

  • Stool sample
  • Blood sample
  • Tissue biopsy

BioSample records often contain valuable metadata such as age, sex, disease status, body site, and geographic origin.

Example accession format (common prefix SAMN):

SAMN12345678

Experiment

An Experiment describes how a sample was processed and sequenced.

Examples include:

  • 16S rRNA sequencing
  • Shotgun metagenomics
  • RNA-Seq
  • Single-cell sequencing

Example accession format (common prefix SRX):

SRX1234567

Run

A Run represents the actual sequencing output generated from an experiment.

Runs typically correspond to downloadable sequence files.

Example accession format (common prefix SRR):

SRR12345678

Following the Breadcrumbs

Researchers often begin with a BioProject and progressively navigate toward downloadable sequencing files.

flowchart LR
    PRJNA[BioProject] --> SAMN[BioSamples]
    SAMN --> SRX[Experiments]
    SRX --> SRR[Runs]
    SRR --> FASTQ[FASTQ Files]

This progression is common across many public repositories.
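As one possible illustration of this progression, the ENA Portal API offers a `filereport` endpoint that lists the runs under a project together with their experiment, sample, and FASTQ locations. The sketch below only builds the request URL and parses a tab-separated response, so it runs without network access. The helper names and the sample rows are hypothetical, and the field list is an assumption about which columns are useful here.

```python
import csv
import io
from typing import Dict, List
from urllib.parse import urlencode

ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"
# Assumed selection of columns for the read_run report.
FIELDS = ["run_accession", "experiment_accession", "sample_accession", "fastq_ftp"]


def build_filereport_url(project_accession: str) -> str:
    """Build a filereport URL listing the runs under a project accession."""
    query = urlencode({
        "accession": project_accession,
        "result": "read_run",
        "fields": ",".join(FIELDS),
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_filereport(tsv_text: str) -> List[Dict[str, str]]:
    """Turn a tab-separated report into one dictionary per run."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [dict(row) for row in reader]


# Made-up rows in the expected shape; the accessions and paths are placeholders.
SAMPLE_TSV = (
    "run_accession\texperiment_accession\tsample_accession\tfastq_ftp\n"
    "SRR12345678\tSRX1234567\tSAMN12345678\tftp.example.org/SRR12345678.fastq.gz\n"
    "SRR12345679\tSRX1234567\tSAMN12345678\tftp.example.org/SRR12345679.fastq.gz\n"
)

print(build_filereport_url("PRJNA123456"))
for row in parse_filereport(SAMPLE_TSV):
    print(row["sample_accession"], row["experiment_accession"], row["run_accession"])
```

In practice the URL would be fetched (for example with `urllib.request.urlopen`) and the response text passed to `parse_filereport`. NCBI's own tools are an alternative route to the same run-level information.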

Repository Variations

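Although accession systems differ slightly between repositories, the underlying concepts are similar. For example, where NCBI uses SRX and SRR for experiments and runs, ENA uses ERX and ERR, and DDBJ uses DRX and DRR.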

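| Repository | Study      | Sample    | Data                |
|------------|------------|-----------|---------------------|
| NCBI       | BioProject | BioSample | SRA                 |
| GEO        | GSE        | GSM       | Supplementary Files |
| ENA        | Project    | Sample    | Runs                |
| DDBJ       | BioProject | BioSample | DRA                 |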
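The accession itself often hints at which repository issued it. For experiment and run accessions, the first letter conventionally indicates the archive (S for NCBI, E for ENA, D for DDBJ), while GEO series and samples start with GSE and GSM. The following sketch applies that convention; `source_archive` is a hypothetical helper, and it deliberately handles only these few patterns.

```python
import re
from typing import Optional

ARCHIVE_BY_FIRST_LETTER = {"S": "NCBI SRA", "E": "ENA", "D": "DDBJ DRA"}

# Experiment (xRX) and run (xRR) accessions: archive letter, type, digits.
_SRA_FAMILY_RE = re.compile(r"([SED])R[RX]\d+")
# GEO series (GSE) and sample (GSM) accessions.
_GEO_RE = re.compile(r"GS[EM]\d+")


def source_archive(accession: str) -> Optional[str]:
    """Guess the issuing repository from the accession, or None if unknown."""
    acc = accession.strip().upper()
    if _GEO_RE.fullmatch(acc):
        return "GEO"
    match = _SRA_FAMILY_RE.fullmatch(acc)
    if match:
        return ARCHIVE_BY_FIRST_LETTER[match.group(1)]
    return None


for acc in ["SRR12345678", "ERR123456", "DRX000001", "GSM1234567"]:
    print(acc, "->", source_archive(acc))
```

The prefix identifies where a record was submitted, not the only place it can be retrieved, so this is a labeling aid rather than a download rule.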

Example Workflow

Suppose we identify a microbiome study relevant to our project.

flowchart TD
    A[BioProject] --> B[BioSample Metadata]
    B --> C[Experiment]
    C --> D[Run Accessions]
    D --> E[Data Download]

Starting from the study's BioProject, we review the BioSample metadata, follow each sample to its experiments, collect the corresponding run accessions, and then download the data.

AlphaBiomics Example

A healthy reference microbiome workflow may proceed as follows:

Healthy Reference Objective
        ↓
Identify BioProject
        ↓
Retrieve BioSamples
        ↓
Review Metadata
        ↓
Select Eligible Samples
        ↓
Collect Run Accessions
        ↓
Download Data

This process illustrates why accession systems are central to reproducible data acquisition.
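The "Select Eligible Samples" and "Collect Run Accessions" steps can be sketched as a filter over a flat run table. Everything in the example is hypothetical: the records, the field names (`disease_status` and `body_site`, echoing the BioSample metadata mentioned earlier), and the eligibility rule are stand-ins for whatever a real study defines.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Record = Dict[str, str]

# Hypothetical toy records, one per run, with sample-level metadata attached.
RUN_TABLE: List[Record] = [
    {"run": "SRR0000001", "sample": "SAMN0000001",
     "disease_status": "healthy", "body_site": "stool"},
    {"run": "SRR0000002", "sample": "SAMN0000001",
     "disease_status": "healthy", "body_site": "stool"},
    {"run": "SRR0000003", "sample": "SAMN0000002",
     "disease_status": "IBD", "body_site": "stool"},
    {"run": "SRR0000004", "sample": "SAMN0000003",
     "disease_status": "healthy", "body_site": "skin"},
]


def is_healthy_stool(record: Record) -> bool:
    """Example eligibility rule for a healthy gut reference set."""
    return record["disease_status"] == "healthy" and record["body_site"] == "stool"


def collect_runs(records: List[Record],
                 eligible: Callable[[Record], bool]) -> Dict[str, List[str]]:
    """Group run accessions by sample, keeping only eligible records."""
    runs_by_sample: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        if eligible(record):
            runs_by_sample[record["sample"]].append(record["run"])
    return dict(runs_by_sample)


selected = collect_runs(RUN_TABLE, is_healthy_stool)
print(selected)
```

Keeping the eligibility rule as a separate function means the selection criteria are written down explicitly, which makes it easier to rerun the same selection later.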

Looking Ahead

Accession systems tell us where data reside, but metadata tell us whether those data are suitable for our objective.

In the next chapter, we explore metadata acquisition and how metadata drive study selection, sample filtering, and reference dataset construction.