Projects are a basic unit of data organization in the HCA Data Coordination Platform (HCA DCP). Project contributors submit raw sequencing data and associated files along with rich metadata.
The HCA Data Explorer lists all projects on its home page along with key project metadata. The project list is filterable by the metadata values.
Figure 1: The Data Explorer's projects tab lists the projects making up the DCP.
Selecting a project title on the project list takes you to the project's detail page.
Figure 2: A project detail page showing the various information and downloads available for the project.
The project detail page contains:
the project title and description
contributor information, collaborating organizations, and project contacts
any publications or accessions associated with the project
project details such as species, organ and library construction method
counts of input, analysis and matrix files
a project metadata download
a project expression matrix download (if available)
For each project, the HCA DCP maintains a project-specific tsv file containing the full project metadata. The tsv contains one row for each file in the project and one column for each metadata property. The meaning of each metadata property is given in the HCA Metadata Dictionary.
The metadata tsv file gives a flattened representation of the project's metadata graph that can be sorted and filtered using standard spreadsheet or data manipulation tools.
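As a rough illustration of the kind of filtering and sorting described above, the sketch below reads a small tab-separated table with Python's standard csv module. The column names (file_name, file_format, organ, file_size) and the sample rows are hypothetical placeholders, not the actual HCA metadata properties; consult the HCA Metadata Dictionary for the real column names.

```python
import csv
import io

# Hypothetical stand-in for a downloaded project metadata tsv: one row per
# file, one column per metadata property. Column names are illustrative only.
SAMPLE_TSV = (
    "file_name\tfile_format\torgan\tfile_size\n"
    "cell_1_R1.fastq.gz\tfastq.gz\tblood\t2048\n"
    "cell_1.bam\tbam\tblood\t4096\n"
    "cell_2_R1.fastq.gz\tfastq.gz\tblood\t1024\n"
)


def load_rows(handle):
    """Read a tab-separated file into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def filter_and_sort(rows, column, value, sort_key):
    """Keep rows where `column` equals `value`, ordered by `sort_key` as an integer."""
    matching = [row for row in rows if row[column] == value]
    return sorted(matching, key=lambda row: int(row[sort_key]))


rows = load_rows(io.StringIO(SAMPLE_TSV))
fastqs = filter_and_sort(rows, "file_format", "fastq.gz", "file_size")
for row in fastqs:
    print(row["file_name"], row["file_size"])
```

To try this on a real download, the in-memory sample could be replaced with open(path, newline="", encoding="utf-8"), where path is wherever the tsv was saved.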
The "Project Downloads" section of the project detail page contains a link to download the project metadata file.
Figure 3: The "Project Downloads" section of the project detail page.
Metadata file sizes vary across projects but will generally be between 1 and 100 megabytes.
The tsv file is named after the project and includes the date and time the file was created. For example:
CD4+ cytotoxic T lymphocytes 2019-07-19 19.09.tsv
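If the project name and creation time need to be recovered from a downloaded file name, something like the following sketch could work. It assumes the pattern "<project name> <YYYY-MM-DD> <HH.MM>.tsv", which is inferred only from the single example above and may not hold for every file; the function name parse_metadata_filename is a hypothetical helper.

```python
import re
from datetime import datetime

# Assumed pattern, inferred only from the one example file name:
# "<project name> <YYYY-MM-DD> <HH.MM>.tsv"
FILENAME_PATTERN = re.compile(
    r"^(?P<project>.+) (?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}\.\d{2})\.tsv$"
)


def parse_metadata_filename(filename):
    """Split a metadata file name into (project name, creation datetime)."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unrecognized metadata file name: {filename!r}")
    # The time part is assumed to be hours and minutes separated by a dot.
    created = datetime.strptime(
        f"{match['date']} {match['time']}", "%Y-%m-%d %H.%M"
    )
    return match["project"], created


project, created = parse_metadata_filename(
    "CD4+ cytotoxic T lymphocytes 2019-07-19 19.09.tsv"
)
print(project)
print(created.isoformat())
```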
A partial example of a tsv file is listed below:
Figure 4: A partial view of a project's metadata tsv file.
For projects with supported library construction approaches, the project detail page will also contain a link to download expression matrices pre-generated for the project by the HCA Matrix Service.
Each row in the expression matrix represents a cell, and each column gives the expression values for one gene.
Figure 5: A partial view of a project's expression matrix in csv format.
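The cells-by-genes layout described above can be read with Python's standard csv module, as in the sketch below. The layout details are assumptions: a header row of gene names, and a first column holding a cell identifier. The gene names, cell identifiers, and values are made up for illustration, and a real matrix may be large enough to call for a dedicated data analysis library instead.

```python
import csv
import io

# Hypothetical miniature matrix: the first column is a cell identifier and
# the remaining columns are genes. Identifiers and values are made up.
SAMPLE_CSV = (
    "cell_id,GENE_A,GENE_B,GENE_C\n"
    "cell_1,0,3,1\n"
    "cell_2,2,0,1\n"
    "cell_3,0,5,0\n"
)


def read_matrix(handle):
    """Return (gene names, {cell id: list of expression values})."""
    reader = csv.reader(handle)
    header = next(reader)
    genes = header[1:]  # skip the assumed cell identifier column
    matrix = {row[0]: [float(value) for value in row[1:]] for row in reader}
    return genes, matrix


def total_per_gene(genes, matrix):
    """Sum each gene's expression values across all cells."""
    totals = dict.fromkeys(genes, 0.0)
    for values in matrix.values():
        for gene, value in zip(genes, values):
            totals[gene] += value
    return totals


genes, matrix = read_matrix(io.StringIO(SAMPLE_CSV))
totals = total_per_gene(genes, matrix)
print(totals)
```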