Project downloads are now available from the project page. Download the files directly to your local machine via curl or export them to the cloud-based bioinformatics platform Terra by selecting the download next to the new "Terra Workspace" icon.
A new HumanCellAtlas slack channel (#dcp-help) is available to help you troubleshoot Data Portal questions with both the Data Portal team and community and give you the latest updates on Data Portal events and data.
New Raw Data, Metadata, and Contributor-generated Matrices (14)
To enable easier discovery of normal tissues, we've modified the Specimen Disease search facet on the Explore page so that the "Normal" option is now at the top of the drop-down.
Selecting Files by "Content Description"
The File facet has a new Content Description column to enable searching for files by the type of data they contain.
New Managed Access and Seed Network projects
July 23, 2021
The DCP has added its first managed access project.
Contributor-generated cell-by-gene count matrices and associated non-personally identifiable metadata are now available to the whole HCA community from within the DCP.
Additionally, DCP now has the first project data from the HCA Seed Networks, which are further developing the Human Cell Atlas by combining the expertise of experimental scientists, software engineers, and physicians.
Managed access raw data is available from the Synapse database upon successful completion of a data use certificate. To find supplementary links to the managed access datasets in Synapse, navigate to the project detail page in the Data Portal.
Raw sequencing data for 16 new projects are now available in the DCP Data Browser. These projects include single-cell data derived from:
Human and mouse
10x 3’, 10x 5’, Smart-seq2 technologies
Small intestine, aorta, brain, skeletal muscle, blood, pancreas, tonsil, lung, skin, immune system, kidney, and eye
Airway basal stem cells exposed to SARS-CoV-2
Disease states, including Crohn’s Disease, aneurysm, Multiple Sclerosis, HIV, Type 2 Diabetes, and glioblastoma
Processed data now available for 26 HCA 10x datasets
April 02, 2021
The DCP 2.0 Preview now has standardized BAMs and count matrices (Loom file format) available for 26 HCA projects, including 15 new projects. These projects contain both human and mouse single-cell and single-nucleus data generated with 3’ 10x V2 and V3 sequencing technology. This data was processed using the latest version of the Optimus pipeline (see the Optimus Overview).
In addition to individual sample count matrices, each newly processed project also has standardized, project-level DCP-generated matrices that are stratified by organ, species, and library construction method. These matrices are minimally filtered to include only cells with more than 100 UMIs. You can download the project-level DCP-generated Matrices from the Data Browser (see image below) or from the individual Project page (see the Exploring Projects guide)
New raw data and Contributor-Generated Matrices
Raw sequence files and contributor-generated matrices (CGMs) for an additional 11 new projects are available for download from the DCP 2.0 Preview. These projects include human adult, embryonic, and fetal data derived from 10 organs - brain, blood, bone marrow, colon, thymus, pancreas, placenta, mouth, liver, and spleen. Most projects employed 10x sequencing technology, but some additionally used cite-seq and drop-seq.
With the addition of these new data, the DCP now has 55 total projects with over 7 million cells derived from 40 organs and 580 donors. We are grateful for the contributions and continued support of the 179 global labs who made this data available.
DCP 2.0 launches; new projects, contributor generated matrices and DCP 2.0 infrastructure
December 11, 2020
In the spirit of bringing HCA data to the community as quickly as possible, we are releasing the new DCP 2.0 data and infrastructure incrementally.
This initial launch -- the first of several planned to roll out new functionality, pipelines, and data -- includes:
Raw data and standardized metadata for 5.8 million cells
Contributor-generated matrices, embeddings, and annotations for existing and 16 new DCP projects
New DCP 2.0 infrastructure (details below)
What’s included in the new projects?
The new projects include a mix of human and mouse data from a variety of organs including adipose tissue, heart, hindlimb, spleen skin, yolk sac, diaphragm, tongue, trachea, and more. These data encompass:
HCA Matrix Service; DCP 1.0 project matrices remain available on the individual Project Matrices page (see the Exploring Projects guide)
HCA Data Storage Service (DSS) and API; data is now stored in the Terra Data Repo (TDR), an alternative storage and metadata management service that supports managed access
What’s Coming Next?
Between now and the end of February look for:
Standardized analysis results (BAM and index files) made by the latest HCA 10x and SmartSeq-2 pipelines for all projects
DCP-generated count matrices stratified by species, library construction method, and organ for each project
After these standardized analyses are complete, we will begin releasing new projects and analyses on a monthly basis. We are also working towards incorporating controlled-access projects.
Once processing is complete, we will retire DCP 1.0 View.
Coming Soon - DCP 2.0
October 9, 2020
We are excited to announce the launch of the DCP 2.0 this fall. This has been a collaborative effort over the past 6 months to take on board the HCA community feedback and understand the unmet needs in order to align the goals of the DCP with the wider HCA community goals.
We have made significant changes, including GA4GH-compliance, more data—50% more than previously—and, where available, contributor-generated count matrices, embeddings, and cell type annotations.
Changes Coming to Key Components
To accomplish this, we will be making the following changes and migrations:
The current Data Storage Service (DSS) will be retired and we will migrate to the Terra Data Repo, which supports managed access, for storage and metadata management. Both Ingest and the Data Browser will access the Terra Data Repo via self-service APIs, making for minimal disruption to the HCA scientific community.
The matrix service API will be retired. The per-project matrices that are most commonly used will be available in static form directly from the data browser.
As a higher-level replacement for the DSS API, the Data Browser API is being prepared for use by the wider developer community and will be officially documented and supported by the HCA DCP team.
The HCA-CLI will be retired and the Data Browser will provide a bulk download capability via curl (or similar) commands. Because the internal organization of the data store is subject to change with upcoming work on the metadata, direct calls against it are discouraged in favor of the new Data Browser API.
Transition Path to DCP 2.0
More information about the transition to DCP 2.0 will be announced in the coming weeks and we will endeavor to make this transition as seamless as possible to the HCA research and developer communities.
To enable a smooth transition, the DCP 1 data browser, APIs and data will remain available until January 1, 2021.
We will continue to integrate third-party portals and applications into the HCA ecosystem by linking from and integrating directly into the HCA Data Browser.