Depositing your data into MaveDB

Creating a complete entry in MaveDB requires several pieces of data and metadata. This document includes a checklist of what is required to deposit a study and a description of the required metadata. It also includes details about the optional metadata that we recommend be included to maximize the usability of your data.

For more information on how a dataset in MaveDB is structured, including descriptions of experiment sets, experiments, and score sets, please see record types.

Data upload wizard

MaveDB provides a step-by-step data upload wizard to guide you through the process of depositing your data. To begin the upload process, log in to MaveDB and select either the “New experiment” or “New score set” option in the toolbar at the top of the page.

Uploading an experiment

What is an experiment?

Experiments describe the data generated from performing a MAVE on a target. This includes all steps of the experimental procedure up to and including high-throughput sequencing. Library construction, assay design, and sequencing strategy are all described in the experiment record (example experiment).

See also

Data analysis steps including read filtering, read counting, and score calculation are described in a score set.

Publications that perform more than one functional assay should be represented as multiple experiments organized under a single experiment set, and each functional assay should be described in its own experiment record. This still applies to experimental designs where the differences between assays were relatively minor, such as varying the temperature or the concentration of a small molecule.

Replicate assays should not be reported as separate experiments; instead, the number and nature of the replicates should be clearly stated in the experiment’s methods section.

What is required to upload an experiment?

For each experiment and score set, you are required to provide the following metadata:

The remaining metadata fields are optional, but we strongly encourage you to provide them to maximize the usability of your data:

For experiments, we also strongly encourage (but do not require) you to provide:

Uploading a score set

What is a score set?

Score sets are records that describe the scores generated from the raw data described in their associated experiment, and they are the principal way that users will interact with the data deposited into MaveDB. A score set covers all steps following high-throughput sequencing, including read filtering, read counting, and score calculations (example score set). All score sets must be associated with an existing experiment.

Multiple score sets should be used when distinct methods were used to calculate scores from the raw data described by the experiment. The most common use case is deep mutational scanning data for which scores are calculated at both nucleotide resolution and amino acid resolution.

When uploading results based on imputation or complex normalization, we recommend uploading a rawer form of the scores (e.g. enrichment ratios) as a normal score set, and then using meta-analysis score sets to describe the imputed or normalized results.

What is required to upload a score set?

For each experiment and score set, you are required to provide the following metadata:

For score sets, you are additionally required to provide:

  • Targets associated with the score set, including their sequences or accessions and any related metadata.

  • `Licenses`_ and data usage guidelines (if needed) for the score set.

  • `Score table`_ containing the variant scores.
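Score tables are uploaded as CSV files keyed by HGVS variant strings with a required score column. As an illustrative sketch only (the column names here, such as hgvs_pro, follow published MaveDB conventions, but the full specification should be checked in the upload documentation):

```python
import csv
import io

# Build a minimal score table in memory. The hgvs_pro column holds
# protein-level HGVS variant strings; "score" is the computed variant score.
# Values shown are made up for illustration.
rows = [
    {"hgvs_pro": "p.Ala1Val", "score": "0.87"},
    {"hgvs_pro": "p.Ala1Asp", "score": "-1.42"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["hgvs_pro", "score"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
```

The resulting `csv_text` can be written to a file and selected in the score set upload form.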

You are encouraged to also provide:

Publishing your data

When first created, records are given a temporary accession number beginning with tmp: instead of urn:mavedb:.

Temporary records are only visible to the uploader and to any contributors who have been added to the record. If you want to access these datasets via the API, you can create an API key on your user profile page to enable authentication.

When the record is ready to be made public, you can click on the padlock icon to “publish” the record.

Warning

Once a record is published, it cannot be unpublished. If you need to fix errors in a dataset after publication, see: Deprecating score sets.

Once the data is published, several fields (including the target, scores, and counts) can no longer be edited. However, most free-text metadata can still be revised after the dataset becomes public, and changes to these fields are tracked.

Note

Score sets are the only record type that can be published. When the first score set associated with an experiment is published, the associated experiment and experiment set will also become public.