External integrations

MaveDB integrates with several external resources to enrich hosted datasets and facilitate data sharing, interpretation, and clinical use.

Many of these integrations rely on variant mapping to link MAVE variants to genomic coordinates and transcripts. Through this mapping process and the adoption of GA4GH data standards, MAVE data from MaveDB is now accessible across a range of widely used genomic platforms.

ClinGen

ClinGen Allele Registry

After mapping, MaveDB submits each variant to the ClinGen Allele Registry to obtain its ClinGen Allele ID (CAID). A CAID is a stable, unique identifier for a genetic variant, so the same variant can be referenced consistently across databases and publications. These identifiers power the MaveMD variant search, which lets users look up variants by HGVS string, ClinVar Variation ID, dbSNP RSID, or ClinGen CAID and find matching MAVE functional data.
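As a rough illustration of how a search box might tell these four identifier types apart, the sketch below classifies a query string with regular expressions. The patterns, the `classify_query` name, and the returned labels are assumptions made for this example; they are not MaveDB's or MaveMD's actual matching rules, and the HGVS pattern checks only the overall `accession:type.description` shape rather than validating the string.

```python
import re

# Illustrative patterns only (assumptions, not MaveDB's real rules).
# Order matters: the more specific identifier shapes are tried first.
PATTERNS = [
    ("clingen_caid", re.compile(r"^CA\d+$")),
    ("dbsnp_rsid", re.compile(r"^rs\d+$", re.IGNORECASE)),
    ("clinvar_variation_id", re.compile(r"^\d+$")),
    # Shape check only: "<reference accession>:<type prefix>.<description>"
    ("hgvs", re.compile(r"^[A-Za-z0-9_.()]+:[cgmnopr]\.\S+$")),
]


def classify_query(query: str) -> str:
    """Return a label for the identifier type, or "unknown" if nothing matches."""
    text = query.strip()
    for kind, pattern in PATTERNS:
        if pattern.match(text):
            return kind
    return "unknown"


if __name__ == "__main__":
    for example in ["CA123456", "rs80357906", "55407", "NM_007294.4:c.5266dup"]:
        print(example, "->", classify_query(example))
```

A real implementation would still need to resolve the recognized identifier to a CAID before looking up functional data; this sketch covers only the first, syntactic step.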

ClinGen Linked Data Hub

When a dataset is published, MaveDB automatically submits the mapped variants to the ClinGen Linked Data Hub (LDH). Each variant is represented as a MaveDBMapping document that includes the MAVE score, score set accession, provenance information, and a link back to the MaveDB score set page. These documents are linked to ClinGen canonical allele identifiers, enabling users to access MAVE functional evidence for a variant alongside other curation data within the ClinGen ecosystem.
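To make the shape of such a document concrete, here is a minimal sketch that bundles the pieces of information listed above into one record and serializes it as JSON. The class name, field names, and example values are hypothetical; the actual MaveDBMapping schema used by the LDH is not reproduced here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MappingDocumentSketch:
    """Hypothetical stand-in for a MaveDBMapping document (field names are assumptions)."""

    caid: str                 # ClinGen canonical allele identifier the document links to
    score: float              # MAVE functional score for the variant
    score_set_accession: str  # accession of the score set the score comes from
    provenance: dict          # free-form provenance details
    score_set_url: str        # link back to the MaveDB score set page

    def to_json(self) -> str:
        # Sorted keys keep the output stable, which helps when comparing documents.
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    doc = MappingDocumentSketch(
        caid="CA123456",
        score=-1.25,
        score_set_accession="urn:mavedb:00000001-a-1",
        provenance={"source": "example"},
        score_set_url="https://example.org/score-sets/urn:mavedb:00000001-a-1",
    )
    print(doc.to_json())
```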

Users can also navigate from the ClinGen Allele Registry directly to the corresponding MaveMD variant page when functional data is available, providing a bidirectional link between the two resources.

The LDH API supports querying MaveDB data by ClinGen Allele ID (CAid), Protein Allele ID (PAid), or by MaveDB score set accession for bulk retrieval.

ClinVar

Using the ClinGen Allele IDs obtained during the mapping process, MaveDB cross-references variants with the ClinVar database to identify any existing clinical annotations. This integration serves two purposes:

Visualizations: ClinVar significance classifications are used to power visualizations that show how well an assay segregates known pathogenic and benign variants. In the Clinical Controls view on MaveMD variant pages, ClinVar variants are color-coded by their classification (pathogenic/likely pathogenic in red, benign/likely benign in blue), revealing how functional scores correspond to independently established clinical classifications.
Score calibrations: Known pathogenic and benign ClinVar variants serve as clinical control variants for score calibrations, which transform MAVE functional scores into ACMG/AMP-compatible evidence strength assignments. Users can select different ClinVar database snapshots to see how calibrations relate to classifications at different points in time.
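The grouping behind both uses can be sketched in a few lines: collapse ClinVar classifications into two control groups, attach the display color described above, and compare the score distributions. The function name, the input format, and the use of a median as the summary statistic are assumptions for illustration; this is not MaveDB's calibration method.

```python
from statistics import median

# Collapse ClinVar classifications into the two control groups described above.
CONTROL_GROUPS = {
    "pathogenic": "pathogenic",
    "likely pathogenic": "pathogenic",
    "benign": "benign",
    "likely benign": "benign",
}

# Colors as described for the Clinical Controls view.
GROUP_COLORS = {"pathogenic": "red", "benign": "blue"}


def group_control_scores(variants):
    """Split (classification, score) pairs into pathogenic and benign score lists.

    Classifications outside the four control categories (for example
    "uncertain significance") are ignored.
    """
    groups = {"pathogenic": [], "benign": []}
    for classification, score in variants:
        group = CONTROL_GROUPS.get(classification.strip().lower())
        if group is not None:
            groups[group].append(score)
    return groups


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    example = [
        ("Pathogenic", -1.2),
        ("Likely pathogenic", -0.9),
        ("Benign", 0.1),
        ("Likely benign", -0.1),
        ("Uncertain significance", -0.5),
    ]
    for name, scores in group_control_scores(example).items():
        print(name, GROUP_COLORS[name], "median score:", median(scores))
```

An assay that segregates the controls well shows little overlap between the two groups' scores; a calibration would then turn that separation into evidence strengths.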

MaveDB displays ClinVar significance classifications and star status alongside variant effect scores and provides direct links to ClinVar entries.

Note

A future version of MaveDB will allow users to submit variant interpretations to ClinVar directly, streamlining the process of sharing functional evidence with the clinical genomics community.

gnomAD

Mapped variants in MaveDB are cross-referenced with the gnomAD database to retrieve population allele frequency data. This integration provides context about the prevalence of variants in diverse human populations, which is an important factor in clinical variant interpretation alongside functional evidence.

Ensembl VEP

MaveDB uses the Ensembl Variant Effect Predictor (VEP) to annotate mapped variants with predicted functional consequences, including effects on protein coding sequences, splicing, and regulatory regions. These VEP annotations are displayed alongside variant effect scores on score set pages, providing additional context for interpreting the functional impact of each variant.

IGVF Catalog

MaveDB is integrated with the Impact of Genetic Variation on Function (IGVF) Consortium's Catalog, providing access to MAVE data as it is generated by the consortium. This integration ensures that new functional datasets become discoverable through MaveDB as they are produced.

Sequence resolution services

MaveDB relies on three open-source services to resolve and validate sequences for accession-based targets and variant mapping.

CDOT

CDOT provides transcript data (exon boundaries, coding regions, and transcript-to-genome alignments) for RefSeq and Ensembl accessions. MaveDB uses CDOT to retrieve transcript models when processing accession-based targets and during variant mapping.

UTA

The Universal Transcript Archive (UTA) provides a comprehensive database of transcript alignments and annotations. MaveDB uses UTA to resolve transcript-to-genome alignments during variant mapping.
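For readers unfamiliar with what resolving a transcript-to-genome alignment involves, the sketch below maps a transcript position to a genomic position using a list of exon alignments. It is a simplified illustration, not the UTA or CDOT API: the `ExonAlignment` type and `transcript_to_genome` function are hypothetical, and the code assumes a plus-strand transcript, 0-based coordinates, and exons that align without gaps.

```python
from typing import NamedTuple, Sequence


class ExonAlignment(NamedTuple):
    """One exon of a hypothetical plus-strand, ungapped alignment (0-based coordinates)."""

    tx_start: int      # first transcript position covered by the exon (inclusive)
    tx_end: int        # one past the last transcript position covered (exclusive)
    genome_start: int  # genomic position aligned to tx_start


def transcript_to_genome(pos: int, exons: Sequence[ExonAlignment]) -> int:
    """Map a transcript position to its genomic position.

    Raises ValueError if the position is not covered by any exon.
    """
    for exon in exons:
        if exon.tx_start <= pos < exon.tx_end:
            # Within an ungapped exon, the offset from the exon start is preserved.
            return exon.genome_start + (pos - exon.tx_start)
    raise ValueError(f"transcript position {pos} is not covered by any exon")


if __name__ == "__main__":
    # Two made-up exons separated by an intron on the genome.
    exons = [ExonAlignment(0, 100, 1000), ExonAlignment(100, 250, 5000)]
    print(transcript_to_genome(99, exons))   # last base of the first exon
    print(transcript_to_genome(100, exons))  # first base of the second exon
```

Minus-strand transcripts and alignments containing insertions or deletions need extra handling, which is part of what dedicated alignment resources provide.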

SeqRepo

SeqRepo is a local sequence repository that provides fast access to biological sequences by accession identifier. MaveDB uses SeqRepo to retrieve and cache nucleotide and protein sequences for target validation and variant mapping.