mavehgvs API documentation

Variant objects

Each variant can be parsed into a variant object, which populates and exposes named fields for each piece of the variant string.

class mavehgvs.position.VariantPosition(pos_str: str)

Class for storing a variant position.

The class includes special fields for variants using the extended position syntax.

position

The position as an integer. Negative positions are only expected for 5’ UTR positions.

Type:

Optional[int]

amino_acid

The amino acid at this position for protein variants.

Type:

Optional[str]

intronic_position

The number of bases into the intron for intronic positions. None for non-intronic positions.

Nucleotides in the 5’ half of the intron have positive intronic_position and their position is that of the last base of the 5’ exon. Nucleotides in the 3’ half of the intron have negative intronic_position and their position is that of the first base of the 3’ exon.

Type:

Optional[int]

utr

True if the position is in the UTR. None for all other positions.

Type:

Optional[bool]
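As a rough illustration of how the fields above relate to a position string, the sketch below splits strings such as 78+5 or -15 into position, intronic_position, and utr values. The names split_position and PositionFields are hypothetical and exist only in this example; they are not part of mavehgvs. Storing the number after * as a positive integer is an assumption rather than documented behaviour.

```python
import re
from typing import NamedTuple, Optional

# Optional UTR marker, a base position, and an optional intronic offset.
_POSITION = re.compile(
    r"(?P<utr>[*-])?(?P<base>[1-9][0-9]*)(?P<intron>[+-][1-9][0-9]*)?"
)


class PositionFields(NamedTuple):
    """Hypothetical container mirroring three of the documented attributes."""

    position: int
    intronic_position: Optional[int]
    utr: Optional[bool]


def split_position(pos_str: str) -> PositionFields:
    """Split a nucleotide position string into its parts (illustrative only)."""
    match = _POSITION.fullmatch(pos_str)
    if match is None:
        raise ValueError(f"not a recognised position: {pos_str!r}")
    base = int(match.group("base"))
    # A leading '-' marks a 5' UTR position, stored here as a negative integer.
    # Assumption: a leading '*' (3' UTR) keeps the positive number.
    position = -base if match.group("utr") == "-" else base
    intron = match.group("intron")
    return PositionFields(
        position=position,
        intronic_position=int(intron) if intron is not None else None,
        utr=True if match.group("utr") is not None else None,
    )
```

Under these assumptions, 78+5 is five bases into the intron that follows base 78, and 122-6 is six bases before the exon that starts at base 122.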

__eq__(other: VariantPosition) → bool

Equality comparison operator.

Note that the amino acid portion of a protein position is not used in this comparison.

Other comparison operators will be filled in using functools.total_ordering().

Parameters:

other (VariantPosition) – The other VariantPosition to compare to.

Returns:

True if this position is the same as the other position; else False.

Return type:

bool
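The way functools.total_ordering() fills in the remaining comparison operators can be seen in a small sketch. ToyPosition is a hypothetical stand-in written for this example, not the VariantPosition implementation; it only mirrors the documented behaviour that the amino acid is ignored when comparing.

```python
from functools import total_ordering
from typing import Optional


@total_ordering
class ToyPosition:
    """Hypothetical stand-in for a position; not the mavehgvs class."""

    def __init__(self, position: int, amino_acid: Optional[str] = None) -> None:
        self.position = position
        self.amino_acid = amino_acid

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, ToyPosition):
            return NotImplemented
        # The amino acid is deliberately left out of the comparison.
        return self.position == other.position

    def __lt__(self, other: object) -> bool:
        if not isinstance(other, ToyPosition):
            return NotImplemented
        return self.position < other.position


# __le__, __gt__, and __ge__ are supplied by the decorator.
a, b, c = ToyPosition(3, "Ala"), ToyPosition(3, "Gly"), ToyPosition(7)
print(a == b, a <= c, c > a, c >= b)
```

Because the class defines __eq__ without defining __hash__, Python sets __hash__ to None, so instances of such a class cannot be used as dictionary keys or set members.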

__ge__(other, NotImplemented=NotImplemented)

Return a >= b. Computed by @total_ordering from (not a < b).

__gt__(other, NotImplemented=NotImplemented)

Return a > b. Computed by @total_ordering from (not a < b) and (a != b).

__hash__ = None

__init__(pos_str: str) → None

Parse a position string into a VariantPosition object.

Parameters:

pos_str (str) – The string to convert to a VariantPosition object.

__le__(other, NotImplemented=NotImplemented)

Return a <= b. Computed by @total_ordering from (a < b) or (a == b).

__lt__(other: VariantPosition) → bool

Less than comparison operator.

Other comparison operators will be filled in using functools.total_ordering().

Parameters:

other (VariantPosition) – The other VariantPosition to compare to.

Returns:

True if this position evaluates as strictly less than the other position; else False.

Return type:

bool

__ne__(other: VariantPosition) → bool

Not equal comparison operator.

Note that the amino acid portion of a protein position is not used in this comparison.

Other comparison operators will be filled in using functools.total_ordering().

Parameters:

other (VariantPosition) – The other VariantPosition to compare to.

Returns:

True if this position is not the same as the other position; else False.

Return type:

bool

__repr__() → str

The object representation is equivalent to the input string.

Returns:

The object representation.

Return type:

str

__weakref__

list of weak references to the object (if defined)

fullmatch(pos=0, endpos=9223372036854775807)

Callable[[str, int, int], Optional[Match[str]]]: fullmatch callable for parsing positions

Returns an re.Match object if the full string matches one of the position groups in pos_extended.

is_adjacent(other: VariantPosition) → bool

Return whether this variant and another are immediately adjacent in sequence space.

The following special cases are not handled correctly:

  • The special case involving the last base in a transcript sequence and the first base in the 3’ UTR will be evaluated as not adjacent, as the object does not have sequence length information.

  • The special case involving the two middle bases in an intron where the numbering switches from positive with respect to the 5’ end of the intron to negative with respect to the 3’ end of the intron will be evaluated as not adjacent, as the object does not have intron length information.

  • This ignores the special case where there is an intron between the last base of the 5’ UTR and the first base of the coding sequence because it is not biologically relevant to the best of my knowledge.

Parameters:

other (VariantPosition) – The object to calculate adjacency to.

Returns:

True if the positions describe adjacent bases in sequence space; else False.

Return type:

bool
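The limitations listed above follow from comparing positions without knowing the sequence or intron length. The sketch below is a minimal illustration, assuming each position is reduced to a (position, intronic_position) pair and leaving 3’ UTR positions out entirely. The adjacent function is hypothetical and is not the library's implementation.

```python
from typing import Optional, Tuple

# (position, intronic_position); intronic_position is None for exonic bases.
Pos = Tuple[int, Optional[int]]


def _linear(position: int) -> int:
    """Map positions onto a gap-free scale: there is no position 0,
    so -1 and 1 are neighbouring bases."""
    return position if position > 0 else position + 1


def adjacent(a: Pos, b: Pos) -> bool:
    """Return True if two positions are next to each other (sketch only)."""
    (pos_a, intron_a), (pos_b, intron_b) = a, b
    if intron_a is None and intron_b is None:
        return abs(_linear(pos_a) - _linear(pos_b)) == 1
    if pos_a != pos_b:
        # Without the intron length, offsets anchored to different exons
        # cannot be compared, so the two middle bases of an intron
        # are reported as not adjacent.
        return False
    if intron_a is None or intron_b is None:
        # An exonic base and the first or last intronic base next to it.
        offset = intron_a if intron_a is not None else intron_b
        return abs(offset) == 1
    return abs(intron_a - intron_b) == 1
```

For example, adjacent((78, 5), (79, -5)) returns False even if the intron is exactly ten bases long, which is the second special case described above.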

is_extended() → bool

Return whether this position was described using the extended syntax.

Returns:

True if the position was described using the extended syntax; else False.

Return type:

bool

is_intronic() → bool

Return whether this is an intronic position.

Returns:

True if the object describes a position in an intron; else False.

Return type:

bool

is_protein() → bool

Return whether this is a protein position.

Returns:

True if the object describes a position with an amino acid component; else False.

Return type:

bool

is_utr() → bool

Return whether this is a UTR position.

Returns:

True if the object describes a position in the UTR; else False.

Return type:

bool

exception mavehgvs.exceptions.MaveHgvsParseError

Exception to use when a MAVE-HGVS string is not valid.

Utility functions for handling variants

mavehgvs.util.parse_variant_strings(variants: Iterable[str], targetseq: str | None = None, expected_prefix: str | None = None) → Tuple[List[Variant | None], List[str | None]]

Parse a list of MAVE-HGVS strings into Variant objects or error messages.

Parameters:
  • variants (Iterable[str]) – Iterable of MAVE-HGVS strings to parse.

  • targetseq (Optional[str]) – If provided, all variants will be validated for agreement with this sequence. See the documentation for Variant for further details.

  • expected_prefix (Optional[str]) – If provided, all variants will be expected to have the same single-letter prefix. Variants that do not have this prefix will be treated as invalid.

Returns:

A pair of lists containing variants or error messages.

Both lists have the same length as the input list. The first list contains Variant objects if the string was successfully parsed; else None. The second list contains None if the string was successfully parsed; else the error message.

Return type:

Tuple[List[Optional[Variant]], List[Optional[str]]]
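Because the two returned lists are aligned with the input, they are usually consumed together. The sketch below shows one way to do that. The partition_results helper is hypothetical, and the sample values (including the error text) are invented stand-ins for what parse_variant_strings would return.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

V = TypeVar("V")


def partition_results(
    inputs: Sequence[str],
    variants: Sequence[Optional[V]],
    errors: Sequence[Optional[str]],
) -> Tuple[List[Tuple[str, V]], List[Tuple[str, str]]]:
    """Pair each input string with its parsed variant or its error message."""
    if not (len(inputs) == len(variants) == len(errors)):
        raise ValueError("all three sequences must have the same length")
    valid: List[Tuple[str, V]] = []
    invalid: List[Tuple[str, str]] = []
    for text, variant, error in zip(inputs, variants, errors):
        if variant is not None:
            valid.append((text, variant))
        else:
            invalid.append((text, error or "unknown error"))
    return valid, invalid


# Invented sample data in the documented shape: exactly one of the two
# entries at each index is None.
inputs = ["p.Leu75Val", "not-a-variant"]
variants = ["<Variant p.Leu75Val>", None]   # stand-ins for Variant objects
errors = [None, "hypothetical error message"]
valid, invalid = partition_results(inputs, variants, errors)
```

In real use, the variants and errors lists would come from a call to parse_variant_strings on the same inputs.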

Utility functions for regular expression patterns

Utility functions for working with mavehgvs regex pattern strings.

mavehgvs.patterns.util.combine_patterns(patterns: Sequence[str], groupname: str | None = None) → str

Combine multiple pattern strings into a single pattern string.

Because multiple identical group names are not allowed in a pattern, the resulting object renames all named match groups such that they are prefixed with the first match group name in the pattern. For example, (?P<substitution>(?P<position>[1-9][0-9]*)... becomes (?P<substitution>(?P<substitution_position>[1-9][0-9]*)....

The function assumes that all input patterns are enclosed in parentheses.

Parameters:
  • patterns (Sequence[str]) – Sequence of pattern strings to combine.

  • groupname (Optional[str]) – Name for the capture group surrounding the resulting pattern. If this is None, a non-capturing group will be used instead.

Returns:

Pattern string that matches any of the input patterns. Match groups are renamed as described above to attempt to ensure uniqueness across the combined pattern.

Return type:

str
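The renaming scheme described above can be sketched with the standard re module. The functions prefix_group_names and combine_sketch are hypothetical and only approximate the documented behaviour; they assume every named group is written as (?P<name>...) and ignore edge cases such as escaped parentheses. The two input patterns are the documented values of dna_sub_gmo and dna_del_gmo.

```python
import re
from typing import Optional, Sequence

_GROUP_NAME = re.compile(r"\(\?P<(\w+)>")

SUB = r"(?P<dna_sub_gmo>(?P<position>[1-9][0-9]*)(?P<ref>[ACGT])>(?P<new>[ACGT]))"
DEL = (
    r"(?P<dna_del_gmo>(?:(?:(?P<start>[1-9][0-9]*)_(?P<end>[1-9][0-9]*))"
    r"|(?P<position>[1-9][0-9]*))del)"
)


def prefix_group_names(pattern: str) -> str:
    """Prefix every named group after the first with the first group's name."""
    names = _GROUP_NAME.findall(pattern)
    if not names:
        return pattern
    first = names[0]

    def rename(match: "re.Match[str]") -> str:
        name = match.group(1)
        return match.group(0) if name == first else f"(?P<{first}_{name}>"

    return _GROUP_NAME.sub(rename, pattern)


def combine_sketch(patterns: Sequence[str], groupname: Optional[str] = None) -> str:
    """Join renamed patterns with '|' inside one capturing or non-capturing group."""
    body = "|".join(prefix_group_names(p) for p in patterns)
    opener = f"(?P<{groupname}>" if groupname is not None else "(?:"
    return f"{opener}{body})"


combined = combine_sketch([SUB, DEL], groupname="dna_variant")
match = re.fullmatch(combined, "12A>T")
print(match.group("dna_sub_gmo_position"))
```

Joining SUB and DEL directly with | would fail to compile, because both define a group named position.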

mavehgvs.patterns.util.remove_named_groups(pattern: str, noncapturing: bool = True) → str

Function that replaces named match groups in a regular expression pattern.

Named groups are replaced with either regular parentheses or non-capturing parentheses.

Parameters:
  • pattern (str) – The pattern string to strip match groups from.

  • noncapturing (bool) – If True, the named grouping parentheses are replaced by non-capturing parentheses. If False, regular parentheses are used.

Returns:

The pattern string without named match groups.

Return type:

str
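A minimal sketch of this kind of replacement, assuming named groups always appear as (?P<name>...). The strip_named_groups function is a hypothetical illustration, not the library code, and the input is the documented value of dna_sub_gmo.

```python
import re

SUB = r"(?P<dna_sub_gmo>(?P<position>[1-9][0-9]*)(?P<ref>[ACGT])>(?P<new>[ACGT]))"


def strip_named_groups(pattern: str, noncapturing: bool = True) -> str:
    """Replace each '(?P<name>' opener with '(?:' or a plain '('."""
    replacement = "(?:" if noncapturing else "("
    return re.sub(r"\(\?P<\w+>", replacement, pattern)


print(strip_named_groups(SUB))
print(strip_named_groups(SUB, noncapturing=False))
```

With noncapturing=True the result has no capture groups at all, which is the form the substitution takes inside the multi-variant patterns on this page. With noncapturing=False the groups remain but can only be accessed by number.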

DNA pattern strings

mavehgvs.patterns.dna.dna_del_c: str = '(?P<dna_del_c>(?:(?:(?P<start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)'

Pattern matching a DNA deletion with numeric, intronic, or UTR positions.

Type:

str
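As an illustration of what this pattern accepts, the sketch below applies the documented string with re.fullmatch and reports the named groups that matched. The describe_deletion helper is hypothetical. The pattern does not include the c. prefix, so the inputs omit it.

```python
import re
from typing import Dict, Optional

# Copied from the documented value of mavehgvs.patterns.dna.dna_del_c.
DNA_DEL_C = (
    r"(?P<dna_del_c>(?:(?:(?P<start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)"
    r"_(?P<end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))"
    r"|(?P<position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)"
)


def describe_deletion(text: str) -> Optional[Dict[str, str]]:
    """Return the named groups that took part in the match, or None."""
    match = re.fullmatch(DNA_DEL_C, text)
    if match is None:
        return None
    return {name: value for name, value in match.groupdict().items() if value is not None}


print(describe_deletion("122-6_122-1del"))  # intronic range: start and end
print(describe_deletion("*5del"))           # single 3' UTR base: position
print(describe_deletion("12delA"))          # trailing sequence is rejected
```

A range populates start and end, while a single base populates position; the two sets of groups are never filled at the same time.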

mavehgvs.patterns.dna.dna_del_gmo: str = '(?P<dna_del_gmo>(?:(?:(?P<start>[1-9][0-9]*)_(?P<end>[1-9][0-9]*))|(?P<position>[1-9][0-9]*))del)'

Pattern matching a DNA deletion with only numeric positions for genomic-style variants.

Type:

str

mavehgvs.patterns.dna.dna_del_n: str = '(?P<dna_del_n>(?:(?:(?P<start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)'

Pattern matching a DNA deletion with numeric or intron positions for non-coding variants.

Type:

str

mavehgvs.patterns.dna.dna_delins_c: str = '(?P<dna_delins_c>(?:(?:(?P<start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?P<seq>[ACGT]+))'

Pattern matching a DNA deletion-insertion with numeric, intronic, or UTR positions.

Type:

str

mavehgvs.patterns.dna.dna_delins_gmo: str = '(?P<dna_delins_gmo>(?:(?:(?P<start>[1-9][0-9]*)_(?P<end>[1-9][0-9]*))|(?P<position>[1-9][0-9]*))delins(?P<seq>[ACGT]+))'

Pattern matching a DNA deletion-insertion with only numeric positions for genomic-style variants.

Type:

str

mavehgvs.patterns.dna.dna_delins_n: str = '(?P<dna_delins_n>(?:(?:(?P<start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?P<seq>[ACGT]+))'

Pattern matching a DNA deletion-insertion with numeric or intron positions for non-coding variants.

Type:

str

mavehgvs.patterns.dna.dna_dup_c: str = '(?P<dna_dup_c>(?:(?:(?P<start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)'

Pattern matching a DNA duplication with numeric, intronic, or UTR positions.

Type:

str

mavehgvs.patterns.dna.dna_dup_gmo: str = '(?P<dna_dup_gmo>(?:(?:(?P<start>[1-9][0-9]*)_(?P<end>[1-9][0-9]*))|(?P<position>[1-9][0-9]*))dup)'

Pattern matching a DNA duplication with only numeric positions for genomic-style variants.

Type:

str

mavehgvs.patterns.dna.dna_dup_n: str = '(?P<dna_dup_n>(?:(?:(?P<start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)'

Pattern matching a DNA duplication with numeric or intron positions for non-coding variants.

Type:

str

mavehgvs.patterns.dna.dna_equal_c: str = '(?P<dna_equal_c>(?:(?:(?P<start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))?(?P<equal>=))'

Pattern matching DNA equality with numeric, intronic, or UTR positions.

Type:

str

mavehgvs.patterns.dna.dna_equal_gmo: str = '(?P<dna_equal_gmo>(?:(?:(?P<start>[1-9][0-9]*)_(?P<end>[1-9][0-9]*))|(?P<position>[1-9][0-9]*))?(?P<equal>=))'

Pattern matching DNA equality with only numeric positions for genomic-style variants.

Type:

str

mavehgvs.patterns.dna.dna_equal_n: str = '(?P<dna_equal_n>(?P<equal>=))'

Pattern matching DNA equality with no position support.

Type:

str
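The practical difference between the equality patterns is whether a position may precede the equals sign. The sketch below compares the documented dna_equal_gmo and dna_equal_n strings; the matched_groups helper is hypothetical.

```python
import re
from typing import Dict, Optional

# Copied from the documented values on this page.
DNA_EQUAL_GMO = (
    r"(?P<dna_equal_gmo>(?:(?:(?P<start>[1-9][0-9]*)_(?P<end>[1-9][0-9]*))"
    r"|(?P<position>[1-9][0-9]*))?(?P<equal>=))"
)
DNA_EQUAL_N = r"(?P<dna_equal_n>(?P<equal>=))"


def matched_groups(pattern: str, text: str) -> Optional[Dict[str, str]]:
    """Return the named groups that matched, or None if the text is rejected."""
    match = re.fullmatch(pattern, text)
    if match is None:
        return None
    return {name: value for name, value in match.groupdict().items() if value is not None}


for text in ("=", "12=", "3_5="):
    print(text, matched_groups(DNA_EQUAL_GMO, text), matched_groups(DNA_EQUAL_N, text))
```

The genomic-style pattern accepts a bare =, a single position, or a range, whereas the non-coding pattern accepts only the bare =.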

mavehgvs.patterns.dna.dna_ins_c: str = '(?P<dna_ins_c>(?P<start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?P<seq>[ACGT]+))'

Pattern matching a DNA insertion with numeric, intronic, or UTR positions.

Type:

str

mavehgvs.patterns.dna.dna_ins_gmo: str = '(?P<dna_ins_gmo>(?P<start>[1-9][0-9]*)_(?P<end>[1-9][0-9]*)ins(?P<seq>[ACGT]+))'

Pattern matching a DNA insertion with only numeric positions for genomic-style variants.

Type:

str

mavehgvs.patterns.dna.dna_ins_n: str = '(?P<dna_ins_n>(?P<start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?P<seq>[ACGT]+))'

Pattern matching a DNA insertion with numeric or intron positions for non-coding variants.

Type:

str

mavehgvs.patterns.dna.dna_multi_variant: str = '(?P<dna_c_multi>c\\.\\[(?:(?:(?:(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))?(?:=))|(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?:[ACGT])>(?:[ACGT]))|(?:(?:(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?:(?:(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?:[ACGT]+))|(?:(?:(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?:[ACGT]+)))(?:;(?:(?:(?:(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))?(?:=))|(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?:[ACGT])>(?:[ACGT]))|(?:(?:(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?:(?:(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?:[ACGT]+))|(?:(?:(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?:[ACGT]+)))){1,}\\])|(?P<dna_n_multi>n\\.\\[(?:(?:(?:=))|(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?:[ACGT])>(?:[ACGT]))|(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?:[ACGT]+))|(?:(?:
(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?:[ACGT]+)))(?:;(?:(?:(?:=))|(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?:[ACGT])>(?:[ACGT]))|(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?:[ACGT]+))|(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?:[ACGT]+)))){1,}\\])| (?P<dna_gmo_multi>[gmo]\\.\\[(?:(?:(?:(?:(?:[1-9][0-9]*)_(?:[1-9][0-9]*))|(?:[1-9][0-9]*))?(?:=))|(?:(?:[1-9][0-9]*)(?:[ACGT])>(?:[ACGT]))|(?:(?:(?:(?:[1-9][0-9]*)_(?:[1-9][0-9]*))|(?:[1-9][0-9]*))del)|(?:(?:(?:(?:[1-9][0-9]*)_(?:[1-9][0-9]*))|(?:[1-9][0-9]*))dup)|(?:(?:[1-9][0-9]*)_(?:[1-9][0-9]*)ins(?:[ACGT]+))|(?:(?:(?:(?:[1-9][0-9]*)_(?:[1-9][0-9]*))|(?:[1-9][0-9]*))delins(?:[ACGT]+)))(?:;(?:(?:(?:(?:(?:[1-9][0-9]*)_(?:[1-9][0-9]*))|(?:[1-9][0-9]*))?(?:=))|(?:(?:[1-9][0-9]*)(?:[ACGT])>(?:[ACGT]))|(?:(?:(?:(?:[1-9][0-9]*)_(?:[1-9][0-9]*))|(?:[1-9][0-9]*))del)|(?:(?:(?:(?:[1-9][0-9]*)_(?:[1-9][0-9]*))|(?:[1-9][0-9]*))dup)|(?:(?:[1-9][0-9]*)_(?:[1-9][0-9]*)ins(?:[ACGT]+))|(?:(?:(?:(?:[1-9][0-9]*)_(?:[1-9][0-9]*))|(?:[1-9][0-9]*))delins(?:[ACGT]+)))){1,})\\]'

Pattern matching any complete DNA multi-variant, including the prefix character.

Named capture groups have been removed from the variant patterns because of non-uniqueness. Another application of single-variant regular expressions is needed to recover the named groups from each individual variant in the multi-variant.

Type:

str
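The second pass mentioned above can be sketched as follows: strip the prefix and brackets, split on semicolons, and match each piece against single-variant patterns. This is a simplified illustration that covers only genomic-style substitutions and deletions, using the documented dna_sub_gmo and dna_del_gmo strings. The split_multi_variant function is hypothetical and not part of mavehgvs.

```python
import re
from typing import Dict, List

# Two single-variant patterns copied from this page (genomic-style only).
DNA_SUB_GMO = r"(?P<dna_sub_gmo>(?P<position>[1-9][0-9]*)(?P<ref>[ACGT])>(?P<new>[ACGT]))"
DNA_DEL_GMO = (
    r"(?P<dna_del_gmo>(?:(?:(?P<start>[1-9][0-9]*)_(?P<end>[1-9][0-9]*))"
    r"|(?P<position>[1-9][0-9]*))del)"
)
SINGLE_PATTERNS = [re.compile(p) for p in (DNA_SUB_GMO, DNA_DEL_GMO)]


def split_multi_variant(variant: str) -> List[Dict[str, str]]:
    """Recover named groups for each event in a string like 'g.[12A>T;20_22del]'."""
    if not (variant[1:3] == ".[" and variant.endswith("]")):
        raise ValueError(f"not a bracketed multi-variant: {variant!r}")
    results: List[Dict[str, str]] = []
    for piece in variant[3:-1].split(";"):
        for pattern in SINGLE_PATTERNS:
            match = pattern.fullmatch(piece)
            if match is not None:
                results.append(
                    {k: v for k, v in match.groupdict().items() if v is not None}
                )
                break
        else:
            # No single-variant pattern in this reduced set matched the piece.
            raise ValueError(f"unrecognised variant: {piece!r}")
    return results


print(split_multi_variant("g.[12A>T;20_22del]"))
```

Each piece is matched on its own, so group names such as position can be reused without conflict.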

mavehgvs.patterns.dna.dna_nt: str = '[ACGT]'

Pattern matching any uppercase DNA base.

This does not include IUPAC ambiguity characters.

Type:

str

mavehgvs.patterns.dna.dna_single_variant: str = '(?P<dna_c>c\\.(?:(?P<dna_equal_c>(?:(?:(?P<dna_equal_c_start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_equal_c_end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_equal_c_position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))?(?P<dna_equal_c_equal>=))|(?P<dna_sub_c>(?P<dna_sub_c_position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?P<dna_sub_c_ref>[ACGT])>(?P<dna_sub_c_new>[ACGT]))|(?P<dna_del_c>(?:(?:(?P<dna_del_c_start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_del_c_end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_del_c_position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?P<dna_dup_c>(?:(?:(?P<dna_dup_c_start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_dup_c_end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_dup_c_position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?P<dna_ins_c>(?P<dna_ins_c_start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_ins_c_end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?P<dna_ins_c_seq>[ACGT]+))|(?P<dna_delins_c>(?:(?:(?P<dna_delins_c_start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_delins_c_end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_delins_c_position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?P<dna_delins_c_seq>[ACGT]+))))|(?P<dna_n>n\\.(?:(?P<dna_equal_n>(?P<dna_equal_n_equal>=))|(?P<dna_sub_n>(?P<dna_sub_n_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?P<dna_sub_n_ref>[ACGT])>(?P<dna_sub_n_new>[ACGT]))|(?P<dna_del_n>(?:(?:(?P<dna_del_n_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_del_n_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_del_n_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?P<dna_dup_n>(?:(?:(?P<dna_dup_n_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_dup_n_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_dup_n_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?P<dna_ins_n>(?P<dna_ins_n_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_ins_n_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?P<dna_ins_n_seq>[ACGT]+))|(?P<dna_delins_n>(?:(?:(?P<dna_delins_n_start>[1-9][0-9]*(?:[+-][1-9
][0-9]*)?)_(?P<dna_delins_n_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_delins_n_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?P<dna_delins_n_seq>[ACGT]+))))|(?P<dna_gmo>[gmo]\\.(?:(?P<dna_equal_gmo>(?:(?:(?P<dna_equal_gmo_start>[1-9][0-9]*)_(?P<dna_equal_gmo_end>[1-9][0-9]*))|(?P<dna_equal_gmo_position>[1-9][0-9]*))?(?P<dna_equal_gmo_equal>=))|(?P<dna_sub_gmo>(?P<dna_sub_gmo_position>[1-9][0-9]*)(?P<dna_sub_gmo_ref>[ACGT])>(?P<dna_sub_gmo_new>[ACGT]))|(?P<dna_del_gmo>(?:(?:(?P<dna_del_gmo_start>[1-9][0-9]*)_(?P<dna_del_gmo_end>[1-9][0-9]*))|(?P<dna_del_gmo_position>[1-9][0-9]*))del)|(?P<dna_dup_gmo>(?:(?:(?P<dna_dup_gmo_start>[1-9][0-9]*)_(?P<dna_dup_gmo_end>[1-9][0-9]*))|(?P<dna_dup_gmo_position>[1-9][0-9]*))dup)|(?P<dna_ins_gmo>(?P<dna_ins_gmo_start>[1-9][0-9]*)_(?P<dna_ins_gmo_end>[1-9][0-9]*)ins(?P<dna_ins_gmo_seq>[ACGT]+))|(?P<dna_delins_gmo>(?:(?:(?P<dna_delins_gmo_start>[1-9][0-9]*)_(?P<dna_delins_gmo_end>[1-9][0-9]*))|(?P<dna_delins_gmo_position>[1-9][0-9]*))delins(?P<dna_delins_gmo_seq>[ACGT]+))))'

Pattern matching any complete single DNA variant, including the prefix character.

Type:

str

mavehgvs.patterns.dna.dna_sub_c: str = '(?P<dna_sub_c>(?P<position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?P<ref>[ACGT])>(?P<new>[ACGT]))'

Pattern matching a DNA substitution with numeric, intronic, or UTR positions.

Type:

str

mavehgvs.patterns.dna.dna_sub_gmo: str = '(?P<dna_sub_gmo>(?P<position>[1-9][0-9]*)(?P<ref>[ACGT])>(?P<new>[ACGT]))'

Pattern matching a DNA substitution with only numeric positions for genomic-style variants.

Type:

str

mavehgvs.patterns.dna.dna_sub_n: str = '(?P<dna_sub_n>(?P<position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?P<ref>[ACGT])>(?P<new>[ACGT]))'

Pattern matching a DNA substitution with numeric or intron positions for non-coding variants.

Type:

str

mavehgvs.patterns.dna.dna_variant_c: str = '(?:(?P<dna_equal_c>(?:(?:(?P<dna_equal_c_start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_equal_c_end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_equal_c_position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))?(?P<dna_equal_c_equal>=))|(?P<dna_sub_c>(?P<dna_sub_c_position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?P<dna_sub_c_ref>[ACGT])>(?P<dna_sub_c_new>[ACGT]))|(?P<dna_del_c>(?:(?:(?P<dna_del_c_start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_del_c_end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_del_c_position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?P<dna_dup_c>(?:(?:(?P<dna_dup_c_start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_dup_c_end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_dup_c_position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?P<dna_ins_c>(?P<dna_ins_c_start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_ins_c_end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?P<dna_ins_c_seq>[ACGT]+))|(?P<dna_delins_c>(?:(?:(?P<dna_delins_c_start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_delins_c_end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_delins_c_position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?P<dna_delins_c_seq>[ACGT]+)))'

Pattern matching any of the coding DNA variants.

Type:

str

mavehgvs.patterns.dna.dna_variant_gmo: str = '(?:(?P<dna_equal_gmo>(?:(?:(?P<dna_equal_gmo_start>[1-9][0-9]*)_(?P<dna_equal_gmo_end>[1-9][0-9]*))|(?P<dna_equal_gmo_position>[1-9][0-9]*))?(?P<dna_equal_gmo_equal>=))|(?P<dna_sub_gmo>(?P<dna_sub_gmo_position>[1-9][0-9]*)(?P<dna_sub_gmo_ref>[ACGT])>(?P<dna_sub_gmo_new>[ACGT]))|(?P<dna_del_gmo>(?:(?:(?P<dna_del_gmo_start>[1-9][0-9]*)_(?P<dna_del_gmo_end>[1-9][0-9]*))|(?P<dna_del_gmo_position>[1-9][0-9]*))del)|(?P<dna_dup_gmo>(?:(?:(?P<dna_dup_gmo_start>[1-9][0-9]*)_(?P<dna_dup_gmo_end>[1-9][0-9]*))|(?P<dna_dup_gmo_position>[1-9][0-9]*))dup)|(?P<dna_ins_gmo>(?P<dna_ins_gmo_start>[1-9][0-9]*)_(?P<dna_ins_gmo_end>[1-9][0-9]*)ins(?P<dna_ins_gmo_seq>[ACGT]+))|(?P<dna_delins_gmo>(?:(?:(?P<dna_delins_gmo_start>[1-9][0-9]*)_(?P<dna_delins_gmo_end>[1-9][0-9]*))|(?P<dna_delins_gmo_position>[1-9][0-9]*))delins(?P<dna_delins_gmo_seq>[ACGT]+)))'

Pattern matching any of the genomic-style DNA variants.

Type:

str

mavehgvs.patterns.dna.dna_variant_n: str = '(?:(?P<dna_equal_n>(?P<dna_equal_n_equal>=))|(?P<dna_sub_n>(?P<dna_sub_n_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?P<dna_sub_n_ref>[ACGT])>(?P<dna_sub_n_new>[ACGT]))|(?P<dna_del_n>(?:(?:(?P<dna_del_n_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_del_n_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_del_n_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?P<dna_dup_n>(?:(?:(?P<dna_dup_n_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_dup_n_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_dup_n_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?P<dna_ins_n>(?P<dna_ins_n_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_ins_n_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?P<dna_ins_n_seq>[ACGT]+))|(?P<dna_delins_n>(?:(?:(?P<dna_delins_n_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_delins_n_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_delins_n_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?P<dna_delins_n_seq>[ACGT]+)))'

Pattern matching any of the non-coding DNA variants.

Type:

str

RNA pattern strings

mavehgvs.patterns.rna.rna_del: str = '(?P<rna_del>(?:(?:(?P<start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)'

Pattern matching an RNA deletion with numeric or relative-to-transcript positions.

Type:

str

mavehgvs.patterns.rna.rna_delins: str = '(?P<rna_delins>(?:(?:(?P<start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?P<seq>[acgu]+))'

Pattern matching an RNA deletion-insertion with numeric or relative-to-transcript positions.

Type:

str

mavehgvs.patterns.rna.rna_dup: str = '(?P<rna_dup>(?:(?:(?P<start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)'

Pattern matching an RNA duplication with numeric or relative-to-transcript positions.

Type:

str

mavehgvs.patterns.rna.rna_equal: str = '(?P<rna_equal>(?:(?:(?P<start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))?(?P<equal>=))'

Pattern matching RNA equality with numeric or relative-to-transcript positions.

Type:

str

mavehgvs.patterns.rna.rna_ins: str = '(?P<rna_ins>(?P<start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?P<seq>[acgu]+))'

Pattern matching an RNA insertion with numeric or relative-to-transcript positions.

Type:

str

mavehgvs.patterns.rna.rna_multi_variant: str = '(?P<rna_multi>r\\.\\[(?:(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))?(?:=))|(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?:[acgu])>(?:[acgu]))|(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?:[acgu]+))|(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?:[acgu]+)))(?:;(?:(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))?(?:=))|(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?:[acgu])>(?:[acgu]))|(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?:[acgu]+))|(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?:[acgu]+)))){1,}\\])'

Pattern matching any complete RNA multi-variant, including the prefix character.

Named capture groups have been removed from the variant patterns because of non-uniqueness. Another application of single-variant regular expressions is needed to recover the named groups from each individual variant in the multi-variant.

Type:

str

mavehgvs.patterns.rna.rna_nt: str = '[acgu]'

Pattern matching any lowercase RNA base.

This does not include IUPAC ambiguity characters.

Type:

str

mavehgvs.patterns.rna.rna_single_variant: str = '(?P<rna>r\\.(?:(?P<rna_equal>(?:(?:(?P<rna_equal_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<rna_equal_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<rna_equal_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))?(?P<rna_equal_equal>=))|(?P<rna_sub>(?P<rna_sub_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?P<rna_sub_ref>[acgu])>(?P<rna_sub_new>[acgu]))|(?P<rna_del>(?:(?:(?P<rna_del_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<rna_del_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<rna_del_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?P<rna_dup>(?:(?:(?P<rna_dup_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<rna_dup_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<rna_dup_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?P<rna_ins>(?P<rna_ins_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<rna_ins_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?P<rna_ins_seq>[acgu]+))|(?P<rna_delins>(?:(?:(?P<rna_delins_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<rna_delins_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<rna_delins_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?P<rna_delins_seq>[acgu]+))))'

Pattern matching any complete RNA variant, including the prefix character.

Type:

str

mavehgvs.patterns.rna.rna_sub: str = '(?P<rna_sub>(?P<position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?P<ref>[acgu])>(?P<new>[acgu]))'

Pattern matching an RNA substitution with numeric or relative-to-transcript positions.

Type:

str
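RNA patterns expect lowercase bases and u rather than t, which is easy to overlook when converting from DNA notation. The sketch below applies the documented rna_sub string; the parse_rna_sub helper is hypothetical.

```python
import re
from typing import Optional, Tuple

# Copied from the documented value of mavehgvs.patterns.rna.rna_sub.
RNA_SUB = (
    r"(?P<rna_sub>(?P<position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)"
    r"(?P<ref>[acgu])>(?P<new>[acgu]))"
)


def parse_rna_sub(text: str) -> Optional[Tuple[str, str, str]]:
    """Return (position, ref, new) for an RNA substitution, or None."""
    match = re.fullmatch(RNA_SUB, text)
    if match is None:
        return None
    return match.group("position"), match.group("ref"), match.group("new")


print(parse_rna_sub("76a>u"))  # accepted
print(parse_rna_sub("76A>U"))  # uppercase bases are rejected
print(parse_rna_sub("76a>t"))  # 't' is not an RNA base
```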

mavehgvs.patterns.rna.rna_variant: str = '(?:(?P<rna_equal>(?:(?:(?P<rna_equal_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<rna_equal_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<rna_equal_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))?(?P<rna_equal_equal>=))|(?P<rna_sub>(?P<rna_sub_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?P<rna_sub_ref>[acgu])>(?P<rna_sub_new>[acgu]))|(?P<rna_del>(?:(?:(?P<rna_del_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<rna_del_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<rna_del_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?P<rna_dup>(?:(?:(?P<rna_dup_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<rna_dup_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<rna_dup_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?P<rna_ins>(?P<rna_ins_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<rna_ins_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?P<rna_ins_seq>[acgu]+))|(?P<rna_delins>(?:(?:(?P<rna_delins_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<rna_delins_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<rna_delins_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?P<rna_delins_seq>[acgu]+)))'

Pattern matching any single RNA variant event.

Type:

str

Protein pattern strings

mavehgvs.patterns.protein.aa_pos: str = '(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*)'

Pattern matching an amino acid code followed by a position.

Type:

str

mavehgvs.patterns.protein.amino_acid: str = '(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)'

Pattern matching any amino acid or Ter.

This does not include ambiguous amino acids such as Glx and Xaa.

Type:

str
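The aa_pos pattern uses only non-capturing groups, so it validates a string without splitting it. As a hedged sketch, the amino acid and the number can be separated by wrapping the documented amino_acid pattern in capturing groups. The split_aa_pos function is a hypothetical helper for illustration.

```python
import re
from typing import Optional, Tuple

# Copied from the documented value of mavehgvs.patterns.protein.amino_acid.
AMINO_ACID = (
    "(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr"
    "|Trp|Tyr|Val|Ter)"
)

# Same shape as aa_pos, but with capturing groups around each part.
_AA_POS = re.compile(rf"({AMINO_ACID})([1-9][0-9]*)")


def split_aa_pos(text: str) -> Optional[Tuple[str, int]]:
    """Split a string such as 'Leu75' into ('Leu', 75); None if it does not match."""
    match = _AA_POS.fullmatch(text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


print(split_aa_pos("Leu75"))
print(split_aa_pos("Xaa12"))  # ambiguous codes are not accepted
```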

mavehgvs.patterns.protein.pro_del: str = '(?P<pro_del>(?:(?P<start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))del)|(?:(?P<position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))del))'

Pattern matching a protein deletion.

Type:

str

mavehgvs.patterns.protein.pro_delins: str = '(?P<pro_delins>(?:(?:(?P<start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*)))|(?P<position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*)))delins(?P<seq>(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)+))'

Pattern matching a protein deletion-insertion.

Type:

str

mavehgvs.patterns.protein.pro_dup: str = '(?P<pro_dup>(?:(?P<start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))dup)|(?:(?P<position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))dup))'

Pattern matching a protein duplication.

Type:

str

mavehgvs.patterns.protein.pro_equal: str = '(?P<pro_equal>(?:(?P<position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))?(?P<equal>=))|(?P<equal_sy>\\(=\\)))'

Pattern matching protein equality or synonymous variant.

Type:

str
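The doubled backslashes shown in the value above are an artefact of displaying the string with repr(); the regular expression itself contains \( and \). The sketch below rebuilds the documented pattern from the amino acid alternation and shows which named group is filled for each accepted form. The classify_pro_equal function and its return labels are hypothetical.

```python
import re
from typing import Optional

AMINO_ACID = (
    "(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr"
    "|Trp|Tyr|Val|Ter)"
)

# Rebuilt to match the documented value of mavehgvs.patterns.protein.pro_equal.
PRO_EQUAL = (
    rf"(?P<pro_equal>(?:(?P<position>(?:{AMINO_ACID}[1-9][0-9]*))?(?P<equal>=))"
    r"|(?P<equal_sy>\(=\)))"
)


def classify_pro_equal(text: str) -> Optional[str]:
    """Label which alternative of pro_equal matched (labels are illustrative)."""
    match = re.fullmatch(PRO_EQUAL, text)
    if match is None:
        return None
    if match.group("equal_sy") is not None:
        return "synonymous"
    if match.group("position") is not None:
        return f"equal at {match.group('position')}"
    return "equal"


for text in ("=", "Cys22=", "(=)", "Xaa22="):
    print(text, classify_pro_equal(text))
```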

mavehgvs.patterns.protein.pro_fs: str = '(?P<pro_fs>(?P<position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))fs)'

Pattern matching a protein frame shift.

Type:

str

mavehgvs.patterns.protein.pro_ins: str = '(?P<pro_ins>(?P<start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))ins(?P<seq>(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)+))'

Pattern matching a protein insertion.

Type:

str

mavehgvs.patterns.protein.pro_multi_variant: str = '(?P<pro_multi>p\\.\\[(?:(?:(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))?(?:=))|(?:\\(=\\)))|(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)))|(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))fs)|(?:(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))del)|(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))del))|(?:(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))dup)|(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))dup))|(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))ins(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)+))|(?:(?:(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*)))|(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*)))delins(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)+)))(?:;(?:(?:(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|
His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))?(?:=))|(?:\\(=\\)))|(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)))|(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))fs)|(?:(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))del)|(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))del))|(?:(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))dup)|(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))dup))|(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))ins(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)+))|(?:(?:(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*)))|(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*)))delins(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)+)))){1,}\\])'

Pattern matching any complete protein multi-variant, including the prefix character.

Named capture groups have been removed from the variant patterns because their names would not be unique when the pattern is repeated. A second application of the single-variant regular expression is needed to recover the named groups from each individual variant in the multi-variant.

Type:

str
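
The two-pass approach described above can be sketched as follows. This is a minimal illustration that uses reduced stand-in patterns covering substitutions only, not the library's full patterns; the names AA, SUB_NAMED, SUB_PLAIN and split_multi are hypothetical and exist only for this example.

```python
import re

# Three-letter amino acid codes, as they appear in the documented patterns.
AA = r"(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)"

# Reduced stand-ins (assumption): substitutions only, with and without named groups.
SUB_NAMED = rf"(?P<pro_sub>(?P<pro_sub_position>{AA}[1-9][0-9]*)(?P<pro_sub_new>{AA}))"
SUB_PLAIN = rf"(?:{AA}[1-9][0-9]*{AA})"

# Python's re module rejects a pattern that defines the same group name twice,
# which is why a repeated variant pattern cannot keep its named groups.
try:
    re.compile(rf"p\.\[{SUB_NAMED};{SUB_NAMED}\]")
    duplicate_names_allowed = True
except re.error:
    duplicate_names_allowed = False

MULTI = re.compile(rf"(?P<pro_multi>p\.\[{SUB_PLAIN}(?:;{SUB_PLAIN}){{1,}}\])")
SINGLE = re.compile(SUB_NAMED)


def split_multi(variant: str) -> list:
    """Validate with the multi pattern, then re-match each piece for named groups."""
    if MULTI.fullmatch(variant) is None:
        raise ValueError(f"not a protein multi-variant: {variant!r}")
    inner = variant[len("p.["):-1]  # strip the 'p.[' prefix and the closing ']'
    return [SINGLE.fullmatch(piece).groupdict() for piece in inner.split(";")]


events = split_multi("p.[Ala12Gly;Trp24Ter]")
```

The first pass only answers whether the whole string is a valid multi-variant; the second pass runs once per semicolon-separated piece and returns one dictionary of named groups per variant.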

mavehgvs.patterns.protein.pro_single_variant: str = '(?P<pro>p\\.(?:(?P<pro_equal>(?:(?P<pro_equal_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))?(?P<pro_equal_equal>=))|(?P<pro_equal_equal_sy>\\(=\\)))|(?P<pro_sub>(?P<pro_sub_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))(?P<pro_sub_new>(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)))|(?P<pro_fs>(?P<pro_fs_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))fs)|(?P<pro_del>(?:(?P<pro_del_start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<pro_del_end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))del)|(?:(?P<pro_del_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))del))|(?P<pro_dup>(?:(?P<pro_dup_start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<pro_dup_end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))dup)|(?:(?P<pro_dup_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))dup))|(?P<pro_ins>(?P<pro_ins_start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<pro_ins_end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))ins(?P<pro_ins_seq>(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)+))|(?P<pro_delins>(?:(?:(?P<pro_delins_start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<pro_delins_end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|G
ly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*)))|(?P<pro_delins_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*)))delins(?P<pro_delins_seq>(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)+))))'

Pattern matching any complete protein variant, including the prefix character.

Type:

str
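
As an illustration of how the named groups in this pattern can be read back, the sketch below builds a reduced stand-in that keeps the documented group names but covers only substitutions, frameshifts and single-position deletions. PRO_SINGLE and classify are hypothetical names for this example and are not part of the library.

```python
import re

AA = r"(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)"
POS = rf"(?:{AA}[1-9][0-9]*)"

# Reduced stand-in (assumption): same group names as documented, fewer event types.
PRO_SINGLE = re.compile(
    r"(?P<pro>p\.(?:"
    rf"(?P<pro_sub>(?P<pro_sub_position>{POS})(?P<pro_sub_new>{AA}))"
    rf"|(?P<pro_fs>(?P<pro_fs_position>{POS})fs)"
    rf"|(?P<pro_del>(?P<pro_del_position>{POS})del)"
    r"))"
)


def classify(variant: str):
    """Return (event group name, populated groups), or None if there is no match."""
    match = PRO_SINGLE.fullmatch(variant)
    if match is None:
        return None
    # Groups from alternatives that did not take part in the match are None.
    groups = {name: value for name, value in match.groupdict().items() if value is not None}
    kind = next(name for name in ("pro_sub", "pro_fs", "pro_del") if name in groups)
    return kind, groups


sub_kind, sub_groups = classify("p.Ala12Gly")
fs_kind, fs_groups = classify("p.Ala12fs")
no_prefix = classify("Ala12Gly")
```

Because every alternative carries its own prefixed group names, the event type can be inferred from which event-level group is populated.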

mavehgvs.patterns.protein.pro_sub: str = '(?P<pro_sub>(?P<position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))(?P<new>(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)))'

Pattern matching a protein substitution.

Type:

str
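
A short sketch of how the position and new groups of this pattern might be used. The pattern string is rebuilt here from the documented value rather than imported, and parse_sub is a hypothetical helper written for this example.

```python
import re

AA = r"(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)"

# Same structure as the documented pro_sub string, assembled from the AA alternation.
PRO_SUB = re.compile(rf"(?P<pro_sub>(?P<position>(?:{AA}[1-9][0-9]*))(?P<new>{AA}))")


def parse_sub(text: str):
    """Split a substitution into (reference amino acid, position number, new amino acid)."""
    match = PRO_SUB.fullmatch(text)
    if match is None:
        return None
    position = match.group("position")
    # Three-letter codes are fixed width, so the number starts at index 3.
    return position[:3], int(position[3:]), match.group("new")


parsed = parse_sub("Ala12Gly")
zero_position = parse_sub("Ala0Gly")    # position numbers cannot start with 0
with_prefix = parse_sub("p.Ala12Gly")   # this pattern does not include the 'p.' prefix
```

Note that the group names here are the short forms (position, new), whereas the combined variant patterns use the prefixed forms (pro_sub_position, pro_sub_new).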

mavehgvs.patterns.protein.pro_variant: str = '(?:(?P<pro_equal>(?:(?P<pro_equal_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))?(?P<pro_equal_equal>=))|(?P<pro_equal_equal_sy>\\(=\\)))|(?P<pro_sub>(?P<pro_sub_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))(?P<pro_sub_new>(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)))|(?P<pro_fs>(?P<pro_fs_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))fs)|(?P<pro_del>(?:(?P<pro_del_start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<pro_del_end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))del)|(?:(?P<pro_del_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))del))|(?P<pro_dup>(?:(?P<pro_dup_start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<pro_dup_end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))dup)|(?:(?P<pro_dup_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))dup))|(?P<pro_ins>(?P<pro_ins_start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<pro_ins_end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))ins(?P<pro_ins_seq>(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)+))|(?P<pro_delins>(?:(?:(?P<pro_delins_start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<pro_delins_end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|
Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*)))|(?P<pro_delins_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*)))delins(?P<pro_delins_seq>(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)+)))'

Pattern matching any single protein variant event.

Type:

str
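
Unlike pro_single_variant, this pattern does not include the p. prefix. The sketch below illustrates the difference using only the equality branch of the documented pattern; wrapping it in a pro group with the prefix is shown as one plausible way to compose the two, not as the library's own construction. PRO_EQUAL, EVENT and WITH_PREFIX are hypothetical names.

```python
import re

AA = r"(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)"

# Equality branch only (assumption: the other event types are omitted for brevity).
PRO_EQUAL = (
    rf"(?P<pro_equal>(?:(?P<pro_equal_position>(?:{AA}[1-9][0-9]*))?(?P<pro_equal_equal>=))"
    r"|(?P<pro_equal_equal_sy>\(=\)))"
)

EVENT = re.compile(PRO_EQUAL)                               # prefix-free event
WITH_PREFIX = re.compile(rf"(?P<pro>p\.(?:{PRO_EQUAL}))")   # event wrapped with 'p.'

bare = EVENT.fullmatch("Ala12=")
bare_with_prefix = EVENT.fullmatch("p.Ala12=")
target_identical = WITH_PREFIX.fullmatch("p.=")
synonymous = WITH_PREFIX.fullmatch("p.(=)")
```

Here bare matches and exposes pro_equal_position, while bare_with_prefix is None because the event pattern alone does not accept the prefix character.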