mavehgvs API documentation
Variant objects
Each variant can be parsed into a variant object, which populates and exposes named fields for each piece of the variant string.
- class mavehgvs.position.VariantPosition(pos_str: str)
Class for storing a variant position.
The class includes special fields for variants using the extended position syntax.
- position
The position as an integer. Negative positions are only expected for 5’ UTR positions.
- Type:
Optional[int]
- intronic_position
The number of bases into the intron for intronic positions. None for non-intronic positions.
Nucleotides in the 5’ half of the intron have positive intronic_position and their position is that of the last base of the 5’ exon. Nucleotides in the 3’ half of the intron have negative intronic_position and their position is that of the first base of the 3’ exon.
- Type:
Optional[int]
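As a rough illustration of how position and intronic_position relate to a position string, the sketch below splits strings such as 122+6 or 123-4 into the two integers. The helper split_position and its regular expression are hypothetical names written for this example and are not part of mavehgvs; 3’ UTR positions written with * are deliberately left out.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: an optionally negative base position followed by an
# optional signed intronic offset. This is not the library's own pattern.
_POSITION = re.compile(r"(?P<base>-?[1-9][0-9]*)(?P<offset>[+-][1-9][0-9]*)?")


def split_position(pos_str: str) -> Tuple[int, Optional[int]]:
    """Return (position, intronic_position) for a simple position string."""
    match = _POSITION.fullmatch(pos_str)
    if match is None:
        raise ValueError(f"not a recognised position: {pos_str!r}")
    base = int(match.group("base"))
    offset = match.group("offset")
    # The offset keeps its sign: +6 counts from the 5' exon, -4 from the 3' exon.
    return base, (int(offset) if offset is not None else None)


five_prime_half = split_position("122+6")
three_prime_half = split_position("123-4")
utr = split_position("-12")
```

Under these assumptions, 122+6 is anchored to the last base of the 5’ exon (122) and 123-4 to the first base of the 3’ exon (123), which mirrors the description above.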
- __eq__(other: VariantPosition) → bool
Equality comparison operator.
Note that the amino acid portion of a protein position is not used in this comparison.
Other comparison operators will be filled in using functools.total_ordering().
- Parameters:
other (VariantPosition) – The other VariantPosition to compare to.
- Returns:
True if this position is the same as the other position; else False.
- Return type:
bool
- __ge__(other, NotImplemented=NotImplemented)¶
Return a >= b. Computed by @total_ordering from (not a < b).
- __gt__(other, NotImplemented=NotImplemented)¶
Return a > b. Computed by @total_ordering from (not a < b) and (a != b).
- __hash__ = None¶
- __init__(pos_str: str) → None
Parse a position string into a VariantPosition object.
- Parameters:
pos_str (str) – The string to convert to a VariantPosition object.
- __le__(other, NotImplemented=NotImplemented)¶
Return a <= b. Computed by @total_ordering from (a < b) or (a == b).
- __lt__(other: VariantPosition) → bool
Less than comparison operator.
Other comparison operators will be filled in using functools.total_ordering().
- Parameters:
other (VariantPosition) – The other VariantPosition to compare to.
- Returns:
True if this position evaluates as strictly less than the other position; else False.
- Return type:
bool
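The way functools.total_ordering() derives the remaining comparison operators from __eq__ and __lt__ can be sketched with a small stand-in class. SimplePosition and its tuple-based ordering are assumptions made for this example only; the library's own comparison logic may differ.

```python
from functools import total_ordering


@total_ordering
class SimplePosition:
    """Hypothetical stand-in; not mavehgvs.position.VariantPosition."""

    def __init__(self, position: int, offset: int = 0) -> None:
        self.position = position
        self.offset = offset

    def _key(self):
        # Compare by base position first, then by offset.
        return (self.position, self.offset)

    def __eq__(self, other):
        if not isinstance(other, SimplePosition):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, SimplePosition):
            return NotImplemented
        return self._key() < other._key()


a = SimplePosition(10)
b = SimplePosition(10, 3)
# __le__, __gt__ and __ge__ were never written; total_ordering supplies them.
derived = (a <= b, b > a, b >= a, a != b)
```

Because the class defines __eq__ without defining __hash__, Python sets __hash__ to None, which matches the __hash__ = None entry listed for VariantPosition.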
- __ne__(other: VariantPosition) → bool
Not equal comparison operator.
Note that the amino acid portion of a protein position is not used in this comparison.
Other comparison operators will be filled in using functools.total_ordering().
- Parameters:
other (VariantPosition) – The other VariantPosition to compare to.
- Returns:
True if this position is not the same as the other position; else False.
- Return type:
bool
- __repr__() → str
The object representation is equivalent to the input string.
- Returns:
The object representation.
- Return type:
str
- __weakref__¶
list of weak references to the object (if defined)
- fullmatch(pos=0, endpos=9223372036854775807)
Callable[[str, int, int], Optional[Match[str]]]: fullmatch callable for parsing positions.
Returns an re.Match object if the full string matches one of the position groups in pos_extended.
- is_adjacent(other: VariantPosition) → bool
Return whether this variant and another are immediately adjacent in sequence space.
The following special cases are not handled correctly:
The special case involving the last variant in a transcript sequence and the first base in the 3’ UTR will be evaluated as not adjacent, as the object does not have sequence length information.
The special case involving the two middle bases in an intron where the numbering switches from positive with respect to the 5’ end of the intron to negative with respect to the 3’ end of the intron will be evaluated as not adjacent, as the object does not have intron length information.
This ignores the special case where there is an intron between the last base of the 5’ UTR and the first base of the coding sequence because it is not biologically relevant to the best of my knowledge.
- Parameters:
other (VariantPosition) – The object to calculate adjacency to.
- Returns:
True if the positions describe adjacent bases in sequence space; else False.
- Return type:
bool
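The adjacency rules and the unhandled special cases described for is_adjacent can be illustrated with a tuple-based sketch. The function adjacent_sketch and the (position, intronic_position) tuples are hypothetical and only approximate the documented behaviour; this is not the library's implementation.

```python
from typing import Optional, Tuple

# (position, intronic_position); intronic_position is None outside introns.
Pos = Tuple[int, Optional[int]]


def adjacent_sketch(a: Pos, b: Pos) -> bool:
    """Approximate adjacency test for two simple positions."""
    # Order the pair so that `first` is the upstream position.
    first, second = sorted([a, b], key=lambda p: (p[0], p[1] or 0))
    (pos1, off1), (pos2, off2) = first, second
    if pos1 == pos2:
        # Same anchoring exon base: neighbours differ by one intronic step.
        return (off2 or 0) - (off1 or 0) == 1
    if off1 is None and off2 is None:
        # Plain positions; there is no position 0, so -1 and 1 are neighbours.
        return pos2 - pos1 == 1 or (pos1, pos2) == (-1, 1)
    # Anything else would need intron or sequence length information.
    return False


exon_pair = adjacent_sketch((10, None), (11, None))
intron_entry = adjacent_sketch((10, None), (10, 1))
intron_middle = adjacent_sketch((10, 3), (11, -3))
```

The last call shows the second special case above: without the intron length, the two middle bases of an intron cannot be recognised as neighbours, so the sketch reports them as not adjacent.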
- is_extended() → bool
Return whether this position was described using the extended syntax.
- Returns:
True if the position was described using the extended syntax; else False.
- Return type:
bool
- is_intronic() → bool
Return whether this is an intronic position.
- Returns:
True if the object describes a position in an intron; else False.
- Return type:
bool
- exception mavehgvs.exceptions.MaveHgvsParseError¶
Exception to use when a MAVE-HGVS string is not valid.
Utility functions for handling variants
- mavehgvs.util.parse_variant_strings(variants: Iterable[str], targetseq: str | None = None, expected_prefix: str | None = None) → Tuple[List[Variant | None], List[str | None]]
Parse a list of MAVE-HGVS strings into Variant objects or error messages.
- Parameters:
variants (Iterable[str]) – Iterable of MAVE-HGVS strings to parse.
targetseq (Optional[str]) – If provided, all variants will be validated for agreement with this sequence. See the documentation for Variant for further details.
expected_prefix (Optional[str]) – If provided, all variants will be expected to have the same single-letter prefix. Variants that do not have this prefix will be treated as invalid.
- Returns:
Returns a pair of lists containing variants or error messages.
Both lists have the same length as the input list. The first list contains Variant objects if the string was successfully parsed; else None. The second list contains None if the string was successfully parsed; else the error message.
- Return type:
Tuple[List[Optional[Variant]], List[Optional[str]]]
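One way to consume the paired lists is to walk them in step with the input strings. The sketch below does not call mavehgvs; the parsed and errors lists are hand-written stand-ins with the documented shape, the error text is a placeholder, and partition_results is a hypothetical helper.

```python
from typing import List, Optional, Sequence, Tuple


def partition_results(
    inputs: Sequence[str],
    parsed: Sequence[Optional[object]],
    errors: Sequence[Optional[str]],
) -> Tuple[List[Tuple[str, object]], List[Tuple[str, Optional[str]]]]:
    """Split aligned result lists into valid and invalid entries."""
    valid = []
    invalid = []
    for text, variant, error in zip(inputs, parsed, errors):
        if variant is not None:
            valid.append((text, variant))
        else:
            # Per the documented contract, error is set when variant is None.
            invalid.append((text, error))
    return valid, invalid


inputs = ["c.12A>T", "c.12A>Z"]
parsed = ["<stand-in for a Variant>", None]  # placeholder, not a real Variant
errors = [None, "example error message"]     # placeholder message
valid, invalid = partition_results(inputs, parsed, errors)
```

Keeping both lists the same length as the input makes it possible to report errors against the original strings without tracking indices separately.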
Utility functions for regular expression patterns
Utility functions for working with mavehgvs regex pattern strings.
- mavehgvs.patterns.util.combine_patterns(patterns: Sequence[str], groupname: str | None = None) → str
Combine multiple pattern strings into a single pattern string.
Because multiple identical group names are not allowed in a pattern, the resulting object renames all named match groups such that they are prefixed with the first match group name in the pattern. For example, (?P<substitution>(?P<position>[1-9][0-9]*)... becomes (?P<substitution>(?P<substitution_position>[1-9][0-9]*)...
The function assumes that all input patterns are enclosed in parentheses.
- Parameters:
- Returns:
Pattern string that matches any of the input patterns. Match groups are renamed as described above to attempt to ensure uniqueness across the combined pattern.
- Return type:
str
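The renaming scheme described above can be approximated in a few lines. The function combine_sketch is a hypothetical re-implementation for illustration, not the library's code; in particular, wrapping the result in a non-capturing group when groupname is None is an assumption, chosen because documented values such as dna_variant_gmo begin with (?:. The two input patterns are copied from the documented dna_sub_gmo and dna_del_gmo values.

```python
import re
from typing import Optional, Sequence

dna_sub_gmo = r"(?P<dna_sub_gmo>(?P<position>[1-9][0-9]*)(?P<ref>[ACGT])>(?P<new>[ACGT]))"
dna_del_gmo = r"(?P<dna_del_gmo>(?:(?:(?P<start>[1-9][0-9]*)_(?P<end>[1-9][0-9]*))|(?P<position>[1-9][0-9]*))del)"


def combine_sketch(patterns: Sequence[str], groupname: Optional[str] = None) -> str:
    """Join patterns with '|' after prefixing inner group names."""
    renamed = []
    for pattern in patterns:
        names = re.findall(r"\(\?P<(\w+)>", pattern)
        first = names[0]  # the outermost group names the whole pattern
        for name in names[1:]:
            pattern = pattern.replace(f"(?P<{name}>", f"(?P<{first}_{name}>")
        renamed.append(pattern)
    joined = "|".join(renamed)
    if groupname is None:
        return f"(?:{joined})"
    return f"(?P<{groupname}>{joined})"


combined = combine_sketch([dna_sub_gmo, dna_del_gmo])
sub_match = re.fullmatch(combined, "12A>T")
del_match = re.fullmatch(combined, "3_5del")
```

Without the renaming step, re.compile would reject the joined pattern because position appears as a group name in both inputs.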
DNA pattern strings
- mavehgvs.patterns.dna.dna_del_c: str = '(?P<dna_del_c>(?:(?:(?P<start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)'¶
Pattern matching a DNA deletion with numeric, intronic, or UTR positions.
- Type:
str
- mavehgvs.patterns.dna.dna_del_gmo: str = '(?P<dna_del_gmo>(?:(?:(?P<start>[1-9][0-9]*)_(?P<end>[1-9][0-9]*))|(?P<position>[1-9][0-9]*))del)'¶
Pattern matching a DNA deletion with only numeric positions for genomic-style variants.
- Type:
str
- mavehgvs.patterns.dna.dna_del_n: str = '(?P<dna_del_n>(?:(?:(?P<start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)'¶
Pattern matching a DNA deletion with numeric or intron positions for non-coding variants.
- Type:
str
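The deletion patterns accept either a start_end range or a single position, and the named groups show which form matched. The sketch below rebuilds the documented dna_del_c value from a shared position fragment (the variable name pos_c is ours) and uses only the standard re module.

```python
import re

# Position fragment used throughout the coding (c.) patterns.
pos_c = r"[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?"
dna_del_c = rf"(?P<dna_del_c>(?:(?:(?P<start>{pos_c})_(?P<end>{pos_c}))|(?P<position>{pos_c}))del)"

# A range spanning an intron, and a single 3' UTR position.
range_match = re.fullmatch(dna_del_c, "122+6_123-4del")
single_match = re.fullmatch(dna_del_c, "*33del")

range_groups = (range_match.group("start"), range_match.group("end"), range_match.group("position"))
single_groups = (single_match.group("start"), single_match.group("end"), single_match.group("position"))
```

Note that these strings carry no c. prefix; the prefix is only part of the complete single-variant and multi-variant patterns.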
- mavehgvs.patterns.dna.dna_delins_c: str = '(?P<dna_delins_c>(?:(?:(?P<start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?P<seq>[ACGT]+))'¶
Pattern matching a DNA deletion-insertion with numeric, intronic, or UTR positions.
- Type:
str
- mavehgvs.patterns.dna.dna_delins_gmo: str = '(?P<dna_delins_gmo>(?:(?:(?P<start>[1-9][0-9]*)_(?P<end>[1-9][0-9]*))|(?P<position>[1-9][0-9]*))delins(?P<seq>[ACGT]+))'¶
Pattern matching a DNA deletion-insertion with only numeric positions for genomic-style variants.
- Type:
str
- mavehgvs.patterns.dna.dna_delins_n: str = '(?P<dna_delins_n>(?:(?:(?P<start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?P<seq>[ACGT]+))'¶
Pattern matching a DNA deletion-insertion with numeric or intron positions for non-coding variants.
- Type:
str
- mavehgvs.patterns.dna.dna_dup_c: str = '(?P<dna_dup_c>(?:(?:(?P<start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)'¶
Pattern matching a DNA duplication with numeric, intronic, or UTR positions.
- Type:
str
- mavehgvs.patterns.dna.dna_dup_gmo: str = '(?P<dna_dup_gmo>(?:(?:(?P<start>[1-9][0-9]*)_(?P<end>[1-9][0-9]*))|(?P<position>[1-9][0-9]*))dup)'¶
Pattern matching a DNA duplication with only numeric positions for genomic-style variants.
- Type:
str
- mavehgvs.patterns.dna.dna_dup_n: str = '(?P<dna_dup_n>(?:(?:(?P<start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)'¶
Pattern matching a DNA duplication with numeric or intron positions for non-coding variants.
- Type:
str
- mavehgvs.patterns.dna.dna_equal_c: str = '(?P<dna_equal_c>(?:(?:(?P<start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))?(?P<equal>=))'¶
Pattern matching DNA equality with numeric, intronic, or UTR positions.
- Type:
str
- mavehgvs.patterns.dna.dna_equal_gmo: str = '(?P<dna_equal_gmo>(?:(?:(?P<start>[1-9][0-9]*)_(?P<end>[1-9][0-9]*))|(?P<position>[1-9][0-9]*))?(?P<equal>=))'¶
Pattern matching DNA equality with only numeric positions for genomic-style variants.
- Type:
str
- mavehgvs.patterns.dna.dna_equal_n: str = '(?P<dna_equal_n>(?P<equal>=))'¶
Pattern matching DNA equality with no position support.
- Type:
str
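The difference between the equality patterns is easiest to see by matching a few strings. In this sketch the two pattern strings are copied from the documented dna_equal_gmo and dna_equal_n values; only the standard re module is used.

```python
import re

dna_equal_gmo = r"(?P<dna_equal_gmo>(?:(?:(?P<start>[1-9][0-9]*)_(?P<end>[1-9][0-9]*))|(?P<position>[1-9][0-9]*))?(?P<equal>=))"
dna_equal_n = r"(?P<dna_equal_n>(?P<equal>=))"

# The position part is optional in the genomic-style pattern.
bare = re.fullmatch(dna_equal_gmo, "=")
single = re.fullmatch(dna_equal_gmo, "12=")
ranged = re.fullmatch(dna_equal_gmo, "3_5=")

# The non-coding pattern has no position support at all.
n_bare = re.fullmatch(dna_equal_n, "=")
n_positioned = re.fullmatch(dna_equal_n, "12=")
```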
- mavehgvs.patterns.dna.dna_ins_c: str = '(?P<dna_ins_c>(?P<start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?P<seq>[ACGT]+))'¶
Pattern matching a DNA insertion with numeric, intronic, or UTR positions.
- Type:
str
- mavehgvs.patterns.dna.dna_ins_gmo: str = '(?P<dna_ins_gmo>(?P<start>[1-9][0-9]*)_(?P<end>[1-9][0-9]*)ins(?P<seq>[ACGT]+))'¶
Pattern matching a DNA insertion with only numeric positions for genomic-style variants.
- Type:
str
- mavehgvs.patterns.dna.dna_ins_n: str = '(?P<dna_ins_n>(?P<start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?P<seq>[ACGT]+))'¶
Pattern matching a DNA insertion with numeric or intron positions for non-coding variants.
- Type:
str
- mavehgvs.patterns.dna.dna_multi_variant: str = '(?P<dna_c_multi>c\\.\\[(?:(?:(?:(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))?(?:=))|(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?:[ACGT])>(?:[ACGT]))|(?:(?:(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?:(?:(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?:[ACGT]+))|(?:(?:(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?:[ACGT]+)))(?:;(?:(?:(?:(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))?(?:=))|(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?:[ACGT])>(?:[ACGT]))|(?:(?:(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?:(?:(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?:[ACGT]+))|(?:(?:(?:(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?:[ACGT]+)))){1,}\\])|(?P<dna_n_multi>n\\.\\[(?:(?:(?:=))|(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?:[ACGT])>(?:[ACGT]))|(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?:[ACGT]+))|(?:(
?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?:[ACGT]+)))(?:;(?:(?:(?:=))|(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?:[ACGT])>(?:[ACGT]))|(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?:[ACGT]+))|(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?:[ACGT]+)))){1,}\\])| (?P<dna_gmo_multi>[gmo]\\.\\[(?:(?:(?:(?:(?:[1-9][0-9]*)_(?:[1-9][0-9]*))|(?:[1-9][0-9]*))?(?:=))|(?:(?:[1-9][0-9]*)(?:[ACGT])>(?:[ACGT]))|(?:(?:(?:(?:[1-9][0-9]*)_(?:[1-9][0-9]*))|(?:[1-9][0-9]*))del)|(?:(?:(?:(?:[1-9][0-9]*)_(?:[1-9][0-9]*))|(?:[1-9][0-9]*))dup)|(?:(?:[1-9][0-9]*)_(?:[1-9][0-9]*)ins(?:[ACGT]+))|(?:(?:(?:(?:[1-9][0-9]*)_(?:[1-9][0-9]*))|(?:[1-9][0-9]*))delins(?:[ACGT]+)))(?:;(?:(?:(?:(?:(?:[1-9][0-9]*)_(?:[1-9][0-9]*))|(?:[1-9][0-9]*))?(?:=))|(?:(?:[1-9][0-9]*)(?:[ACGT])>(?:[ACGT]))|(?:(?:(?:(?:[1-9][0-9]*)_(?:[1-9][0-9]*))|(?:[1-9][0-9]*))del)|(?:(?:(?:(?:[1-9][0-9]*)_(?:[1-9][0-9]*))|(?:[1-9][0-9]*))dup)|(?:(?:[1-9][0-9]*)_(?:[1-9][0-9]*)ins(?:[ACGT]+))|(?:(?:(?:(?:[1-9][0-9]*)_(?:[1-9][0-9]*))|(?:[1-9][0-9]*))delins(?:[ACGT]+)))){1,})\\]'¶
Pattern matching any complete DNA multi-variant, including the prefix character.
Named capture groups have been removed from the variant patterns because of non-uniqueness. Another application of the single-variant regular expressions is needed to recover the named groups from each individual variant in the multi-variant.
- Type:
str
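Recovering named groups from a multi-variant string can be sketched as a second pass over its individual events. The example below is a simplified assumption: it does not use dna_multi_variant itself, it handles only genomic-style substitutions and deletions (patterns copied from the documented dna_sub_gmo and dna_del_gmo values), and recover_groups is a hypothetical helper.

```python
import re
from typing import Dict, List

dna_sub_gmo = r"(?P<dna_sub_gmo>(?P<position>[1-9][0-9]*)(?P<ref>[ACGT])>(?P<new>[ACGT]))"
dna_del_gmo = r"(?P<dna_del_gmo>(?:(?:(?P<start>[1-9][0-9]*)_(?P<end>[1-9][0-9]*))|(?P<position>[1-9][0-9]*))del)"
SINGLE_PATTERNS = [re.compile(p) for p in (dna_sub_gmo, dna_del_gmo)]


def recover_groups(multi: str) -> List[Dict[str, str]]:
    """Re-apply single-variant patterns to each event of a multi-variant."""
    outer = re.fullmatch(r"[gmo]\.\[(?P<body>[^\[\]]+)\]", multi)
    if outer is None:
        raise ValueError(f"not a bracketed multi-variant: {multi!r}")
    events = []
    for part in outer.group("body").split(";"):
        for pattern in SINGLE_PATTERNS:
            match = pattern.fullmatch(part)
            if match is not None:
                # Keep only the groups that took part in the match.
                events.append({k: v for k, v in match.groupdict().items() if v is not None})
                break
        else:
            raise ValueError(f"unrecognised event: {part!r}")
    return events


events = recover_groups("g.[12A>T;34_36del]")
```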
- mavehgvs.patterns.dna.dna_nt: str = '[ACGT]'¶
Pattern matching any uppercase DNA base.
This does not include IUPAC ambiguity characters.
- Type:
str
- mavehgvs.patterns.dna.dna_single_variant: str = '(?P<dna_c>c\\.(?:(?P<dna_equal_c>(?:(?:(?P<dna_equal_c_start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_equal_c_end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_equal_c_position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))?(?P<dna_equal_c_equal>=))|(?P<dna_sub_c>(?P<dna_sub_c_position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?P<dna_sub_c_ref>[ACGT])>(?P<dna_sub_c_new>[ACGT]))|(?P<dna_del_c>(?:(?:(?P<dna_del_c_start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_del_c_end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_del_c_position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?P<dna_dup_c>(?:(?:(?P<dna_dup_c_start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_dup_c_end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_dup_c_position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?P<dna_ins_c>(?P<dna_ins_c_start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_ins_c_end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?P<dna_ins_c_seq>[ACGT]+))|(?P<dna_delins_c>(?:(?:(?P<dna_delins_c_start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_delins_c_end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_delins_c_position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?P<dna_delins_c_seq>[ACGT]+))))|(?P<dna_n>n\\.(?:(?P<dna_equal_n>(?P<dna_equal_n_equal>=))|(?P<dna_sub_n>(?P<dna_sub_n_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?P<dna_sub_n_ref>[ACGT])>(?P<dna_sub_n_new>[ACGT]))|(?P<dna_del_n>(?:(?:(?P<dna_del_n_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_del_n_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_del_n_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?P<dna_dup_n>(?:(?:(?P<dna_dup_n_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_dup_n_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_dup_n_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?P<dna_ins_n>(?P<dna_ins_n_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_ins_n_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?P<dna_ins_n_seq>[ACGT]+))|(?P<dna_delins_n>(?:(?:(?P<dna_delins_n_start>[1-9][0-9]*(?:[+-][1
-9][0-9]*)?)_(?P<dna_delins_n_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_delins_n_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?P<dna_delins_n_seq>[ACGT]+))))|(?P<dna_gmo>[gmo]\\.(?:(?P<dna_equal_gmo>(?:(?:(?P<dna_equal_gmo_start>[1-9][0-9]*)_(?P<dna_equal_gmo_end>[1-9][0-9]*))|(?P<dna_equal_gmo_position>[1-9][0-9]*))?(?P<dna_equal_gmo_equal>=))|(?P<dna_sub_gmo>(?P<dna_sub_gmo_position>[1-9][0-9]*)(?P<dna_sub_gmo_ref>[ACGT])>(?P<dna_sub_gmo_new>[ACGT]))|(?P<dna_del_gmo>(?:(?:(?P<dna_del_gmo_start>[1-9][0-9]*)_(?P<dna_del_gmo_end>[1-9][0-9]*))|(?P<dna_del_gmo_position>[1-9][0-9]*))del)|(?P<dna_dup_gmo>(?:(?:(?P<dna_dup_gmo_start>[1-9][0-9]*)_(?P<dna_dup_gmo_end>[1-9][0-9]*))|(?P<dna_dup_gmo_position>[1-9][0-9]*))dup)|(?P<dna_ins_gmo>(?P<dna_ins_gmo_start>[1-9][0-9]*)_(?P<dna_ins_gmo_end>[1-9][0-9]*)ins(?P<dna_ins_gmo_seq>[ACGT]+))|(?P<dna_delins_gmo>(?:(?:(?P<dna_delins_gmo_start>[1-9][0-9]*)_(?P<dna_delins_gmo_end>[1-9][0-9]*))|(?P<dna_delins_gmo_position>[1-9][0-9]*))delins(?P<dna_delins_gmo_seq>[ACGT]+))))'¶
Pattern matching any complete single DNA variant, including the prefix character.
- Type:
str
- mavehgvs.patterns.dna.dna_sub_c: str = '(?P<dna_sub_c>(?P<position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?P<ref>[ACGT])>(?P<new>[ACGT]))'¶
Pattern matching a DNA substitution with numeric, intronic, or UTR positions.
- Type:
str
- mavehgvs.patterns.dna.dna_sub_gmo: str = '(?P<dna_sub_gmo>(?P<position>[1-9][0-9]*)(?P<ref>[ACGT])>(?P<new>[ACGT]))'¶
Pattern matching a DNA substitution with only numeric positions for genomic-style variants.
- Type:
str
- mavehgvs.patterns.dna.dna_sub_n: str = '(?P<dna_sub_n>(?P<position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?P<ref>[ACGT])>(?P<new>[ACGT]))'¶
Pattern matching a DNA substitution with numeric or intron positions for non-coding variants.
- Type:
str
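The three substitution patterns differ only in which position syntax they accept. The sketch below rebuilds the documented dna_sub_c, dna_sub_n, and dna_sub_gmo values from their position fragments (the fragment variable names are ours) and checks a few sample strings against each.

```python
import re

pos_c = r"[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?"  # numeric, intronic, or UTR
pos_n = r"[1-9][0-9]*(?:[+-][1-9][0-9]*)?"       # numeric or intronic
pos_gmo = r"[1-9][0-9]*"                         # numeric only

patterns = {
    "c": rf"(?P<dna_sub_c>(?P<position>{pos_c})(?P<ref>[ACGT])>(?P<new>[ACGT]))",
    "n": rf"(?P<dna_sub_n>(?P<position>{pos_n})(?P<ref>[ACGT])>(?P<new>[ACGT]))",
    "gmo": rf"(?P<dna_sub_gmo>(?P<position>{pos_gmo})(?P<ref>[ACGT])>(?P<new>[ACGT]))",
}

samples = ["12A>T", "122+6G>A", "-12A>G", "*33C>T"]

# For each pattern, collect the sample strings it accepts in full.
accepted = {
    name: [s for s in samples if re.fullmatch(pattern, s) is not None]
    for name, pattern in patterns.items()
}
```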
- mavehgvs.patterns.dna.dna_variant_c: str = '(?:(?P<dna_equal_c>(?:(?:(?P<dna_equal_c_start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_equal_c_end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_equal_c_position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))?(?P<dna_equal_c_equal>=))|(?P<dna_sub_c>(?P<dna_sub_c_position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?P<dna_sub_c_ref>[ACGT])>(?P<dna_sub_c_new>[ACGT]))|(?P<dna_del_c>(?:(?:(?P<dna_del_c_start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_del_c_end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_del_c_position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?P<dna_dup_c>(?:(?:(?P<dna_dup_c_start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_dup_c_end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_dup_c_position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?P<dna_ins_c>(?P<dna_ins_c_start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_ins_c_end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?P<dna_ins_c_seq>[ACGT]+))|(?P<dna_delins_c>(?:(?:(?P<dna_delins_c_start>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_delins_c_end>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_delins_c_position>[*-]?[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?P<dna_delins_c_seq>[ACGT]+)))'¶
Pattern matching any of the coding DNA variants.
- Type:
str
- mavehgvs.patterns.dna.dna_variant_gmo: str = '(?:(?P<dna_equal_gmo>(?:(?:(?P<dna_equal_gmo_start>[1-9][0-9]*)_(?P<dna_equal_gmo_end>[1-9][0-9]*))|(?P<dna_equal_gmo_position>[1-9][0-9]*))?(?P<dna_equal_gmo_equal>=))|(?P<dna_sub_gmo>(?P<dna_sub_gmo_position>[1-9][0-9]*)(?P<dna_sub_gmo_ref>[ACGT])>(?P<dna_sub_gmo_new>[ACGT]))|(?P<dna_del_gmo>(?:(?:(?P<dna_del_gmo_start>[1-9][0-9]*)_(?P<dna_del_gmo_end>[1-9][0-9]*))|(?P<dna_del_gmo_position>[1-9][0-9]*))del)|(?P<dna_dup_gmo>(?:(?:(?P<dna_dup_gmo_start>[1-9][0-9]*)_(?P<dna_dup_gmo_end>[1-9][0-9]*))|(?P<dna_dup_gmo_position>[1-9][0-9]*))dup)|(?P<dna_ins_gmo>(?P<dna_ins_gmo_start>[1-9][0-9]*)_(?P<dna_ins_gmo_end>[1-9][0-9]*)ins(?P<dna_ins_gmo_seq>[ACGT]+))|(?P<dna_delins_gmo>(?:(?:(?P<dna_delins_gmo_start>[1-9][0-9]*)_(?P<dna_delins_gmo_end>[1-9][0-9]*))|(?P<dna_delins_gmo_position>[1-9][0-9]*))delins(?P<dna_delins_gmo_seq>[ACGT]+)))'¶
Pattern matching any of the genomic-style DNA variants.
- Type:
str
- mavehgvs.patterns.dna.dna_variant_n: str = '(?:(?P<dna_equal_n>(?P<dna_equal_n_equal>=))|(?P<dna_sub_n>(?P<dna_sub_n_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?P<dna_sub_n_ref>[ACGT])>(?P<dna_sub_n_new>[ACGT]))|(?P<dna_del_n>(?:(?:(?P<dna_del_n_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_del_n_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_del_n_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?P<dna_dup_n>(?:(?:(?P<dna_dup_n_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_dup_n_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_dup_n_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?P<dna_ins_n>(?P<dna_ins_n_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_ins_n_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?P<dna_ins_n_seq>[ACGT]+))|(?P<dna_delins_n>(?:(?:(?P<dna_delins_n_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<dna_delins_n_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<dna_delins_n_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?P<dna_delins_n_seq>[ACGT]+)))'¶
Pattern matching any of the non-coding DNA variants.
- Type:
str
RNA pattern strings
- mavehgvs.patterns.rna.rna_del: str = '(?P<rna_del>(?:(?:(?P<start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)'¶
Pattern matching an RNA deletion with numeric or relative-to-transcript positions.
- Type:
str
- mavehgvs.patterns.rna.rna_delins: str = '(?P<rna_delins>(?:(?:(?P<start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?P<seq>[acgu]+))'¶
Pattern matching an RNA deletion-insertion with numeric or relative-to-transcript positions.
- Type:
str
- mavehgvs.patterns.rna.rna_dup: str = '(?P<rna_dup>(?:(?:(?P<start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)'¶
Pattern matching an RNA duplication with numeric or relative-to-transcript positions.
- Type:
str
- mavehgvs.patterns.rna.rna_equal: str = '(?P<rna_equal>(?:(?:(?P<start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))?(?P<equal>=))'¶
Pattern matching RNA equality with numeric or relative-to-transcript positions.
- Type:
str
- mavehgvs.patterns.rna.rna_ins: str = '(?P<rna_ins>(?P<start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?P<seq>[acgu]+))'¶
Pattern matching an RNA insertion with numeric or relative-to-transcript positions.
- Type:
str
- mavehgvs.patterns.rna.rna_multi_variant: str = '(?P<rna_multi>r\\.\\[(?:(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))?(?:=))|(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?:[acgu])>(?:[acgu]))|(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?:[acgu]+))|(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?:[acgu]+)))(?:;(?:(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))?(?:=))|(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?:[acgu])>(?:[acgu]))|(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?:[acgu]+))|(?:(?:(?:(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?:[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?:[acgu]+)))){1,}\\])'¶
Pattern matching any complete RNA multi-variant, including the prefix character.
Named capture groups have been removed from the variant patterns because of non-uniqueness. Another application of the single-variant regular expressions is needed to recover the named groups from each individual variant in the multi-variant.
- Type:
str
- mavehgvs.patterns.rna.rna_nt: str = '[acgu]'¶
Pattern matching any lowercase RNA base.
This does not include IUPAC ambiguity characters.
- Type:
str
- mavehgvs.patterns.rna.rna_single_variant: str = '(?P<rna>r\\.(?:(?P<rna_equal>(?:(?:(?P<rna_equal_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<rna_equal_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<rna_equal_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))?(?P<rna_equal_equal>=))|(?P<rna_sub>(?P<rna_sub_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?P<rna_sub_ref>[acgu])>(?P<rna_sub_new>[acgu]))|(?P<rna_del>(?:(?:(?P<rna_del_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<rna_del_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<rna_del_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?P<rna_dup>(?:(?:(?P<rna_dup_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<rna_dup_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<rna_dup_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?P<rna_ins>(?P<rna_ins_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<rna_ins_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?P<rna_ins_seq>[acgu]+))|(?P<rna_delins>(?:(?:(?P<rna_delins_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<rna_delins_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<rna_delins_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?P<rna_delins_seq>[acgu]+))))'¶
Pattern matching any complete RNA variant, including the prefix character.
- Type:
str
- mavehgvs.patterns.rna.rna_sub: str = '(?P<rna_sub>(?P<position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?P<ref>[acgu])>(?P<new>[acgu]))'¶
Pattern matching an RNA substitution with numeric or relative-to-transcript positions.
- Type:
str
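RNA patterns use lowercase bases with u in place of t, so DNA-style strings do not match. A short check with the standard re module, using the documented rna_sub value copied verbatim:

```python
import re

rna_sub = r"(?P<rna_sub>(?P<position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?P<ref>[acgu])>(?P<new>[acgu]))"

lower = re.fullmatch(rna_sub, "22g>u")
upper = re.fullmatch(rna_sub, "22G>U")    # uppercase bases are rejected
thymine = re.fullmatch(rna_sub, "22g>t")  # t is not an RNA base here

lower_groups = (lower.group("position"), lower.group("ref"), lower.group("new"))
```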
- mavehgvs.patterns.rna.rna_variant: str = '(?:(?P<rna_equal>(?:(?:(?P<rna_equal_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<rna_equal_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<rna_equal_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))?(?P<rna_equal_equal>=))|(?P<rna_sub>(?P<rna_sub_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)(?P<rna_sub_ref>[acgu])>(?P<rna_sub_new>[acgu]))|(?P<rna_del>(?:(?:(?P<rna_del_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<rna_del_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<rna_del_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))del)|(?P<rna_dup>(?:(?:(?P<rna_dup_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<rna_dup_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<rna_dup_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))dup)|(?P<rna_ins>(?P<rna_ins_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<rna_ins_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)ins(?P<rna_ins_seq>[acgu]+))|(?P<rna_delins>(?:(?:(?P<rna_delins_start>[1-9][0-9]*(?:[+-][1-9][0-9]*)?)_(?P<rna_delins_end>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))|(?P<rna_delins_position>[1-9][0-9]*(?:[+-][1-9][0-9]*)?))delins(?P<rna_delins_seq>[acgu]+)))'¶
Pattern matching any single RNA variant event.
- Type:
str
Protein pattern strings
- mavehgvs.patterns.protein.aa_pos: str = '(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*)'¶
Pattern matching an amino acid code followed by a position.
- Type:
str
- mavehgvs.patterns.protein.amino_acid: str = '(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)'¶
Pattern matching any amino acid or Ter.
This does not include ambiguous amino acids such as Glx and Xaa.
- Type:
str
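The amino_acid and aa_pos patterns accept three-letter codes only. The sketch below copies the documented amino_acid value and rebuilds aa_pos from it, then checks which strings match in full.

```python
import re

amino_acid = r"(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)"
aa_pos = rf"(?:{amino_acid}[1-9][0-9]*)"

candidates = ["Leu12", "Ter101", "Glx12", "L12", "Leu0"]

# Keep only the candidates that aa_pos accepts in full.
matched = [c for c in candidates if re.fullmatch(aa_pos, c) is not None]
```

Glx12 is rejected because ambiguous codes are excluded, L12 because single-letter codes are not part of the pattern, and Leu0 because positions must start with a non-zero digit.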
- mavehgvs.patterns.protein.pro_del: str = '(?P<pro_del>(?:(?P<start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))del)|(?:(?P<position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))del))'¶
Pattern matching a protein deletion.
- Type:
str
- mavehgvs.patterns.protein.pro_delins: str = '(?P<pro_delins>(?:(?:(?P<start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*)))|(?P<position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*)))delins(?P<seq>(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)+))'¶
Pattern matching a protein deletion-insertion.
- Type:
str
- mavehgvs.patterns.protein.pro_dup: str = '(?P<pro_dup>(?:(?P<start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))dup)|(?:(?P<position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))dup))'¶
Pattern matching a protein duplication.
- Type:
str
- mavehgvs.patterns.protein.pro_equal: str = '(?P<pro_equal>(?:(?P<position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))?(?P<equal>=))|(?P<equal_sy>\\(=\\)))'¶
Pattern matching protein equality or synonymous variant.
- Type:
str
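The pro_equal pattern has two branches: an optional position followed by =, or the literal (=) captured as equal_sy. The sketch below rebuilds the documented value from the amino acid fragment and inspects which groups are populated for each form.

```python
import re

amino_acid = r"(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)"
aa_pos = rf"(?:{amino_acid}[1-9][0-9]*)"
pro_equal = rf"(?P<pro_equal>(?:(?P<position>{aa_pos})?(?P<equal>=))|(?P<equal_sy>\(=\)))"


def populated(text: str) -> dict:
    """Return the named groups that took part in a full match."""
    match = re.fullmatch(pro_equal, text)
    if match is None:
        return {}
    return {k: v for k, v in match.groupdict().items() if v is not None}


bare = populated("=")
positioned = populated("Cys22=")
synonymous = populated("(=)")
```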
- mavehgvs.patterns.protein.pro_fs: str = '(?P<pro_fs>(?P<position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))fs)'¶
Pattern matching a protein frame shift.
- Type:
str
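The pro_fs pattern consists of an amino acid position followed by the literal fs. A short check with the standard re module, rebuilding the documented value from the amino acid fragment:

```python
import re

amino_acid = r"(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)"
aa_pos = rf"(?:{amino_acid}[1-9][0-9]*)"
pro_fs = rf"(?P<pro_fs>(?P<position>{aa_pos})fs)"

frame_shift = re.fullmatch(pro_fs, "Glu27fs")
# Anything after "fs" is not covered by this pattern string.
extended = re.fullmatch(pro_fs, "Glu27fsTer5")

fs_position = frame_shift.group("position")
```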
- mavehgvs.patterns.protein.pro_ins: str = '(?P<pro_ins>(?P<start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))ins(?P<seq>(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)+))'¶
Pattern matching a protein insertion.
- Type:
str
- mavehgvs.patterns.protein.pro_multi_variant: str = '(?P<pro_multi>p\\.\\[(?:(?:(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))?(?:=))|(?:\\(=\\)))|(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)))|(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))fs)|(?:(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))del)|(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))del))|(?:(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))dup)|(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))dup))|(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))ins(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)+))|(?:(?:(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*)))|(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*)))delins(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)+)))(?:;(?:(?:(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gl
y|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))?(?:=))|(?:\\(=\\)))|(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)))|(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))fs)|(?:(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))del)|(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))del))|(?:(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))dup)|(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))dup))|(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))ins(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)+))|(?:(?:(?:(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*)))|(?:(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*)))delins(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)+)))){1,}\\])'¶
Pattern matching any complete protein multi-variant, including the prefix character.
Named capture groups have been removed from the variant patterns because the group names would not be unique. A further application of the single-variant regular expressions is needed to recover the named groups from each individual variant in the multi-variant.
- Type:
str
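The two-step approach described above (validate the whole multi-variant, then re-match each event to recover its named groups) can be sketched as follows. This is a minimal illustration, not the library's implementation: `AA`, `POS`, `EVENT_NAMED`, `EVENT_ANON`, `MULTI`, `SINGLE`, and `split_multi_variant` are hypothetical names, and the patterns are reduced stand-ins that cover only substitutions and frameshifts rather than the full grammar shown in the entry.

```python
import re

# Simplified stand-ins for the library patterns (assumption: only
# substitutions and frameshifts are covered, not the full grammar).
AA = (
    r"(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser"
    r"|Thr|Trp|Tyr|Val|Ter)"
)
POS = rf"(?:{AA}[1-9][0-9]*)"

# One variant event with named groups, following the naming style above.
EVENT_NAMED = (
    rf"(?:(?P<pro_sub>(?P<pro_sub_position>{POS})(?P<pro_sub_new>{AA}))"
    rf"|(?P<pro_fs>(?P<pro_fs_position>{POS})fs))"
)

# The same event with group names stripped, so it can appear more than once
# in a single pattern (Python rejects a repeated group name).
EVENT_ANON = re.sub(r"\(\?P<\w+>", "(?:", EVENT_NAMED)

MULTI = re.compile(rf"p\.\[{EVENT_ANON}(?:;{EVENT_ANON}){{1,}}\]")
SINGLE = re.compile(EVENT_NAMED)


def split_multi_variant(variant: str) -> list[dict[str, str]]:
    """Validate a multi-variant, then re-match each event for named groups."""
    if MULTI.fullmatch(variant) is None:
        raise ValueError(f"not a protein multi-variant: {variant!r}")
    events = variant[3:-1].split(";")  # drop the leading 'p.[' and trailing ']'
    results = []
    for event in events:
        match = SINGLE.fullmatch(event)
        # Keep only the groups that took part in this particular match.
        results.append(
            {name: text for name, text in match.groupdict().items() if text is not None}
        )
    return results
```

Calling `split_multi_variant("p.[Gly12Val;Arg97fs]")` returns one dictionary per event, each holding only the groups relevant to that event type.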
- mavehgvs.patterns.protein.pro_single_variant: str = '(?P<pro>p\\.(?:(?P<pro_equal>(?:(?P<pro_equal_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))?(?P<pro_equal_equal>=))|(?P<pro_equal_equal_sy>\\(=\\)))|(?P<pro_sub>(?P<pro_sub_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))(?P<pro_sub_new>(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)))|(?P<pro_fs>(?P<pro_fs_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))fs)|(?P<pro_del>(?:(?P<pro_del_start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<pro_del_end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))del)|(?:(?P<pro_del_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))del))|(?P<pro_dup>(?:(?P<pro_dup_start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<pro_dup_end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))dup)|(?:(?P<pro_dup_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))dup))|(?P<pro_ins>(?P<pro_ins_start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<pro_ins_end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))ins(?P<pro_ins_seq>(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)+))|(?P<pro_delins>(?:(?:(?P<pro_delins_start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<pro_delins_end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu
|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*)))|(?P<pro_delins_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*)))delins(?P<pro_delins_seq>(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)+))))'¶
Pattern matching any complete protein variant, including the prefix character.
- Type:
str
- mavehgvs.patterns.protein.pro_sub: str = '(?P<pro_sub>(?P<position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))(?P<new>(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)))'¶
Pattern matching a protein substitution.
- Type:
str
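As a small illustration of how the named groups in this pattern can be read, the sketch below copies the pattern string from the entry above into a local constant and matches it with the standard `re` module. `PRO_SUB` and `parse_substitution` are hypothetical names used only for this example. Because the pattern carries no `p.` prefix, it matches the bare substitution text.

```python
import re

# Pattern string copied from the pro_sub entry above.
PRO_SUB = (
    "(?P<pro_sub>(?P<position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys"
    "|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))"
    "(?P<new>(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys"
    "|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)))"
)


def parse_substitution(text: str):
    """Return (position, new) for a bare substitution, or None if no match."""
    match = re.fullmatch(PRO_SUB, text)
    if match is None:
        return None
    return match.group("position"), match.group("new")
```

Here `position` holds the reference amino acid together with its number (for example `Gly12`), and `new` holds the replacement amino acid.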
- mavehgvs.patterns.protein.pro_variant: str = '(?:(?P<pro_equal>(?:(?P<pro_equal_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))?(?P<pro_equal_equal>=))|(?P<pro_equal_equal_sy>\\(=\\)))|(?P<pro_sub>(?P<pro_sub_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))(?P<pro_sub_new>(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)))|(?P<pro_fs>(?P<pro_fs_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))fs)|(?P<pro_del>(?:(?P<pro_del_start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<pro_del_end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))del)|(?:(?P<pro_del_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))del))|(?P<pro_dup>(?:(?P<pro_dup_start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<pro_dup_end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))dup)|(?:(?P<pro_dup_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))dup))|(?P<pro_ins>(?P<pro_ins_start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<pro_ins_end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))ins(?P<pro_ins_seq>(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)+))|(?P<pro_delins>(?:(?:(?P<pro_delins_start>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*))_(?P<pro_delins_end>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Ly
s|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*)))|(?P<pro_delins_position>(?:(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)[1-9][0-9]*)))delins(?P<pro_delins_seq>(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val|Ter)+)))'¶
Pattern matching any single protein variant event.
- Type:
str