Draft:Multiplexed Assays of Variant Effect

From Wikipedia, the free encyclopedia

Multiplexed Assays of Variant Effect (MAVEs) are used to determine sequence-function relationships, identify variants that alter protein structure, stability, or activity, and characterize regulatory elements^[1]^[2]^[3]^[4]. MAVEs have become important in clinical genetics, particularly for reclassifying variants of uncertain significance (VUSs) in human disease genes. A growing number of guidelines on the use of functional evidence, including data generated by MAVEs, have been published in recent years^[5]^[6]. Beyond clinical genetics, MAVEs have applications in cancer research, medicine, and functional genomics, and can be used to study both protein-coding and non-coding regions^[7]. A typical MAVE workflow involves generating a library of variants in a gene or regulatory element of interest and introducing it into an experimental model system^[8]. The model is then subjected to a phenotypic selection, and high-throughput DNA sequencing is used to quantify the frequency of each variant before and after selection. The change in frequency is used to calculate a functional score for each variant^[9].

Submitted 24 days ago by Jessikavalera · Last edited 8 days ago by Jessikavalera

Comment: needs references , notable per https://www.nature.com/articles/s41576-025-00870-x , thanks Ozzie10aaaa (talk) 16:40, 22 February 2026 (UTC)

Background

The MAVE workflow addresses the challenge of linking genetic variants (genotype) with their phenotypic consequences^[7]. This is difficult because a single gene can carry a wide range of variants that differ between individuals^[1]. While some variants can be classified as pathogenic based on clinical data, many others cannot^[10]. Variants without a clear classification are termed variants of uncertain significance (VUSs). VUSs often arise because there is too little evidence, or too few affected individuals, to classify them^[10]. Computational genotype-to-phenotype predictions are not yet reliable enough to provide the causal understanding needed to resolve VUSs^[1]. Historically, researchers have relied on clinical data, rather than genotype-to-phenotype predictors, to determine the pathogenicity of a variant^[7]^[1].

However, recent technological advances have allowed the MAVE workflow to become established^[7]. MAVEs provide functional data that can be used alongside clinical or familial data to predict variant significance^[1]. In addition to helping interpret VUSs, MAVE evidence has been used to determine the effects of cancer mutations, understand non-coding regulatory elements, and identify vulnerable regions of proteins as drug targets^[2]^[3]^[8]^[11].

Workflow

Select a model system

Selecting an appropriate model system for a MAVE means choosing an experimental model that supports high-throughput functional screening of genetic variants. A wide variety of models have been used, including phage, bacteria, yeast, and human cell lines^[7]. Ideally, the system should accurately reflect the biological processes of the organism of interest and allow robust, quantitative measurement of variant function^[12].

Integrate the variant library into the model system

After a model system is selected, a large library of genetic variants is introduced into it such that each cell expresses a single variant, maintaining the link between sequence and phenotype. The library is generated by introducing mutations across the target gene, protein, or regulatory element (e.g., a promoter, enhancer, or splice site)^[8]. It can be constructed either by random mutagenesis, such as error-prone PCR, or from synthesized oligonucleotide pools programmed with the mutations of interest^[8]. Common approaches for introducing the library into the model system include lentiviral transduction and recombinase-based "landing pad" systems^[7]^[8].
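To give a sense of library size, the sketch below enumerates every single amino-acid substitution in a protein, which is roughly the variant set a site-saturation library aims to cover. It is a minimal illustration, assuming the 20 standard amino acids and a hypothetical `single_substitutions` helper that names variants in an 'M1A' style (wild-type residue, 1-based position, substituted residue); real library designs are specified at the codon or oligonucleotide level and depend on the synthesis platform.

```python
# Hypothetical helper for illustration: list all single amino-acid
# substitutions of a protein, as a site-saturation library might target.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_substitutions(protein: str) -> list[str]:
    """Return variant names like 'M1A' (wild type, 1-based position, mutant)."""
    variants = []
    for pos, wt in enumerate(protein.upper(), start=1):
        if wt not in AMINO_ACIDS:
            raise ValueError(f"unexpected residue {wt!r} at position {pos}")
        for mut in AMINO_ACIDS:
            if mut != wt:  # skip the wild-type residue itself
                variants.append(f"{wt}{pos}{mut}")
    return variants


library = single_substitutions("MKT")
print(len(library))   # 3 positions x 19 alternatives = 57
print(library[:3])    # ['M1A', 'M1C', 'M1D']
```

A protein of length L yields 19 × L single substitutions, so a protein of a few hundred residues already gives thousands of variants.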

Alternatively, variants can be introduced directly into the endogenous genomic locus. Advances in CRISPR-Cas9 genome editing have enabled precise modification of target genes while preserving their native genomic context. Base editing uses a modified Cas9 that nicks only one DNA strand, fused to an enzyme that chemically converts one base into another, allowing precise point mutations without a double-strand break^[13]. Prime editing also uses a Cas9 nickase, but fused to a reverse transcriptase^[14]. A prime editing guide RNA (pegRNA) both directs Cas9 to the target site and carries the new genetic information to be written^[4]^[7]. This sequence is then reverse-transcribed into the cell's DNA, allowing precise introduction of specific substitutions, insertions, or deletions^[14].

Apply phenotypic selection of interest with a functional assay

Cells expressing the variant library are assayed for a phenotype of interest, such as drug resistance or reporter activation, to separate variants by function. For example, to find variants associated with drug resistance, cells are grown in the presence of a drug and sequenced before drug addition and at several time points afterwards to determine how common each variant is in the population. A variant that becomes more common over time is likely to confer a survival or growth advantage in the presence of the drug^[7]^[15].
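When a population is sampled at several time points, one simple way to summarize a variant's trajectory is the slope of its log count ratio to wild type over time. The sketch below is a minimal version of that idea under stated assumptions: an unweighted least-squares fit, a hypothetical `time_series_score` function, and an arbitrary pseudocount of 0.5 to avoid taking the log of zero. Published tools use more careful weighting and error models.

```python
import math


def time_series_score(variant_counts, wt_counts, times, pseudocount=0.5):
    """Least-squares slope of log(variant count / wild-type count) versus time.

    A positive slope suggests the variant is becoming more common relative to
    wild type under selection; a negative slope suggests it is being depleted.
    """
    if not (len(variant_counts) == len(wt_counts) == len(times)):
        raise ValueError("counts and times must have the same length")
    if len(set(times)) < 2:
        raise ValueError("need at least two distinct time points")
    # Log ratio at each time point; the pseudocount guards against zero counts.
    y = [math.log((v + pseudocount) / (w + pseudocount))
         for v, w in zip(variant_counts, wt_counts)]
    t_mean = sum(times) / len(times)
    y_mean = sum(y) / len(y)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (yi - y_mean) for t, yi in zip(times, y))
    return sxy / sxx


# A variant that doubles relative to wild type each time unit (no pseudocount):
slope = time_series_score([100, 200, 400], [1000, 1000, 1000], [0, 1, 2],
                          pseudocount=0)
print(round(slope, 3))  # ln(2), about 0.693
```

Dividing by the wild-type count from the same sample at each time point also cancels differences in sequencing depth between samples.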

Sequence cells and compute variant scores

After phenotypic selection, cells are sequenced using next-generation sequencing (NGS) platforms, such as Illumina and PacBio, to determine the frequency of each variant before and after selection. An enrichment score is calculated for each variant to estimate its effect on function: variants that increase in frequency are typically interpreted as beneficial under the assay conditions, whereas variants that decrease are considered detrimental. Computational tools for variant scoring include Enrich2, DiMSum, and mutscan^[16].
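For a single pre- and post-selection comparison, a common form of enrichment score is the log ratio of a variant's change in counts to that of wild type. The sketch below assumes read counts have already been tallied per variant into plain dictionaries, uses a hypothetical `enrichment_scores` function with a `"WT"` key for wild type, and adds an arbitrary pseudocount; it is not the scoring model of Enrich2, DiMSum, or mutscan.

```python
import math


def enrichment_scores(pre, post, wt_key="WT", pseudocount=0.5):
    """log2 enrichment of each variant relative to wild type.

    pre, post: dicts mapping variant name -> read count before / after selection.
    Scores above 0 mean the variant increased in frequency more than wild type.
    """
    def change(key):
        # Fold change in counts for one variant; pseudocount avoids division by 0.
        return (post.get(key, 0) + pseudocount) / (pre.get(key, 0) + pseudocount)

    wt_change = change(wt_key)
    return {v: math.log2(change(v) / wt_change) for v in pre if v != wt_key}


pre = {"WT": 1000, "M1A": 100, "K2E": 100}
post = {"WT": 1000, "M1A": 400, "K2E": 25}
print(enrichment_scores(pre, post, pseudocount=0))  # {'M1A': 2.0, 'K2E': -2.0}
```

Because each variant is compared with wild type from the same samples, differences in total sequencing depth between the two samples cancel out.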

Scientific Principles

Model systems

MAVEs can be performed in a variety of experimental model systems, each with its own advantages and limitations. Phage and bacterial systems allow the creation of very large variant libraries but lack eukaryotic protein folding and post-translational modifications^[7]. Yeast provides a eukaryotic context with many conserved cellular processes, while human cancer cell lines offer a more physiologically relevant human environment that preserves native protein interactions and modifications^[7]. Specialized cells derived from stem cells most closely model specific human tissues but are technically demanding and less stable^[7].

Phenotypic selection

Phenotypic selection is the step in the MAVE workflow that selects for a phenotypic outcome of interest^[7]. The assay should reflect the proposed disease mechanism so that the resulting functional evidence is relevant^[13]. Some assays measure the viability of the model system, for example through survival, morphology, or competitive survival assays^[7]. Drug sensitivity assays are another type of phenotypic selection, measuring survival of the model system under drug treatment. Reporter assays are often used to measure protein levels or transcriptional activity^[2], frequently with molecular barcodes, as in MPRAs^[16]^[17]. Finally, proteins produced in these experiments can be purified and assayed. For example, a protein abundance assay measures the total amount of protein present^[18], while an activity assay might measure the rate of substrate conversion^[19]. Because these assays capture different aspects of function, the choice of phenotypic selection can change the results^[7].

Variant score calculation

A critical component of MAVEs is assigning functional scores to variants and generating a variant effect map. Various computational tools convert FASTQ or BAM files from sequencing into counts for each variant and assess its enrichment or depletion following phenotypic selection^[16]. Software packages including Enrich2, TileSeqMave, and popDMS allow researchers to analyze variants at multiple time points^[16]^[20]^[21]. In contrast, other tools, such as mutscan, only compare a single pre-selection and post-selection population^[22]. Variant scores are calculated using a range of statistical approaches, including Bayesian methods, frequentist methods, and maximum likelihood estimation^[16]. Tools also differ in how they combine biological replicates, using methods such as a simple mean, a mean weighted by error estimates, or more complex statistical modelling^[16]. Alternatively, researchers can use barcode-based workflows that link a unique barcode to each variant^[23]^[24]^[25]. These workflows use long-read sequencing to obtain reads that contain both the barcode and the complete variant sequence, from which a barcode-variant mapping file is generated for downstream scoring software^[16]^[23]^[24]^[25].
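One of the replicate-combining strategies mentioned above, a mean weighted by error estimates, can be sketched as an inverse-variance (fixed-effect) average. The function below is hypothetical and assumes each replicate already has a score and a standard error; tools that model variation between replicates (for example with random-effects models) will give different, usually wider, uncertainty estimates.

```python
import math


def combine_replicates(scores, std_errors):
    """Inverse-variance weighted mean of replicate scores and its standard error.

    Replicates with smaller standard errors (more precise) get more weight.
    """
    if not scores or len(scores) != len(std_errors):
        raise ValueError("need one standard error per score")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    mean = sum(w * s for w, s in zip(weights, scores)) / total
    return mean, math.sqrt(1.0 / total)


mean, se = combine_replicates([1.0, 2.0], [0.1, 0.2])
print(round(mean, 3), round(se, 4))  # 1.2 0.0894
```

The more precise first replicate pulls the combined score towards 1.0, and the combined standard error is smaller than either replicate's alone.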

Types of MAVEs

Deep mutational scanning

Deep mutational scanning (DMS) is a high-throughput technique that combines saturation mutagenesis, functional selection, and NGS to map the functional effects of thousands of amino acid substitutions in a protein simultaneously^[17]^[18]. DMS has a wide variety of applications, including identifying beneficial, neutral, or detrimental mutations to engineer proteins with enhanced properties, and understanding the impact of genetic variants, particularly those that cause disease^[1]. DMS can also reveal how sequence changes affect properties such as catalytic activity and conformational stability^[3].

Massively parallel reporter assays

Massively Parallel Reporter Assays (MPRAs) quantify the effects of thousands of DNA sequence variants on gene expression in a single experiment. The method is commonly used to study non-coding variants, including those in promoters, enhancers, and other regulatory elements^[2]^[4]. Each candidate sequence in the library is cloned upstream or downstream of a reporter gene and linked to a unique molecular barcode. After the library is introduced into cells, the activity of each sequence is measured by using NGS to quantify the abundance of its barcodes in the reporter RNA, usually relative to their abundance in the input DNA. This provides a quantitative measure of each sequence's regulatory activity and allows identification of variants that alter gene expression and regulatory function^[19]^[26].
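MPRA analyses commonly compare each barcode's abundance in the reporter RNA with its abundance in the input (plasmid) DNA. The sketch below, a hypothetical `mpra_activity` function, sums counts over all barcodes assigned to the same sequence, scales each library to counts per million, and reports log2(RNA/DNA). The barcode-to-sequence dictionary stands in for a barcode-variant mapping, and the pseudocount is an arbitrary choice; real pipelines also filter low-count barcodes and model noise between replicates.

```python
import math
from collections import defaultdict


def mpra_activity(barcode_to_seq, dna_counts, rna_counts, pseudocount=1.0):
    """log2(RNA / DNA) activity per sequence, pooling all of its barcodes.

    Counts are converted to counts per million within each library so that
    differences in sequencing depth between RNA and DNA do not bias the ratio.
    """
    dna_total = sum(dna_counts.values())
    rna_total = sum(rna_counts.values())
    if dna_total == 0 or rna_total == 0:
        raise ValueError("both libraries need at least one read")
    dna_by_seq = defaultdict(float)
    rna_by_seq = defaultdict(float)
    for barcode, seq in barcode_to_seq.items():
        dna_by_seq[seq] += dna_counts.get(barcode, 0) / dna_total * 1e6
        rna_by_seq[seq] += rna_counts.get(barcode, 0) / rna_total * 1e6
    return {
        seq: math.log2((rna_by_seq[seq] + pseudocount) / (dna_by_seq[seq] + pseudocount))
        for seq in dna_by_seq
    }


mapping = {"bc1": "ref", "bc2": "ref", "bc3": "alt"}
dna = {"bc1": 100, "bc2": 100, "bc3": 200}
rna = {"bc1": 100, "bc2": 100, "bc3": 50}
activity = mpra_activity(mapping, dna, rna)
print({k: round(v, 2) for k, v in activity.items()})  # {'ref': 0.68, 'alt': -1.32}
```

In this toy example the "alt" allele drives less reporter expression per DNA copy than "ref", which is the kind of difference an MPRA uses to flag a regulatory variant.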

Applications

MAVEs are commonly used to study the genotype-to-phenotype relationships of a genomic region of interest, such as a gene or non-coding region^[17]. A protein-coding gene may be studied for germline variants or for somatic mutations, as in cancer^[7]. MAVEs can also be used to understand the regulatory effects of non-coding regions of the genome^[2], and to identify potential drug targets in pathogens^[27].

Clinical genetics

In clinical genetic testing, about 20% of genetic tests identify a variant of uncertain significance (VUS)^[28]. A VUS, as defined by the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG-AMP) guidelines, is a variant whose pathogenicity cannot be confirmed.[c] MAVEs can provide insight into these variants by producing functional evidence^[9]. To help resolve VUSs, functional evidence may be combined with other data, such as population frequency, co-segregation with disease, nearby variants with known effects, and clinical data^[29]. Together, these lines of evidence give a better understanding of genotype-to-phenotype effects^[17]. MAVE data also allow researchers to create genotype-to-phenotype maps that can be used to interpret VUSs identified in the future^[9].
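For MAVE data to be used as functional evidence, the assay's ability to separate known pathogenic from known benign variants has to be quantified. The ClinGen Sequence Variant Interpretation working group's recommendation for the ACMG-AMP functional evidence criteria (PS3/BS3) does this with an odds of pathogenicity, OddsPath = [P2 × (1 − P1)] / [(1 − P2) × P1], where P1 is the proportion of pathogenic variants among the assay's control variants and P2 is the proportion of pathogenic variants among controls with a given readout (for example, functionally abnormal). The sketch below implements that formula; the function names are hypothetical, and the strength thresholds are as reported in that recommendation and worth checking against current ClinGen guidance.

```python
def odds_path(p1: float, p2: float) -> float:
    """Odds of pathogenicity: [P2 * (1 - P1)] / [(1 - P2) * P1].

    p1: proportion of pathogenic variants among all control variants tested.
    p2: proportion of pathogenic variants among controls with a given readout.
    """
    if not (0 < p1 < 1) or not (0 <= p2 < 1):
        raise ValueError("require 0 < p1 < 1 and 0 <= p2 < 1")
    return (p2 * (1 - p1)) / ((1 - p2) * p1)


def evidence_strength(odds: float) -> str:
    """Map OddsPath to an evidence level (thresholds assumed from ClinGen SVI)."""
    if odds > 18.7:
        return "PS3"
    if odds > 4.3:
        return "PS3_moderate"
    if odds > 2.1:
        return "PS3_supporting"
    if odds < 0.053:
        return "BS3"
    if odds < 0.23:
        return "BS3_moderate"
    if odds < 0.48:
        return "BS3_supporting"
    return "indeterminate"


# 10 pathogenic + 10 benign controls (P1 = 0.5); among the controls scored as
# functionally abnormal, 9 are pathogenic and 1 is benign (P2 = 0.9).
odds = odds_path(0.5, 0.9)
print(round(odds, 2), evidence_strength(odds))  # 9.0 PS3_moderate
```

The more cleanly an assay separates the control variants, the further OddsPath moves from 1 and the stronger the evidence it can contribute.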

An example of this application is a 2020 study by Sun et al.^[30] on cystathionine beta-synthase (CBS), whose deficiency causes classical homocystinuria. The researchers introduced clinically observed variants, including VUSs, into yeast; most of these variants were rare in the human population. They used a variety of phenotypic selection techniques to map the effects of these mutations onto the protein. In a comparison, the resulting map outperformed all of the computational predictors tested. The map is useful not only for resolving VUSs in the study itself but also as a resource for interpreting future VUSs^[7]^[30].

Cancer research

Tumour sequencing has become more common as sequencing technology has become more accessible^[31]. Sequencing both a tumour and a germline sample can reveal tumour-specific mutations, but this alone does not show whether the changes are driver or passenger mutations^[32]. Driver mutations promote cancer cell survival and growth, whereas passenger mutations are neutral^[32]. MAVEs can provide functional evidence for the cancer-promoting effects of uncharacterized mutations, helping to determine whether they are drivers or passengers^[33].

One such study used the MAVE workflow to examine PIK3CA, a gene with gain-of-function mutations in cancer^[11]. Mutations throughout the gene were analysed, allowing researchers to map driver mutations to critical domains of the encoded protein^[1].

Regulatory genomics

Most of the human genome consists of non-coding DNA^[34], which contains functional elements such as regulatory elements^[2]. MAVEs have been used to study several types of non-coding elements, including gene promoters, enhancers, and silencers^[9]. Massively Parallel Reporter Assays (MPRAs) are commonly used to study these non-coding variants^[2]^[4].

For example, a 2022 study examined a cohort of patients with rare genetic diseases who were suspected to carry disease-causing variants in non-coding regions^[35]. Variants were identified by sequencing and then tested using an MPRA workflow, which identified 8 non-coding variants for further functional study^[35].

Drug development

Because MAVEs measure the effects of variants within a gene, they can be applied to genes from any species, including viruses^[17]. One application is identifying vulnerabilities in pathogens by determining how their proteins function^[18]. This understanding can in turn inform potential treatments, such as antiviral drugs^[9].

During the COVID-19 pandemic in 2020, researchers used a MAVE workflow to identify weaknesses in the receptor binding domain (RBD) of the SARS-CoV-2 spike protein^[27]. The study used yeast as a model system and performed phenotypic assays such as measuring binding to human ACE2, a key step in infection^[27]. However, this study found no increase in ACE2 binding affinity between SARS-CoV-2 and previous coronaviruses^[27].

History

Early MAVE-like experiments studied the genotype-to-phenotype relationship^[17]. However, they lacked high-throughput sequencing methods, which limited them to smaller-scale experiments^[17]. These early experimental designs paved the way for MAVE technology, which emerged as next-generation sequencing (NGS) became widespread and library introduction methods improved^[9]. Experts have argued that the history of MAVEs can be divided into three eras based on available technology and experimental guidelines: the early era with proto-MAVEs, the middle era, and the current era^[7].

Proto-MAVEs (pre-2010)

Scientists were studying the genotype-to-phenotype relationship long before the MAVE workflow emerged in 2010^[1]. However, they were limited by the lack of high-throughput methods at the time^[17]. Some high-throughput methods, such as NGS, were available as early as 2005 but were popularised and improved through the late 2000s^[31]. The earlier standard, Sanger sequencing, is low-throughput and processes only a small number of reads at a time, making it far less efficient than NGS^[36]. This era also lacked efficient library integration methods such as CRISPR-Cas9 and advanced recombinase-based landing pad systems^[37]^[38]. However, lentiviral transduction systems had been established^[39], as had early recombinase-based systems such as FLP-FRT and Cre-Lox recombination^[40].

The technology available at the time shaped the proto-MAVE experiments of this era^[17]. Most of these experiments included only small numbers of variants, likely because of technological limits^[7]. An outlier study from 2003, however, examined 2,314 missense variants^[41]. Although this sample size placed the study ahead of its time, it was still subject to the technological constraints of the era^[17]: each mutant had to be cloned and assayed individually, making the process labour-intensive^[41].

Middle-era MAVEs (2010-2020)

True MAVEs began to appear in 2010 and 2011 as NGS became standard in research^[42]^[43]^[44]. Advances in sequencing technology such as Illumina NGS, and later more advanced library integration methods such as CRISPR-Cas9 and recombinase-based landing pad systems, made high-throughput MAVE experiments possible^[31]^[37]^[38]. During this era, multiple MAVE studies that included thousands of variants were published^[7].

While many early MAVE and proto-MAVE studies focused on protein interactions, this era saw MAVEs integrated into clinical genetics^[7]. In 2015, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) published guidelines for interpreting sequence variants that discuss the use of functional evidence in variant classification^[29]. MAVE experiments continued to improve alongside technology as the era progressed. By its end, the workflow was well established in the literature, and clinical guidelines needed refinement to reflect these improvements^[18]^[9].

Modern MAVEs (2020-present)

The modern era of MAVEs is primarily distinguished from the previous era by greater recognition from the clinical genetics community^[7]. In 2020, both the ClinGen Sequence Variant Interpretation (SVI) working group and the Brotman Baty Institute Mutational Scanning Working Group (BBI-MSWG) published new guidelines on using functional evidence, such as MAVE data, to classify clinically significant genomic variants, especially VUSs^[7]^[5]^[6]. The SVI's recommendations to the ACMG have since been updated, as of 2025, but remain similar^[5].

Data from MAVE studies have been compiled into large databases^[45]. As of 2026, MaveDB is the largest such database; as of 2024 it contained over 7 million variant effects from approximately 2,000 datasets^[45]. Another resource that incorporates MAVE data is BRCA Exchange, which as of 2026 includes 76,809 BRCA1 and BRCA2 variants^[46]. These large datasets have significant potential applications in large-scale analyses, including improved data modelling and phenotype prediction^[10].

Characteristics

MAVEs have several properties that make them a powerful technique for predicting genotype-to-phenotype relationships compared with traditional low-throughput functional assays. First, MAVEs are high-throughput, evaluating the functional consequences of thousands of genetic variants in a single experiment^[14]. This is enabled by generating large variant libraries through saturation mutagenesis and measuring them with next-generation sequencing^[7]^[14]^[15].

A defining feature of MAVEs is the parallel assessment of variant function^[14]^[23]. After variant libraries are introduced into a model system, all variants are exposed to a phenotypic selection. The cell population is then sequenced to quantify variant frequencies before and after selection, allowing simultaneous analysis of functional effects across the entire library^[15].

Because MAVEs generate large-scale count data within a single experiment, they lend themselves to statistical modelling. Tools for calculating functional scores estimate effect sizes from variant frequencies, often using Bayesian methods or maximum likelihood estimation^[13].
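As a simple frequentist counterpart to those models, the log ratio of a variant's count change to wild type's count change can be given an approximate standard error by treating each of the four counts as an independent Poisson variable, which gives SE ≈ √(1/c₁ + 1/c₂ + 1/c₃ + 1/c₄) on the natural-log scale. The sketch below (a hypothetical `log_ratio_with_se` function with an arbitrary 0.5 pseudocount) shows only that approximation; it is not one of the Bayesian or maximum-likelihood models used by published tools.

```python
import math


def log_ratio_with_se(v_pre, v_post, wt_pre, wt_post, pseudocount=0.5):
    """Natural-log enrichment of a variant relative to wild type, with an
    approximate standard error from the delta method under Poisson counts."""
    counts = [c + pseudocount for c in (v_pre, v_post, wt_pre, wt_post)]
    a, b, c, d = counts
    score = math.log((b / a) / (d / c))
    # Var(ln X) is about 1/X for a Poisson count X; the four counts are
    # treated as independent, so their variances add.
    se = math.sqrt(sum(1.0 / x for x in counts))
    return score, se


score, se = log_ratio_with_se(100, 400, 1000, 1000, pseudocount=0)
print(round(score, 3), round(se, 3))  # 1.386 0.12
```

Low counts dominate the sum, so variants with few reads get large standard errors, which is one reason count-based models down-weight or filter poorly sampled variants.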

One of the biggest strengths of MAVEs is their ability to link genotype to phenotype through a measurable output such as growth, reporter activity, or protein function^[4]^[24]. Variants are subjected to selection, and their relative enrichment or depletion reflects their functional impact^[13]. This genotype-phenotype link is central to MAVEs.

The main output of a MAVE is a variant effect map, which summarizes the functional score of each variant in a gene, protein, or regulatory element^[3]^[13]. Because these maps can cover nearly every possible variant, they can reveal highly constrained positions where mutations lead to loss of function, identify flexible regions where variation is tolerated, and help validate sequence-structure-function relationships^[7]^[25].
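A variant effect map is often stored as a position-by-amino-acid grid. The sketch below builds one from a dictionary of scores keyed by simple 'WtPosMut' names (e.g. 'M1A'), leaves unmeasured cells as NaN, and averages each position as a crude indicator of constraint. The naming scheme, the function names, and the convention that lower scores mean loss of function are assumptions for illustration; real datasets typically use richer HGVS-style variant names.

```python
import math
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
VARIANT_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")  # e.g. 'M1A'


def effect_map(scores, length):
    """Grid with one row per position (1..length) and one column per amino acid.

    Cells without a measured score are NaN.
    """
    grid = [[math.nan] * len(AMINO_ACIDS) for _ in range(length)]
    for name, score in scores.items():
        match = VARIANT_RE.match(name)
        if match is None:
            continue  # skip names that are not simple single substitutions
        pos, mut = int(match.group(2)), match.group(3)
        if 1 <= pos <= length and mut in AMINO_ACIDS:
            grid[pos - 1][AMINO_ACIDS.index(mut)] = score
    return grid


def position_means(grid):
    """Mean score per position, ignoring missing cells (NaN if none measured).

    Assuming lower scores mean loss of function, low means flag constrained sites.
    """
    means = []
    for row in grid:
        measured = [x for x in row if not math.isnan(x)]
        means.append(sum(measured) / len(measured) if measured else math.nan)
    return means


grid = effect_map({"M1A": -2.0, "M1C": -1.0, "K2E": 0.5}, length=2)
print(position_means(grid))  # [-1.5, 0.5]
```

Here position 1 looks constrained (substitutions are damaging on average) while position 2 tolerates the one substitution measured.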

The scale and density of MAVE datasets make them valuable for training and benchmarking computational variant effect predictors (VEPs). Empirical variant effect maps have been used to evaluate VEPs, improve pathogenicity prediction, and inform clinical variant interpretation^[23]^[25].

Limitations

Context-dependent bias

Because MAVEs are performed in laboratory models, there is some uncertainty about how their results relate to humans until MAVE data can be compared with clinical data^[33]. Although many genes and biological processes are shared between model organisms and humans, the biological context differs^[2]. For example, proteins may fold differently^[38].

Assay-specific biases

MAVE results depend heavily on the phenotype being measured^[6]. The functional scores generated by a MAVE are specific to the phenotypic assay chosen^[3], so two variants can appear equivalent in one assay and different in another. For example, two variants may produce the same amount of protein but differ in enzymatic activity^[3]. Some experiments address this by using multiple types of phenotypic selection^[8]^[9], but it remains a potential blind spot of the approach.

Scalability

MAVEs require a predefined region of interest, such as a protein or non-coding region^[17]. Because each MAVE focuses on a single region, epistatic effects may be missed, as two genes are rarely mutated together in the same MAVE^[17]. This reflects a broader scalability problem. Whole-genome MAVEs are not feasible because of the large number of possible variants in the genome^[42]^[43], which is why MAVEs typically focus on a single gene. If researchers sought to perform MAVEs across multiple genes at once, library sizes would grow exponentially. Therefore, although MAVEs are useful, they struggle to capture interactions between variants at multiple loci^[17].
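A quick count shows why libraries outgrow experiments as more positions are mutated together. Assuming only amino-acid substitutions (19 alternatives per position) and ignoring insertions and deletions, a protein of length L has C(L, k) · 19^k variants carrying exactly k substitutions; the hypothetical helper below just performs that arithmetic.

```python
from math import comb


def count_variants(length: int, k: int, alternatives: int = 19) -> int:
    """Number of sequences with exactly k substitutions among `length` positions."""
    return comb(length, k) * alternatives ** k


for k in (1, 2, 3):
    print(k, count_variants(300, k))
# 1 5700
# 2 16190850
# 3 30557530900
```

For a 300-residue protein this gives about 5,700 single, 16 million double, and 3 × 10^10 triple substitution variants. For two genes mutated together, every variant in one must be paired with every variant in the other, so the counts multiply.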

AI statement

ChatGPT was used for source discovery and spellcheck in select sections. Each source was accessed and cited from the original document. The model was instructed not to write or rewrite any of this content during spelling and grammar checking.

References

[1]
Stein, A., Fowler, D. M., Hartmann-Petersen, R. & Lindorff-Larsen, K. Biophysical and Mechanistic Models for Disease-Causing Protein Variants. Trends Biochem Sci 44, 575–588 (2019).
[2]
Inoue, F. & Ahituv, N. Decoding enhancers using massively parallel reporter assays. Genomics 106, 159–164 (2015).
[3]
Vanella, R. et al. Understanding activity-stability tradeoffs in biocatalysts by enzyme proximity sequencing. Nat Commun 15, 1807 (2024).
[4]
Patwardhan, R. P. et al. High-resolution analysis of DNA regulatory elements by synthetic saturation mutagenesis. Nat Biotechnol 27, 1173–1175 (2009).
[5]
ClinGen Variant Classification Guidance - ClinGen | Clinical Genome Resource. https://clinicalgenome.org/tools/clingen-variant-classification-guidance/.
[6]
Gelman, H. et al. Recommendations for the collection and use of multiplexed functional data for clinical variant interpretation. Genome Med 11, 85 (2019).
[7]
McEwen, A. E., Tejura, M., Fayer, S., Starita, L. M. & Fowler, D. M. Multiplexed assays of variant effect for clinical variant interpretation. Nat Rev Genet 27, 137–154 (2026).
[8]
Chiasson, M., Dunham, M. J., Rettie, A. E. & Fowler, D. M. Applying Multiplex Assays to Understand Variation in Pharmacogenes. Clin Pharmacol Ther 106, 290–294 (2019).
[9]
Gasperini, M., Starita, L. & Shendure, J. The power of multiplexed functional analysis of genetic variants. Nat Protoc 11, 1782–1787 (2016).
[10]
Frazer, J. et al. Disease variant prediction with deep generative models of evolutionary data. Nature 599, 91–95 (2021).
[11]
Dogruluk, T. et al. Identification of Variant-Specific Functions of PIK3CA by Rapid Phenotyping of Rare Mutations. Cancer Res 75, 5341–5354 (2015).
[12]
Cevik, S. et al. Matching variants for functional characterization of genetic variants. G3: Genes, Genomes, Genetics 13, jkad227 (2023).
[13]
Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).
[14]
Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157 (2019).
[15]
Belli, O., Karava, K., Farouni, R. & Platt, R. J. Multimodal scanning of genetic variants with base and prime editing. Nat Biotechnol 43, 1458–1470 (2025).
[16]
Çubuk, H., Jin, X., Phipson, B., Marsh, J. A. & Rubin, A. F. Variant scoring tools for deep mutational scanning. Mol Syst Biol 21, 1293–1305 (2025).
[17]
Fowler, D. M. & Fields, S. Deep mutational scanning: a new style of protein science. Nat Methods 11, 801–807 (2014).
[18]
Starita, L. M. & Fields, S. Deep Mutational Scanning: A Highly Parallel Method to Measure the Effects of Mutation on Protein Function. Cold Spring Harb Protoc 2015, pdb.top077503 (2015).
[19]
Mulvey, B., Lagunas, T. & Dougherty, J. D. Massively Parallel Reporter Assays: Defining Functional Psychiatric Genetic Variants Across Biological Contexts. Biol Psychiatry 89, 76–89 (2021).
[20]
Hong, Z., Shimagaki, K. S. & Barton, J. P. popDMS infers mutation effects from deep mutational scanning data. Bioinformatics btae499 (2024) doi:10.1093/bioinformatics/btae499.
[21]
Rubin, A. F. et al. A statistical framework for analyzing deep mutational scanning data. Genome Biol 18, 150 (2017).
[22]
Soneson, C., Bendel, A. M., Diss, G. & Stadler, M. B. mutscan—a flexible R package for efficient end-to-end analysis of multiplexed assays of variant effect data. Genome Biol 24, 132 (2023).
[23]
Yeh, C.-L. C., Amorosi, C. J., Showman, S. & Dunham, M. J. PacRAT: a program to improve barcode-variant mapping from PacBio long reads using multiple sequence alignment. Bioinformatics 38, 2927–2929 (2022).
[24]
Weile, J. et al. Pacybara: accurate long-read sequencing for barcoded mutagenized allelic libraries. Bioinformatics 40, btae182 (2024).
[25]
Crawford, K. & Bloom, J. alignparse: A Python package for parsing complex features from high-throughput long-read sequencing. JOSS 4, 1915 (2019).
[26]
Degner, K. N., Bell, J. L., Jones, S. D. & Won, H. Just a SNP away: The future of in vivo massively parallel reporter assay. Cell Insight 4, 100214 (2025).
[27]
Starr, T. N. et al. Deep Mutational Scanning of SARS-CoV-2 Receptor Binding Domain Reveals Constraints on Folding and ACE2 Binding. Cell 182, 1295-1310.e20 (2020).
[28]
Wendler, R. Genetic testing found a variant of uncertain significance. Now what? MD Anderson Cancer Center https://www.mdanderson.org/cancerwise/genetic-testing-found-a-variant-of-uncertain-significance--now-what.h00-159464001.html.
[29]
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 17, 405–424 (2015).
[30]
Sun, S. et al. A proactive genotype-to-patient-phenotype map for cystathionine beta-synthase. Genome Med 12, 13 (2020).
[31]
Pareek, C. S., Smoczynski, R. & Tretyn, A. Sequencing technologies and genome sequencing. J Appl Genetics 52, 413–435 (2011).
[32]
Merid, S. K., Goranskaya, D. & Alexeyenko, A. Distinguishing between driver and passenger mutations in individual cancer genomes by network enrichment analysis. BMC Bioinformatics 15, 308 (2014).
[33]
Starita, L. M. et al. Variant Interpretation: Functional Assays to the Rescue. Am J Hum Genet 101, 315–325 (2017).
[34]
Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
[35]
McQuerry, J. A. et al. Massively parallel identification of functionally consequential noncoding genetic variants in undiagnosed rare disease patients. Sci Rep 12, 7576 (2022).
[36]
Crossley, B. M. et al. Guidelines for Sanger sequencing and molecular assay monitoring. J Vet Diagn Invest 32, 767–775 (2020).
[37]
Doudna, J. A. & Charpentier, E. Genome editing. The new frontier of genome engineering with CRISPR-Cas9. Science 346, 1258096 (2014).
[38]
Matreyek, K. A., Stephany, J. J. & Fowler, D. M. A platform for functional assessment of large variant libraries in mammalian cells. Nucleic Acids Research 45, e102–e102 (2017).
[39]
Naldini, L. et al. In vivo gene delivery and stable transduction of nondividing cells by a lentiviral vector. Science 272, 263–267 (1996).
[40]
The Cre-lox and FLP-FRT systems. https://www.jax.org/news-and-insights/2006/may/the-cre-lox-and-flp-frt-systems.
[41]
Kato, S. et al. Understanding the function-structure and function-mutation relationships of p53 tumour suppressor protein by high-resolution missense mutation analysis. Proc Natl Acad Sci U S A 100, 8424–8429 (2003).
[42]
Fowler, D. M. et al. High-resolution mapping of protein sequence-function relationships. Nat Methods 7, 741–746 (2010).
[43]
Hietpas, R. T., Jensen, J. D. & Bolon, D. N. A. Experimental illumination of a fitness landscape. Proc Natl Acad Sci U S A 108, 7896–7901 (2011).
[44]
Araya, C. L. & Fowler, D. M. Deep mutational scanning: assessing protein function on a massive scale. Trends Biotechnol 29, 435–442 (2011).
[45]
Rubin, A. F. et al. MaveDB 2024: a curated community database with over seven million variant effects from multiplexed functional assays. Genome Biol 26, 13 (2025).
[46]
BRCA Exchange. https://brcaexchange.org/.