Draft:Multiplexed Assays of Variant Effect
From Wikipedia, the free encyclopedia
Multiplexed Assays of Variant Effect (MAVEs) are high-throughput experimental methods used to determine sequence-function relationships, identify variants that alter protein structure, stability, or activity, and characterize regulatory elements[1][2][3][4]. MAVEs have become important in clinical genetics, particularly for reclassifying variants of uncertain significance (VUSs) in human disease genes. Guidelines on the use of functional evidence, such as data generated by MAVEs, have been increasingly published in the 2020s[5][6]. Beyond this, MAVEs have numerous applications in cancer research, medicine, and functional genomics, enabling researchers to study both protein-coding and non-coding regions[7]. A typical MAVE workflow involves generating a library of variants in a gene or regulatory element of interest and introducing it into an experimental model system[8]. The model is then subjected to phenotypic selection, and high-throughput DNA sequencing is used to quantify the frequency of each variant before and after selection. The changes in frequency are used to determine the functional score of each variant[9].
Background
The MAVE workflow exists to address the challenge of linking genotypic variants with their corresponding phenotypic outcomes[7]. This link is difficult to establish when a single gene varies widely between individuals[1]. While some variants are identifiable as pathogenic based on clinical data, many others are not[10]. These unclassified variants are termed variants of uncertain significance (VUSs). Often, VUSs arise due to a lack of evidence and a small sample size[10]. Genotype-to-phenotype predictions are not yet reliable enough to provide the causative understanding required to resolve VUSs[1]. Historically, researchers have relied on clinical data, rather than genotype-to-phenotype predictors, to determine the pathogenicity of a variant[7][1].
Recent advances in technology, however, have allowed the MAVE workflow to become established[7]. The workflow provides functional data that can be used alongside clinical or familial data to predict variant significance[1]. Along with helping to interpret VUSs, MAVE evidence has been used to determine the effects of cancer mutations, understand non-coding regulatory elements, and identify protein vulnerabilities as potential drug targets[2][3][8][11].
Workflow

Select a model system
Selecting a model system for a MAVE means choosing an experimental model that enables high-throughput functional screening of genetic variants. A wide variety of models have been used in MAVEs, including phage, bacteria, yeast, and human cell lines[7]. An appropriate model system should accurately reflect the biological processes of the organism of interest and allow for robust, quantitative measurements of variant function[12].
Integrate the variant library into the model system
After a model system has been selected, a large library of genetic variants is integrated such that each cell expresses a single variant, which maintains the link between sequence and phenotype. The variant library is generated by introducing mutations across the target gene, protein, or regulatory element (e.g., promoters, enhancers, or splice sites)[8]. The library can be constructed either through random, error-prone PCR-based mutagenesis or from synthesized oligonucleotide arrays programmed with mutations of interest[8]. The variant library is then introduced into the model system. Common approaches for library introduction include lentiviral transduction and recombinase-based "landing-pad" systems[7][8].
Alternatively, variants can be introduced directly into the endogenous genomic locus. Advances in CRISPR-Cas9 genome editing have enabled precise modification of target genes while preserving their native context. Base editing uses a modified Cas9 that nicks one DNA strand, attached to an enzyme that chemically converts one base into another, allowing for precise point mutations without making a full double-strand break[13]. Prime editing also uses a modified Cas9 that nicks one DNA strand, but attached to a reverse transcriptase[14]. A prime editing guide RNA (pegRNA) directs Cas9 to the correct location and also carries the new genetic information to be inserted[4][7]. This new sequence is then copied into the cell's DNA, allowing precise introduction of specific substitutions, insertions, or deletions[14].
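As a rough illustration of what a designed variant library can contain, the sketch below enumerates every single amino acid substitution for a short protein sequence. The function name `single_substitution_library` and the five-residue peptide are hypothetical and chosen only for illustration; real libraries are designed at the nucleotide or codon level and usually include additional controls.

```python
# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def single_substitution_library(protein):
    """Return every single amino acid substitution as (position, wild type, mutant).

    Positions are 1-based, following the usual convention for protein variants.
    """
    variants = []
    for index, wild_type in enumerate(protein):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                variants.append((index + 1, wild_type, mutant))
    return variants

# Hypothetical 5-residue peptide, used only as an example.
library = single_substitution_library("MKTAY")
print(len(library))  # 5 positions x 19 alternative residues = 95
print(library[0])    # (1, 'M', 'A')
```

Even this toy example shows how quickly library size grows with sequence length, which is why pooled introduction and sequencing-based readouts are needed.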
Apply phenotypic selection of interest with a functional assay
Cells expressing the variant library are assayed for a phenotype of interest, such as drug resistance or reporter activation, to separate the variants based on their function. For example, to find variants associated with drug resistance, cells are grown in the presence of a specific drug and sequenced before drug addition and at several time points thereafter to determine how common each variant is in the population. A variant that becomes more common over time is inferred to be beneficial for survival and growth in the presence of the drug[7][15].
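The time-course logic described above can be sketched as a simple regression: the logarithm of each variant's count relative to wild type is fitted against time, and the sign of the slope indicates enrichment or depletion. This is a simplified illustration rather than the method of any particular tool; the function name, the read counts, and the pseudocount of 0.5 are all assumptions made for the example.

```python
import math

def log_ratio_slope(variant_counts, wildtype_counts, timepoints, pseudocount=0.5):
    """Least-squares slope of log(variant / wild type) against time."""
    ratios = [
        math.log((v + pseudocount) / (w + pseudocount))
        for v, w in zip(variant_counts, wildtype_counts)
    ]
    n = len(timepoints)
    mean_t = sum(timepoints) / n
    mean_r = sum(ratios) / n
    covariance = sum((t - mean_t) * (r - mean_r) for t, r in zip(timepoints, ratios))
    variance = sum((t - mean_t) ** 2 for t in timepoints)
    return covariance / variance

# Made-up read counts at four time points under drug selection.
timepoints = [0, 1, 2, 3]
wildtype = [1000, 1000, 1000, 1000]
resistant = [100, 200, 400, 800]   # becomes more common over time
sensitive = [100, 50, 25, 12]      # becomes less common over time

print(log_ratio_slope(resistant, wildtype, timepoints) > 0)  # True
print(log_ratio_slope(sensitive, wildtype, timepoints) < 0)  # True
```

Normalising to wild type at each time point is one common design choice; it keeps the slope from being dominated by changes in overall sequencing depth.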
Sequence cells and compute variant scores
After phenotypic selection, cells are sequenced using next-generation sequencing (NGS) methods, such as Illumina and PacBio, to determine the frequency of each variant before and after selection. Enrichment scores are calculated for each variant to determine its impact on function, such that variants that increase in frequency are typically interpreted as beneficial under the assay conditions, whereas variants that decrease are considered detrimental. Computational tools used for variant scoring include Enrich2, DiMSum, and mutscan[16].
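A minimal sketch of an enrichment score for a single pre- and post-selection comparison is shown below. It computes the log2 change in frequency of each variant and subtracts the change observed for the wild-type sequence. The variant names, read counts, and pseudocount are hypothetical, and the tools named above use more elaborate statistical models than this.

```python
import math

def enrichment_scores(pre_counts, post_counts, wildtype="WT", pseudocount=0.5):
    """Log2 enrichment of each variant, normalised to the wild-type sequence."""
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())

    def log2_change(name):
        # Frequency after selection divided by frequency before selection.
        pre_freq = (pre_counts[name] + pseudocount) / pre_total
        post_freq = (post_counts.get(name, 0) + pseudocount) / post_total
        return math.log2(post_freq / pre_freq)

    wt_change = log2_change(wildtype)
    return {name: log2_change(name) - wt_change for name in pre_counts}

# Made-up read counts before and after selection.
pre = {"WT": 1000, "A10G": 500, "L25P": 500}
post = {"WT": 1000, "A10G": 1000, "L25P": 50}

scores = enrichment_scores(pre, post)
for name, score in scores.items():
    print(name, round(score, 2))
```

Under this convention the wild type scores zero, variants that increase in frequency relative to wild type score above zero, and depleted variants score below zero.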
Scientific principles
Model systems
MAVEs can be performed in a variety of experimental model systems, each with its own advantages and limitations. Phage and bacterial systems allow for the creation of very large variant libraries but lack eukaryotic protein folding and post-translational modifications[7]. Yeast provides a eukaryotic context with many conserved cellular processes, while human cancer cell lines offer a more physiologically relevant human environment, preserving native protein interactions and modifications[7]. Specialized cells derived from stem cells most closely model specific human tissues, but are technically demanding and less stable[7].
Phenotypic selection
Phenotypic selection is the step in the MAVE workflow that selects for a desired phenotypic outcome[7]. The proposed disease mechanism is taken into account to ensure the relevance of the resulting functional evidence[13]. Some phenotypic outcomes involve an assessment of model system viability, which can include survival, morphology, or competitive survival assays[7]. Another type of phenotypic selection is the drug sensitivity assay, which assesses model survival under drug selection. Furthermore, reporter assays are often used to measure protein levels or transcriptional activity[2]. Molecular barcodes, as seen in MPRAs, are often used in these assays[16][17]. Finally, proteins produced in these experiments can be purified and assayed. This can be done with a protein abundance assay, in which the total amount of protein present is measured[18]. Alternatively, researchers can examine a protein’s function by, for example, measuring the rate of substrate conversion[19]. Because these phenotypic assays measure different properties, researchers can obtain different results depending on their choice of selection method[7].
Variant score calculation
A critical component of MAVEs is assigning functional scores to variants and generating a variant effect map. Various computational tools convert sequencing data in FASTQ or BAM files into counts for each variant and assess each variant's enrichment or depletion following phenotypic selection[16]. Software packages including Enrich2, TileSeqMave, and popDMS allow researchers to analyze variants at multiple time points[16][20][21]. In contrast, other tools, such as mutscan, are limited to comparing a single pair of pre- and post-selection populations[22]. Variant scores are calculated using statistical approaches including Bayesian methods, frequentist methods, and maximum likelihood estimation[16]. Tools also differ in how they combine biological replicates, using methods such as simple averaging, weighted means based on error estimates, or more complex statistical modelling[16]. Alternatively, researchers can use barcode-based workflows that link a unique barcode to each variant[23][24][25]. This method uses long-read sequencing to obtain reads that include both the barcode and the complete variant sequence, from which a barcode-variant mapping file is generated for downstream scoring software[16][23][24][25].
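The difference between a simple mean and an error-weighted mean across replicates can be illustrated with a short sketch. The inverse-variance weighting shown here is a standard statistical technique and is used as an assumed stand-in; it is not claimed to be the exact procedure of any of the tools named above. The replicate scores and standard errors are made up.

```python
def simple_mean(scores):
    """Unweighted average of replicate scores."""
    return sum(scores) / len(scores)

def inverse_variance_mean(scores, standard_errors):
    """Weighted mean in which each replicate is weighted by 1 / SE^2."""
    weights = [1.0 / (se ** 2) for se in standard_errors]
    total_weight = sum(weights)
    combined = sum(w * s for w, s in zip(weights, scores)) / total_weight
    combined_se = (1.0 / total_weight) ** 0.5
    return combined, combined_se

# Three hypothetical replicate scores for one variant; the third is noisy.
replicate_scores = [-1.0, -1.2, -3.0]
replicate_errors = [0.1, 0.1, 1.0]

unweighted = simple_mean(replicate_scores)
weighted, weighted_se = inverse_variance_mean(replicate_scores, replicate_errors)
print(round(unweighted, 3), round(weighted, 3), round(weighted_se, 3))
```

In this example the noisy third replicate pulls the simple mean strongly towards its value, while the weighted mean stays close to the two precise replicates.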
Types of MAVEs
Deep mutational scanning
Deep mutational scanning (DMS) is a high-throughput technique that combines saturation mutagenesis, functional selection, and NGS to map the functional effects of thousands of amino acid mutations in a protein simultaneously[17][18]. DMS has a wide variety of applications. It can identify beneficial, neutral, or detrimental mutations, which helps in engineering proteins with enhanced properties and in understanding the impact of genetic variants, particularly those that cause disease[1]. It also reveals the functional consequences of changes to a protein sequence, including effects on catalytic activity and conformational stability[3].
Massively parallel reporter assays
Massively parallel reporter assays (MPRAs) are used to quantify the impact of thousands of DNA sequence variants on gene expression in a single experiment. This method is commonly used to study non-coding variants, including those located in promoters, enhancers, and other regulatory elements[2][4]. Each DNA sequence in the library is linked to a unique molecular barcode and cloned upstream or downstream of a reporter gene. After the library is introduced into cells, the activity of each sequence is measured by using NGS to quantify the abundance of its barcode in the reporter RNA. This provides a quantitative measure of the regulatory activity of each sequence and allows for the identification of variants that alter gene expression and regulatory function[19][26].
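The barcode-counting step described above can be sketched as follows. Each barcode's RNA count is compared with its DNA count in the input library, and the barcodes linked to the same element are summarised by their median. The barcode sequences, element names, counts, and pseudocount are hypothetical, and the sketch omits the sequencing-depth normalisation and replicate handling that a real analysis would need.

```python
import math
from statistics import median

def mpra_activity(barcode_to_element, dna_counts, rna_counts, pseudocount=1):
    """Median log2(RNA / DNA) across the barcodes linked to each element."""
    per_element = {}
    for barcode, element in barcode_to_element.items():
        dna = dna_counts.get(barcode, 0)
        if dna == 0:
            continue  # barcode absent from the input library, so it cannot be normalised
        rna = rna_counts.get(barcode, 0)
        ratio = math.log2((rna + pseudocount) / (dna + pseudocount))
        per_element.setdefault(element, []).append(ratio)
    return {element: median(values) for element, values in per_element.items()}

# Hypothetical barcodes linked to a reference and an alternate enhancer allele.
barcode_to_element = {
    "AAAC": "enhancer_ref", "AAGT": "enhancer_ref",
    "CCTA": "enhancer_alt", "CGGA": "enhancer_alt",
}
dna_counts = {"AAAC": 100, "AAGT": 100, "CCTA": 100, "CGGA": 100}
rna_counts = {"AAAC": 400, "AAGT": 420, "CCTA": 100, "CGGA": 90}

activity = mpra_activity(barcode_to_element, dna_counts, rna_counts)
print(activity["enhancer_ref"] > activity["enhancer_alt"])  # True
```

Using several barcodes per element and taking the median is one way to reduce the influence of any single barcode that behaves unusually.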
Applications

MAVEs are commonly used to study the genotype-to-phenotype effects of a genomic region of interest, such as a gene or non-coding region[17]. A protein-coding gene may be studied in the context of germline variation or of somatic mutations, as seen in cancer[7]. Additionally, MAVEs can be used to understand the regulatory effects of non-coding regions of the genome[2]. Furthermore, MAVEs can be used to identify potential drug targets in pathogens[27].
Clinical genetics
In clinical genetic testing, about 20% of genetic tests identify a variant of uncertain significance (VUS)[28]. A VUS, as defined by the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG-AMP) guidelines, is a variant with unconfirmed pathogenic effects.[c] MAVEs can provide insight into these variants by producing functional evidence[9]. To help resolve VUSs, functional evidence may be paired with other data such as population frequency data, disease segregation data, nearby mutations with known effects, and clinical data[29]. Together, these sources of evidence help develop a better understanding of genotype-to-phenotype effects[17]. Furthermore, these data allow researchers to create genotype-to-phenotype maps that can be useful for interpreting future VUSs[9].
An example of this application is a study published in 2020 by Sun et al.[30]. The study focused on cystathionine beta-synthase (CBS), a gene in which pathogenic variants cause classical homocystinuria. Researchers took mutations that had been observed in clinical settings, most of which were rare in the human population, and introduced these VUSs into yeast. They used a variety of phenotypic selection techniques to map the phenotypic effects of these mutations onto the protein. In comparative testing, the resulting map outperformed all of the computational predictors. This is useful not only for resolving VUSs in the immediate study, but also as a tool for resolving future VUSs[7][30].
Cancer research
Tumour sequencing has become more common as sequencing technology has become more accessible[31]. It is possible to sequence both a tumour and a germline sample to determine tumour-specific mutations, but this alone does not reveal whether the changes are driver or passenger mutations[32]. Driver mutations promote cancer survival and growth, whereas passenger mutations are neutral[32]. MAVEs can provide functional evidence for cancer-promoting effects of uncharacterized mutations, helping to determine whether they are driver or passenger mutations[33].
One such study used the MAVE workflow to examine gain-of-function variants of the cancer-associated gene PIK3CA[11]. In this experiment, a MAVE was used to analyse mutations throughout the gene. Researchers were then able to map driver mutations to critical domains in the resulting protein[1].
Regulatory genomics
A majority of the human genome consists of non-coding DNA[34]. Functional elements, notably regulatory elements, are found within these regions[2]. MAVEs have been used to study several types of non-coding element, including gene promoters, enhancers, and silencers[9]. Massively parallel reporter assays (MPRAs) are commonly used to study these non-coding variants[2][4].
For example, a study published in 2022 examined a cohort of patients with rare genetic diseases who were suspected to have mutations in non-coding regions[35]. Variants were identified through sequencing and the study then followed an MPRA workflow. The experiment identified eight non-coding variants for future functional study[35].
Drug development
MAVEs operate at the level of gene variants, and they can be performed for any species or even for viruses[17]. A clinical application of this is the use of MAVEs to identify vulnerabilities in pathogens by determining the mechanisms of protein function[18]. This increased understanding of a pathogen can be extended to provide insight into potential treatments, such as antiviral drugs[9].
During the COVID-19 pandemic in 2020, researchers used a MAVE workflow to examine and identify weaknesses in the receptor binding domain (RBD) of the SARS-CoV-2 spike protein[27]. This study used yeast as a model system and performed phenotypic assays such as measuring binding to human ACE2, a key step in infection[27]. However, this study found no increase in ACE2 binding affinity between SARS-CoV-2 and previous coronaviruses[27].
History

Early MAVE-like experiments studied the genotype-to-phenotype relationship[17]. However, they lacked high-throughput sequencing methods, which limited them to smaller-scale experiments[17]. This early experimental design paved the way for MAVE technology, which emerged as next-generation sequencing (NGS) was popularised and library introduction methods improved[9]. Experts have argued that the history of MAVEs can be divided into three eras based on available technology and experimental guidelines: the early era with proto-MAVEs, the middle era, and the current era[7].
Proto-MAVEs (pre 2010)
Scientists had been trying to understand the genotype-to-phenotype relationship well before the creation of the MAVE workflow in 2010[1]. However, they were limited by the lack of high-throughput methods at the time[17]. Some high-throughput methods, such as NGS, were available as early as 2005 but were popularised and improved upon through the late 2000s[31]. Earlier DNA sequencing technology, specifically Sanger sequencing, was low-throughput and could only produce a small number of reads at a time, making it far less efficient than its NGS successors[36]. Additionally, this era lacked high-efficiency library introduction methods such as CRISPR-Cas9 and advanced recombinase-based landing pad systems[37][38]. However, lentiviral transduction systems had been established[39], as had early recombinase-based systems such as FLP-FRT recombination and Cre-Lox recombination[40].
The available technology shaped the proto-MAVE experiments of this era[17]. The majority of these experiments included only small numbers of variants, likely due to technological limits[7]. However, an outlier study from 2003 examined 2314 missense variants[41]. The number of variants places this study ahead of its time, but it was still subject to the technological constraints of the era[17]. This resulted in a labour-intensive process in which each mutant had to be cloned and assayed individually[41].
Middle-era MAVEs (2010-2020)
True MAVEs began to appear in 2010-2011 as NGS became standard in research[42][43][44]. Advances in sequencing technology such as Illumina NGS, followed by more advanced library integration methods such as CRISPR-Cas9 and advanced recombinase-based landing pad systems, made high-throughput MAVE-based experiments possible[31][37][38]. During this era, multiple MAVE studies were published that included thousands of variants[7].
While many early MAVE and proto-MAVE studies focused on protein interactions, this era saw the integration of MAVEs into clinical genetics[7]. In 2015, the American College of Medical Genetics and Genomics (ACMG) published guidelines for interpreting sequence variants, which discuss the use of functional evidence for variant classification[29]. MAVE experiments continued to improve alongside technology as this era progressed. By the end of the middle era, the workflow had been well established in the literature, and clinical guidelines were in need of refinement following these improvements[9][18].
Modern MAVEs (2020-present)
The era of modern MAVEs is primarily distinguished from the previous era by increased recognition from the clinical genetics community[7]. In 2020, both the ClinGen sequence variant interpretation (SVI) working group and the Brotman Baty Institute Mutational Scanning Working Group (BBI-MSWG) published new guidelines on the use of functional evidence such as MAVEs in classifying clinically significant genomic variants, especially VUSs[5][6][7]. SVI published their recommendations to the ACMG; these have since been updated as of 2025, but still include similar recommendations[5].
Data from MAVE studies have also been compiled into large databases[45]. As of 2026, MaveDB is the largest of these; as of 2024 it contained over 7 million variants and approximately 2000 datasets[45]. Another dataset that incorporates MAVE data is the BRCA Exchange dataset, which as of 2026 includes 76809 BRCA1 and BRCA2 variants[46]. These large datasets have significant potential applications in large-scale data analysis, including improved data modelling and phenotype prediction[10].
Characteristics
MAVEs have several properties that make them a powerful technique for predicting genotype-to-phenotype relationships compared to traditional low-throughput functional assays. First, MAVEs are high-throughput, as demonstrated by their ability to evaluate the functional consequences of thousands of genetic variants in a single experiment[14]. This is enabled by generating large variant libraries via saturation mutagenesis combined with next-generation sequencing[7][14][15].
A defining feature of MAVEs is the parallel assessment of variant function[14][23]. After variant libraries are introduced into a model system, all variants are exposed to the same phenotypic selection. The cell population is then sequenced to quantify variant frequencies before and after selection, allowing simultaneous analysis of functional effects across the entire library[15].
Since MAVEs generate large-scale, count-based data within a single experiment, they are suitable for statistical modelling. The tools used to calculate functional scores estimate effect sizes based on variant frequency, often using Bayesian methods or maximum likelihood estimation[13].
One of the biggest strengths of MAVEs is the ability to link genotype to phenotype through a measurable output such as growth, reporter activity, or protein function[4][24]. Variants are subjected to selection, and their relative enrichment or depletion reflects their functional impact[13]. This genotype-phenotype association is central to MAVEs.
The primary output of a MAVE is a variant effect map, which summarizes the functional score for each variant in a gene, protein, or regulatory element[3][13]. Because MAVEs generate a comprehensive map covering every variant, they can reveal which positions are highly constrained, where mutations lead to a loss of function, identify flexible regions where variation is tolerated, and help validate sequence-structure-function relationships[7][25].
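A variant effect map can be thought of as a grid with one row per amino acid and one column per position. The sketch below builds such a grid from a handful of made-up scores and averages each column to highlight constrained positions. The function names and scores are hypothetical, and the example assumes a scoring convention in which lower scores indicate loss of function.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_effect_map(scores, length):
    """Arrange scores into a grid: one row per amino acid, one column per position.

    `scores` maps (1-based position, mutant amino acid) to a functional score.
    Cells with no measurement are left as None.
    """
    grid = {aa: [None] * length for aa in AMINO_ACIDS}
    for (position, mutant), score in scores.items():
        grid[mutant][position - 1] = score
    return grid

def mean_score_by_position(grid, length):
    """Average the measured scores at each position; None if nothing was measured."""
    means = []
    for column in range(length):
        measured = [row[column] for row in grid.values() if row[column] is not None]
        means.append(sum(measured) / len(measured) if measured else None)
    return means

# Made-up scores for a 3-residue region; position 2 is poorly tolerated.
scores = {(1, "A"): -0.1, (1, "P"): -0.3, (2, "A"): -2.5, (2, "P"): -3.1, (3, "G"): 0.0}
grid = build_effect_map(scores, length=3)
means = mean_score_by_position(grid, length=3)

most_constrained = means.index(min(means)) + 1
print(most_constrained)  # 2
```

Published maps are usually drawn as heatmaps of this grid, with missing measurements shown explicitly rather than filled in.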
The scale and density of MAVE datasets make them valuable for training and benchmarking computational variant effect predictors (VEPs). Empirical variant effect maps have been used to evaluate VEPs, improve pathogenicity prediction, and inform clinical variant interpretation[23][25].
Limitations
Context-dependent bias
Because MAVEs are performed in laboratory models, there is always some uncertainty about how well the results translate to humans until MAVE data can be compared with clinical data[33]. Although many genes and biological processes are conserved between model organisms and humans, the biological context still differs[2]. For example, proteins may fold differently[38].
Assay-specific biases
The outcome of a MAVE depends heavily on the phenotype being measured[6]. The functional scores generated are specific to the phenotypic assay chosen[3]. This means that two variants can appear equivalent in one assay and different in another. For example, two variants may produce the same amount of protein while differing in enzymatic activity[3]. This is sometimes addressed by using multiple types of phenotypic selection within one experiment[8][9]. However, it remains a potential blind spot of the method.
Scalability
Researchers must have a pre-defined region of interest, such as a protein or non-coding region[17]. Because MAVEs focus on one region at a time, epistatic effects may be missed, as two genes will rarely be mutated together in the same MAVE[17]. This illustrates the scalability limits of MAVEs. Whole-genome MAVEs are not feasible due to the large number of possible variants in the genome[42][43], which is why MAVEs typically focus on a single gene. Moreover, if researchers sought to perform MAVEs across multiple genes at once, the number of variant combinations to test would grow combinatorially. Therefore, although MAVEs are useful, they struggle to capture interactions between variants at multiple loci[17].
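The scale problem can be made concrete with a short calculation. The sketch below counts single and double amino acid substitutions for a protein of a given length, assuming 19 alternative residues per position. The 300-residue length is an arbitrary example and is not taken from any study cited here.

```python
from math import comb

def variant_counts(length, alternatives=19):
    """Number of single and double amino acid substitutions for one protein."""
    singles = length * alternatives
    # Choose two distinct positions, then one alternative residue at each.
    doubles = comb(length, 2) * alternatives ** 2
    return singles, doubles

singles, doubles = variant_counts(300)
print(singles)  # 5700
print(doubles)  # 16190850
```

For this hypothetical protein, testing every pair of substitutions would require a library thousands of times larger than the single-substitution library, and the same growth applies when pairing variants across two genes.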
AI statement
ChatGPT was used for source discovery and spellcheck in select sections. Each source was accessed and cited from the original document. The model was instructed not to write or rewrite any of this content during spelling and grammar checking.
