Peak calling
Computational method used in analyzing DNA
Peak calling is a computational method used to identify areas in a genome that have been enriched with aligned reads as a consequence of performing a ChIP-sequencing or MeDIP-seq experiment. In a ChIP-seq experiment, these areas are where a protein interacts with DNA.[1] When the protein is a transcription factor, the enriched area is its transcription factor binding site (TFBS). Popular software programs include MACS.[2] Wilbanks and colleagues[3] surveyed ChIP-seq peak callers, and Bailey et al.[4] give practical guidelines for peak calling in ChIP-seq data.
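As a rough illustration of what "enriched with aligned reads" means, the following Python sketch counts read start positions in fixed-size windows and flags windows whose count is improbably high under a Poisson background estimated from the genome-wide average. The function names, window size, and threshold are illustrative assumptions and do not reproduce any particular tool; real peak callers such as MACS use more elaborate models (for example local background estimates and control samples).

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), via 1 minus the lower tail.

    Adequate for thresholding; very small p-values lose precision.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def call_peaks(read_starts, genome_length, window=200, alpha=1e-5):
    """Return merged (start, end) intervals with unexpectedly many reads."""
    n_windows = math.ceil(genome_length / window)
    counts = [0] * n_windows
    for pos in read_starts:
        counts[pos // window] += 1
    lam = len(read_starts) / n_windows  # genome-wide background rate per window
    peaks = []
    for i, count in enumerate(counts):
        if poisson_sf(count, lam) < alpha:
            start = i * window
            end = min(start + window, genome_length)
            if peaks and peaks[-1][1] == start:
                peaks[-1] = (peaks[-1][0], end)  # merge adjacent enriched windows
            else:
                peaks.append((start, end))
    return peaks

# Toy data: one background read every 100 bp plus 60 reads piled up at 5000-5059.
background = list(range(0, 10_000, 100))
enriched = list(range(5000, 5060))
peaks = call_peaks(background + enriched, genome_length=10_000)
print(peaks)  # [(5000, 5200)]
```

A single genome-wide rate is the simplest possible background model; it is used here only to keep the example short.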
Methods
Peak calling may also be conducted on transcriptome/exome data, such as RNA epigenome sequencing data from MeRIPseq[5] or m6Aseq,[6] to detect post-transcriptional RNA modification sites with software programs such as exomePeak.[7] Many peak calling tools are optimized for only certain kinds of assay, such as transcription-factor ChIP-seq or DNase-Seq.[8] However, a newer generation of peak callers such as DFilter[9] is based on generalized optimal detection theory and has been shown to work for nearly all kinds of tag profile signals from next-generation sequencing data. Such tools also allow more complex analyses, such as combining multiple ChIP-seq signals to detect regulatory sites.[10] In the context of ChIP-exo, peak calling is known as 'peak-pair calling'.[11] A benchmarking study compared the performance of several peak calling tools, including MACS2, SEACR, GoPeaks, and LanceOtron, highlighting the strengths and limitations of each method and providing guidance for selecting appropriate peak callers in CUT&RUN experiments.[12]
Differential peak calling
Differential peak calling identifies significant differences between two ChIP-seq signals. One can distinguish between one-stage and two-stage differential peak callers. Two-stage differential peak callers work in two phases: first, they call peaks on the individual ChIP-seq signals; second, they combine the individual signals and apply statistical tests to estimate differential peaks. DBChIP,[13] MACS2, and MAnorm[14] are examples of two-stage differential peak callers.
One-stage differential peak callers segment the two ChIP-seq signals and identify differential peaks in a single step. They take advantage of signal segmentation approaches such as hidden Markov models. Examples of one-stage differential peak callers are ChIPDiff,[15] ODIN,[16] and THOR. Differential peak calling can also be applied to the analysis of RNA-binding protein binding sites.[17]
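The approach described above, in which peaks are called first and a statistical test is applied afterwards, can be sketched in Python as follows. The sketch assumes that peaks have already been called and that reads have been counted per peak in each condition; it then applies an exact two-sided binomial test against the ratio expected from the library sizes. All names and the choice of test are illustrative assumptions rather than the method of DBChIP, MACS2, or MAnorm, and no multiple-testing correction is applied.

```python
import math

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_two_sided(k, n, p):
    """Exact two-sided p-value: total probability of outcomes
    that are no more likely than the observed one."""
    observed = binom_pmf(k, n, p)
    total = 0.0
    for i in range(n + 1):
        prob = binom_pmf(i, n, p)
        if prob <= observed * (1 + 1e-9):  # small tolerance for float ties
            total += prob
    return min(1.0, total)

def differential_peaks(counts_a, counts_b, lib_a, lib_b, alpha=0.01):
    """Test each pre-called peak for a difference between two conditions.

    counts_a, counts_b: dict mapping peak name -> read count (hypothetical input).
    lib_a, lib_b: total mapped reads per condition, used for normalization.
    Returns a dict mapping peak name -> (p-value, is_differential).
    """
    p_null = lib_a / (lib_a + lib_b)  # expected share of reads from condition A
    results = {}
    for peak, a in counts_a.items():
        b = counts_b[peak]
        pval = binom_two_sided(a, a + b, p_null)
        results[peak] = (pval, pval < alpha)
    return results

# Toy input: peak1 is balanced, peak2 is strongly biased toward condition A.
results = differential_peaks(
    {"peak1": 50, "peak2": 80},
    {"peak1": 48, "peak2": 20},
    lib_a=1_000_000,
    lib_b=1_000_000,
)
for name, (pval, flag) in results.items():
    print(name, f"{pval:.3g}", flag)
```

Conditioning on the total count per peak keeps the test exact and dependency-free; tools built on replicates typically model overdispersion instead, which this sketch does not attempt.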
Software
This incomplete list includes tools that are commonly used for peak calling in bioinformatics analyses.[18]
| Program | Year published | Author(s) | Description | License | Latest Version | Active development | Source |
|---|---|---|---|---|---|---|---|
| MACS | 2021 (3.x); 2012 (2.x); 2008 | Yong Zhang, Tao Liu, Clifford A Meyer, X. Shirley Liu, et al. | Model-based Analysis of ChIP-Seq. Widely used for identifying narrow peaks (e.g., transcription factor binding sites). Models the characteristic tag shift size of ChIP-seq data and utilizes control samples for noise reduction. | BSD 3-Clause | 3.0.3 (Feb 20, 2025); 2.2.9.1 (Dec 2023) | Yes | [19] |
| SICER | 2019 (SICER2); 2009 | Chongzhi Zang, Dustin E. Schones, Keji Zhao, Weiqun Peng, et al. | Spatial clustering approach initially developed for identifying diffuse signals and broad genomic regions of enrichment. | MIT License | 1.0.2 (Feb 21, 2020) | No | [20] |
| epic2 | 2019 | Endre Bakken Stovner, Pål Sætrom | A reimplementation of the SICER algorithm focused on improving performance (speed, memory usage) for identifying broad domains. | MIT License | 0.2.2 (May 2023) | Yes | [21] |
| HOMER | 2010 | Sven Heinz, Christopher Benner, Nathanael Spann, et al. | Part of a software suite, the `findPeaks` utility performs peak calling, with distinct modes for narrow peaks ('factor' style) and broad domains ('histone' style). | GPL / Custom Academic | 4.11 (Nov 2019) | No | [22] |
| SPP (R package) | 2008 | Peter V. Kharchenko, Mikhail Y. Tolstorukov, Peter J. Park | Uses cross-correlation analysis to estimate fragment length and identify signal peaks. It was incorporated into the ENCODE analysis pipeline. | Artistic License 2.0 | 1.15.4 (Oct 2023 / Bioconductor 3.18) | No | |
| Genrich | 2018[p] | John M. Gaspar | Supports handling of multi-mapping reads, PCR duplicate removal, and integrated analysis of multiple replicates using Fisher's method. | MIT License | 0.6.1 (Jun 2021) | No | [23] |
| HPeak | 2010 | Zhaohui S Qin, Jianjun Yu, Arul M Chinnaiyan, et al. | Peak-finding algorithm based on a Hidden Markov Model (HMM). | Free Academic Use | 1.0 (?) | No | |
| JAMM | 2015 | Mahmoud M. Ibrahim, Scott A. Lacadie, Uwe Ohler | Uses mixture model clustering of biological replicates. | GPL-3.0-only | 1.0.7rev6 (~2014) | No | |
| PePr | 2014 | Yanxiao Zhang, Maureen A. Sartor | Uses a sliding window approach modeling read counts with a negative binomial distribution. Ranks identified peaks based on consistency across replicates. | GPL-3.0-only | 1.1.20 (Sep 2019) | No | [24] |
| LanceOtron | 2022 | Lance D. Hentges, Martin J. Sergeant, et al. | Deep learning (convolutional neural network) based peak caller. | GPL-3.0-only | 1.0.1 (Jun 2023) | Yes | [25] |
| SEACR | 2019 | Michael P. Meers, Daniel Tenenbaum, Steven Henikoff | Designed for low-background enrichment data common in techniques like CUT&RUN and CUT&Tag. It identifies enriched regions by comparing signal against the total signal, avoiding traditional input normalization. | MIT License | 1.3 (May 2019) | No | [26] |
| GoPeaks | 2021 | William M. Yashar, Julia E. Maxson, Theodore P. Braun, et al. | Designed for CUT&RUN and CUT&Tag datasets. | MIT License | 1.0.0 (Feb 2023) | Yes | [27] |
- [p] Published as a pre-print
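The table mentions Fisher's method as a way of integrating evidence across replicates. A minimal Python sketch of that statistical technique is given below: per-replicate p-values for the same region are combined into a single p-value through a chi-square statistic. The function names are hypothetical, the code is not taken from Genrich or any other listed tool, and it assumes the replicate p-values are independent and lie in (0, 1].

```python
import math

def chi2_sf_even_df(x, df):
    """Survival function of a chi-square distribution with even df.

    For df = 2k the closed form is exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-square distribution
    with 2 * len(pvalues) degrees of freedom under the null hypothesis.
    """
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2_sf_even_df(statistic, 2 * len(pvalues))

# Two replicates that are each only borderline give stronger joint evidence.
combined = fisher_combine([0.05, 0.05])
print(f"{combined:.4f}")
```

With a single p-value the method returns that value unchanged, which is a convenient sanity check for the implementation.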