Peak calling

Computational method used in analyzing DNA

Peak calling is a computational method used to identify areas in a genome that have been enriched with aligned reads as a consequence of performing a ChIP-sequencing or MeDIP-seq experiment. The enriched areas are those where a protein interacts with DNA.[1] When the protein is a transcription factor, the enriched area is its transcription factor binding site (TFBS). Popular software programs include MACS.[2] Wilbanks and colleagues[3] survey ChIP-seq peak callers, and Bailey et al.[4] give practical guidelines for peak calling in ChIP-seq data.
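The idea of "enrichment with aligned reads" can be illustrated with a minimal sketch. It assumes a uniform Poisson background estimated from the data itself, fixed-size windows, and no control sample; the function names, the window size and the significance threshold are illustrative choices, and this is not the algorithm of MACS or any other published tool.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)      # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i        # P(X = i) from P(X = i - 1)
        cdf += term
    # Very small tails round to 0.0 here, which is adequate for thresholding.
    return max(0.0, 1.0 - cdf)

def call_peaks(read_starts, genome_length, window=200, alpha=1e-5):
    """Return merged (start, end) windows whose read count is unlikely
    under a uniform Poisson background (assumed, simplified null model)."""
    n_windows = math.ceil(genome_length / window)
    counts = [0] * n_windows
    for pos in read_starts:
        counts[pos // window] += 1
    lam = len(read_starts) / n_windows   # expected reads per window
    peaks = []
    for i, c in enumerate(counts):
        if poisson_sf(c, lam) < alpha:
            start = i * window
            end = min((i + 1) * window, genome_length)
            if peaks and peaks[-1][1] == start:
                peaks[-1] = (peaks[-1][0], end)   # extend adjacent peak
            else:
                peaks.append((start, end))
    return peaks

# Toy data: one background read per window plus a pile-up of 30 reads.
reads = [w * 200 + 50 for w in range(10)] + list(range(1000, 1030))
print(call_peaks(reads, genome_length=2000))   # [(1000, 1200)]
```

Published peak callers refine this picture considerably, for example by using control samples and local background estimates, so the sketch should be read only as an illustration of the enrichment test itself.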

Methods

Peak calling may also be conducted on transcriptome/exome data as well as on RNA epigenome sequencing data from MeRIPseq[5] or m6Aseq[6] to detect post-transcriptional RNA modification sites, with software programs such as exomePeak.[7] Many peak calling tools are optimized for only certain kinds of assays, such as transcription-factor ChIP-seq or DNase-Seq.[8] However, a newer generation of peak callers such as DFilter[9] is based on generalized optimal detection theory and has been shown to work for nearly all kinds of tag profile signals from next-generation sequencing data. Such tools also allow more complex analyses, such as combining multiple ChIP-seq signals to detect regulatory sites.[10] In the context of ChIP-exo, peak calling is known as 'peak-pair calling'.[11] A benchmarking study compared the performance of several peak calling tools, including MACS2, SEACR, GoPeaks, and LanceOtron, highlighting the strengths and limitations of each method and providing guidance for selecting appropriate peak callers in CUT&RUN experiments.[12]

Differential peak calling

Differential peak calling identifies significant differences between two ChIP-seq signals. One can distinguish between one-stage and two-stage differential peak callers. Two-stage differential peak callers work in two phases: first, peaks are called on the individual ChIP-seq signals; second, the individual signals are combined and statistical tests are applied to estimate differential peaks. DBChIP,[13] MACS2, and MAnorm[14] are examples of two-stage differential peak callers.

One-stage differential peak callers segment the two ChIP-seq signals and identify differential peaks in one step. They take advantage of signal segmentation approaches such as hidden Markov models. Examples of one-stage differential peak callers are ChIPDiff,[15] ODIN,[16] and THOR. Differential peak calling can also be applied to the analysis of RNA-binding protein binding sites.[17]
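The strategy described above of first calling peaks on each signal and then testing the combined regions can be sketched as follows. The sketch assumes that peaks have already been called per condition, that library-size scaling is an adequate normalization, and that an exact binomial test on the read counts is an acceptable stand-in for the statistical test. All function names are hypothetical, and this is not the procedure used by DBChIP, MACS2 or MAnorm.

```python
import math

def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals into candidate regions."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binom_two_sided_p(k, n, p):
    """Exact two-sided p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    observed = binom_pmf(k, n, p)
    total = sum(pm for pm in (binom_pmf(i, n, p) for i in range(n + 1))
                if pm <= observed * (1 + 1e-9))
    return min(1.0, total)

def differential_peaks(peaks_a, peaks_b, reads_a, reads_b, alpha=0.01):
    """Test each merged candidate region for a difference in read counts.

    Under the null hypothesis, a read in the region comes from condition A
    with probability equal to A's share of all reads (library-size scaling).
    """
    p_null = len(reads_a) / (len(reads_a) + len(reads_b))
    results = []
    for start, end in merge_intervals(peaks_a + peaks_b):
        count_a = sum(1 for r in reads_a if start <= r < end)
        count_b = sum(1 for r in reads_b if start <= r < end)
        if count_a + count_b == 0:
            continue
        pval = binom_two_sided_p(count_a, count_a + count_b, p_null)
        if pval < alpha:
            results.append((start, end, count_a, count_b, pval))
    return results

# Toy data: the first region is similar in both conditions, the second is not.
peaks_a = [(100, 300), (1000, 1200)]
peaks_b = [(150, 350)]
reads_a = [200] * 20 + [1100] * 30
reads_b = [250] * 25 + [1100] * 2 + [5000] * 23
for region in differential_peaks(peaks_a, peaks_b, reads_a, reads_b):
    print(region)   # only the region (1000, 1200) is reported
```

Approaches that segment both signals jointly, for example with a hidden Markov model, skip the separate peak-calling phase and are not covered by this sketch.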

Software

This incomplete list includes tools that are commonly used for peak calling in bioinformatics analyses.[18]

List of peak-calling software
| Program | Year published | Author(s) | Description | License | Latest version | Active development | Source |
|---|---|---|---|---|---|---|---|
| MACS | 2021 (3.x); 2012 (2.x); 2008 | Yong Zhang, Tao Liu, Clifford A Meyer, Michael S Lawrence, et al. | Model-based Analysis of ChIP-Seq. Widely used for identifying narrow peaks (e.g., transcription factor binding sites). Models the characteristic tag shift size of ChIP-seq data and utilizes control samples for noise reduction. | BSD 3-Clause | 3.0.3 (Feb 20, 2025); 2.2.9.1 (Dec 2023) | Yes | [19] |
| SICER | 2019 (SICER2); 2009 | Chongzhi Zang, David E. Schones, Keji Zhao, W. Lee Kraus, et al. | Spatial clustering approach initially developed for identifying diffuse signals and broad genomic regions of enrichment. | MIT License | 1.0.2 (Feb 21, 2020) | No | [20] |
| epic2 | 2019 | Johannes Dröge, Johannes Alneberg, et al. | A reimplementation of the SICER algorithm focused on improving performance (speed, memory usage) for identifying broad domains. | MIT License | 0.2.2 (May 2023) | Yes | [21] |
| HOMER | 2010 | Sven Heinz, Christopher Benner, Nelson Nery, et al. | Part of a software suite; the `findPeaks` utility performs peak calling, with distinct modes for narrow peaks ('factor' style) and broad domains ('histone' style). | GPL / Custom Academic | 4.11 (Nov 2019) | No | [22] |
| SPP (R package) | 2008 | Peter V. Kharchenko, Mikhail Y. Tolstorukov, Peter J. Park | Uses cross-correlation analysis to estimate fragment length and identify signal peaks. It was incorporated into the ENCODE analysis pipeline. | Artistic License 2.0 | 1.15.4 (Oct 2023 / Bioconductor 3.18) | No | |
| Genrich | 2018[p] | John S Hageman, Paweł Czyż, et al. | Supports handling of multi-mapping reads, PCR duplicate removal, and integrated analysis of multiple replicates using Fisher's method. | MIT License | 0.6.1 (Jun 2021) | No | [23] |
| HPeak | 2010 | Zhaohui S Qin, Yongqun He, Arul M Chinnaiyan, et al. | Peak-finding algorithm based on a Hidden Markov Model (HMM). | Free Academic Use | 1.0 (?) | No | |
| JAMM | 2015 | Mahmoud M. Ibrahim, Scott A. Lacadie, Nikolaus Rajewsky, et al. | Uses mixture model clustering of biological replicates. | GPL-3.0-only | 1.0.7rev6 (~2014) | No | |
| PePr | 2014 | Yanxiao Zhang, Maureen A. Sartor | Uses a sliding window approach modeling read counts with a negative binomial distribution. Ranks identified peaks based on consistency across replicates. | GPL-3.0-only | 1.1.20 (Sep 2019) | No | [24] |
| LanceOtron | 2022 | Ross S. Harris, Nathan D. Leclair, et al. | Deep learning (convolutional neural network) based peak caller. | GPL-3.0-only | 1.0.1 (Jun 2023) | Yes | [25] |
| SEACR | 2019 | Michael P. Meers, Daniel Tenenbaum, Steven Henikoff | Designed for low-background enrichment data common in techniques like CUT&RUN and CUT&Tag. It identifies enriched regions by comparing signal against the total signal, avoiding traditional input normalization. | MIT License | 1.3 (May 2019) | No | [26] |
| GoPeaks | 2021 | Vincent A. Zuber, Jeffrey E. Maxson, et al. | Designed for CUT&RUN and CUT&Tag datasets. | MIT License | 1.0.0 (Feb 2023) | Yes | [27] |
[p] Published as pre-print.
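Several entries in the list mention estimating the fragment length or tag shift from the reads themselves. The underlying idea can be sketched with a strand cross-correlation: 5' ends of forward-strand reads pile up on one side of a binding site and 5' ends of reverse-strand reads on the other, separated by roughly one fragment length. The sketch below assumes an unnormalized cross-correlation over exact positions and a hypothetical function name; it is not the implementation used by SPP or MACS.

```python
from collections import Counter

def estimate_fragment_length(forward_starts, reverse_starts, max_shift=500):
    """Return the shift d that maximizes the unnormalized cross-correlation
    sum over x of forward(x) * reverse(x + d).

    forward_starts: 5' end positions of forward-strand reads.
    reverse_starts: 5' end positions of reverse-strand reads, i.e. the
    rightmost aligned coordinate (an assumption about the input).
    """
    fwd = Counter(forward_starts)
    rev = Counter(reverse_starts)
    best_shift, best_score = 0, -1
    for d in range(max_shift + 1):
        score = sum(n * rev.get(pos + d, 0) for pos, n in fwd.items())
        if score > best_score:          # keep the first maximum
            best_shift, best_score = d, score
    return best_shift

# Toy data: three binding sites, fragments of length 150 centred on each site,
# with a small spread in the exact read positions.
forward, reverse = [], []
for site in (1000, 5000, 9000):
    for jitter in range(-5, 6):
        forward.append(site - 75 + jitter)
        reverse.append(site + 75 + jitter)

print(estimate_fragment_length(forward, reverse))   # 150
```

On real data the correlation is usually computed on binned or smoothed coverage and normalized, since exact-position matches are sparse; the exhaustive loop here is kept only for readability.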
