Draft:Audio cleaner

Lead

Audio cleaning is the process of removing imperfections (such as noise, hiss, distortion, or impulse noise) from sound recordings.^[1] The field uses both analog and digital signal processing to improve the fidelity and intelligibility of audio in areas such as historical archives, music production, and telecommunications. While "audio cleaner" is a common consumer-facing term, the technical community typically refers to these practices as noise reduction or speech enhancement.^[2]

Audio cleaning is applied in diverse contexts, including music production, broadcasting, film and television post-production, podcasting, telecommunications, and archival restoration. Common procedures include denoising, de-reverberation, click and pop removal, hum suppression, and various forms of spectral repair.^[3]

History

Early analog methods

Early methods of audio cleaning emerged during the analog recording era, when hardware-based solutions such as electrical filtering, gating, and dynamic range control were used to reduce unwanted noise. Tape systems often employed noise reduction schemes such as Dolby A, designed for professional recording, and Dolby B, designed for consumer formats, to reduce tape hiss.^[3]
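Gating, one of the hardware techniques named above, can be illustrated with a short digital sketch. The Python example below (using NumPy) mutes a signal wherever its short-term level falls below a threshold. The function name `noise_gate`, the -40 dB threshold (relative to a full-scale amplitude of 1), and the 10 ms averaging window are illustrative assumptions; hardware gates also add attack, hold, and release behavior, which is omitted here.

```python
import numpy as np

def noise_gate(x, sample_rate, threshold_db=-40.0, window_ms=10.0):
    """Hard noise gate: zero the signal wherever its short-term level is below threshold_db.

    Illustrative sketch only; attack/hold/release smoothing is deliberately left out.
    """
    x = np.asarray(x, dtype=float)
    win = max(1, int(sample_rate * window_ms / 1000))
    # Short-term power: moving average of the squared signal over `win` samples.
    power = np.convolve(x ** 2, np.ones(win) / win, mode="same")
    level_db = 10 * np.log10(power + 1e-12)  # small offset avoids log(0)
    # Gain is 1 where the gate is open and 0 where it is closed.
    gain = (level_db > threshold_db).astype(float)
    return x * gain


# Usage: a quiet noise floor with a louder 440 Hz burst in the middle.
sr = 8000
t = np.arange(sr) / sr
x = 0.001 * np.random.default_rng(0).standard_normal(sr)
x[3000:5000] += 0.5 * np.sin(2 * np.pi * 440 * t[3000:5000])
y = noise_gate(x, sr)
```

The noise floor here sits near -60 dB, so the gate stays closed outside the burst and passes the burst unchanged.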

Digital audio workstations (DAW) era

With the advent of digital signal processing (DSP) and digital audio workstations (DAWs), audio cleaning came to rely increasingly on software. Techniques such as spectral subtraction, Fourier transform analysis, and Wiener filtering made the processing of audio signals more precise.^[4] During this period, audio restoration developed into a specialized discipline within music remastering and archival preservation.

Modern AI-based approaches

Over the past decade, machine learning and artificial intelligence have become a central component of many audio cleaning systems.^[5] Deep neural networks, notably convolutional and transformer-based models, are now commonly used for tasks such as real-time denoising and de-reverberation. As a result, techniques that were previously confined to specialized studio environments have increasingly appeared in consumer-facing tools, including browser-based and mobile applications.^[4]

Techniques

Audio Denoising

Audio denoising refers to the process of removing unwanted noise from audio signals while preserving the wanted sound (such as speech or music). It is a key technology in the field of signal processing and is widely used in areas such as telecommunications, music production, broadcasting, podcasting, and hearing aids.

Audio denoising techniques range from traditional signal processing methods based on mathematical models to modern machine learning-based approaches.^[6]

Background

Audio signals often contain unwanted noise introduced during recording, transmission, or playback. Common sources of noise include environmental sounds, electrical interference, microphone artifacts, and compression-related distortions.

In signal processing, audio denoising is a specialized application of noise reduction that focuses on audio data. A common but particularly challenging form of the problem is single-channel speech denoising, which is underdetermined: only one mixture is observed, speech is produced by a complex process, and the non-speech material is unknown. The difficulty is compounded by the high density of audio data, since a recording contains many samples per second (16,000 at a 16 kHz sampling rate, for example).^[5]
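Why the single-channel case is underdetermined can be made concrete with the usual additive model, assumed here for illustration (real recordings may also involve reverberation or nonlinear distortion): the observed signal y is the sum of clean speech s and noise n.

```latex
y[t] = s[t] + n[t], \qquad t = 0, 1, \dots, T-1
```

Each of the T observed samples gives one equation in two unknowns, so for any candidate estimate of the clean signal, the matching noise estimate y[t] minus that candidate reproduces the observation exactly. The data alone cannot single out the true s; a denoiser has to add assumptions or learned knowledge about what speech and noise typically look like.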

Traditional Methods and Modern Machine Learning-based Approaches

Traditional methods rely on signal processing techniques to reduce noise. Spectral subtraction is a commonly used method that estimates the noise spectrum, typically from segments containing only noise, and subtracts it from the spectrum of the noisy signal.^[7] Wiener filtering is another classical method; it minimizes the mean-square error between the estimated clean signal and the true clean signal.^[8]
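The two classical gain rules can be compared in a short Python sketch using NumPy. It assumes additive noise whose power spectrum is stationary and can be measured from a separate noise-only recording. The signal is cut into Hann-windowed frames at 50% overlap; each frame receives either a power-spectral-subtraction gain or a Wiener gain xi / (1 + xi), where the a priori signal-to-noise ratio xi is itself estimated by subtraction. The noisy phase is kept and the output is rebuilt by overlap-add. The function name `enhance`, the frame length, and the spectral floor are illustrative assumptions, not values taken from the cited sources.

```python
import numpy as np

def enhance(noisy, noise_only, frame=256, method="subtraction", floor=0.01):
    """Frame-wise spectral subtraction or Wiener filtering with a fixed noise estimate."""
    hop = frame // 2
    # Periodic Hann window: copies shifted by half a frame sum to exactly 1,
    # so overlap-adding unmodified frames reconstructs the signal's interior.
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame) / frame)

    def frames(x):
        count = 1 + (len(x) - frame) // hop
        return np.stack([x[i * hop : i * hop + frame] * win for i in range(count)])

    # Noise power spectrum: average periodogram over the noise-only frames.
    noise_psd = np.mean(np.abs(np.fft.rfft(frames(noise_only), axis=1)) ** 2, axis=0)

    spec = np.fft.rfft(frames(noisy), axis=1)
    power = np.abs(spec) ** 2
    if method == "subtraction":
        # Subtract the noise power, but never drop below a small fraction of the input.
        clean_power = np.maximum(power - noise_psd, floor * power)
        gain = np.sqrt(clean_power / (power + 1e-12))
    elif method == "wiener":
        # A priori SNR estimated by subtraction, then the Wiener gain xi / (1 + xi).
        xi = np.maximum(power / (noise_psd + 1e-12) - 1.0, 0.0)
        gain = xi / (1.0 + xi)
    else:
        raise ValueError(f"unknown method: {method}")

    # Apply the real-valued gain (noisy phase is kept) and overlap-add the frames.
    out_frames = np.fft.irfft(gain * spec, n=frame, axis=1)
    out = np.zeros(len(noisy))
    for i, f in enumerate(out_frames):
        out[i * hop : i * hop + frame] += f
    return out
```

Spectral subtraction is known to leave isolated residual spectral peaks, often called "musical noise"; practical systems add smoothing across time and frequency, which this sketch omits.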

Modern machine learning-based approaches are data-driven: models are trained on examples of noisy and clean audio in order to improve denoising performance. Convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformer-based architectures are commonly used in modern denoising pipelines.

These approaches can be particularly effective in complex acoustic environments where traditional methods may fail, and they can often be used without professional audio-engineering skills.^[9]

Researchers have also explored deep neural networks that operate directly on the audio waveform as an alternative to traditional signal processing-based denoising. These models learn to map noisy audio signals to cleaner output and have performed favorably compared with traditional methods; they aim to fully leverage the expressive power of deep networks while avoiding expensive time-frequency transformations or the loss of phase information.^[5] These developments have contributed to practical audio denoising systems, including applications and browser-based tools.^[10]
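Training a deep network does not fit in a short example, but the underlying data-driven idea (fit a mapping from noisy to clean waveforms on paired examples, then apply it to unseen audio) can be shown with a deliberately simple stand-in. The Python sketch below (using NumPy) learns a linear FIR filter by least squares directly on waveform samples. It is not a neural network and not the method of the cited work; the function names, the 32-tap length, and the synthetic training signals are all illustrative assumptions.

```python
import numpy as np

def tap_matrix(x, taps):
    """Row t holds the samples x[t - taps + 1], ..., x[t] (zeros before the start)."""
    padded = np.concatenate([np.zeros(taps - 1), x])
    return np.lib.stride_tricks.sliding_window_view(padded, taps)

def fit_denoiser(noisy, clean, taps=32):
    """'Training': least-squares weights that map noisy waveform windows to clean samples."""
    weights, *_ = np.linalg.lstsq(tap_matrix(noisy, taps), clean, rcond=None)
    return weights

def apply_denoiser(weights, noisy):
    """'Inference': run the learned filter over new noisy audio."""
    return tap_matrix(noisy, len(weights)) @ weights


# Hypothetical training data: a smooth (low-pass) signal plus white noise.
rng = np.random.default_rng(1)

def make_pair(n):
    excitation = rng.standard_normal(n)
    clean = np.convolve(excitation, 0.95 ** np.arange(200))[:n]
    noisy = clean + 2.0 * rng.standard_normal(n)
    return noisy, clean

train_noisy, train_clean = make_pair(20000)
weights = fit_denoiser(train_noisy, train_clean)
test_noisy, test_clean = make_pair(20000)
estimate = apply_denoiser(weights, test_noisy)
```

Because the model is linear, it can only learn a fixed filter; the appeal of deep networks is that the same train-then-apply recipe extends to nonlinear mappings that adapt to the content of each recording.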

Audio Denoising Applications

Audio denoising is applied in multiple domains, including:

Speech enhancement for telecommunications and virtual assistants;
Cleaning recordings of holidays, interviews, and other important moments;^[11]
Broadcasting and podcast editing;
Music restoration and post-production;
Accessibility technologies such as hearing aids.^[9]

De-reverberation

De-reverberation refers to reducing or eliminating the reverberation caused by sound reflections in enclosed spaces. Excessive reverberation can degrade speech intelligibility and audio quality, especially in audio recorded in acoustically untreated environments. De-reverberation techniques are commonly used in fields such as conferencing, film dialogue editing, and forensic audio analysis. Unlike audio denoising, which primarily targets stochastic or quasi-random background noise, de-reverberation addresses structured, time-correlated reflections of the original sound caused by acoustic environments rather than external noise sources.^[12]
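Because reflections are delayed, attenuated copies of the source, reverberation is commonly modeled as convolution of the dry signal with a room impulse response (RIR). If the RIR were known exactly, a regularized inverse filter could largely undo it, as in the Python sketch below (using NumPy). This non-blind setting is an idealizing assumption: practical de-reverberation usually has to work without a measured RIR and in the presence of noise. The exponentially decaying synthetic RIR, the function names, and the regularization constant are illustrative choices.

```python
import numpy as np

def synthetic_rir(sample_rate, rt60=0.3, seed=0):
    """Crude stand-in for a measured room impulse response: a direct-path
    impulse followed by an exponentially decaying noise tail."""
    rng = np.random.default_rng(seed)
    n = int(rt60 * sample_rate)
    t = np.arange(n) / sample_rate
    # Amplitude envelope falls by 60 dB (a factor of 1000) over rt60 seconds.
    rir = rng.standard_normal(n) * 10 ** (-3 * t / rt60)
    rir[0] = 1.0  # direct sound
    return rir

def reverberate(dry, rir):
    """Reverberation modeled as linear convolution with the RIR."""
    return np.convolve(dry, rir)

def dereverberate(wet, rir, reg=1e-3):
    """Regularized inverse filter, X = Y * conj(H) / (|H|^2 + reg).

    Assumes `wet` is the full linear convolution of the dry signal with `rir`,
    so an FFT of len(wet) points turns that convolution into a product.
    """
    n = len(wet)
    H = np.fft.rfft(rir, n)
    Y = np.fft.rfft(wet, n)
    X = Y * np.conj(H) / (np.abs(H) ** 2 + reg)
    # Keep only the samples that correspond to the original dry signal.
    return np.fft.irfft(X, n)[: n - len(rir) + 1]


# Usage: reverberate one second of white noise, then invert with the known RIR.
sr = 8000
dry = np.random.default_rng(1).standard_normal(sr)
rir = synthetic_rir(sr)
wet = reverberate(dry, rir)
restored = dereverberate(wet, rir)
```

The regularization term keeps the inverse bounded at frequencies where the RIR response is close to zero, at the cost of slightly under-correcting those frequencies.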

Click and Pop Removal

Click and pop removal focuses on eliminating short-duration impulsive noise events, such as those arising from vinyl record defects, digital glitches, or electrical interference. These techniques are commonly employed in archival audio restoration and remastering workflows.^[13]
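A common family of click detectors flags samples that deviate sharply from a locally smoothed version of the signal and then reconstructs them from their neighbors. The Python sketch below (using NumPy) uses a running median as the smoother, a threshold on the median absolute residual for detection, and linear interpolation for repair. These specific choices, the function names, and the default parameters are assumptions; restoration tools typically use more sophisticated interpolation (for example autoregressive modeling) for longer gaps.

```python
import numpy as np

def running_median(x, width):
    """Median over a centered window of odd `width`, padding edges by repetition."""
    half = width // 2
    padded = np.pad(x, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, width)
    return np.median(windows, axis=1)

def remove_clicks(x, width=9, threshold=8.0):
    """Flag impulsive outliers and replace them by linear interpolation.

    Returns the repaired signal and a boolean mask of the flagged samples.
    """
    x = np.asarray(x, dtype=float)
    residual = x - running_median(x, width)
    # Robust noise scale: median absolute value of the residual.
    scale = np.median(np.abs(residual)) + 1e-12
    flagged = np.abs(residual) > threshold * scale
    good = np.flatnonzero(~flagged)
    repaired = x.copy()
    # Rebuild flagged samples from the nearest unflagged samples on each side.
    repaired[flagged] = np.interp(np.flatnonzero(flagged), good, x[good])
    return repaired, flagged


# Usage: a 100 Hz tone with slight noise and three artificial clicks.
sr = 8000
t = np.arange(sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 100 * t)
x = clean + 0.001 * np.random.default_rng(0).standard_normal(sr)
clicks = [1000, 3000, 5000]
x[clicks] += 1.0
repaired, flagged = remove_clicks(x)
```

The detector may also flag a few samples next to each click or near waveform peaks; because flagged samples are rebuilt from their surroundings, such false alarms cost little on smooth material like this.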

Hum and Buzz Removal

Hum and buzz removal techniques primarily target narrowband noise, which typically originates from power-line interference at the mains frequency (50 or 60 Hz, depending on the region) and its harmonics. This process is commonly implemented using notch filters or adaptive filtering techniques.
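A cascade of second-order notch filters at the mains frequency and its first few harmonics is one straightforward way to implement the notch-filter approach. The Python sketch below (using NumPy) computes notch coefficients with the widely used Audio EQ Cookbook formulas by Robert Bristow-Johnson and runs each filter through an explicit difference equation. The quality factor Q = 30, the number of harmonics, and the function names are assumptions; adaptive filtering is not shown.

```python
import numpy as np

def notch_coefficients(f0, sample_rate, q=30.0):
    """Second-order notch (Audio EQ Cookbook form), normalized so a[0] == 1."""
    w0 = 2 * np.pi * f0 / sample_rate
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1.0, -2 * np.cos(w0), 1.0])
    a = np.array([1 + alpha, -2 * np.cos(w0), 1 - alpha])
    return b / a[0], a / a[0]

def biquad(x, b, a):
    """Run one second-order section (direct form I difference equation)."""
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[n] = yn
    return y

def remove_hum(x, sample_rate, mains=50.0, harmonics=4, q=30.0):
    """Cascade notches at the mains frequency and its first few harmonics."""
    y = np.asarray(x, dtype=float)
    for k in range(1, harmonics + 1):
        f = k * mains
        if f < sample_rate / 2:  # skip harmonics at or above Nyquist
            b, a = notch_coefficients(f, sample_rate, q)
            y = biquad(y, b, a)
    return y


# Usage: a 1 kHz tone contaminated by 50 Hz hum and its third harmonic.
sr = 8000
t = np.arange(3 * sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 1000 * t)
hum = 0.3 * np.sin(2 * np.pi * 50 * t) + 0.1 * np.sin(2 * np.pi * 150 * t)
cleaned = remove_hum(tone + hum, sr)
```

A higher Q gives a narrower notch that removes less of the wanted signal near the hum frequencies, but the filter then takes longer to settle after the hum starts.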

Spectral Repair

Spectral repair refers to manually or automatically modifying specific time-frequency regions of an audio signal to remove unwanted noise. This technique is often used to address transient noises, such as sudden coughs during recording, microphone handling sounds, or accidental external impacts.^[13]
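In its simplest automatic form, spectral repair replaces a damaged time-frequency region with an estimate derived from the surrounding, undamaged material. The Python sketch below (using NumPy) works on a magnitude spectrogram whose rows are frequency bins and whose columns are frames, and fills a rectangular region by interpolating each frequency bin linearly in time between the frames on either side. It assumes a short-time Fourier transform has already been computed and leaves phase handling and resynthesis out; the function name and the rectangular-region interface are hypothetical.

```python
import numpy as np

def repair_region(mag, f_lo, f_hi, t_lo, t_hi):
    """Fill mag[f_lo:f_hi, t_lo:t_hi] by linear interpolation over time.

    `mag` is a magnitude spectrogram with shape (frequency_bins, frames).
    Each affected bin is interpolated between frame t_lo - 1 and frame t_hi.
    """
    if t_lo < 1 or t_hi >= mag.shape[1]:
        raise ValueError("the region needs an undamaged frame on each side")
    out = mag.copy()
    before = mag[f_lo:f_hi, t_lo - 1]
    after = mag[f_lo:f_hi, t_hi]
    # Weights run from just above 0 to just below 1 across the damaged frames.
    w = (np.arange(t_lo, t_hi) - (t_lo - 1)) / (t_hi - (t_lo - 1))
    out[f_lo:f_hi, t_lo:t_hi] = before[:, None] * (1 - w) + after[:, None] * w
    return out


# Usage: a smooth synthetic spectrogram with a short broadband "cough" added.
bins, frames = 64, 100
smooth = np.outer(np.linspace(1.0, 2.0, bins), np.linspace(1.0, 3.0, frames))
damaged = smooth.copy()
damaged[10:30, 40:50] += 50.0
fixed = repair_region(damaged, 10, 30, 40, 50)
```

Exact recovery in this usage example relies on the synthetic spectrogram varying linearly in time; real recordings are rarely that regular, so simple interpolation like this is only a starting point.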

Applications

Audio cleaning technologies are applied across a wide range of industries and use cases, including:

Music production and post-production;
Podcasting and online content creation;
Film and television sound editing;
Broadcasting and telecommunications;
Archival and historical audio preservation;
Consumer voice enhancement and communication tools.

Terminology and usage

In consumer software and marketing contexts, tools designed for audio denoising are sometimes referred to as audio cleaners or voice cleaners.^[14] The term is informal and does not correspond to a single standardized algorithm or methodology. Its meaning may vary depending on the application domain or software implementation. In academic and technical literature, more specific terms such as audio denoising, speech enhancement, or audio restoration are generally preferred.