Noise Reduction

So, as I learned during my recordings, church acoustics apply just as much to the heating system as they do to choirs and organs. This means I have to put off the actual creation of my sample patches and focus on getting as clean a signal as possible without reducing the fidelity of the organ, which leads me to look into the process of noise reduction.

I’m going to preface this by saying that I don’t fully understand this process, these algorithms, or half of what I’m going to be trying to talk about in this post.

So, the process of noise reduction can be done in many different ways, but the first thing to understand is the two types of noise reduction algorithms: Fixed and Adaptive.

Fixed Noise Reduction

Fixed noise reduction, or spectral noise gating, uses Fourier analysis of sampled sections of noise to create a spectrum graph, which is used to gate the audio, reducing the level of any sound that isn’t above the threshold. This threshold varies across frequency bands, so the background sound can be removed from quieter bands even during sections where other bands are above the threshold.

XNoise – Waves Plugin

This works best for samples with consistent noise throughout the entire recording, as the filter is static.

This gating is often combined with other processes such as frequency smoothing and time smoothing, both of which are baffling to me for the time being. From what I can gather, these processes help ensure that the noise gating doesn’t degrade the quality of the frequencies that are above the threshold at any given time, but again, I’m still not 100% on what they are or what they do, and definitely not how they work. [1]
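
To make the idea concrete, here’s a minimal sketch of spectral gating in Python with NumPy. This is my own illustration, not the algorithm from Waves or any other plugin, and the frame size, hop, and threshold scaling are arbitrary assumptions:

```python
import numpy as np

def spectral_gate(signal, noise_clip, frame_size=2048, hop=512, floor_db=-30.0):
    """Gate out bins that fall below a per-band threshold learned from noise."""
    window = np.hanning(frame_size)

    def windowed_frames(x):
        starts = range(0, len(x) - frame_size + 1, hop)
        return np.array([x[i:i + frame_size] * window for i in starts])

    # Fourier analysis of a noise-only section gives a per-band noise spectrum
    noise_mags = np.abs(np.fft.rfft(windowed_frames(noise_clip), axis=1))
    threshold = noise_mags.mean(axis=0) * 2.0   # a few dB above the noise floor

    gate_gain = 10.0 ** (floor_db / 20.0)       # how far gated bins are reduced
    out = np.zeros(len(signal))
    for i in range(0, len(signal) - frame_size + 1, hop):
        spec = np.fft.rfft(signal[i:i + frame_size] * window)
        # Keep bins above the per-band threshold, attenuate everything else
        spec = np.where(np.abs(spec) > threshold, spec, spec * gate_gain)
        out[i:i + frame_size] += np.fft.irfft(spec, frame_size) * window
    return out  # overlap-add result (unnormalized; adequate for a sketch)
```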

Adaptive Noise Reduction

Adaptive filtering is a process which models the relationship between the input and output of a filter throughout the duration of its use. This means that it adjusts, or adapts, to the changing signal, and as such, adaptive noise reduction is typically used for samples whose noise varies over the duration of the recording.

In the standard adaptive filter arrangement:

  • x(n) is the input signal to a linear filter
  • y(n) is the corresponding output signal
  • d(n) is an additional input signal to the adaptive filter
  • e(n) is the error signal, i.e. the difference e(n) = d(n) - y(n) [2]

The process is similar to fixed reduction, but instead of building the filter from a Fourier analysis of the noise, the filter is created with a Least Mean Square (LMS) algorithm. This takes a fixed filter like the Fourier example used above and changes the variables of the filter (i.e. its coefficients, analogous to the threshold in each frequency band) in response to the error signal.
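
Here’s a rough sketch of the LMS idea in Python with NumPy, using the x(n), y(n), d(n), and e(n) names defined above. The tap count and step size mu are placeholder values, and real implementations are considerably more sophisticated:

```python
import numpy as np

def lms_filter(x, d, num_taps=32, mu=0.005):
    """Adapt filter coefficients so y(n) tracks d(n), minimizing e(n)."""
    w = np.zeros(num_taps)              # filter coefficients, updated each sample
    y = np.zeros(len(x))                # y(n): filter output
    e = np.zeros(len(x))                # e(n): error signal
    for n in range(num_taps, len(x)):
        x_n = x[n - num_taps:n][::-1]   # the most recent num_taps input samples
        y[n] = w @ x_n                  # filter the input: y(n)
        e[n] = d[n] - y[n]              # e(n) = d(n) - y(n)
        w += 2.0 * mu * e[n] * x_n      # nudge coefficients to reduce the error
    return y, e
```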


The quality of both of these types of filtering depends on the specific algorithm used. And quality noise reduction software isn’t cheap: Waves noise reduction plugins range between $200 and $600, iZotope’s RX3 runs $1200 for a full version copy, and CEDAR’s noise reduction hardware units (like the DNS1500) run upwards of $5000.

And it makes sense: in a world where audio often plays second (or third) fiddle, it’s not always possible to get the best recordings, so noise reduction is a necessity.

Further Reading

The Stereophonic Zoom – Michael Williams

The Stereophonic Zoom is a document which describes a variable dual-microphone system for stereo recording. The system acts as a spacing unit which can alter both the distance between microphones and the angle at which they face. The interaction between distance and angle creates a variable stereo width.

Williams argues against the implementation of a singular system for stereo recording because “rather than reduce the choice of systems, an effort must be made to increase the number of systems available. Each sound recording engineer must have the largest possible selection of systems to choose from, in order to solve the specific problems presented by a particular recording situation and, to express his own personal interpretation, as freely as possible.”

He describes the accepted standard listening condition pictured below:

Standard listening arrangement, as diagrammed in The Stereophonic Zoom

Williams continues by describing the importance of the listening environment in determining the characteristics of the stereo width of a recording. He makes recommendations for treatment following the IEC guidelines for a standard listening room.

Williams goes on to describe how sounds are localized, which I cover throughout my comparison of stereo mic techniques: through timing differences, through varying the intensity between two speakers, or through a combination of the two methods.

As per my earlier postings, I opted for an X/Y mic setup, meaning that I’m relying primarily on amplitude variation between the two channels for my stereo image.

Williams goes on to describe the specs of the “stereophonic zoom” recording device in great detail, using graphs to indicate distance versus angle, frequency responses, and the relation between direct and reverberant sound, and he describes the test phases and limitations of the device during testing. This section, while interesting, has little to do with my way forward, and can be summarized by saying that on paper this system looks good, though like all things audio, it’s subjective, and I’d have to hear results before I’m sold.

Sound on Sound: Synth Secrets

In this series of articles, Sound on Sound writer Gordon Reid writes extensively on how analogue synthesizers work. Part three of the sixty-three-part series concerns envelopes: what they are and how they work. Envelopes are important to my process because, while the envelopes of my samples are “baked in”, the principles of attack, decay, sustain, and release will all apply. With this in mind, Reid’s article, while fascinating, wasn’t particularly useful to me past his very brief discussion of ADSR (Attack, Decay, Sustain, and Release) and his definition of an envelope.

Reid defines an envelope by saying “the graph of the way a parameter changes over time is a visual representation of its envelope” (Part 7).

The article defines the four parts of ADSR:

  • Attack: The speed at which a sound reaches its maximum volume
  • Decay: The speed at which the loudness drops to the sustain level
  • Sustain: The level the loudness maintains until the release
  • Release: The amount of time it takes to decay from the Sustain level to silence

Interestingly enough, Reid describes the ADSR graph of an organ because of its simplicity. He compares it to two other envelopes, though the comparison isn’t particularly useful to me. He says “The organ has a rapid attack and maintains its full volume before dropping to silence when the player releases the key”.

Envelope Graphs

While this envelope describes a synthesized organ, the release on my samples will have to be longer to accommodate the natural reverb of the church. However, the fact that there is no decay stage will make looping my samples easier, because the sustain is longer and easy to define.
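
To make those four stages concrete, here’s a small sketch (my own illustration, not Reid’s) that builds an envelope as a gain curve; the organ-like settings at the bottom use a fast attack, no decay, full-level sustain, and the longer release I’ll need for the church’s reverb tail (all numbers hypothetical):

```python
import numpy as np

def adsr(attack, decay, sustain_level, hold, release, fs=44100):
    """Build an ADSR gain curve; times in seconds, levels in 0..1."""
    a = np.linspace(0.0, 1.0, int(attack * fs))              # rise to full volume
    d = np.linspace(1.0, sustain_level, int(decay * fs))     # drop to sustain level
    s = np.full(int(hold * fs), sustain_level)               # hold until "key up"
    r = np.linspace(sustain_level, 0.0, int(release * fs))   # fade out to silence
    return np.concatenate([a, d, s, r])

# Organ-like shape: fast attack, no decay stage, long release for the room's tail
envelope = adsr(attack=0.01, decay=0.0, sustain_level=1.0, hold=1.0, release=2.0)
# Multiplying a mono sample by this curve applies the envelope to it
```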

Topics in Sampling: A Short Reading List

This post will be a short exploration of some of the mathematical and digital concepts involved in sampling, which I might attempt to explore further in later posts.

Digital-Analog Conversion

The process of digitizing analog signals (sound, light, etc.) and then converting those digital signals back into analog signals on output.


The Nyquist-Shannon Theorem

The Nyquist-Shannon Theorem states that you must sample at a rate at least twice the highest frequency you want to capture. CD audio’s 44.1 kHz sample rate, for example, can capture frequencies up to 22.05 kHz.
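
A quick NumPy illustration of what happens when the theorem is violated: a 700 Hz tone sampled at 1 kHz produces exactly the same samples as a phase-inverted 300 Hz tone, i.e. it aliases:

```python
import numpy as np

fs = 1000                        # 1 kHz sampling: captures content up to 500 Hz
t = np.arange(1000) / fs         # one second of sample instants

tone_700 = np.sin(2 * np.pi * 700 * t)   # violates Nyquist (700 > fs / 2)
tone_300 = np.sin(2 * np.pi * 300 * t)   # the alias it folds down to

# Sample for sample, the 700 Hz tone is just a phase-inverted 300 Hz tone
print(np.allclose(tone_700, -tone_300, atol=1e-6))   # True
```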


Fourier Transforms

A mathematical operation which transforms a signal between the time (or space) domain and the frequency domain. In audio this can be used to visualize sound as graphs and to find the specific frequencies present in a sound, typically using the FFT (fast Fourier transform) and spectrograms.
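
As a small example, here’s how an FFT can pin down the frequency of a tone with NumPy (a synthetic A440 here, but the same approach works on a recorded sample):

```python
import numpy as np

fs = 44100                                   # CD-quality sample rate
t = np.arange(fs) / fs                       # one second of audio
signal = np.sin(2 * np.pi * 440.0 * t)       # a synthetic A440 tone

magnitudes = np.abs(np.fft.rfft(signal))     # FFT magnitude spectrum
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

print(f"Dominant frequency: {freqs[np.argmax(magnitudes)]:.1f} Hz")   # 440.0 Hz
```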


Zero Crossings

A zero crossing is a point in the audio where the waveform has zero amplitude: it’s neither at its peak nor at its trough, but passing through zero between the two. Cutting or looping a sample at zero crossings avoids audible clicks.
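
Finding zero crossings programmatically is straightforward; here’s a sketch with NumPy (my own illustration):

```python
import numpy as np

def zero_crossings(samples):
    """Indices where the waveform passes through zero amplitude."""
    signs = np.sign(samples)
    # A crossing is wherever the sign flips between adjacent samples
    return np.where(np.diff(signs) != 0)[0]

# Four cycles of a sine; crossings land roughly every half cycle (~125 samples)
audio = np.sin(2 * np.pi * 4 * np.linspace(0, 1, 1000, endpoint=False))
print(zero_crossings(audio)[:4])
```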


ADSR Envelopes

The four-stage model (Attack, Decay, Sustain, and Release) of how a sound’s loudness changes over time, covered above in the Synth Secrets review.

Comparing Stereo Microphone Techniques

This post will describe and compare a number of stereo miking techniques. In the process of recording my samples I will have to mitigate a number of challenges, which will become the basis for my comparisons. They are:

  • Fidelity: Since a major technical goal of the project is transparency, I want to recreate the experience of being in the room as truthfully as possible.
  • Phase: Because the cathedral is such a live space, I have some concerns about how I’m going to deal with phase. This will be looked at in further detail in a dedicated post.
  • Consistency: Due to restrictions on the availability of the church, I will need a setup that is easy to recreate across multiple days.

1. Spaced Pair or A-B

Spaced pair (A-B) configuration (1)

A spaced pair is a pair of omni-directional microphones, typically placed equidistant from the sound source. The stereo effect is created by the timing differences between the recordings from each microphone (2). Simply put, a sound that is closer to the left mic will reach it sooner than it reaches the right mic. This has a few advantages: omni-directional microphones typically have a better frequency response than directional microphones (see: Earthworks TC30K [omni] versus the Neumann TLM103), and spaced stereo usually has a larger perceived stereo width (3).
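
As a rough worked example of those timing differences (with made-up numbers): a source half a metre closer to the left mic arrives there about 1.5 ms earlier, and that inter-channel delay is what creates the stereo impression:

```python
speed_of_sound = 343.0    # m/s in air at roughly 20 degrees C
path_difference = 0.5     # metres (hypothetical offset toward the left mic)

delay_ms = path_difference / speed_of_sound * 1000.0
print(f"Inter-channel delay: {delay_ms:.2f} ms")   # ~1.46 ms
```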

However, this comes with some disadvantages. Firstly, spaced pair recordings aren’t mono-compatible: when mixed down there are often phase issues and/or comb filtering (3). This is a concern because I want the samples to be useful in as wide a range of situations as possible, and because I’m as yet unsure how successful my experiments in stereo sampling will be, so having a good mono fallback is crucial. Secondly, even in stereo there can often be “dead space” in the centre, and avoiding this requires experimenting with the spacing of the microphones (3). This becomes problematic because it will take more time to set up (and time is fairly limited), and it will be harder to reproduce than a coincident configuration.

2. X-Y Pair

X-Y stereo configuration (4)

X-Y stereo is achieved with two coincident or near-coincident directional microphones, angled between 90 and 130 degrees apart (5). Unlike the A-B system, where the stereo effect is created by recording timing differences between the channels, X-Y uses small differences in amplitude to create a stereo effect. The advantage of this is that it doesn’t add recorded timing differences on top of the timing differences from the speakers to the ears, making for a “cleaner” listening experience (6).

This placement also has the advantage of being very mono-compatible (6). This is because the microphones are in the exact same position (on the horizontal plane), meaning that they should theoretically be perfectly in phase (7). They are also much easier to set up consistently because both microphones are in the same position (although it requires keeping track of the angle between the capsules).

The major disadvantages of this technique are that the use of cardioid mics colours the sound and can limit the frequency response in the low end (see frequency charts linked above), and that coincident recordings can sometimes be harsher than their spaced equivalents (3).

3. Blumlein Pair

Blumlein pair configuration (8)

The Blumlein Pair uses two bi-directional, or figure-8, microphones angled 90 degrees from each other. This creates a stereo image similar to that of the X-Y placement, although it also captures the room ambience because of the rear pickup pattern (9). This seems like an ideal scenario for the organ, because the room itself plays a large part in the sound of the instrument, although it would also eliminate the possibility of rejecting extraneous sounds through microphone placement.

This technique also has the disadvantage of using mics that are strongly affected by proximity: the farther the mic is placed from the source, the less low-frequency response it captures (10). This is a pretty big problem in this situation because the ranks are located around the room, pretty far away from possible recording positions.

One thing that I’m unsure of and need to test/look into further is how the bi-directional recording will work in regards to phase, because my initial feeling is that the microphone will naturally capture an inverted version of a sound when it hits the opposite side of the microphone. The reality is obviously going to be more complex than that, but it’s something I’m going to look into.

4. Mid-Side

Mid-side configuration (1)

Mid-side miking uses a directional microphone aimed directly at the source, as well as a coincident bi-directional microphone 90 degrees off-axis from the sound source. The stereo image, much like the X/Y technique, is created by differences in amplitude as opposed to timing or phase (11). The stereo is created through the phase interactions between the bi-directional microphone and the mid microphone: a sound that’s to the right of the unit will be phase-inverted in the side channel, meaning that you can extract the right signal by subtracting the side signal from the mid signal (12). The big disadvantage here is the extra post-production step required to create a usable signal, but this is mitigated by the ability to control the stereo image in post-production and the technique’s excellent mono-compatibility.
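
Since the decoding is just sums and differences, here’s a minimal sketch of that extra post-production step (my own illustration; the width parameter is an assumed convenience for controlling the stereo image, not part of the core technique):

```python
import numpy as np

def ms_decode(mid, side, width=1.0):
    """Recover left/right stereo from mid/side recordings.

    Sounds to the right arrive phase-inverted in the side channel, so
    left = mid + side and right = mid - side. `width` scales the side
    signal to widen or narrow the stereo image in post-production.
    """
    mid, side = np.asarray(mid), np.asarray(side)
    left = mid + width * side
    right = mid - width * side
    return left, right
```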

In terms of the frequency response of the recordings, this technique is subject to the same issues as both X/Y and Blumlein recordings, in that both directional and figure-8 mics typically have poor or coloured low-frequency response. This can be mitigated by using an omni-directional microphone in the mid position, which increases the low-frequency response and spaciousness (13).

Both the Blumlein and M-S configurations are about as easy to set up consistently as the X/Y configuration, which makes them well suited to the project.
