What and how do we hear?

Every process of recording, processing and reproducing sound ultimately serves a single organ - the ear, the one we perceive sound with. Without understanding what we actually hear, what matters to us and what does not, and why certain musical patterns exist, it is impossible to design good audio equipment or to compress and process sound effectively. What is described here is only the basics; covering everything is impossible - the process of sound perception is still far from fully understood. Even so, these basics may be interesting even to readers who already know what a decibel is - we will go a little further than the help files of audio editing programs usually do...

A bit of anatomy (how the ear is built - short and clear):

Outside is what we call the outer ear; nothing of special interest to us there. Then comes the ear canal - about 0.5 cm in diameter and about 3 cm long. Next is the tympanic membrane (the eardrum), to which the bones of the middle ear - the ossicles - are attached. These bones pass the vibration of the eardrum on to another membrane, the entrance to the inner ear - a fluid-filled tube about 0.2 mm in diameter and a full 3-4 cm long, coiled up like a snail. The point of having a middle ear is that air vibrations are too weak to set the fluid in motion directly; the middle ear, together with the eardrum and the membrane of the inner ear, forms a hydraulic amplifier: the area of the eardrum is many times larger than that of the inner-ear membrane, so the pressure (which is F/S) is amplified tens of times. Along the entire length of the inner ear runs something resembling a stretched string - another elongated membrane, stiff at the beginning of the ear and soft towards the end. Each section of this membrane vibrates in its own frequency range: low frequencies in the soft region near the end, the highest frequencies at the very beginning. Along this membrane sit nerves that sense the vibrations and pass them to the brain, using two principles:
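
As a rough illustration of that hydraulic gain: with pressure defined as force over area, delivering the same force onto a much smaller area raises the pressure in proportion to the area ratio. The eardrum and inner-ear membrane areas below are assumed ballpark values for illustration only, not figures from the text.

```python
# Rough sketch of the middle ear as a "hydraulic amplifier": the force collected
# over the large eardrum is delivered to the much smaller inner-ear membrane,
# so pressure (P = F / S) rises by roughly the area ratio.
# The areas below are assumed ballpark values, used only for illustration.

EARDRUM_AREA_MM2 = 60.0        # assumed effective area of the tympanic membrane
INNER_MEMBRANE_AREA_MM2 = 3.0  # assumed area of the membrane at the inner ear

def pressure_gain(collecting_area: float, delivering_area: float) -> float:
    """Pressure amplification when the same force acts on a smaller area."""
    return collecting_area / delivering_area

if __name__ == "__main__":
    gain = pressure_gain(EARDRUM_AREA_MM2, INNER_MEMBRANE_AREA_MM2)
    print(f"Pressure amplified roughly {gain:.0f}x")  # ~20x with these numbers
```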

The first is the firing (impulse) principle. Since nerves can transmit vibrations (as binary impulses) only at rates up to about 400-450 Hz, this principle is used directly only in the low-frequency part of hearing. It is hard to do otherwise there anyway - the vibrations of the membrane are too strong and affect too many nerves at once. The firing principle is stretched up to roughly 4 kHz with a trick: several (up to ten) nerves fire at different phases, adding up their bandwidth, as sketched below. This method is good because the brain receives fuller information: on the one hand there is still a rough frequency separation, and on the other hand the brain still gets the vibrations themselves - their shape and features, not just the frequency spectrum. This principle covers the most important range for us, the spectrum of the human voice; and in general, almost all the information that matters most to us lies below 4 kHz.
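
A minimal sketch of the "several nerves firing in different phases" trick described above: if one fiber can fire at most ~450 times per second, ten fibers taking turns on successive cycles can together mark every cycle of a much higher-frequency tone. The round-robin scheduling and the exact numbers here are illustrative assumptions, not a physiological model.

```python
# Illustrative sketch: N fibers, each limited to MAX_RATE firings per second,
# take turns marking cycles of a tone. Together they can follow a tone whose
# frequency approaches N * MAX_RATE, even though no single fiber could.
# Round-robin scheduling and the exact numbers are simplifying assumptions.

MAX_RATE_HZ = 450     # assumed ceiling for a single nerve fiber
N_FIBERS = 10         # assumed number of cooperating fibers

def cycles_marked(tone_hz: float, duration_s: float = 1.0) -> int:
    """How many cycles of the tone the group of fibers can mark per second."""
    total_cycles = int(tone_hz * duration_s)
    # Each fiber takes every N-th cycle, but cannot exceed its own rate limit.
    per_fiber = min(total_cycles // N_FIBERS, int(MAX_RATE_HZ * duration_s))
    return per_fiber * N_FIBERS

if __name__ == "__main__":
    for tone in (300, 1000, 4000, 8000):
        print(f"{tone:5d} Hz tone: {cycles_marked(tone)} of {tone} cycles marked per second")
    # Up to ~4500 Hz every cycle is covered; above that, cycles start being missed.
```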

The second principle is simply the place of the excited nerve along the membrane; it is used for sounds above roughly 4 kHz. Here nothing matters except the bare fact that a frequency is present - not the phase, not the waveform. A naked spectrum.
Thus, in the high-frequency region we have purely spectral hearing of rather limited resolution, while for frequencies close to the human voice the picture is fuller, based not only on the division of the spectrum but also on additional analysis of the waveform by the brain itself, which gives, for example, a more complete stereo picture. More on that below.
The main perception of sound takes place in the range of 1-4 kHz; the human voice falls in the same range (as do the sounds produced by most processes in nature that matter to us). Correct transmission of this segment of the spectrum is the first condition for natural sound.

About sensitivity (to power and frequency):

Now about decibels. I will not explain from scratch what they are; in short, a decibel is a relative logarithmic measure of sound level (power) that is additive, matches human perception of loudness well, and is easy to calculate.

In acoustics, loudness is usually measured in dB SPL (Sound Pressure Level). The zero of this scale lies approximately at the quietest sound a person can hear, and the scale counts upward from there. A person can meaningfully hear sounds up to about 120 dB SPL. At 140 dB severe pain is felt; at 150 dB the ears are damaged. Normal conversation is about 60-70 dB SPL. For the remainder of this section, "dB" means dB above the SPL zero.
The sensitivity of the ear varies greatly with frequency. It is highest in the region of 1-4 kHz, where the main tones of the human voice lie. A 3 kHz tone is the one that can still be heard at 0 dB. Sensitivity falls off sharply in both directions: a 100 Hz tone needs as much as 40 dB (100 times the amplitude), a 10 kHz tone about 20 dB. We can usually tell that two sounds differ in loudness when the difference is about 1 dB. Even so, 1 dB is quite a lot - our perception of loudness is simply very strongly compressed and flattened. And the full 120 dB range is truly enormous: in amplitude it is a factor of a million!
By the way, doubling the amplitude corresponds to an increase of 6 dB. Careful, do not get confused: 12 dB is 4 times the amplitude, but a difference of 18 dB is already 8 times, not 6 as you might expect - dB is a logarithmic measure.
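
The arithmetic above is easy to check directly; here is a small sketch of the standard conversions (20·log10 for amplitude ratios, and back again).

```python
import math

def amplitude_ratio_to_db(ratio: float) -> float:
    """Amplitude (pressure/voltage) ratio expressed in decibels."""
    return 20.0 * math.log10(ratio)

def db_to_amplitude_ratio(db: float) -> float:
    """Decibels converted back to an amplitude ratio."""
    return 10.0 ** (db / 20.0)

if __name__ == "__main__":
    print(amplitude_ratio_to_db(2))      # ~6.02 dB  (doubling the amplitude)
    print(amplitude_ratio_to_db(4))      # ~12.04 dB
    print(amplitude_ratio_to_db(8))      # ~18.06 dB
    print(db_to_amplitude_ratio(120))    # 1,000,000 - the full 120 dB range
    print(db_to_amplitude_ratio(40))     # 100 - the 100 Hz threshold example
```
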
Frequency sensitivity behaves similarly. Two simple tones can be told apart when they differ by about 0.3% around 3 kHz, while around 100 Hz a difference of about 4% is needed! For reference, notes a semitone apart (two adjacent piano keys, black ones included) differ by about 6%. Overall, the ear is at its most sensitive in every respect in the 1-4 kHz region, and even there the sensitivity is not that high if expressed in the linear, non-logarithmic values that digital technology has to work with. Keep this in mind: a lot of what goes on in DSP can look terrible in the raw numbers and still sound indistinguishable from the original.
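
To put those percentages side by side, here is a tiny sketch comparing the size of a semitone with the approximate just-noticeable frequency differences quoted above (the JND values are taken straight from the text and are approximate).

```python
# Compare the size of a musical semitone with the approximate just-noticeable
# frequency differences mentioned in the text (0.3% near 3 kHz, 4% near 100 Hz).

SEMITONE_RATIO = 2 ** (1 / 12)          # two adjacent piano keys

def percent_step(ratio: float) -> float:
    return (ratio - 1.0) * 100.0

if __name__ == "__main__":
    print(f"Semitone step: {percent_step(SEMITONE_RATIO):.1f}%")   # ~5.9%
    print("JND near 3 kHz: ~0.3%  -> many distinguishable steps per semitone")
    print("JND near 100 Hz: ~4%   -> barely finer than a semitone")
```
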
In digital processing, dB values are counted from zero downward, into negative territory: zero is the maximum level that the digital representation can carry.

About phase sensitivity:

Speaking of the ear as a whole: nature built it the way it did, guided above all by practicality. The phase of individual frequencies is simply not important to us, because it carries no useful information: the phase relationships between frequency components change drastically with head movements, the surroundings, echoes and resonances - whatever. The brain makes no use of this information, and so we are not sensitive to the phases of frequency components. One must, however, distinguish small phase changes (up to a few hundred degrees) from serious phase distortions that alter the temporal behaviour of the signal - cases better described not as phase changes but as frequency-dependent delays, when the phases of individual components are spread so widely that the signal is smeared in time and its duration changes. If, for example, we hear only a reflected sound, an echo from the far end of a huge hall, then formally this is still just a variation in the phases of the signal components, but one so large that it is clearly perceived through indirect (temporal) cues. At that point it no longer makes sense to call it a phase change; it is more accurate to call it a delay.
In short, our hearing is quite insensitive to moderate phase variations (though "moderate" is relative - really anything up to full antiphase :). But all of this holds only when the phase changes are the same in both channels! Asymmetric phase shifts matter a great deal; more on that in the next section.
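
A quick way to see why identical phase changes in both channels are "invisible" to spectral hearing: scrambling the phases of a signal's frequency components leaves its magnitude spectrum untouched, even though the waveform can look completely different. The sketch below only checks that numerically (with NumPy); it does not, of course, prove anything about perception itself.

```python
# Scramble the phases of a test signal's spectrum and check that the magnitude
# spectrum (what purely spectral hearing would "see") stays the same, even
# though the time-domain waveform becomes quite different.

import numpy as np

rng = np.random.default_rng(0)

fs = 8000                                   # sample rate in Hz (arbitrary choice)
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

spectrum = np.fft.rfft(signal)
phases = np.exp(1j * rng.uniform(0, 2 * np.pi, spectrum.shape))
phases[0] = phases[-1] = 1.0                # keep DC and Nyquist bins real-valued
scrambled = np.fft.irfft(spectrum * phases, n=len(signal))

print(np.allclose(np.abs(np.fft.rfft(scrambled)), np.abs(spectrum)))  # True
print(np.allclose(scrambled, signal))                                 # False
```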

About spatial (volumetric) perception:

A person can perceive the spatial position of a sound source. (The word "stereo", by the way, comes from Greek and means roughly "solid" or "spatial".) There are two principles of stereo perception, and they correspond to the two principles of transmitting sound information from the ear to the brain described above.

The first principle works for frequencies below about 1 kHz, which are hardly obstructed by an object the size of a human head - they simply bend around it. These frequencies are perceived by the firing principle, which passes information about individual sound impulses to the brain. The temporal resolution of nerve impulses lets us use this information to determine the direction of a sound: if the sound reaches one ear earlier than the other (differences on the order of tens to hundreds of microseconds), we can place it in space - the delay arises because the sound has to travel an extra distance to the farther ear, which takes time. This phase shift of the sound at one ear relative to the other is perceived as positioning information.
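
A rough sketch of the geometry: the extra path to the farther ear, divided by the speed of sound, gives the delay between the ears. The simple straight-line path model and the assumed 20 cm ear spacing are illustrative; real heads diffract sound, so actual delays are somewhat larger.

```python
# Estimate the interaural time difference (ITD) with a simple straight-line
# path model: extra path ~= ear_spacing * sin(angle from straight ahead).
# The 20 cm ear spacing and the model itself are simplifying assumptions.

import math

SPEED_OF_SOUND = 343.0   # m/s at room temperature
EAR_SPACING = 0.20       # assumed distance between the ears, metres

def itd_microseconds(angle_deg: float) -> float:
    """Delay between the ears for a source at angle_deg from straight ahead."""
    extra_path = EAR_SPACING * math.sin(math.radians(angle_deg))
    return extra_path / SPEED_OF_SOUND * 1e6

if __name__ == "__main__":
    for angle in (1, 5, 30, 90):
        print(f"{angle:3d} deg off-centre -> ITD ~ {itd_microseconds(angle):6.0f} us")
    # ~10 us at 1 degree, ~580 us for a source directly to one side.
```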

The second principle is used at all frequencies, but mainly above 2 kHz, where sound is effectively shadowed by the head and the outer ear: it is simply the difference in loudness between the two ears.

Another important factor that lets us locate a sound much more precisely is the ability to turn the head and observe how the sound changes. A turn of just a few degrees is enough to pin the source down almost exactly; it is generally accepted that direction is easily judged to within about one degree. This mechanism of spatial perception is what makes truly realistic surround sound in games nearly impossible - at least until our heads are fitted with rotation sensors. After all, the sound in games, even with modern 3D sound cards, does not respond to turns of our real head, so the full spatial picture almost never forms and, unfortunately, cannot.

Thus, for stereo perception the loudness of the left and right channels matters at all frequencies, and at frequencies where it is possible - up to 1-2 kHz - relative phase shifts are evaluated as well. Additional information comes from unconscious head turns and instant evaluation of the result.

Phase information in the 1-4 kHz region takes precedence over loudness differences, although a sufficiently large level difference can override a phase difference, and vice versa. Data that does not quite agree, or even directly contradicts itself (for example, the right channel is louder than the left but arrives later), adds to our perception of the environment - such inconsistencies are produced by the reflecting and absorbing surfaces around us. So, to a limited extent, we also perceive the character of the room we are in. This is further helped by large phase and level variations common to both ears - delays, echoes and reverberation.

About notes and octaves. Harmonics:

The word 'harmonic' in general means a harmonic oscillation, or more simply a sine wave, a pure tone. In audio technology, however, the term is used for numbered harmonics. The point is that many physical, acoustic or simply mathematical processes combine a given frequency with frequencies that are integer multiples of it: a fundamental tone of 100 Hz is accompanied by harmonics at 200, 300, 400 Hz and so on. The sound of a violin, for example, is almost entirely harmonics, the fundamental having only slightly more power than its harmonic companions. Generally speaking, the character of a musical instrument's sound depends on the presence and strength of its harmonics, while the fundamental determines the note.

Continuing on: an octave in music is the interval over which the fundamental frequency doubles. The note A of the lowest piano octave, for example, has a frequency of about 27.5 Hz; an octave up it is 55 Hz. The harmonic content of these two sounds has a great deal in common - both include 110 Hz, 220 Hz, 440 Hz (the same note in successively higher octaves) and so on. This is the main reason why the same note in different octaves sounds in unison: the contributions of identical higher harmonics add up. Harmonics are always with us - even if an instrument produced only a single pure fundamental, higher harmonics would still appear in the ear itself, in the process of spectral perception. A note of the lowest octave almost always contains the same note of all higher octaves among its harmonics.
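
A small sketch of that overlap: listing the first few harmonics of 27.5 Hz and 55 Hz shows that every harmonic of the higher note is already present in the series of the lower one.

```python
# Harmonic series of the note A in two neighbouring octaves (27.5 Hz and 55 Hz):
# every harmonic of the upper note also appears in the series of the lower one,
# which is why the same note an octave apart blends so naturally.

def harmonics(fundamental_hz: float, count: int = 16) -> set[float]:
    return {fundamental_hz * n for n in range(1, count + 1)}

if __name__ == "__main__":
    low = harmonics(27.5)
    high = harmonics(55.0, count=8)
    shared = sorted(low & high)
    print(shared)   # [55.0, 110.0, 165.0, 220.0, 275.0, 330.0, 385.0, 440.0]
```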

For whatever reason, our hearing is arranged so that harmonically related frequencies please us, while frequencies that fall outside this scheme do not. Two tones of 1 kHz and 4 kHz will sound pleasant together - in essence they are the same note two octaves apart, even if not tuned to any instrument's standard scale. As noted, such relationships arise constantly in nature from ordinary physical processes. Take 1 kHz and 3.1 kHz instead, and the result sounds irritating.

This brings us to what a chord (a triad) is. Musicians know there are combinations of notes that sound pleasant together and are perceived as a single sound. These are precisely the (usually three) notes whose harmonics neither clash nor pass too close to one another in a way that would annoy the listener, while the remaining harmonics complement each other pleasantly, creating the effect of a single, coherent timbre. In this case mainly the base tone of the chord is perceived - the note on which the chord is built (its root) - while the remaining notes effectively fold into its harmonic accompaniment.

The octave is a useful concept not only for musicians. In acoustics, an octave is a change in frequency by a factor of two. We can confidently hear about ten full octaves, the top of which lies roughly two octaves above the highest note of the piano. Oddly enough, each octave carries about the same amount of information for us, even though the top one spans the whole region from 10 to 20 kHz. In old age we practically stop hearing this last octave, yet the loss of auditory information is not a half (as the halved frequency range might suggest) but only about 10% - one octave out of ten - which is not so terrible. For reference, the highest note of a piano is around 4 kHz, but the instrument's spectrum extends far beyond that thanks to harmonics, in fact covering our entire hearing range. The same holds for almost any musical instrument: the fundamentals almost never go above 5 kHz, so you could be completely deaf to everything higher and still listen to music.
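
The "ten octaves" and "about 10%" figures are easy to reproduce: the sketch below simply counts octaves between 20 Hz and 20 kHz with log2 and looks at what removing the top octave costs on that logarithmic scale.

```python
# Count the octaves in the nominal hearing range and see how much of that
# logarithmic "information axis" the top octave (10-20 kHz) actually occupies.

import math

LOW_HZ, HIGH_HZ = 20.0, 20000.0

total_octaves = math.log2(HIGH_HZ / LOW_HZ)
top_octave = math.log2(HIGH_HZ / 10000.0)

print(f"Total audible range: {total_octaves:.2f} octaves")      # ~9.97
print(f"Top octave share:   {top_octave / total_octaves:.0%}")  # ~10%
```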

Even if instruments with higher fundamentals existed, the audible harmonic content of their sound would be very poor. See for yourself: an instrument with a 6 kHz fundamental has only one clearly audible harmonic, at 12 kHz (the next, at 18 kHz, sits at the very edge of hearing). That is simply not enough for a full, pleasant sound, whatever timbre we were hoping to get.

An important parameter of any audio circuit is its harmonic distortion. Almost every physical process gives rise to extra harmonics, and in sound transmission we try to minimize them so as not to change the tonal colouring of the sound, and simply not to clutter it with unnecessary, burdensome information. Harmonics can, however, also lend the sound a pleasant colour: the "tube sound", for example, is the presence of a larger number of added harmonics (compared with transistor circuits), giving the sound a pleasant, warm character that has practically no counterpart in nature.
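
As a small illustration of how such distortion is usually quantified, here is a sketch that passes a pure tone through a hypothetical soft-clipping nonlinearity and estimates the total harmonic distortion from the resulting spectrum. The tanh curve and the figures it produces are purely illustrative, not a model of any particular tube or transistor stage.

```python
# Pass a pure 1 kHz tone through a hypothetical soft-clipping nonlinearity
# (tanh) and estimate total harmonic distortion (THD) from the spectrum:
# THD = sqrt(sum of harmonic powers) / fundamental amplitude.

import numpy as np

fs = 48000
t = np.arange(fs) / fs                       # one second of signal
tone = 0.8 * np.sin(2 * np.pi * 1000 * t)    # pure 1 kHz tone

distorted = np.tanh(2.0 * tone)              # gentle saturation adds harmonics

spectrum = np.abs(np.fft.rfft(distorted)) / len(distorted)
fundamental = spectrum[1000]                           # bin spacing is 1 Hz here
harmonics = spectrum[[2000, 3000, 4000, 5000, 6000]]   # 2nd..6th harmonics
# tanh is odd-symmetric, so mainly the odd harmonics (3rd, 5th, ...) show up.

thd = np.sqrt(np.sum(harmonics ** 2)) / fundamental
print(f"THD ~ {thd:.1%}")
```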