Tuesday, August 19, 2008

Who's that girl?

Like the great early Madonna reference in the title? Huh? Huh? No? Well, I am wearing about 10 plastic bracelets on my left arm right now, regardless.

K-Box, also known as WifetoBox, etc., whose name will get even longer soon, asked about speaker identification in a comment for the post about soup. Clearly related topics. How, she asks, can she have so much trouble understanding various types of intonation in speech, and yet have no trouble identifying WHO is speaking on the phone, in person, behind walls, inside large plastic bags, etc.?

Well, if Sammy and K-Box have diagnosed K's intonation troubles accurately, she has trouble hearing fine-grained distinctions in pitch. If the voice doesn't go up quite a bit, say at the end of a sentence, she can't tell it's a question without using other cues like context, a funny look on the speaker's face, a big question mark balloon above Sammy's head, etc. But these pitch distinctions that she has trouble with are indeed rather small compared to the entire range of pitches that the human ear can hear. A common rise in pitch at the end of a question might be 20-30 Hertz (Hz), which, while big enough for most people to notice, is not that big. Perhaps K needs a good 40 Hz change before she notices the difference. This would match what she also says about difficulties with music. Notes that are really close together are hard for her to distinguish, but not notes that are far apart.

All of this points to 1) difficulty with small changes but not large ones and, more importantly, 2) something being a bit different in K's brain, not in her ear. Maybe she can tell us if she's ever been to an audiologist, but I would guess that the part of her ear which analyzes frequency works within normal parameters, and that when her brain tries to process that frequency information, the result is not as precise as it is for many people.

But what about knowing who a person is, which K can do just fine? Speaker identification is related to frequencies as well, but in a very different way. The sheer scale in frequency distinctions is just much larger. One of the ways we do speaker identification is by listening for tell-tale signs of how the other's vocal tract is shaped. To understand this, we have to talk about the vocal tract.

So, your voice is producing many different frequency patterns at the same time. The basic frequency, called the fundamental frequency, is created by your actual vocal folds/cords flapping back and forth inside your throat. In typical men, they can vibrate from about 50 times a second (50 Hz) to, say, 300 times a second (300 Hz). Typical women have a higher range going from the low hundreds to the 400s. Children's vocal folds flap even faster and can get into the 600s or so. This fundamental frequency is the pitch of your voice when singing and controls the type of intonation that K has trouble with. So, I might be speaking with a fundamental frequency from my vocal folds of about 150 Hz, and to signal a question I rise to 180 Hz at the end of the sentence. But this fundamental frequency is just one of the frequency patterns coming out of my mouth.
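
To put rough numbers on that question rise, here is a small Python sketch. The 150 Hz and 180 Hz figures are the ones from the example above, and the 40 Hz threshold is the guess about K from earlier in the post. The 20 Hz threshold for a "typical" listener and the conversion to musical semitones are assumptions added purely for illustration, not measurements of anyone's hearing.

```python
import math

def rise_in_semitones(start_hz, end_hz):
    """Size of a pitch change in musical semitones (12 per octave)."""
    return 12 * math.log2(end_hz / start_hz)

def listener_notices(start_hz, end_hz, threshold_hz):
    """True if the change in Hz is at least the listener's (hypothetical) threshold."""
    return abs(end_hz - start_hz) >= threshold_hz

# The question rise from the example: 150 Hz up to 180 Hz.
print(round(rise_in_semitones(150, 180), 2))         # roughly 3 semitones
print(listener_notices(150, 180, threshold_hz=20))   # assumed typical listener
print(listener_notices(150, 180, threshold_hz=40))   # the post's guess for K
```

With these made-up thresholds, the 30 Hz rise clears the bar for the typical listener and falls short for K, which is the whole story of the missed question in miniature.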

The sound from your throat goes into your vocal tract, your oral and nasal cavities, and bounces around. The air resonates at different levels. By changing the shape of your vocal tract, i.e., by opening and closing the mouth, by moving your tongue and lips, by letting air go into your nose or not, you change how the air resonates in there. It's exactly like having a bunch of bottles that you blow across the top of to make a sound. You put more or less water in the bottle and get a different pitch because you've made the amount of air in the bottle smaller or bigger. You do the same with your tongue. For vowels, there are two primary resonances, called formants, that mark the vowel. The vowel "ee" has one resonance that is very low, say in the 200s (Hz), and a second resonance that's quite high, getting close to 2000 Hz. Other vowels have resonances that are closer together, but they all differ in frequency by a hundred Hz or more. That's a much bigger difference than the little 30 Hz shift in fundamental frequency used in intonation. So K might be able to hear big differences created by the resonating vocal tract, but not small shifts in fundamental frequency from the vocal folds.
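
Here is a rough Python sketch of that size comparison. The formant numbers are round, illustrative values: "ee" follows the ballpark given above, while "ah" and "oo" are assumed figures in the neighborhood of commonly cited adult averages, not measurements of any real voice. The 30 Hz figure is the intonation shift from the post.

```python
from itertools import combinations

# Illustrative (first formant, second formant) pairs in Hz. Assumed values.
FORMANTS_HZ = {
    "ee": (270, 2000),
    "ah": (730, 1100),
    "oo": (300, 870),
}
INTONATION_SHIFT_HZ = 30  # the small pitch rise used to mark a question

def largest_formant_gap(vowel_a, vowel_b):
    """Biggest difference in Hz between matching formants of two vowels."""
    pairs = zip(FORMANTS_HZ[vowel_a], FORMANTS_HZ[vowel_b])
    return max(abs(a - b) for a, b in pairs)

for a, b in combinations(FORMANTS_HZ, 2):
    gap = largest_formant_gap(a, b)
    ratio = gap / INTONATION_SHIFT_HZ
    print(f"{a} vs {b}: {gap} Hz apart, about {ratio:.0f} times the intonation shift")
```

Even the closest pair in this toy table is separated by several hundred Hz, more than ten times the size of the question rise.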

I actually still haven't talked about speaker identification yet. The way the air resonates in your head depends upon the shape and size of your head. We have partial control over this, and we use that to speak. But we do not have complete control. If the distance from my vocal folds to my lips is 17 centimeters while K's is 16 centimeters, there's not much either of us can do about that. That's just the way it is. Some of us have heavy heads, some light heads. For some, the roof of the mouth arches just slightly behind the teeth and for others it goes booming up dramatically. All of these subtle differences in head shape change the way the air resonates. It's partly what distinguishes male voices from female voices. Even if a man speaks with a fundamental frequency that is identical to a woman's, it often sounds like a man talking in a high voice, and not like a woman. That's because the two main tubes of the vocal tract, the throat and the mouth, are closer to the same size in women (generally) than in men (generally), so the air resonates differently.
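
To see how much a small difference in length matters, here is a hedged Python sketch. It treats the vocal tract as a plain uniform tube, closed at the vocal folds and open at the lips, whose resonances fall at odd multiples of the speed of sound divided by four times the tube length. That simplification is an assumption brought in for illustration (real vocal tracts are not uniform tubes), as are the speed of sound of 35,000 cm per second and the lengths of 17 cm and 16 cm.

```python
SPEED_OF_SOUND_CM_PER_S = 35_000  # assumed value for warm, moist air

def tube_resonances(length_cm, count=3):
    """First `count` resonances (Hz) of a uniform tube closed at one end."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4 * length_cm)
        for n in range(1, count + 1)
    ]

for length_cm in (17, 16):
    rounded = [round(f) for f in tube_resonances(length_cm)]
    print(f"{length_cm} cm tube: {rounded} Hz")
```

In this toy model, shortening the tube from 17 cm to 16 cm pushes every resonance up by about 6 percent, so the higher the resonance, the more Hz it moves. The speaker cannot undo that shift, which is part of why it works as a signature.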

So, each person has a uniquely shaped head, and we come to know the way that speech out of a head of that shape sounds. The resonances created by the head are also hundreds to thousands of Hertz apart and so might be more easily recognized by K than small frequency changes would be. FYI, there are other characteristics of our speech as well besides head shape. We speak at different rates, break up phrases differently, put more or fewer accents into our speech, and of course each have our own unique way of speaking our language.


Robin S. said...

It never occurred to me that head shape had anything to do with the sound of a voice. That's really interesting - thanks for this!

Mommy to Ander and Wife to Box said...

Best. post. ever.


writtenwyrdd said...

Always fascinating when you launch into your area of study. I knew about head shape, but the bit about brain processing problems was neat to learn about.

Wish my inability to distinguish what someone's saying with background noise going on was due to that and not too many loud rock venues without earplugs!

Sammy Jankis said...

Awesome post. I wonder if MSN would hire you as a contributor on one of their science sites? I bet there are lots of people in the world who would wander across this post and stay to read.

Sammy Jankis said...

Also, is "timbre" the correct word to describe the quality of a sound? Like, two voices or instruments producing a C note can sound different because of the shape of the source?