Why Are Audiobooks So Damn Hard to Understand? (Part 3)

Since I wrote Part 2, I’ve figured out one reason ‘reading voice’ is so hard to follow. Ordinary speech has either a consistent volume/energy level, or elevated volume/energy on the most important words. Raising the volume/energy for certain words is like using boldface. Making the volume/energy highest at the beginning of the sentence and tapering off towards the end tells the listener that the beginning is especially important, and the end is unimportant. If the first words are not actually the most important, it throws off the listener. It’s like putting the first three words, and only the first three words, in boldface. Would you want to read a long text like this?

High-quality machine voices are easier for me to follow than professional audiobook narrators. Why? They don’t form a personal connection with the text. They don’t form a personal connection with anything.

First, machine voices never use reading voice. They keep volume and energy consistent throughout the sentence. That makes them easier to understand than most people reading from scripts. But professional audiobook narrators also keep volume and energy consistent. Clark tells voiceover artists/narrators to maintain consistency. So what makes machine voices easier to understand than professional narrators?

One important element in listening comprehension is plain old pronunciation. Understanding clear pronunciation requires less effort than understanding garbled pronunciation. Humans cannot outperform high-quality machine voices for pronunciation clarity. Even experienced professional voiceover artists get tired and slip in pronunciation clarity sometimes. Machine voices never tire.

To complicate things, recording equipment interferes with the clarity of pronunciation. Professional voiceover artists can mitigate this problem by adjusting their pronunciation (in other words, ideal pronunciation for recording differs from ideal pronunciation for non-recorded speech). Higher-quality recording equipment can mitigate this problem (pop filters are the most obvious example). Sound engineers/editors can mitigate this problem. Machine voices don’t put up with this bullshit. They don’t need to pass through imperfect recording equipment.

However, professional narrators with clear pronunciation (they all have clear pronunciation) combined with good recording equipment and sound engineers/editors produce recordings with such excellent pronunciation that pronunciation clarity alone doesn’t explain why they are harder to follow than machine voices. (Also, professional audiobook narrators are harder to understand than amateur podcasters with less-than-stellar pronunciation.)

Then there’s pitch.

Though pitch patterns are not fixed in English, some pitch patterns are used more than others. The choice of pitch pattern influences meaning, but even a random good pitch pattern sounds better than a pitch pattern English speakers are unused to hearing (unless an unusual pitch pattern fits the intended meaning). We’re also used to hearing a variety of pitch patterns. If the same pitch pattern is used repeatedly, it sounds monotonous.

In unscripted speech, most fluent English speakers use good pitch patterns. That goes out the window when most fluent English speakers read off a script. Including… professional audiobook narrators.

Okay, professional audiobook narrators don’t have pitch patterns as bad as nonprofessionals reading off a random script. But many of the professional narration examples I’ve heard leave something to be desired. In particular, I’ve noticed a tendency to repeat pitch patterns too often.

Improving the personal bond between narrator and script improves pitch patterns. But Clark also recommends explicitly marking the pitch patterns in the script. For a voiceover artist recording a 30-second spot, this is workable. A narrator recording an 8-hour audiobook? I’d be shocked if they mark all the pitch patterns.

Since pitch patterns in English follow rules, those rules can be programmed into computer software. Even rules like ‘don’t repeat the same pitch pattern too often.’

Good machine voices use pitch patterns. Most of the time, the pitch patterns they assign to the text sound okay. With machine learning, some machine voices might by now even choose pitch patterns based on the content of the text.

However, machine voices have an even bigger advantage over professional narrators.

Clark says that if a narrator oversells, overacts, or sounds too perfect, polished, or professional, the listener will reject them. Pushing feeling into the voice pushes listeners out. Instead, narrators need to invite listeners into the words so that the listeners believe they are hearing their own thoughts and feelings.

I’ve never heard a machine voice which sounded too perfect or polished. Even the better machine voices still sound like machine voices. That imperfection is oddly reassuring. In fact, I prefer the machine voices which sound a little more ‘machine’-like than the machine voices which fall into the Uncanny Valley, where they don’t pass as human but are not completely obviously machine voices.

As for feelings… it’s funny. I know machine voices don’t have feelings. And yet, when I recall machine voices, I hear feelings in the voice. If I hear a machine voice tell a sad story and later recall the voice, the machine voice sounds sad in my memory. This is despite the voice not sounding sad while I listened. My own feelings color the machine voices so that, in my memory, I cannot separate my feelings from the machine voice.

Perhaps I also color my feelings onto extremely neutral human voices, though I haven’t noticed it. I cannot do this with human narrators who push their own feelings—as some professional narrators do.

We handle listening to all kinds of emotional human voices all the time. But those voices come from sincere feeling. It’s insincere feeling which interferes with my listening comprehension—and, according to Clark, the comprehension of many listeners.

When the narrator lies—that is, when the narrator says something they don’t believe—the listener knows. Though everyone—the narrator, the listener—intellectually understands that this is an audiobook, if the narrator doesn’t believe what they say WHEN they say it, the listener’s subconscious will interpret the narrator as being a liar not worth listening to. Liars must believe their own lies when they say them in order to convince others to believe them, even if they intellectually understand that they are lying.

Machine voices never lie, because they never say anything they disbelieve. They are incapable of disbelief. Perhaps they are even more sincere than any human. Even the self-narrators of memoirs may have a hint of doubt in what they speak. Machine voices have no doubt.

To be honest, the audiobooks I’ve enjoyed most are machine-narrated audiobooks. None of the screenreading software I’ve tried satisfies me, so I depend on recordings by people with better software. If I ever install screenreading software which can read eBooks and satisfy me, I might listen to a lot more audiobooks. However, because I have plenty of podcast episodes which interest me, this isn’t a top priority.


The one mystery I haven’t solved: why do some people love certain professional narrators?

Are those professional narrators easier to understand (for their fans) than other professional narrators? If yes, what makes them easier to understand? Does listening to a particular narrator for many hours make it easier to follow their narration? Do their fans love the aesthetics of their voices so much that they don’t mind if they are hard to follow? Is there something I’m completely missing?

I have found one professional narrator who I understand more easily than the others: Wil Wheaton. However, I haven’t listened to any of his professional paid work, only Radio Free Burrito, where he narrates texts for free because he loves them so much. Perhaps Radio Free Burrito is more comparable to podfic than official audiobooks. Even at Radio Free Burrito, I find the narration more challenging to understand than Wheaton’s opening unscripted remarks. Though I appreciate that he communicates to me better than most professional narrators, I’m not a fan.

Since I’m not a fan of any professional narrator, I can’t learn through direct observation. I hope those of you who are fans of particular professional narrators will help me solve this mystery by leaving comments.


One last note: I recently launched a newsletter about binge reading book reviews. You can learn more here.
