Why Are Audiobooks So Damn Hard to Understand? (Part 2)

Continued from part 1.

Two key ideas from Elaine Clark’s There’s Money Where Your Mouth Is are that a) voiceover artists (including audiobook narrators) must have a personal and emotional connection with the message so that it sounds real and spontaneous and b) they must empower the listener through suggestion rather than demand so that the listener feels the message conveys the listener’s idea rather than the narrator’s idea. Otherwise, the message will be difficult for the listener to absorb.

This explains why so, so much audiobook narration is difficult to follow.

The book goes into great detail about how to achieve these goals for advertising voiceovers (for TV, radio, and online commercials). The techniques include much text analysis. Additionally, voiceover artists construct scenarios in which they speak to a specific person: someone who gets a strong emotional reaction from the voiceover artist, even if that person is imaginary. I easily understand commercial voiceovers (even in Mandarin), and the book helped me appreciate how much skill goes into making those voiceovers easy for listeners. I have new respect for commercial voiceovers.

However, it takes much less time to do deep textual analysis and construct an effective scenario on the script for a 30-second advertisement than for an entire novel. Though some professional audiobook narrators do textual analysis before they record, they can’t do the same level of analysis as the commercial voiceover artists because they aren’t paid enough. For a voiceover artist getting paid 200 USD for a 30-second recording (I pulled that number out of my ass; I’m not sure if that’s a typical payment), spending two hours on deep textual analysis makes sense. Doing two hours of analysis on every page of a novel doesn’t make financial sense for audiobook narrators. Okay, two hours of analysis per page is too high, since much less analysis is needed once they establish characters and storylines. Nonetheless, audiobook narrators can’t take this to the same level as commercial voiceover artists.

Amateur podcasts, whether they are conversations or a single person ranting at the microphone, usually are spontaneous. They are real because the speakers are saying what they think and feel at the moment, rather than following a script which doesn’t reflect their immediate feelings.

By contrast, the few scripted podcasts I’ve heard are even harder to understand than audiobooks. Therefore, I don’t listen to them. Except for one scripted podcast I heard in Mandarin… which was a drama produced by a radio station and voiced by professional actors playing specific characters.

Clark also discusses ‘reading voice’ which is incredibly difficult to understand. In ‘reading voice,’ sentences start strong and taper off in volume and energy towards the end. Professional audiobook narrators don’t do this, but some authors do when reading their work, which explains why I find some author readings incomprehensible.

Self-narrators have two advantages over professional narrators. First, self-narrators have done a deep textual analysis. Their textual analysis was writing the book. For a professional narrator to match an author in textual analysis, a narrator would have to practically re-write the book. Second, the book’s content stirred the author’s feelings so strongly that the author literally wrote a book. Professional audiobook narrators rarely have that level of personal connection to the text. My guess is that, if an author avoids ‘reading voice’ and pronounces words clearly, they will outperform a professional narrator 90+% of the time.

In memoirs, the author’s personal bond to the content is that much stronger. Therefore memoirs are the easiest category of audiobooks to understand.

While listening to audiobook samples, I discovered that first person is not the easiest point of view to understand. The easiest is second person. Novels written in second person are rare, but they are over-represented in audiobook recommendations. Every audiobook told in second person I sampled was easy to understand (confounding factor: they were all self-narrated). This is because when a narrator uses the ‘you’ pronoun frequently, they can’t avoid developing a deep impression of who ‘you’ are and why they are talking to ‘you.’

First-person novels evade forming a direct dialogue with the audience, but they still force the narrator to have a strong sense of the narrator’s character and why the narrator is telling this story. Sometimes, audiobooks in first person are still difficult to understand. Third-person narration puts the greatest distance between the narrator and the audience. In most third-person audiobooks I’ve heard, it sounds like someone is talking to a wall, or more accurately, a wall is talking to another wall in an echo chamber where I’m not invited.

One question remains…

Why are machine voices easier to understand than professional narrators?

Machine voices don’t form personal bonds with the story or imagine scenarios. Machine voices only analyze text in a limited, mechanical way. Shouldn’t they be the most incomprehensible? And yet they’re not. In comprehensibility, they beat most humans reading off a script.

I’ll discuss that in part 3.

19 thoughts on “Why Are Audiobooks So Damn Hard to Understand? (Part 2)”

  1. I’m curious what makes “reading voice” hard to understand. I’m guessing that’s the thing where you pause between sentences and take a breath, then start off the next sentence strong, kind of like an auditory capital letter. I can’t say I’ve noticed people reading this way or not reading this way, but it seems like the intuitive way to read and a way to clearly mark the beginnings and endings of sentences for a listener. Is it actually not as helpful as one might think? Or am I misunderstanding what “reading voice” is.

    • I almost want to link to a YouTube video as an example… but then I’d be pointing out a specific author as being really bad at reading aloud their own work, which would be too unkind (especially since some people who are really bad at reading aloud are good writers). So instead I advise that you listen to a bunch of author readings on YouTube or whatever platform you prefer, observe which ones start sentences strong and lose energy as they get to the end of the sentence, and judge for yourself whether that is helpful to you as a listener.

      Clark’s book includes breathing exercises to ensure that voiceover artists/narrators have enough breath and breath control to do a smooth delivery for the duration of a recording session. I know from my own experiments with recording my own voice that, among other things, it requires a certain level of physical stamina (and frequent breaks).

    • Also, machine voices don’t use ‘reading voice.’ Since it would be easy to program into software (be louder/more energetic at the beginning, be quieter/less energetic at the end of the sentence) I’m sure machine voices would do this if it helped a majority of listeners.

    • Okay here’s an example:

      This isn’t the worst example of ‘reading voice’ I’ve heard, but it is an example, and since Iain M. Banks is dead I can’t hurt his feelings. My question is: what’s easier to understand? His opening remarks, which are unscripted? Or when he’s reading from his book in ‘reading voice’?

      • I think this is a great example! I have to concentrate so much harder to understand the scripted even though i do think he performs it well and i like hearing the work he’s reading out loud (the scripted part), but the conversational start definitely is so much… “Easier” for me in some key way.

      • Heh, that may not have been the best example for me. XD

        Banks has quite a thick (to my ears) accent, and that made it a bit hard to follow what he was saying in his opening remarks, where he was speaking more quickly and informally. When he started reading from his book, he slowed down and enunciated more clearly, which made him easier for me to understand.

        I do kind of see what you mean about the drop in volume. I didn’t find it a problem, but it was only a very short clip, and… well, since the whole point was to see if I could understand Banks’s reading, I was probably putting extra effort into trying to understand him! I can see how the drop in volume might be an issue though, if some of the words end up being too quiet to hear easily. It’s definitely harder to concentrate when you have to strain to listen.

        It’s possible I’m one of the people who doesn’t find audiobooks difficult. It’s hard to say, though, since I hardly ever listen to them. I did listen to books on tape a lot as a child, but I don’t know if that’s a fair comparison, since I tended to listen to the same books over and over again and thus had ample time to take in their stories.

        Anyway, thanks for the example. Next time I do listen to an audiobook I will have to think about this.

      • I agree, this is not the best example. I know of examples which are more obviously bad (where the drop in volume/energy is more extreme). But those authors are still alive and could, in theory, find this blog post through a search engine. I don’t want them to discover this blog THAT way, lol.

  2. Oh a surprise part 3 is coming! I was expecting part 2 to be the end. 😅 I am intrigued by this book you are sharing key points from and would be likely to read it myself… If it had an audiobook version, which it doesn’t lmao.

    I struggle so much with reading a book these days that i don’t even want to bother with non-audiobooks for the time being. One day I’ll likely return to reading paper and e-books but today is not that day. (Btw you accidentally didn’t italicize the first two words of title of the book.)

    I think the thing that helps make podfic so compelling for a lot of us is that the narrators are very emotionally invested when they do often have a neurodivergent level hyperfixation or special interest in fandom, especially this specific fandom and even more than that *especially* that specific ship or character etc, and there is so much characterization the narrator already is invested in and aware of before they even open the story, and so much podfic is indeed character and conversationally driven. The most fun podfics to perform are passionate stories with a strong point of view or emotional dialogue.

    Meanwhile when i go on podfic vacations like the Podfic Winter Chillfest right before the pandemic, or the Podfic Summer Sizzle during the pandemic, when it hit the east coast of the USA hard enough that we stopped social gatherings, we are performing in an acting sort of way that feels a lot like a theater camp even though it’s all voice acting. Virtually we couldn’t see each other’s faces but we knew people were listening in live time, and in person it’s knowing well that people are in the room with you emotionally invested in how you’ll perform each line!

    But in this one from when i sat in a couch in person next to two friends being the narrator in February 2020, i still hear myself starting off strong and tapering off by the end of each sentence as a narrator, but especially only in the very opening part where i try to read the title and author and fandom/relationship out loud, the basic information i am not emotionally invested in the wording of, and don’t know how to say.

    Which reminds me of when i performed a podfic of my own fanfic I’d written and still didn’t really know how you perform or pronounce the title lol. In large part because the way this fic functioned, the title, “How Did She Die?”, is literally a line reflecting something different in each of 4 different chapters, and was chosen by me as the fic author to be something that in text form ties all the chapters together. But I struggled with something as simple as how to translate that to audio because the tone of voice to use wasn’t obvious to me for something like this lol.
    https://www.archiveofourown.org/works/5591770

    It’s just fascinating to me to think about now.

    I’m curious what screenreader technology or machine reading you might recommend i check out because i haven’t really given that much of a chance yet and was curious if I could get into it. It would help me “read” all these books that don’t have an audiobook format but do have an ebook format maybe, but also even to be able to “read” blog posts or articles or educational PDF or textbook while doing something like driving or the dishes. I thought that a screenreader might be even harder for me to follow/get into than most audiobooks but I’m not sure.

    Anyway thanks for this thought-provoking blog post topic. I know i really appreciated the chance to think about it and stuff.

    • Unfortunately, I’m not going to recommend any screenreader technology because I’m not satisfied with any of the machine voice software I’ve tried (to be fair, I haven’t searched very far). Most of my observations are based on recordings of machine voices by people who have really good screenreading software (though I’m not sure what software they’re using). I know a lot of people enjoy getting blog posts, magazine articles, etc. converted to audio via machine voice and treating them as podcasts. WordPress even has a feature which I could use to convert my blog posts into podcasts via machine voice. Hmmm… come to think of it, converting this set of blog posts to machine voice podcasts would be *cough* thematically appropriate.

      • Turns out I misunderstood that feature. It wouldn’t automatically generate a podcast with a machine voice, it would just let me record an audio version of this blog post and link it. Darn.

      • I want to say you didn’t misunderstand it because… I could’ve sworn i saw the exact same thing at some point that left the exact same impression on me. That a machine could automatically turn blog posts into podcasts. They must’ve really been misleading to us WordPress bloggers when they launched the feature or something. Maybe changed their minds? I truly don’t know.

  3. Interesting! I started listening to audiobooks after shelter-in-place started this past year, and I’ve found them both more accessible and less accessible at the same time. Some books have been pretty enjoyable and relatively easy to follow, while others… not so much. The sci-fi genre is especially terrible to listen to without having read the book in print first (or at least have the print version on hand for reference), as I suspected would be the case—the only reason I ended up trying that is because my library didn’t have a digital print version of the book I wanted to check out.

    I suppose what you say is true about self-narrated memoirs being the easiest to understand (as long as the author is good at reading aloud), because thinking back on it now, it seems that the books I read in that category were probably the easiest to follow, although of course it varied by the author. Probably the one that was easiest for me to understand was Michelle Obama’s Becoming.

    One of the more interesting ones was Eddie Izzard’s memoir (the title is slipping my mind atm), because she didn’t strictly follow her own writing, she went off on many long spontaneous footnote tangents that were only in the audiobook version, and had them edited together after. I found this pretty easy to follow despite losing track of what exactly she was talking about before that, because I’m already used to her style from her comedy shows. But I imagine it could be difficult for some listeners to follow if they’re less used to her style, or if they have trouble with her accent (I know some people I’ve watched her comedy shows with have said they couldn’t understand a lot of what she was saying without subtitles because of that).

    I wonder if you’ve listened to any of N.K. Jemisin’s Broken Earth trilogy in audiobook format? All of those books are written in second person, but they aren’t self-narrated, they’re narrated by a professional voice actor. I’ve only listened to the first one so far, and I found it very easy to follow, but I’ve already read those books in print, so I have no idea if I would’ve found it as easy if I hadn’t.

    Listening to books I’ve already read before in print is probably my most preferred type of audiobook actually, because then I can have some entertainment that I already know I like while leaving my hands free to do other things like cleaning or cooking, and I don’t have to pay super close attention because I already know more or less what’s going on. Some of the books I’ve reread that way are upwards of 1000 pages, and I’ve tried to reread them before in print but never got far, but this way I could actually finish them again. Although some of them annoyed me because the narrator felt wrong for the role, or would pronounce words with the wrong accent (like Italian instead of French, when the novel is set in a land based on France and the word is supposed to be native to that place).

    I will also say that I find audiobooks MUCH easier to understand when I increase the narration speed significantly! Anything below 1.5x speed is difficult for me to listen to, and I will often fall asleep in the middle of it (which… with some audiobooks, is the entire point lol). Prior to reading your posts, I was thinking that the reason for that was because it better matched my print reading speed, but now that I’ve read this, I think that’s not the only reason. Now I think the sped-up narration also sounds more mechanical, and that makes it easier to understand for me.

    • I had not listened to any of the Broken Earth audiobooks before (though I had considered it because I knew they were in 2nd person). I also haven’t read the books in print (I’ve read other N.K. Jemisin books, I read her blog before her first novel was published, I just… have a lot of books on my TBR list). Just now, I listened to the sample of the first audiobook and did a chore (I always do something with my hands during these tests). It’s not as hard to follow as some professionally narrated audiobooks, but ultimately, I wasn’t able to follow it as I did stuff with my hands. It’s not as comprehensible (to me) as On Earth We’re Briefly Gorgeous or Open Water (self-narrated novels in 2nd person). (Technically, On Earth We’re Briefly Gorgeous alternates between 1st person and 2nd person, but even when it’s in 1st person the narrator often inserts comments about ‘you.’)

      Professional audiobook narrators mispronouncing words/names is a common problem, as I’ve learned from binge-reading Amazon reviews. For example, multiple reviews of a book set in St. Louis said that the audiobook narrator got the pronunciation of multiple locations wrong and, as St. Louis locals, they couldn’t stand it.

      Many, many people prefer audiobooks on 1.5x (for me, if I have trouble following an audiobook at 1.0x speed, I’ll also have trouble at 1.5x, though for audiobooks which I can already follow easily I sometimes prefer 1.5x speed). It had not occurred to me that it might be easier to follow because it sounds more mechanical, so thanks for that insight.

