Since I wrote Part 2, I’ve figured out one reason ‘reading voice’ is so hard to follow. Ordinary speech has either a consistent volume/energy level, or elevated volume/energy on the most important words. Raising the volume/energy for certain words is like using boldface. Making the volume/energy highest at the beginning of the sentence and tapering off towards the end tells the listener that the beginning is especially important, and the end is unimportant. If the first words are not actually the most important, it throws off the listener. It’s like putting the first three words, and only the first three words, in boldface. Would you want to read a long text like this?
High-quality machine voices are easier for me to follow than professional audiobook narrators. Why? They don’t form a personal connection with the text. They don’t form a personal connection with anything.