Parallels Between Typography and Conversational Design, feat. Robert Bringhurst

I’m reading Robert Bringhurst’s The Elements of Typographic Style. It’s a beautiful work: a poetic book about a practical art, part philosophy and part love letter to the craft.

Given that my daily work is in conversational design, I found a few parallels here between typography and the aspect of conversational design wherein text-becomes-speech, which I’ll isolate here by calling it speech design.

Typography is the craft of endowing human language with a durable visual form, and thus with an independent existence. Its heartwood is calligraphy–the dance, on a tiny page, of the living, speaking hand–and its roots reach into living soil, though its branches may be hung each year with new machines. So long as the root lives, typography remains a source of true delight, true knowledge, true surprise.

Elements, Introduction

Speech design is the craft of endowing human meaning with an ephemeral aural form, and thus with a dependent existence. Which is a fancy way of saying: it’s what we all do naturally; it’s speech. And like music, spoken language uses time as its canvas.

Typography is to literature as musical performance is to composition: an essential act of interpretation, full of endless opportunity for insight or obtuseness.

Elements, Chapter 1

Like the typographer, the conversational designer acts as an interpreter. Instead of drawing on spatial concepts like kerning, leading, measures, ems, and picas–all “levers” that we can pull to change the color and experience of reading a page–a conversational designer pulls other levers. They draw on the properties of linguistics and speech by tweaking prosody, pitch, volume, rate, inflection, pronunciation, register, stance, pauses and breaks, sounds and ambience.

These properties are experienced aurally over time. And they are realized through a technology. Where the typographer had his or her printing press, we have our natural-language-processing and text-to-speech engines. These act as our “speech press.” It’s early days, and so we have limited control. But that control will likely increase.
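As a rough illustration of what those levers look like today: many text-to-speech engines accept SSML (Speech Synthesis Markup Language, a W3C standard), which exposes rate, pitch, volume, and pauses as markup. The sketch below only builds the markup string. The helper names (`prosody`, `pause`, `speak`) are my own, invented for illustration, and which attribute values a given engine actually honors varies, so treat the values as assumptions.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody(text, rate="medium", pitch="medium", volume="medium"):
    """Wrap text in an SSML <prosody> element (hypothetical helper)."""
    attrs = (
        f"rate={quoteattr(rate)} "
        f"pitch={quoteattr(pitch)} "
        f"volume={quoteattr(volume)}"
    )
    return f"<prosody {attrs}>{escape(text)}</prosody>"


def pause(milliseconds):
    """Insert a timed break, the aural cousin of white space on a page."""
    return f'<break time="{int(milliseconds)}ms"/>'


def speak(*parts):
    """Join the fragments into a single SSML document."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(
    prosody("Welcome back.", rate="slow", pitch="low"),
    pause(400),
    prosody("Ready to begin?", volume="soft"),
)
print(ssml)
```

The point is less the syntax than the analogy: where a typographer adjusts leading and measure, a speech designer adjusts rate and pause length, and then listens to the result.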

One of the principles of durable typography is always legibility; another is something more than legibility: some earned or unearned interest that gives its living energy to the page. It takes various forms and goes by various names, including serenity, liveliness, laughter, grace and joy.

Elements, Chapter 1

As with typography, so with speech design. The first goal in speech design should be legibility, or better for this medium, clarity. In linguistics, this might be likened to the last of Grice’s four maxims: the maxim of manner, which suggests we should be clear in what we say.

I do not know if speech design should seek as another aim to be serene, lively, graceful, and joyous. Different contexts may require different tones and colors in the speech. But certainly, it should elicit the intended effect.

Reflections on Conversational Design (2)


Spoken sentences and words are the heart and soul of a voice experience. It’s in these moments, when we “have the mic” (so to speak), that designers can establish personality; express generosity; and create a sonic world for another person to inhabit.

But what goes into crafting, and critiquing, these spoken sentences? Where visual designs have some foundational pillars–typography, layout, and color–what does conversational design have that’s similar? What is the “color” of conversation? The “layout” of VUI design? The “typography” of speech? What, in other words, are the different disciplines that a conversational designer can draw from to craft and critique a conversational experience?

(Asked still another way, what disciplines does a conversational designer need to be fluent in? If I were hiring, what would I be looking for? If I were training a designer, what would I be drawing from?)

Here are some of the areas that I think are important to understand:

  • Linguistics. Written words and spoken words are different. We tend to write in lengthy sentences with a careful structure and a wider vocabulary. We tend to talk in chunks of seven words or so, interrupting ourselves as we go along, and using a simpler, shorter vocabulary. We use more vocatives, we take shortcuts–contractions, ellipsis, other “reduced forms”–and we tend to repeat ourselves, using “bundles” of relatively formulaic phrases. And of course, lest we forget, speech is interactive, which sets it apart entirely from any kind of academic or news-like writing. A conversational designer should know the basics of how the spoken word differs from the written word, and why that’s important–which is fundamentally a linguistic question. And while linguistics is a large and intimidating field, most of the “speech versus writing” questions are tackled in sociolinguistics–a field that also talks about…
  • The Properties of Speech. Not only are spoken syntax and grammar different, but there’s an added element: speech is, well, speech. It’s spoken! And so it contains “paralinguistic” properties: breath, tone and intonation, prosody, volume, pitch, the speed at which we speak. Speech also has to be vocalized with a voice that has a certain timbre, or particular qualities (e.g. a baritone, smooth, female voice or a low, gravelly, male voice). A conversational designer needs to know the basics of speech, and how it’s controlled with whatever technology they’re working with: whether it be a text-to-speech engine, or a voice actor in a studio.
  • Stance and Persona. Technically, this is directly linked to the first two points, but it bears repeating: speech expresses an attitude toward the other person. We might refer to them as “sir” or “dude,” we might say “Please pass the butter” or (with a blunt imperative) “Pass the butter–now.” All of these suggest emotion and feeling toward the other person in the conversation. This also combines to express a personality: bubbly and outgoing, or short and direct, or clear and professional, or casual and friendly. A conversational designer should know how to establish this “art direction” for voice experiences, and what personality they want to project. This is all vital because people will make judgments about your conversational interface’s personality, even when they should know it’s a computer. That’s what people do.
  • Memory. Unlike graphical interfaces, which linger in space to be viewed and reviewed, voice interfaces do not linger. Once something is spoken, it’s gone, and resides only in the memory. But memory is limited. So we have to be really aware of cognitive load: we can’t give too many options, nor say too much, in any one turn, lest we overwhelm a person’s memory (or patience). Much of conversational design’s “best practices” comes down to keeping prompts short, sweet, and simple–working with human memory, instead of against it.
  • Sound and Music. Traffic, a honk, hammers, and birds–suddenly, you’re in the heart of a bustling city. A soft vibration, gongs, and steady throbs of “om,” and you’re now in a monastery, ready to meditate. The familiar three notes, and suddenly, you’re prepared to hear broadcasters or comedians from NBC. Sound and music can transport you. Or with a short “Ching,” it can inform you (You just got paid!). It can establish mood, or the completion of a task. It can change your emotions, or invoke memories. A conversational designer should know the basics of sound: pitch, rhythm, timbre, and melody, and the varieties of information and emotion it can convey.
  • Platform Limitations and Opportunities. As much as I’d like to design for Jarvis, most voice interfaces are far dumber than that–the burden of weak AI. People can’t speak with computers as naturally as they’d like and expect to be understood. For example, if someone wants a large pizza with sausage, pepperoni, and pineapple but with gluten-free crust–well, with current limitations, we have to ask for only some of that information at a time. We have to be aware of the limitations, and help the user work with those limitations instead of against them, lest we provoke confusion, frustration, or anger. And we need to be aware of the opportunities each platform and technology affords. These technologies and abilities are always changing; a conversational designer needs to stay abreast of the trends and technologies.
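To make the pizza example in the last bullet concrete, here is a minimal sketch of asking for one piece of information per turn, which also keeps each prompt short enough for memory. The slot names, the prompts, and the `next_prompt` function are all hypothetical; they are not taken from any particular platform.

```python
# Hypothetical slots for the pizza order, in the order we would ask for them.
SLOT_PROMPTS = {
    "size": "What size pizza would you like?",
    "toppings": "Which toppings?",
    "crust": "And what kind of crust?",
}


def next_prompt(filled):
    """Return the prompt for the first slot still missing, or a confirmation."""
    for slot, prompt in SLOT_PROMPTS.items():
        if slot not in filled:
            return prompt
    toppings = ", ".join(filled["toppings"])
    return (
        f"Got it: a {filled['size']} pizza with {toppings} "
        f"on {filled['crust']} crust."
    )


order = {}
print(next_prompt(order))  # nothing is known yet, so ask for the size

order["size"] = "large"
order["toppings"] = ["sausage", "pepperoni", "pineapple"]
print(next_prompt(order))  # only the crust is still missing

order["crust"] = "gluten-free"
print(next_prompt(order))  # every slot is filled, so confirm the order
```

If the user volunteers several details at once, the loop simply skips the slots that are already filled instead of asking again.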

There are other things we need to consider, of course. The general “UX” process of testing with real people; a consideration of context; how voice interacts with graphical interfaces, such as on a smart speaker with a screen; recommended best practices; the nuances of creative writing and crafting a brilliant persona; the drawbacks of VUI design, and discerning which use cases are appropriate for voice and which are not; something of the history of the field. The list could go on. But the above points cover what I think someone should know, first and foremost, to design the foundational artifact of VUI design: prompts and speech. Armed with these concepts, I’ve found that it’s easier to both describe and prescribe the right prompts; to accomplish whatever goal the user has in mind, in the right way.

Reflections on Conversational Design (1)

What is a voice user interface? And what artifacts allow designers to express their intentions, and share them with others? I’ve been mulling over something Rebecca Evanhoe said in a Botmock AMA from earlier this year about these very questions. She said a conversational designer needs to be able to design these three things:

  1. The things the computer says: the prompts I write as a conversational designer
  2. The flow of the conversation–the “conversational pathways”–arising from the things the computer says (and the expectations provided)
  3. The interaction model behind it all, the “grammar” that anticipates what a user might say, and links those utterances to intents

I like this way of thinking about it. First, it highlights that the pathways (2) and interaction model (3) derive from the prompts we write (1). Those prompts: these are the beating heart and soul of conversational design. The syntax, grammar, and diction; the prosody, volume, and emphasis; the personality conveyed; the sounds used; all of this emerges from how we write the prompts.

And second, it made me realize something. I was going to argue that the prompts and pathways are really human-centered, and that we really have to deal with platform limitations when we start on the interaction model. To some extent, that’s true; but of course, not entirely. Yes, we have to start with how people actually talk, but we also have to anticipate the platform limitations from the very start.

And actually, the interaction model is where we really have to anticipate what people will actually say. A robust anticipation is vital, because otherwise, the conversation will falter: the agent that was designed (by me!) won’t know what someone meant.
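Here is a toy sketch of what an interaction model boils down to: a set of intents, each with sample utterances, plus a fallback for everything we failed to anticipate. The intent names and sample phrases are hypothetical, and the exact-match lookup is a deliberate simplification; real platforms use trained language-understanding models rather than string comparison.

```python
import string

# Hypothetical intents, each with sample utterances a user might say.
INTERACTION_MODEL = {
    "StartMeditation": ["start a meditation", "i want to meditate", "begin"],
    "GetHelp": ["help", "what can i say", "what can you do"],
}

FALLBACK = "Fallback"


def normalize(utterance):
    """Lowercase, strip punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(utterance.lower().translate(table).split())


def resolve_intent(utterance):
    """Return the intent whose samples include the utterance, else the fallback."""
    spoken = normalize(utterance)
    for intent, samples in INTERACTION_MODEL.items():
        if spoken in samples:
            return intent
    return FALLBACK


print(resolve_intent("I want to meditate!"))  # StartMeditation
print(resolve_intent("Order me a pizza"))     # Fallback
```

Every utterance that lands on the fallback is a moment where the conversation falters, which is why the list of sample utterances deserves as much care as the prompts.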

How to Cooperate in Conversation Design

You turn to me, and say, “Any updates on the designs I asked you about?” To which I reply, “That sandwich from Einstein’s was very, very good.”

You’re instantly confused, and for a very good reason. Unless talking about sandwiches is code for something, I was answering a very different question from the one you asked. And this violates something we usually take for granted: when we talk with each other, we’re cooperating. When I lie, or ramble, or reply with something irrelevant, I’ve stopped cooperating.

This idea is known as the cooperative principle. More precisely, it’s the idea that in conversation, we contribute as much to the conversation as is needed, moment-by-moment, to achieve some goal.

Unless you’re a sociopath (and I assume you are not), you do this naturally. In fact, Paul Grice, the man who formulated it, meant it as a description of how we normally talk, and not as a prescription for how we should talk. Again, we do this naturally. Grice took the natural, and therefore invisible, thing, and made it visible by articulating it.

Thinking for Yourself

But if we all do it naturally, why is discussing the principle important for designers? Put simply, it is easier to cooperate when we talk than when we write. Why? Here’s John Trimble, in his excellent book, “Writing with Style”:

Most of the [novice writer’s] difficulties start with the simple fact that the paper he writes on is mute. Because it never talks back to him, and because he’s concentrating so hard on generating ideas, he readily forgets–unlike the veteran–that another human being will eventually be trying to make sense of what he’s saying. The result? His natural tendency as a writer is to think primarily of himself–hence to write primarily for himself. Here, in a nutshell, lies the ultimate reason for most bad writing.

John Trimble, “Writing with Style”

(And for most bad design, I’d add, but I digress.)

When we carry on a normal conversation with other people, those other people are not on mute. We know what is being said and who is hearing it. We can see their faces, and gauge their understanding: are eyebrows raised? Are they nodding their heads? Are they making eye contact? Are they looking away, disengaged? And what do they say in response? Do they ask questions? Are they getting to their goal? All that they say–the content of their speech, the inflection of their voice, their facial expressions and body language–all of these are constantly available, constantly reminding us that we are speaking for others, and constantly telling us whether we’re playing our part well (or not).

When we write, on the other hand, we are, in very real ways, blind and deaf. Writing is a solitary act, and so it is easy to write for ourselves, to think for ourselves. And so–bringing things full circle–we forget to cooperate, to play our part in the conversation.

As writers, we are designers.

Design is often perceived as visual, but a digital product relies on language. Designing a product involves writing the button labels, menu items, and error messages that users interact with, and even figuring out whether text is the right solution at all. When you write the words that appear in a piece of software, you design the experience someone has with it.

Metts and Welfle, “Writing is Designing”

As an interface designer, this is important to remember. As a conversational designer, it is especially important. A conversational interface relies primarily, and sometimes wholly, on the strength of our writing. And the strength of our writing–our capacity to cooperate–relies on how well we understand our audience.

Following Grice’s Maxims

Let us turn back to the cooperative principle: the idea that we should, at each moment of a conversation, contribute what is needed to achieve the goal at hand. In normal conversation, there can be many goals: to inform; to comfort; just to listen, and offer comfort and presence; to shoot the breeze and get others laughing. All of these are important to our humanity. But as conversational designers designing conversational computer interfaces, we have a more limited set of aims: to inform, to entertain, and/or to accomplish some task. We want to cooperate with the user and help them achieve these ends. What are practical guidelines to do this?

Luckily for us, Paul Grice gave four maxims. Again, these are descriptive–we naturally do these things. They are:

  • Maxim of Quality (Tell the truth)
  • Maxim of Quantity (Say only as much as necessary)
  • Maxim of Relevance (Be relevant)
  • Maxim of Manner (Be clear)

Let’s talk about each in turn.

The maxim of quality. We should only say what we understand to be true. We shouldn’t say what is false. When we lie, we are failing to cooperate.

The maxim of quantity. Napoleon once said “Quantity has a quality of its own.” He was suggesting that the size of his army–massive for the time–overcame any defects in their training and preparation.

But what is true for the battlefield is not true for conversation. We do not want to provide too much information. And neither do we want to provide too little. We want to provide the right amount. We all know long-winded people who say too much, who go on for far too long to say what they mean. But it’s also possible to provide too little information. Imagine me asking someone, here in New York City, “How do I get to Chicago?” They might say, “Head due northwest for XXX miles.” True, so far as it goes. But also much less information than I was hoping for. Like Goldilocks trying to avoid the porridge that is too hot or too cold, we try to provide the amount that is not too much or too little, but “just right.”

The maxim of relevance. Be relevant. Go along with the topic. If I ask you for the time, don’t reply with your opinion of how bad the latest episode of The Bachelor was. It’s irrelevant to what I was asking for.

The maxim of manner. Be clear. Make your writing and speech easy to understand and unambiguous. If I ask you where the closest Starbucks is, do not give me the latitude and longitude. It’s true; it’s concise; and it’s even relevant. But it’s not clear, at all, how I’m supposed to use that information. Ernest Hemingway once wrote that “The indispensable characteristic of a good writer is a style marked by lucidity.”

The maxim of manner is arguably the most important of them all. Something can be relevant, true, and sufficient. But if it is not clear, it cannot be judged as relevant, true, or sufficient.

Let Context Guide

As I said earlier, conversation fills many roles in our lives: to laugh, to comfort, to learn, to love, to persuade, to entertain. But for conversational interfaces, the goals are much more limited. They are usually to inform, to entertain, or to accomplish some task. And when we switch between these contexts and goals–not to mention other contexts like physical location or mobility–we need to consider the impact on the situation, in light of Grice’s maxims.

In conversational design, we deal fundamentally in “turns.” (This is, perhaps, the best parallel to what graphic designers call the “artboard”, or put more simply, a screen.) A turn is made up of the utterance (“what the user says”) and the resulting response (“what the voice assistant says”).

As designers, we have control over the response the voice assistant provides. Amazon stresses a “one-breath test” for the length of these responses. This means that if a single response by Alexa or Google Assistant cannot be said in one breath, then it’s perhaps too long. And this is true most of the time. It is true when the aim is to inform or accomplish most tasks. But it is not always true.
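The real test is to read the response aloud. Still, a rough word-count estimate can flag candidates for review. The sketch below is only that: the speaking rate of 150 words per minute and the 8-second breath are my own assumptions for illustration, not figures from Amazon, and the function names are hypothetical.

```python
# Assumed values for illustration only; tune them by listening to real output.
WORDS_PER_MINUTE = 150
BREATH_SECONDS = 8.0


def estimated_seconds(response, words_per_minute=WORDS_PER_MINUTE):
    """Estimate how long a response takes to speak, from its word count."""
    return len(response.split()) * 60.0 / words_per_minute


def passes_one_breath(response, breath_seconds=BREATH_SECONDS):
    """True if the estimated speaking time fits within one assumed breath."""
    return estimated_seconds(response) <= breath_seconds


short_menu = "Here are your options: a short meditation, or a long one."
long_story = " ".join(["word"] * 60)  # stand-in for a 60-word response

print(passes_one_breath(short_menu))  # True
print(passes_one_breath(long_story))  # False
```

A check like this is a prompt for review, not a rule: a story or a guided meditation would fail it on purpose.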

Consider Kung Fu Panda, a popular Alexa skill made by RAIN. The turns are much, much longer than a single breath, because the aim is to entertain.

Or consider Headspace, another voice app RAIN made. I was the lead designer for this app, which ties into the popular Headspace product, which offers guided meditations to everyone. The menu is exceptionally simple:

In the first two responses, the goal (getting quickly to a meditation) dictates that we be brief and clear: here are your options. We broke the conversation up into tiers, to avoid an excessively long list of options at the beginning. But once we reached the meditation, we played a ten-minute response: a guided meditation. Far from being too long, this was cooperating with the user: providing them a guided meditation, where they expected to only listen.

A more difficult lesson I learned with Headspace: in the first iteration, we played a short message at the end of the meditation, explaining how to get access to more meditations. I thought this would be helpful. But far from helping, the message backfired: users hated it. Just when they had achieved some stillness and quiet, we interrupted it, ruining ten minutes of patient silence. Metts and Welfle have said that “when writing is designing… the goal is not to grab attention, but to help your users accomplish their tasks.” We were grabbing their attention again, when our purpose should have been to help them achieve their tasks at every step.

How to Write for Others

Some of the key points:

  • Conversation is about cooperation.
  • We naturally cooperate in normal conversation. But when writing, our audience is on mute. So it’s easy to forget.
  • Grice’s maxims describe how we normally cooperate. We tell the truth; we say enough (not too much or too little); we stay relevant; and above all, we’re clear.
  • Context is important. Conversational interfaces are usually made to inform, to entertain, or to accomplish some task–and sometimes all of these. Keep this in mind at each turn.

How do we do this? I’ll write more about that in another article. The key, of course, is to keep the audience in mind. Never let your writing–whether it be for a blog post, website copy, a chatbot, or a voice interface–go out without having first thought about what your audience wants, and how well you’ve provided that.