Parallels Between Typography and Conversational Design, feat. Robert Bringhurst

I’m reading Robert Bringhurst’s The Elements of Typographic Style. It’s a beautiful work: a poetic book about a practical art, part philosophy and part love letter to the craft.

Given that my daily work is in conversational design, I found a few parallels between typography and the aspect of conversational design wherein text becomes speech, which I’ll isolate here by calling it speech design.

Typography is the craft of endowing human language with a durable visual form, and thus with an independent existence. Its heartwood is calligraphy–the dance, on a tiny page, of the living, speaking hand–and its roots reach into living soil, though its branches may be hung each year with new machines. So long as the root lives, typography remains a source of true delight, true knowledge, true surprise.

Elements, Introduction

Speech design is the craft of endowing human meaning with an ephemeral aural form, and thus with a dependent existence. Which is a fancy way of saying: it’s what we all do naturally; it’s speech. And like music, spoken language uses time as its canvas.

Typography is to literature as musical performance is to composition: an essential act of interpretation, full of endless opportunity for insight or obtuseness.

Elements, Chapter 1

Like the typographer, the conversational designer acts as an interpreter. Instead of drawing on spatial concepts like kerning, leading, measures, ems, and picas–all “levers” that we can pull to change the color and experience of reading a page–a conversational designer pulls other levers. They draw on the properties of linguistics and speech by tweaking prosody, pitch, volume, rate, inflection, pronunciation, register, stance, pauses and breaks, sounds and ambience.

These properties are experienced aurally over time. And they are realized through a technology. Where the typographer had his or her printing press, we have our natural-language-processing and text-to-speech engines. These act as our “speech press.” It’s early days, and so we have limited control. But that control will likely increase.
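
To make a few of these levers concrete, here is a minimal sketch in Python that builds a prompt in SSML (Speech Synthesis Markup Language), the W3C markup many text-to-speech engines accept. The helper names (prosody, pause, speak) and the sample prompt are my own, invented for illustration, and engines differ in which attributes and values they honor, so treat this as a sketch rather than a recipe.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody(text, rate="medium", pitch="medium", volume="medium"):
    """Wrap text in an SSML <prosody> element (hypothetical helper)."""
    return (
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)} "
        f"volume={quoteattr(volume)}>{escape(text)}</prosody>"
    )


def pause(milliseconds):
    """Insert a silent break of the given length (hypothetical helper)."""
    return f'<break time="{int(milliseconds)}ms"/>'


def speak(*parts):
    """Join the pieces into a single SSML document (hypothetical helper)."""
    return "<speak>" + "".join(parts) + "</speak>"


# A made-up prompt: slower and lower for the greeting, a beat of silence,
# then a softer question.
prompt = speak(
    prosody("Welcome back.", rate="slow", pitch="low"),
    pause(400),
    prosody("What would you like to hear?", volume="soft"),
)
print(prompt)
```

The words stay the same; only the rate, pitch, volume, and pause change, which is roughly the speech-design equivalent of adjusting leading or letterspacing.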

One of the principles of durable typography is always legibility; another is something more than legibility: some earned or unearned interest that gives its living energy to the page. It takes various forms and goes by various names, including serenity, liveliness, laughter, grace and joy.

Elements, Chapter 1

As with typography, so with speech design. The first goal in speech design should be legibility, or better for this medium, clarity. In linguistics, this might be likened to the last of Grice’s four maxims: the maxim of manner, which suggests we should be clear in what we say.

I do not know if speech design should seek as another aim to be serene, lively, graceful, and joyous. Different contexts may require different tones and colors in the speech. But certainly, it should elicit the intended effect.

Resistance and the Smartphone

In the latest, tenth-anniversary edition of The Shallows: What the Internet is Doing to our Brains, Nicholas Carr describes a basic problem all our brains have, and how our brains solve it.

At every instant of the day, our nervous system is bombarded by stimuli that may be worthy of our attention–objects in our field of view, sounds and scents, people we know and people we don’t know, ideas and memories, emotions, bodily sensations. From the near-infinite welter of possibilities, the mind has to choose a target. This enormously complicated, enormously important task–nothing so determines our thoughts and behavior as the distribution of our attention–is accomplished through a neural system called the salience network.

“Spanning many areas of the brain,” the salience network gives priority to four kinds of stimuli:

  1. The novel and unexpected
  2. The pleasurable or otherwise rewarding
  3. The relevant
  4. The emotionally engaging

At this point, it’s worth asking: in our modern life, what are the kinds of things that grab our attention in these ways? Carr answers quickly and singly: these four kinds of stimuli that so attract our attention “are exactly the kinds of stimuli our smartphones supply–all the time and in abundance.”

Refreshing their contents continuously, our phones are fonts of new and surprising information. Our phones give us stimulation and gratification whenever we check them, triggering releases of the pleasure-production neurotransmitter dopamine. Because they are deeply personal repositories of photos and messages, our phones are always of immediate relevance to us. And our phones are emotionally charged. They send and receive signals of our social status, and they flood us with information on the people, events, and subjects we care most about. Imagine combining a mailbox, a newspaper, a TV, a radio, a photo album, a public library, a personal diary, and a boisterous party attended by everyone you know, and then compressing them all into a single, small, radiant object. That’s what a smartphone represents to us.

This is astonishing. Carr proceeds to describe other forms of media that have drawn us in, but observes that the smartphone is in a league of its own:

Even in the long history of mesmerizing media, the smartphone stands out. It’s an attention magnet unlike any our minds have had to grapple with before. It acts as what Ward calls a “supernormal stimulus” that is able to “hijack” attention whenever it’s part of the surroundings–and it’s always part of the surroundings. With the smartphone, the human race has succeeded in creating the most interesting thing in the world. No wonder we can’t take our minds off it. (emphasis added)

Later, Carr gives yet another phrase: the smartphone’s “colonization of the salience network” has been proceeding since Apple introduced the concept over a decade ago.

A hyper-attractive attention magnet. Supernormal stimulus. The colonialist of our salience network. A hijacker. “The most interesting thing in the world.” This is the smartphone, from the perspective of our capacity for attention.

Now, I’ll be frank: this is scary. For our capacity to attend to what we intend is vital. As Carr pointed out, “nothing so determines our thoughts and behavior as the distribution of our attention.” This is perhaps why the poet Mary Oliver once said that “to pay attention, this is our endless and proper work.” And James Williams, one of the growing number of technologist-turned-technological-skeptics, wrote of his time at Google that he came “to understand that the cause to which I had been conscripted was not the organization of information, but of attention. The digital technology industry was not launching and iterating neutral tools, but directing flesh-and-blood human lives.” This is another way of describing not just the true aims of most companies, but also the nature of attention itself: it is what directs our “flesh-and-blood human lives.”

There’s an extra dimension to all this for me, a practicing Christian. Simone Weil, a Christian mystic, once wrote that “the key to a Christian conception of studies is the realization that prayer consists of attention. It is the orientation of all the attention to which the soul is capable toward God. The quality of attention counts for much in the quality of the prayer.” Mary Oliver, again, also recognizes this: “attention is the beginning of devotion.” If technology so divides my attention, it so divides my capacity to worship–my capacity to be “at one” with my God.

This is all part of why understanding technology, and being selective about which technology I allow to enter my life, seems so vital to me. As Marshall McLuhan once said, understanding is a form of resistance. It’s also the first line of self-defense in our modern world, which David Foster Wallace described as an environment “that precludes everything vital and human.” Hyperbole, perhaps. But if attention is vital and human (and I believe it is), and is being directed by powerful corporate interests via “the most interesting thing in the world,” objects that surround us inescapably–well, I must make efforts to either limit this encroachment or push it out altogether. To do any less is to risk my humanity and future.

Why I Started Microblogging

So, I’ve started to microblog. I was inspired by Alan Jacobs’ recent article, getting back to the open web via One of the big reasons he supports starting a microblog this way is that he owns the content; it’s part of his own domain, his turf. And that’s appealing to me. Additionally, he (and I) can cross-post micro posts to Twitter “without stepping into the minefields of Twitter itself.” And that’s really appealing. And further, I often run across things that I’d like to share but that don’t deserve their own post. Outside of Twitter, how do I share them? A microblog creates a space for that. It becomes, in Alan Jacobs’ words, “a way for me to put everything I do online that is visually small – anything small enough not to require scrolling: quotes, links, images, audio files – in one place, and a place on my own site.”

So that’s why I started. But I wasn’t sure how I’d use my microblog when I did start, or if I’d even keep it up. Eight days in, I’ve had the chance to reflect on how I’m using it: what have I learned about the practice, and myself?

  • I’ve enjoyed linkblogging. When I read something, I can share the link along with a quote or reflection on how it affected me. It’s a great space to think out loud.
  • It’s become my social media home base. I don’t have Facebook or Instagram, but now I have a place to share photos. I have Twitter, but as mentioned, it lets me side-step actually being on Twitter while still sharing on the platform. These blog posts, too, appear on my
  • It’s a record of my thinking and reading that I can look back on. And thanks to IFTTT, it’s all backing up on my Day One journaling app, so I can see it side-by-side with my personal stuff.
  • Every day for the past four days, I’ve posted a photo to go along with the August 2020 photo challenge. I’ve had a few people compliment me on what I’ve shared. I’ve been able to do the same for others. And in a smaller community, that just seems to mean more.
  • As Austin Kleon notes, blogging is a great way to discover what you have to say. My microblog has given me a chance to have thoughts, and this longer blog has given me a space to figure out what they mean–to discover what it is I have to say. In other words, my microblog is where I collect the raw materials; my blog is where I assemble them into questions and, perhaps, answers. It’s a place where I figure out what I really think.

I anticipate that my microblog will evolve, and I’ll find new purposes for it, while shedding others. But whatever it becomes, I have to say–I’ve enjoyed it so far. And perhaps that’s the most important thing. It’s a space for short reflection or ideation, coupled with a small community, all on my own domain and turf. And that’s awesome.

Reflections on Conversational Design (2)

Spoken sentences and words are the heart and soul of a voice experience. It’s in these moments, when we “have the mic” (so to speak), that designers can establish personality; express generosity; and create a sonic world for another person to inhabit.

But what goes into crafting, and critiquing, these spoken sentences? Where visual designs have some foundational pillars–typography, layout, and color–what does conversational design have that’s similar? What is the “color” of conversation? The “layout” of VUI design? The “typography” of speech? What, in other words, are the different disciplines that a conversational designer can draw from to craft and critique a conversational experience?

(Asked still another way, what disciplines does a conversational designer need to be fluent in? If I were hiring, what would I be looking for? If I were training a designer, what would I be drawing from?)

Here are some of the areas that I think are important to understand:

  • Linguistics. Written words and spoken words are different. We tend to write in lengthy sentences with a careful structure and a wider vocabulary. We tend to talk in chunks of seven words or so, interrupting ourselves as we go along, and using a simpler, shorter vocabulary. We use more vocatives, we take shortcuts–contractions, ellipsis, other “reduced forms”–and we tend to repeat ourselves, using “bundles” of relatively formulaic phrases. And of course, lest we forget, speech is interactive, which sets it apart entirely from any kind of academic or news-like writing. A conversational designer should know the basics of how the spoken word differs from the written word, and why that’s important–which is fundamentally a linguistic question. And while linguistics is a large and intimidating field, most of the “speech versus writing” questions are tackled in sociolinguistics–a field that also talks about…
  • The Properties of Speech. Not only are spoken syntax and grammar different, but there’s an added element: speech is, well, speech. It’s spoken! And so it contains “paralinguistic” properties: breath, tone and intonation, prosody, volume, pitch, the speed at which we speak. Speech also has to be vocalized with a voice that has a certain timbre, or particular qualities (e.g. a baritone, smooth, female voice or a low, gravelly, male voice). A conversational designer needs to know the basics of speech, and how it’s controlled with whatever technology they’re working with: whether it be a text-to-speech engine, or a voice actor in a studio.
  • Stance and Persona. Technically, this is directly linked to the first two points, but it bears repeating: speech expresses an attitude toward the other person. We might refer to them as “sir” or “dude,” we might say “Please pass the butter” or (with a blunt imperative) “Pass the butter–now.” All of these suggest emotion and feeling toward the other person in the conversation. This also combines to express a personality: bubbly and outgoing, or short and direct, or clear and professional, or casual and friendly. A conversational designer should know how to establish this “art direction” for voice experiences, and what personality they want to project. This is all vital because people will make judgments about your conversational interface’s personality, even when they should know it’s a computer. That’s what people do.
  • Memory. Unlike graphical interfaces, which linger in space to be viewed and reviewed, voice interfaces do not linger. Once something is spoken, it’s gone, and resides only in the memory. But memory is limited. So we have to be really aware of cognitive load: we can’t give too many options, nor say too much, in any one turn, lest we overwhelm a person’s memory (or patience). Many of conversational design’s “best practices” come down to keeping prompts short, sweet, and simple–working with human memory, instead of against it.
  • Sound and Music. Traffic, a honk, hammers, and birds–suddenly, you’re in the heart of a bustling city. A soft vibration, gongs, and steady throbs of “om,” and you’re now in a monastery, ready to meditate. The familiar three notes, and suddenly, you’re prepared to hear broadcasters or comedians from NBC. Sound and music can transport you. Or with a short “Ching,” it can inform you (You just got paid!). It can establish mood, or the completion of a task. It can change your emotions, or invoke memories. A conversational designer should know the basics of sound: pitch, rhythm, timbre, and melody, and the varieties of information and emotion it can convey.
  • Platform Limitations and Opportunities. As much as I’d like to design for Jarvis, most voice interfaces are far dumber than that–the burden of weak AI. People can’t speak with computers as naturally as they’d like and expect to be understood. For example, if someone wants a large pizza with sausage, pepperoni, and pineapple but with gluten-free crust–well, with current limitations, we have to ask for only some of that information at a time. We have to be aware of the limitations, and help the user work with those limitations instead of against them, lest we provoke confusion, frustration, or anger. And we need to be aware of the opportunities each platform and technology affords. These technologies and abilities are always changing; a conversational designer needs to stay abreast of the trends and technologies.
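
The pizza example in the last point, asking for only some of the information at a time, is often called slot filling. Here is a minimal sketch of the idea in Python. The slot names, the prompt wording, and the next_prompt function are all hypothetical, made up for illustration; real platforms handle this with their own dialog tooling.

```python
# Hypothetical slots for a pizza order, each paired with the question
# that asks for it. Dicts keep insertion order, so slots are requested
# in the order listed here.
SLOT_PROMPTS = {
    "size": "What size would you like?",
    "toppings": "Which toppings do you want?",
    "crust": "What kind of crust?",
}


def next_prompt(order):
    """Return the question for the first unfilled slot, or a confirmation."""
    for slot, question in SLOT_PROMPTS.items():
        if slot not in order:
            return question
    toppings = ", ".join(order["toppings"])
    return (
        f"Got it: a {order['size']} pizza with {toppings} "
        f"on {order['crust']} crust."
    )


# Walk through one made-up conversation, one short turn at a time.
order = {}
print(next_prompt(order))  # asks for the size first
order["size"] = "large"
print(next_prompt(order))  # then the toppings
order["toppings"] = ["sausage", "pepperoni", "pineapple"]
print(next_prompt(order))  # then the crust
order["crust"] = "gluten-free"
print(next_prompt(order))  # finally, a confirmation
```

Each turn asks one short question, which also keeps the load on the listener’s memory low.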

There are other things we need to consider, of course. The general “UX” process of testing with real people; a consideration of context; how voice interacts with graphical interfaces, such as on a smart speaker with a screen; recommended best practices; the nuances of creative writing and crafting a brilliant persona; the drawbacks of VUI design, and discerning which use cases are appropriate for voice and which are not; something of the history of the field. The list could go on. But the above points cover what I think someone should know, first and foremost, to design the foundational artifact of VUI design: prompts and speech. Armed with these concepts, I’ve found that it’s easier to both describe and prescribe the right prompts; to accomplish whatever goal the user has in mind, in the right way.

Reflections on Conversational Design (1)

What is a voice user interface? And what artifacts allow designers to express their intentions, and share them with others? I’ve been mulling over something Rebecca Evanhoe said in a Botmock AMA from earlier this year about these very questions. She said a conversational designer needs to be able to design these three things:

  1. The things the computer says: the prompts I write as a conversational designer
  2. The flow of the conversation–the “conversational pathways”–arising from the things the computer says (and the expectations provided)
  3. The interaction model behind it all, the “grammar” that anticipates what a user might say, and links those utterances to intents

I like this way of thinking about it. First, it highlights that the pathways (2) and interaction model (3) derive from the prompts we write (1). Those prompts: these are the beating heart and soul of conversational design. The syntax, grammar, and diction; the prosody, volume, and emphasis; the personality conveyed; the sounds used; all of this emerges from how we write the prompts.

And second, it made me realize something. I was going to argue that the prompts and pathways are really human-centered, and that we really have to deal with platform limitations when we start on the interaction model. To some extent, that’s true; but of course, not entirely. Yes, we have to start with how people actually talk, but we also have to anticipate the platform limitations from the very start.

And actually, the interaction model is where we really have to anticipate what people will actually say. A robust anticipation is vital, because otherwise, the conversation will falter: the agent that was designed (by me!) won’t know what someone meant.
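
To picture what an interaction model does, here is a deliberately tiny sketch in Python. The intent names, the sample utterances, and the match_intent function are hypothetical, and the exact-match lookup is a stand-in: real platforms generalize from sample utterances rather than requiring the exact words. The point is only to show where anticipation succeeds and where the conversation falters.

```python
import re

# Hypothetical interaction model: each intent lists the utterances
# the designer anticipated for it.
INTERACTION_MODEL = {
    "OrderPizza": ["i want a pizza", "order a pizza", "can i get a pizza"],
    "CheckStatus": ["where is my order", "track my order"],
}


def normalize(utterance):
    """Lowercase the utterance and strip punctuation other than apostrophes."""
    return re.sub(r"[^a-z' ]", "", utterance.lower()).strip()


def match_intent(utterance):
    """Return the intent whose sample utterances include this one, else None."""
    text = normalize(utterance)
    for intent, samples in INTERACTION_MODEL.items():
        if text in samples:
            return intent
    return None  # nobody anticipated this phrasing


print(match_intent("Order a pizza!"))     # an anticipated phrasing
print(match_intent("I'd like a pie"))     # an unanticipated one
```

The second utterance means the same thing as the first, but because nobody anticipated it, the agent has nothing to go on.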