Soundwave

Opinion

Why short-form audio is the next frontier for social apps

Audio apps are making it easier than ever for people to express themselves. The next decade is about authentic, short-form audio engagement.

Voice-to-text, virtual home assistants, and social audio apps have people talking into their phones again.

Audio apps are making it easier than ever for people to express themselves in more authentic, intimate ways, and they’ve come a long way over the past decade. First, we had books, music, and podcasts on the go, brought to us by central marketplaces like Audible, Spotify, and Stitcher. Then, we had audio messaging with apps like WeChat and WhatsApp, making texting a bit more convenient. Of course, speech recognition brought on the ease (and hilarious gaffes) of voice-to-text and virtual home assistants.

The Internet has long been a text- and image-heavy place. But more and more, we’re using voice to command our lives. We talk to our homes, our cars, our appliances. And more and more, we’re speaking with each other via virtual voice again. The next decade of social is about authentic, bite-sized, voice-based engagement.

As Andreessen Horowitz (a16z) partner and Clubhouse board member Andrew Chen put it in a recent a16z blog post on the future of social audio, “Audio hits different. Listening to someone’s voice is personal, and hearing unedited audio is the opposite of seeing the highlights. It’s about ideas, not the visuals, so it emphasizes a different kind of content that can often feel deeper and more intellectually stimulating.”

Science agrees. Voice-only communication enhances empathic accuracy, research out of Yale University finds. Our voices communicate much more about how we feel than our facial expressions or body language, it turns out, because listeners innately focus with more detail on the nuances of how a speaker is expressing themselves.

“We anticipate that the audio innovation of the next decade will rival what we’ve seen in video apps over the past few years,” Chen says. We agree. In fact, the global coronavirus pandemic may have had a hand in accelerating the trajectory.

The global pandemic starts an audio wave

As we all became confined to our homes during the pandemic, we turned to video conferencing to stay connected in isolation. That seemed logical: replace face-to-face with virtual face-to-face. But soon Zoom fatigue set in. Many of us reverted to audio-only communications, and in the wake of it all, social voice apps took the world by storm.

Long-form, live audio apps—Clubhouse, Quilt, and Locker Room (now Spotify Greenroom)—gave us a way to gather for real-time conversations, when those weren’t possible in person. We could listen to gurus live for love advice, wellness tips, or career directions, all from our phones. We learned to furiously “flash our mics” with the hope of being heard, the hope of getting our questions answered.

For a while, live audio chats filled a social void—and the big social apps took notice.

The social giants launch audio features

In 2021, two social giants got in on social audio, with Twitter Spaces and Facebook Soundbites making waves. Soundbites—along with Twitter’s now-defunct “voice tweets”—pushes social audio in a new direction: short-form, asynchronous. Why listen to lengthy diatribes live when you can hear thoughtful clips instead, and respond thoughtfully, on your own time? At Soundwave, we’ve been about that since the start.

The social giants, though, are struggling to stay relevant, merely adding new features to already heavy user experiences. The cognitive overhead on Facebook is frightening, for example, which is likely part of the reason the app continues to bleed daily active users in the U.S. and Canada. That, and its aging demographic.

These feature launches from the big guys, though, validate the direction of social audio: it’s short-form, it’s asynchronous. Now that we’re returning to a post-pandemic world, running a live event from your phone, propped up next to your laptop while you work, isn’t going to cut it. Sure, many workers will stay remote, as that’s a global workforce trend in itself, but live voice chats will now compete with the real world. We can go out to bars again, attend conferences, eat dinner across town. Voice is here to stay, but fitting an hour-long (or longer) chatroom conversation into a busy schedule will be less realistic.

Audio makes social more authentic

It’s more than just time, though. People are fed up with the overproduced, curated, unrealistic, highlight-reel portrayals of life. We want real, authentic conversations. We want to express ourselves, who we truly are, not just the highlight reel. We want to go deeper than just surface-level.

Most of social media is about status. As a result, more and more social media users are taking “social media breaks.” A 2018 Pew Research Center survey, for example, found that 74% of adult Facebook users had loosened their connection with the app in some way, from taking a break for at least a few weeks (42%), to adjusting their privacy settings (54%), to even cold-turkey deleting the app (26%). Teens are doing the same. The shift comes at a time when medical studies are linking increased social media use to increased risk of anxiety and depression, largely as a result of social comparison.

Social media shouldn’t be about who has the best skin, fastest car, biggest house, cutest babies, and most exotic vacations. That’s what works about instant-recorded audio. It isn’t flashy—it’s just an unedited voice.

If the pandemic—and all of its social unrest—taught us anything, it’s that each voice matters. Social media activism soared, particularly within the Black Lives Matter movement against racial violence and police brutality. 

The future is short-form & on our time

There’s a reason audio clips on iMessage work—we know what we’re getting, and it isn’t a big lift to listen to—or record—a minute of audio.

Participating in a drop-in, live chat room, though? It’s exhausting. First, there’s a divide between the speakers and audience—a clear hierarchy is formed, delineating who has more social capital. Then, there’s the mic flashing for the plebes—if you want to ask a question, a flash might get you noticed. Maybe, maybe not. Finally, there’s the time sink and scheduling. Chats are scheduled for specific times, and some literally go on for days. Who has time for that?

The drop-in, non-stop, live audio chat model is draining for many users. As one frequent host put it, “Some of those chats are not that enthralling.” Unfortunately, that is the reality when you put 10 moderators on a virtual stage for 12 hours and expect them to produce something worthwhile. Yes, highly produced, one-way podcasts miss the exciting element of live authenticity, but they certainly come with the upside of being edited so that every second counts. Some of that discerning eye (or ear) comes with short-form, asynchronous audio.

When you give people a time limit and the ability to stop and think, they produce more thoughtful content. On Soundwave, we have users engaging about bad habits, free speech, weird dates, things they’re grateful for, and everything in between. All in under 60 seconds, and all in their own time.

That’s what was so revolutionary about a tweet—140 characters, tell me what you think. Make it count.

Also, in a world where most of us fear public speaking more than our own deaths, live voice just isn’t going to cut it for the masses. We can’t all be moderators or audience members who ask great questions. But we can all sound off on sexist tiny pockets and late-night munchies when given the time to think for a second. Magic happens when we’re in the right mindspace. That’s when genuine intellectual conversation happens.

More than ever, people are thinking about what being in the right mindspace means for them. We are more intentional with our time, consumption, and online interactions. Gen Z—and to a lesser degree, Millennials—are the generations of mindfulness, purpose, balance, and activism.

The flashy, overproduced, overcurated reality of today’s image-driven social media won’t last. The promise of voice is authenticity and connection, which the world craves now more than ever.

Everyone should have a place—a voice—on the Internet. Not just the influencers and celebrities. We all deserve to be heard, and we all have something unique to say. Voice is how that will happen—and it will be short and on our time.

Download Soundwave for iPhone and iPad to speak your mind.

Dan Ndombe

Co-founder and CEO of Soundwave — Just a self-taught programmer kid from Africa turned startup founder in Silicon Valley. Re-imagining the way we connect, express ourselves, and tell stories online.
