Science

realtalk@MIT: Using AI to bring human conversation to life

A new Media Lab program uses AI to create audio medleys from small group conversations. The goal is to build trust and understanding in the MIT community.

In today’s digital age, words and information are everywhere: flooding our feeds, congesting our inboxes, and pinging us with text messages. Yet written text can never bring words to life the way a human voice can. This unique impact of the spoken word is the idea behind MIT’s Center for Constructive Communication (C3), a research group aiming to foster conversations that promote shared understanding.

Based in the Media Lab, C3 studies the “ancient wisdoms” of human conversation, such as facilitated dialogue, deep listening, and community organizing, alongside digital technologies such as social media to better understand how to promote trust between people in the modern age. Its research is headed by Professor of Media Arts and Sciences Deb Roy, who, prior to joining C3, spent years researching social media and false news, sparking his interest in developing novel communication tools. Today, C3 works closely with Cortico, a nonprofit co-founded by Roy, to translate its research into practice.

A program that has recently emerged from C3 is realtalk@MIT. “The past president of MIT asked if we would pilot some of those tools and methods to create a space for talking about values within the MIT community,” Roy stated. Elena Sapora, the program’s former lead, describes realtalk@MIT as “a new form of digital social network that is rooted in small group conversations.” Each conversation is moderated by a trained leader. The content is recorded, transcribed, and then fed into an AI model that highlights and extracts excerpts related to the conversation’s themes. Finally, C3 members string these excerpts together to construct a distinct audio medley for the community to listen to on their platform, Fora. Voice medleys for organizations such as Durham's Community Engagement Division and the Newark Opportunity Youth Network are available online.

Students at the Institute can organize a “dialogue project” around a topic of interest, and trained realtalk@MIT facilitators will lead small group conversations to produce a voice medley focusing on common themes. Access to the recordings and transcripts will depend on the goal of the project.

The purpose behind these medleys is to bridge the gap between information and human connection by allowing listeners to hear the authentic voices of others in their community. By “hearing people’s voices, just hearing their accents, intonations, and pauses,” Sapora says of her experience listening to the audio clips, “I felt very connected to the person.” Research by C3 backs this sentiment. In a study comparing voice anonymization methods, C3 researchers found stark differences between voice conversion (VC), which transforms a voice to sound like another person while preserving the words spoken and features such as rhythm and accent, and text-to-speech (TTS), which produces synthetic speech like that of Amazon’s Alexa and Apple’s Siri. When comparing human responses to these natural and synthetic voices, they found that VC received scores similar to those of the original voices in terms of the empathy or trust that listeners feel towards the storyteller. TTS, on the other hand, resulted in lower scores for trust and respect. For the researchers, this work underlined the value of the authentic human voice in sharing experiences and conducting meaningful conversation.

With this aim of cultivating constructive communication, realtalk@MIT launched by gathering students and faculty to discuss topics ranging from light-hearted ones, like hobbies and interests, to more challenging ones, such as their values, political views, and experiences before and at MIT. When designing the pilot program two years ago, the C3 team sought to “bring the MIT values statement to life,” including the values of Openness and Respect, and Belonging and Community. Professor Dimitra Dimitrakopoulou, the visionary behind the pilot and a sociotechnical scientist, explains that the questions were uniquely phrased to probe how students’ lived experiences reflect the values MIT strives to embody.

“It’s an art,” Dimitrakopoulou says, explaining that questions are carefully designed to draw out personal stories that may help those in the conversation “see their shared humanity.” For example, they often use questions that start with “Tell us about a time when…” or “Share a moment when you felt...”

While Dimitrakopoulou acknowledges that “MIT values are not the ultimate goal of this program,” they have nonetheless played a key role in shaping the program, which officially launched on August 22, 2024, focusing on incoming first-years. The launch was a day-long training for 63 Orientation Leaders (OLs), who participated in a realtalk@MIT session themselves. Participants were led through a facilitator training, where they engaged with a custom app that allows users to identify key moments from conversations and listen to the highlights of others’ discussions. A “sensemaker” is someone trained to sift through transcripts to find common themes in the discussions. Roy says that the sensemaking job, backed by AI models that recommend themes, is a “human-led, AI-supported process,” holding to C3’s values of encouraging students to develop civic muscle and “harness the power of AI to build human connection and build trust.” The following day, a second training session was held for graduate student leaders.

Participants’ response was overwhelmingly positive: 50 percent reported feeling very satisfied, while another 38 percent felt somewhat satisfied. Five were neutral, and only one expressed being somewhat dissatisfied. None were very dissatisfied.

Through these dialogues, realtalk@MIT aspires to “create opportunities for people to build civic soul.” The team believes that skills such as facilitating small group conversations and asking questions in ways that encourage people to share experiences over opinions are critical yet overlooked in a society filled with polarization and superficial interactions.

Further, hearing how things are said (tone, inflection, pauses) is worth a thousand words. Audio and original voices allow listeners to feel the weight of the storyteller’s experiences and emotions.

By applying realtalk@MIT’s principles—asking questions with intention, listening to others’ voices with openness, and responding empathetically—C3 hopes to recast conversations into opportunities to build relationships grounded in respect, understand those who challenge our views, and create a sense of unity rooted in our shared humanity.

“Even in the toughest, most controversial topics,” Roy explains, one can acknowledge others’ stances by saying, “That’s where you stand, and I respect that.” He adds: “Your opinion may not shift at all, but that’s not the goal here. If there’s anything we want to shift, it’s your stance towards the other person. We differ on some things, but there’s also a lot that we share.”