April 25, 2024

Comedy clubs are my favorite weekend outings. Rally some friends, grab a few drinks, and when a joke lands for us all—there’s a magical moment when our eyes meet, and we share a cheeky grin.

Smiling can turn strangers into the dearest of friends. It spurs meet-cute Hollywood plots, repairs broken relationships, and is inextricably linked to fuzzy, warm feelings of joy.

At least for people. For robots, their attempts at genuine smiles often fall into the uncanny valley—close enough to resemble a human, but causing a touch of unease. Logically, you know what they’re trying to do. But gut feelings tell you something’s not right.

It may be because of timing. Robots are trained to mimic the facial expression of a smile, but they don’t know when to turn the grin on. When humans connect, we genuinely smile in tandem without any conscious planning. Robots take time to analyze a person’s facial expressions to reproduce a grin. To a human, even milliseconds of delay raise the hair on the back of the neck. Like a horror movie, something feels manipulative and wrong.

Last week, a team at Columbia University showed off an algorithm that teaches robots to share a smile with their human operators. The AI analyzes slight facial changes to predict its operators’ expressions about 800 milliseconds before they happen—just enough time for the robot to grin back.

The team trained a soft robotic humanoid face called Emo to anticipate and match the expressions of its human companion. With a silicone face tinted in blue, Emo looks like a 1960s science fiction alien. But it readily grinned along with its human partner on the same “emotional” wavelength.

Humanoid robots are often clunky and stilted when communicating with humans, wrote Dr. Rachael Jack at the University of Glasgow, who was not involved in the study. ChatGPT and other large language models can already make an AI’s speech sound human, but non-verbal communications are hard to replicate.

Programming social skills—at least for facial expression—into physical robots is a first step toward helping “social robots to join the human social world,” she wrote.

Under the Hood

From robotaxis to robo-servers that bring you food and drinks, autonomous robots are increasingly entering our lives.

In London, New York, Munich, and Seoul, autonomous robots zip through chaotic airports offering customer assistance—checking in, finding a gate, or recovering lost luggage. In Singapore, several seven-foot-tall robots with 360-degree vision roam an airport flagging potential security problems. During the pandemic, robot dogs enforced social distancing.

But robots can do more. For dangerous jobs—such as cleaning the wreckage of destroyed houses or bridges—they could pioneer rescue efforts and increase safety for first responders. With an increasingly aging global population, they could help nurses to support the elderly.

Current humanoid robots are cartoonishly adorable. But the main ingredient for robots to enter our world is trust. As scientists build robots with increasingly human-like faces, we want their expressions to match our expectations. It’s not just about mimicking a facial expression. A genuine shared “yeah I know” smile over a cringe-worthy joke forms a bond.

Non-verbal communications—expressions, hand gestures, body postures—are tools we use to express ourselves. With ChatGPT and other generative AI, machines can already “communicate in video and verbally,” said study author Dr. Hod Lipson to Science.

But in the real world, where a glance, a wink, and a smile can make all the difference, it’s “a channel that’s missing right now,” said Lipson. “Smiling at the wrong time could backfire. [If even a few milliseconds too late], it feels like you’re pandering maybe.”

Say Cheese

To get robots into non-verbal action, the team focused on one aspect: a shared smile. Previous studies have pre-programmed robots to mimic a smile. But because the response isn’t spontaneous, there is a slight but noticeable delay that makes the grin look fake.

“There’s a lot of things that go into non-verbal communication” that are hard to quantify, said Lipson. “The reason we need to say ‘cheese’ when we take a photo is because smiling on demand is actually pretty hard.”

The new study focused on timing.

The team engineered an algorithm that anticipates a person’s smile and makes a human-like animatronic face grin in tandem. Called Emo, the robotic face has 26 gears—think artificial muscles—enveloped in a stretchy silicone “skin.” Each gear is attached to the main robotic “skeleton” with magnets to move its eyebrows, eyes, mouth, and neck. Emo’s eyes have built-in cameras to record its environment and control its eyeball movements and blinking motions.

By itself, Emo can track its own facial expressions. The goal of the new study was to help it interpret others’ emotions. The team used a trick any introverted teenager might know: They asked Emo to look in the mirror to learn how to control its gears and form a perfect facial expression, such as a smile. The robot gradually learned to match its expressions with motor commands, such as “lift the cheeks.” The team then removed any programming that could potentially stretch the face too much and injure the robot’s silicone skin.
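
For the curious, here is a toy version of that mirror exercise in Python. It is only a sketch under loud assumptions: the two-motor face, the `simulated_face` function standing in for the mirror, the `SAFE_MAX` limit, and the nearest-neighbor lookup are all invented for illustration and are not the Columbia team’s method. The idea it shows is simply "try random commands, watch what the face does, throw out anything that stretches too far, then look up the command that best matches a target expression."

```python
import random

# Assumed safety limit: commands above this are treated as overstretching.
SAFE_MAX = 0.8


def simulated_face(command):
    """Hypothetical stand-in for the mirror.

    Maps a motor command (cheek, brow) to the expression the robot
    would see: (mouth_lift, brow_height).
    """
    cheek, brow = command
    return (0.5 * cheek, 0.3 * brow)


def babble(n_samples, rng):
    """Try random motor commands and record what each one looks like."""
    samples = []
    for _ in range(n_samples):
        command = (rng.uniform(0, 1), rng.uniform(0, 1))
        if max(command) > SAFE_MAX:
            continue  # drop commands that could stretch the skin too far
        samples.append((simulated_face(command), command))
    return samples


def command_for(target, samples):
    """Return the recorded command whose observed expression is closest."""
    def squared_error(sample):
        observed, _ = sample
        return sum((o - t) ** 2 for o, t in zip(observed, target))

    return min(samples, key=squared_error)[1]


rng = random.Random(0)
samples = babble(2000, rng)
smile_command = command_for((0.3, 0.15), samples)
```

A real robot face would replace the lookup table with a learned model and the simulated mirror with camera images, but the loop of acting, observing, and filtering unsafe motions has the same shape.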

“Turns out…[making] a robot face that can smile was incredibly challenging from a mechanical point of view. It’s harder than making a robotic hand,” said Lipson. “We’re very good at spotting inauthentic smiles. So we’re very sensitive to that.”

To counteract the uncanny valley, the team trained Emo to predict facial movements using videos of humans laughing, surprised, frowning, crying, and making other expressions. Emotions are universal: When you smile, the corners of your mouth curl into a crescent moon. When you cry, the brows furrow together.

The AI analyzed facial movements of each scene frame-by-frame. By measuring distances between the eyes, mouth, and other “facial landmarks,” it found telltale signs that correspond to a particular emotion—for example, an uptick of the corner of your mouth suggests a hint of a smile, whereas a downward motion may descend into a frown.
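
To make the landmark idea concrete, here is a minimal Python sketch. The landmark names, the pixel coordinates, and the 0.02 threshold are assumptions chosen for illustration; the study’s actual features and model are not described here. It measures how far the mouth corners sit above the center of the lips, scales that by the distance between the eyes so face size doesn’t matter, and compares the first and last frames of a clip.

```python
from math import dist

# Hypothetical landmark names. Image coordinates: y grows downward.
Frame = dict[str, tuple[float, float]]


def mouth_corner_lift(frame: Frame) -> float:
    """Height of the mouth corners above the lip center,
    normalized by the distance between the eyes."""
    eye_span = dist(frame["left_eye"], frame["right_eye"])
    center_y = frame["lip_center"][1]
    corners_y = (frame["mouth_left"][1] + frame["mouth_right"][1]) / 2
    return (center_y - corners_y) / eye_span  # positive means corners raised


def classify_trend(frames: list[Frame], threshold: float = 0.02) -> str:
    """Compare the first and last frames and label the direction of change."""
    change = mouth_corner_lift(frames[-1]) - mouth_corner_lift(frames[0])
    if change > threshold:
        return "smile"
    if change < -threshold:
        return "frown"
    return "neutral"


def make_frame(corner_y: float) -> Frame:
    """Build a toy face with the mouth corners at the given height."""
    return {
        "left_eye": (40.0, 50.0),
        "right_eye": (80.0, 50.0),
        "lip_center": (60.0, 100.0),
        "mouth_left": (45.0, corner_y),
        "mouth_right": (75.0, corner_y),
    }


print(classify_trend([make_frame(100.0), make_frame(96.0)]))   # smile
print(classify_trend([make_frame(100.0), make_frame(104.0)]))  # frown
```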

Once trained, the AI took less than a second to recognize these facial landmarks. When the AI powered Emo, the robot face could anticipate a smile based on human interactions within a second, so that it grinned along with its participant.
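
Why does a head start matter? The Python sketch below illustrates the timing argument with the simplest possible predictor: a straight-line extrapolation of a "mouth lift" signal from the last two video frames. This is a stand-in, not the team’s predictor, and the function names, the 0.1 smile level, and the sample values are all assumptions. The 800-millisecond lead is the figure reported for Emo.

```python
def predict_ahead(samples, lead_ms):
    """Linearly extrapolate a signal lead_ms into the future.

    samples: list of (time_ms, value) pairs in time order.
    Only the last two samples are used.
    """
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    slope = (v1 - v0) / (t1 - t0)
    return v1 + slope * lead_ms


def should_start_smiling(samples, lead_ms=800, smile_level=0.1):
    """Start moving the motors now if a smile is expected lead_ms from now."""
    return predict_ahead(samples, lead_ms) >= smile_level


# The mouth corners have barely moved, but the trend points to a smile.
rising = [(0, 0.0), (100, 0.02)]
flat = [(0, 0.0), (100, 0.0)]
print(should_start_smiling(rising))  # True
print(should_start_smiling(flat))    # False
```

A robot that waits until the smile is already visible has used up its head start. One that acts on the early trend can spend those milliseconds moving its motors.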

To be clear, the AI doesn’t “feel.” Rather, it behaves as a human would when chuckling at a funny stand-up routine, with a genuine-seeming smile.

Facial expressions aren’t the only cues we notice when interacting with people. Subtle head shakes, nods, raised eyebrows, or hand gestures all make a mark. Regardless of culture, “ums,” “ahhs,” and “likes” (or their equivalents) are integrated into everyday interactions. For now, Emo is like a baby who learned how to smile. It doesn’t yet understand other contexts.

“There’s a lot more to go,” said Lipson. We’re just scratching the surface of non-verbal communications for AI. But “if you think engaging with ChatGPT is interesting, just wait until these things become physical, and all bets are off.”

Image Credit: Yuhang Hu, Columbia Engineering via YouTube