The robot overlords aren’t here quite yet—but a new video from AI robotics company Figure shows they might be closer than you think.
The startup released a demo on Wednesday showcasing its humanoid robot called Figure 01. Its developers infused the bot with OpenAI’s artificial intelligence, which gives it “visual reasoning and language understanding,” according to a post on X from Figure founder Brett Adcock. This means the robot can interpret what its cameras see and hold conversations with people nearby.
In the demo, Figure 01 can be seen putting away dishes, cleaning up trash, and handing an apple to a human when they say they are hungry. “I see a red apple on a plate in the center of the table, a drying rack with cups and a plate, and you standing nearby with your hand on the table,” the robot said.
Later in the video, Figure 01 shows off its housekeeping skills by cleaning up trash the human dumped on the table in front of it and placing a cup and plates onto a drying rack. It moves with smooth precision.
Its “speaking” abilities aren’t perfect; there’s a slight delay between the human’s prompt and Figure 01’s response. Still, it’s fairly impressive, and downright uncanny at times. In one response, the robot even inserts a verbal filler: “So I gave you the apple because it’s the only, uh, edible item I could provide you with on the table.”
While it might seem like a small detail, it’s actually fairly astonishing. There’s no good earthly reason to pepper a robot’s responses with filler noises like “uh” and “um”; strictly speaking, they make communication worse. They do, however, make the robot seem as human-like as possible, which Figure 01 certainly does.
Adcock said that Figure 01 ran on “end-to-end neural networks,” meaning it didn’t rely on a remote operator controlling the robot from off-screen. Everything in the video was reportedly done by Figure 01 and its programming alone. “As you can see from the video, there’s been a dramatic speed-up of the robot,” Adcock explained. “We are starting to approach human speed.”
“We feed images from the robot's cameras and transcribed text from speech captured by onboard microphones to a large multimodal model trained by OpenAI that understands both images and text,” Corey Lynch, an AI engineer at Figure, said in a post on X. “The model processes the entire history of the conversation, including past images, to come up with language responses, which are spoken back to the human via text-to-speech.”
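Figure hasn’t published its code, but Lynch’s description maps onto a familiar loop: transcribe the human’s speech, send the transcript plus the latest camera frame to a multimodal model along with the conversation so far, then voice the reply with text-to-speech. Here is a minimal sketch of that loop using OpenAI’s public Python client as a stand-in; the model names, the `step` helper, and the file-based audio output are illustrative assumptions, not Figure’s actual stack.

```python
# A sketch of the perceive -> reason -> speak loop Lynch describes, using
# OpenAI's public Python client as a stand-in. Figure has not released its
# code; model names and helpers here are placeholders, not Figure's system.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Running conversation history, including past images, per Lynch's description.
history = [{"role": "system",
            "content": "You are a helpful humanoid robot. Describe what you "
                       "see and explain your actions conversationally."}]

def encode_image(image_bytes: bytes) -> str:
    """Base64-encode a camera frame so it can be sent to the model."""
    return base64.b64encode(image_bytes).decode("utf-8")

def step(camera_frame: bytes, transcribed_speech: str) -> str:
    """One loop iteration: send the latest utterance and camera frame plus
    the full history to the multimodal model, then speak the reply."""
    history.append({
        "role": "user",
        "content": [
            {"type": "text", "text": transcribed_speech},
            {"type": "image_url",
             "image_url": {"url": "data:image/jpeg;base64,"
                                  + encode_image(camera_frame)}},
        ],
    })
    reply = client.chat.completions.create(
        model="gpt-4o",  # placeholder; Figure's actual model is unspecified
        messages=history,
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})

    # Text-to-speech so the robot can answer out loud.
    audio = client.audio.speech.create(model="tts-1", voice="alloy",
                                       input=reply)
    audio.write_to_file("reply.mp3")  # a real robot would stream to a speaker
    return reply
```

The real system presumably streams audio and drives the robot’s motor control in parallel; this sketch covers only the see-and-speak portion that Lynch describes.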
This appears to be similar to the approach Boston Dynamics took in October 2023, when it used ChatGPT to make one of its Spot robot dogs talk and act as a tour guide for its facility. In that video, Spot can be seen responding to prompts from various human users.
“Here we are at the snack bar and coffee machine,” Spot said after a human said that they were thirsty.
It’s an uncanny glimpse into something that seemed like a sci-fi fever dream just a few years ago but now looks inevitable: walking, talking humanoid robots. Hopefully, they’ll mostly stick to helping us tidy up around the house and not… well.