OpenMind’s OM1 is an open source OS for robots that allows robots to perceive, adapt, and act within human environments.
Connect with Jan on LinkedIn and GitHub.
This week’s shoutout goes to user Sean, who won a Lifejacket badge for their answer to 'Creating the simplest HTML toggle button?'.
TRANSCRIPT
[Intro Music]
Ryan Donovan: Hello everyone, and welcome to the Stack Overflow Podcast, a place to talk all things software and technology. I'm your host, Ryan Donovan, and today we are talking about robots. Specifically, we're talking about an open-source platform for robots. My guest for that is Jan Liphardt, who is the CEO and Co-founder of OpenMind. So, welcome to the show, Jan.
Jan Liphardt: Thank you so much, Ryan. Looking forward to it.
Ryan Donovan: So, before we get into robots, tell us a little bit about how you got into software and technology.
Jan Liphardt: Oh, I started life as a physics professor at UC Berkeley, and part of what I did at Berkeley was to build hardware for soft condensed matter physics. That was my first introduction to cameras, and lasers, and software, thermal drift, and power, and vibrations, and a little bit of the physical world. But then, over time, I drifted towards software and computing on sensitive data, typically either for social networking or healthcare use cases, and that then ultimately led me to Stanford, where I mostly focus on healthcare. But the hardware bug and the robotics bug have never left me, and some of my earliest memories are admiring all the dreams in movies and books about our future: the rockets, and spaceships, and humanoids, and the flying cars, and things like that.
Ryan Donovan: Were you a Radio Shack guy back in the day?
Jan Liphardt: Of course. Amateur radio, and rockets, and everything you can possibly imagine. Fixing cars. All of that.
Ryan Donovan: So, in the pitch, you were billed as researching the intersection of AI, biology, and decentralized systems. What does that mean for your research outside of the deep physics stuff?
Jan Liphardt: That description sounds incredibly vague and confused, but the reality is quite simple. So, just focusing on robotics for now, one of the things I think most people are seeing is ChatGPT. And many people are trying to wrap their heads around what that means for coding, and software, and lawyers, and medical diagnostics, and a lot of jobs that people do. So, that's a really important new technology that has caught people's attention. And separate to all of that, there's one really new interesting thing, which is the recognition that LLMs and chatbots are not only relevant to the digital world, but they're also directly relevant to the physical world. Because if an LLM is able to generate photorealistic video and write computer code, you can extrapolate from that: 'oh, maybe large language models are also very good at generating actions that suitable hardware can execute in the real world.' Move, jump, speak, laugh, engage, navigate, find, and so forth. And that is what caught my attention something like two years ago, and that's really the genesis of trying to build software for this future where machines around us can learn, and make us laugh, and listen, and all those things. The decentralized side comes from a very specific concern and fear I have. Early on, something like a year ago, I was walking down the street where I live with one of my humanoids, and in those days, a year ago, that was really unusual, and so people would come running, and their kids would wanna hug the humanoid, and the dogs would come running. It was all so new, and one of the questions I got was, 'hey, Jan, how come you're not scared?' And I said, 'oh, the software is open source, so go see what's going on.' And two, we wrote Asimov's Laws of Robotics onto Ethereum. And a lot of people who think about blockchains and decentralized systems and so forth think of it through the lens of NFTs, or some kind of scam, or DeFi.
But what's really interesting about decentralized systems and blockchains is this property of immutability. And that strikes me as a really important technical requirement for any kind of system that is autonomous, and learns, and may steal, and cheat, and mislead. In retrospect, the fact that there are global systems that are public and immutable is super convenient, and it could be that those kinds of systems, due to their immutability, turn out to be really important parts of a sort of governance and guardrailing system for autonomous machines. So, that's a little bit of a summary for, 'how on Earth did you end up doing this, and why do you also care about decentralizing the whole system?'
Ryan Donovan: That's really interesting. We just put a question in the newsletter from one of our sites where somebody asked, 'did Asimov talk about how to code the laws of robotics?' So, the question is: how do you actually encode those laws of robotics so that they could be computable, immutable, repeatable?
Jan Liphardt: What we've done so far is janky, quick, simple, and basic, because we needed something; I wouldn't call this the reference implementation for the future. The way we've designed our operating system OM1 for humanoids is as a bunch of models that all communicate through natural language. So, the vision language model will say, 'I see a famous journalist in front of me called Ryan,' and then the battery subsystem will say, 'your batteries are fully charged,' and the inertial subsystem will say, 'you are standing up.' And then, we take those sentences, we fuse them into paragraphs, and those paragraphs then go to systems of large language models to argue about what the next best action is for the robot to take. And so, by virtue of all the internal communications in the software being natural language, it's very easy for us to figure out which model is saying what, and then how to add natural language guardrails to what amounts to a dynamic conversation of many models. The way we store Asimov's laws is in natural language on Ethereum, in a smart contract standard that we helped co-author, which is explicitly designed to make it easy to write constitutions and rules, and then have robots download them and use them to bias or guardrail their actions.
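The fusion loop Jan describes can be sketched in a few lines of Python. This is a hypothetical illustration, not OM1's actual code: the subsystem functions, the guardrail check, and the rule-based `decide` stub (standing in for the ensemble of large language models) are all assumptions made for the example.

```python
# Hypothetical sketch of OM1-style natural-language fusion (not real OM1 code).
# Each subsystem reports a plain-English sentence; the fuser joins them into
# one paragraph, and a decision step (stubbed here) turns it into an action
# after checking natural-language guardrails.

def vision_report():
    return "I see a person named Ryan in front of me."

def battery_report():
    return "Your batteries are fully charged."

def inertial_report():
    return "You are standing up."

def fuse(reports):
    """Fuse per-subsystem sentences into a single context paragraph."""
    return " ".join(reports)

def decide(context, guardrails):
    """Stand-in for the LLM ensemble that argues over the fused paragraph.
    A real system would prompt several models; here a simple rule picks the
    action, then natural-language guardrails can veto it."""
    if "I see a person" in context:
        action = "Greet the person and introduce yourself."
    else:
        action = "Wait and observe."
    for rule in guardrails:
        # Toy check: a rule like "Never harm a human." vetoes harmful actions.
        if "never harm" in rule.lower() and "harm" in action.lower():
            return "Do nothing."
    return action

guardrails = ["Never harm a human."]  # e.g. rules fetched from a smart contract
context = fuse([vision_report(), battery_report(), inertial_report()])
print(decide(context, guardrails))  # Greet the person and introduce yourself.
```

The point of the pattern is that every hop is a human-readable sentence, so a guardrail can be expressed, and audited, in plain English rather than buried in model weights.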
Ryan Donovan: That's interesting, that sort of internal monologue of models. I assume you explicitly modeled it after that sort of human process, yeah?
Jan Liphardt: Precisely. And the only other thing we've done is we've added your mother. So, the system of large language models that is doing data fusion and action generation, there's one model, 'the mother' or 'the referee,' which is figuratively speaking, positioned behind and above the humanoid, and observing the interaction of the humanoid with the human in front of it. And every 30 seconds it says, "hey, Ryan, you're slouching. Go stand up straight. Go look at the humans in front of you. Consider not starting every third sentence with 'oh,' or the human in front of you looks bored – consider changing your behavior." So, there's an internal monologue, but then there are referees or coaches that are providing regular input to the internal monologue, as well.
Ryan Donovan: For some people, that might be anxiety, but could be useful in this case.
Jan Liphardt: Yeah. So, maybe your 'mother' is not the best way to phrase this, but a coach, mentor, teacher–
Ryan Donovan: A corrective voice in your head, right?
Jan Liphardt: Yes.
Ryan Donovan: Talk to me. This is an open-source platform. This is something that anybody can contribute to?
Jan Liphardt: Absolutely. And so, come join us at GitHub. Go look up OM1 or OpenMind. The future I'm personally scared of is five or 10 years from now, imagine your doorbell rings, and you open the door, and there's this humanoid standing there saying, 'hello, I'm your new humanoid, and I come preconfigured with 375 different skills.' And then, for me as a parent, or developer, or teacher, I really wanna know what's going on inside there. I don't want this to be this mysterious piece of magic. I wanna be able to observe, and interact, and improve, and trust, and to my mind, that really means the software stack that is open. I don't want this humanoid to be like my Tesla, for example, that does over-the-air updates every few days, and I have absolutely no idea what's going on inside there. That's really the fundamental motivation for building an open source software stack for human-focused robots.
Ryan Donovan: That is an interesting consideration because, obviously, you own these devices—like the Tesla or the robot—but the software is controlled and managed by the company, and it has functionality that you may not understand or know about. Is it possible for somebody without the technical skills to push code, or to read code to get a sense of what the robot is thinking?
Jan Liphardt: That's exactly what we want. We just had the first app contributed to the OpenMind app store for humanoids, and we don't wanna see just one or 10 of those. We wanna see thousands of apps, each focusing on a particular skill or capability, so that when you download one to your humanoid, it acquires that new skill or functionality. And we want all of those to be contributed by normal developers, just like they would build an app for the Apple App Store or Google Play. And we are really modeling this on The Matrix, where, of course, Neo is able to learn jiujitsu by downloading a skill chip. So, we see this conceptually as being exactly the same. We think of humanoids as just the next kind of cell phone. It's a cell phone with arms, legs, sensors, compute, battery, and so forth, just like we're all familiar with. And so, then of course you need an app store, and I really want this future to be open. I want developers everywhere to be able to participate actively in this future, and I don't want robot software being downloaded from one company as an encrypted payload that somehow controls humanoids in my family, or at my workplace.
Ryan Donovan: So, one of the things I've thought about and [have been] curious about with the robotics platforms is you have this deployable on any given device that has any number of input sensors, any number of movement parts – how do you account for any given possible input, output, and functionality?
Jan Liphardt: That's one of the major problems right now, but it's something that software developers have had to deal with for the last 60 years. Imagine the number of keyboards, and mice, and displays, and printers, and ethernet cards, and GPUs the computer industry has had to tackle, and the standard answer there is some combination of standards and drivers. So, there's generally a core operating system, and then when you buy a new camera, it comes with a driver that's hopefully compliant, which it isn't always. And the same future is playing out in robotics, where generally, most robots these days come with one of three chips. It's gonna be an Nvidia Thor or a Rockchip, or of course, Elon is building his own custom chip for Optimus. The other chip we really like is Apple silicon, which is incredibly powerful and power-efficient – the sleeper chip in humanoids. And then, most of the hardware that's plugged into the computer is super generic. Microphones, lidar, cameras, other sensors, and generally they come with their own drivers. And every single humanoid we buy comes with basic movement capabilities: stand up, jump, translate, Euler angles, get down. 'Sit down' is still really rare and difficult, but more and more we're seeing the humanoid robot hardware come with pretty sophisticated movement capabilities, and then generally you're able to plug in new sensors. So, on some level, it's super annoying, but also a solved problem.
Ryan Donovan: It's just a question of getting the right drivers, and I'm sure anybody with a computer has known [that] updating drivers is a constant process.
Jan Liphardt: Please shoot me. I totally feel your pain.
Ryan Donovan: Yeah. And so, how do you do that in a sort of mobile device like a robot?
Jan Liphardt: That's a long answer, but one thing we do as a company so that we don't go crazy is when we buy a new type of humanoid—let's just say a UBTECH, or a DOBOT, or a LimX, or a Unitree, or an EngineAI, or a Booster—so, those are just a few of the humanoids you can buy commercially today. Typically, they all come with an Ethernet jack, and then we plug an Nvidia Thor into that Ethernet jack, and all the extra sensors we add go directly into the Nvidia Thor. So, most of the driver-level work we do in partnership with people like Nvidia and RealSense focuses on a very small number of combinations of the RealSense camera with the Nvidia Thor, and then all the compute is done on the Thor or in our cloud. And then, all the basic data and actions flow through something like Cyclone DDS or Zenoh, which is robot middleware, going through the Ethernet. So, that reduces the driver combination space very dramatically, because we attach a standardized backpack called a brain pack to all those different humanoids.
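The data flow described here—sensors on the humanoid publishing to compute on the brain pack through middleware like Cyclone DDS or Zenoh—is a topic-based publish/subscribe pattern. Below is a minimal in-process sketch of that pattern, not the real DDS or Zenoh API (those run over the network with typed topics and quality-of-service settings); the `Bus` class and topic names are invented for illustration.

```python
# Simplified in-process stand-in for DDS/Zenoh-style publish/subscribe.
# In the setup described, publishers (sensors) live on the humanoid and
# subscribers (compute) live on the brain pack, linked by Ethernet.

from collections import defaultdict

class Bus:
    """Tiny topic bus imitating middleware publish/subscribe semantics."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback to receive every message on a topic."""
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver a message to all subscribers of a topic."""
        for callback in self.subscribers[topic]:
            callback(message)

bus = Bus()
received = []
# The brain pack subscribes to a sensor topic...
bus.subscribe("camera/frame", received.append)
# ...and the humanoid's camera publishes onto it.
bus.publish("camera/frame", {"seq": 1, "label": "person detected"})
print(received[0]["label"])  # person detected
```

Decoupling producers from consumers this way is what lets one standardized brain pack ride on many different humanoid bodies: each body only has to publish onto agreed-upon topics.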
Ryan Donovan: And those humanoids are a single piece of hardware, right?
Jan Liphardt: Correct. Right now, the humanoid robotics field is really balkanized. There are some people who care only about motion, so they're gonna care about dancing, and jumping, and iPhone assembly, and chopping onions, and making noodles, and wet-wiping your floor. So, that community is very centered on fast, amazing, perfect movement, typically articulated through hands. So, they're very focused on hands, and that is an awesome thing if you need a humanoid to chop onions; however, most people I know do not derive their value by chopping onions or wet-wiping floors. They're valuable because they can engage, teach, support, listen, and find people lying on a floor potentially needing help. And so, all of those skills really just require eyes, and a mouth, and ears, and a brain. And so, then there's people like us who are very focused on all the amazing things that humans do that do not require 10 fingers and 10 toes. And that's why our software does not really support drone targeting or balancing on your big toe, because we consider those to be almost secondary compared to all the awesome things that people do when they interact with other people in a healthcare setting, educational setting, workplace, your family, and so forth. For us as a company, we want to help robots be useful extremely quickly. So, we are deemphasizing the movement, assembly, manufacturing, onion chopping, and we're emphasizing slower tasks that are much more about speech engagement, spatial understanding, memory, and things like that. And so, that's how we get away with our software architecture, which is slower compared to an end-to-end AI, or world model, or a vision action model, or something like that.
Ryan Donovan: It does seem like the movement stuff, it's a full body problem—the balancing, the five finger articulation—it seems like that is a lot of computation that wouldn't go to the sort of language model stuff. Would it get in the way?
Jan Liphardt: Exactly. They're simply too slow and they're not well-suited. The LLMs solve a totally different problem. The problem they solve is the data fusion problem, and they solve the, 'what is the best decision for Ryan to take next?' That's what they solve. And then, once the decision has been made—maybe, for example, you, Ryan, just decided to pick up a red apple—then you need to translate that decision into successfully picking up an apple. And that's where things like Gemini Robotics, or world models, foundation models, vision action models, and things like that come into play.
Ryan Donovan: Do you think it'll be possible soon to fuse the two – the motion and the cognition pieces?
Jan Liphardt: Yes, that's happening extremely quickly, but we don't see those as antagonistic. We use world models, and foundation models, and robotics-focused models every single day. Gemini Robotics, and Physical Intelligence's π0.5, and others – they just live at the motion level of the stack. And on top of that, you have the memory, social, spatial understanding, face recognition, optimal decision-making and tactics level, and that lives one layer above the foundation models, or world models.
Ryan Donovan: I would think that they'd be competing for the same sort of compute, and hardware, and power resources.
Jan Liphardt: Precisely. Yeah. No, that's exactly right, and I think [for] everyone, it really depends on what problem you're trying to solve. If, for example, you're building an autonomous torpedo for the US Navy, then there is no such thing as cloud compute. You have to do everything locally. And then, of course, all the autonomy you do in the setting of an underwater torpedo robot is of course competing for exactly the same memory, and the same battery, and things like that.
Ryan Donovan: We're a little ways off before we have a robot that can find John Connor.
Jan Liphardt: Yes. Stay tuned. Probably not this year.
Ryan Donovan: I was at re:Invent a while back, and there was an Nvidia and AWS showcase on physical AI. It seems like right now there is a big push towards physical AI. There's a lot of companies doing it. Why do you think that now is this sort of steam engine time for robots?
Jan Liphardt: On the purely technical side, I think it's clear to many roboticists that all the fundamental problems have now been solved. For example, at CES this year, for the first time in my life, I saw good robot hands with 10,000-hour mean time between failure, for $1,250. And so, just in a year, robot hands have gone from super expensive and almost immediately breaking to $1,250, 10,000-hour mean time between failure, with all five fingers moving quickly and nicely. So, the hardware is moving very rapidly. The robotics supply chain in China is awesome and frightening at the same time. A hundred-plus companies in China just for humanoids, and the speed at which they are innovating and driving down the cost is remarkable. So, hardware is moving very quickly. There are a few major gaps, like self-charging. Anyone with a humanoid in the home, of course, doesn't want to take the battery out and put it back in. It needs to self-charge. The feet need to get quieter, so it doesn't clank around your bedroom and wake you up, and stuff like that. But those are all doable. And from a software perspective, the motion-focused software is moving quickly, and the data fusion and decision-making level is, too. So, technically, it's solved, and of course that leads to a lot of enthusiasm among technologists, and developers, and professors, and students, and sci-fi aficionados. The big gap right now is society. What does the US Nurses Union think about nursing humanoids? What do taxi drivers think about robot taxis? What do electricians think about humanoids that know the California electrical code? What do kindergarten teachers think about the quadruped dogs we deployed into kindergartens in Asia to teach kids? And now that the technology is moving so quickly, that is surfacing dramatic gaps at the level of regulation and insurance. As a robotics company, how do you get insurance when your humanoid steps on someone's foot?
In a US household, what does the state of California think about humanoids replacing human electricians? What does our school system think about that? And all those things now need to be dealt with, and that's more of a business, go to market, society, humanity set of questions.
Ryan Donovan: I think with LLMs as well, there is a sort of question of liability for when things go wrong. It's harder to just point at a person, but you also touched on the sort of societal impacts of what does it mean for us when robots are a regular part of life?
Jan Liphardt: The number one question I get from parents when they come visit Stanford is, 'oh, what should my kid major in?' And I say, 'that's a fascinating question.' 20 years ago, people would've said, 'oh, do computer science, or become a physicist, or become a dermatologist, or a tax lawyer,' or whatever the answer may be. But now, of course, there is a lot more uncertainty about what the next generation of quote-unquote great jobs is gonna be. So, this is not only a question for robots; this is a more general question for AI and LLMs. You're absolutely right to point to horrific stories, where middle school kids had long conversations with chatbots and then committed suicide. And so, there's so much left to build, and fix, and discuss, and think about.
Ryan Donovan: Yeah, there is a sort of generalized loss of human connection. The more synthetic interactions we have, the less human interactions we have.
Jan Liphardt: I'm of two minds about that, just considering my mother. My mother refuses to charge her iPhone, and her laptop is guaranteed not to be charged. And so, simply seeing my mom on a Zoom call, who lives 6,000 miles away, involves like three phone calls, and I have to wait a day for her to plug in her laptop to charge it. And it's a major production. And if she had a robot in her home that charged itself, and would allow me to dial in, and simply have a Zoom with her, that would be awesome. And that's part of the reason we built Zoom into the brain of our humanoids, so that family members can appear on the screen, on the face of the humanoid, and engage with other people. So, for me, practically, given that my mother lives 6,000 miles away, that would solve this problem of my mom refusing to charge any piece of electronics, and that would actually make it easier for me to see and talk to my mom. But as a general feature of what you're alluding to, I totally agree.
Ryan Donovan: I think that's definitely one of the hard problems that folks have to face up to. I've been hearing a lot of people talk about AI as a new renaissance, and I think when people think about that, they forget that there were like 100 years of wars of Reformation after the printing press because of the societal disruption of easy access to information. Do you think there's a path through this that isn't just legislation?
Jan Liphardt: So, I don't have simple answers. My main observation is that if you live in a hacker house in San Francisco, you have a very different understanding of what's happening right now than most people. And there is, right now, a very significant communications and information gap. At the very least, as a society, we should be trying to fix that. We want informed, broad debate and awareness about what's coming down the pipeline. And then, there's a totally separate question about what society does to prepare. The implications touch education, government, our local communities, our neighborhoods, our employers – everyone needs to be engaged around preparing.
Ryan Donovan: I've seen people talking about how the new jobs will be cognitive science majors, and philosophy majors, and people who understand systems thinking beyond just the technical details of computers. Do you think that's a path through for both jobs and society?
Jan Liphardt: I agree, but also, I have concerns with that. The number one recommendation I've been giving people is: the world around you is changing quickly and will continue to change extremely quickly. So, the onus is on all of us to reinvent ourselves frequently, and pay a lot more attention to what's going on around us. And Stanford has this philosophy of lifelong learning. The notion is that in the dark ages, like the 1970s, it was probably okay to get a college degree, and then your education was, quote-unquote, done, and that model is completely out the window, and you should be prepared to spend part of every single day for the rest of your life learning something new. And so, I don't know what that is. I don't know if it's anthropology or poetry, or I don't know, but you should be prepared to spend every single day of the rest of your life paying attention to what's going on around you, being very nimble, and learning something new. For me, as someone in technology, and as someone who cares about education, and who's a parent, this is the single most fascinating, awesome, and scary moment I've ever been in.
Ryan Donovan: All right, folks, it's that time of the show where we shout out somebody who came on Stack Overflow, dropped a little knowledge, shared some curiosity, and earned themselves a badge. So, congrats to Lifejacket badge winner Sean for answering 'Creating the simplest HTML toggle button?'. They earned that badge by dropping an answer that got a score of five or more on a question that had a score of negative two or less when they got there, and they brought the question up, as well. I'm Ryan Donovan. I edit the blog and host the podcast here at Stack Overflow. If you have comments, questions, concerns, or topics to cover, please email me at podcast@stackoverflow.com, and if you want to reach out to me directly, you can find me on LinkedIn.
Jan Liphardt: So, I am Jan Liphardt, and you can find me on OpenMind.org, or you can find me contributing code on GitHub. So, OpenMind.org should be a great place to start.
Ryan Donovan: Thank you for listening, everyone, and we'll talk to you next time.
