Imagine it’s 2031 and you’ve sat down for dinner with your family.
It’s the middle of the week, so tonight’s meal is nothing too ambitious: mac and cheese, or fajitas. As is the usual routine, you catch up with the family and share a few laughs until the meal is finally served, at which point everyone loads their plates and starts chowing down on what turns out to be a tasty dinner (the third one this week!). Soon your youngest -- the finicky one -- asks for seconds.
Congrats, parent -- another successful meal. But don’t spend too much time patting yourself on the back, because here’s the thing: neither you nor your significant other spent any time preparing tonight’s dinner. Instead, tonight’s dinner -- and every dinner this week -- was prepared in its entirety by a robot, the very same robot that is now in the kitchen cleaning up after dinner and preparing dessert.
Futuristic? Yes. A science fiction movie cliche? Definitely. But the above scenario may also be a very realistic possibility, in large part due to an obscure research project involving 32 GoPro-adorned home cooks making dinner.
Creating A Technology Big Bang
With any technology that changes the world, there’s almost always a research breakthrough or two that helps unleash innovation. In today’s world of AI and robotics, most experts would agree that one of these technological “big bangs” was the 2012 ImageNet Challenge entry from a research team led by the University of Toronto’s Geoff Hinton.
ImageNet is a crowdsourced database of millions of annotated images. The accompanying ImageNet Challenge is an annual contest where teams of machine vision researchers pit their algorithms against one another on the ImageNet dataset, competing to achieve the highest degree of accuracy.
Hinton’s 2012 team achieved what is widely believed to be a breakthrough in AI research, using deep learning techniques to reach much greater accuracy than ever before (85%). Since that breakthrough effort six years ago, there have been leaps forward each year -- today’s ImageNet Challenge teams routinely achieve 95% accuracy, better than most humans -- helping to drive significant progress in all corners of the AI world, from autonomous driving to augmented reality to industrial and consumer robotics.
All of which brings us back to the kitchen.
And Now Into the Kitchen (The Epic Kitchen)
Now, a group of research academics is trying to create the equivalent of an ImageNet for the kitchen. Called EPIC-KITCHENS, the project is an ambitious effort to capture people performing natural tasks in their home kitchens -- like cooking, cleaning and doing laundry -- and then release the resulting millions of annotated images into the wild. The ultimate goal behind EPIC-KITCHENS is to create an open dataset of kitchen-centric objects, behavior, and interactions on which researchers across the world can focus their deep learning algorithms, in the hope of advancing artificial intelligence in the kitchen.
Why the kitchen? According to the study’s lead, Dr. Dima Damen, the kitchen is one of the most complex environments in everyday life for artificial intelligence to master because it involves so many tasks and actions.
“The most challenging type of object interactions tend to be in our kitchen,” said Damen in a phone interview I conducted last month. “We’re doing lots of tasks, on short notice, we’re multitasking. We might be adding something to our meal and moving something around. That makes the kitchen environment the most challenging environment for our types of perception.”
Damen, who is with the University of Bristol in the UK, partnered with researchers at the University of Toronto and Italy’s University of Catania to bring the project to life. The project took about a year to complete and involved a panel of 32 home cooks of ten nationalities in four cities in Europe (United Kingdom) and North America (Canada and the US). To capture their activity, each participant mounted a GoPro on their head and recorded 1-5 hours of preparing meals, cleaning and whatever else came naturally.
“We gave them a camera, sent them home, and said just record whatever you are doing in your kitchen for 3-5 days,” said Damen.
From there, the participants watched and narrated their videos so researchers had an audio track from which to manually annotate the individual images -- 11.5 million in all -- captured in the 55 hours of video.
The result is a massive database its creators hope will help researchers train their AI systems to better understand the kitchen. Like ImageNet, the creators also hope to foster competition through challenges, and they will track progress with online leaderboards.
The data itself is something many will find somewhat mundane:
The above distribution of annotated objects is what you would probably expect: a really long list of things found in the kitchen, like vegetables, kitchenware and spices. The same goes for actions: the distribution breaks down pretty much all the verbs we perform in the kitchen, such as put, take, twist and so on.
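To make the idea of a verb-and-noun distribution concrete, here is a minimal sketch in Python of how such counts might be tallied from narrated-segment annotations. The records and field names below are invented for illustration and are not the dataset’s actual schema:

```python
from collections import Counter

# Hypothetical narrated-segment annotations in the spirit of the
# EPIC-KITCHENS approach: each record pairs an action (verb) with
# the object (noun) it was performed on. These records and field
# names are illustrative, not the dataset's real format.
annotations = [
    {"verb": "put",  "noun": "pan"},
    {"verb": "take", "noun": "knife"},
    {"verb": "wash", "noun": "pan"},
    {"verb": "put",  "noun": "plate"},
    {"verb": "cut",  "noun": "onion"},
]

# Tally how often each action and each object appears -- the raw
# material for the kind of frequency distribution described above.
verb_counts = Counter(a["verb"] for a in annotations)
noun_counts = Counter(a["noun"] for a in annotations)

print(verb_counts.most_common(2))  # most frequent actions
print(noun_counts.most_common(2))  # most frequent objects
```

Scaled up to millions of annotated frames, counts like these reveal which objects and actions dominate everyday kitchen activity -- exactly the signal a deep learning system needs to learn from.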
And that’s the point, at least if you’re a researcher hoping to train an artificial intelligence system. Just as this type of granular data helped ImageNet Challenge teams achieve a 95% accuracy rate with their software, the EPIC-KITCHENS team hopes to reach a similar level of accuracy. By helping these systems understand what everyday objects are and how people manipulate them in everyday sequences of actions -- the basic functions of life in our kitchen, like cooking and cleaning -- the EPIC-KITCHENS data and what evolves out of it can provide a foundation upon which technologists can eventually create robots that act like humans and perform human-like functions in the kitchen.
The result could be an explosion of innovation in spaces like augmented reality, personalized food identification apps and, yes, cooking robotics. And while a fully functional Rosie-style home cooking robot could be the ultimate end result of this research a decade from now, chances are we’ll see more evolutionary improvements between now and then in the form of smarter appliances, more capable virtual assistants and more immersive guided cooking experiences.
And oh yeah: if you’re the type who wants to keep robots out of the kitchen altogether, don’t worry. One of the biggest challenges for machine understanding of food is that the human comprehension of taste, smell and texture is extremely hard to replicate with machines. Add in how difficult it is for AI to understand context, and it makes me think that while we may eventually get cooking robots, they may only be average cooks at best.
The real artists, the chefs -- whether home-based or on TV -- are probably safe from the robot invasion.