The Spoon

Daily news and analysis about the food tech revolution

EPIC-KITCHENS

July 20, 2020

The EPIC-KITCHENS Project is Building a Foundation For Artificial Intelligence in the Kitchen

From the post in 2018:

The ultimate goal behind EPIC-KITCHENS is to create an open dataset about kitchen-centric objects, behavior, and interactions upon which researchers across the world can focus their deep-learning algorithms, in the hope of advancing artificial intelligence in the kitchen.

Since those early days, the project has continued to progress, recently releasing a newly expanded dataset and publishing the results of the second annual challenge. The first research challenge, completed in 2019, focused on researchers building models that can recognize actions in the kitchen. The recently completed challenge focused on action anticipation, asking researchers to predict what action will take place one second after the observed video.
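To make the anticipation task concrete, here is a minimal, hypothetical sketch in PyTorch: a small classifier takes pooled features from an observed clip and predicts the class of the action that begins one second later. The feature dimension, class count and backbone are illustrative assumptions, not the challenge's actual baseline.

import torch
import torch.nn as nn

FEATURE_DIM = 1024   # assumed dimension of pooled clip features
NUM_ACTIONS = 97     # illustrative class count (e.g. verb classes)

class AnticipationHead(nn.Module):
    """Predicts the action that starts one second after the observed clip."""
    def __init__(self, feature_dim=FEATURE_DIM, num_actions=NUM_ACTIONS):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(feature_dim, 512),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(512, num_actions),
        )

    def forward(self, observed_features):
        # observed_features: (batch, feature_dim), pooled over a segment that
        # ends one second before the action we are asked to anticipate.
        return self.classifier(observed_features)

model = AnticipationHead()
features = torch.randn(8, FEATURE_DIM)   # stand-in for a video backbone's output
logits = model(features)                 # (8, NUM_ACTIONS)
predicted = logits.argmax(dim=1)         # most likely upcoming action per clip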

Researchers who competed in the most recent challenge include teams from universities spanning the globe, from Cambridge to Georgia to Singapore, as well as corporate research labs such as Facebook's AI team.

I recently caught up with the research lead for EPIC-KITCHENS, Dr. Dima Damen of the University of Bristol in the United Kingdom, who told me that the competing research teams used a variety of approaches to make their systems better at recognizing and predicting actions from video.

“There are some people who’ve used audio,” said Damen. “So they’ve used the audio from the video to identify something like opening the tap versus closing the tap. Traditionally, computer vision has relied on just images, like videos without sound.”

“There are some people who looked at a very big set of things, at what happened in the past minute, because that’s helping them. And there are people who said, ‘no, I’ll focus on the objects, like where the hand is, where the object is, that’s a better approach.'”
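One simple way to combine cues like the ones Damen describes is late fusion: train separate visual and audio classifiers and average their class probabilities. The sketch below is a generic illustration of that idea under assumed feature sizes, not any particular team's entry.

import torch
import torch.nn as nn

NUM_CLASSES = 97  # illustrative

class StreamClassifier(nn.Module):
    """Stand-in for a per-modality model (video frames or audio spectrogram)."""
    def __init__(self, in_dim, num_classes=NUM_CLASSES):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.net(x)

visual_model = StreamClassifier(in_dim=2048)  # features from the video frames
audio_model = StreamClassifier(in_dim=512)    # features from the soundtrack

def late_fusion(visual_feats, audio_feats, audio_weight=0.5):
    """Average per-class probabilities from the two streams."""
    p_visual = visual_model(visual_feats).softmax(dim=1)
    p_audio = audio_model(audio_feats).softmax(dim=1)
    return (1 - audio_weight) * p_visual + audio_weight * p_audio

probs = late_fusion(torch.randn(4, 2048), torch.randn(4, 512))
print(probs.argmax(dim=1))  # fused prediction, e.g. "open tap" vs. "close tap"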

For the next set of challenges, the group is providing the newly expanded dataset and asking researchers to focus on questions such as “test of time,” which asks whether models trained two years ago still perform well, and “scalability,” which looks at whether more data is better.

Part of the expanded data will be a newly broadened dataset called EPIC-KITCHENS-100, where new footage brings the total amount of video captured to 100 hours. According to Damen, the new video comes from a cohort that includes participants from the previous study (half of the original 32 participants agreed to take part again) as well as 8 new participants.
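The project publishes its annotations as CSV files on the epic-kitchens GitHub organization. Here is a hedged pandas sketch for a first look at the data; the file name and column names are assumptions and should be checked against the published repository.

import pandas as pd

# Assumed file and column names; verify against the published
# EPIC-KITCHENS annotation repository for the exact schema.
train = pd.read_csv("EPIC_100_train.csv")

print(len(train), "annotated action segments")
print(train["participant_id"].nunique(), "participants in this split")
print(train["narration"].head())  # short descriptions such as "open fridge"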

According to Damen, bringing back past participants allows the computer models to better understand kitchen behavior by factoring in what happens with the passage of time, as in real life, while also revealing how small changes can affect the results.

“It’s the natural progression, like how life will be,” said Damen. “The question is what happens to computer vision in the meanwhile? So it’s tiny, tiny changes, right? It’s a slightly new camera, people might have moved home, and then we’re asking more questions that we believe would be of interest to the community.”

Damen said she hopes her work can help build better technology and systems for people who need assistance.

“So there are new questions that are being asked which, interestingly, even the assistive technology community is not talking about. As in, if you want to help someone, sometimes you can guess what they’re doing, but many times you can’t.”

Spoon Plus subscribers can read the full transcript of our conversation and watch my video interview with Damen below.

May 30, 2018

How An Obscure Academic Project May Have Just Started A Kitchen Robot Revolution

Imagine it’s 2031 and you’ve sat down for dinner with your family. It’s the middle of the week, so tonight’s meal is nothing too ambitious, mac and cheese or fajitas. As is the usual routine, you catch up with the family and share a few laughs until the meal is finally served, at which point everyone loads their plates and starts chowing down on what turns out to be a tasty dinner (the third one this week!). Soon your youngest, the finicky one, asks for seconds.

Congrats, parent, another successful meal. But don’t spend too much time patting yourself on the back, because here’s the thing: neither you nor your significant other spent any time preparing tonight’s dinner. Instead, tonight’s dinner, and every dinner this week, was prepared in its entirety by a robot, the very same robot that is now in the kitchen cleaning up after dinner and preparing dessert.

Futuristic? Yes. A science fiction movie cliche? Definitely. But the above scenario may also be a very realistic possibility, in large part thanks to an obscure research project involving 32 GoPro-adorned home cooks making dinner.

Creating A Technology Big Bang

With any technology that changes the world, there’s almost always a research breakthrough or two that helps unleash innovation. In today’s world of AI and robotics, most experts would agree that one of those technological “big bangs” was the 2012 ImageNet Challenge entry from a research team led by the University of Toronto’s Geoff Hinton.

ImageNet is a crowdsourced database of millions of annotated images. The accompanying ImageNet Challenge is an annual contest in which teams of machine vision researchers pit their algorithms against the ImageNet dataset, and against one another, to achieve the highest degree of accuracy. Hinton’s 2012 team made what is widely regarded as a breakthrough in AI research by using deep learning techniques to achieve much greater accuracy than before (85%). Since that breakthrough six years ago, there have been leaps forward each year; today’s ImageNet Challenge teams routinely achieve 95% accuracy, better than most humans, helping to drive significant progress in all corners of the AI world, from autonomous driving to augmented reality to industrial and consumer robotics.

All of which brings us back to the kitchen.

And Now Into the Kitchen (The Epic Kitchen)

Now, a group of research academics is trying to create the equivalent of an ImageNet for the kitchen. Called EPIC-KITCHENS, the project is an ambitious effort to capture people performing natural tasks in their home kitchens, like cooking, cleaning and doing laundry, and then release the resulting millions of annotated images into the wild. The ultimate goal behind EPIC-KITCHENS is to create an open dataset about kitchen-centric objects, behavior, and interactions upon which researchers across the world can focus their deep-learning algorithms, in the hope of advancing artificial intelligence in the kitchen.

Why the kitchen? According to the study’s lead, Dr. Dima Damen, the kitchen is one of the most complex environments in everyday life for artificial intelligence to master because it involves so many tasks and actions.
EPIC-KITCHENS 2018 TRAILER
“The most challenging type of object interactions tend to be in our kitchen,” said Damen in a phone interview I conducted last month. “We’re doing lots of tasks, on short notice, we’re multitasking. We might be adding something to our meal and moving something around. That makes the kitchen environment the most challenging environment for our types of perception.”

Damen, who is with the University of Bristol in the UK, partnered with researchers at the University of Toronto and Italy’s University of Catania to bring the project to life. The project took about a year to complete and involved a panel of 32 home cooks of ten nationalities in four cities in Europe (United Kingdom) and North America (Canada and US). To capture their activity, each participant mounted a GoPro on their head and went through 1-5 hours of preparing meals, cleaning and whatever else came naturally.

“We gave them a camera, sent them home, and said just record whatever you are doing in your kitchen for 3-5 days,” said Damen.

From there, the participants watched and narrated their videos so researchers had an audio track from which to manually annotate the atomized images, 11.5 million in all, captured in the 55 hours of video. The result is a massive database its creators hope will help researchers train their AI systems to better understand the kitchen. Like ImageNet, the creators also hope to foster competition with challenges and will track the progress with online leaderboards.

The data itself is something many will find somewhat mundane:

Distribution of actions in kitchen. Source: Epic Kitchens

The above distribution of annotated actions and objects is what you would probably expect: a really long list of things found in the kitchen, like vegetables, kitchenware and spices. Same for actions: the distribution breaks down pretty much all the verbs we perform in the kitchen, such as put, take, twist and so on.

And that’s the point, at least if you’re a researcher hoping to train an artificial intelligence system. Just as this type of granular data helped ImageNet Challenge teams achieve a 95% accuracy rate with their software, the EPIC-KITCHENS team hopes to reach a similar level of accuracy. By helping these systems understand what everyday objects are and how people manipulate them, day after day, to perform the basic functions of life in our kitchen, like cooking and cleaning, the EPIC-KITCHENS data and what evolves out of it can provide a foundation upon which technologists can eventually create robots that act like humans and perform human-like functions in the kitchen.

The result could be an explosion of innovation in spaces like augmented reality, personalized food identification apps and, yes, cooking robotics. And while a fully functional Rosie the home cooking robot could be the ultimate end result of this research a decade from now, chances are we’ll see much more evolutionary improvements between now and then, in the form of smarter appliances, more capable virtual assistants and more immersive guided cooking experiences.

And oh yeah: if you’re the type who wants to keep the robots out of the kitchen altogether, don’t worry. One of the biggest challenges with machine understanding of food is that the three-dimensional human comprehension of taste, smell and texture is extremely hard to replicate with machines. Add in the difficulty AI has with understanding context, and it makes me think that while we may eventually get to cooking robots, they may only be average cooks at best. The real artists, the chefs, whether home-based or on TV, are probably safe from the robot invasion. Probably.
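For readers who want to poke at a distribution like the one above themselves, counting the verb and noun labels in the annotation file is enough. A minimal pandas sketch, again assuming the file and column names of the public annotation CSVs:

import pandas as pd

train = pd.read_csv("EPIC_100_train.csv")  # assumed filename, as above

# Most frequent actions (verbs) and objects (nouns) in the annotations.
print(train["verb"].value_counts().head(10))   # common kitchen verbs such as put and take
print(train["noun"].value_counts().head(10))   # common kitchen objects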
