From the post in 2018:
The ultimate goal behind EPIC-KITCHENS is to create an open dataset about kitchen-centric objects, behavior, and interactions on which researchers across the world can then focus their deep-learning algorithms, in the hope of advancing artificial intelligence in the kitchen.
Since those early days, the project has continued to progress, recently releasing a newly expanded dataset and publishing the results of the second annual challenge. The first research challenge, completed in 2019, focused on having researchers build models that can recognize actions in the kitchen. The recently completed challenge focused on action anticipation, asking researchers to predict, one second in advance, what action would take place next in the video.
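To make the anticipation task concrete, here is a minimal sketch of the setup as described above: a model sees video only up to one second before an action starts and must guess that action. The frame rate, window length, and the dummy classifier below are illustrative assumptions, not part of EPIC-KITCHENS' actual tooling.

```python
from dataclasses import dataclass
from typing import List

FPS = 30                 # assumed frame rate
ANTICIPATION_SEC = 1.0   # the challenge's one-second anticipation gap
OBSERVED_SEC = 2.0       # assumed length of the observed clip before the gap

@dataclass
class AnnotatedAction:
    start_frame: int     # frame where the action begins
    label: str           # e.g. "open tap"

def observed_window(action: AnnotatedAction) -> range:
    """Frames the model may look at: they end one second before the action."""
    end = action.start_frame - int(ANTICIPATION_SEC * FPS)
    start = max(0, end - int(OBSERVED_SEC * FPS))
    return range(start, end)

def predict_next_action(frames: List[int]) -> str:
    """Stand-in for a real model; a fixed guess so the sketch runs end to end."""
    return "open tap"

if __name__ == "__main__":
    action = AnnotatedAction(start_frame=450, label="open tap")
    frames = list(observed_window(action))
    guess = predict_next_action(frames)
    print(f"Observed frames {frames[0]}-{frames[-1]}, "
          f"predicted: {guess!r}, ground truth: {action.label!r}")
```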
Researchers who competed in the most recent research challenge include teams from universities spanning the globe, from Cambridge to Georgia to Singapore, as well as corporate research labs such as the AI team from Facebook.
I recently caught up with the research lead for EPIC-KITCHENS, Dr. Dima Damen from the University of Bristol in the United Kingdom, who told me that the various research teams used a variety of approaches to make their systems better at recognizing and predicting actions based on the information in the video.
“There are some people who’ve used audio,” said Damen. “So they’ve used the audio from the video to identify something like opening the tap versus closing the tap. Traditionally, computer vision has relied on just the images, like videos without sound.”
“There are some people who looked at a very big set of things, at what happened in the past minute, because that’s helping them. And there are people who said, ‘no, I’ll focus on the objects, like where the hand is, where the object is, that’s a better approach.’”
For the next set of challenges, the group is providing a newly expanded set of data and asking researchers to focus on questions such as “test of time,” where they ask whether models trained two years ago still perform well, and “scalability,” where they will have researchers look at whether more data is better.
Part of the expanded data will be a newly broadened dataset called EPIC-KITCHENS-100, where new footage brings the total number of hours of video captured to 100. According to Damen, the new video comes from a cohort that includes returning participants from the previous study (half of the original 32 participants agreed to take part again) as well as 8 new participants.
According to Damen, bringing back past participants will allow computer models to better understand kitchen behavior by factoring in what happens with the passage of time, as in real life, and also to better capture how small changes can impact the results.
“It’s the natural progression, like how life will be,” said Damen. “The question is what happens to computer vision in the meanwhile? So it’s tiny, tiny changes, right? It’s a slightly new camera, people might have moved home, and then we’re asking more questions that we believe would be of interest to the community.”
Damen said she hopes her work can help build better technology and systems that could assist humans who need help.
“So there are new questions that are being asked which, interestingly, even the assistive technology community is not talking about. As in, if you want to help someone, sometimes you can guess what they’re doing, but many times you can’t.”
Spoon Plus subscribers can read the full transcript of our conversation and watch my video interview with Damen below.