It was a box of Cheez-Its that made me think of the idea.
We were recently at Nvidia’s Seattle Robotics Lab, watching a presentation on how the company uses computer vision and synthetic data to train robots in the kitchen. In order for a robot to grab a box of Cheez-Its, that robot needs to know what a Cheez-Its box looks like. In order to teach the robot what a Cheez-Its box looks like, Nvidia needs to give the robot detailed information about that box, including its size, shape, and the artwork on the front.
This isn’t that hard if your kitchen, like the one at Nvidia, is stocked with only one box of Cheez-Its. You scan that box and input the dimensions and imagery so the robot can match what you’ve scanned to the real thing. But what happens when you want the robot to find a box of something other than Cheez-Its? Or if the pantry has many different types of Cheez-Its that kinda look similar but have different flavors? Or if it’s the holidays and the box has been altered from that first model so it now has a snowman on it?
Being able to identify particular products via computer vision isn’t just an issue for robotic hands. Startups like Grabango and Trigo Vision are retrofitting grocery stores with lots of tiny cameras that use computer vision for cashierless checkout. These cameras need to precisely recognize the items that shoppers pick up so that the consumer can be accurately charged. That means the AI powering the system needs to know the differences between a bottle of Coke, Diet Coke and Coke Zero, and be able to understand any changes to branding, like a new logo or seasonal updates.
Rather than having each robotics company and every cashierless checkout company separately create their own database of product images, it seems like having some sort of central repository of brand images would be useful. Think of it as a giant library of constantly updating brand images for all the products in a grocery store. CPG companies would upload 3D models of the latest versions of their products to this database, giving computer vision companies access to the most up-to-date imagery for training their respective applications.
This is definitely not the most pressing issue facing CPG companies or retailers; cashierless checkout and product picking robots are still very much in the early stages. But they are coming — and preparing for their arrival now would make the evolution of computer vision and robotics that much faster. After all, training those systems is much easier when you can just download an image rather than creating it yourself.
During our visit I asked Dieter Fox, Senior Director of Robotics Research at Nvidia, if there was such a system. He said there was one for common objects, but nothing brand-specific. ShapeNet has a 3D database of 50,000 common objects, and its subset, PartNet, recently launched with a database of more than 26,000 objects broken down into their various parts.
There are competitive issues that might have CPG brands balking at the idea. Coke may not want people knowing about a particular branding change or partnership in advance. But the overall concept could be a tide that lifts all boats. It gives computer vision-related companies the most accurate 3D models of products for training purposes. The faster computer vision systems can be trained, the faster they can work in the real world without any hiccups, which would ideally allow brands to sell more products. It would also make it easier for kitchen robots, when they eventually arrive, to autonomously grab ingredients needed while cooking (“Robot, grab the turmeric.”).
This isn’t just for food, obviously. This type of repository could work for any brand across any sector that will involve computer vision. Perhaps it’s something Dieter Fox can talk about when he speaks at our upcoming Smart Kitchen Summit in October. Get your ticket now and maybe you can talk with him about it over a box of Cheez-Its.