Amazon Rekognition can analyze objects and scenes

As everyone from Microsoft to Apple rolls out first-generation voice assistants for the home, the reigning voice control champ Amazon has turned its attention to a new area within the smart home AI universe: computer vision.

For evidence of this focus, one needs to look no further than two new products introduced by Amazon in the past month. While both the Echo Look and Show have the same built-in voice assistant power of Alexa as their predecessors, there is one big difference: both new entrants have cameras. And while Amazon hasn’t come out and said these two new devices are the beginning of a new strategic front in the AI-powered smart home, an examination of these products’ capabilities, recent efforts to bolster the AWS AI lineup and recent statements by Amazon CEO Jeff Bezos help to connect all the dots.

Rekognizing A Pattern

So why the sudden interest in putting cameras in the home? My guess is that it stems in part from Amazon’s growing emphasis over the past year on its own computer vision-powered AI capabilities.

That growing interest became more evident a year ago with the acquisition of Orbeus, the company that provided the foundation for Rekognition, Amazon’s current computer vision service from AWS. According to Richard Michael, former COO, Orbeus provided a “proprietary cloud based image analysis solution that makes sense of not just faces, but also scenes and objects.”

By last October, the company had relaunched the Rekognition service as part of its suite of AWS AI products. In a blog post, AWS Chief Evangelist Jeff Barr described how Rekognition could be used:

If you have a large collection of photos, you can tag and index them using Amazon Rekognition. Because Rekognition is a service, you can process millions of photos per day without having to worry about setting up, running, or scaling any infrastructure. You can implement visual search, tag-based browsing, and all sorts of interactive discovery models.

You can use Rekognition in several different authentication and security contexts. You can compare a face on a webcam to a badge photo before allowing an employee to enter a secure zone. You can perform visual surveillance, inspecting photos for objects or people of interest or concern.

You can build “smart” marketing billboards that collect demographic data about viewers.
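To make the first two use cases in Barr's post a bit more concrete, here is a minimal Python sketch of photo tagging and badge-style face comparison. It assumes the `client` argument is a Rekognition client created with the AWS SDK for Python (for example `boto3.client("rekognition")`, with credentials and a region already configured). The helper names, thresholds and file names are my own illustrations and do not come from Amazon or from the blog post.

```python
from collections import defaultdict


def tag_photo(client, image_bytes, min_confidence=80.0, max_labels=10):
    """Return the label names detected in a single image.

    `client` is assumed to expose Rekognition's detect_labels call,
    e.g. a client created with boto3.client("rekognition").
    """
    response = client.detect_labels(
        Image={"Bytes": image_bytes},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    return [label["Name"] for label in response["Labels"]]


def build_tag_index(client, photos):
    """Map each detected tag to the sorted list of photo names that carry it.

    `photos` is a dict of {photo_name: raw_image_bytes} (hypothetical input shape).
    """
    index = defaultdict(list)
    for name, image_bytes in photos.items():
        for tag in tag_photo(client, image_bytes):
            index[tag].append(name)
    return {tag: sorted(names) for tag, names in index.items()}


def faces_match(client, badge_bytes, webcam_bytes, threshold=90.0):
    """Return True if the badge face matches a face in the webcam frame.

    The 90.0 similarity threshold is an illustrative assumption, not a
    recommended security setting.
    """
    response = client.compare_faces(
        SourceImage={"Bytes": badge_bytes},
        TargetImage={"Bytes": webcam_bytes},
        SimilarityThreshold=threshold,
    )
    return len(response["FaceMatches"]) > 0
```

With a real client, `build_tag_index(client, {"beach.jpg": open("beach.jpg", "rb").read()})` would return a dictionary that a simple tag-based photo browser could sit on top of. Passing the client in as a parameter also keeps the helpers easy to try out against a stand-in object before pointing them at AWS.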

While Amazon hasn’t come out and announced that Rekognition is being used to power the Echo Look, the company’s “fashion assistant,” the features of the Look tell me it most likely is. The device, which lets users take selfies and build a “style book” that the Look then analyzes to make recommendations, has a feature called Style Check:

Style Check keeps your look on point using advanced machine learning algorithms and advice from fashion specialists. Submit two photos for a second opinion on which outfit looks best on you based on fit, color, styling, and current trends. Over time, these decisions get smarter through your feedback and input from our team of experienced fashion specialists.

This is exactly what the Rekognition API does. By combining machine learning with computer vision, Rekognition is constantly learning, becoming better and better at analyzing images as the set of data drawn from those images grows. For the Echo Look, the end result is better recommendations. And while this is a fashion-centric use case that focuses on color, style and fit, there’s no doubt that this technology can be used in a variety of use cases ranging from home security to analyzing the contents of a refrigerator.

And what about the Echo Show? While Amazon doesn’t highlight the Show’s image recognition capabilities, my guess is that Amazon will give the Show Rekognition-powered computer vision over time to add enhanced functionality.

A “Horizontal Enabling Layer”

Recent comments from Amazon CEO Jeff Bezos help one understand the company’s ongoing effort to push AI services beyond just Alexa. In a recent interview at the Internet Association gala, he shared his thoughts on AI (per GeekWire):

“Machine learning and AI is a horizontal enabling layer. It will empower and improve every business, every government organization, every philanthropy — basically, there’s no institution in the world that cannot be improved with machine learning. At Amazon, some of the things we’re doing are superficially obvious, and they’re interesting, and they’re cool. And you should pay attention. I’m thinking of things like Alexa and Echo, our voice assistant, I’m thinking about our autonomous Prime Air delivery drones. Those things use a tremendous amount of machine learning, machine vision systems, natural language understanding and a bunch of other techniques.

“But those are kind of the showy ones. I would say, a lot of the value that we’re getting from machine learning is actually happening beneath the surface. It is things like improved search results. Improved product recommendations for customers. Improved forecasting for inventory management. Literally hundreds of other things beneath the surface.”

While Bezos points to the voice assistant tech in Alexa and Echo, he also gives a nod to machine vision. He describes all of these technologies as a “horizontal enabling layer.” What does he mean by this? In short, he is describing AI as a technology that is broadly applicable to almost every application, whether enterprise or consumer, and that can add immense value to the end product.

With Alexa, Amazon was able to show, not tell, the value of voice control. That is very powerful. I am sure the company hopes that, in a similar way, the Echo Look and Show can act as ambassadors for computer vision to the broader world. And while we may not witness the same kind of explosive adoption of Amazon-powered computer vision AI as we did with Alexa, in part because there are already a number of products in the market that do basic image analysis using AI (such as Closeli), I do believe that Amazon can raise awareness of how image recognition and detection AI can enhance a variety of smart home and consumer use cases.

Can Amazon Overcome The Creep Factor?

One last caveat: sales of inward-facing cameras for the home have plateaued in recent years, while outward-facing security cameras like the Ring and Arlo have flown off shelves. The reason is that people want to know what’s going on outside their home, but they don’t want people – including potential hackers – seeing what’s going on inside. With all the stories of security vulnerabilities, who can blame them?

While Amazon seems unbothered by this, it remains to be seen whether its new interest in video AI will meet any pushback from consumers.

Only time – and maybe Alexa – will tell.