Computerworld

Zensors app lets you crowdsource live camera monitoring

Workers in the cloud can train machine learning tools to take over monitoring tasks
Zensors is a mobile app that can use crowdsourcing and machine learning to automatically monitor an area of interest and send alerts to users. The graph at right shows that the input question at left has been answered affirmatively.

If you feel like you need eyes in the back of your head, there's a crowdsourcing app for that.

Zensors is a smartphone application that can monitor an area of interest by using a camera, crowdsourced workers and artificial intelligence.

Developed by researchers from Carnegie Mellon University and the University of Rochester, Zensors is designed to use any camera in a fixed location to detect changes in what's being monitored -- for instance, whether a pet's food bowl is empty -- and automatically notify users.

The developers say it's a cheap, accessible way to add sensors to the environment, part of the move toward building smart homes and smart cities.

The project, presented at the 2015 Computer-Human Interaction Conference (CHI) in Seoul this week, is based on simple user questions written in everyday language about the area being monitored.

For example, a question could be: "Is there a car in the parking space?" The presence of a car would produce an affirmative answer, and an alert could be sent to the user via email or text message.
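
As a minimal sketch of that notification step, an email alert in Python might look like the following. The sender address and local mail relay are placeholders, none of which come from the article:

```python
import smtplib
from email.message import EmailMessage

def send_alert(question: str, recipient: str) -> None:
    """Email the user when the monitored question turns affirmative."""
    msg = EmailMessage()
    msg["Subject"] = f"Zensors alert: {question}"
    msg["From"] = "alerts@example.com"  # placeholder sender address
    msg["To"] = recipient
    msg.set_content(f'The answer to "{question}" is now yes.')
    # Assumes a mail relay is running locally; any SMTP host would do.
    with smtplib.SMTP("localhost") as server:
        server.send_message(msg)
```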

The camera could be the image sensor in any mobile device set up in a fixed position, or a webcam, security camera or any other connected camera. It captures images at an interval set by the user.
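
A rough illustration of that capture loop, using OpenCV: the camera index, interval and file naming are assumptions for the sketch, not details from the project.

```python
import time
import cv2  # OpenCV, a common choice for reading camera frames

CAPTURE_INTERVAL_SECONDS = 60  # assumed user-set interval

def capture_frames(camera_index: int = 0) -> None:
    """Grab one frame per interval from a fixed camera."""
    camera = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = camera.read()
            if ok:
                # Timestamped filename so frames can be reviewed in order.
                cv2.imwrite(f"frame_{int(time.time())}.jpg", frame)
            time.sleep(CAPTURE_INTERVAL_SECONDS)
    finally:
        camera.release()
```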

Users first select a region of interest in the camera's view by circling it with a finger on a touchscreen -- that's intended to limit the surveillance and protect the privacy of people who might walk into part of the frame.

Next, a question is input in the Zensors app, and the job of monitoring the images is farmed out to the Internet. Redundant images in which nothing has changed are automatically ignored.
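
One plausible way to implement the region-of-interest crop and the skipping of unchanged frames is simple frame differencing. The article doesn't say how Zensors detects redundancy, so the threshold below is purely illustrative:

```python
import cv2
import numpy as np

CHANGE_THRESHOLD = 8.0  # assumed; mean absolute pixel difference

def crop_roi(frame: np.ndarray, x: int, y: int, w: int, h: int) -> np.ndarray:
    """Keep only the user-selected region, discarding the rest of the frame."""
    return frame[y:y + h, x:x + w]

def has_changed(previous: np.ndarray, current: np.ndarray) -> bool:
    """Treat a frame as redundant if it barely differs from the last one."""
    prev_gray = cv2.cvtColor(previous, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(current, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(prev_gray, curr_gray)
    return float(diff.mean()) > CHANGE_THRESHOLD
```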

The people who do the initial monitoring could be staff at a call center or an outsourcing service such as Amazon's Mechanical Turk, which was used in the CMU study. When the monitors decide the question has an affirmative answer, a graph in the app updates shortly afterward, and the app can also issue alerts to users.
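
The article doesn't describe how answers from multiple workers are combined, but a standard crowdsourcing technique is majority voting over redundant yes/no labels, sketched here:

```python
from collections import Counter

def aggregate_answers(answers: list[str]) -> str:
    """Collapse several workers' yes/no answers into one crowd label."""
    votes = Counter(a.strip().lower() for a in answers)
    return votes.most_common(1)[0][0]

# Example: three workers answer the same image; the majority wins.
label = aggregate_answers(["yes", "yes", "no"])  # -> "yes"
```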

Zensors gets interesting, however, when the process becomes automatic. After a certain period of human monitoring, machine learning algorithms in the software can learn when a certain condition has been met. For instance, they could learn to recognize that a pet's food bowl is empty.
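
As an illustration of that hand-off, a binary classifier could be fit to the crowd-labeled frames. The raw-pixel features and scikit-learn model below are stand-ins for the sketch, not the researchers' actual pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def to_features(image: np.ndarray) -> np.ndarray:
    """Naive featurization: flatten a small grayscale thumbnail."""
    # A real system would use stronger features; this keeps the sketch short.
    return image.astype(np.float32).ravel() / 255.0

def train_from_crowd(images: list[np.ndarray],
                     labels: list[int]) -> LogisticRegression:
    """Fit a yes/no classifier on frames the crowd has already answered."""
    X = np.stack([to_features(img) for img in images])
    y = np.array(labels)  # 1 = condition met (e.g., bowl empty), 0 = not
    return LogisticRegression(max_iter=1000).fit(X, y)
```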

To ensure the algorithms' accuracy, the system is periodically checked by workers, who can take a more hands-on role if the monitored area changes unexpectedly.
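
A plausible mechanism for those checks is to route low-confidence predictions, plus a small random sample of frames, back to the crowd; both thresholds below are assumed for illustration:

```python
import random

CONFIDENCE_THRESHOLD = 0.85  # assumed; below this, ask a human
SPOT_CHECK_RATE = 0.05       # assumed; fraction re-verified regardless

def needs_human_review(yes_probability: float) -> bool:
    """Send ambiguous frames, plus a random sample, back to the crowd."""
    confident = max(yes_probability, 1.0 - yes_probability) >= CONFIDENCE_THRESHOLD
    return (not confident) or (random.random() < SPOT_CHECK_RATE)
```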

Computer vision tools can also be added to the data processing, allowing the system to perform tasks such as counting cars or people in a certain area.
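
For counting tasks, an off-the-shelf detector could supply that computer-vision step; OpenCV's built-in HOG pedestrian detector is one readily available example (the article doesn't name the tools actually used):

```python
import cv2
import numpy as np

def count_people(frame: np.ndarray) -> int:
    """Count pedestrians in a frame with OpenCV's default HOG detector."""
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    boxes, _weights = hog.detectMultiScale(frame, winStride=(8, 8))
    return len(boxes)
```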

In a demonstration, a smartphone running Zensors was placed face-up on a table, and a question was keyed in: "Is there a hand?" When a hand was held over the phone's camera, the app's graph changed, showing that Mechanical Turk workers had answered from afar. The researchers blamed network latency for the roughly 30 seconds the answer took.

With better responsiveness, Zensors could be used in a variety of business and home applications. A restaurant manager could use it to learn when customers' glasses need to be refilled, and security companies could use it for automatic monitoring.

"We are the first ones, as far as I know, to fuse the crowd with machine learning training and actually doing it," said Gierad Laput, a PhD student at Carnegie Mellon's Human-Computer Interaction Institute, who also showed off new smartphone interfaces at CHI.

The cost of human monitoring is 2 cents per image, according to the researchers, and it takes about US$15 worth of human-vetted data to train the algorithms to the point where they can take over.
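
At 2 cents per image, that US$15 budget works out to roughly 750 human-labeled frames before the algorithms have enough data to take over.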

By contrast, having a programmer write computer-vision software for a sensor that answers a basic yes or no question could take over a month and cost thousands of dollars.

"Natural-language processing, machine learning and computer vision are three of the hardest problems in computer science," said Chris Harrison, an assistant professor of human-computer interaction at CMU. "The crowd lets us basically bypass a lot of that. But we just let the crowd do the bootstrapping work and we still get the benefits of machine learning."

The researchers plan to keep improving the Zensors app, now in beta, and then release it to the public.

Tim Hornyak covers Japan and emerging technologies for The IDG News Service. Follow Tim on Twitter at @robotopia.