Computer Vision at the Edge: Where It's Headed in 2019


2018 brought great advances to computer vision, but roadblocks come with the increased complexity and processing requirements. So, what's ahead for this technology in 2019? The answer is at the edge.

2018 has seen great advances in computer vision capabilities. The accuracy of object detection and facial recognition continues to improve, and the number of readily available options based on state-of-the-art deep learning technologies, including convolutional and recurrent neural networks, continues to grow. These improvements come at a cost: an increase in the complexity and processing requirements of the underlying models.

YOLOv3, for example, a popular object detection model, rests on a 106-layer fully convolutional architecture, more than double the depth of the previous version. Other models, such as RetinaNet and SSD variants, are also showing huge strides in accuracy, but again at the cost of increased complexity and reduced inference speed.
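
To make that cost concrete, here is a minimal sketch that loads YOLOv3 with OpenCV's DNN module and times a single forward pass on the CPU. It assumes locally downloaded yolov3.cfg and yolov3.weights files and an OpenCV 4.x build; the file names are illustrative.

```python
# Minimal sketch: timing one YOLOv3 forward pass with OpenCV's DNN module.
# Assumes yolov3.cfg and yolov3.weights are present locally (illustrative names).
import time

import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")

# A dummy 416x416 frame stands in for a real video frame.
frame = np.zeros((416, 416, 3), dtype=np.uint8)
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)

start = time.time()
outputs = net.forward(net.getUnconnectedOutLayersNames())
print(f"Single-frame inference took {time.time() - start:.3f} s on CPU")
```

On a general-purpose CPU, a pass through all 106 layers typically takes long enough that real-time processing of even a single stream is a stretch.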

Keeping Up with New Demands

While the complexity and computational requirements of advanced computer vision technology keep increasing, so does the demand to apply it to a growing number of high-resolution live video streams. The number of video surveillance cameras is increasing at a dramatic rate, along with the expectation that they provide proactive intelligence. A passive video system is no longer enough. Cameras, quite simply, need to be a lot smarter.

The reality of rolling out advanced machine learning technologies requires a new way of thinking about implementations. Streaming full-resolution video to the cloud for processing is prohibitively expensive, consumes too much bandwidth, and introduces high latency. Putting large numbers of high-powered servers on-site has its own set of issues: it demands precious space and power, and it can be cost-prohibitive when rolled out across large numbers of cameras.

Nor does it address the realities of multi-location environments, which become increasingly important for making use of the data. Processing live video from one or two cameras is one thing; processing video from hundreds of cameras in real time, across one or more locations and often with limited resources, requires us to think entirely differently.
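
A rough back-of-the-envelope estimate shows why shipping raw video to the cloud does not scale. The 4 Mbps per-camera figure below is an assumed bitrate for a typical 1080p H.264 stream, not a measured value; real bitrates vary with codec and scene content.

```python
# Back-of-the-envelope uplink estimate for streaming camera feeds to the cloud.
CAMERAS = 200
MBPS_PER_CAMERA = 4  # assumed bitrate for a typical 1080p H.264 stream

total_mbps = CAMERAS * MBPS_PER_CAMERA
print(f"{CAMERAS} cameras -> ~{total_mbps} Mbps (~{total_mbps / 1000:.1f} Gbps) sustained uplink")
```

Under those assumptions, a 200-camera site needs close to a gigabit of sustained uplink before any processing has even happened.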

The Solution: Video at the Edge

The answer lies at the edge. Putting the intelligence at the edge allows the workload to be distributed across many devices. This can mean either embedding stronger processing capabilities into the camera itself or adding highly efficient edge appliances that sit between cameras and the cloud.
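
As a concrete illustration of that pattern, here is a minimal sketch of an edge node that runs detection next to the camera and ships only lightweight event metadata upstream, never raw video. The detect() helper and the cloud endpoint URL are hypothetical placeholders, not a real API.

```python
# Sketch of the edge pattern: detect locally, forward only small event records.
import json
import time
import urllib.request

import cv2

def detect(frame):
    """Hypothetical stand-in for an on-device detector; returns label list."""
    return []  # e.g., ["person", "vehicle"]

cap = cv2.VideoCapture(0)  # local camera; index is illustrative
while True:
    ok, frame = cap.read()
    if not ok:
        break
    labels = detect(frame)
    if labels:  # a few bytes of metadata instead of megabits of video
        event = json.dumps({"ts": time.time(), "labels": labels}).encode()
        req = urllib.request.Request(
            "http://cloud.example.com/events",  # hypothetical endpoint
            data=event,
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)
```

The design choice is the point: bandwidth and latency scale with the number of interesting events rather than with the number of pixels.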

To enable this, edge processing companies are beginning to release fast, power-efficient, specialized AI processors. Nvidia has launched several modules in its Jetson series for performing real-time inference in embedded devices, and Intel, through its acquisition of Movidius, offers its Myriad series of processors and the Neural Compute Stick.
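
As a rough sketch of what targeting such hardware can look like: with an OpenCV build that includes OpenVINO's Inference Engine and an attached Myriad-based accelerator such as the Neural Compute Stick, the same network can be redirected with two extra calls. This assumes that specific toolchain; it is not the only route onto these chips.

```python
# Minimal sketch: retargeting an OpenCV DNN model onto an Intel Myriad device
# (e.g., the Neural Compute Stick). Assumes an OpenCV build with OpenVINO
# Inference Engine support and the accelerator plugged in.
import cv2

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)
# Subsequent net.forward() calls run on the attached edge accelerator.
```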

The last few years have also seen a huge amount of investor funding flow to a new generation of chip companies offering low-cost, high-performance deep learning processors. Companies such as Mythic and Graphcore have received hundreds of millions of dollars in venture funding, and recently even Google and Amazon announced their own edge processing chips. That two pure-play cloud companies are building such silicon is a remarkable acknowledgment of the importance of processing machine learning at the edge.

What’s to Come

Edge-based processing will enable an entirely new kind of real-time intelligence. Today's passive video recorders will soon watch over kids at the swimming pool, detect weapons near a school, or open doors for employees without a key.

They will look for defects on manufacturing lines, spot workers who aren't wearing safety equipment, and learn how people move through a retail environment to optimize flow and reduce wait times. Cameras will finally provide real-time, actionable data, bringing huge improvements in security, manufacturing reliability, and in-store shopper satisfaction and safety.