Deep Learning, or When is a Dog a Bear?
Deep learning is a disruptive technology that will vastly improve our video products
Occasionally a technology appears on the scene that radically alters the evolution of other technologies for years to come. We saw this with the introduction of the Internet and web browsers, and with multicore processors and graphics processing units (GPUs). Deep learning is exactly that kind of revolutionary technology.
Stemming from advances in neural networks, deep learning is a branch of artificial intelligence that mathematically recognizes patterns. In the case of LenelS2’s VRx and Magic Monitor products, the data of interest is raw video streaming from cameras, but the same techniques can be used for data of almost any type.
In version 5.2 of VRx, we focus on identifying objects in video, but the same technology could identify people from photo ID images or a sound from an audio recording. Because the identification is statistical, it can readily extend beyond concrete ideas like “this is a person” to less concrete ones like “this is an aggressive person.”
The knowledge that deep learning algorithms depend on actually starts in human brains. In object identification in video, for example, the algorithms are “trained” by mathematically examining a large number of sample images that contain the object of interest and in which a human trainer has already identified that object.
To build Magic Monitor’s object recognition in version 7.1, we used thousands of pictures containing people, animals, vehicles and other random objects to train the detection algorithm. This is the real learning process, and it involves days of constant computation by high-capacity computers. Once training is complete, the resulting trained model can be used by object detection algorithms based on neural network technology to identify objects very quickly.
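To make the idea of training concrete, here is a minimal sketch in Python. It is not the code used in VRx or Magic Monitor. It assumes a single artificial "neuron" (logistic regression) in place of a deep network, and the two features (size and furriness, scaled from 0 to 1), the labels, and the function names are all hypothetical. The point is only to show human-labeled examples being turned into numbers that a program can later apply quickly.

```python
import math

def sigmoid(z):
    """Squash any real number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, epochs=2000, lr=0.5):
    """Fit a one-neuron classifier by repeatedly nudging its weights
    toward the answers a human trainer supplied."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # how far the guess is from the human label
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def predict(model, x):
    """Return the estimated likelihood that x is a bear (label 1)."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical human-labeled examples: (size, furriness); 1 = bear, 0 = dog.
samples = [(0.9, 0.9), (0.8, 0.7), (0.95, 0.8), (0.2, 0.3), (0.3, 0.6), (0.4, 0.2)]
labels = [1, 1, 1, 0, 0, 0]

model = train(samples, labels)      # the slow part, done once
print(predict(model, (0.85, 0.8)))  # the fast part, done per object
```

A real deep learning system has millions of such weights and learns its own features from raw pixels, which is why training takes days rather than milliseconds, but the division of labor is the same: slow training once, fast identification afterward.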
To be clear, though, object identification is not absolute. VRx can, among other things, identify cars, trucks, dogs and bears. It’s not always completely clear whether a car is a car or a small truck. The object detection algorithms produce a measure of the likelihood that a classification is correct, not an absolute statement of what the object is. For example, one of our software developers owns an Alaskan malamute, which is a very large, very furry dog. Our object detection algorithms occasionally misclassify Niko (the malamute) as a bear even though no human would ever make that mistake.
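The "measure of likelihood" described above can be sketched as follows. This example assumes the common softmax technique for converting a network's raw scores into probabilities; the scores, the class names, the 0.6 threshold, and the function names are hypothetical and are not taken from our products.

```python
import math

def softmax(scores):
    """Convert raw per-class scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

def classify(scores, threshold=0.6):
    """Return the most likely label and its probability, or "uncertain"
    when no class is likely enough to report."""
    probs = softmax(scores)
    label = max(probs, key=probs.get)
    confidence = probs[label]
    if confidence < threshold:
        return "uncertain", confidence
    return label, confidence

# Hypothetical raw scores for a frame containing a very large, very furry dog.
niko_scores = {"dog": 2.1, "bear": 1.9, "car": -3.0, "truck": -3.2}
print(classify(niko_scores))
```

With scores this close, "dog" comes out only slightly ahead of "bear" and falls below the threshold, so the sketch reports "uncertain" rather than committing to either answer. Where to set such a threshold is a design choice that trades missed detections against false alarms.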
We could address the misidentification problem by adding more images of bears and malamutes to the training process. The point is clear, though: our object identifications are statistical estimates, not absolutes. And the principle is clear as well: if we can tell a car from a truck or a bus from a van, with appropriate training we should be able to tell a “happy” person from an “angry” one.
You’re already seeing deep learning in action in autonomous driving and facial recognition, and you’ll continue to see its uses expand at a rapid rate. At LenelS2, our latest video products are capable of object identification, and we’re also evaluating scene and behavior identification. Over time, features derived from deep learning will show up in our access control products as well.
Make no mistake: deep learning is going to be at the heart of product innovation in many areas for years to come.