A picture is worth a thousand words, and it also contains millions of pixels. The human brain processes visual content with ease, but this has traditionally been incredibly hard for computers to do. With the immense amount of visual information captured by cameras in the modern age, it is becoming increasingly infeasible for humans alone to process it all.
Thankfully, this all changed in 2012, when deep learning revolutionised the way computers understand visual content. Now we can accurately do all of the following at scale:
- Detect and classify people or objects in images or videos
- Accurately track movements of people or objects
- Read text in the wild
More importantly, we can now build highly accurate content detection models with relatively little training data, thanks to advancements in transfer learning. This makes it much easier to apply computer vision to highly custom applications, whether in transport, retail, sport, or agriculture.