What is Computer Vision?
Computer Vision, also known as CV, is a field of Computer Science whose main goal is to give computers a high-level understanding of digital images and videos. In short, it seeks to develop techniques and tools that help computers “SEE” and understand the content of digital images and videos. Computer Vision aims to replicate the complex human vision system inside computers so that they can perform intelligent visual tasks as humans do.
Photos and videos are becoming an integral part of our lives. There are over 4.4 billion internet users, and the number grows every day. In 2019, 4,333,560 videos were watched on YouTube every minute, and around 300 hours of video were uploaded to YouTube every minute. Skype users make around 176k calls. Beyond these, many other sources on the internet continuously receive graphical data, i.e., images and videos. YouTube may be the second-largest search engine after Google, with hours of video uploaded every minute. Text is easy to index and search, but to index and search images, algorithms need to go beyond the raw pixels and understand what an image shows. So, to get the most out of images and videos and provide the best services based on this data, computers need to understand images and actually “see” inside them.
How does a Computer see an image?
Vision is effortless for humans: if you see an object, you can easily recognize it, and if you see a danger sign, you can sense the danger and plan accordingly. The same task, however, is quite challenging for computers (Ballard et al., 1983). To a computer, an image is just an array of numbers, somewhat like the one shown in the figure below.
Each number represents the intensity of the pixel at that point, a value between 0 and 255, where 0 represents black and 255 represents white.
For a grayscale (black and white) image, this is a two-dimensional array of height × width values. Color images, in contrast, are three-dimensional arrays (height × width × 3): the third dimension holds three channels, where the 1st channel represents the intensity of red, the 2nd of green, and the 3rd of blue.
Together, these three colors can be combined to represent a wide range of colors. For example, the color dark purple has an RGB value of (48, 25, 52), which corresponds to 18.82% red, 9.8% green, and 20.39% blue (each channel value divided by the maximum, 255).
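To make this concrete, here is a minimal NumPy sketch (the tiny 3×4 image is made up for illustration) showing the shape of a grayscale array, the shape of a color array, and how the dark purple percentages follow from dividing each channel value by 255:

```python
import numpy as np

# A tiny 3x4 grayscale image: one 8-bit intensity per pixel (0 = black, 255 = white)
gray = np.array([
    [0, 64, 128, 255],
    [32, 96, 160, 224],
    [255, 200, 100, 0],
], dtype=np.uint8)
print(gray.shape)  # (3, 4): height x width

# A color image of the same size: height x width x 3 channels (R, G, B)
color = np.zeros((3, 4, 3), dtype=np.uint8)
color[0, 0] = (48, 25, 52)  # the "dark purple" pixel
print(color.shape)  # (3, 4, 3)

# Each channel divided by 255 gives its share of the maximum intensity
percentages = np.round(color[0, 0] / 255 * 100, 2)
print(percentages.tolist())  # [18.82, 9.8, 20.39]
```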
This image from Stanford's CS231n course gives a better picture of how a computer sees an image.
Numbers by themselves are lifeless; they carry no meaning.
“Just like to hear is not the same as to listen, to take pictures is not the same as to see.” ~ Fei-Fei Li (Director of the AI Lab at Stanford).
This is where computer vision comes into play. With recent technologies such as Machine Learning, Deep Learning, parallel computing, GPUs, TPUs, and modern algorithms, Computer Vision is able to bridge the gap between computers seeing an image and computers comprehending what they see.
Before these advances, Computer Vision relied mostly on traditional algorithms and techniques, which made it a difficult field. Now, thanks to advances in Machine Learning and Deep Learning algorithms, and hardware capable of executing those algorithms efficiently, the field is rapidly gaining popularity.
How does Computer Vision work?
Let’s take an example: you meet a new person, X, and you get to know each other. Your brain activates neurons and stores vital information about X, such as his name, how he looks, and his voice. The next time you see him, you will recognize him by his looks, his name, or his voice. How does this happen? Your brain has stored this information, and when you see him again, the neurons in your brain activate, match the face you are seeing against your memory, and retrieve the person’s name. This concept is known as feature matching.
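In classical computer vision, feature matching usually means comparing local descriptors (numeric vectors that describe small image patches) taken from two images. The minimal NumPy sketch below assumes the descriptors have already been extracted; the vectors are made up, and the 0.75 threshold for Lowe's ratio test is a common choice rather than a fixed rule. Real systems would typically use a detector such as SIFT or ORB to produce the descriptors.

```python
import numpy as np

def match_features(desc_a, desc_b, ratio=0.75):
    """Match each descriptor in desc_a to its nearest neighbor in desc_b.

    A match is kept only if the nearest neighbor is clearly closer than the
    second-nearest one (Lowe's ratio test), which filters out ambiguous matches.
    desc_b must contain at least two descriptors.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)  # Euclidean distance to every stored descriptor
        nearest, second = np.argsort(dists)[:2]
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, int(nearest)))
    return matches

# Hypothetical descriptors "remembered" from the first meeting...
stored = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
# ...and descriptors extracted when we see the person again (slightly noisy)
seen = np.array([[0.9, 0.1, 0.0], [0.1, 0.0, 0.95], [0.5, 0.5, 0.0]])
print(match_features(seen, stored))  # [(0, 0), (1, 2)]
```

The third descriptor is rejected because it is equally close to two stored descriptors, which is exactly the kind of ambiguous match that leads to false recognitions.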
A deep learning architecture works in a broadly similar way. When the model is given an input image, layers of artificial neurons are applied to it to extract features, and during training the model learns which features matter for the task. As you have already seen, an image is nothing but a matrix of pixels, so the model learns patterns from those pixels, and you can then use it to perform further operations.
Here you can see how a deep learning model extracts features from a single image and then uses them to classify it into one of several classes.
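The first layers of a convolutional network extract features by sliding small filters over the pixel matrix. Below is a minimal NumPy sketch of that single step, assuming a hand-written vertical-edge filter; in a trained CNN the filter weights are learned rather than chosen by hand, and many filters are stacked across layers.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, which is what CNN layers call convolution."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            # Weighted sum of the pixels currently under the filter
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# 5x5 image: dark on the left, bright on the right (a vertical edge)
image = np.array([[0, 0, 255, 255, 255]] * 5, dtype=float)

# Hand-written vertical-edge filter; a CNN would learn such weights during training
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

features = np.maximum(conv2d(image, kernel), 0)  # ReLU keeps only positive responses
print(features)
```

The resulting feature map responds strongly where the dark-to-bright edge lies and is zero over the uniform region, which is the kind of low-level pattern later layers combine into higher-level features.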
The following image illustrates how a computer vision system works.
Source: Manning Free Content
There are different kinds of computer vision tasks, such as detection, tracking, and segmentation. Most of them are tackled using Deep Learning and Machine Learning techniques, sometimes along with traditional algorithms.
The figure below shows examples of these tasks.
Image Source: CS231n
Real-life Computer Vision applications
Computer Vision has become an important part of society, with applications in almost every industry and field of life, including medicine, drones, automobiles, retail, call centers, and many others.
Some well-known applications that make wide use of Computer Vision are:
- Autonomous Vehicles
- Detection of different diseases such as cancer
- Optical Character Recognition
- Attendance system using Facial Recognition
And many more.
This section goes over various computer vision applications across more than 15 different industry use cases.
Computer Vision applications in Autonomous Vehicles
Creating an autonomous vehicle is not an easy task. It requires deep learning or machine learning models to perceive the vehicle's surroundings and make many decisions in real time. Some of the core parts of self-driving cars where Computer Vision plays an important role are:
Lane tracking is a vital component of an autonomous vehicle: it lets the vehicle decide which lane to stay in instead of drifting randomly across the road.
Lane tracking can be achieved using deep learning techniques or traditional computer vision algorithms. Lane detection helps a car stay on its track, which significantly reduces the risk of road accidents and enables a smooth driving experience for an autonomous vehicle.
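One traditional approach (not necessarily what any particular vehicle uses) is to run an edge detector and then a Hough transform, which finds straight lines such as lane markings. This minimal NumPy sketch implements only the Hough voting step on a synthetic edge map; in practice, libraries such as OpenCV provide `cv2.Canny` and `cv2.HoughLinesP` for the full pipeline.

```python
import numpy as np

def hough_lines(edges, n_theta=180, top_k=1):
    """Return the top_k (rho, theta_degrees) lines voted for by edge pixels.

    Each edge pixel (x, y) votes for every line rho = x*cos(theta) + y*sin(theta)
    that passes through it; peaks in the accumulator are straight lines.
    Uses a 1-degree angular resolution and 1-pixel rho resolution.
    """
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))              # largest possible |rho|
    thetas = np.deg2rad(np.arange(n_theta))          # 0, 1, ..., n_theta - 1 degrees
    accumulator = np.zeros((2 * diag + 1, n_theta), dtype=int)
    cols = np.arange(n_theta)
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int) + diag
        accumulator[rhos, cols] += 1                 # one vote per angle
    best = np.argsort(accumulator.ravel())[::-1][:top_k]
    rho_idx, theta_idx = np.unravel_index(best, accumulator.shape)
    return [(int(r) - diag, int(t)) for r, t in zip(rho_idx, theta_idx)]

# Synthetic 100x40 edge map with one vertical lane marking at x = 5
edges = np.zeros((100, 40), dtype=bool)
edges[:, 5] = True
print(hough_lines(edges))  # [(5, 0)]: rho = 5, theta = 0 degrees (a vertical line)
```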
Vehicle detection is another important part of an autonomous vehicle and can be achieved using Computer Vision or sonar. Many object detection algorithms use deep learning or machine learning to detect specific objects such as vehicles; some of the most famous include YOLO, R-CNN, and SSD.
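Detectors such as YOLO, SSD, and the R-CNN family typically output several overlapping candidate boxes for the same object, which are then filtered with non-maximum suppression (NMS) based on intersection over union (IoU). A minimal sketch of that post-processing step, with made-up boxes and scores and an assumed IoU threshold of 0.5:

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes that overlap it too much, repeat."""
    order = list(np.argsort(scores)[::-1])  # indices sorted by descending score
    keep = []
    while order:
        best = order.pop(0)
        keep.append(int(best))
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Hypothetical raw detections: two overlapping boxes on one car, one box on another car
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```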
Traffic Signs Detection
Detecting traffic signs is an important task that is achieved using Computer Vision and Deep Learning. Imagine an autonomous vehicle not stopping at a red light, or speeding in a school zone. It is therefore vital to detect these signs and act accordingly.
Source: Tsinghua University
Imagine a self-driving car or bike hitting a pedestrian. This would raise serious questions about the road safety of autonomous vehicles. To protect pedestrians and make wise moves, autonomous vehicles first need to detect them. This is where computer vision comes into play: it detects pedestrians and helps autonomous vehicles make wise decisions, thus increasing road safety.
Source: Carnegie Mellon University
Path planning is the brain of a self-driving car: it tells the car where it can drive and safely plans the path ahead of it. This can be done using different algorithms, such as the PathNet algorithm.
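PathNet itself is not shown here. As a simpler stand-in, the sketch below uses A* search, a classic path-planning algorithm, on a hypothetical occupancy grid in which 1 marks cells the car cannot drive through. It assumes 4-connected moves of equal cost; real planners work in continuous space and account for vehicle dynamics.

```python
import heapq

def astar(grid, start, goal):
    """Shortest 4-connected path on an occupancy grid (0 = free, 1 = blocked).

    Returns the list of (row, col) cells from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):  # Manhattan distance: never overestimates for 4-connected moves
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries: (estimated total cost, cost so far, cell)
    came_from = {start: None}
    cost = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:  # walk back through parents to rebuild the path
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > cost[cell]:
            continue  # stale heap entry; a cheaper route was already found
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = g + 1
                if new_cost < cost.get(nxt, float("inf")):
                    cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None

# Hypothetical drivable-area grid: 1 marks an obstacle (e.g., a parked car)
grid = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
path = astar(grid, (2, 0), (0, 3))
print(len(path) - 1)  # 5 moves
```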
Some famous companies and products using Computer Vision for autonomous vehicles are:
- Tesla, famous for self-driving cars
- Baidu, famous for self-driving cars and taxis
- Yamaha, for self-driving bikes
- MIT’s self-driving tricycle
Computer Vision in the Sports Industry
A few years ago, computer vision was not widely used in the sports industry, but today, thanks to advances in algorithms and computational efficiency, it is making its way into the industry.
Imagine an analyst spending hours manually replaying footage and logging events. Computer vision and deep learning offer techniques that automate this work: they can locate and segment each player of interest, follow them over the duration of the video, and turn the footage into data for valuable analysis.
There are many applications of computer vision in the sports industry, some of which are discussed below.
Player tracking involves detecting each player's position at any given moment in time. It is an important technique that allows coaches to analyze and track players' movements and the way they move.
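A common way to build player tracking on top of a per-frame detector is tracking-by-detection: detect players in every frame, then associate each new detection with an existing track. The sketch below uses greedy nearest-neighbor matching with an assumed 50-pixel gating distance and made-up positions; production trackers usually solve the assignment optimally (for example with `scipy.optimize.linear_sum_assignment`) and add motion models such as Kalman filters.

```python
import math

def update_tracks(tracks, detections, max_distance=50.0):
    """Greedy nearest-neighbor association of new detections to existing tracks.

    tracks: dict mapping track_id -> (x, y) last known position of a player.
    detections: list of (x, y) player positions found in the new frame.
    Returns a new dict. Unmatched detections start new tracks; unmatched tracks
    keep their last known position (a real tracker would eventually expire them).
    """
    updated = {}
    unused = list(range(len(detections)))
    next_id = max(tracks, default=-1) + 1
    for track_id, pos in tracks.items():
        best = None
        if unused:
            j = min(unused, key=lambda k: math.dist(pos, detections[k]))
            if math.dist(pos, detections[j]) <= max_distance:
                best = j
        if best is None:
            updated[track_id] = pos  # player not found nearby in this frame
        else:
            updated[track_id] = detections[best]
            unused.remove(best)
    for j in unused:
        updated[next_id] = detections[j]  # a player entering the view
        next_id += 1
    return updated

tracks = {0: (100, 200), 1: (400, 220)}      # positions in frame t
frame = [(405, 225), (110, 198), (700, 50)]  # hypothetical detections in frame t+1
tracks = update_tracks(tracks, frame)
print(tracks)  # {0: (110, 198), 1: (405, 225), 2: (700, 50)}
```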
Pose estimation is a well-known technique in which a deep learning model learns to track the pose of a body in real time. Developers have built several applications on top of it, such as commentary generation, which takes in a pose and generates commentary on it in real time.
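Pose estimation models output body keypoints: (x, y) coordinates for joints such as the hip, knee, and ankle. Downstream sports applications often derive quantities from them, such as joint angles. A minimal sketch, assuming the keypoints are already available (the coordinates below are made up):

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b formed by keypoints a-b-c (e.g., hip-knee-ankle)."""
    v1 = (a[0] - b[0], a[1] - b[1])  # vector from the joint to the first keypoint
    v2 = (c[0] - b[0], c[1] - b[1])  # vector from the joint to the second keypoint
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cos_angle = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to [-1, 1] to guard against tiny floating-point overshoots
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

hip, knee, ankle = (0, 0), (0, 10), (10, 10)
print(round(joint_angle(hip, knee, ankle)))  # 90: a bent knee
```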
Just think how many times, while watching a sport, you have disagreed with the referee’s decision. AI referees can help resolve this issue by analyzing the match with computer vision and deep learning and making fairly accurate calls.
Automatic scene prediction and pose estimation technologies use Deep Learning and Computer Vision to predict and analyze a match.
Some famous startups and products:
- Smart Gym
- Otari, an interactive workout mat
- IBM and Wimbledon highlights and replay generator for Tennis
- IN/OUT Tennis Referee
- PitchBrain, a football match analytics generator
Computer Vision in Agriculture
Drone Based Crop Analysis
Drone-based crop analysis is a technique in which farmers use camera-equipped drones to analyze their crops. This can help them detect diseases, low-water areas, crops that need more attention, and much more. Drones can cover long distances in little time, and computer vision can then perform many different crop-related tasks on the images they capture.
Security agencies can also use this technique to detect illegal crops.
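Drone crop analysis often relies on vegetation indices rather than plain RGB images. One widely used index is NDVI, defined as (NIR - Red) / (NIR + Red), which is high for healthy vegetation because it reflects strongly in near-infrared. The sketch assumes a multispectral camera that captures both bands; the pixel values and the 0.3 "needs attention" threshold are hypothetical.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index per pixel, in the range [-1, 1]."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    safe_total = np.where(total == 0, 1.0, total)  # avoid division by zero
    return np.where(total == 0, 0.0, (nir - red) / safe_total)

# Hypothetical 2x2 reflectance values from a multispectral drone camera
nir = np.array([[200, 180], [60, 40]])
red = np.array([[50, 60], [50, 80]])
index = ndvi(nir, red)
stressed = index < 0.3  # hypothetical threshold for "needs attention"
print(stressed.tolist())  # [[False, False], [True, True]]
```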
Automate the counting, weighing, and tracking of the animals
Computer Vision enables us to automate the counting, tracking, and weighing of animals, which helps ensure their traceability, health, growth, and safety while saving a lot of manual work.
Ensure Health Safety
Computer Vision and Machine Learning can help detect many different crop diseases, enabling farmers to take action on time, save a lot of work and money, and meet health and safety standards.
Source: Bitrefine Group
Automated Spraying System
An automated spraying system uses computer vision to detect pests and diseases and automatically release pesticide spray, reducing manual work and increasing crop yield.
- XSUN: Provides aerial survey imaging
- SenseFly: Provides drone based multiple solutions for agriculture
- Cromai: Provides diagnostic data on farms and crops
Computer Vision in Security
Computer Vision is now playing a vital role in security and safety agencies. Some of its famous applications are:
Facial recognition and authentication is an important security application in which computer vision detects a person's face and matches it against a database, for example, of persons under warrant.
Fake News Detection
Fake news is a major cause of unrest in society. It can cause chaos and sometimes even lead to violence. Deepfakes are a very common form of fake news. Computer Vision and Deep Learning can help detect these deepfakes so that false news can be removed.
COVID-19 standard operating procedures (SOPs), which are crucial for public health, are another example. Computer Vision can help monitor social distancing, mask usage, and much more.
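Once people have been detected, a social distancing check reduces to measuring pairwise distances. This sketch assumes the detections have already been mapped from image pixels to ground-plane coordinates in meters (for example with a bird's-eye-view perspective transform); the 2-meter limit and the positions are assumed values.

```python
import itertools
import math

def distancing_violations(positions, min_distance=2.0):
    """Return index pairs of people standing closer than min_distance.

    positions: (x, y) ground-plane coordinates in meters, one per detected person.
    """
    return [
        (i, j)
        for (i, p), (j, q) in itertools.combinations(enumerate(positions), 2)
        if math.dist(p, q) < min_distance
    ]

people = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
print(distancing_violations(people))  # [(0, 1)]
```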
CCTV Cameras tracking unusual activities
CCTV cameras, combined with Deep Learning and Computer Vision, can help us detect unusual activities such as theft, robbery, harassment and other harmful activities such as fighting.
Famous Industry Startups:
- Landing AI Social Distancing Tool by Andrew Ng
- OARO verifying digital authenticity against Deep Fakes
- VAAK AI Theft Detection System
- IRIS Face Mask Detection
Computer Vision in Healthcare
Since the rise of Deep Learning and Machine Learning, the field of healthcare has seen many advances. Some of the applications include:
Breast Cancer Detection
In Pakistan, about 1 in 9 women is at risk of developing breast cancer, and about 1 in 39 women dies of it. Timely diagnosis can play a vital role in reducing this risk, and Machine Learning and Computer Vision play an important part in detecting breast cancer early.
Source: New York Times
Measuring Blood Loss accurately
Postpartum hemorrhage, i.e., excessive blood loss after childbirth, is one of the biggest causes of maternal mortality. Using Computer Vision, doctors can accurately measure how much blood has been lost during birth and treat women more appropriately.
More Precise Diagnosis
Modern deep learning algorithms, trained on large amounts of data, have helped reduce false positive errors. Diagnosis becomes more precise, which can reduce the number of unnecessary surgical procedures.
Interactive Medical Imaging
Computer Vision for medical imaging enables detailed, interactive 3D visualization. Deep learning and computer vision can then be used to visually analyze these interactive 3D models and support more accurate medical diagnoses.
Automatic generation of medical reports
The extensive availability of medical imaging data has enabled computer vision and deep learning to generate accurate and precise reports from medical images, for example, detecting lung disease from X-ray images. Feeding data from MRIs, X-rays, CT scans, and other sources to such algorithms can automatically generate reports and extract in-depth insights.
Famous Startups and Hospitals
- Winnie Palmer Hospital: Using AI to detect different diseases
- Oxipit: Automated Report Generation
- ADAS 3D: Medical Imaging
Computer Vision in Customer Care
Computer Vision can be used to create virtual assistants for call centers and customer support agents. Using device recognition and augmentation, it can enable gradual automation toward full customer self-service.
Object Recognition in a Technical Support Model
Computer Vision can help detect technical devices and their parts, including model details, and differentiate between devices of different models. Combined with deep learning and anomaly detection techniques, it can also detect faults and anomalies. For example, it could identify the product model from an image and pull up warranty information, troubleshooting steps, or repair guides.
Using a smartphone, the customer can point to the faulty device, and a virtual assistant can use computer vision and other tools to detect faults and interact with the user in real time to help solve their problem.
Computer Vision can use facial recognition to authenticate customers, improving both security and the customer experience.
Automatic Data Filling
Computer Vision can fill in a lot of data automatically based on visual features such as a customer's facial profile, the cars they own, their devices, bills, etc. It can also help predict issues before they happen, allowing a customer care team to prevent dissatisfaction.
- SMART VISION
Source: ReadWrite.com
Computer Vision in the Education Industry
Computer Vision is helping the education industry in many ways. Especially during the COVID-19 lockdowns, educational institutes have been looking for new, effective, and engaging ways to teach students. Some of the applications are:
Distance learning has significantly reduced students' engagement and attention. In a physical class, the teacher can easily notice disengaged students and do something to re-engage them, but in an online class it is very difficult for a teacher to monitor every student manually. Engagement detection can solve this problem: Computer Vision and Deep Learning can detect less engaged students and notify the teacher, who can then re-engage the whole class.
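Engagement detection systems differ, and no specific method is described here. As an illustration of one possible signal, the sketch below computes the eye aspect ratio (EAR) from six eye landmarks, a measure introduced by Soukupová and Čech (2016) for blink detection; a low EAR sustained over many frames suggests closed eyes. The landmark coordinates and the 0.2 threshold are hypothetical, and a facial landmark detector is assumed to supply the points.

```python
import math

EAR_THRESHOLD = 0.2  # hypothetical cut-off below which the eye is treated as closed

def eye_aspect_ratio(eye):
    """Eye aspect ratio from six eye landmarks p1..p6.

    p1 and p4 are the eye corners; p2, p3 (upper lid) pair with p6, p5 (lower lid).
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)

open_eye = [(0, 0), (2, 2), (4, 2), (6, 0), (4, -2), (2, -2)]
closed_eye = [(0, 0), (2, 0.3), (4, 0.3), (6, 0), (4, -0.3), (2, -0.3)]
for name, eye in (("open", open_eye), ("closed", closed_eye)):
    ear = eye_aspect_ratio(eye)
    print(name, round(ear, 3), "closed" if ear < EAR_THRESHOLD else "open")
```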
Improvements in cameras and detection algorithms have led to capable attendance systems, in which students' attendance is marked automatically using CCTV cameras or other cameras installed in schools. These systems can recognize each student, mark their attendance, and record their arrival and departure times.
Security in Educational Institutes
Security is a big concern in educational institutes. A computer vision model can be trained to detect any unlawful activity that is not permissible inside the school premises to avoid any mishap.
Fight Detection using Deep Learning
Cheating Detection in Exams
Computer Vision and Deep Learning can be used to train an AI model that detects whether students are looking at someone else’s paper or using cheating material.
Computer Vision Lab: Michigan State University
Some of the startups and companies using Computer Vision for Educational purposes are:
- ST Unitas (Princeton Review) enhanced tutoring with AI
- Emotuit: Uses Computer Vision to detect engagement
- Mettl: Cheating-free online exams
- Respondus Monitor: Cheating-free online exams
- Several High Schools and Universities are using Cameras for automated attendance in China.
Computer Vision in Fashion
Computer Vision is transforming the fashion industry by introducing new techniques that are helping businesses and users. We are going to look at some of the applications in which computer vision is transforming the fashion industry.
Think of a scenario where you see a handbag on Instagram and want to buy it, but cannot find where to get it. You can simply use that photo for an intelligent visual search powered by computer vision and machine learning.
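Visual search is commonly implemented by turning every product photo into an embedding vector (for example with a pretrained CNN) and returning the catalog items whose embeddings are most similar to the query photo. The sketch below assumes the embeddings already exist; the three-dimensional vectors are made up, whereas real embeddings typically have hundreds of dimensions.

```python
import numpy as np

def top_k_similar(query, catalog, k=2):
    """Indices of the k catalog embeddings most similar to the query (cosine similarity)."""
    q = query / np.linalg.norm(query)                               # unit-length query
    c = catalog / np.linalg.norm(catalog, axis=1, keepdims=True)    # unit-length rows
    sims = c @ q                                                    # cosine similarity per item
    return np.argsort(-sims)[:k].tolist()

# Hypothetical embeddings for product photos in a shop's catalog
catalog = np.array([
    [0.9, 0.1, 0.0],   # 0: brown leather handbag
    [0.0, 1.0, 0.1],   # 1: sneakers
    [0.8, 0.2, 0.1],   # 2: tan tote bag
])
query = np.array([1.0, 0.15, 0.05])  # hypothetical embedding of the Instagram photo
print(top_k_similar(query, catalog))  # [0, 2]
```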
New Fashion Generation
Tired of old fashion? AI can help you generate new fashion using deep learning algorithms such as Neural Style Transfer, which combines the content of one image with the style of another to generate a new image.
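In neural style transfer (Gatys et al.), the "style" of an image is captured by Gram matrices: correlations between the feature channels of a CNN layer. A minimal NumPy sketch, assuming random arrays in place of real CNN features and normalizing by the number of spatial positions (one of several conventions in use):

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel correlations of one CNN layer's activations.

    features: array of shape (channels, height, width).
    Returns a (channels, channels) matrix that captures textures and colors
    while discarding where in the image they occur.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (h * w)

def style_loss(generated, style):
    """Mean squared difference between Gram matrices of two feature maps."""
    return float(np.mean((gram_matrix(generated) - gram_matrix(style)) ** 2))

rng = np.random.default_rng(0)
feats = rng.random((3, 4, 4))  # stand-in for real CNN activations
perm = rng.permutation(16)
shuffled = feats.reshape(3, 16)[:, perm].reshape(3, 4, 4)  # same values, new spatial layout
print(gram_matrix(feats).shape)  # (3, 3)
print(np.allclose(gram_matrix(feats), gram_matrix(shuffled)))  # True
```

Shuffling the spatial positions leaves the Gram matrix unchanged, which is why it represents style (texture and color) rather than content (layout). The full method optimizes a generated image to match the Gram matrices of the style image and the raw features of the content image.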
Neural Style Transfer for Fashion generation
You can also use GANs to generate new fashion designs that did not exist before.
DCGAN on FashionGEN PyTorch
Recommender systems are an important part of machine learning. Together, Computer Vision and recommender systems can give good recommendations based on your previous pictures and fashion sense.
Famous companies and startups using Computer Vision in Fashion
- Alibaba FashionAI
- Alexa Fashion and clothing Recommendation
Computer Vision in the Retail Industry
Computer Vision is a growing field that is making an impact on the retail industry through its powerful, wide-ranging applications. Some of the important applications are:
Cashierless stores, in which computer vision and deep learning automatically detect the items a customer picks up and calculate the bill, are becoming very popular.
Stock visibility can be defined as awareness of what is actually happening in the store. A computer vision system can spot fraud, theft, and suspicious behavior by customers, which helps minimize theft losses and increase sales.
The camera system can also identify areas with low stock and alert the store to restock those items.
Many companies use different techniques to collect customer data. Using computer vision techniques such as facial recognition, object detection, and object tracking, deep learning models can learn to identify many patterns, for example, which products are in high demand and which are eye-catching. They can also help draw heat maps of stores, which retailers can use to redesign their store layout to attract customers.
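A store heat map can be built by binning the floor positions of tracked shoppers into a grid and counting observations per cell. This sketch assumes a tracker has already converted detections to floor coordinates in meters; the positions, store dimensions, and 1-meter cell size are hypothetical.

```python
import numpy as np

def store_heatmap(positions, store_width, store_depth, cell_size=1.0):
    """Count shopper observations per floor cell.

    positions: (x, y) floor coordinates in meters, one per observation.
    Returns a 2D array indexed [x_cell, y_cell].
    """
    xs, ys = zip(*positions)
    x_edges = np.arange(0, store_width + cell_size, cell_size)
    y_edges = np.arange(0, store_depth + cell_size, cell_size)
    heat, _, _ = np.histogram2d(xs, ys, bins=[x_edges, y_edges])
    return heat

positions = [(0.5, 0.5), (0.6, 0.4), (2.5, 1.5), (0.2, 0.9)]
heat = store_heatmap(positions, store_width=3, store_depth=2)
busiest = tuple(int(i) for i in np.unravel_index(heat.argmax(), heat.shape))
print(busiest)  # (0, 0): the cell nearest the entrance in this made-up data
```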
Startups and Companies using Computer Vision in Retail
- Amazon GO, cashierless stores
- Scandit, computer vision solutions for retail
- Eyedo, computer vision store management
Computer Vision in Manufacturing
Some of the commonly used applications of Computer Vision in the manufacturing industry are as follows:
Predictive maintenance is the process of using Machine Learning, Computer Vision, and IoT devices to monitor machinery and components, collecting data points to identify warning signals and take corrective action before assets or components break down.
3D Vision Inspection
Suppose a component passes through a manufacturing plant: scans of the component are taken from different angles to produce a 3D model. This allows the system to identify faulty components that could have disastrous effects further down the production line. It is mainly used in automobile manufacturing.
Imagine a pharmaceutical company that has to count the tablets and capsules before packaging them, or a fruit company that has to count the fruits in each package and make sure nothing else gets packed with them. Computer Vision provides package inspection that can automate this process.
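For package inspection, a simple classical approach is to threshold the image into a binary mask and count connected blobs. Below is a pure-Python sketch of 4-connected component counting with breadth-first search; the mask and the expected count of 4 are hypothetical, and libraries such as `scipy.ndimage.label` provide the same operation.

```python
from collections import deque

def count_objects(mask):
    """Count 4-connected blobs of 1s in a binary mask (e.g., thresholded tablets)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1  # found a new blob; flood-fill it so it is counted once
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count

# Hypothetical thresholded image of a blister pack: 1 = tablet pixel
mask = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
]
EXPECTED_TABLETS = 4  # hypothetical pack specification
found = count_objects(mask)
print(found, "tablets;", "OK" if found == EXPECTED_TABLETS else "REJECT pack")  # 3 tablets; REJECT pack
```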
Labeling, tracking, and tracing
Products can sometimes be mislabeled or misplaced, resulting in customer dissatisfaction and business loss. Computer Vision can help in labeling, tracking, and tracing products.
- ACQUIRE AUTOMATION
Computer Vision in the Food Industry
The food industry is one of the biggest industries, with an estimated worth of 6,000 billion dollars in 2021. Computer Vision is making a great impact on the food industry. Some of the important applications are as follows.
Computer Vision and Machine Learning can help automate food quality inspection, reducing the risk of consumers eating unhealthy food.
Food Product Safety
With many operations spanning production, processing, packaging, and distribution, Computer Vision can ensure the safety and quality of different products at multiple stages. This speeds up the process and minimizes manual work.
Automated Counting and Sorting
Automated counting and sorting can be achieved using computer vision and machine learning: products are counted automatically and sorted based on color, type, size, etc.
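Sorting by color can be as simple as comparing each item's mean color with a set of reference colors and then counting items per class. A minimal sketch in which the class names, reference RGB values, and pixel samples are all hypothetical; a production sorter would calibrate for lighting and would likely use a trained classifier.

```python
import math
from collections import Counter

# Hypothetical reference colors (mean RGB) for each grade of fruit
REFERENCE_COLORS = {
    "ripe": (200, 40, 40),
    "unripe": (80, 170, 60),
    "overripe": (90, 50, 30),
}

def classify_by_color(pixels):
    """Assign an item to the class whose reference color is nearest its mean RGB."""
    n = len(pixels)
    mean = tuple(sum(p[i] for p in pixels) / n for i in range(3))
    return min(REFERENCE_COLORS, key=lambda name: math.dist(mean, REFERENCE_COLORS[name]))

# Hypothetical pixel samples taken from each item's segmented region
items = {
    "item_1": [(210, 35, 45), (190, 50, 40)],
    "item_2": [(85, 160, 70), (75, 175, 55)],
}
bins = {name: classify_by_color(px) for name, px in items.items()}
counts = Counter(bins.values())  # automated count per sorting bin
print(bins)  # {'item_1': 'ripe', 'item_2': 'unripe'}
```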
Computer Vision in Restaurants
Going out to enjoy a meal with friends and family is a luxury that almost every one of us enjoys. With so many restaurants out there, Computer Vision is bringing modern technology to them. Some of the ways in which computer vision is making an impact are as follows.
By implementing a cashierless payment system that uses computer vision, restaurants can save a lot of labour costs and make the payment process more secure and faster. Cashierless environments let customers order quickly and easily on their own terms and change their orders without any fuss.
Reducing Food Waste
Every year, around 22-33 million pounds of food are wasted, which has a detrimental effect on the environment. Common causes of food waste in restaurants are over-preparing, over-purchasing, food spoilage, etc.
Keeping track of dates, storage, frequencies, and ordering is a task that Machine Learning and Computer Vision can readily handle.
By overseeing all steps, a computer-vision-powered system can provide analytics that can be used to increase efficiency and customer satisfaction. Using Computer Vision, restaurant owners can understand their customers better and gain deeper insights.
Famous Startups using Computer Vision for Restaurants
- MeldCX viana