Startup CEOs lead busy lives. For example, finding our CEO across the three floors of our office can be more arduous than finding Waldo. But as developers, we tend to treat everything as a problem waiting for a solution.
So we wanted to know whether the face recognition module we work with every day could identify specific people in IP camera streams.
Face recognition technology has taken the world by storm recently, with its adoption increasing rapidly in various fields, including security, retail, B2B, and entertainment.
The advancement in deep learning has enabled the widespread use of Face Recognition technology. Managing your employees is a seamless experience with face recognition tools. You can save time that would otherwise be spent on attendance tracking, payroll management, etc.
This advanced guide will help you understand the deep learning models and libraries used for face recognition and how face recognition works at a practical level.
Let’s get started.
How does facial recognition work?
A face recognition system works by identifying or verifying a person’s face in an image. It involves several steps, which can be organized into a pipeline, as shown in the example image.
- Face Detection: automatically locating human faces within digital images or video frames.
- Feature Extraction: extracting the essential features from an image of the face.
- Face Classification: assigning a detected face to one or more predefined categories based on the extracted features.
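The three stages above can be wired together as a simple function pipeline. The sketch below is illustrative only: `recognize_faces` and the three callables it takes are hypothetical names, and the stand-in stage functions exist purely so the example runs without any model.

```python
from typing import Any, Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) of a detected face


def recognize_faces(
    image: Any,
    detect: Callable[[Any], List[Box]],
    embed: Callable[[Any, Box], Sequence[float]],
    classify: Callable[[Sequence[float]], str],
) -> List[Tuple[Box, str]]:
    """Run detection, feature extraction, and classification in order."""
    results = []
    for box in detect(image):          # 1. face detection
        embedding = embed(image, box)  # 2. feature extraction
        label = classify(embedding)    # 3. face classification
        results.append((box, label))
    return results


# Stand-in stages so the sketch runs without a real model (hypothetical).
def fake_detect(image):
    return [(10, 20, 50, 50)]


def fake_embed(image, box):
    return [0.1, 0.2, 0.3]


def fake_classify(embedding):
    return "known_person" if sum(embedding) > 0.5 else "unknown"


labels = recognize_faces("frame", fake_detect, fake_embed, fake_classify)
```

In a real system each stand-in would be replaced by a model, for example a face detector, an embedding network, and a classifier trained on the embeddings.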
There are various feature extraction and classification methods. First, we will discuss MTCNN (Multi-Task Cascaded Convolutional Neural Network), which is used for face detection.
MTCNN
The Multi-Task Cascaded Convolutional Neural Network (MTCNN) is a widely used tool for detecting faces in images and videos. It uses a three-stage neural network detector to locate faces accurately. You can learn more about MTCNN in the linked research paper.
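As a rough illustration, the open-source `mtcnn` Python package exposes a detector whose `detect_faces` method returns one dictionary per face with `box`, `confidence`, and `keypoints` entries. The sketch below assumes that package is installed and that `rgb_image` is an RGB NumPy array; the helper name `detect_face_boxes` and the 0.9 confidence cut-off are our own choices for illustration.

```python
def detect_face_boxes(rgb_image, min_confidence=0.9):
    """Return (box, keypoints) pairs for faces found by MTCNN.

    Assumes the third-party `mtcnn` package is installed. It is imported
    lazily so that this module can be loaded without it.
    """
    from mtcnn import MTCNN

    detector = MTCNN()
    faces = detector.detect_faces(rgb_image)
    # Each entry looks like:
    # {"box": [x, y, width, height], "confidence": ..., "keypoints": {...}}
    return [
        (face["box"], face["keypoints"])
        for face in faces
        if face["confidence"] >= min_confidence
    ]
```

When processing a video stream, it is usually better to create the detector once and reuse it for every frame, since building it on each call is expensive.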
How does MTCNN work?
To detect faces of various sizes, the image is first resized to several scales. The P-network (Proposal Network) then scans each scaled image and performs the initial detection. Its detection threshold is deliberately low, so many false positives remain even after Non-Maximum Suppression (NMS). The regions it proposes, false positives included, are passed to the second network, the R-network.
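The repeated resizing can be sketched as computing a list of scale factors. The values below (a 12-pixel P-network input, a 20-pixel minimum face size, and a 0.709 shrink factor) are defaults commonly seen in MTCNN implementations and are assumptions here, not values stated in this guide; `pyramid_scales` is a hypothetical helper name.

```python
def pyramid_scales(height, width, min_face_size=20, factor=0.709, net_input=12):
    """Return the scale factors at which the image would be resized.

    The first scale maps a face of `min_face_size` pixels onto the
    `net_input`-pixel window of the first network. Each following scale
    shrinks the image further until its shorter side no longer fits
    a single window.
    """
    scales = []
    scale = net_input / min_face_size
    min_side = min(height, width) * scale
    while min_side >= net_input:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales


scales = pyramid_scales(480, 640)
```

Smaller scales shrink the image more, so large faces end up fitting the same fixed-size window that small faces fit at the first scale.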
As its name suggests, the R-network (Refine Network) filters these detections and applies NMS again to obtain relatively precise bounding boxes. The O-network (Output Network) performs the final refinement of the bounding boxes and can optionally detect facial landmarks (eyes, nose, and mouth corners) at a low additional cost. These facial landmarks can be useful for face alignment.
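Non-Maximum Suppression, which the stages described above rely on, can be sketched in a few lines of plain Python. This is a minimal greedy version with boxes given as (x1, y1, x2, y2) corners; the function names and the 0.5 overlap threshold are illustrative assumptions, and real implementations are vectorized.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept after greedy suppression."""
    # Visit boxes from the highest score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

Here the second box overlaps the first heavily and has a lower score, so only the first and third boxes survive.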
FaceNet
Google’s FaceNet is a computer program that can identify and verify faces on a large scale. It is based on a deep convolutional neural network, a type of artificial intelligence trained to recognize patterns in data.
FaceNet uses a unique training method called a triplet loss function to help distinguish between different faces. This means that when the program is shown two images of the same person, it will try to make the “vectors” (mathematical representations of the images) for those two images as similar as possible.
On the other hand, when two images of different people are shown, it will try to make the vectors for those images as dissimilar as possible. FaceNet is the foundation for several open-source face recognition systems, such as FaceNet with TensorFlow, Keras FaceNet, DeepFace, and OpenFace.
How does FaceNet work?
FaceNet is a machine learning model that takes an image of a person’s face as input and outputs a vector of 128 numbers. This vector, called an embedding, represents the most significant features of the face and contains all the essential information from the image. When using FaceNet, the goal is for the embeddings of similar faces to be similar as well.
One of the significant aspects of FaceNet is its loss function. It uses the triplet loss function. We need three images to calculate the triplet loss: anchor, positive and negative.
We want the distance between the embedding of our anchor image and the embedding of our positive image to be smaller than the distance between the embedding of our anchor image and the embedding of our negative image.
The triplet loss function can be formally defined as follows:
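For reference, the triplet loss as stated in the FaceNet paper can be written as below. The margin α and the number of triplets N are symbols from that paper and are not defined elsewhere in this guide; the notation [z]_+ means max(z, 0).

```latex
L = \sum_{i=1}^{N} \left[
      \left\| f(x_i^a) - f(x_i^p) \right\|_2^2
    - \left\| f(x_i^a) - f(x_i^n) \right\|_2^2
    + \alpha
\right]_+
```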
- f(x) takes x as an input and returns a 128-dimensional vector w.
- i denotes the i-th input.
- Subscript a indicates an Anchor image, p indicates a Positive image, and n indicates a Negative image.
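For a single (anchor, positive, negative) triplet, the loss described above can be sketched in plain Python. The margin term and its value of 0.2 follow the FaceNet paper rather than this guide, so treat them as assumptions; the short vectors stand in for real 128-dimensional embeddings.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Loss for one triplet of embeddings.

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin`; otherwise it grows with the violation.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-dimensional embeddings (illustrative values only).
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

The first triplet already satisfies the constraint and contributes no loss, while the second, where the negative sits closer to the anchor than the positive, is penalized.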
FaceNet learns in the following way:
- Randomly selects an anchor image.
- Randomly selects another image of the same person as the anchor image (positive example).
- Randomly selects an image of a person different from the anchor image (negative example).
- Adjusts the parameters of the FaceNet network so that the positive example is positioned closer to the anchor than the negative one.
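The first three steps above can be sketched as a small sampling helper. It follows the random selection described in the list; `sample_triplet` and the toy dataset layout (a dictionary from person name to image identifiers) are hypothetical, and the parameter update in the last step is left to whichever deep learning framework is used.

```python
import random


def sample_triplet(images_by_person, rng=random):
    """Pick an (anchor, positive, negative) triplet at random.

    `images_by_person` maps each identity to a list of its images and must
    contain at least two identities, one of them with two or more images.
    """
    # Only people with at least two images can supply anchor and positive.
    candidates = [p for p, imgs in images_by_person.items() if len(imgs) >= 2]
    person = rng.choice(candidates)
    anchor, positive = rng.sample(images_by_person[person], 2)

    # The negative comes from any other identity.
    other = rng.choice([p for p in images_by_person if p != person])
    negative = rng.choice(images_by_person[other])
    return anchor, positive, negative


dataset = {"alice": ["a1.jpg", "a2.jpg"], "bob": ["b1.jpg"]}
anchor, positive, negative = sample_triplet(dataset, random.Random(0))
```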
Softmax
To classify a new face, we calculate the distance between its embedding and the embeddings of known faces. Then, we use a classifier called Softmax to determine which known face the new face belongs to.
Softmax was a natural choice for us since the entire system is based on neural networks, but you could also use other classifiers such as SVM or Random Forest. As long as the face embeddings are high quality, any classifier should work well at this step.
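One simple way to combine the two ideas above is to apply softmax to the negative distances, so that the closest known face receives the highest probability. This is a simplified stand-in, not necessarily how our classifier was built (a trained softmax layer on top of the embeddings is another common option); the function names and toy embeddings are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_embedding(embedding, known_embeddings):
    """Return (name, probability) for the most likely known identity.

    `known_embeddings` maps a person's name to one reference embedding.
    """
    names = list(known_embeddings)
    distances = [math.dist(embedding, known_embeddings[n]) for n in names]
    # Smaller distance means higher score, hence the sign flip.
    probabilities = softmax([-d for d in distances])
    best = max(range(len(names)), key=probabilities.__getitem__)
    return names[best], probabilities[best]


known = {"alice": [0.0, 0.0], "bob": [1.0, 1.0]}
name, probability = classify_embedding([0.1, 0.0], known)
```

In practice a probability threshold is also needed so that faces of unknown people are rejected rather than assigned to the nearest known identity.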
DeepFace Library
DeepFace is a deep-learning facial recognition system developed by Facebook’s AI research team in 2014. It is a neural network-based approach that uses a 3D model to align facial features and a deep neural network to encode facial images into a high-dimensional feature vector. The open-source DeepFace Python library, which shares the name, wraps several face recognition models, including OpenFace, Google FaceNet, VGG-Face, Facebook DeepFace, ArcFace, DeepID, Dlib, and SFace.
Four functions, verify, find, analyze, and stream, cover the core functionality of the face recognition module.
Verify Function
The verify function determines whether a pair of face images belongs to the same person or to different people. It expects the image paths as inputs and returns a dictionary; check the value of its verified key, which is True if the faces match and False otherwise.
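A minimal sketch of calling verify through the `deepface` package is shown below. It assumes the package is installed and that both image paths exist; `faces_match` is a hypothetical wrapper name, and choosing "Facenet" as the model is just an example.

```python
def faces_match(img1_path, img2_path, model_name="Facenet"):
    """Return True if DeepFace judges both images to show the same person.

    Assumes the third-party `deepface` package is installed. It is
    imported lazily so that this module can be loaded without it.
    """
    from deepface import DeepFace

    result = DeepFace.verify(
        img1_path=img1_path,
        img2_path=img2_path,
        model_name=model_name,
    )
    # The returned dictionary carries the decision under "verified".
    return bool(result["verified"])
```

A call such as `faces_match("ceo_reference.jpg", "camera_frame.jpg")` (both file names are placeholders) would then return a plain boolean.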
Find Function
The DeepFace find function searches the database path for images that show the same person as the input image.
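A hedged sketch of a find call follows. It assumes the `deepface` package is installed and that `db_path` is a folder of reference images; `find_matches` is a hypothetical wrapper. To our knowledge, recent releases return a list with one pandas DataFrame per detected face while older ones return a single DataFrame, so the wrapper normalizes both shapes.

```python
def find_matches(img_path, db_path, model_name="Facenet"):
    """Search `db_path` for faces matching the one in `img_path`.

    Returns a list of result tables, one per face detected in the input.
    Assumes the third-party `deepface` package is installed.
    """
    from deepface import DeepFace

    result = DeepFace.find(
        img_path=img_path,
        db_path=db_path,
        model_name=model_name,
    )
    # Normalize older single-table results to a list.
    return result if isinstance(result, list) else [result]
```

Each result table lists the matching database images, with the closest matches first.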
Analyze Function
DeepFace also provides facial attribute analysis: age, gender, facial expression (such as fear, anger, happiness, and sadness), and race (Asian, White, Middle Eastern, Indian, Latino, and Black).
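The attribute analysis can be sketched as below, assuming the `deepface` package is installed. `analyze_face` is a hypothetical wrapper; the result keys it reads (`age`, `dominant_gender`, `dominant_emotion`, `dominant_race`) are the ones we believe recent releases use, and the fallback for `gender` covers older releases, so check them against your installed version.

```python
def analyze_face(img_path):
    """Return a short attribute summary for each face found in the image.

    Assumes the third-party `deepface` package is installed.
    """
    from deepface import DeepFace

    result = DeepFace.analyze(
        img_path=img_path,
        actions=["age", "gender", "emotion", "race"],
    )
    # Some versions return a single dictionary instead of a list.
    faces = result if isinstance(result, list) else [result]
    return [
        {
            "age": face.get("age"),
            "gender": face.get("dominant_gender", face.get("gender")),
            "emotion": face.get("dominant_emotion"),
            "race": face.get("dominant_race"),
        }
        for face in faces
    ]
```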
Stream Function
The stream function runs on a live video feed from the webcam. It applies both face recognition and facial attribute analysis to the frames.
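Starting the stream can be sketched as follows, assuming the `deepface` package is installed and `db_path` points to a folder of reference images. `start_stream` is a hypothetical wrapper. A `source` of 0 selects the default webcam; passing an IP camera URL instead is an assumption that relies on the underlying video capture accepting it, and we have not verified it here.

```python
def start_stream(db_path, source=0, with_attributes=True):
    """Open a live recognition window using DeepFace's stream function.

    Assumes the third-party `deepface` package is installed.
    `source` is 0 for the default webcam.
    """
    from deepface import DeepFace

    DeepFace.stream(
        db_path=db_path,
        source=source,
        enable_face_analysis=with_attributes,
    )
```

Setting `with_attributes` to False skips the age, gender, and emotion overlay, which may help the frame rate.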
Comparison of Face Recognition Models in Real Time
We tested the FaceNet model in TensorFlow and in PyTorch, along with the DeepFace library. Below are the results and the conclusions we drew after rigorously testing these models. We used the following criteria to test our models:
- Different angles of the face
- Different lighting conditions
- Head movement
- Frame rate achieved
- Detection among a group of people
Models | Different angles of face | Different lighting conditions | Head movement | Detection among a group | FPS achieved |
FaceNet (TensorFlow) | Some false positives | Depends on the dataset provided | False positives occur | False positives occur | 6-8 FPS |
FaceNet (PyTorch) | Minimal false positives | Depends on the dataset provided | Minimal false positives | Up to 80% accurate | 7-9 FPS |
DeepFace | False positives occur | Not recognized | False positives occur | Not detected in a group | 2-3 FPS |
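Frame rates like those in the table can be measured by timing how long a recognition function takes over a batch of frames. The helper below is one possible way to do this, not a record of how the figures above were obtained; `measure_fps` and the dummy workload are hypothetical.

```python
import time


def measure_fps(process_frame, frames):
    """Return the average number of frames processed per second."""
    start = time.perf_counter()
    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on extremely fast workloads.
    return count / elapsed if elapsed > 0 else float("inf")


# Dummy workload standing in for a real recognition call.
fps = measure_fps(lambda frame: sum(range(1000)), range(50))
```

For a fair comparison, each model should be timed on the same frames and the same hardware.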
Conclusion
Face recognition technology has the potential to revolutionize a wide range of industries and applications. Whether used for security purposes, to improve the customer experience in retail settings, to manage your employees, or for entertainment, this technology can make our lives easier and more convenient.
While there will always be concerns about issues such as privacy and accuracy, the benefits of face recognition technology far outweigh the potential downsides, and we should embrace it as a powerful tool for the betterment of humanity.
If you would like to leverage the possibilities of Face recognition, get in touch with our experts today for a free consultation.