Why we regularly test face recognition algorithms at NIST
Facial and object recognition technologies are rapidly evolving and becoming more sophisticated, with a wide range of potential applications in security, access control, marketing, and more. As these technologies become more widespread, it is important to ensure that they are accurate, reliable and fair.
One way to achieve this goal is for companies developing face recognition technologies to seek independent testing by the National Institute of Standards and Technology (NIST). NIST is a US government agency that develops and promotes standards for a wide range of technologies, including biometrics. NIST’s face recognition evaluations are rigorous and comprehensive, and algorithms that pass these tests are widely considered to be among the best in the world.
Here are some reasons for our participation in the tests:
- Identify areas for algorithm improvement. The NIST benchmark provides detailed information about each algorithm's performance. This feedback helps us develop better algorithms and stay ahead of the curve in a rapidly evolving field.
- Demonstrate accuracy and reliability. The test is a rigorous and comprehensive benchmark that evaluates algorithms on a variety of factors, including accuracy, speed, and robustness under challenging conditions. Passing the vendor test demonstrates that an algorithm is among the best in the world and can be relied upon to perform accurately in live applications.
- Performance transparency. Test results are public and openly available, so anyone can compare the performance of algorithms from vendors all over the world.
- Trust and responsibility. Recognition technologies are powerful tools, but they also raise concerns about privacy and the potential for abuse. By submitting their technologies for independent testing by NIST, companies demonstrate their commitment to developing and deploying these technologies responsibly and ethically.
NIST’s Face Recognition Vendor Test (FRVT) has long been the world’s leading independent evaluation of face recognition algorithms. It is conducted on a regular basis using a large and diverse dataset of images, and its results are publicly available, allowing potential users to compare the performance of different algorithms across a variety of scenarios.
How NIST's FRVT Benchmark Evolves
NIST continues to improve the FRVT test to keep pace with the latest advancements in facial recognition technology. In 2023, NIST split the FRVT into two components: FRTE (Face Recognition Technology Evaluation) and FATE (Face Analysis Technology Evaluation). The FRTE test is designed to evaluate the performance of facial recognition algorithms in a broader range of tasks and conditions than the previous FRVT test.
It is envisioned that FATE will include new tasks such as face segmentation, facial feature analysis, and age estimation. It also will include new complex environments such as face recognition in crowds and face recognition from a distance.
What a basic test consists of and how to read the results
Let’s look at what the main vendor test consists of; its results are what you most often see cited in reviews of face recognition algorithms.
The FRVT uses a large and diverse dataset of images, including frontal and profile views, images with different lighting and pose variations, and images of people from different demographic groups. The algorithms are tested on their ability to accurately match faces in a variety of scenarios, such as:
- 1:1 Verification: Matching the face in a probe image against a single enrolled image to confirm that the two show the same person.
- 1:N Identification: Searching a gallery of images to determine if a probe image contains a match to any of the images in the gallery.
- Watchlist matching (included in 1:N Identification): Searching a gallery of images for matches to a list of known suspects.
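The difference between the two modes can be made concrete with a minimal sketch. Everything here (the toy 2-D embeddings, the `verify` and `identify` helpers, the cosine-similarity matcher) is a simplified assumption for illustration, not NIST's actual test harness:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def verify(probe, enrolled, threshold):
    """1:1 verification: accept the claimed identity if the score clears the threshold."""
    return cosine_similarity(probe, enrolled) >= threshold

def identify(probe, gallery, threshold):
    """1:N identification: return the best-matching gallery ID, or None if
    no score clears the threshold."""
    best_id, best_score = None, -1.0
    for subject_id, embedding in gallery.items():
        score = cosine_similarity(probe, embedding)
        if score > best_score:
            best_id, best_score = subject_id, score
    return best_id if best_score >= threshold else None

# Toy 2-D "embeddings", purely for illustration.
gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
probe = [0.9, 0.1]

print(verify(probe, gallery["alice"], threshold=0.8))  # True
print(identify(probe, gallery, threshold=0.8))         # alice
```

In both modes a match is declared only when the similarity score clears a threshold; the choice of that threshold drives the error rates discussed below.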
NIST also reports the accuracy of face recognition algorithms under a variety of challenging conditions, such as:
- Low-quality images: Images with poor resolution, noise, or compression.
- Pose variations: Images where the subject’s face is not facing directly at the camera.
- Lighting variations: Images with poor lighting conditions, such as overexposure or underexposure.
- Occlusions: Images where the subject’s face is partially obscured by objects such as glasses, hats, or masks.
The dataset includes images of people of different races, ethnicities, genders, and ages, as well as images taken in a variety of lighting and environmental conditions.
NIST uses two metrics to measure the accuracy of face recognition and matching algorithms:
- False Negative Identification Rate (FNIR) is the proportion of mated searches in which the algorithm fails to return the correct match. In other words, it is the percentage of times the algorithm fails to identify a person who is enrolled in the gallery.
- False Positive Identification Rate (FPIR) is the proportion of non-mated searches in which the algorithm returns a match. In other words, it is the percentage of times the algorithm incorrectly identifies a stranger as an enrolled person.
NIST reports these error rates for each algorithm over a range of thresholds (for 1:1 verification, the analogous metrics are the false non-match rate, FNMR, and the false match rate, FMR). The threshold is the minimum similarity score the algorithm must produce in order to return a match. A higher threshold results in fewer false positives, but also more false negatives.
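The trade-off is easy to see in a small sketch. The scores below are invented for illustration and are not NIST data:

```python
def error_rates(mated_scores, nonmated_scores, threshold):
    """FNIR: fraction of mated comparisons scoring below the threshold (misses).
    FPIR: fraction of non-mated comparisons scoring at/above it (false alarms)."""
    fnir = sum(s < threshold for s in mated_scores) / len(mated_scores)
    fpir = sum(s >= threshold for s in nonmated_scores) / len(nonmated_scores)
    return fnir, fpir

mated    = [0.92, 0.88, 0.75, 0.60, 0.95]   # genuine comparisons
nonmated = [0.30, 0.55, 0.71, 0.20, 0.10]   # impostor comparisons

for t in (0.5, 0.7, 0.9):
    fnir, fpir = error_rates(mated, nonmated, t)
    print(f"threshold={t}: FNIR={fnir:.2f}, FPIR={fpir:.2f}")
```

Raising the threshold from 0.5 to 0.9 drives FPIR down at the cost of a rising FNIR; tuning that balance is exactly what the per-threshold reporting supports.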
In addition to accuracy, NIST also tests the speed and robustness of face recognition and matching algorithms. The speed of an algorithm is measured by the time it takes to process an image and return a match. The robustness of an algorithm is measured by its ability to perform accurately in challenging conditions, such as low light, poor image quality, and occlusions (e.g., sunglasses, hats, and masks).
So, let’s take a look at the testing process step by step.
- NIST collects a large and diverse dataset of images spanning different races, ethnicities, genders, ages, and a variety of lighting and environmental conditions.
- For 1:1 tests, the collected data is split into many image pairs, both mated (same person) and non-mated (different people). For 1:N tests, the data is split into three parts: a gallery of enrolled subjects; mated probes, which are other images of people in the gallery; and non-mated probes, which are images of people who are not in the gallery.
- NIST submits the gallery and probe images to the face recognition and matching algorithms being tested.
- NIST measures the performance of each algorithm by calculating FNIR and FPIR (or FNMR and FMR for 1:1 verification) at a range of thresholds.
- NIST reports the results of the FRVT publicly.
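The steps above can be condensed into a toy end-to-end run. The scalar "images", the `similarity` stand-in, and the tiny gallery are all invented for illustration; a real evaluation compares face templates across millions of images:

```python
# Minimal sketch of the 1:N evaluation protocol (toy matcher, invented data).

def similarity(a, b):
    # Stand-in for a real face matcher: higher means more similar.
    return 1.0 - abs(a - b)

def search(probe, gallery, threshold):
    """Return the best gallery match above the threshold, or None."""
    best_id = max(gallery, key=lambda sid: similarity(probe, gallery[sid]))
    return best_id if similarity(probe, gallery[best_id]) >= threshold else None

# Steps 1-2: gallery plus mated and non-mated probes (scalars stand in for images).
gallery = {"p1": 0.10, "p2": 0.50, "p3": 0.90}
mated_probes = {"p1": 0.12, "p2": 0.48, "p3": 0.75}  # other images of gallery subjects
nonmated_probes = [0.30, 0.70]                       # subjects absent from the gallery

threshold = 0.9
# Steps 3-4: run the searches and tally errors.
misses = sum(search(img, gallery, threshold) != sid
             for sid, img in mated_probes.items())
false_alarms = sum(search(img, gallery, threshold) is not None
                   for img in nonmated_probes)

fnir = misses / len(mated_probes)
fpir = false_alarms / len(nonmated_probes)
print(f"FNIR={fnir:.2f}, FPIR={fpir:.2f}")  # FNIR=0.33, FPIR=0.00
```

Here one mated probe scores below the threshold and is missed, while no non-mated probe is falsely matched, illustrating how the two rates are tallied from the same set of searches.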
Case: Verigram’s vendor test results
Our algorithm received the highest NIST rank in the "1:N Identification" category on a database of 12 million faces, with FPIR = 0.001 and FNIR = 0.0115.
These results show that the algorithm achieved an FNIR of 0.0115 at an FPIR of 0.001 on the mugshot dataset, which contains 12 million faces. In other words, the algorithm correctly identified a mated face in 98.85% of searches, while producing a false positive in only 0.1% of non-mated searches.
In simple terms, the algorithm is very good at identifying faces in a large database, even under challenging conditions, and first place means that under these conditions our algorithm outperformed those of all other vendors.
It is important to note that the performance of face recognition algorithms can vary depending on the quality of the images, the size and diversity of the database, and the specific conditions under which the algorithm is used.
What else to consider when submitting an algorithm to NIST for testing
If your company plans to participate in the testing, here are a few more important points to consider. Not every company may be ready for this level of scrutiny, but for our organization the collected data has been a vital reference for algorithmic improvements.
Apart from the above-mentioned reasons, a few crucial factors should also be evaluated before submitting face and object recognition algorithms for impartial testing at NIST.
- Time. The evaluation is thorough and may take several months to complete.
- Publication of results. Everyone can see the test results. You may find that the algorithms of world-famous companies produce worse results than those of brave newcomers. A willingness to be transparent is an important consideration when deciding whether to participate in testing.
Verigram is preparing a new submission, and we look forward to the results.