With the ongoing buzz surrounding artificial intelligence, Bing’s team of researchers recently created a new way of producing high-quality data for training machine learning models. They described a system that can consistently discriminate between poorly labeled data and accurately labeled data in a paper and a blog post that were published before the Computer Vision and Pattern Recognition Conference (CVPR), to be held in Salt Lake City.
According to the published information, the researchers said that obtaining enough high-quality training data is usually the most challenging part of creating an AI-based service. Data labeled by human beings is generally of high quality, meaning that it has few errors, but it comes at a high cost in both time and money. Automatic techniques, by contrast, generate data in massive quantities at a lower cost but introduce more labeling errors, which the researchers refer to as label noise.
According to the Bing team, training algorithms requires collecting hundreds of thousands or even millions of data samples and sorting them into categories, which is a strenuous activity when data scientists do it manually. One of the most widely used shortcuts is to obtain data from search engines by compiling a list of categories, carrying out a web search for each item in the list and gathering the results. For instance, when building a corpus for a computer vision algorithm that distinguishes between different types of food, you can simply perform an image search for sushi.
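The search-engine shortcut described above can be sketched roughly as follows. This is only an illustration: `build_noisy_dataset` and `fake_image_search` are hypothetical names, and the stand-in search function returns made-up URLs rather than calling any real search API. The point is that every result is labeled with the query that produced it, whether or not it actually shows that category.

```python
from typing import Callable, List, Tuple


def build_noisy_dataset(
    categories: List[str],
    search_fn: Callable[[str], List[str]],
    per_category: int = 100,
) -> List[Tuple[str, str]]:
    """Label each search result with the query that returned it.

    The labels are "noisy": nothing checks that a result really
    belongs to the category it was searched under.
    """
    dataset = []
    for category in categories:
        for url in search_fn(category)[:per_category]:
            dataset.append((url, category))
    return dataset


def fake_image_search(query: str) -> List[str]:
    # Hypothetical stand-in for a real image search; returns placeholder URLs.
    return [f"https://example.com/{query}/{i}.jpg" for i in range(3)]


dataset = build_noisy_dataset(["sushi", "pizza"], fake_image_search, per_category=2)
for url, label in dataset:
    print(label, url)
```

In a real pipeline, some of the results returned for "sushi" would not show sushi at all, which is exactly the label noise the researchers set out to handle.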
The researchers also noted that not every result belongs to the searched-for category, and errors in training data can cause inaccuracies and biases in the machine learning model. One of the main ways of alleviating the mislabeling issue is to train a second algorithm that not only identifies mismatches but also corrects them. However, this solution is processing-intensive, because a separate model has to be trained for each category.
Bing’s team of researchers instead uses an artificial intelligence (AI) model that can correct mistakes in real time. In the course of training, one part of the system, the class embedding vector, automatically learns to select the images that best represent each category. Another part of the model learns to embed each sample image in the same vector space, producing what the team calls the query embedding vector.
As training proceeds, the system is designed so that the query embedding vector and the class embedding vector become more similar to each other if the image belongs to the category, and move further apart if it does not. The team added that the system eventually recognizes patterns that it uses to locate highly representative images for every category. Furthermore, it works even without human-verified labels.
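The pull-together, push-apart behavior described above can be illustrated with a toy sketch. This is not the Bing team's model: the function names, the two-dimensional vectors, the simple interpolation update and the learning rate are all assumptions chosen for illustration, and NumPy is used only for the vector arithmetic. Here `is_member` stands for the (possibly noisy) label an image arrived with.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def update_class_embedding(class_vec, query_vec, is_member, lr=0.1):
    """Nudge the class vector toward a member image, or away from a non-member.

    Assumed toy update rule, not the method from the paper.
    """
    direction = 1.0 if is_member else -1.0
    updated = class_vec + direction * lr * (query_vec - class_vec)
    return updated / np.linalg.norm(updated)  # keep unit length


def rank_by_similarity(class_vec, query_vecs):
    """Return image indices, most representative of the class first."""
    sims = [cosine_similarity(class_vec, q) for q in query_vecs]
    return sorted(range(len(query_vecs)), key=lambda i: sims[i], reverse=True)


# Toy 2-D embeddings; real embeddings would have many more dimensions.
class_vec = np.array([1.0, 0.0])
query_vec = np.array([1.0, 1.0]) / np.sqrt(2.0)

before = cosine_similarity(class_vec, query_vec)
pulled = cosine_similarity(update_class_embedding(class_vec, query_vec, True), query_vec)
pushed = cosine_similarity(update_class_embedding(class_vec, query_vec, False), query_vec)

print(f"before={before:.3f} pulled={pulled:.3f} pushed={pushed:.3f}")
```

Once the embeddings have been trained, a ranking step such as `rank_by_similarity` is one plausible way to pick out the images that sit closest to a category's class vector.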
The approach described by the researchers is already proving to be highly effective at generating high-quality training data for image-based tasks. The Bing team hopes the same approach can be applied to speech, text or even video.