Drawing inspiration from infants, scientists at DeepMind have developed a system that can begin to teach itself to see and comprehend a spatial environment. The algorithm marks an important step for computer vision research, a field that aims to give such systems the ability to interpret video in a manner similar to humans.
Most current work of this type depends on giving machine learning systems a helping hand, which usually means training them on tens of thousands of labeled pictures. The DeepMind system, by contrast, works out its own method for interpreting a scene. The system, which is referred to as the Generative Query Network, eliminates labels and concentrates on unsupervised learning.
How Does a Computer Learn for Itself?
In the Generative Query Network (GQN), two networks play a cooperative game. The first network looks at an image and tries to describe it as briefly as possible. The second network uses that description to predict what the scene would look like from a different viewpoint. When a prediction is wrong, both networks update themselves to reduce the chance of making a similar error in the future.
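The cooperative game can be illustrated with a deliberately tiny sketch. This is not DeepMind's implementation: the real system uses deep neural networks on rendered images, whereas here the "scene" is a single hidden number, each hypothetical camera in `CAMERAS` sees it through a simple linear formula, and both "networks" are one weight and one bias per camera. The class `ToyGQN` and every other name below are assumptions made for illustration only.

```python
import random

# Assumption: a "scene" is one hidden number s in [0, 1], and each
# hypothetical camera k observes it as a_k * s + b_k.
CAMERAS = [(1.0, 0.0), (-1.0, 1.0), (0.5, 0.25)]


def observe(scene, cam):
    """Return the toy 'image' that camera `cam` sees for this scene."""
    a, b = CAMERAS[cam]
    return a * scene + b


class ToyGQN:
    """Two tiny linear 'networks', each with one weight and bias per camera."""

    def __init__(self):
        n = len(CAMERAS)
        self.enc_w, self.enc_b = [1.0] * n, [0.0] * n  # first network: describes
        self.gen_w, self.gen_b = [1.0] * n, [0.0] * n  # second network: predicts

    def represent(self, image, cam):
        # Compress the observed image into a short description (one number).
        return self.enc_w[cam] * image + self.enc_b[cam]

    def generate(self, code, query_cam):
        # Predict what the scene looks like from the queried camera.
        return self.gen_w[query_cam] * code + self.gen_b[query_cam]

    def train_step(self, scene, cam, query_cam, lr=0.05):
        image = observe(scene, cam)
        target = observe(scene, query_cam)
        code = self.represent(image, cam)
        error = self.generate(code, query_cam) - target
        # Gradient of error**2 with respect to the description,
        # computed before the second network's weight is changed.
        grad_code = 2 * error * self.gen_w[query_cam]
        # A wrong prediction updates BOTH networks.
        self.gen_w[query_cam] -= lr * 2 * error * code
        self.gen_b[query_cam] -= lr * 2 * error
        self.enc_w[cam] -= lr * grad_code * image
        self.enc_b[cam] -= lr * grad_code
        return error ** 2


def mean_error(model):
    """Average squared prediction error over a grid of scenes and camera pairs."""
    total, count = 0.0, 0
    for i in range(21):
        scene = i / 20
        for cam in range(len(CAMERAS)):
            for query_cam in range(len(CAMERAS)):
                if cam == query_cam:
                    continue
                code = model.represent(observe(scene, cam), cam)
                diff = model.generate(code, query_cam) - observe(scene, query_cam)
                total += diff ** 2
                count += 1
    return total / count


def train(model, steps=5000, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        scene = rng.random()
        # Observe from one camera, then query a different one.
        cam, query_cam = rng.sample(range(len(CAMERAS)), 2)
        model.train_step(scene, cam, query_cam)


model = ToyGQN()
error_before = mean_error(model)
train(model)
error_after = mean_error(model)
print(f"mean squared error before: {error_before:.4f}, after: {error_after:.4f}")
```

No labels appear anywhere in the sketch: the only training signal is the gap between the predicted view and the actual view, which is the sense in which the approach is unsupervised.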
The team at DeepMind tested the network not only on rooms containing multiple objects but also in maze-like environments. Ali Eslami, a DeepMind researcher, said that when this game is played for long enough, both networks eventually perform well. Michael Milford, a robotics professor at Queensland University of Technology, also said the network has great potential. He explained that computer vision research usually focuses on developing algorithms for specialized applications, such as homecare robots or self-driving vehicles.
A system that can learn on its own how to represent the world could be more flexible. However, Dr. Milford cautioned that the research has so far been conducted only in the laboratory and has focused on simple 3D scenes. He was uncertain whether the system could cope in the real world, where it would face factors such as darkness, rain, dust, or even people running in front of a self-driving vehicle.
When Is a Laptop Considered a Laptop?
The Generative Query Network (GQN) learns about the world without using labeled images. As a result, it may not immediately know the names of various objects or interpret them in a way humans can easily understand. This is where the comparison with babies comes in.
According to Eslami, to understand this concept, you need to think about how an infant interacts with a laptop. At first, the infant may see the laptop as a set of boxes. However, after seeing it often enough, the infant may figure out that this configuration of boxes occurs frequently and might be a distinct concept. This is where the parent comes in to explain that the object is called a laptop.
Researchers acknowledge that this way of learning about a scene may not yet be efficient enough. Nonetheless, it has the potential to produce systems that understand the world better.