Screenwriters without the vast resources and budgets of the leading film studios may soon have reason to smile.
A new algorithm can generate a video from nothing more than a short script.
Although the new films are nowhere near Oscar quality, the technique behind them could find uses beyond the entertainment business.
For instance, the technology could help witnesses reconstruct crimes or car accidents.
Artificial intelligence can already label images and recognize their content.
So-called generative algorithms can even produce images from labels or brain scans.
Some of these systems can predict the next frames of a movie from a single frame. Despite such achievements, however, generating video from text had not been achieved before.
According to Tinne Tuytelaars, a computer scientist at Katholieke Universiteit Leuven in Belgium, this work is the first text-to-video system to produce such impressive results.
Because the new algorithm is a form of machine learning, it needs training. It relies on a neural network, a web of small computing elements that process data in a way loosely modeled on the neurons of the human brain.
During training, the software evaluates its performance on each attempt, and the results are propagated back through the network's many connections to refine future computations.
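That feedback loop can be sketched with a toy example (the numbers and the target function here are invented for illustration, not taken from the actual system): a single artificial neuron repeatedly scores its own output against the desired answer and feeds the error back to adjust its connection weight.

```python
import numpy as np

# A single neuron learns y = 2*x by scoring its output and
# propagating the error back to hone its one connection weight.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x          # target outputs
w = 0.0              # connection weight, initially untrained
for _ in range(100):
    pred = w * x                      # forward pass
    error = pred - y                  # evaluate performance
    w -= 0.01 * np.mean(error * x)    # feed the error back to refine w
print(round(w, 2))  # approaches 2.0
```

Real networks repeat this same evaluate-and-adjust cycle across millions of weights rather than one.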
The network behind the new algorithm works in two stages, imitating how people create art. In the first stage, it uses the text to produce a "gist" of the video, a rough outline of the scene. In the second stage, it combines the gist and the text to produce a short video.
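As a rough illustration of that two-stage design, a sketch might look like the following. The dimensions, the hash-based text encoder, and the random weights are invented stand-ins, not the researchers' model; only the shape of the pipeline (text → gist → video) follows the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the article only says clips are 32 frames long.
EMBED_DIM = 64                     # text-embedding size (assumed)
FRAMES, H, W, C = 32, 64, 64, 3    # frames, height, width, color channels

def encode_text(script: str) -> np.ndarray:
    """Stand-in text encoder: hash words into a fixed-size embedding."""
    vec = np.zeros(EMBED_DIM)
    for word in script.lower().split():
        vec[hash(word) % EMBED_DIM] += 1.0
    return vec

def make_gist(text_vec: np.ndarray) -> np.ndarray:
    """Stage 1: map the text to a coarse 'gist' image (scene layout)."""
    W1 = rng.standard_normal((EMBED_DIM, H * W * C)) * 0.01
    return np.tanh(text_vec @ W1).reshape(H, W, C)

def make_video(gist: np.ndarray, text_vec: np.ndarray) -> np.ndarray:
    """Stage 2: combine gist, text, and noise into a short video."""
    noise = rng.standard_normal(EMBED_DIM)
    W2 = rng.standard_normal((EMBED_DIM, FRAMES)) * 0.01
    motion = np.tanh((text_vec + noise) @ W2)
    # Each frame is the gist modulated by a per-frame motion coefficient.
    return np.stack([gist * (1.0 + 0.1 * m) for m in motion])

text = encode_text("kitesurfing on the sea")
video = make_video(make_gist(text), text)
print(video.shape)  # (32, 64, 64, 3)
```

With untrained random weights the output is noise; training (described next) is what turns this pipeline into recognizable scenes.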
A second network acts as a discriminator during training: it is shown the generated video alongside a real video of the same concept.
The system is trained to pick out the real video, and as it improves it becomes a harsher critic. Feedback from this second network sets an ever higher bar for the first, or generator, network.
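The discriminator's role can be shown with a heavily simplified stand-in in which "videos" are just numbers: a logistic-regression critic learns to tell real samples (near 3) from fakes (near 0), and its feedback then tells a generator which way to shift its output. This is not the paper's architecture; it only demonstrates the adversarial feedback idea.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

real = rng.normal(3.0, 1.0, 256)   # stand-in for real videos
fake = rng.normal(0.0, 1.0, 256)   # stand-in for generated videos

# Train the discriminator D(x) = sigmoid(w*x + c) to score
# real samples near 1 and fakes near 0.
w, c, lr = 0.0, 0.0, 0.1
for _ in range(200):
    d_real = sigmoid(w * real + c)
    d_fake = sigmoid(w * fake + c)
    w += lr * np.mean((1 - d_real) * real - d_fake * fake)
    c += lr * np.mean((1 - d_real) - d_fake)

# The trained critic now rates real above fake...
print(sigmoid(w * 3.0 + c) > sigmoid(w * 0.0 + c))  # True

# ...and its feedback gives the generator a direction: the gradient
# for shifting fakes is positive, i.e. move them toward the real data.
grad_b = np.mean((1 - sigmoid(w * fake + c)) * w)
print(grad_b > 0)  # True
```

In the full adversarial setup the generator and discriminator take turns: each generator improvement forces the critic to sharpen, which in turn raises the bar for the generator.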
The researchers trained the new algorithm on ten types of scenes, including kitesurfing on the sea and playing golf on grass.
The system roughly reproduced those scenes, which looked like grainy VHS footage. The team found that kitesurfing and sailing were often mistaken for each other.
According to Yitong Li, a computer scientist at Duke University, the videos are about the size of a US postage stamp, run 32 frames, and last nearly one second.
He added that generating larger videos reduces accuracy.
Li also said that a next step is to use human skeletal models to improve movement, since the people in the clips often look like distorted figures.
Tuytelaars predicts that the algorithm could enable better compression, if a movie could be stored as a concise text description.
The technology could also be used to produce training data for other machine-learning algorithms.
Although we are still far from an AI-produced, Hollywood-worthy blockbuster, we now know what nonsensical actions like kitesurfing on grass look like.