Malware seems to be one of the leading problems facing cybersecurity today. Taking forms that range from ransomware to botnets, it appears to be flourishing.
This has left the people tasked with defending our computers struggling to keep up. However, all is not lost, as many experts are now turning to more powerful potential problem-solvers such as artificial intelligence (AI).
The catch is that machine-learning tools require a lot of data. In tasks involving natural language processing and computer vision, massive open-source datasets exist to teach algorithms what to say, what a cat looks like, or even how words relate to one another.
In the malware space, by contrast, such open-source data was lacking until recently, when a cybersecurity firm called Endgame stepped in.
Endgame recently released EMBER, a large, open-source dataset for training AI-driven malware detectors. EMBER contains representations of more than a million malicious and benign Windows Portable Executable (PE) files, a format in which malware regularly hides.
Alongside the dataset, a team at Endgame also released AI software that can be trained on it.
The release rests on the idea that if AI is to become a powerful weapon in the ongoing fight against malware, it first has to know what to look for.
Almost all security companies hold large volumes of data that could be used to train their algorithms, but that data is a mixed blessing.
Malware authors continuously modify their code to stay a step ahead of detection tools. As a result, training algorithms on the malware samples a security firm already holds can be a futile exercise, because the data may be obsolete.
According to Charles Nicholas, a computer science professor at the University of Maryland, Baltimore County, the whole exercise resembles a game of whack-a-mole. EMBER is intended to help automated cybersecurity tools keep up as malware evolves, so that detection remains effective.
Rather than a collection of actual files, which could infect a researcher's computer, EMBER contains a kind of avatar for every file.
This digital representation gives an algorithm clues about the characteristics of malicious or benign files without exposing it to the original file.
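To make the idea of a file "avatar" more concrete, here is a minimal sketch that assumes a normalized byte histogram as the representation: each file is reduced to 256 numbers describing how often each byte value occurs, so an algorithm can learn from those numbers without ever handling the executable itself. EMBER's real representation combines many more kinds of features, and the function name `byte_histogram` is hypothetical, not part of any Endgame API.

```python
def byte_histogram(data: bytes) -> list[float]:
    """Hypothetical 'avatar': a 256-bin histogram of byte values, normalized to sum to 1."""
    counts = [0] * 256
    for b in data:  # iterating over bytes yields ints in the range 0..255
        counts[b] += 1
    total = len(data)
    if total == 0:
        # An empty file gets an all-zero vector instead of dividing by zero.
        return [0.0] * 256
    return [c / total for c in counts]


# Usage: the first four bytes of a typical PE file, "MZ" followed by 0x90 0x00.
vec = byte_histogram(b"MZ\x90\x00")
print(vec[ord("M")])  # 0.25
```

Only the numeric vector would need to be shared, which is the point of the avatar approach: the vector describes the file without carrying its executable content.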
As a result, EMBER could help cybersecurity researchers quickly train and test new algorithms, leading to better, more adaptable malware-hunting AI tools.
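As a rough illustration of training and then testing an algorithm on such feature vectors, the sketch below uses a deliberately simple nearest-centroid classifier over made-up two-dimensional vectors (label 0 for benign, 1 for malicious). It is not the model Endgame released; the functions and the toy data are hypothetical and serve only to show the train/test workflow.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def train_centroids(vectors, labels):
    """Hypothetical trainer: average the feature vectors of each class."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, vec):
    """Assign the label whose class centroid is closest to vec."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))


def accuracy(centroids, vectors, labels):
    """Fraction of held-out vectors that are classified correctly."""
    correct = sum(predict(centroids, v) == y for v, y in zip(vectors, labels))
    return correct / len(labels)


# Made-up training data: 0 = benign, 1 = malicious.
train_vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_labels = [0, 0, 1, 1]
centroids = train_centroids(train_vectors, train_labels)

# Made-up held-out data used for testing.
test_vectors = [[0.7, 0.3], [0.3, 0.7]]
test_labels = [0, 1]
print(accuracy(centroids, test_vectors, test_labels))  # 1.0
```

Real detectors use far larger feature vectors and stronger models, but the workflow is the same: fit on labeled representations, then measure performance on samples held out from training.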
Hyrum Anderson, the technical director of data science at Endgame, also acknowledged the risk of making the dataset openly available: malware creators could use it to design malware that evades detection. However, he said he hopes EMBER's benefits outweigh the threats involved.