Artificial intelligence (AI) models that can identify relationships that exist amongst a large number of data points are not easy to design.
In fact, data scientists spend a considerable amount of time preprocessing the data on which models are trained: extracting useful features from it, narrowing down the choice of algorithms, and then building a system that performs well both in the lab and in the real world. Salesforce, however, has released a new tool meant to ease that burden.
Salesforce, a cloud computing firm based in San Francisco, recently published TransmogrifAI on GitHub. It is an automated machine learning library for structured data, the well-categorized kind commonly found in databases and spreadsheets. It performs feature selection, feature engineering, and model training in only three lines of code.
According to Salesforce, the new toolkit is written in Scala and built on top of Apache Spark, technologies similar to those that power Salesforce’s AI platform, Einstein. It was also designed from the ground up for scalability. As such, TransmogrifAI can process datasets ranging from dozens to millions of rows, and it can run either on an off-the-shelf laptop or on clustered machines on top of Spark.
Mayukh Bhaowal, Salesforce Einstein’s director of product management, told VentureBeat that TransmogrifAI converts raw datasets into custom models. The technology is an evolution of the company’s in-house machine learning library, which enabled the Einstein team to implement custom models for enterprise customers in a matter of hours.
Simplified Machine Learning
TransmogrifAI provides a three-step workflow. The first step is automated feature inference and feature selection. This is a vital part of model training, since choosing the wrong features could produce a model that is biased, inaccurate, or overly optimistic. With TransmogrifAI, users can specify a schema for their data, which the library uses to infer feature types automatically, for instance zip codes and phone numbers.
In a demonstration, Bhaowal showed how TransmogrifAI can rapidly isolate features such as addresses, emails, and job titles and determine whether they are predictive. According to him, the toolkit is well suited to dimensionality reduction.
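To make the idea of feature inference concrete, here is a minimal sketch in plain Python. It is not TransmogrifAI's API (the library is written in Scala); the patterns, the type names, and the helpers `infer_feature_type` and `infer_schema` are assumptions invented for illustration. It only shows how a column of raw strings might be mapped to a richer feature type such as an email, a zip code, or a phone number.

```python
import re

# Hypothetical patterns, chosen for illustration only. Order matters:
# a ZIP+4 code such as "10001-1234" would also satisfy the loose phone
# pattern, so zip_code is checked before phone.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "zip_code": re.compile(r"^\d{5}(-\d{4})?$"),
    "phone": re.compile(r"^\+?[\d\s().-]{7,}$"),
}


def infer_feature_type(values):
    """Return the first type whose pattern matches every non-empty value.

    Falls back to plain "text" when no pattern fits or the column is empty.
    """
    present = [v for v in values if v]
    if not present:
        return "text"
    for name, pattern in PATTERNS.items():
        if all(pattern.match(v) for v in present):
            return name
    return "text"


def infer_schema(columns):
    """Map each column name to its inferred feature type."""
    return {name: infer_feature_type(values) for name, values in columns.items()}


columns = {
    "contact": ["a@example.com", "b@example.org"],
    "postal": ["94105", "10001-1234"],
    "mobile": ["415-555-0100", "+1 (212) 555-0199"],
    "title": ["Engineer", "VP Sales"],
}
schema = infer_schema(columns)
```

A real system would go further and score each inferred feature for predictiveness before keeping it; that part is omitted here.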
The second step in TransmogrifAI’s workflow is automated feature engineering. The library converts structured data into numeric vectors, using the feature types inferred in the first step.
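The following plain-Python sketch illustrates what type-driven vectorization can look like. It does not reproduce TransmogrifAI's own transformations; the function `vectorize_column`, the type names, and the specific choices (mean imputation with a null indicator for numbers, one-hot encoding of the domain for emails) are assumptions made for the example.

```python
def vectorize_column(values, feature_type):
    """Turn one typed column into numeric columns.

    Returns (column_names, rows), where rows[i] is the vector for values[i].
    """
    if feature_type == "numeric":
        # Impute missing values with the column mean and flag them
        # in a separate null-indicator column.
        present = [v for v in values if v is not None]
        mean = sum(present) / len(present) if present else 0.0
        rows = [
            [v if v is not None else mean, 0.0 if v is not None else 1.0]
            for v in values
        ]
        return ["value", "is_null"], rows

    if feature_type == "email":
        # Keep only the domain, then treat it as a categorical value.
        values = [v.split("@", 1)[1].lower() if v else None for v in values]

    # Categorical fallback: one-hot encode the distinct non-missing values.
    categories = sorted({v for v in values if v is not None})
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, rows


age_names, age_rows = vectorize_column([30, None, 50], "numeric")
mail_names, mail_rows = vectorize_column(["a@x.com", "b@Y.org", None], "email")
```

The point of the design is that the feature type, not the user, decides which transformation applies to each column.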
Once that is done, the third step, automated model training, begins. At this point, TransmogrifAI runs a set of machine learning (ML) algorithms on the data and automatically chooses the best-performing model. It also samples and recalibrates the predictions to counter imbalanced data.
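Automated model selection of this kind is commonly done with cross-validation: each candidate is trained and scored on several train/test splits, and the one with the best average score wins. The sketch below shows that general technique in plain Python. The two toy classifiers and the helpers `k_fold_indices`, `cross_val_accuracy`, and `select_best` are hypothetical and are not taken from TransmogrifAI, and the sampling and recalibration steps are left out.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k interleaved folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]


class MajorityClass:
    """Baseline: always predict the more frequent training label (ties go to 1)."""

    def fit(self, xs, ys):
        self.label = 1 if sum(ys) * 2 >= len(ys) else 0
        return self

    def predict(self, xs):
        return [self.label for _ in xs]


class ThresholdStump:
    """Predict 1 when x exceeds the midpoint of the two class means.

    Assumes a single numeric feature and that class 1 has the larger mean.
    """

    def fit(self, xs, ys):
        pos = [x for x, y in zip(xs, ys) if y == 1]
        neg = [x for x, y in zip(xs, ys) if y == 0]
        if pos and neg:
            self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        else:
            # Only one class seen: predict it for every input.
            self.threshold = float("-inf") if pos else float("inf")
        return self

    def predict(self, xs):
        return [1 if x > self.threshold else 0 for x in xs]


def cross_val_accuracy(make_model, xs, ys, k=3):
    """Mean accuracy of a freshly built model over k folds."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = make_model().fit([xs[i] for i in train], [ys[i] for i in train])
        preds = model.predict([xs[i] for i in test])
        scores.append(sum(p == ys[i] for p, i in zip(preds, test)) / len(test))
    return sum(scores) / len(scores)


def select_best(candidates, xs, ys, k=3):
    """Score every candidate and return (best_name, all_scores)."""
    scores = {name: cross_val_accuracy(make, xs, ys, k) for name, make in candidates.items()}
    return max(scores, key=scores.get), scores


xs = [1, 2, 3, 4, 5, 6, 21, 22, 23, 24, 25, 26]
ys = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
best, scores = select_best({"majority": MajorityClass, "stump": ThresholdStump}, xs, ys)
```

On this small, cleanly separated dataset the threshold model beats the majority-class baseline, so it is the one selected.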
Lastly, what the senior director of data science at Salesforce Einstein calls model explainability is core to the toolkit’s training. The term refers to transparency about the factors that influence a model’s predictions. She added that it is vital that the generated model not be a black box, especially from a data privacy and trust perspective.
Coincidentally, TransmogrifAI’s release comes a few days after Oracle open-sourced GraphPipe, a tool that makes it easier to deploy machine learning models built with frameworks like Google’s TensorFlow in the cloud.