Data is one of the most important assets a business holds as we move into a data-driven economy built on artificial intelligence (AI). It’s needed to train machine learning algorithms and to measure how well a business is performing. Most companies can get hold of data. Getting hold of quality data is the challenge.
Dirty data is data with a lot of missing, duplicated, or irrelevant information. It may not look like much of an issue from the outside, but dirty data is estimated to cost the US economy around $3.6 trillion a year.
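To make the three kinds of dirty data concrete, here is a minimal Python sketch that counts them in a small set of records. The function name `dirty_data_report`, the sample records, and the field names (`site`, `issue`, `fax`) are all hypothetical, and treating an empty string as "missing" is an assumption; real datasets will need their own rules.

```python
from collections import Counter


def dirty_data_report(records, relevant_fields):
    """Summarise missing, duplicate, and irrelevant data in a list of dicts."""
    # Missing: a relevant field is absent, None, or an empty string (assumed rule).
    rows_with_missing = sum(
        1 for r in records
        if any(r.get(f) in (None, "") for f in relevant_fields)
    )
    # Duplicate: the same combination of relevant values appears more than once.
    keys = Counter(tuple(r.get(f) for f in relevant_fields) for r in records)
    duplicate_rows = sum(count - 1 for count in keys.values() if count > 1)
    # Irrelevant: fields that are collected but not needed for the task.
    irrelevant_fields = sorted({k for r in records for k in r} - set(relevant_fields))
    return {
        "rows": len(records),
        "rows_with_missing": rows_with_missing,
        "duplicate_rows": duplicate_rows,
        "irrelevant_fields": irrelevant_fields,
    }


# Hypothetical sample data, for illustration only.
records = [
    {"site": "A", "issue": "leak", "fax": "n/a"},
    {"site": "A", "issue": "leak", "fax": "n/a"},
    {"site": "B", "issue": "", "fax": "n/a"},
]
report = dirty_data_report(records, ["site", "issue"])
print(report)
```

A report like this will not clean anything by itself, but it gives a rough first measure of how dirty a dataset is before any AI work begins.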
Sergey Zelvenskiy is a machine learning engineer at ServiceChannel, where he works daily with AI to automate certain facilities management processes.
Here’s what he has to say about the data companies should be focusing on when looking to build AI applications: “The data that companies have may not necessarily be bad, it is just likely incomplete to solve the problem. There is a chicken and egg problem here. The original system is usually built to collect the data needed for human-driven solutions and moving it to an AI driven solution might require filling of the gaps.”
Companies should keep a few things in mind when using data for AI:
- Get the right kind of data. Not having the right data for the intended purpose will hold any company back and waste valuable time. Check at the start of a project whether you have the right data to solve the problem.
- Use existing expertise. As good as AI is, it still needs sufficient human expertise to understand and interpret the data it has found and to come up with a viable solution.
- Concentrate on the actual product. To get good data, companies need to entice users to contribute it. Users who have a good customer experience will be more willing to do so. One way companies can do this is through a user-in-the-loop model, in which anyone wanting to use the features of a product must first hand over some of their data. Google and Facebook are two well-known companies that use this model.
- Know your limits. There’s only so much that machine learning engineers and AI experts can do, so be patient and be realistic.
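
The first point, checking up front whether the data you have can actually solve the problem, can be sketched in a few lines of Python. This is only an illustration: the functions `field_coverage` and `find_gaps`, the 90% threshold, and the work-order fields are all hypothetical and not taken from any real system.

```python
def field_coverage(records, required_fields):
    """Return the share of records with a usable value for each required field."""
    total = len(records)
    coverage = {}
    for field in required_fields:
        # A value counts as usable if it is present and not empty (assumed rule).
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        coverage[field] = filled / total if total else 0.0
    return coverage


def find_gaps(coverage, threshold=0.9):
    """List the required fields whose coverage falls below the threshold."""
    return sorted(f for f, share in coverage.items() if share < threshold)


# Hypothetical records from a system built for human-driven work.
work_orders = [
    {"trade": "plumbing", "description": "leaking pipe", "resolution": "replaced valve"},
    {"trade": "hvac", "description": "no cooling", "resolution": ""},
    {"trade": "plumbing", "description": "blocked drain"},
    {"trade": "electrical", "description": "lights out", "resolution": None},
]
coverage = field_coverage(work_orders, ["trade", "description", "resolution"])
gaps = find_gaps(coverage)
print(coverage)
print(gaps)
```

Fields that show up as gaps are the ones the existing system was never built to collect reliably, which is where the data needs filling in before an AI-driven solution is realistic.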