How to improve AI for IT by focusing on data quality

See how high-quality data enhances AI accuracy and effectiveness, reducing risks and maximizing benefits in IT use cases.

Whether you’re choosing a restaurant or deciding where to live, data lets you make better decisions in your everyday life. If you want to buy a new TV, for example, you might spend hours looking up ratings, reading expert reviews, scouring blogs and social media, researching the warranties and return policies of different stores and brands, and learning about different types of technologies. Ultimately, the decision you make is a reflection of the data you have. And if you don’t have the data—or if your data is bad—you probably won’t make the best possible choice.

In the workplace, a lack of quality data can lead to disastrous results. The darker side of AI is filled with bias, hallucinations, and untrustworthy results—often driven by poor-quality data.

The reality is that data fuels AI, so if we want to improve AI, we need to start with data. AI doesn’t have emotion. It takes whatever data you feed it and uses it to provide results. One recent Enterprise Strategy Group research report noted, “Data is food for AI, and what’s true for humans is also true for AI: You are what you eat. Or, in this case, the better the data, the better the AI.”

But AI doesn’t know whether its models are fed good or bad data, which is why it’s crucial to focus on improving data quality to get the best results from AI for IT use cases.

Quality is the leading challenge identified by business stakeholders

When asked about the obstacles their organization has faced while implementing AI, 31% of business stakeholders involved with AI infrastructure purchases had a clear #1 answer: the lack of quality data. In fact, data quality ranked as a higher concern than costs, data privacy, and other challenges.

Why does data quality matter so much? Consider OpenAI’s GPT-4, which scored in the 92nd percentile and above on three medical exams, while an earlier model failed two of the three tests. GPT-4 is trained on larger and more recent datasets, which makes a substantial difference.

An AI fueled by poor-quality data isn’t accurate or trustworthy. Garbage in, garbage out, as the saying goes. And if you can’t trust your AI, how can you expect your IT team to use it to complement and simplify their efforts?

The many downsides of using poor-quality data to train IT-related AI models

As you dig deeper into the trust issue, it’s important to understand that many employees are inherently wary of AI, as with any new technology. In this case, however, the reluctance is often justified.

Anyone who spends five minutes playing around with a generative AI tool (and asking it to explain its answers) will likely see that hallucinations and bias in AI are commonplace. This is one reason why the top challenges of implementing AI include difficulty validating results and employee hesitancy to trust recommendations.

While price isn’t typically the primary concern regarding data, there is still a significant cost to training and fine-tuning AI on poor-quality data. The computational resources needed for modern AI aren’t cheap, as any CIO will tell you. If you’re using valuable server time to crunch low-quality data, you’re spending your budget on building an untrustworthy AI. So starting with well-structured data is imperative.

Four facets of high-quality, trustworthy data for IT use cases

To understand why the quality of data matters, let’s look at AI in IT, an area that has value for nearly every industry. New AI models for IT can reduce the number of help tickets, dramatically lower the time needed to resolve problems, and help you make better decisions by proactively highlighting potential issues before you purchase new software. In a field where a mistake can cost your organization millions of dollars at scale, a good AI solution is worth its weight in gold. But how do you ensure that it’s using good data?

The first thing to consider is the breadth of data. More data across more sources typically makes an AI more trustworthy, as long as you’re collecting good data. Think of it this way: a single restaurant review can offer a glimpse into its quality, but a restaurant with numerous reviews provides a more accurate assessment, allowing you to make a more informed decision. Was the one negative issue an outlier? Or is there a pattern that should be identified and evaluated? Similarly, an AI trained for IT on 10,000 data points collected every 15 seconds from endpoints will be more useful than an AI trained on 800 data points collected every 15 minutes.
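To put the two sampling rates in the comparison above side by side, here is a small back-of-the-envelope calculation. It assumes the quoted figures are data points per collection cycle; the function name and that interpretation are illustrative assumptions, not something the article specifies.

```python
SECONDS_PER_HOUR = 3600


def points_per_hour(points_per_sample: int, interval_seconds: int) -> int:
    """Data points collected in one hour at a fixed sampling interval."""
    samples_per_hour = SECONDS_PER_HOUR // interval_seconds
    return points_per_sample * samples_per_hour


# Assumption: the quoted figures are per collection cycle.
dense = points_per_hour(10_000, 15)       # 10,000 points every 15 seconds
sparse = points_per_hour(800, 15 * 60)    # 800 points every 15 minutes
ratio = dense / sparse

print(f"dense:  {dense:,} points per hour")
print(f"sparse: {sparse:,} points per hour")
print(f"ratio:  {ratio:.0f}x")
```

Under that reading, the denser collection produces 2,400,000 points per hour against 3,200, a 750-fold difference in the volume available for training. Volume alone says nothing about whether the data is good, which is the caveat the paragraph above already makes.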

Next, focus on data depth. The amount of data a model has from IT endpoints can make a significant difference. In one instance, a company had 3,000 systems crash after a software patch didn’t play nice within the existing setup. The IT team quickly resolved the issue using a patented AI that identifies correlations between their system changes and device anomalies. This process was possible because the AI had been trained on their unique datasets, including historical data.
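The patented method mentioned above is not described in this article, so the following is only a toy illustration of the general idea of relating a system change to device anomalies. It compares crash rates between devices that received a patch and devices that did not; the device records, field names, and `crash_rate` helper are all hypothetical.

```python
def crash_rate(devices: list[dict], patched: bool) -> float:
    """Fraction of devices in the patched (or unpatched) group that crashed."""
    group = [d for d in devices if d["patched"] is patched]
    if not group:
        return 0.0
    return sum(1 for d in group if d["crashed"]) / len(group)


# Hypothetical endpoint records: did the device get the patch, and did it crash?
devices = [
    {"id": "pc-1", "patched": True, "crashed": True},
    {"id": "pc-2", "patched": True, "crashed": True},
    {"id": "pc-3", "patched": True, "crashed": False},
    {"id": "pc-4", "patched": False, "crashed": False},
    {"id": "pc-5", "patched": False, "crashed": False},
    {"id": "pc-6", "patched": False, "crashed": True},
]

patched_rate = crash_rate(devices, patched=True)
unpatched_rate = crash_rate(devices, patched=False)
difference = patched_rate - unpatched_rate

print(f"patched crash rate:   {patched_rate:.2f}")
print(f"unpatched crash rate: {unpatched_rate:.2f}")
print(f"difference:           {difference:.2f}")
```

A large gap between the two rates is a hint worth investigating, not proof of cause. The more historical data each device contributes, the easier it is to tell a real pattern from coincidence.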

As AI trained for IT collects data, it’s crucial that the data is well-structured and as clean as possible. Most data sets will invariably have some noise (data that’s meaningless, irrelevant, or in some cases even corrupt), but training AI on high-quality, well-structured data makes all the difference.
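As a minimal sketch of what removing noise can look like in practice, the snippet below filters a batch of endpoint telemetry before it is used for training. The schema (`device_id`, `timestamp`, `cpu_percent`) and the validity rules are assumptions made for illustration; real pipelines will have their own fields and thresholds.

```python
from typing import Any

# Hypothetical schema: every record must carry these fields.
REQUIRED_FIELDS = ("device_id", "timestamp", "cpu_percent")


def is_clean(record: dict[str, Any]) -> bool:
    """Reject records with missing fields or an impossible CPU reading."""
    if any(record.get(field) is None for field in REQUIRED_FIELDS):
        return False
    cpu = record["cpu_percent"]
    if isinstance(cpu, bool) or not isinstance(cpu, (int, float)):
        return False
    return 0.0 <= cpu <= 100.0


def clean_records(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep valid records and drop repeats of the same device and timestamp."""
    seen: set[tuple[Any, Any]] = set()
    cleaned = []
    for record in records:
        if not is_clean(record):
            continue
        key = (record["device_id"], record["timestamp"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


sample = [
    {"device_id": "laptop-01", "timestamp": 1700000000, "cpu_percent": 42.5},
    {"device_id": "laptop-01", "timestamp": 1700000000, "cpu_percent": 42.5},   # duplicate
    {"device_id": "laptop-02", "timestamp": 1700000015, "cpu_percent": None},   # missing value
    {"device_id": "laptop-03", "timestamp": 1700000015, "cpu_percent": 250.0},  # out of range
    {"device_id": "laptop-04", "timestamp": 1700000030, "cpu_percent": 7},
]

cleaned = clean_records(sample)
print(f"kept {len(cleaned)} of {len(sample)} records")
```

Only the first and last sample records survive. Rejecting bad records up front is cheaper than spending compute training on them and discovering the problem in the model’s output.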

Finally, don’t forget about your people. AI is simply a tool. Change management and the impact of AI on humans are essential considerations when deciding which AI capabilities and use cases to introduce and (perhaps most important) when evaluating whether the AI you’re using for IT is delivering the most useful results for your organization.

As AI continues to transform nearly every industry, think of data as the ingredients in the best AI recipe. If the ingredients are bland, the power and nuance of AI are lost. AI that’s fed robust, rich data, however, delivers on all the promises and opportunities of well-trained models. From there, the results will follow.
