More than 10% of Fortune 500 Companies Use Cleanlab’s Automated Data Curation Platform to Overcome the Biggest Time and Cost Hurdle for Analytics, LLM, and AI Teams: Reliability.
Cleanlab, the company behind the automated data curation solution used to increase the dollar value of every data point in enterprise artificial intelligence (AI), large language model (LLM), and analytics solutions, has secured $25 million in Series A funding. This financing round was co-led by Menlo Ventures and TQ Ventures; Menlo Ventures’ Matt Murphy and TQ’s Schuster Tanger will join the board. Existing investor Bain Capital Ventures (BCV) and new investor Databricks Ventures participated in this funding round, which brings Cleanlab’s total funding to $30 million.
Cleanlab helps drive profitability. For today’s businesses, revenue is directly tied to data-driven analytics decisions and generative AI solutions. Bad data costs the U.S. alone over $3 trillion, and enterprises spend 80 percent of their time manually improving data quality. Cleanlab is the first enterprise solution that reliably adds smart metadata automatically, removing the vast majority of the work and turning messy, real-world data into useful inputs for various models. This process increases the reliability and profit margin of enterprise analytics, LLM, and AI decisions. Cleanlab also automatically identifies the portion of a dataset, typically the majority, that contains no issues, which raises the profit margins of enterprise pipelines by avoiding expensive data quality and annotation work on that data.
Cleanlab’s novel AI algorithms were developed in-house by the founders, all of whom are PhDs in Computer Science from MIT and published researchers. The team’s proprietary approach to automated data curation builds upon the “confident learning” field created by the Cleanlab team, enabling them to pioneer an enterprise-ready product.
Today, over 10% of Fortune 500 companies (including AWS, JPMorgan Chase, Google, Oracle, and Walmart) and a variety of innovative startups (like ByteDance, HuggingFace, and Databricks) use Cleanlab to find and fix problems in sizable structured and unstructured visual, text, and tabular datasets. Whether you are building an LLM for enterprise, tagging intents in chatbot text data, or labeling objects in visual navigation data, Cleanlab increases the dollar value of every data point in your dataset by automatically analyzing and correcting outliers, ambiguous data, and mislabeled data.
The company is also announcing that its flagship automated data curation platform, Cleanlab Studio, has launched several new features that address unreliable LLM outputs. Cleanlab’s Trustworthy Language Model (TLM) produces high-quality outputs, as ChatGPT, Falcon, and similar LLMs do, and also attaches a trustworthiness score to every output. Cleanlab Studio identifies and fixes issues in all types of datasets, including text, image, and tabular data. TLM extends Cleanlab Studio’s capabilities by adding intelligent metadata that helps automate reliability and quality assurance for systems that rely on LLM outputs, synthetic data, and generated content. Cleanlab’s Trustworthy Language Model is available to try in beta today with Cleanlab Studio at cleanlab.ai.
“After working with companies like Microsoft and Tesla to get their AI-driven products to function better and helping MIT and Harvard detect cheating, it became clear that mislabeled and poorly curated data was the core issue behind these challenges,” said Cleanlab Co-Founder and CEO Curtis Northcutt. “It’s the culmination of over a decade of work to introduce Cleanlab Studio, which reimagines what AI and analytics can do for people and enterprises now that we can automate data curation and reliability.”
“While most of the investment in generative AI is chasing the biggest, baddest, and best model, the reality is that there is a massive complementary opportunity that can shave billions off those efforts and lead to a better outcome. That is Cleanlab,” said Matt Murphy, Partner at Menlo Ventures. “Cleanlab’s amazing team of ML researchers and practitioners has built a data curation platform that fundamentally improves models via better, cleaner data.”
“We are thrilled to partner with Curtis, Jonas, and Anish, the eminent authorities on data-centric AI,” said Schuster Tanger, Co-Managing Partner of TQ Ventures. “They have developed a solution to a large and pressing problem for enterprises across almost all industries: namely, ambiguous and wrongly labeled data. In addition to an exceptional team and superior technology, Cleanlab also has real-world results from customers that point to Cleanlab’s effectiveness around percent accuracy improvement, percent reduction in labeled transactions required to train models, and dollar reduction in enterprise costs.”
“Cleanlab is well-designed, scalable, and theoretically grounded: It accurately finds data errors, even on well-known and established datasets,” said Patrick Violette, Senior Software Engineer at Google. “After using it for a successful project at Google, Cleanlab is now one of my go-to libraries for dataset cleanup.”