New Galileo Community Edition Helps Data Scientists Build Better Machine Learning (ML) Models, 10x Faster, Through Better Training Data
Galileo, the first ML data intelligence company for unstructured data, today announced Galileo Community Edition, a free version of its platform that enables data scientists working on Natural Language Processing (NLP) to build high-performing ML models quickly with better-quality training data. The free edition is available today and will be showcased during the Galileo Demo Hour on November 15: https://hopin.com/events/galileo-demo-hour.
More than 80% of the world’s data today is unstructured (text, image, speech, etc.). Before Galileo launched six months ago, no tool on the market supported debugging and fixing unstructured data during the ML workflow, so data scientists spent the vast majority of their time debugging data in Excel sheets and Python scripts, and putting high-quality models into production took months.
“While data powers ML, debugging unstructured data is incredibly manual and time-intensive. My co-founders Atindriyo Sanyal, Yash Sheth and I noticed a complete absence of data focused tooling for unstructured data ML while at Apple, Google and Uber AI. We repeatedly heard the same from data science teams across the globe. This is why we started Galileo – to build ML unstructured data tooling. Today we are making Galileo available for free through the Galileo Community Edition for any data scientist to sign up and get the superpowers to fix their ML data instantly,” said Vikram Chatterji, co-founder and CEO of Galileo.
Galileo instantly surfaces erroneous unstructured data (mislabels, imbalance, drifted data, etc.) and provides actions and integrations to fix it, all within one platform. By fixing data errors and selecting the highest-value production data, data scientists can cut the time needed to curate a high-quality training dataset from weeks to minutes.
Users on Galileo:
- Pratik Bhavsar, founding engineer at Enterpret, said: “Using Galileo quickly gave us a 10% absolute bump in our F1 score! It is like having an expert on the team that identifies data errors that would otherwise go unnoticed.”
- Talal Alqadi, data scientist at involve.ai, said: “Galileo helped us dramatically reduce the time required to find errors in our training data and improve our Named Entity Recognition (NER) model’s F1 score by ~50%, reduce false negatives by 2x and create a model that was able to generalize across multiple domains!”
- Viktoria Rojkova, vice president of data science at MasterControl, said: “Galileo is a very intuitive and powerful tool that helped us quickly curate a high quality training dataset ready for the real world. Galileo has been clearly created by fellow data scientists for data scientists.”
- Loreto Parisi, head of ML at Musixmatch, said: “Galileo has enabled us to build a NLP pipeline that instantly inspects training data and improves prediction quality, to assist human data curation across the entire ML lifecycle.”
With Galileo Community Edition, anyone can sign up for free and add a few lines of code while training a model with labeled data or during an inference run with unlabeled data. They can then use the Galileo UI to instantly inspect, find and fix data errors, or to select the right data to label next.
Galileo’s Demo Hour
Galileo’s online event kicks off at 10 a.m. PT on November 15 and features a fireside chat with Anthony Goldbloom (founder of Kaggle), lightning talks by customers on how they are debugging their unstructured data and building better ML models, and a live demo of Galileo Community Edition.
Galileo Raises $18 Million in Series A Funding and Dharmesh Thakker and Lip-Bu Tan Join the Board
Today Galileo also announced that it has raised $18 million in Series A funding, bringing the total raised to $23.1 million. The round was led by Battery Ventures, with participation from previous investor The Factory, new investors Walden Catalyst and FPV Ventures, and industry luminaries Anthony Goldbloom, Pegah Ebrahimi (former COO at Morgan Stanley) and Wesley Chan (former general partner at Google Ventures). Galileo plans to use the new funding to grow its engineering and go-to-market teams and to expand its platform to support new data modalities such as Computer Vision (CV).
“It’s no secret that the ML training and data quality problems are ballooning along with the rise in ML adoption. The Galileo team has been laser focused on this problem and has taken a unique approach to provide quick time-to-value with a category-defining product. Going forward, ML data intelligence will be table stakes for ML teams, and we feel Galileo is extremely well positioned to capitalize on this trend,” said Dharmesh Thakker, a general partner at Battery Ventures and Galileo board member.
“At Walden Catalyst, we’ve observed an exponential adoption of ML with unstructured data in enterprises as models get commoditized and ML accuracy is now increasingly dependent on the quality of the data the models are fed. At Apple, Google and Uber AI, the founders of Galileo faced the challenges of not having any solutions while working with unstructured data to find and fix ML data errors fast. They are tackling this fundamental problem head on with a first to market solution. This is a huge and critical problem in a rapidly growing enterprise market and we are excited to back them,” said Lip-Bu Tan, founding managing partner of Walden Catalyst and Galileo board member. Tan also sits on Intel’s board and has seen 130 of the companies he invested in go public.
- Read Battery Ventures’ blog: https://www.battery.com/blog/introducing-galileo
- Read Walden Catalyst’s blog: https://waldencatalyst.com/blog/galileo-empowers-companies-to-run-machine-learning-better-faster