
To Build Successful ML, You Have to Fail Fast and Early

Fueled by data, ML models find patterns and make predictions faster than humans. Victor Thu, President of Datatron, talks about best practices for ML deployment.

Machine learning projects can’t be approached in the same way as traditional software projects. Speed is of the essence – you need to be able to try things out, fix them and try them again. In other words, you’ve got to be able to fail fast and early.

Not your same old software approach

With a conventional software approach, you're intentionally programming using decision logic. You want to be as precise as you can, and therefore build in logic that allows the software to work as intended. Once the application's logic is built, there is no need for further changes other than bug fixes. It's a highly methodical process in which you need to ensure each sequential step is correct before moving to the next – you make incremental progress along the way. It's a well-established approach that's long been proven to work for software development.

But you can’t follow this same type of process with AI/ML projects – it just doesn’t work. You’ve got to flip the script. To be successful with an ML project, you need the ability to iterate, frequently and quickly. Training an ML model is a process, and you need to approach it with the expectation that it won’t be accurate the very first time it is released.

It’s a process that requires repeated iteration. In fact, 99% of the time, your initial model will encounter outcomes that are unexpected. Even if you spend months training your model in the lab, once it hits real-world data and traffic, it’s guaranteed to change.

Don’t let perfection become the enemy of good

This means you need to be able to get a model into production quickly so you can test it out and learn what adjustments are needed. Then you can adjust, release it again and refine it as needed. Because of this, you can’t get too focused on trying to make your model perfect before you test it out in production. It’s not going to be perfect the first time.

For certain use cases, the incremental benefit of pushing accuracy from 92% to 95% while the model is still being worked on in the lab may not be significant, because the model is still being trained on a small subset of training data. You could end up spending a lot of time and money to get that additional accuracy while losing out on the benefits your model could already be delivering.

In fact, before focusing too intently on driving model accuracy, it is even more critical to determine the business needs and impacts of using AI. We recently worked with a client who used the fail-fast strategy on their NLP model. Despite only having 65% accuracy, they released the model into production. Over time, they improved the accuracy from 65% to 68%. They found that even this small improvement translated into a 20% improvement in operational margin via a reduction in customer support calls.

Best practices for ML deployment

Sometimes ML scientists don’t want to put a model into production because of the risk of the model failing or not coming up with the right predictions. That’s understandable. What you need is a mechanism that enables you to see what’s taking place in real time. That way, you can quickly pull your model out of production and update it and then, within a short period, push a new model out. It’s actually the best practice for getting machine learning models into production – rather than getting stuck in an “analysis paralysis.”
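For illustration, here is a minimal sketch of what such a real-time mechanism might look like: a rolling health check that pulls a model out of production when live accuracy drops too far. The class name, thresholds and the registry client with its rollback method are assumptions for this sketch, not a specific product's API.

from collections import deque

class LiveGuardrail:
    """Rolling health check that pulls a model out of production when it underperforms."""

    def __init__(self, registry, model_id, fallback_id, window=500, min_accuracy=0.6):
        self.registry = registry              # hypothetical deployment/registry client
        self.model_id = model_id
        self.fallback_id = fallback_id
        self.min_accuracy = min_accuracy
        self.outcomes = deque(maxlen=window)  # 1 if a live prediction matched ground truth

    def record(self, prediction, actual):
        """Log each live prediction once ground truth arrives, then re-check model health."""
        self.outcomes.append(int(prediction == actual))
        window_full = len(self.outcomes) == self.outcomes.maxlen
        if window_full and sum(self.outcomes) / len(self.outcomes) < self.min_accuracy:
            # Quickly pull the underperforming model and fall back while it is updated.
            self.registry.rollback(self.model_id, to=self.fallback_id)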

It’s much better to just get the model up and running and let it gather “real life experience.” This doesn’t negate the need for the data scientists to do the best job they can with the model from the get-go; you still need to build your model properly. It’s just that once you’ve completed it, you should quickly get it running and start to collect that valuable data.

As part of this process, you may want to run your models in A/B testing mode or in shadow mode against real-world data, where you can compare the performance of the different models so you have ample data and evidence before deciding to promote or demote a model.
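A rough sketch of shadow mode is shown below: the candidate model scores the same live requests as the production model, but only the production output is returned to callers. The predict() interface, logger name and comparison log are assumptions made for illustration.

import logging

log = logging.getLogger("shadow_eval")

def serve(features, prod_model, candidate_model, comparison_log):
    """Serve the production prediction; score the candidate silently for later comparison."""
    prod_pred = prod_model.predict(features)         # the answer the caller actually sees
    shadow_pred = candidate_model.predict(features)  # computed but never returned
    comparison_log.append({"features": features, "prod": prod_pred, "shadow": shadow_pred})
    if prod_pred != shadow_pred:
        log.info("Model disagreement: prod=%s shadow=%s", prod_pred, shadow_pred)
    return prod_pred  # production behaviour is unchanged while evidence accumulates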

Another best practice to consider is building localized models instead of a single global model that predicts behavior for the entire macro environment. A single all-encompassing model would take a tremendous amount of time, data and effort to get right; with localized models, you can leverage data from specific scenarios so that each model performs appropriately for its own scenario. An example could be estimating demand for certain specialty burgers. If you based a global model on the population of New York City and applied it to the rest of North America, the model might work, but it most likely would not reflect demand correctly in other parts of the country – and you would lose out on the additional profit margin you could have attained with a localized approach.
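A hedged sketch of that localized approach, assuming tabular demand data with a region column and using scikit-learn's RandomForestRegressor purely as an illustrative model choice, could look like this: train one small model per region and route each prediction to the model for its region.

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def train_local_models(df: pd.DataFrame, region_col="region", target_col="weekly_demand"):
    """Fit one model per region, each using only that region's rows."""
    feature_cols = [c for c in df.columns if c not in (region_col, target_col)]
    models = {}
    for region, region_df in df.groupby(region_col):
        model = RandomForestRegressor(n_estimators=100, random_state=0)
        model.fit(region_df[feature_cols].values, region_df[target_col])
        models[region] = model
    return models

def predict_demand(models, region, feature_row):
    """Route each request to the model trained for its own region."""
    return models[region].predict([feature_row])[0]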

And of course, models need to be constantly updated. Unlike traditional software where you can set it and forget it, models need constant changes as data in the environment is changing regularly.

ML models lose value over time – if they don’t go through regular iteration, they decay. This iteration needs to happen over the lifetime of the model and must be monitored closely.
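One simple, illustrative way to watch for that decay is to compare rolling accuracy on recently labelled live data against the accuracy measured at deployment time. The threshold, window and example numbers below are assumptions, not recommendations.

def needs_retraining(recent_outcomes, baseline_accuracy, tolerance=0.05):
    """Flag a model for retraining when its rolling live accuracy decays past a tolerance."""
    if not recent_outcomes:
        return False  # no labelled live data yet, nothing to judge
    rolling_accuracy = sum(recent_outcomes) / len(recent_outcomes)
    return rolling_accuracy < baseline_accuracy - tolerance

# Example: deployed at 0.82 accuracy; recent labelled live traffic averages 0.60.
print(needs_retraining([1, 0, 1, 0, 1, 1, 0, 1, 0, 1], baseline_accuracy=0.82))  # True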

ML done fast and right

Machine learning models are very different from traditional software. But just as software developers have adopted agile methodology for DevOps, ML professionals benefit from a rapid deployment methodology for AI/ML models. For ML projects, you need a mechanism that enables you to get models up and running quickly. You need to be able to compare models – essentially testing one that’s running live against one that’s not. These and the other best practices noted above will help you avoid getting stuck in analysis paralysis, fail fast and early, and achieve scale with machine learning.

