Popular open-source tool Phoenix continues to expand what is possible in LLM evaluation, troubleshooting, and observability
Arize Phoenix, a popular open-source library for visualizing datasets and troubleshooting large language model (LLM)-powered applications, rolled out several industry-first capabilities in its latest release.
The update comes at a crossroads for generative AI, as new LLMOps tools race to keep up with the latest capabilities of foundation models. Over half (53.3%) of machine learning teams are planning production deployments of LLMs in the next year, but many continue to cite issues like hallucinations and responsible deployment as barriers to moving LLM-powered systems into the real world.
While the rise of LlamaIndex and LangChain has enabled developers to accelerate the development of applications powered by LLMs, the abstractions created by these frameworks can also make those applications complicated to debug. Phoenix’s new support for LLM traces and spans means that AI engineers and developers can get visibility at the span level and see exactly where an app breaks, with tools to analyze each step rather than just the end result.
This capability is particularly useful for early app developers because it doesn’t require them to send data to a SaaS platform to perform LLM evaluation and troubleshooting. Instead, the open-source solution provides a mechanism for pre-deployment LLM observability directly from their local machine. Phoenix supports all common span types and integrates natively with LlamaIndex and LangChain.
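To illustrate what span-level visibility means in practice, here is a minimal Python sketch. It is not Phoenix's API: the `Span` class, its fields, the `root_cause_spans` helper, and the sample trace are all hypothetical, chosen only to show how a trace can be modeled as a tree of spans and walked to find the step where a request actually broke.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One step of a trace (hypothetical model, not Phoenix's schema)."""
    name: str
    kind: str                      # e.g. "chain", "retriever", "llm"
    latency_ms: float
    error: Optional[str] = None    # None means the step succeeded
    children: List["Span"] = field(default_factory=list)


def root_cause_spans(span: Span) -> List[Span]:
    """Return the deepest failing spans under `span`.

    A parent that failed only because a child failed is skipped,
    so the result points at where the error originated.
    """
    failing: List[Span] = []
    for child in span.children:
        failing.extend(root_cause_spans(child))
    if not failing and span.error is not None:
        failing.append(span)
    return failing


# Illustrative trace: a query that retrieves documents, then calls an LLM.
trace = Span(
    name="query", kind="chain", latency_ms=950.0, error="downstream failure",
    children=[
        Span(name="retrieve", kind="retriever", latency_ms=120.0),
        Span(
            name="synthesize", kind="chain", latency_ms=800.0,
            error="downstream failure",
            children=[
                Span(name="llm_call", kind="llm", latency_ms=780.0,
                     error="context length exceeded"),
            ],
        ),
    ],
)

for failed in root_cause_spans(trace):
    print(f"{failed.name}: {failed.error}")  # prints "llm_call: context length exceeded"
```

Looking only at the end result would show a failed query; walking the spans shows that retrieval succeeded and the LLM call is where the request broke.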
The new Phoenix LLM evals library is also designed for fast and accurate LLM-assisted evaluations, making an evaluation LLM easy to put to use. Applying data science rigor to the testing of model and template combinations, Phoenix offers proven LLM evals for common use cases and needs around retrieval (RAG) relevance, reducing hallucinations, question-and-answer on retrieved data, toxicity, code generation, summarization, and classification. The Phoenix LLM evals library is optimized to run evaluations quickly with support for notebooks, Python pipelines, and app frameworks such as LangChain and LlamaIndex.
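As a rough illustration of testing a model and template combination, the Python sketch below scores a retrieval-relevance eval against hand-labeled examples. It does not use the Phoenix evals library: the template text, the `stub_judge` function (a keyword-overlap stand-in for a real evaluation LLM, used so the example runs offline), the `run_eval` helper, and the four labeled examples are all assumptions made for this sketch.

```python
from typing import Callable, List, Tuple

# Hypothetical prompt template for a relevance eval.
RELEVANCE_TEMPLATE = (
    "Question: {question}\n"
    "Document: {document}\n"
    "Answer with exactly one word: relevant or irrelevant."
)

STOPWORDS = {"a", "an", "the", "is", "of", "what", "how", "do"}


def _content_words(text: str) -> set:
    """Lowercased words with basic punctuation and stopwords removed."""
    return {w.strip("?.,").lower() for w in text.split()} - STOPWORDS


def stub_judge(prompt: str) -> str:
    """Stand-in for an evaluation LLM: 'relevant' if any content word overlaps."""
    lines = prompt.splitlines()
    question = lines[0].split(": ", 1)[1]
    document = lines[1].split(": ", 1)[1]
    overlap = _content_words(question) & _content_words(document)
    return "relevant" if overlap else "irrelevant"


def run_eval(
    examples: List[Tuple[str, str, str]],
    judge: Callable[[str], str],
) -> Tuple[float, float]:
    """Return (precision, recall) of the judge on the 'relevant' label."""
    tp = fp = fn = 0
    for question, document, gold in examples:
        prompt = RELEVANCE_TEMPLATE.format(question=question, document=document)
        predicted = judge(prompt).strip().lower()
        if predicted == "relevant" and gold == "relevant":
            tp += 1
        elif predicted == "relevant":
            fp += 1
        elif gold == "relevant":
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hand-labeled examples: (question, document, gold label). Illustrative only.
examples = [
    ("What is a span?", "A span records one step of a trace.", "relevant"),
    ("What is a span?", "Bananas are rich in potassium.", "irrelevant"),
    ("How do evals work?", "The weather is sunny today.", "irrelevant"),
    ("How do evals work?", "An evaluation LLM grades each response.", "relevant"),
]

precision, recall = run_eval(examples, stub_judge)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=1.00 recall=0.50
```

The stub misses the last example because "evals" and "evaluation" share no exact word, which is the kind of gap that scoring a judge against labeled data is meant to expose before the eval is trusted. Swapping `stub_judge` for a function that calls a real model would leave `run_eval` unchanged.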
“As LLM-powered applications increase in sophistication and new use cases emerge, deeper capabilities around LLM observability are needed to help debug and troubleshoot. We’re pleased to see this open-source solution from Arize, along with a one-click integration to LlamaIndex, and recommend any AI engineers or developers building with LlamaIndex check it out,” said Jerry Liu, CEO and Co-Founder of LlamaIndex.
“Large language models are poised to transform industries and society, but when it comes to robust performance, going from toy to production remains a challenge,” said Jason Lopatecki, CEO and Co-Founder of Arize AI. “These industry-first updates from Phoenix promise to provide better LLM evals and deeper troubleshooting to make complex LLM-powered systems ready and reliable in the real world.”