
Datadog Highlights AI/ML and AWS Monitoring Capabilities at re:Invent

Customers like AppFolio, Cash App and andsafe leverage Datadog to monitor their AWS environments

Datadog, Inc. (NASDAQ: DDOG), the monitoring and security platform for cloud applications, today at AWS re:Invent highlighted its continued investment in its Amazon Web Services (AWS) monitoring product portfolio, which covers all aspects of a customer’s tech stack, including AI/ML applications as well as serverless and containerized environments. Customers like AppFolio, andsafe, Asana, Maersk, Cash App, Sweetgreen, The PlayStation Network and Twilio use Datadog to monitor AWS services through more than 100 unique integrations.

“We continue to see companies rely on Datadog for enterprise-scale observability at an accelerated rate,” said Yanbing Li, Chief Product Officer at Datadog. “Trends like AI/ML, cloud migration, serverless and containers—and the need to monitor and optimize resources for all these areas—have helped to accelerate this growth as companies search to better understand their LLM usage, infrastructure performance and cloud costs.”

Datadog now offers over 100 unique AWS service integrations, including for AI/ML services:

  • AWS Trainium and AWS Inferentia ML chip monitoring to help customers optimize model performance and resource efficiency, prevent service interruptions and scale their infrastructure as ML workloads grow.
  • Amazon Q to help developers easily query and interact with Datadog directly in the AWS Management Console, using natural language.
  • Amazon Bedrock to allow teams to monitor foundation model (FM) usage, API performance and error rates with runtime metrics and logs.
  • Amazon SageMaker to allow data scientists and engineers to collect, visualize and alert on Amazon SageMaker metrics so they can flag issues quickly and identify opportunities to improve the performance of ML endpoints and jobs.

“The Datadog LLM Observability solution helps our team understand, debug and evaluate the usage and performance of our GenAI applications. With it, we are able to address real-world issues, including monitoring response quality to prevent negative interactions and performance degradations, while ensuring we are providing our end users with positive experiences,” said Kyle Triplett, VP of Product at AppFolio.

“We explored a bunch of different hosted solutions and found that SageMaker solved all the problems that we were encountering. And we did some stress testing with it and it held up to the traffic that we expected to be sending through the system,” said James Adams, Machine Learning Engineering Manager at Cash App. “With Datadog, it has all these AI integrations—including SageMaker—that we’re using heavily.”

“andsafe has been all in on Amazon Web Services since day one and our infrastructure is based on microservices which are running on Amazon EKS,” said Marcel Drechsler, Senior Cloud Solutions Engineer at andsafe. “To monitor the resource consumption, we are utilizing the container monitoring tools of Datadog. As a result, we were able to decrease the resource consumption and make the process much faster.”

Learn more about how Datadog helps teams monitor every layer of their AWS environments, and visit Datadog at AWS re:Invent 2024 at booths #832 and #1728. Datadog will host a webinar to recap the announcements made at re:Invent—register here.
