
Scality RING boosts genomics with petabyte-scale data lake

Data-centric organizations in healthcare, financial, and travel services trust Scality RING as the foundation for AI-powered data lakes

Scality, a global leader in cyber-resilient storage for the AI era, today announced a large-scale deployment of its RING distributed file and object storage solution to optimize and accelerate the data lifecycle for high-throughput genomics sequencing laboratory SeqOIA Médecine Génomique. This is the most recent in a series of deployments where RING is leveraged as a foundational analytics and AI data lake repository for organizations in healthcare, financial services and travel services across the globe.

Selected as part of the France Médecine Génomique 2025 (French Genomic Medicine Plan), SeqOIA is one of two national laboratories integrating whole genome sequencing into the French healthcare system to benefit patients with rare diseases and cancer.

SeqOIA adopted Scality RING to aggregate petabyte-scale genetic data used to better characterize pathologies and to guide genetic counseling and patient treatment. RING gives SeqOIA biologists fast access, from thousands of compute nodes, to nearly 10 petabytes of data across its lifecycle, from lab data to processed data, at a cost three to five times lower than that of all-flash file storage.

“RING is the repository for 90% of our genomics data pipeline, and we see a need for continued growth on it for years to come,” said Alban Lermine, IS and Bioinformatics Director of SeqOIA. “In collaboration with Scality, we have solved our analytics processing needs through a two-tier storage solution, with all-flash access of temporary hot data sets and long-term persistent storage in RING. We trust RING to protect the petabytes of mission-critical data that enable us to carry out our mission of improving care for patients suffering from cancer and other diseases.”

Scality RING powers AI data lakes for other data-intensive industries:
Customers report 59% lower TCO, 366% 5-year ROI and 34% more productive end users.

National insurance provider:
Scality RING powers AI-driven analytics for claims processing

One of the largest publicly held personal lines insurance providers in the United States chose RING as its preferred AI data lake repository for insurance claims analytics. The provider chose RING to replace its HDFS (Hadoop Distributed File System).

The customer has realized three times better space efficiency and corresponding cost savings, along with higher availability through a multi-site RING deployment that supports site failover.

Global travel services:
1 petabyte a day to power the world’s travel

A multinational IT services company whose technology fuels the global travel and tourism industry uses Scality RING to power its core data lake. RING ingests one petabyte of new log data each day to maintain a 14-day rotating data lake. This requires RING to purge (delete) the oldest petabyte each day while simultaneously serving tens of gigabytes per second (GB/s) of read throughput to a cluster of Splunk indexers for analysis.
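The figures above imply a simple sizing calculation for a rotating data lake. The sketch below is a back-of-envelope illustration, not Scality's or the customer's sizing method: it assumes decimal units (1 PB = 10^15 bytes) and ingest spread evenly over 24 hours, and the function name `rotating_lake` is hypothetical.

```python
# Back-of-envelope sizing for a rotating (time-windowed) data lake.
# Assumptions: decimal units and ingest spread evenly across the day;
# real ingest is bursty, so peak rates will be higher than these averages.
PB = 10**15
GB = 10**9
SECONDS_PER_DAY = 24 * 60 * 60


def rotating_lake(daily_ingest_bytes: int, retention_days: int) -> dict:
    """Return steady-state capacity and average ingest/purge rates (hypothetical helper)."""
    # Once the window is full, the lake holds one day's ingest per retained day.
    capacity = daily_ingest_bytes * retention_days
    # Average write rate needed to absorb one day's ingest within that day.
    ingest_rate = daily_ingest_bytes / SECONDS_PER_DAY
    # In steady state, the oldest day is deleted as fast as a new day arrives.
    purge_rate = ingest_rate
    return {
        "capacity_pb": capacity / PB,
        "ingest_gb_s": ingest_rate / GB,
        "purge_gb_s": purge_rate / GB,
    }


stats = rotating_lake(1 * PB, 14)
print(stats)
```

With 1 PB per day and a 14-day window, the lake holds about 14 PB at steady state and must sustain roughly 11.6 GB/s of average writes and an equal rate of deletions, before counting the tens of GB/s of concurrent analytic reads.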

For data lake deployments, these organizations require trusted and proven solutions with a long-term track record of delivering performance and data protection at petabyte scale. For AI workload processing, they pair RING repositories with all-flash file systems in an intelligent tiering arrangement, as well as with leading AI tools and analytics applications, including Weka.io, HPE Pachyderm, Cribl, Cloudera, Splunk, Elastic, Dremio, Starburst and more. With strategic partners like HPE and HPE GreenLake, Scality can deliver managed AI data lakes. Learn more about how to unlock the full value of data wherever it lives at www.hpe.com.

Trusted and proven for AI-powered data lakes at petabyte-scale
Fast data processing is a no-brainer for any AI deployment, but to support world-class, petabyte-scale infrastructures, RING is the only solution that can give customers:

  • Cost savings with 366% five-year ROI
  • Best price/performance through optimal use of flash and HDD
  • Peace of mind with CORE5 end-to-end cyber-resiliency

“Selecting RING was the best decision for us at SeqOIA. RING provides the complete package of features for AI-powered data lakes,” said Alban Lermine. “RING is the most secure, scalable and cost-effective repository for petabyte-scale unstructured data on the market. We can collect, pre-process and analyze data from multiple data sources at dozens of GB/s.” 

RING S3 object storage for AI is unmatched with support for:

  • Retrieval-augmented generation (RAG) access from retrieval- and generative-based artificial intelligence models.
  • Integrated hybrid-cloud capabilities that enable RING to replicate and tier data to external public cloud services for integration with popular AI tools in AWS, Azure and Google.
  • The customer’s choice of hybrid or all-flash storage servers.
  • CORE5 end-to-end cyber-resiliency capabilities to provide ransomware protection.

This combination of capabilities provides customers with a trusted data lake storage solution across multiple stages of the pipeline, including data collection, cleansing, analysis, model development and training. RING provides organizations with high-performance, unbreakable data storage at an economical price point, enabling tens to hundreds of petabytes of long-term AI data.

For more information about Scality AI data lakes, visit scality.com/AI/data-lake
