New Research by Zectonal™ Uncovers Data Poison Payload Vulnerability

Log4jShell exploit embedded in big data files bypasses traditional firewall security, resulting in polluted data lakes and poisoned AI machine learning models

Zectonal™, an innovator in data observability for artificial intelligence (AI), has released research findings on a new threat vector for enterprise data lakes resulting from the Java Log4jShell vulnerability. In its research article titled “An Attack Vector for Data Supply Chains and Data Lakes Using Data Payloads as Exploits,” Zectonal engineers describe how a single string of text embedded within a malicious big data file payload can trigger a data poisoning attack on an enterprise data lake.

According to Zectonal’s findings, the Java Log4jShell vulnerability can be triggered once a malicious payload is ingested into a target data lake or data repository via a data pipeline, bypassing conventional safeguards such as application firewalls and traditional scanning devices. Once the vulnerability is triggered via a no-code open-source extract-transform-load (ETL) software application, an attacker can access the ETL service running in a private subnet from the public Internet via a remote code execution (RCE) exploit.

Because the big data file carrying the poison payload is often encrypted or compressed, detection is much more difficult. Once the vulnerability is triggered in the data lake, attackers can use it to execute different forms of attack, including poisoning AI machine learning training data. This type of subversion renders machine learning models ineffective and can be very difficult to detect.
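
To illustrate why compression complicates detection, the following is a minimal Python sketch of a pre-ingestion check. It is not Zectonal's method or product logic. The function name `find_jndi_lookups`, the pattern, and the sample data are hypothetical, and the sketch assumes a gzip-compressed, line-oriented text file such as a CSV. It matches only the plain `${jndi:` lookup prefix associated with this class of exploit; obfuscated variants and encrypted files would pass through undetected, which is part of what makes the threat hard to catch.

```python
import gzip
import re

# Hypothetical pattern: matches only the plain, unobfuscated lookup prefix.
JNDI_PATTERN = re.compile(rb"\$\{jndi:[^}\s]*\}?", re.IGNORECASE)

# The first two bytes of every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def find_jndi_lookups(raw: bytes) -> list[tuple[int, str]]:
    """Return (line_number, matched_text) pairs for suspicious strings.

    A scanner that inspects only the compressed bytes would never see
    the string, so gzip content is decompressed before matching.
    """
    if raw.startswith(GZIP_MAGIC):
        raw = gzip.decompress(raw)

    hits = []
    for line_number, line in enumerate(raw.splitlines(), start=1):
        for match in JNDI_PATTERN.finditer(line):
            text = match.group().decode("utf-8", errors="replace")
            hits.append((line_number, text))
    return hits
```

A pipeline could call `find_jndi_lookups` on each incoming file and quarantine any file that returns a non-empty list before it reaches the ETL service. This is a sketch of the idea only and does not replace patching the vulnerable software.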

“The simplicity of the log4jShell exploit is what makes it so nefarious,” said David Hirko, Founder of Zectonal. “This particular attack vector is difficult to monitor and identify as a threat due to the fact that it blends in with normal operations of data pipelines, big data distributed systems, and machine learning training algorithms. Understanding the nature of the threat is the first step in creating safeguards against it.”

As part of its research, the Zectonal team successfully gained remote access to an ETL software service with private subnet IP addresses that was part of a virtual private cloud hosted by a public cloud provider. An industry report states that the ETL software vendor’s components have been downloaded millions of times since they were first introduced. The ETL software vendor created patches that remedy this specific exploit immediately following the initial Log4jShell disclosure. The Zectonal team was successful in triggering an RCE exploit for multiple unpatched releases of the ETL software that spanned a two-year period. The team’s research also demonstrated that Zectonal’s Deep Data Inspection™ data observability software detected and protected data lakes and data warehouses from these kinds of attacks.

The report states: “We believe the data flow, data processing, and cloud architectures used in our research are common and realistic… We recognize that there are many different architectures, software applications, and advanced security techniques that organizations use to achieve similar outcomes…. Our objective in this article is to bring awareness to the emerging threat vectors by demonstrating a very credible and realistic data and AI poisoning attack vector.”
