Learn how to manage data privacy in generative AI applications with an effective governance framework and advanced data infrastructure.
Over the past year, we have seen the transformative effects of generative AI (GenAI) take hold, but we have also seen our fair share of false starts. This is because many organizations haven’t thought through the regulatory compliance issues, including data privacy. They run full speed ahead with their GenAI projects only to slam into a wall when they realize that this issue hasn’t been adequately addressed. But once organizations do address data privacy, the floodgates of innovation will open wide, and to that end, we’d like to offer insight on how to manage data privacy in a GenAI context.
Technology Is Not a Panacea
First, there is no silver bullet when it comes to addressing data privacy in AI. Technology, including privacy-enhancing technology (PET), can help in building out compliance processes, but no single technology acts as a data-privacy panacea. That said, a technology solution can implement an AI governance framework, but it will only be effective to the extent that it follows from such a framework. So before we say any more about technology, we need to provide some background on AI governance.
AI Governance Is a Collaborative Process
Regardless of whether an organization chooses to start a new committee or leverage an existing body, AI governance requires participation from a diverse group of professionals with different perspectives, including technical, legal, compliance, and, of course, privacy. An interdisciplinary group of this kind would choose the appropriate framework, aligned with current guidance and regulations, such as the National Institute of Standards and Technology (NIST) AI Risk Management Framework and the EU AI Act, and build out processes for responsible AI adoption.
One milestone might be to build an inventory of approved AI use cases and set up a process for reviewing and approving new ones. Each participant would offer a different perspective, aligning their industry-specific compliance needs with emerging best practices related to AI. Ideally, such a group would not be led by a single person; instead, the group would operate as a collective, and maintaining its cohesion and adaptability will be key to an organization’s ability to innovate and adopt new AI use cases quickly and responsibly.
The Role of the Privacy Professional in the AI Governance Process
At many organizations, the privacy professional plays a key role in setting up processes for AI governance. This may be because, before there were AI laws such as the EU AI Act, there was the General Data Protection Regulation (GDPR) and other privacy laws, giving privacy programs a “first-to-market” advantage. Organizations with solid privacy governance foundations have found it easier to layer AI governance on top and build on that foundation. Many privacy professionals have stepped up to the plate, applying fundamental privacy principles such as privacy-by-design and privacy-by-default and performing Privacy Impact Assessments for new AI use cases.
Much like privacy governance, AI governance requires change management, which privacy professionals are well accustomed to given the ever-evolving privacy landscape.
The Role of Data in Privacy Technology
After developing an AI governance framework, the next step is to begin to map out a technology solution for putting the framework into action. Such a solution will likely be composed of multiple components, the full architecture of which is outside the scope of this article. However, when progressing from the framework to the solution, it’s important to take a critical look at the data.
An organization may have just finished implementing a new cloud data lakehouse or data warehouse, and stakeholders might feel ready to hit their new resource with all of the organization’s open GenAI projects. However, they might find that it cannot support these projects, and that could be for several reasons. First, for AI accuracy, the available metadata needs to provide significantly more detail than it does for many other data projects. In particular, the metadata needs to include business context, to make sure that the right data is used to answer the right question. A related requirement is the ability to access a broad diversity of data spread across many different systems and applications, not just from within the central data repository. In some cases, this data may even come from outside the organization.
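To make the metadata point concrete, here is a minimal sketch of a catalog entry that carries business context alongside the usual technical metadata, and a helper that uses that context to pick the right datasets for a question. The field names, helper function, and scoring approach are all illustrative assumptions, not taken from any specific catalog product.

```python
from dataclasses import dataclass, field

# Hypothetical catalog entry: technical metadata plus business context.
@dataclass
class DatasetMetadata:
    name: str                     # physical dataset name
    source_system: str            # where the data lives (warehouse, SaaS app, ...)
    schema: dict                  # column name -> type
    business_domain: str = ""     # e.g. "customer churn", "insurance claims"
    business_definitions: dict = field(default_factory=dict)  # column -> plain-language meaning
    contains_pii: bool = False    # flags data needing privacy controls

def candidates_for_question(catalog: list[DatasetMetadata],
                            keywords: set[str]) -> list[DatasetMetadata]:
    """Rank datasets whose business context matches the question's keywords."""
    def score(m: DatasetMetadata) -> int:
        text = (m.business_domain + " " +
                " ".join(m.business_definitions.values())).lower()
        return sum(1 for k in keywords if k.lower() in text)
    return sorted((m for m in catalog if score(m) > 0), key=score, reverse=True)
```

Without the `business_domain` and `business_definitions` fields, nothing in the entry would distinguish two numerically similar tables, which is exactly the gap that causes an AI application to answer the right question with the wrong data.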
GenAI applications have further requirements: They need to access both structured and unstructured data, they need to handle extremely high volumes of data, and the workloads and compute costs of running their processes exceed what many organizations saw in the days before GenAI. Also, in order for GenAI applications to be trustworthy and reliable, they need to access all of this data in real time. This capability is difficult to support when data is widely distributed, and organizations do not want to replicate all of that data to the central data repository. Organizations need a way to provide AI applications with real-time access to widely distributed data.
Also, organizations need to enable better, more efficient self-service data access, so that every data request does not route back to the same core group of data engineers and data practitioners. With this capability, AI application developers can be far more effective and productive.
The heightened data requirements of GenAI raise the bar for data infrastructures to demonstrate privacy-by-design, providing functionality such as automated access control and the automatic redaction of sensitive data.
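As a toy illustration of automatic redaction, the sketch below strips a few common sensitive patterns from text before it reaches a GenAI prompt or training set. The patterns and placeholder labels are assumptions for illustration only; production systems rely on far more robust detection, such as ML-based PII classifiers and format-aware scanners.

```python
import re

# Illustrative detectors for a few common sensitive-data formats.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each detected sensitive value with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text
```

Applying redaction at the infrastructure layer, rather than inside each application, is one way to make privacy-by-design a default property of the data rather than a per-project decision.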
Finally, explainability, one of the key pillars of the EU AI Act, is essentially a problem waiting to be solved by the right data. Organizations need to be able to demonstrate to regulators, more or less on demand, what data a GenAI application used to deliver its response.
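One simple way to support that kind of on-demand demonstration is response-level lineage: every answer gets an audit record naming the data it drew on. The sketch below shows the idea with an in-memory log; the record structure and function names are illustrative assumptions, not requirements drawn from the EU AI Act itself, and a real system would write to an append-only audit store.

```python
import time
import uuid

# In-memory stand-in for an append-only audit store.
AUDIT_LOG: list[dict] = []

def record_response(question: str, sources: list[str], answer: str) -> str:
    """Store which data sources backed a GenAI answer; return the record's id."""
    response_id = str(uuid.uuid4())
    AUDIT_LOG.append({
        "response_id": response_id,
        "timestamp": time.time(),
        "question": question,
        "data_sources": sources,   # datasets/documents retrieved for this answer
        "answer": answer,
    })
    return response_id

def sources_for(response_id: str) -> list[str]:
    """Answer the regulator's question: what data produced this response?"""
    for rec in AUDIT_LOG:
        if rec["response_id"] == response_id:
            return rec["data_sources"]
    return []
```

The hard part in practice is not the logging itself but ensuring the `sources` list is accurate, which again comes back to having trustworthy metadata about where data lives and what it means.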
AI Privacy Dos and Don’ts
To summarize, here are a few specific dos and don’ts regarding the protection of privacy in GenAI applications:
- Don’t appoint a single “King” or a “Queen” to manage AI privacy; leverage the collective wisdom of individuals from diverse perspectives.
- Don’t assume that the centerpiece of your existing data infrastructure, be it a data warehouse, a data lake, or a data lakehouse, will support the data privacy requirements of your new GenAI application.
- Don’t assume that the way your organization has been managing data can support these new requirements.
- Do begin all AI data privacy activities, including research into available technology, with the development of an AI governance framework that will drive all subsequent activities.
- Do look closely at the distinct requirements of GenAI on your data architecture, such as broader metadata requirements, specifically business semantics and business context. How will you get near-real-time access to broadly distributed data, without having to run expensive replication jobs to continually copy data to massive data lakes or data warehouses? How can you enhance the productivity of your AI developers without requiring them to become data experts?
- Do, in the spirit of agility, fail fast, move fast, and experiment freely. This is not only a psychological mindset but a technology requirement. How can you make it easy for AI developers to access relevant, trusted data, so they can easily try out different ideas?
Standards and the Future of Data Privacy
Though we talked about a few standards in this short article, there are no standards in place for how to overcome many of the issues we discussed, such as ensuring that your organization’s data infrastructure provides privacy-by-design, that you have sound structures in place for demonstrating privacy compliance, that you have access to the right metadata, and that you can access data in real time.
As AI continues to become a more prominent feature of the technology landscape, now is the time for thought leaders, ethicists, civil rights leaders, and others, from all walks of life, to weigh in on AI from their unique perspectives, not only within their own organizations, as we described above, but in civil society. Those who actively participate in the technology conversation have an opportunity to be a part of monumental changes. In the U.S., there is the National Artificial Intelligence Advisory Committee (NAIAC), and in the E.U., there is the Committee on Artificial Intelligence (CAI), but there are many more.
This might also be a good time for the vendor community to join together and form a consortium or standards body to define what it means to have “good enough” metadata for GenAI, or what “privacy-by-design” means in the context of GenAI. Greater alignment around these issues would drive better interoperability as well as easier adoption of new GenAI innovations, not to mention stronger data privacy provisions.