Foundation models could help us achieve “perfect secrecy”

The digital assistants of the future promise to make everyday life easier. We may ask them to complete tasks such as booking accommodation for an out-of-town business trip based on the content of an email or answering open-ended questions that require a mixture of personal context and public knowledge. (For example: “Is my blood pressure within the normal range for someone my age?”)

But before we can achieve new levels of efficiency at work and at home, a big question needs to be answered: how can we provide users with strong and transparent privacy safeguards for the underlying personal information that machine learning (ML) models use to arrive at these answers?

If we expect digital assistants to facilitate personal tasks that involve a mix of public and private data, we will need the technology to provide “perfect secrecy”, or the highest possible level of privacy, in some situations. So far, prior methods have either ignored the privacy issue or provided weaker privacy guarantees.

Simran Arora, a third-year computer science PhD student at Stanford, studies the intersection of ML and privacy with Associate Professor Christopher Ré as her advisor. Recently, they set out to determine whether emerging foundation models – large ML models trained on massive amounts of public data – hold the answer to this pressing privacy question. The resulting paper was published in May 2022 on the preprint service arXiv, proposing a framework and proof of concept for using foundation models in the context of personal tasks.

Perfect Secrecy Defined

According to Arora, a guarantee of perfect secrecy satisfies two conditions. First, as users interact with the system, the likelihood of adversaries learning private information does not increase. Second, as multiple personal tasks are accomplished using the same private data, the likelihood of data being accidentally shared does not increase.
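
In the spirit of Shannon’s classical notion of perfect secrecy, the first condition can be written compactly. The notation below is illustrative rather than taken from the paper, with D standing for the user’s private data and T for everything an adversary can observe:

```latex
% Illustrative notation, not the paper's: D = the user's private data,
% T = the transcript an adversary observes across any number of
% personal tasks. Perfect secrecy asks that observing T teaches the
% adversary nothing new about D:
\Pr[D = d \mid T = t] = \Pr[D = d] \quad \text{for all } d \text{ and } t .
```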

With this definition in mind, she identified three criteria for evaluating a privacy system against the goal of perfect secrecy:

  1. Confidentiality: To what extent does the system prevent the leakage of private data?
  2. Quality: How does the model perform a given task when perfect secrecy is guaranteed?
  3. Feasibility: Is the approach realistic in terms of the time and cost involved in running the model?

Today, state-of-the-art privacy systems use an approach called federated learning, which lets multiple parties learn collective patterns without exchanging raw data. In this method, the model is sent to each user’s device and then sent back to a central server with that user’s updates. In theory, the source data is never revealed to other participants. Unfortunately, however, other researchers have discovered that it is possible to recover data from the exposed model.
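
As a rough, hedged illustration of those mechanics (a toy sketch, not any production system; the averaging step loosely mirrors the FedAvg recipe):

```python
# Toy sketch of one federated-learning round (illustrative only):
# the server ships the current model to each user, each user computes
# an update on data that never leaves the device, and only the updated
# weights travel back to be averaged.
import numpy as np

def local_update(global_weights: np.ndarray, private_data: np.ndarray,
                 lr: float = 0.1) -> np.ndarray:
    """Toy on-device step: nudge the weights toward the mean of the
    user's private data. Real systems would run SGD on a real loss."""
    gradient = global_weights - private_data.mean(axis=0)
    return global_weights - lr * gradient

def federated_round(global_weights: np.ndarray, user_datasets: list) -> np.ndarray:
    # 1. Server sends the model to every participant.
    # 2. Each participant trains locally; raw data stays on the device.
    updates = [local_update(global_weights, data) for data in user_datasets]
    # 3. Server averages the returned weights into the new global model.
    return np.mean(updates, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = np.zeros(3)
    users = [rng.normal(loc=i, size=(20, 3)) for i in range(5)]  # toy private data
    for _ in range(10):
        weights = federated_round(weights, users)
    print(weights)  # the global model reflects collective patterns
```

Only model weights cross the network here, but as noted above, those weights can still betray the data they were trained on.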

The popular technology used to improve the privacy guarantee of federated learning is called differential privacy, a statistical approach to protecting private information. This technology requires the implementer to set privacy parameters, which govern a trade-off between model performance and information privacy. It is difficult for practitioners to set these parameters in practice, and there is no legal standard for how the trade-off between confidentiality and quality should be struck. Although the chances of a breach may be very low, perfect secrecy is not guaranteed with a federated learning approach.
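
A minimal sketch of what that knob looks like in practice, assuming the common clip-and-add-noise recipe (the parameter names below are illustrative, not any particular library’s API):

```python
# Illustrative clip-and-add-noise step for a federated update. The
# noise_multiplier plays the role of the privacy parameter described
# above: more noise means stronger privacy but a worse model, and
# vice versa.
import numpy as np

def privatize_update(update: np.ndarray, clip_norm: float,
                     noise_multiplier: float,
                     rng: np.random.Generator) -> np.ndarray:
    # Bound any single user's influence on the aggregate.
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    # Add Gaussian noise scaled to that bound; this is the knob that
    # trades model quality for privacy.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw_update = np.array([0.8, -1.2, 0.3])
    print(privatize_update(raw_update, clip_norm=1.0, noise_multiplier=0.5, rng=rng))
```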

“Currently, the industry has been emphasizing statistical reasoning,” Arora explained. “In other words, how likely is someone to find out about my personal information? The differential privacy approach used in federated learning forces organizations to make choices between utility and privacy. It’s not ideal.”

A new approach with foundation models

When Arora saw how foundation models like GPT-3 perform new tasks from simple prompts, often without requiring additional training, she wondered if these capabilities could be applied to personal tasks while still offering greater confidentiality than the status quo.

“With these large language models, you can say ‘Tell me the sentiment of this review’ in natural language and the model generates the response – positive, negative or neutral,” she said. “We can then use the exact same model, without any updates, to ask a new question with personal context, like ‘Tell me the subject of this email.’”
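
In code, the idea reduces to swapping the prompt while keeping the frozen model fixed; `query_foundation_model` below is a hypothetical stand-in, not a real API:

```python
# Hypothetical stand-in for querying a frozen, pretrained foundation
# model with a natural-language prompt. The function name and behavior
# are placeholders, not a real library call.
def query_foundation_model(prompt: str) -> str:
    return "[model completion would appear here]"

# A "public" zero-shot task: sentiment of a review.
review = "The room was spotless and the staff were wonderful."
sentiment = query_foundation_model(
    "Tell me the sentiment of this review (positive, negative, or neutral): " + review
)

# A personal task handled by the exact same frozen model, with no
# additional training: only the prompt changes.
email = "Hi team, can we move Thursday's client meeting to 3 p.m.?"
subject = query_foundation_model("Tell me the subject of this email: " + email)

print(sentiment, subject)
```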

Arora and Ré began exploring the possibility of using off-the-shelf public foundation models within a private user silo to perform personal tasks. They developed a simple framework called Foundation Model Controls for User Secrecy (FOCUS), which proposes using a one-way data flow architecture to accomplish personal tasks while maintaining privacy.

The one-way aspect of the framework is critical because it means that in a scenario with different privacy scopes (i.e., a mix of public and private data), the public foundation model’s dataset is queried before the user’s private dataset, preventing private information from leaking back into the public scope.
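
One possible reading of that one-way flow, sketched in hypothetical code (this is an interpretation of the description above, not the authors’ FOCUS implementation, and both helper functions are stand-ins):

```python
# Hypothetical sketch of a one-way data flow: the public side is
# consulted first, and everything touching private data happens
# afterward, inside the user's silo, with nothing sent back out.

def query_public_model(prompt: str) -> str:
    """Query the public foundation model. Only prompts containing no
    private data are ever sent in this direction."""
    return "[generic task template]"

def query_private_model(prompt: str) -> str:
    """Run inference inside the user's private silo (for example, on a
    locally hosted copy of the model). Outputs stay in the silo."""
    return "[private answer]"

def run_personal_task(task_description: str, private_docs: list) -> str:
    # Step 1 (public scope, consulted first): fetch generic instructions
    # using public information only.
    template = query_public_model(
        "Write a reusable instruction for this task: " + task_description
    )
    # Step 2 (private scope, consulted second): apply those instructions
    # to private data locally. Because information only flows from the
    # public side to the private side, nothing derived from the private
    # documents can leak back out.
    answers = [query_private_model(template + "\n\n" + doc) for doc in private_docs]
    # Step 3: compose the final result locally.
    return "\n".join(answers)

if __name__ == "__main__":
    print(run_personal_task("summarize each email", ["email one", "email two"]))
```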

Testing the theory

Arora and Ré assessed the FOCUS framework against the criteria of confidentiality, quality, and feasibility. The results were encouraging for a proof of concept. FOCUS not only keeps personal data private, but also goes a step further, concealing the actual task the model was asked to perform as well as how the task was accomplished. Even better, this approach would not require organizations to set privacy parameters that trade off between utility and privacy.

In terms of quality, the foundation model approach rivaled federated learning on six of the seven standard benchmarks. However, it underperformed in two specific scenarios: when the model was asked to perform an out-of-domain task (one not represented in its training data) and when the task was run with smaller foundation models.

Finally, they examined the feasibility of their framework relative to a federated learning approach. FOCUS eliminates the many rounds of communication that federated learning requires and lets the pretrained foundation model do the work through inference, making the process faster and more efficient.

Foundation Model Risks

Arora notes that several challenges need to be addressed before foundation models can be widely used for personal tasks. For example, the performance drop of FOCUS when the model is prompted to perform an out-of-domain task is a concern, as is the slowness of inference with large models. For now, Arora recommends that the privacy community increasingly consider foundation models as a reference point and a tool when designing new privacy criteria and motivating the need for federated learning. Ultimately, the appropriate privacy approach depends on the user’s context.

Foundation models also introduce their own inherent risks. They are expensive to pretrain and can hallucinate or misclassify information when uncertain. There is also a fairness issue in that, so far, foundation models are mostly available for resource-rich languages, so a public model may not exist for every personal setting.

Another complicating factor is pre-existing data leaks. “If the base models are trained on web data that already contains leaked sensitive information, that raises a whole new set of privacy issues,” Arora acknowledged.

Looking ahead, she and her colleagues at Stanford’s Hazy Research Lab are investigating methods to encourage more reliable systems and enable in-context behaviors with smaller foundation models better suited to personal tasks on low-resource user devices.

Arora can imagine a scenario, not too far off, where you ask a digital assistant to book a flight based on an email mentioning scheduling a meeting with a client out of town. And the model will coordinate the logistics of the trip without revealing any details about the person or company you are going to meet.

“It’s still early days, but I hope the FOCUS framework and proof of concept will spur further study of the application of public foundation models to private tasks,” Arora said.

Nikki Goth Itoi is a staff writer for the Stanford Institute for Human-Centered AI.

This story originally appeared on Hai.stanford.edu. Copyright 2022
