May 20, 2024

The 'AI Revolution' Comes With Data Privacy Risks: What Consumers Should Know

Carol C. Villegas, Michael P. Canty, Danielle Izzo, Gloria J. Medina

Partners Michael P. Canty and Carol C. Villegas and Associates Danielle Izzo and Gloria Medina are the authors of the article "The 'AI Revolution' Comes With Data Privacy Risks: What Consumers Should Know," published in the New York Law Journal.

Tech companies developing artificial intelligence (AI) systems have consumed nearly all publicly available data on the internet, but this data alone is not enough to train their systems. Now, with little public data left to ingest, the tech industry is turning to consumers, collecting and using their personal data to train AI systems, a highly invasive data collection practice. Consumers should take note: these practices pose a variety of serious privacy issues.

In recent months, artificial intelligence and the "AI Revolution" have been at the forefront of business, media, and government attention. AI technology is evolving at a pace far faster than the development of applicable laws and regulations.

Big Tech companies are releasing extensive AI search features across platforms and enhancing their targeted marketing technology. For example, Google recently launched a new generative AI feature for Chrome, expanding Google's browsing and personalization features. Meta introduced an "Ask Meta AI anything" search function on its Instagram and Facebook social media platforms.

Additionally, OpenAI released ChatGPT, an artificial intelligence assistant that has garnered tremendous public attention. Now, other tech companies are working to keep pace.

These tech companies are scraping every corner of the internet to obtain the massive amounts of data required to train and launch their AI systems. Indeed, companies require billions of data points to build and train the algorithms and machine learning models underlying AI tools.1 And companies need far more, including real-time data, a hunger that places consumers and their personal data at risk and raises the question: what are the privacy costs to consumers?

Understanding AI Systems

To assess the privacy risks associated with AI technology, it is first important to understand the process for developing AI systems.

AI is powered by algorithms: mathematical procedures or rule sets designed to analyze data, discern patterns, and reach related conclusions.2 An "algorithm is simply a strand of coded instructions for completing a task or solving a problem," but the algorithms that power AI are far more complex, drawing conclusions from billions of data points and automatically incorporating each conclusion into the next analysis, thus building "knowledge."3
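
To make that contrast concrete, a simple algorithm really can be just a few coded instructions. The sketch below (an illustrative example written for this purpose, not code from any AI system) is a complete algorithm for a single task: finding the largest number in a list.

```python
# A simple algorithm: a short "strand of coded instructions"
# for completing one task -- finding the largest number in a list.
def find_largest(numbers):
    largest = numbers[0]
    for n in numbers[1:]:
        if n > largest:  # compare each value against the best seen so far
            largest = n
    return largest

print(find_largest([3, 41, 7, 19]))  # prints 41
```

The algorithms underlying AI tools follow the same basic idea, but chain together billions of such comparisons and feed each conclusion back into the next analysis.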

Machine learning models require an enormous amount of data in order to build this "knowledge."4

Data is used to train the AI algorithms through a repetitive "trial and error" process that exposes the algorithms to a wide variety of data points. In training, algorithms are presented with massive data sets and are tasked with drawing conclusions based on the data.5 Human employees or reviewers then conduct quality control checks on the algorithms' conclusions, or "outputs," to train the models to distinguish between correct and incorrect outputs.

This training builds the artificial knowledge.6
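
A toy example helps illustrate this trial-and-error loop. The sketch below (a minimal, hypothetical perceptron-style learner, not any company's actual training code) repeatedly compares the model's output to a human-supplied label and nudges the model's parameters toward the correct answer:

```python
# Toy "trial and error" training: the model guesses, a human-supplied
# label says whether the guess was right, and the model adjusts.
# Task: classify points (x1, x2) as 1 if x1 + x2 > 1, else 0.
data = [((0.2, 0.1), 0), ((0.9, 0.8), 1), ((0.3, 0.9), 1), ((0.1, 0.4), 0)]
w1, w2, bias, lr = 0.0, 0.0, 0.0, 0.1  # model parameters and learning rate

for epoch in range(25):               # repeated exposure to the data set
    for (x1, x2), label in data:
        output = 1 if w1 * x1 + w2 * x2 + bias > 0 else 0
        error = label - output        # the "quality control" comparison
        w1 += lr * error * x1         # fold the correction back into the model
        w2 += lr * error * x2
        bias += lr * error

print(w1, w2, bias)  # parameters ("knowledge") built by trial and error
```

Production systems replace this toy update rule with gradient descent over billions of parameters, but the shape of the loop (guess, compare to a label, adjust) is the same.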

Where Do Companies Get Training Data?

As discussed above, companies go to great lengths to obtain the massive data sets required to train their nascent AI systems. Recent reports reveal that OpenAI (developer of the ChatGPT AI system) used its own technology to scrape whatever data it could find on the internet to train its AI systems, but this data alone is not enough. As a result, OpenAI developed speech recognition tools to transcribe audio and collect information from YouTube videos and other video sources.7
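
Speech-to-text tooling of this kind is now widely available. The sketch below (using OpenAI's open-source whisper package and a placeholder file name, to illustrate transcription generally rather than OpenAI's internal pipeline) turns an audio track into training-ready text in a few lines:

```python
# Minimal transcription sketch: convert an audio file into text
# that could then be added to a training data set.
import whisper  # pip install openai-whisper

model = whisper.load_model("base")          # small general-purpose model
result = model.transcribe("interview.mp3")  # placeholder audio file
print(result["text"])                       # the transcribed text
```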

Worse than internet scraping, some companies purchase data sets containing consumers' personal data from data brokers, such as LexisNexis, which amass that data from the businesses consumers engage with, including car companies.8

Other companies, like Meta, collect data by embedding mobile app and web tracking tools, like the Meta Pixel and software development kits (SDKs), to surreptitiously intercept users' highly sensitive information, including location and health data.9
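
Mechanically, a tracking pixel is just a tiny invisible image hosted on the tracker's server; every time a page containing it loads, the image request itself leaks metadata about the visitor. The sketch below (a hypothetical Flask endpoint written for illustration, not Meta's actual code) shows the basic mechanism:

```python
# Sketch of a tracking pixel: a 1x1 transparent GIF whose request
# reveals the visitor's IP address, browser, and the page viewed.
from flask import Flask, Response, request

app = Flask(__name__)

# The classic 1x1 transparent GIF, byte for byte.
PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
         b"!\xf9\x04\x01\x00\x00\x00\x00"
         b",\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;")

@app.route("/pixel.gif")
def pixel():
    # The tracker logs identifying metadata carried by the image request.
    print(request.remote_addr,                # visitor's IP address
          request.headers.get("User-Agent"),  # browser and device details
          request.referrer)                   # the page that embedded the pixel
    return Response(PIXEL, mimetype="image/gif")

if __name__ == "__main__":
    app.run()  # any page embedding <img src=".../pixel.gif"> now reports its visitors
```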

Increasing Public Privacy Concerns

The data collection practices described above raise serious questions about individuals' data privacy. Indeed, companies have collected billions of data points across the many mobile apps and websites consumers use daily.

According to the Pew Research Center, 81% of American consumers fear that data collected by companies that work with AI will be used in ways that make them uncomfortable.10

And consumers are right to worry. A recent New York Times investigation found that "three major players in this race [for more data], OpenAI, Google, and Meta . . . [are] willing to do almost anything to get their hands on data, including ignoring, and in some cases, violating corporate rules and wading into a legal gray area."11

Risks of "Data at All Costs"

Companies have collected billions of data points from across the internet and are still seeking more. Now, consumers' most sensitive data is at risk of being used to train AI algorithms. Last year, prescription drug discount provider GoodRx was fined $1.5 million by the Federal Trade Commission for "shar[ing] customers' health data with Google, Facebook and other third parties . . . [even after] pledging not to share user data."12 And earlier this year, the public learned that Google collected "billions of personal records" from Chrome Incognito users after assuring them that their browsing information would not be tracked.13

Consumers also face the risk of having their data reviewed by human employees during the AI training process. As discussed above, AI algorithms are trained with massive data sets. However, these data sets require human annotation before they can be used in training. This means that human employees must review, categorize, and annotate data in preparation for AI training, exposing consumers' personal data to unknown company employees.14

Worse yet, consumers often cannot rely on terms of service and privacy policies to sufficiently disclose companies' true data collection and use practices. The FTC recently took action against several companies for misrepresenting their data practices to consumers. Most notably, in 2023, the FTC banned the mental health services app BetterHelp from sharing users' sensitive mental health information with third parties, including Meta and Snapchat, after BetterHelp had assured users that "it would not use or disclose their personal health data except for limited purposes, such as to provide counseling services."15

The FTC also took action against Epic Games, Inc. (creator of the video game Fortnite) for failing to take any steps to obtain consent from parents for its data sharing practices, despite having internal evidence that many of its users were under 13 years of age.16

Protect Data and Privacy

At this point in the "AI Revolution," data has been collected from millions of individuals and used without their knowledge or consent. What can consumers do to remedy these invasions of privacy and protect their data moving forward?

First, consumers can proactively protect their data by "opting out" of data sharing at the data broker level. Data brokers, such as LexisNexis, provide consumers with the ability to complete opt-out forms to stop certain personal data from being shared.17

Consumers may also consider enabling certain privacy settings, like Apple's "Ask App Not to Track" feature.18 However, there is no guarantee that Apple can enforce this setting across all apps, and, at best, these protections extend only to information accessed through Apple devices.

Consumers can also seek out browsing and messaging platforms that prioritize privacy, such as the Brave browser, the non-profit Tor browser, and the private messaging service Signal.

But is this enough? With so many apps and websites surreptitiously collecting data, consumers are left guessing as to which apps and websites track what, and how their personal data is actually used.

For consumers looking to protect their privacy, learn how their data is being used, and seek redress when their privacy rights are violated, one driver of systemic change across the industry is the legal system. Consumers can seek remedies for invasions of their privacy by filing claims in lawsuits and arbitrations.

Given that many claims are statutory in nature, meaning that a violation carries a penalty requiring an actual dollar payment (anywhere from hundreds to thousands of dollars per claim), consumers bringing a critical mass of claims can have beneficial attitude- and policy-shifting effects on companies that collect data in an unauthorized manner.

And beyond seeking monetary compensation, consumers can also seek injunctive relief, forcing companies to change their policies and behaviors. In dealing with the future of AI and data collection practices, injunctive relief is particularly attractive, allowing consumers to reshape companies' data collection and use practices on a systemic level for the benefit of all future consumers.


1 See Karen Hao, The Facebook Whistleblower Says Its Algorithms Are Dangerous. Here's Why., MIT Tech. Rev. (Oct. 5, 2021); see, e.g., Aatish Bhatia, Watch an A.I. Learn to Write by Reading Nothing but Jane Austen, N.Y. Times (Apr. 27, 2023).

2 See Algorithm, Merriam-Webster Dictionary Online (last visited May 3, 2024).

3 How do Algorithms Work?, Univ. of York (last visited May 3, 2024).

4 See Danilo Bzdok et al., Machine Learning: A Primer, 14 Nature Methods 1119 (2017); Cecilia Kang et al., Four Takeaways on the Race to Amass Data for A.I., N.Y. Times (Apr. 6, 2024) ("A.I. models become more accurate and more humanlike with more data.").

5 See Samuel R. Bowman, Eight Things to Know about Large Language Models (Unpublished Manuscript), NYU Courant Inst. of Mathematical Scis. (2023).

6 See, e.g., Munsif Vengattil & Paresh Dave, Facebook 'Labels' Posts by Hand, Posing Privacy Questions, Reuters (May 6, 2019).

7 See Cade Metz et al., How Tech Giants Cut Corners to Harvest Data for A.I., N.Y. Times (Apr. 8, 2024).

8 See Alice Holbrook, When LexisNexis Makes a Mistake, You Pay For It, Newsweek (Sept. 26, 2019) (reporting that LexisNexis "aggregates and sells consumer data ... helping other companies figure out whether to renew your insurance, approve your loan or offer you a job," etc.); Kashmir Hill, Automakers Are Sharing Consumers' Driving Behavior with Insurance Companies, N.Y. Times (Mar. 13, 2024).

9 See Kristin Cohen, Location, Health, and Other Sensitive Information: FTC Committed to Fully Enforcing the Law Against Illegal Use and Sharing of Highly Sensitive Data, Fed. Trade Comm'n (July 11, 2022) (explaining how SDKs collect sensitive information); see also In re Meta Pixel Healthcare Litig., No. 22-cv-03580 (N.D. Cal.) (alleging Meta's pixel tool intercepts and transmits sensitive patient health information from medical websites).

10 Michelle Faverio, Key Findings About Americans and Data Privacy, Pew Rsch. Ctr. (Oct. 18, 2023).

11 Cade Metz, A.I. Original Sin, N.Y. Times (Apr. 16, 2024) (emphasis added).

12 Ruth Reader, FTC Cracking Down on Companies That Share Customers' Health Data, Politico (Feb. 1, 2023).

13 Michael Liedtke, Google Will Purge Billions of Files Containing Personal Data in Settlement of Chrome Privacy Case, Associated Press (Apr. 1, 2024).

14 See Josh Dzieza, AI Is a Lot of Work, The Verge (Jun. 20, 2023).

15 FTC to Ban BetterHelp from Revealing Consumers' Data, Including Sensitive Mental Health Information, to Facebook and Others for Targeted Advertising, Fed. Trade Comm'n (Mar. 2, 2023).

16 Fortnite Video Game Maker Epic Games to Pay More Than Half a Billion Dollars over FTC Allegations of Privacy Violations and Unwanted Charges, Fed. Trade Comm'n (Dec. 19, 2022).

17 See, e.g., LexisNexis Opt-Out Form, LexisNexis (last visited May 3, 2024); Thomas Claburn, How to Spot OpenAI's Crawler Bot and Stop It Slurping Sites for Training Data, The Register (Aug. 8, 2023).

18 See If an App Asks to Track Your Activity, Apple, Inc. (last visited May 3, 2024).