Generative AI tools are trained on large datasets composed of content sourced from the internet, which reflects the ideas and beliefs of the content creators. The seminal research paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" demonstrates how "large datasets based on texts from the Internet overrepresent hegemonic viewpoints and encode biases potentially damaging to marginalized populations" (Bender et al., 2021, p. 610).
Viewpoints and biases that are dominant in GenAI training data will be similarly dominant in its output. As a result, text-generating GenAI tools can reproduce biases, along with hate speech or misinformation, that are present in the data they were trained on (Bender et al., 2021, p. 617).
Training data is typically owned and kept private by the technology companies that develop GenAI. Although end-users cannot assess training data, the output that chatbots generate from that data can affect our lives: GPT-based resume screening has been found to exhibit bias against applicants with disability-related credentials (Glazko et al., 2024), and large language models have been shown to repeat debunked, race-based medical claims (Omiye et al., 2023).
How does generative AI impact the environment?
The full extent of generative AI's environmental impact is not yet known, as AI companies have kept some of this information secret. However, the computing infrastructure and power required to create, operate, and use these technologies consume substantial energy and natural resources. For example, prompting ChatGPT for information requires four to five times the energy of conducting a Google search (Crawford, 2024).
Researchers have shown that it is possible to reduce the energy costs of generative AI by using more renewable energy, building data centers sustainably, and scheduling computation for times of day when low-carbon electricity is more plentiful (Saenko, 2024). These practices would require transparency and commitment from tech companies, as well as advocacy from users and policymakers.
How do humans contribute to the work of generative AI?
In the conversation surrounding GenAI, the text produced by chatbots is often presented as the result of machine intelligence alone. However, journalists have shown that the work of many human laborers is essential to the text generated by ChatGPT and other chatbots. According to an investigative report, "Behind even the most impressive GenAI systems are people — huge numbers of people labeling data to train it and clarifying data when it gets confused" (Dzieza, 2023).
The human labor used to train generative AI tools is often outsourced to underpaid workers in the Global South. For instance, workers in Kenya were paid less than $2 an hour to label toxic, often disturbing content (Perrigo, 2023). Some academics refer to these practices as "digital neocolonialism": Western tech companies exploit the labor and natural resources (for example, minerals used in computer hardware) of nations in the Global South, perpetuating the legacy of colonialism (Browne, 2023).
Copyright is a type of intellectual property law that gives authors control over how their original works are used and shared. In “Generative AI Is a Crisis for Copyright Law,” AI scholar Kate Crawford and technology law expert Jason Schultz (2024) argue that the rise of generative AI technologies necessitates a reassessment of copyright law. Twenty-first-century technology companies, they contend, are using copyright law, which was developed in the eighteenth century, to “exploit all the works of human creativity that are digitized and online” (Crawford & Schultz, 2024).
Technology companies take billions of works of human creativity from the internet, compile them into datasets, and use those datasets to train generative AI tools. Artists and authors, including George R.R. Martin, Ta-Nehisi Coates, and Junot Díaz, have expressed sorrow and outrage that private companies are using their copyrighted works to “train AI systems, which can then create images and texts that replicate their artistic style” (Crawford & Schultz, 2024). However, as of January 16, 2024, no court has ruled that AI training infringes on copyright (Crawford & Schultz, 2024).
The issues generative AI poses for copyright law are still being debated. Here are some recent updates:
Content adapted from University of California Irvine Libraries’ Generative AI and Information Literacy LibGuide written by Stacy Brinkman and April Urban.
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT '21) (pp. 610–623). https://doi.org/10.1145/3442188.3445922
Glazko, K., Mohammed, Y., Kosa, B., Potluri, V., & Mankoff, J. (2024). Identifying and improving disability bias in GPT-based resume screening. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT '24). https://doi.org/10.1145/3630106.3658933
Omiye, J. A., Lester, J. C., Spichak, S., Rotemberg, V., & Daneshjou, R. (2023). Large language models propagate race-based medicine. NPJ Digital Medicine, 6(1), 1–4. https://doi.org/10.1038/s41746-023-00939-z
Berreby, D. (2024, February 6). As Use of A.I. Soars, so Does the Energy and Water It Requires. Yale E360. https://e360.yale.edu/features/artificial-intelligence-climate-energy-emissions
Calma, J. (2023, October 10). The environmental impact of the AI revolution is starting to come into focus. The Verge. https://www.theverge.com/2023/10/10/23911059/ai-climate-impact-google-openai-chatgpt-energy
Crawford, K. (2024). Generative AI’s environmental costs are soaring — and mostly secret. Nature, 626, 693. https://doi.org/10.1038/d41586-024-00478-x
Kimball, S. (2024, May 5). AI could drive a natural gas boom as power companies face surging electricity demand. CNBC. https://www.cnbc.com/2024/05/05/ai-could-drive-natural-gas-boom-as-utilities-face-surging-electric-demand.html
Saenko, K. (2024, February 20). A Computer Scientist Breaks Down Generative AI’s Hefty Carbon Footprint. Scientific American. https://www.scientificamerican.com/article/a-computer-scientist-breaks-down-generative-ais-hefty-carbon-footprint/
Browne, G. (2023, May 25). AI is steeped in big tech’s ‘Digital colonialism.’ WIRED. https://www.wired.com/story/abeba-birhane-ai-datasets/
Dzieza, J. (2023, June 20). Inside the AI Factory: the humans that make tech seem human. The Verge. https://www.theverge.com/features/23764584/ai-artificial-intelligence-data-notation-labor-scale-surge-remotasks-openai-chatbots
Perrigo, B. (2023, January 18). Exclusive: OpenAI used Kenyan workers on less than $2 per hour to make ChatGPT less toxic. TIME. https://time.com/6247678/openai-chatgpt-kenya-workers/
Crawford, K., & Schultz, J. (2024, January 16). Generative AI Is a Crisis for Copyright Law. Issues in Science and Technology. https://issues.org/generative-ai-copyright-law-crawford-schultz/
Knibbs, K. (2024, December 19). Every AI Copyright Lawsuit in the US, Visualized. WIRED. https://www.wired.com/story/ai-copyright-case-tracker/
United States Copyright Office. (2024). Copyright and Artificial Intelligence: Part 1 - Digital Replicas. https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-1-Digital-Replicas-Report.pdf