
Using Generative AI (GenAI) for Research

Bias

Generative AI tools are trained on large datasets composed of content sourced from the internet, which reflects the ideas and beliefs of the content creators. The seminal research paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" demonstrates how "large datasets based on texts from the Internet overrepresent hegemonic viewpoints and encode biases potentially damaging to marginalized populations" (Bender et al., 2021, p. 610). 

Viewpoints and biases that are dominant in GenAI training data will be similarly dominant in its output. As a result, text-generating GenAI tools can reproduce biases, along with hate speech or misinformation, that are present in the data they were trained on (Bender et al., 2021, p. 617). 

Training data is typically owned and kept private by the technology companies that develop GenAI. Although end users cannot examine this training data, the output that chatbots generate from it can affect our lives: 

  • One study found that integrating these tools into healthcare systems could worsen discrimination against persons of color seeking medical care (Omiye et al., 2023). 
  • Another study showed that using GenAI for resume screening in hiring perpetuates disability bias (Glazko et al., 2024).

Environmental Impacts

The full extent of generative AI's environmental impact is not yet fully known, as AI companies have kept some of this information secret. However, the computing infrastructure required to create, operate, and use these technologies consumes substantial energy and natural resources. For example, prompting ChatGPT for information requires four to five times the energy of conducting a Google search (Crawford, 2024).

How does generative AI impact the environment?

  • Energy consumption: As computer scientist Kate Saenko (2024) explains, "The more powerful the AI, the more energy it takes." As AI applications permeate our academic, professional, and personal lives, researchers are concerned that the technology’s energy use and carbon footprint will skyrocket (Calma, 2023). In the near future, large AI systems are expected to consume as much energy as entire nations (Crawford, 2024). 
  • Water use: Generative AI data centers, the facilities that house the computing infrastructure behind GenAI, consume enormous amounts of water. Water is used for cooling the computer servers and other hardware that generate substantial heat during operation. Shaolei Ren, a professor of engineering at UC Riverside, "estimates that a person who engages in a session of questions and answers with GPT-3...drives the consumption of a half-liter of fresh water" (Berreby, 2024). Researchers suggest that, by 2027, the use of AI across the globe could require as much water as entire nations (Crawford, 2024).  
  • Greater demand for more energy resources: The computing power needed for generative AI has driven tech companies to seek out new energy sources. Companies have proposed nuclear energy (Crawford, 2024) or increased consumption of natural gas (Kimball, 2024) as solutions. Both have the potential to further strain an environment already stressed by climate change.

Researchers have shown that it is possible to reduce the energy costs of generative AI by using more renewable energy, implementing sustainable construction of data centers, and scheduling computation for times of day when cleaner energy is more available (Saenko, 2024). These practices would require transparency and commitment from tech companies and advocacy from users and policymakers.

Human Labor

In the conversation surrounding GenAI, the text produced by chatbots is often presented as the result of machine intelligence only. However, journalists have shown that the work of many human laborers is essential to the text generated by ChatGPT and other chatbots. According to an investigative report, "Behind even the most impressive GenAI systems are people — huge numbers of people labeling data to train it and clarifying data when it gets confused" (Dzieza, 2023).

How do humans contribute to the work of generative AI?

  • Annotation: People label data gathered from the Internet: for example, they assign emotions to people's voices in video calls or to the text of social media posts, or they categorize images of items such as clothing or food. These labels are used to train GenAI to recognize and assign categories to data.
  • Detecting toxic content: Similar to annotation, people identify and label "toxic" content from the Internet (including violent, disturbing, and harmful content). This is used to train the GenAI model to exclude such content from its generated text.
  • Reinforcement learning from human feedback (RLHF): People "converse" with a chatbot and rate its responses for qualities such as authenticity or helpfulness. Engineers use these ratings to train the GenAI tool to sound more "humanlike."

The human labor used to train generative AI tools is often outsourced to underpaid workers in the Global South. For instance, workers in Kenya were paid less than $2 an hour to label disturbing toxic content (Perrigo, 2023). Some academics refer to these practices as "digital neocolonialism": Western tech companies exploit the labor and natural resources (for example, minerals used in computer hardware) of nations in the Global South, further perpetuating the legacy of colonialism (Browne, 2023).

Copyright

Copyright is a type of intellectual property law that gives authors control over how their original works are used and shared. In “Generative AI is a Crisis for Copyright Law,” AI scholar Kate Crawford and technology law expert Jason Schultz (2024) argue that the rise of generative AI technologies necessitates a reassessment of copyright law. They argue that twenty-first century technology companies are using copyright law, which was developed in the eighteenth century, to “exploit all the works of human creativity that are digitized and online" (Crawford & Schultz, 2024).

Technology companies take billions of works of human creativity from the internet, compile them into datasets, and use those datasets to train generative AI tools. Artists and authors, including George R.R. Martin, Ta-Nehisi Coates, and Junot Díaz, have expressed sorrow and outrage that private companies are using their copyrighted works to “train AI systems, which can then create images and texts that replicate their artistic style” (Crawford & Schultz, 2024). However, as of January 16, 2024, no court has ruled that AI training infringes on copyright (Crawford & Schultz, 2024). 

The issues generative AI poses for copyright law are still being debated. Here are some recent updates: 

  • WIRED is following every copyright battle involving the AI industry and has created visualizations to track the progress of these cases (Knibbs, 2024). 
  • On July 31, 2024, the U.S. Copyright Office published Part 1 of its report Copyright and Artificial Intelligence, which covers the topic of digital replicas.

References

Bias 

Content adapted from University of California Irvine Libraries’ Generative AI and Information Literacy LibGuide written by Stacy Brinkman and April Urban.  

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT '21). https://doi.org/10.1145/3442188.3445922

Glazko, K., Mohammed, Y., Kosa, B., Potluri, V., & Mankoff, J. (2024). Identifying and improving disability bias in GPT-based resume screening. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT '24). https://doi.org/10.1145/3630106.3658933

Omiye, J. A., Lester, J. C., Spichak, S., Rotemberg, V., & Daneshjou, R. (2023). Large language models propagate race-based medicine. NPJ Digital Medicine, 6(1), 1–4. https://doi.org/10.1038/s41746-023-00939-z

 

Environmental Impacts

Content adapted from University of California Irvine Libraries’ Generative AI and Information Literacy LibGuide written by Stacy Brinkman and April Urban.  

Berreby, D. (2024, February 6). As Use of A.I. Soars, so Does the Energy and Water It Requires. Yale E360. https://e360.yale.edu/features/artificial-intelligence-climate-energy-emissions 

Calma, J. (2023, October 10). The environmental impact of the AI revolution is starting to come into focus. The Verge. https://www.theverge.com/2023/10/10/23911059/ai-climate-impact-google-openai-chatgpt-energy 

Crawford, K. (2024). Generative AI’s environmental costs are soaring — and mostly secret. Nature, 626, 693. https://doi.org/10.1038/d41586-024-00478-x 

Kimball, S. (2024, May 5). AI could drive a natural gas boom as power companies face surging electricity demand. CNBC. https://www.cnbc.com/2024/05/05/ai-could-drive-natural-gas-boom-as-utilities-face-surging-electric-demand.html

Saenko, K. (2024, February 20). A Computer Scientist Breaks Down Generative AI’s Hefty Carbon Footprint. Scientific American. https://www.scientificamerican.com/article/a-computer-scientist-breaks-down-generative-ais-hefty-carbon-footprint/ 

 

Human Labor 

Content adapted from University of California Irvine Libraries’ Generative AI and Information Literacy LibGuide written by Stacy Brinkman and April Urban.  

Browne, G. (2023, May 25). AI is steeped in big tech’s ‘Digital colonialism.’ WIRED. https://www.wired.com/story/abeba-birhane-ai-datasets/ 

Dzieza, J. (2023, June 20). Inside the AI Factory: the humans that make tech seem human. The Verge. https://www.theverge.com/features/23764584/ai-artificial-intelligence-data-notation-labor-scale-surge-remotasks-openai-chatbots 

Perrigo, B. (2023, January 18). Exclusive: OpenAI used Kenyan workers on less than $2 per hour to make ChatGPT less toxic. TIME. https://time.com/6247678/openai-chatgpt-kenya-workers/ 

 

Copyright

Crawford, K., & Schultz, J. (2024, January 16). Generative AI Is a Crisis for Copyright Law. Issues in Science and Technology. https://issues.org/generative-ai-copyright-law-crawford-schultz/ 

Knibbs, K. (2024, December 19). Every AI Copyright Lawsuit in the US, Visualized. WIRED. https://www.wired.com/story/ai-copyright-case-tracker/ 

United States Copyright Office. (2024). Copyright and Artificial Intelligence: Part 1 - Digital Replicas. https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-1-Digital-Replicas-Report.pdf