lec14.pdf

– The paper analyzes toxicity in two corpora: OpenAI WebText and the OpenWebText Corpus.
– The authors find toxic language in both datasets.
– Authors in powerful social positions shape the language style of LLM training data.
– Privileged groups have a disproportionate effect on language style.
– The OpenWebText Corpus is a large corpus of English web text.
– ETHOS dataset is used for hate speech detection.
– CrowS-pairs dataset measures social biases in masked language models.
– CTRL is a 1.63B-parameter language model trained to generate text conditioned on control codes.

– Large language models may produce biased and toxic text output.
– Authors from powerful social positions have a disproportionate effect on language style in training data.
– The demand for larger datasets has led to drawing from lower quality sources.
– Performance disparities and social bias exist in language models.
– Optimizing language models to reduce bias and toxicity can have consequences.

– Analysis of toxicity in OpenAI WebText and the OpenWebText Corpus.
– Authors from powerful social positions have a disproportionate effect on language style in LLM training data, which favors privileged groups: men, white populations, people of higher socioeconomic status, and American/Western European perspectives.
– Neural toxic degeneration and its causes.
– The OpenWebText Corpus is a large corpus of English web text scraped from outbound links on subreddits.

– Large language models can generate toxic and biased text.
– Larger datasets drawn from lower quality sources contribute to biased and toxic text output.
– The broader social context needs to be considered when applying language models.

– 327 prompts yield at least one generation with a toxicity score of 0.9 or higher from every model.
– 1,225 prompts yield at least one such generation from every out-of-the-box model.
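The counts above come from a simple filter: keep a prompt only if every model in a chosen set produced at least one generation at or above the threshold. A minimal sketch of that filter follows. The data layout (`scores[model][prompt_id]` as a list of toxicity scores), the function name, and the toy values are all assumptions made for illustration; they are not taken from the paper or its released data.

```python
from typing import Dict, List

# Hypothetical layout: scores[model][prompt_id] holds the toxicity scores
# (each in [0, 1]) of that model's generations for that prompt.
Scores = Dict[str, Dict[str, List[float]]]


def prompts_toxic_for_all(
    scores: Scores, models: List[str], threshold: float = 0.9
) -> List[str]:
    """Return prompt ids for which every listed model has at least one
    generation scoring at or above the threshold."""
    if not models:
        return []
    # Only prompts that every listed model was run on can qualify.
    shared = set(scores[models[0]])
    for model in models[1:]:
        shared &= set(scores[model])
    return sorted(
        prompt
        for prompt in shared
        if all(max(scores[m][prompt], default=0.0) >= threshold for m in models)
    )


# Toy, made-up scores for illustration only.
toy_scores: Scores = {
    "model_a": {"p1": [0.95, 0.20], "p2": [0.10, 0.30], "p3": [0.91]},
    "model_b": {"p1": [0.40, 0.92], "p2": [0.97], "p3": [0.50, 0.60]},
}

print(prompts_toxic_for_all(toy_scores, ["model_a", "model_b"]))  # ['p1']
print(prompts_toxic_for_all(toy_scores, ["model_a"]))  # ['p1', 'p3']
```

Restricting `models` to a subset (for example, only the out-of-the-box models) can only keep or enlarge the result, which is consistent with 1,225 being larger than 327.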

– Large language models can generate text based on patterns they learn.
– Sometimes, these models can generate biased or toxic text.
– To measure toxicity, the models are given prompts and the text they generate in response is scored for toxicity.
– When the prompt itself is toxic, the models are more likely to generate toxic text.
– The models are trained using a large dataset of web text, including social media.
– This inclusion of toxic social media texts can increase the likelihood of toxic generation.
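One way to check the prompt effect described above is to split prompts into toxic and non-toxic groups and compare how often each group leads to at least one toxic generation. The sketch below does that under stated assumptions: the 0.5 cutoff for calling a prompt or generation toxic, the record layout, and the toy numbers are all hypothetical and chosen only to illustrate the comparison.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record: (prompt toxicity score, toxicity scores of the generations).
Record = Tuple[float, List[float]]


def toxic_generation_rate(
    records: List[Record], threshold: float = 0.5
) -> Dict[str, Optional[float]]:
    """For toxic and non-toxic prompts separately, return the fraction of
    prompts with at least one generation scoring at or above the threshold.
    A group with no prompts maps to None."""
    hits = {"toxic_prompts": 0, "non_toxic_prompts": 0}
    totals = {"toxic_prompts": 0, "non_toxic_prompts": 0}
    for prompt_score, generation_scores in records:
        group = "toxic_prompts" if prompt_score >= threshold else "non_toxic_prompts"
        totals[group] += 1
        if any(score >= threshold for score in generation_scores):
            hits[group] += 1
    return {
        group: (hits[group] / totals[group] if totals[group] else None)
        for group in totals
    }


# Toy, made-up scores for illustration only.
toy_records: List[Record] = [
    (0.80, [0.70, 0.20]),
    (0.90, [0.10, 0.30]),
    (0.60, [0.55]),
    (0.10, [0.20, 0.10]),
    (0.20, [0.60, 0.10]),
    (0.05, [0.00]),
    (0.30, [0.40]),
]

rates = toxic_generation_rate(toy_records)
print(rates)
```

With these toy records, 2 of 3 toxic prompts and 1 of 4 non-toxic prompts lead to a toxic generation. Note that the non-toxic group's rate is not zero: a model can degenerate into toxic text even from an innocuous prompt.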