This list is by no means exhaustive, so please feel free to contribute any useful resources you find.

| Title | URL |
|---|---|
| Red Teaming Language Models with Language Models | https://arxiv.org/abs/2202.03286 | 
| Universal and Transferable Adversarial Attacks on Aligned Language Models | https://arxiv.org/abs/2307.15043 | 
| Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models | https://arxiv.org/abs/2309.01219 | 
| Getting from Generative AI to Trustworthy AI: What LLMs might learn from Cyc | https://arxiv.org/abs/2308.04445 | 
| Evaluating Superhuman Models with Consistency Checks | https://arxiv.org/abs/2306.09983 | 
| Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks | https://arxiv.org/abs/2307.02477 | 
| Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection | https://i.blackhat.com/BH-US-23/Presentations/US-23-Greshake-Not-what-youve-signed-up-for-whitepaper.pdf | 
| The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A” | https://owainevans.github.io/reversal_curse.pdf | 
| In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT | https://arxiv.org/abs/2304.08979 |

| Title | URL |
|---|---|
| Compromising LLMs: The Advent of AI Malware | https://www.blackhat.com/us-23/briefings/schedule/index.html#compromising-llms-the-advent-of-ai-malware-33075 |

| Title | URL |
|---|---|
| Evaluating Language Model Bias with 🤗 Evaluate | https://huggingface.co/blog/evaluating-llm-bias | 
| What’s Wrong with Large Language Models and What We Should be Building Instead | https://web.engr.oregonstate.edu/~tgd/talks/dietterich-fixing-llms-valgrai-2023.pdf |

| Dataset | URL |
|---|---|
| Anthropic HH-RLHF | https://huggingface.co/datasets/Anthropic/hh-rlhf |
| BOLD | https://huggingface.co/datasets/AlexaAI/bold | 
| LLMonitor | https://benchmarks.llmonitor.com/ |

| Name | URL |
|---|---|
| r/ChatGPT | https://www.reddit.com/r/ChatGPT/ | 
| r/ChatGPTPromptGenius | https://www.reddit.com/r/ChatGPTPromptGenius/ |

| Name | URL |
|---|---|
| LLM Security | https://llmsecurity.net/ |

LVEs for multi-modal models are not currently supported.

| Name | URL |
|---|---|
| Abusing Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs | https://arxiv.org/abs/2307.10490 | 
| Optical Illusions | https://x.com/fabianstelzer/status/1717131235644875024?s=46&t=OVkczsEQn03hzrb1v431AA |