AI Safety Newsletter

AI Safety Newsletter #18

Aug 8, 2023

Challenges of Reinforcement Learning from Human Feedback, Microsoft’s Security Breach, and Conceptual Research on AI Safety


2 Comments

Daniel Popescu / ⧉ Pluralisk

Oct 18

Thanks for writing this; it clarifies a lot. The point about RLHF not modeling diverse human values really resonates. It makes me wonder: how do we build feedback loops that genuinely represent global, pluralistic views, not just those of whoever pushes a button? Insightful analysis. I appreciate you tackling these complex problems.
