Interesting take on open-weight models from these candidates, but it doesn't seem like the whole picture--most off-the-shelf open-weight models ship with the preset guardrails that an OpenAI or Anthropic would build into them, including whatever perceived "bias" the publisher might have added.
We've heard some hardline open-source enthusiasts advocate for open-SOURCE models, which either come with no guardrails (or let you remove them), arguing that this realizes the full potential of innovation rather than stifling it with a potentially arbitrary choice by a publisher's in-house team--but that's another issue entirely. For example, there was a new wave of "jailbroken" open-source models back in February that were apparently very effective at writing phishing emails.
thanks for writing this up Andrew, Alexa!