Great piece. Lots to unpack. Could use an application of the two-but rule for each plank of the plan. Like this: https://www.2buts.com/p/ai-regulation

"The future may be hidden behind a dense fog, but we can at least turn on our high-beams."

...you're not supposed to turn on your high beams in fog. The water droplets reflect the light back at you and make it harder to see.

I have a feeling many people asking for AI regulation are really asking for regulation to protect them or others from job losses they see as inevitable.

This seems totally contrary to regulation around de-risking AI development so that it can safely accelerate.

You touched on this toward the end, but I wonder if you have any thoughts on how to deal with this contradiction. If any regulation in the US happens now, it feels much more likely (to me) to be EU-style than what you propose.

In theory, the basic concept undergirding this bears merit. But boy is there real concern with centralizing all authority on AI development to a select few “industry leaders” (step #1). I think this is a very bad idea supported with honestly good intentions.

If AI is as powerful as indicated, and thus deserves a Manhattan-style development safety structure, ensuring that power is controlled by a very select few gives those people nearly unlimited control of humanity and its future. And it does so behind closed doors. Even Kim Jong Un would blush at such an opportunity.

The concepts mentioned in this article are an understandable starting ground for ideation, but require dramatically expanded considerations.

It is not obviously feasible to come up with any sort of principled definition of "Green" / "Orange" / "Red" types of AI research, given that (a) the plausible "experts" seem very far apart on the question of what exactly is high-risk and (b) they don't and in some sense can't give justifications for their positions grounded in real-world evidence rather than thought experiments. You can have principled risk levels in types of *application* of research, analogous to the DO-178C levels of software deployment risk in avionics, but that is a very different thing.

I don't understand this point:

"While this runs the risk of leapfrogging to a Yudkowsky-style superintelligence, I’d rather we test the “hard takeoff” hypothesis under controlled conditions where we might catch an unfriendly AI in the act. If alignment is as hard as the Yuddites think, the project would at worst provide a non-speculative basis for shutting it all down."

You're basically assuming away hard takeoff scenarios where what humans call an "airgap" can be bypassed with sufficient subtlety, and/or assuming that this hypothetical intelligence can't fake alignment well enough to get access to non-airgapped systems.

If you're wrong, you just accelerate destruction. Why are you so confident that this is a good idea?

May 16, 2023 · edited May 16, 2023 · Author

Versions of the hard takeoff story are designed to be unfalsifiable. I'm implicitly rejecting those. Nor do big enough Large Language Models spontaneously learn to synthesize deadly pathogens in a makeshift nanolab. Certain capabilities have to be explicitly trained for and don't simply ride along with greater generality.

If it's true that all you need is scale for LLMs to kill us all, then we're dead either way. But I'd rather test that on a virtual machine in an airgapped facility. You could even run models in a simulated world, Inception-style, so that when a model tries to escape, it attempts to manipulate completely virtual objects.

More realistically, AIs including LLMs might suddenly develop dramatically greater situational awareness, manipulation abilities and agency. The jump in capabilities may even be scary enough to justify a worldwide ban.

In short, there are many more worlds where testing large models early and in clever ways gives us useful information than worlds where we unexpectedly leap to literally god-like powers. I think it will one day be possible to build something that dangerous too, just not by accident.

