Safe Predictive Agents with Joint Scoring Rules
It's time to talk about what I've been working on
So far, this blog has mostly covered my miscellaneous thoughts on AI alignment, but alignment is also what I work on full time.
What am I researching, you might ask? Well, in an astonishing coincidence, I happen to have just finished writing a post summarizing exactly that! You can read it by clicking here to follow the link to the Alignment Forum. I'm hosting it there, rather than on Substack, because of its better support for mathematical notation.
I've previously made the case for predictive models as an area of research, and my recent work makes progress in that area. It shows how the incentive for performative prediction, that is, using a prediction to influence its own outcome, can be eliminated in a setup with multiple agents. If you have questions or comments, you can leave them over there, or on this post here.
Again, the post is hosted here, on the Alignment Forum.
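To give a rough flavor of the mechanism before you click through, here is a toy numerical sketch. This is my own illustration of the general idea, not the construction from the post: a single predictor paid by a proper scoring rule can profit by skewing the outcome it predicts, while under a zero-sum joint score against a second predictor, those manipulation gains cancel and honestly predicting the induced outcome becomes a best response. The particular dynamics (`outcome_prob`, outcomes responding to the average prediction) are assumptions invented for this example.

```python
import numpy as np

def log_score(p, y):
    # Proper log scoring rule for a binary outcome y in {0, 1}.
    return np.log(p) if y == 1 else np.log(1.0 - p)

def outcome_prob(p):
    # Toy performative dynamic (an assumption for this sketch):
    # the prediction itself shifts the true outcome probability.
    return 0.1 + 0.5 * p

def expected_score(p):
    # Expected score of a lone predictor whose prediction moves the outcome.
    q = outcome_prob(p)
    return q * log_score(p, 1) + (1 - q) * log_score(p, 0)

grid = np.linspace(0.001, 0.999, 999)

# Honest fixed point p = outcome_prob(p): the prediction that correctly
# anticipates the outcome distribution it induces (here, p = 0.2).
p_honest = grid[np.argmin(np.abs(grid - outcome_prob(grid)))]

# A single scored predictor does better by manipulating: its expected
# score peaks around p = 0.054, far from the honest fixed point.
p_single = grid[np.argmax([expected_score(p) for p in grid])]

# Zero-sum joint scoring: agent i is paid S(p_i, y) - S(p_j, y).
# Whatever either agent does to the outcome distribution affects both
# scores, so in a symmetric profile the manipulation gains cancel.
def joint_advantage(p_i, p_j):
    q = outcome_prob((p_i + p_j) / 2)  # outcome responds to the average prediction
    return (q * (log_score(p_i, 1) - log_score(p_j, 1))
            + (1 - q) * (log_score(p_i, 0) - log_score(p_j, 0)))

# Best response to an opponent at the honest fixed point is the honest
# fixed point itself: no profitable performative deviation remains.
p_best = grid[np.argmax([joint_advantage(p, p_honest) for p in grid])]

print(f"honest fixed point:        {p_honest:.3f}")  # ~0.200
print(f"lone manipulative optimum: {p_single:.3f}")  # ~0.054
print(f"best response under joint: {p_best:.3f}")    # ~0.200
```

The cancellation is just the zero-sum structure at work: at a symmetric profile, the two agents' scores are identical outcome-by-outcome, so shifting the outcome distribution changes both payoffs equally, and the only way left to gain is to predict better. The post itself works this out properly, with the math Substack can't render.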