Safe Predictive Agents with Joint Scoring Rules
It's time to talk about what I've been working on
So far, this blog has mostly covered my miscellaneous thoughts on AI alignment, but alignment is also what I work on full time.
What am I researching, you might ask? Well, in an astonishing coincidence, I happen to have just finished writing a post summarizing exactly that! You can read it by clicking here to follow the link to the Alignment Forum. I'm hosting it there, rather than on Substack, because of its better support for mathematical notation.
I've previously made the case for predictive models as an area of research, and my recent work makes progress in that area. It shows how the incentive for performative prediction, that is, using a prediction to influence its own outcome, can be eliminated in a setup with multiple agents. If you have questions or comments, you can leave them over there, or on this post here.
Again, the post is hosted here, on the Alignment Forum.
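To give a rough flavor of the mechanism before you click through, here is a toy numerical sketch. This is my own illustration of the general idea, not the construction from the post: a single predictor paid by a proper scoring rule can profit by skewing the outcome it predicts, while under a zero-sum joint score against a second predictor, those manipulation gains cancel and honestly predicting the induced outcome becomes a best response. The particular dynamics (`outcome_prob`, outcomes responding to the average prediction) are assumptions invented for this example.

```python
import numpy as np

def log_score(p, y):
    # Proper log scoring rule for a binary outcome y in {0, 1}.
    return np.log(p) if y == 1 else np.log(1.0 - p)

def outcome_prob(p):
    # Toy performative dynamic (an assumption for this sketch):
    # the prediction itself shifts the true outcome probability.
    return 0.1 + 0.5 * p

def expected_score(p):
    # Expected score of a lone predictor whose prediction moves the outcome.
    q = outcome_prob(p)
    return q * log_score(p, 1) + (1 - q) * log_score(p, 0)

grid = np.linspace(0.001, 0.999, 999)

# Honest fixed point p = outcome_prob(p): the prediction that correctly
# anticipates the outcome distribution it induces (here, p = 0.2).
p_honest = grid[np.argmin(np.abs(grid - outcome_prob(grid)))]

# A single scored predictor does better by manipulating: its expected
# score peaks around p = 0.054, far from the honest fixed point.
p_single = grid[np.argmax([expected_score(p) for p in grid])]

# Zero-sum joint scoring: agent i is paid S(p_i, y) - S(p_j, y).
# Whatever either agent does to the outcome distribution affects both
# scores, so in a symmetric profile the manipulation gains cancel.
def joint_advantage(p_i, p_j):
    q = outcome_prob((p_i + p_j) / 2)  # outcome responds to the average prediction
    return (q * (log_score(p_i, 1) - log_score(p_j, 1))
            + (1 - q) * (log_score(p_i, 0) - log_score(p_j, 0)))

# Best response to an opponent at the honest fixed point is the honest
# fixed point itself: no profitable performative deviation remains.
p_best = grid[np.argmax([joint_advantage(p, p_honest) for p in grid])]

print(f"honest fixed point:        {p_honest:.3f}")  # ~0.200
print(f"lone manipulative optimum: {p_single:.3f}")  # ~0.054
print(f"best response under joint: {p_best:.3f}")    # ~0.200
```

The cancellation is just the zero-sum structure at work: at a symmetric profile, the two agents' scores are identical outcome-by-outcome, so shifting the outcome distribution changes both payoffs equally, and the only way left to gain is to predict better. The post itself works this out properly, with the math Substack can't render.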