«Worlds governed by artificial intelligence
often learned a hard lesson:
Logic Doesn’t Care.»
Yin-man Wei, «This Present Darkness: A History of the Interregnum», CY 11956 in Andromeda
One topic last semester was AI. It’s — again — a trending topic (at least in Germany). And yeah, the benefits seem overwhelming. After all, if we are living our lives digitally, if we log what we do and leave digital traces no oppressive government would ever dream of, then there is lots and lots of data for AI to work with.
One problem in many applications is explainability. After all, AI isn’t a god (not yet, anyway). You can hardly point to the computer and say: “I do x because that algorithm told me so.” Just imagine the algorithm decides whether you get a job, or even just the job interview, or that loan.
But I wonder whether the quest to make AI explain itself is the right track. Explaining ourselves is something most humans suck at. Sure, we have reasons for what we did, but often these reasons are only post-hoc rationalizations. Think of Haidt’s elephant and rider: the rider, our conscious reasoning, mostly explains where the elephant, our intuition, was already heading.
In contrast, what about an AI that predicts what would happen for each of the different things it could do, and then simply evaluates these predictions against its overall goals? That would be closer to how humans work. When we walk down the street and see a banana peel, we don’t think “Dang, I’m gonna slip soon.” We see the different possibilities and the consequences of where we put our foot down, and act accordingly.
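To make the idea a bit more concrete, here is a minimal Python sketch of that predict-then-evaluate loop. It assumes the "world model" is just a function mapping a candidate action to predicted effects, and that the "overall goals" are weights on those effects. All names (rank_actions, sidewalk_model) and all numbers are made up for illustration; a real system would learn its model instead of using a hand-written table.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Prediction:
    action: str
    outcome: dict[str, float]  # predicted effects, e.g. {"slip_risk": 0.7}
    score: float


def rank_actions(
    actions: Iterable[str],
    world_model: Callable[[str], dict[str, float]],
    goal_weights: dict[str, float],
) -> list[Prediction]:
    """Predict the outcome of each candidate action, score it against the
    goals, and return all predictions ranked best-first (nothing is executed)."""
    predictions = []
    for action in actions:
        outcome = world_model(action)
        # Weighted sum of predicted effects; a negative weight means "avoid this".
        score = sum(goal_weights.get(effect, 0.0) * value
                    for effect, value in outcome.items())
        predictions.append(Prediction(action, outcome, score))
    return sorted(predictions, key=lambda p: p.score, reverse=True)


def sidewalk_model(action: str) -> dict[str, float]:
    # Hypothetical hand-written heuristic: rough guesses, not a learned model.
    guesses = {
        "step_on_peel": {"slip_risk": 0.7, "progress": 1.0},
        "step_around_peel": {"slip_risk": 0.05, "progress": 0.9},
        "stop": {"slip_risk": 0.0, "progress": 0.0},
    }
    return guesses[action]


goals = {"slip_risk": -10.0, "progress": 1.0}  # hypothetical goal weights
ranked = rank_actions(["step_on_peel", "step_around_peel", "stop"],
                      sidewalk_model, goals)
# Prints the three options best first: step_around_peel, stop, step_on_peel
for p in ranked:
    print(f"{p.action}: score {p.score:+.2f}")
```

Note that the function returns every scored option rather than only the winner; that is what keeps the reasoning inspectable.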
This process wouldn’t require a perfect model of the world, just one that works well enough (heuristics, anyone?).
Even better, this process would let you see the different outcomes the AI predicts and how it rates them, perhaps even parts of the thought process. It would also leave room for a step between the AI’s favored option and the actual action, a step in which a human can veto that action if needed.
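The transparency and veto step could look roughly like this. Again, this is only a minimal Python sketch, assuming the AI hands over its options already ranked as (action, predicted outcome, score) tuples. The function act_with_veto, the approve callback and the loan example with its scores are all hypothetical.

```python
from typing import Callable, Optional

# A candidate as (action, predicted outcome, score), best-ranked first.
Candidate = tuple[str, str, float]


def act_with_veto(
    ranked: list[Candidate],
    approve: Callable[[Candidate], bool],
    log: Callable[[str], None] = print,
) -> Optional[str]:
    """Show every option the system considered, then let a human veto
    actions before anything is executed."""
    # Transparency: list all predictions, not just the favorite.
    for action, outcome, score in ranked:
        log(f"{action}: predicts {outcome!r}, score {score:+.2f}")
    # Veto step: walk down the ranking until a human approves an action.
    for candidate in ranked:
        if approve(candidate):
            return candidate[0]
        log(f"vetoed: {candidate[0]}")
    return None  # every option vetoed: do nothing


# Hypothetical credit decision; outcomes and scores are made up.
options = [
    ("reject_loan", "bank avoids a possible default", 0.6),
    ("approve_loan", "applicant gets credit, some default risk", 0.4),
]
decision = act_with_veto(options, approve=lambda c: c[0] != "reject_loan")
```

Falling back to the next-best option after a veto is just one design choice; stopping and doing nothing, or handing the whole case back to a human, would be equally reasonable.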
Wouldn’t that be a better way to evaluate artificial intelligence?
Just a thought.
Note: Made some edits, including to the title.