



As my final high-level post for a while on AGI, I’d like to give my opinions about “Friendly AI”.
For those who don’t know what that means, consider this: Suppose I were to have a series of conceptual breakthroughs and write a boatload of code and the result is an AGI that is “smarter” than me. Call it AGI-1. Next, suppose that it learned enough to make it more knowledgeable than me about AGI theory and about building software systems. Then, supposing that there is an AGI design significantly better than AGI-1 available within the capabilities of AGI-1, it should be able to build a smarter AGI, call it AGI-2. Then, if there is yet another better AGI design within the grasp of AGI-2, it could produce AGI-3. And so on. If this “recursive self-improvement” process proceeds rapidly (that is, if the development time for each cycle is small), the result would be an AGI system that is a LOT LOT smarter than me.
Alternatively, it could be that AGI-1 by itself might be a LOT LOT smarter than me.
Either way — one has to worry whether we could stop this superintelligent AGI from doing things we don’t want it to do. Like kill us, for example.
Certainly this idea is not obscure. The “Terminator” science fiction franchise (along with many other scifi stories) illustrates exactly this scenario. Friendly-AI cognoscenti tend to frown on that point because they disagree with the technical details and the nature of the future that results, but I think that’s irrelevant. The point is that the masses of humanity are quite aware of the potential danger. They don’t think it’s worth worrying about, because it’s a bizarre scifi thing.
My personal view is that this quite likely will be a real and possibly big problem someday. In fact, soon a very minor version of the problem will be getting a lot of attention. More and more, robotic systems will have the ability to harm people (because they will be more common and will have the ability to control more powerful and mobile physical devices). In the immediate future the issue will not be whether they become homicidal maniacs because they won’t be smart enough for that to even be possible — no way to even have the necessary concepts. First the issue will just be whether their software might be buggy in other more mundane ways. What process should we use to validate software that drives cars? How can we minimize the number of accidents caused by poor decisions in household cleaning robots?
I only mention this case because I think it will naturally bring the question (and the inevitable quips about Asimov’s three laws of robotics) into public discourse.
Now, although I think this will be a problem someday, I don’t think it’s going to be a problem soon. We simply aren’t that close to building an AGI with the appropriate level of smartness. It is possible that the solution will be simple and somebody will find it soon, but that seems extremely unlikely to me. It’s true that I cannot draw a definite conclusion from the poor results produced by people who are trying very hard right now to build AGI, but those results are relevant. For me to give any significant probability to this kind of scenario, with multiple revolutionary inventions comprising many huge leaps of understanding, I need to see something that looks like progress in some direction.
Further, for this to be a problem with any sort of suddenness, the AGI would have to be astonishingly more intelligent than a human; able to quickly make multiple technical breakthroughs in many different fields and rapidly master every field of human expertise. To me, it is a huge stretch to posit that near-term commonly-available computer hardware will have the ability to host such a program.
Suppose I’m wrong. Suppose that some secret group somewhere is solving the problems — or more generally the time is almost right and parts of the puzzle start falling into place quickly. And suppose that it turns out that the core of intelligence is amazingly simple and hardware isn’t a limitation.
It would be desirable if this system were programmed with an “ethical system” that keeps it from harming us (or, better yet, from wanting to harm us) — no matter how many times it redesigns and improves itself, no matter how its code drifts and changes over the entire length of the indefinite future. Figuring out how that could work is the “Friendly AI” problem. Here’s a reference. It could turn out somehow that Friendliness is inherent in any superintelligence, but the fact that humans are not Friendly rather dashes that hope as far as I’m concerned.
It becomes even trickier because solving the problem isn’t enough. The first successful AGI project has to correctly include the solution in the implementation. And, all subsequent AGI projects also have to do so. Even though it seems like a good idea, how can we guarantee that? Or, lacking a guarantee, how can we at least make it very likely?
This is all very scary and unreal-sounding. It is not comfortable to think that the two above bulleted scenarios are the only ways to secure ourselves from extinction in the near future. And, as I said before, I don’t personally think superpowerful AGI will arise soon. And, even more important, I don’t think it will arise suddenly, which means that if the draconian measures listed above cannot be decided on, we might nevertheless survive a slower rise of AGI.
It might turn out that “superintelligence” is impossible… that, for some reason, no AGI can be very much smarter than the human race as a whole. If that’s true we probably don’t need to worry that much about it. But I don’t see a good reason to think that such an intelligence cap exists, and it certainly doesn’t seem prudent to bet on it being true.
So, no matter what, it seems as if solving the Friendliness problem is a good idea, and the sooner it can be solved the better. So far, not much progress is apparently being made, although few people if any are actually working hard on the problem.
The ardent believers in the near-future hard-takeoff scenario come across as rather fanatical and alarmist; to be fair, if they are right then I guess they should be. I’m surprised that they don’t seem to be working very hard on an actual solution beyond just “awareness raising”. As I noted before, the public is quite aware of the issue. The only thing the public needs to be shown is evidence of imminence, but no such case is ever attempted.
I am completely perplexed that the believers don’t have some sort of forum for discussing approaches to solving the Friendliness problem, especially technical issues and underlying concepts.
I have thought about starting such a forum myself, but it’s a lot of work to attempt serious community building, and there is no guarantee of success, especially given the idiosyncratic nuttiness of the interested parties and the apparent intractability of the problem.
Still, I might create such a forum just to see what happens.
Beyond that, though, since I’m out of the AGI game, there’s no need to worry about whether I’ll destroy the planet by letting loose a rogue UnFriendly AGI. I’m not even thinking about AGI.
I have some more tangible things in mind.










11:31 pm - November 21st, 2008
“Terminator” is a wrong example not for obscure technical reasons, but because it portrays a seemingly never-ending war with prospects of winning. Reality seems much more fatal: no war, no hope, just death.
11:51 pm - November 21st, 2008
Thanks for the comment Vladimir! Your point is what I meant by “… the nature of the future that results.”
11:58 pm - November 21st, 2008
“Knowability of FAI” isn’t so much about Friendly AI; the current introduction to Friendly AI is “Artificial Intelligence as a Positive and Negative Factor in Global Risk”.
It can’t somehow turn out that any intelligence is Friendly. It’s a solved problem: you can have an optimizer that goes either way, producing paperclips or recursive teddy bears.
Evidence like what? End of the world is a futile lesson to learn.
The SL4 list was such a place (but you are aware of that, so what’s your question?). The problem is that this is a difficult technical question: very few people were able to even really understand the problem, much less contribute to the solution. Keep in mind that FAI is a kind of AGI, a problem people have been failing at miserably for some time now, and in this form it’s only harder.
1:22 am - November 22nd, 2008
Yeah, I couldn’t figure out what to reference exactly. Folks can follow your reference if they want, I think that paper is completely dreadful. [edited to add: on further reflection, although I think the paper is dreadful because what I consider its central points are supported only by a couple of specious surface analogies, you're right that it is a better introduction to the topic than the document I linked to]
I suppose it is possible that unique among all technological achievements in history, AGI will appear fully-formed with no useful intermediate step. To most people that sounds pretty dumb though.
SL4 is a list devoted to generally-shocking futurist issues. I’ll take your word for it that it’s the preferred venue for technical discussion of Friendliness theory. [IMO if the topic is as serious as its proponents think, some sort of non-wacko-infested place for its development does not seem unreasonable.]
[at any rate, I just wanted to mention Friendliness before abandoning it along with the rest of the fringe AGI-related elements]