



My recent blog essay about the Physical Symbol System Hypothesis got me thinking that it would be fun to revisit more ideas from the history of AI and make a series out of it.
Even though I was poor and even though I left without my doctorate, I remember my years in graduate school with fondness. Teaching classes was a rewarding way to pay the rent, but the best part was the (mostly) open atmosphere of intellectual curiosity with which academics viewed and participated in the flow of research agendas.
At the time, most people around me were thinking a lot about neural networks — not so much in the sense of brain imitation but more as complicated combinations of continuous nonlinear functions. And in other ways as well, the very air was heavy with the humid scent of nonlinearity — chaos, complexity, evolution, and so on. It felt (to me at least) as if a cross-disciplinary version of AI was on the brink of some important gestalt understanding, and it was exciting.
In 1992, John Koza’s huge book Genetic Programming came out, and I remember devouring it avidly. Who wouldn’t be thrilled at the grand vision of evolving computer programs?
Looking back on it, I was interested in the properties of new classes of computational substrates. In the language I am using now to muse about these topics, a substrate is a formal structure in which computations occur. It includes not only a specification for the computations themselves but also the means for their creation, modification, and execution. So I was in a frenzy of trying to see beyond the habutual AI substrates — “hand coded declarative logic-ish databases,” mostly.
Ideally I’m interested in something more specific — substrates for models — but I don’t yet have a clear idea about how to express the general requirements for characterizing models (that’s the long-term purpose behind all of the musing I’m doing lately). I know that some sort of intentional relation is required, because models are the way the universe reflects itself. So instead of computations, the substrates are frameworks for descriptions of things, which are computations of a special type.
In the meantime, it is useful to think of substrates in a well-known way: function computation. From this perspective, a substrate:
Now, consider the case where some external entity can measure the “quality” of an expression based on how closely the function it computes matches up to some “desired” function. In the context of modelling, the idea will be that the expression should match up to the part of the world being modelled along some desired dimensions. But we can also think of this as a “supervised learning” situtation.
Given the ability to measure the quality of an expression as described above, we are interested in finding the best of all expressions with respect to the evaluation mechanism (or maybe we might be satisfied to find “good” expressions).
If you are not already familiar with these ideas, that might end up sounding like gibberish so I’ll make up a quick example to illustrate. Here’s a graph of 8 arbitrarily-chosen points. Suppose we would like to find the line that is the “best approximation” of these points (a common sort of statistics task). One such line is drawn for illustration amongst the points. To fit this into our framework, we need to define the substrate — that is, the space of possibile expressions. In this case, the expressions are equations of lines: y = Ax + B, where we can vary A and B to produce different expressions.
We also need to define the measurement (that is, define “best approximation”). So let’s try to minimize the mean squared distance between the line and the points. This is a nice example for visualization because the variations allowed are just two continuous parameters, which lets us use those parameters as the axes of a graph, with the “height” being the mean squared error for that particular expression. Have a look at the resulting graph (I used Mathematica to develop this example, in case you’re curious).
This kind of thing is often called a “fitness landscape”, using the word “fitness” instead of “quality” because of the analagous relationship to evolution. As usually expressed in this context, the evolution of living creatures operates on a substrate whose expressions are the possible genotypes of an organism (actually it’s more complicated that that but it will do as an approximation), and the “fitness” is something like the number of successful children the organism produces.
The optimal expression in my simple example case is the lowest point on the graph, which tells us the best values for A and B and thus the equation for the best line.
But: How do we find this optimum point on the fitness landscape?
In some simple cases (like this one, actually), it might be possible to use analytic means to construct this optimal expression using a bunch of extra domain knowledge (inverting the math of the fitness function itself). But in the general case all we have is the measurement, so we become interested in how to search through the space of possible expressions to find the best one.
Now, this is kind of cool! To find good expressions on a substrate (effective creatures, good models of the world, efficient automobiles, great novels), all we have to do is search around amongst the possibilities.
Usually this is easier said than done, unfortunately. Most search spaces are HUGE.
So, the obvious and simplest search techniques — picking expressions at random or enumerating them one by one in some straightforward way — may take a long time.
If the fitness landscape is nice and simple, like the above example, hill climbing is an effective way of searching. In general, though, fitness landscapes are not so friendly — they have false peaks, ridges, discontinuities, and features not so concisely described with facile geographic analogies. The general subject of search has been tied up with AI from the very beginning. Some people believe that the two things are basically just two ways of describing the same thing: for example, the AIXI formulation of theoretical AI essentially takes this viewpoint.
Getting back to Koza’s Genetic Programming, fitness landscapes for typical programming languages on typical programming problems are very ugly (by which I mean very uncorrelated in regions of interest). Programs are brittle — small changes made to high-fitness programs nearly always lead to disaster. The miracle is that it works at all!
In fact, it usually doesn’t work. Simulated evolution has not replaced or even significantly supplemented traditional software engineering methods.
Why does it work sometimes? That is a complicated question. The nature of the programming task, the details of the programming language, and the operators for editing programs figure heavily into the issue.
Let’s invent some shorthand and say that a substrate is evolvable with respect to a particular task if its fitness landscape has the desirable features: high local correlation, good continuity, and a relatively small number of unacceptable-quality false peaks. Evolvability is highly desirable if it’s important to be able to search for expressions. I care about this because I think it is very likely that search for good expressions is extremely important for intelligent systems (including systems engaged in automatic modelling), because empirical evidence strongly implies that complex problems resist analytical solutions.
Because of this, we will be well served by designing evolvable substrates — evolvable programming languages, evolvable network models, evolvable everything. Not that we actually want to build complex computer modelling systems by aping evolution, of course, but because more targeted and appropriate search methods will be helpful.
And that’s all I have to say.
Oh, wait, no it isn’t: living things are evolvable (duh, otherwise they wouldn’t have evolved). The biological substrate called “life” is exceptionally interesting. I believe that a series of reductionistically- impenetrable transformations in level are crucial to this process. By “reductionistically-impenetrable” I mean that the “higher-level” is neither simply describable in terms of the “lower level” entities, nor are the behavioral and structural characteristics of the “higher level” tractably predictible from the “lower level”. Consider:
I believe that the redescribability afforded by these impenetrable layers helps afford the opportunity for radical restructuring of fitness landscapes and thus the potential for evolvable organisms to arise from substrate modification operators that work on the genetic level.
Perhaps it will be fruitful to design evolvable modelling substrates that share this kind of level-decoupling in efforts to enhance the feasibility of search. Someday we’ll know!
Looking back on it, my 1992 excitement about reducing the mysteries of intelligence to a few key insights from nonlinear dynamics was naive. For one thing, those constellations of ideas — complexity, evolution, self-organized criticality, chaos, etc — turn out not to be very sharp scalpels for doing reductionist dissection. For another, intelligence itself appears to be a tougher problem than my youthful enthusiasm suspected.
No matter. Figuring out how to reflect itself is one of the advanced ongoing features of the universe; I’m satisfied to be a little piece trying pitch in, using whatever means make sense to me in the moment.
Oh, one more thing: Why is search so necessary to this enterprise? Wouldn’t it be much better to just construct a description instead of rooting around for one? Unfortunately, there appears (to me, today) to be no way to do such construction across the divide of “unconnected” levels of formalism. However, it is conceivable that there might be a “bottom-up” alternative, which perhaps I’ll write about in a future post.


More Options ...

Categories
Tag Cloud
Blog RSS
Comments RSS


Void (Default)
Life
Earth
Wind
Water
Fire
Lightweight