



Along with the change in focus, I’m making a couple of changes to this blog itself, including a new visual theme. I will be adding more content outside the main posts… along the bottom of the browser you’ll find links to some other pages, most of which are barely more than placeholders so far.
The “About” page will give an overview of the stuff I’m writing about.
The “Links” page will include links to other blogs and internet sites related to modelling that I find interesting.
Rather than just write about stuff, I really want to get to work on some hobby projects, so over the last couple of weeks I have jotted down some project ideas that I can pick from when I want to start a new project. These projects will get their own pages as I start working on them.
I’m sure I’ll think of some other pages as well as time goes on.
And finally, I have a new domain name: supermodelling.net.




It seems to me fruitful to think of intelligence as having three deeply-interrelated components: modelling, language, and “weird stuff” — where “weird stuff” is consciousness and free will and so on: the things that writers put in novels and movies to make interesting plots, and what gets discussed on mailing lists and frightens those with weak ties to reality. Much of that anthropomorphic material makes it seem like AI researchers are writing programs to create golems or voodoo dolls.
I am still fascinated by all aspects of AGI, and probably always will be, but since I want to focus on things that are not quite so broad as “AGI”, this is a good place to start chopping. Besides, I’m not really all that interested in the weird stuff any more.
I am interested in language, but not enough to focus on it.
That leaves modelling. What do I mean by that word?
Modelling is the representation of things, and methods for manipulating those representations which, when interpreted, can be put to good use. Probably the biggest such use is prediction. If the model is accurate, we can extrapolate its parameters over time to predict what the modelled thing will do in the future. We can predict the effect of performing some operation on a thing by performing an analogue of that operation on a model of the thing.
Besides prediction, there are other sorts of reasoning we can do using models — we can make guesses about the origin of a thing by manipulating models of other things (such as components). We can also model abstract sorts of things like categories by which means we can figure out how to recognize them.
And there is much more to the story, I’m sure of it. I’d like to spend time thinking about the question “What is modelling?”. It seems so much more answerable and useful than asking “What is intelligence?”.
The most interesting question of all — one that I do not yet feel qualified to even begin addressing — is how to model modelling itself. This kind of situation where questions and methods wrap around on themselves leads directly to some of the “weird stuff”… and if ever I do work up the chutzpah to study the bizarre things, it will probably be by sneaking up on the idea of modelling modelling.
But not today. Today I am curious about all the ways we have of modelling things on computers. What are they good at? What are they bad at? Why? Do the methods support construction or adjustment of models automatically or must they be completely pre-specified? How efficient are the methods? Can things be modelled using multiple techniques? If so, what is the relationship between the models?
One interesting distinction seems to be whether something is modelled from the inside or the outside. Outside models are based and judged purely on external observations of the subject. A painting is an outside model (though most good artists aspire to be inside), and a neural network (or other statistical regression technique) is, in most cases, probably an outside model as well.
Inside models address the “why” behind the subject — a prediction using an inside model is based on interactions of sub-models of things purportedly making up the subject. It is interested in causal relationships.
Outside models do not require deep understanding, focus on surface features, and can often be easily learned. Inside models reflect fuller understanding, reflect a subject’s internal structure, are much more interesting, and are usually very difficult to acquire automatically.
I am tempted to say that animal brains only make outside models; humans have some ability to form inside models. But I haven’t thought about it that much, and it’s unclear that the conclusion is worth anything even if true.
So I’m going to learn more about modelling, and probably model a few things myself.




As my final high-level post for a while on AGI, I’d like to give my opinions about “Friendly AI”.
For those who don’t know what that means, consider this: Suppose I were to have a series of conceptual breakthroughs and write a boatload of code and the result is an AGI that is “smarter” than me. Call it AGI-1. Next, suppose that it learned enough to make it more knowledgeable than me about AGI theory and about building software systems. Then, supposing that there is an AGI design significantly better than AGI-1 available within the capabilities of AGI-1, it should be able to build a smarter AGI, call it AGI-2. Then, if there is yet another better AGI design within the grasp of AGI-2, it could produce AGI-3. And so on. If this “recursive self-improvement” process proceeds rapidly (that is, if the development time for each cycle is small), the result would be an AGI system that is a LOT LOT smarter than me.
Alternatively, it could be that AGI-1 by itself might be a LOT LOT smarter than me.
Either way — one has to worry whether we could stop this superintelligent AGI from doing things we don’t want it to do. Like kill us, for example.
Certainly this idea is not obscure. The “Terminator” science fiction franchise (along with many other scifi stories) illustrates exactly this scenario. Friendly-AI cognoscenti tend to frown on that point because they disagree with the technical details and the nature of the future that results, but I think that’s irrelevant. The point is that the masses of humanity are quite aware of the potential danger. They don’t think it’s worth worrying about, because it’s a bizarre scifi thing.
My personal view is that this quite likely will be a real and possibly big problem someday. In fact, soon a very minor version of the problem will be getting a lot of attention. More and more, robotic systems will have the ability to harm people (because they will be more common and will have the ability to control more powerful and mobile physical devices). In the immediate future the issue will not be whether they become homicidal maniacs because they won’t be smart enough for that to even be possible — no way to even have the necessary concepts. First the issue will just be whether their software might be buggy in other more mundane ways. What process should we use to validate software that drives cars? How can we minimize the number of accidents caused by poor decisions in household cleaning robots?
I only mention this case because I think it will naturally bring the question (and the inevitable quips about Asimov’s three laws of robotics) into public discourse.
Now, although I think this will be a problem someday, I don’t think it’s going to be a problem soon. We simply aren’t that close to building an AGI with the appropriate level of smartness. It is possible that the solution will be simple and somebody will find it soon but it seems extremely unlikely to me. It’s true that I cannot draw a definite conclusion from the poor results produced by people who are trying very hard right now to build AGI, but it is relevant. For me to give any significant probability to this kind of scenario with multiple revolutionary inventions comprising many huge leaps of understanding, I need to see something that looks like progress in some direction.
Further, for this to be a problem with any sort of suddenness, the AGI would have to be astonishingly more intelligent than a human; able to quickly make multiple technical breakthroughs in many different fields and rapidly master every field of human expertise. To me, it is a huge stretch to posit that near-term commonly-available computer hardware will have the ability to host such a program.
Suppose I’m wrong. Suppose that some secret group somewhere is solving the problems — or more generally the time is almost right and parts of the puzzle start falling into place quickly. And suppose that it turns out that the core of intelligence is amazingly simple and hardware isn’t a limitation.
It would be desirable if this system were programmed with an “ethical system” that keeps it from harming us (or, better yet, from wanting to harm us) — no matter how many times it redesigns and improves itself, no matter how its code drifts and changes over the entire length of the indefinite future. Figuring out how that could work is the “Friendly AI” problem. Here’s a reference. It could turn out somehow that Friendliness is inherent in any superintelligence, but the fact that humans are not Friendly rather dashes that hope as far as I’m concerned.
It becomes even trickier because solving the problem isn’t enough. The first successful AGI project has to correctly include the solution in the implementation. And, all subsequent AGI projects also have to do so. Even though it seems like a good idea, how can we guarantee that? Or, lacking a guarantee, how can we at least make it very likely?
This is all very scary and unreal-sounding. It is not comfortable to think that the two above bulleted scenarios are the only ways to secure ourselves from extinction in the near future. And, as I said before, I don’t personally think superpowerful AGI will arise soon. And, even more important, I don’t think it will arise suddenly, which means that if the draconian measures listed above cannot be decided on, we might nevertheless survive a slower rise of AGI.
It might turn out that “superintelligence” is impossible… that, for some reason, no AGI can be very much smarter than the human race as a whole. If that’s true we probably don’t need to worry that much about it. But I don’t see a good reason to think that such an intelligence cap exists, and it certainly doesn’t seem prudent to bet on it being true.
So, no matter what, it seems as if solving the Friendliness problem is a good idea, and the sooner it can be solved the better. So far, not much progress is apparently being made, although few people if any are actually working hard on the problem.
The ardent believers in the near-future hard-takeoff scenario come across as rather fanatical and alarmist; to be fair, if they are right then I guess they should be. I’m surprised that they don’t seem to be working very hard on an actual solution beyond just “awareness raising”. As I noted before, the public is quite aware of the issue. The only thing the public needs to be shown is evidence of imminence, but no such case is ever attempted.
I am completely perplexed that the believers don’t have some sort of forum for discussing approaches to solving the Friendliness problem, especially technical issues and underlying concepts.
I have thought about starting such a forum myself, but it’s a lot of work to attempt serious community building, and there is no guarantee of success, especially given the idiosyncratic nuttiness of the interested parties and the apparent intractability of the problem.
Still, I might create such a forum just to see what happens.
Beyond that, though, since I’m out of the AGI game, there’s no need to worry about whether I’ll destroy the planet by letting loose a rogue UnFriendly AGI. I’m not even thinking about AGI.
I have some more tangible things in mind.




Bits are cool, but we’d really like more expressive data types to help us do more interesting work. We satisfy this desire by grouping bits together in bunches and inventing ways of interpreting these bit vectors.
One way to treat a vector of bits as a unit is to note that there are 2^N possible combinations of N bit values. So, just like a particular bit value can be interpreted as a choice between two options, a particular N-bit value can be interpreted as a choice among 2^N options.
There are lots of different ways to use this idea. We could use 3 bits to represent 8 different colors. We could use a bunch of bits to represent the letters of the alphabet. And so on.
We can also use N bits to represent 2^N different numbers. The most natural way (yes, that’s a pun) is to represent a range of integers. Note that there are lots of other sensible ways of mapping bit patterns onto numbers. For example, we might like to represent fractional numbers (“floating point” or “fixed point”).
Two major ways of picking a range of numbers are most common. 8 bits either map to the values 0..255 or -128..127 (“unsigned” or “signed” integers).
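To make the two interpretations concrete, here is a minimal Python sketch. The function names are made up for illustration, and the signed reading assumes two's-complement encoding, which is what produces the -128..127 range on x86 hardware.

```python
def as_unsigned(bits: int, width: int = 8) -> int:
    """Interpret the low `width` bits as an unsigned integer (0 .. 2**width - 1)."""
    return bits & ((1 << width) - 1)


def as_signed(bits: int, width: int = 8) -> int:
    """Interpret the low `width` bits as a two's-complement signed integer."""
    value = bits & ((1 << width) - 1)
    # A set top bit means the pattern stands for a negative number.
    if value & (1 << (width - 1)):
        value -= 1 << width
    return value


if __name__ == "__main__":
    pattern = 0b11111111           # the same 8 bits, read two ways
    print(as_unsigned(pattern))    # 255
    print(as_signed(pattern))      # -1
    print(as_signed(0b10000000))   # -128
```

The bit pattern never changes; only the rule used to read it does.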
It’s a pretty expensive and drastic thing to do — given a 128-bit register, as bits it can represent 128 different things. But as a single 128-bit integer (with a very large range of potential values), it represents just one thing — a two order-of-magnitude hit on memory use and possible computation speed. C’est la vie: integers are useful so we pay the price. However, there is some significant payoff in using smaller bit vectors rather than larger ones if the smaller vector can represent the range of values needed for some particular task. This is why my CPUs have lots of different instructions for dealing with 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit integers.
An XMM register can hold 16 8-bit integers, which are operated on in a pairwise fashion, just like bits were in a previous post. And also just like bits, addressing individual values is a problem. There are a LOT of instructions for loading, storing, shuffling, and swapping the integer-vectors contained in registers, but the efficiency of information processing algorithms goes up a lot if we don’t have to do that very much. It’s a lot better if we can usefully build meaningful data types that use vectors of integers instead of just individual integers, or if our algorithms can be written in a “data parallel” way — in this case performing 16 parallel computations on 16 mini-registers. I think this is really important for trying to choose efficient representations from the universe of possibilities.
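The “16 mini-registers” picture can be simulated in a few lines of Python. This is only a sketch of the idea, not how the hardware is programmed: a Python integer stands in for the 128-bit register, the helper names are hypothetical, and the lane-wise wrapping add is modelled on what a packed byte add such as PADDB does.

```python
LANES = 16          # number of 8-bit values in one 128-bit register
LANE_BITS = 8
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack 16 small integers into one 128-bit integer, lane 0 in the low byte."""
    values = list(values)
    if len(values) != LANES:
        raise ValueError("expected exactly 16 values")
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 128-bit integer back into its 16 lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_add(a, b):
    """Add corresponding lanes; each lane wraps modulo 256 and never carries into its neighbour."""
    return pack((x + y) & LANE_MASK for x, y in zip(unpack(a), unpack(b)))


if __name__ == "__main__":
    a = pack(range(16))
    b = pack([250] * 16)
    print(unpack(packed_add(a, b)))
```

On the real CPU the sixteen additions happen in a single instruction; the loop here exists only because Python has no such primitive.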
Integers can be used for a lot of different things. They can represent counts of things, (approximate) measurements of physical properties like mass, distances along axes in multidimensional space, and many other mappings. Finally, just like having representations of bits gives us access to abstract systems like boolean logic, having representations of integers gives us access to vast abstract mathematical systems built on top of integers. To make that work, the CPUs include instructions for the basic mathematical operations such as addition, subtraction, multiplication, and division.
I won’t go into the gory details of the instructions themselves or their encoding, but I’ll make a couple of observations. First, the data types are not completely orthogonal — for example, I do not believe that there is an 8-bit multiply. That means a tedious amount of checking processor features will be necessary when making decisions about data types. Second, it’s important to note that not just the vector processors are capable of dealing with integers. The “normal” instruction set has plenty of instructions for working on values stored in “normal” CPU registers. I focus on the SSE subsystem because of the potential performance boost but shuttling data back and forth to the special XMM registers is not always the right way to do things.
Finally, there is an interesting and curious special type of operation available on 16-bit integers (and somewhat on 8-bit) called “saturated” arithmetic. Suppose we have two unsigned 8-bit numbers: 210 and 83. Adding them up should produce the answer 293, but 293 cannot be represented in 8 bits. So what happens? In normal computer arithmetic, the “overflow” is noted in a status register and the high bit is simply discarded, which means that the stored answer is 37. It would be rare for that to be a desirable outcome, so avoiding overflow is usually highly desirable. “Saturated” arithmetic says that since 255 is the biggest value representable, that value should be the result of any computation that overflows to a larger value. Invented for use in DSP applications like audio processing, I think this graceful nonlinear response is really interesting and maps pretty well onto intuitions about numbers. There might be interesting mathematical properties for saturated arithmetic as well that arise from the nonlinearity which could make it an interesting component of some modelling methods.
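The 210 + 83 example can be checked directly. This is a minimal sketch for unsigned 8-bit values only; the function names are illustrative and not taken from any instruction set.

```python
def wrapping_add_u8(a: int, b: int) -> int:
    """Ordinary 8-bit addition: anything above bit 7 is discarded."""
    return (a + b) & 0xFF


def saturating_add_u8(a: int, b: int) -> int:
    """Saturated 8-bit addition: results that overflow are clamped to 255."""
    return min(a + b, 255)


if __name__ == "__main__":
    print(wrapping_add_u8(210, 83))    # 37
    print(saturating_add_u8(210, 83))  # 255
    print(saturating_add_u8(100, 83))  # 183, no overflow so both forms agree
```

One consequence of the nonlinearity: saturated addition is not associative once signed values are involved, so the order of operations can change the result.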




At least it should be.
Aside from raw materials in their natural location and state, every single thing of value — physical or abstract — is the product of intelligent action. Applied intelligence creates all wealth. A natural core exists in each of us, of course, and that essence is a large part of what we love and cherish in each other and our world, but all else is the product of industrious mind.
Artificial minds will be just as potent a source of value as our natural ones — actually a great deal more. Their economic impact will be incalculable, many times greater than the sum total value of everything ever created.
On a vastly smaller scale, almost trivial, Bill Gates once said: “If you invent a breakthrough in artificial intelligence, so machines can learn, that is worth 10 Microsofts.” Literally, a couple trillion dollars, though the point of the comment was not to attempt an accurate valuation.
You get the point. Money isn’t the only way of measuring worth, but it is one interesting way.
Now, here’s the puzzle: How come AGI development has produced basically zero dollars profit so far, and why can’t AGI efforts attract even small amounts of capital investment?
I have a couple of possible answers for this.
If there were actual progress being made toward AGI, the field would not consist of a marginal fringe club of futurophiles debating consciousness; it would be an economic juggernaut. In our modern world, more and more of what we do is touched in some way by computer software, and that software is unbearably stupid. All of it. Not only does it malfunction with alarming frequency, but even when it does work it is completely clueless about the needs of its users, displays almost no fluency with the subject matter it is supposed to be about, and never learns. If we can make that software knowledgeable, smarter, more adaptable, more robust… even small steps would be huge. And that’s not even taking into account coming technological gold mines like robotics, massive recordings of video streams, ubiquitous networking, immersive virtual reality, microbilling, scientific simulation, and on and on.
Typically, the excuse given for lack of progress toward anything tangible is that all “general” intelligent tasks are AGI-Complete — meaning that the whole problem has to be solved in order to solve a piece of it.
That can’t be right. It has to be a consequence of thinking about the problem in the wrong way. So what’s a right way? I am not certain, but there are many possibilities. Here are a few:
Earlier I mentioned that I have been gradually coming to hold the viewpoint that intelligence == modelling ability, and I have touched on that in other blog postings about concepts being models of things in the universe. My bet on the best approach to being real, relevant, and successful is to move forward from that premise. So that’s exactly what I’m going to start to do, though I am not yet certain how to best proceed so the path will be long and winding.
I will be writing more about this anon.




AGI enthusiasts may be puzzled by my previous post: Since we don’t understand what exactly intelligence is or how to build it, surely it’s completely backwards to focus in such excruciating detail on the transient technological details of computer components ordered from newegg.com!
In answer, I say this:
So, moving on… my new computer has now been completely assembled and I installed 64-bit Windows Vista on it (curse me if you like, but I’m not really interested in operating system wars and Windows works fine for my needs). Now that it’s working, I can make some analysis and actual measurements of instruction execution and the memory hierarchy. The figures I arrived at are simple approximations derived from reading specifications, running SiSoft Sandra and writing some short C/Assembly test programs. Detailed performance analysis is a hugely complicated task in general.
Instruction execution performance can be characterized with throughput and latency. What I mean by these:
Throughput is different from latency because the processor overlaps the execution of multiple instructions, so parts of several instructions can be executing at the same time.
Consider the instruction mulps xmm0, xmm1. This multiplies each of four floating point numbers stored in two registers together, storing the result (so it does four floating point multiplies). This instruction has a latency of 4 cycles, but if the algorithm being computed can do other work in the meantime using other resources, the throughput can reach one instruction per cycle.
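A back-of-the-envelope model shows why the distinction matters. The sketch below is deliberately idealized: it assumes a fully pipelined unit that can start one multiply per cycle, ignores every other bottleneck, and uses the 4-cycle latency quoted above. The function names are hypothetical.

```python
def cycles_dependent_chain(n: int, latency: int) -> int:
    """Each multiply needs the previous result, so the latencies add up."""
    return n * latency


def cycles_independent(n: int, latency: int, issue_interval: int = 1) -> int:
    """Independent multiplies overlap: one starts every `issue_interval` cycles,
    and the last one finishes `latency` cycles after it starts."""
    if n == 0:
        return 0
    return (n - 1) * issue_interval + latency


if __name__ == "__main__":
    print(cycles_dependent_chain(100, latency=4))  # 400
    print(cycles_independent(100, latency=4))      # 103
```

In this toy model, the same hundred multiplies take roughly a quarter of the time when the algorithm is arranged so that no multiply waits on another.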
Now let’s look at the instruction movaps xmm0, [rax]. This means: move a vector of four floating-point numbers (16 bytes total) from the memory address stored in the 64-bit register rax into the 128-bit register xmm0. The memory address must be a multiple of 16 bytes (that is, 16-byte aligned). The throughput of this instruction is as high as one instruction per cycle, but the instruction latency varies wildly depending on whether and where the memory data is cached.
Each core has a 32KB “level 1” data cache with a 3 cycle latency. Most algorithms can get pretty close to filling in the level-1 cache access latency delays with other instructions, so that latency isn’t too much of a problem.
The CPU has 12MB of “level 2” cache. The latency of L2 cache access is approximately 18 cycles. So getting the needed data transferred from L2 to L1 before it is needed, and then working with it for a while before moving on to more data, is important to avoid waiting for access to the L2 cache.
The 16GB of main memory on my computer is much worse. The latency for accessing it is something on the order of 160 cycles. Since most AGI-related algorithms are likely to operate on vast amounts of data, it is crucial to try hard to prefetch data before it is needed, and then work with it for a while — because the throughput of main memory is only something like 10GB/sec.
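Converting those cycle counts into time and bandwidth makes the gap easier to feel. The sketch below uses the approximate figures quoted in this post, plus two assumptions: the 2.33 GHz clock of the E5410, and “10GB/sec” read as 10e9 bytes per second.

```python
CLOCK_HZ = 2.33e9            # assumed core clock (Xeon E5410)
MEMORY_BYTES_PER_SEC = 10e9  # assumed reading of "something like 10GB/sec"

LATENCY_CYCLES = {"L1 cache": 3, "L2 cache": 18, "main memory": 160}


def cycles_to_ns(cycles: float) -> float:
    """Convert a cycle count to nanoseconds at the assumed clock rate."""
    return cycles / CLOCK_HZ * 1e9


def memory_bytes_per_cycle() -> float:
    """How many bytes main memory can deliver per core clock cycle."""
    return MEMORY_BYTES_PER_SEC / CLOCK_HZ


if __name__ == "__main__":
    for level, cycles in LATENCY_CYCLES.items():
        print(f"{level}: {cycles} cycles is about {cycles_to_ns(cycles):.1f} ns")
    print(f"main memory supplies about {memory_bytes_per_cycle():.1f} bytes per cycle")
```

A single core issuing one 16-byte movaps per cycle could consume 16 bytes per cycle, several times what main memory can supply, which is why working out of cache matters so much.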
Two architectural features — Vector Computation and Memory Latency — are the most important concepts to keep in mind when trying to figure out how to make best use of these amazing CPU chips.




The core information processing capabilities of my under-construction new computer consist of simple (but numerous) instructions that apply transformations on stored instances of built-in primitive data types. All the complex operations of computer software are built out of those primitives.
I’ll start with the simplest primitive data type: the bit.
Bits don’t have inherent semantics, but a number of interpretations have proven useful.
Computer programs, coded from the built-in instruction set, define which interpretation is being used.
The vector processors in each core of the CPUs do have some limited support for efficient operation on bits, by which I mean that each 128-bit SSE register can be thought of as 128 individual bits. Put to maximum theoretical use, the combined CPUs can perform 2.4 trillion bit combinations per second. That’s a lot! But organizing data structures and algorithms to get anywhere near that theoretical maximum is very difficult. The biggest problem is that the bits are not convenient to refer to as individuals, but are instead grouped into larger units; that makes it difficult to combine one bit with another arbitrarily-chosen bit. An instruction like AND combines bits from two values in a specific pairwise fashion — bit 0 from value 1 combines with bit 0 from value 2, bit 1 from value 1 combines with bit 1 from value 2, and so on.
Combining arbitrary bits involves extracting the bit of interest from the larger unit then performing the logical operation on the result. Because that sequence requires several instructions per bit combination, the theoretical computing rate becomes something like 5 billion bit combinations per second — more than two orders of magnitude below the optimistic number from the previous paragraph.
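Both figures can be reproduced with simple arithmetic. The sketch assumes two quad-core CPUs at 2.33 GHz with one 128-bit logical instruction per core per cycle. The figure of four instructions per arbitrary bit combination is an illustrative guess standing in for “several”.

```python
CPUS = 2
CORES_PER_CPU = 4
CLOCK_HZ = 2.33e9
REGISTER_BITS = 128
INSTRUCTIONS_PER_ARBITRARY_BIT = 4  # assumption: extract, extract, combine, store


def peak_bit_ops_per_sec() -> float:
    """Every core combines 128 bit pairs per cycle."""
    return CPUS * CORES_PER_CPU * CLOCK_HZ * REGISTER_BITS


def arbitrary_bit_ops_per_sec() -> float:
    """Every core needs several instructions to combine one arbitrary pair."""
    return CPUS * CORES_PER_CPU * CLOCK_HZ / INSTRUCTIONS_PER_ARBITRARY_BIT


def and_arbitrary_bits(a: int, i: int, b: int, j: int) -> int:
    """AND bit i of a with bit j of b: two extractions, then the logical operation."""
    return ((a >> i) & 1) & ((b >> j) & 1)


if __name__ == "__main__":
    print(f"{peak_bit_ops_per_sec():.2e}")       # 2.39e+12
    print(f"{arbitrary_bit_ops_per_sec():.2e}")  # 4.66e+09
    print(and_arbitrary_bits(0b1000, 3, 0b0001, 0))  # 1
```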
One interesting way to work with these addressing difficulties is to think in terms of bit vectors instead of individual bits. If we can usefully define a data type that is an ordered group of individual bits, where each bit is interpreted in a consistent way (like one of the interpretations I listed earlier), then that chunk of bits can be operated on in parallel, up to 128 of them simultaneously per core. The always-evolving SSE instruction set provides some help for dealing with chunks of bits that are 8, 16, 32, 64, or 128 wide. This sounds kind of awkward, but it’s not out of the question to do useful work this way. Here’s a couple of illustrative examples:
The vector units have 16 128-bit registers available, called XMM0 - XMM15. Half of these (XMM8 - XMM15) are only available in 64-bit mode and instructions using them are encoded differently, which seems kind of dumb, but that sort of thing is normal in the Intel instruction set, which has grown and changed over time like a gnarly old tree.
There are lots of different instructions for loading the XMM registers from the main memory bank and saving them back again. I will need to study these in some detail but will just skip over them for now.
The bit-combination instructions can take one of their operands directly from main memory, but the destination (and the other operand) must be a register. For fun, let’s dive into the details of the instruction encoding (well, fun for me… most of you poor readers will find this horrifying, but I love to sink my teeth into the gory details). Suppose we want to perform a logical AND on the 128-bit registers xmm0 and xmm1 and store the result back into xmm0. That is: xmm0 := xmm0 AND xmm1. Oddly, there are a number of different instructions that can do this, depending on how many chunks we want to break each register into. For the case where each register is treated as a single 128-bit vector, the opcode mnemonic is “PAND”. So the assembly language instruction is “PAND XMM0, XMM1”. Because the “PAND” instruction has a version that operates on the old 64-bit MMX registers, the encoding for this instruction begins with the “prefix byte” 0x66, which is a directive to use an alternate operand size. The opcode for PAND is 0x0F 0xDB. And, finally, instructions like this one use a byte to specify the addressing mode and which registers to use. In this case, the value of that byte is 0xC1, which means that the DEST is xmm0 and the SRC is xmm1. So, the complete encoding of the instruction is: 0x66 0x0F 0xDB 0xC1.
Suppose I wanted to use the registers xmm8 and xmm9 instead of xmm0 and xmm1. This is an example of how the history of the instruction set becomes visible. xmm8 - xmm15 were just recently added, and are in fact only available when the CPU is running in “64-bit mode”, meaning roughly that I need to be running a 64-bit operating system to get at those registers! And, rather than completely redesigning the instruction encoding, there is a new prefix byte that serves several such purposes. In this case, it says to use an “alternate interpretation” for the register numbers. That byte is 0x45. So the encoding for “PAND XMM8, XMM9” is: 0x66 0x45 0x0F 0xDB 0xC1.
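The two encodings above follow one rule, which a short Python sketch can reproduce. The function name is hypothetical and it covers only the register-to-register form of PAND. The extra prefix is the one Intel's manuals call REX: its bits supply the fourth bit of each register number, which is why xmm8 and xmm9 still end up with 0xC1 in the final byte.

```python
def encode_pand(dest: int, src: int) -> bytes:
    """Encode register-to-register 'PAND xmm<dest>, xmm<src>' (128-bit form)."""
    if not (0 <= dest <= 15 and 0 <= src <= 15):
        raise ValueError("XMM register numbers run from 0 to 15")
    out = bytearray([0x66])                  # prefix selecting the XMM (not MMX) form
    rex = 0x40 | ((dest >> 3) << 2) | (src >> 3)
    if rex != 0x40:                          # only needed for xmm8..xmm15
        out.append(rex)
    out += bytes([0x0F, 0xDB])               # PAND opcode
    # Register-direct addressing byte: 11 | dest (low 3 bits) | src (low 3 bits)
    out.append(0xC0 | ((dest & 7) << 3) | (src & 7))
    return bytes(out)


def as_hex(encoding: bytes) -> str:
    return " ".join(f"{b:02X}" for b in encoding)


if __name__ == "__main__":
    print(as_hex(encode_pand(0, 1)))  # 66 0F DB C1
    print(as_hex(encode_pand(8, 9)))  # 66 45 0F DB C1
```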
I thought for a while that I’d go through all of the instructions that could be relevant for processing bits or bit-vectors, including transferring them to and from memory and shuffling them within and between registers. Such a project would go on and on for a long while and probably would not be very entertaining. Suffice it to say that there are four logical operations: AND, OR, ANDN, and XOR. There are a LOT of bit-shuffling and load/store permutations. And there are a variety of bit testing instructions to control program flow.
One last point. The POPCNT function mentioned above is pretty useful and SSE version 4.2 has it implemented as a new machine instruction which becomes available in the exciting new “Core i7” (aka Nehalem) architecture that goes on sale in a few days. My CPUs are from the previous generation so if I end up wanting to use POPCNT in support of bit vector data types for reasons like the ones mentioned earlier, I’ll have to code up a software routine to count the bits.
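Such a software routine can be quite short. The sketch below is one common approach (repeatedly clearing the lowest set bit), shown in Python for clarity rather than speed; it is not necessarily what a tuned C or assembly version would look like.

```python
def popcount(x: int) -> int:
    """Count the 1 bits in a non-negative integer."""
    if x < 0:
        raise ValueError("popcount is only defined here for non-negative values")
    count = 0
    while x:
        x &= x - 1   # clears the lowest set bit
        count += 1
    return count


if __name__ == "__main__":
    print(popcount(0b1011))          # 3
    print(popcount((1 << 128) - 1))  # 128, a fully set 128-bit vector
```

The loop runs once per set bit, so sparse bit vectors are counted quickly.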




I wonder how “smart” a feral human would be… what little evidence we have indicates that such an unfortunate person would not be very “intelligent” by any reasonable definition. It seems to me that there is a difference between how intelligent an entity is, vs how intelligent that entity could be. This means that entities (humans or perhaps AGIs) can learn to be smarter. Intelligence itself can be learned.
The methods, skills, and facts we learn that increase our intelligence are part of our culture. It took many thousands of years to get us from the capabilities of pure biology to our current state. I think there are two types of learning involved: learning better ways to think and learning to use intelligence-augmenting artifacts.
I don’t see any reason to think we humans have reached the limits of either type.
So maybe the goals for building AGI could be attained through Intelligence Augmentation: IA instead of AI.
In his ongoing series of blog posts on Overcoming Bias, Eliezer Yudkowsky often explicitly addresses the first type, putting forth a particular vision of rationality based on Bayesianism as a successor to the Scientific Method. Whether his specific ideas are very helpful or not in practice, it’s a very cool effort and I wish more attention was given to the idea that intelligence is trainable. If intelligence training got as much attention as sports training, who knows what we might accomplish.
The continual rise of computers makes the second type of IA development (artifacts) a very tantalizing subject for thought, invention, and experimentation. What sorts of artifacts have made us smarter, and which ones might?
Lots of them! Here are some examples:
Other more speculative things have been tried, and such exploration is what I find really exciting… how can humans and machines work together to be smarter than either in isolation?

Probably the most important problem in moving the “artifact” type of IA forward is the communication issue. The requirements and results of any automated process must be communicated accurately and efficiently (from human to computer, and from computer to human), which is a huge challenge. Humans are used to communicating with each other, and we rely on our conversational partners to have high intelligence and a vast shared conceptual experience.
Other types of Intelligence Augmentation, such as so-called “smart drugs” and direct electrochemical brain-computer interfaces, are of some small academic interest to me personally but are beyond the scope of this blog.
Perhaps insights into the nature of Mind, brought on by our early tentative AGI efforts, will result in new and better IA tools as spin-offs. If so, we could even conquer the possibility that we are simply not smart enough to build AGI ourselves. A sequence of self-improving mind/machine hybrids might be the natural path to that city on the hill, just past the horizon.
I’m not aware of very much research or development involved in pushing Intelligence Augmentation to greater levels of performance. Is there some good stuff out there that I’m missing?




Author’s note: If possible, you should play the song Spiraling Shape, by They Might Be Giants, while reading this blog entry.
Listening to people complain is rarely entertaining and I did feel a little bit guilty for complaining about the state of AGI a few days ago. Nevertheless, I’m going to complain some more in this blog post. After this, having purged my mental digestive system of accumulated toxic waste, I will be able to move forward with cheerful optimism.
I’m interested in figuring out how to build AGI systems, so when some claim or topic of discussion gets bandwidth, I consider it worthwhile if (and to the extent that) it imposes specific requirements on AGI implementation. If a topic of study or discussion leads to no such requirements, or if the imposed constraints are too fuzzy to pin down specifically, the result is not helpful. Unhelpful topics that are interesting enough to repeatedly or continually suck up large amounts of mental energy and conversation I call Mind Traps.
There’s nothing wrong with playful or speculative forays into potential dead ends, but nasty Mind Traps sap our insight, time, and sanity.
Unfortunately, those stuck in Mind Traps do not agree that the subject of their attention is a sinkhole of futility, so most AGI folk will strongly disagree with me about some or all of my list of Mind Traps.
Rather than belligerently rail against these topics, I am simply going to list them. My hope is that some reader, someday, when considering one of these topics while thinking about building AGI, will ask: “What specific constraint does this place on an AGI implementation? Is it really helping me understand, design, or build an AGI? If there is a specific impact, is this really likely to be the best way to look at the issue at hand?”
Mind Traps (when you see these words or phrases, Run Away):
Thank you for your attention. Which useless dead ends did I miss?




As I write this, the fastest fairly normal PC CPU is the just-released Intel Xeon X7460, a 2.66 GHz 6-core monster. Seems like I’ve been stuck in boring quad-core land forever, and six cores would be a nifty upgrade. Unfortunately, the top-end ultra sexy chips aren’t cheap and in fact it looks like the X7460 at present costs several thousand dollars apiece. I’m not willing to pay that, so now the question of which CPU to get is a performance/cost tradeoff. Sparing you the boring details of my bargain hunting and comparison shopping, I settled on a set of two Intel Xeon E5410. These are 2.33 GHz quad-core processors with 12MB of L2 cache and a 1333 MHz bus speed.
It’s remarkable to me how similar in spirit the architecture of current mainstream processors like this Xeon is to that of the old Thinking Machines CM-5, which I wrote about here a while ago. In both, the basic idea is to have a number of roughly independent computer processors operating in parallel, and each of those processors is capable of vector processing: performing computations on several array elements at the same time. Most commonly that involves floating point arithmetic.
The four cores of each E5410 together execute about 9.3 billion instructions per second (assuming roughly one instruction per core per clock cycle). Instructions that use the SIMD (SSE) vector unit can do several operations at once. Probably the most useful case for me involves four simultaneous operations (each operating on a 32-bit number), so that’s a max of 37.3 billion operations per second per chip, or 74.6 billion for both CPUs combined. Nice.
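Here is a minimal sketch of that back-of-the-envelope arithmetic in Python. The clock speed, core count, and four-lane vector width come from the figures above; the one-instruction-per-cycle rate is a simplifying assumption rather than a measurement, and the function names are made up for illustration.

```python
# Rough peak-throughput estimate for two Xeon E5410 chips.
# Assumption (not measured): each core retires one instruction per clock cycle.
CLOCK_HZ = 2.33e9        # E5410 clock speed
CORES_PER_CHIP = 4
CHIPS = 2
VECTOR_LANES = 4         # four 32-bit values in one 128-bit register


def peak_instructions_per_chip(clock_hz=CLOCK_HZ, cores=CORES_PER_CHIP):
    """Instructions per second for one chip at one instruction per cycle."""
    return clock_hz * cores


def peak_vector_ops_per_chip(lanes=VECTOR_LANES):
    """Element operations per second if every instruction is a full-width vector op."""
    return peak_instructions_per_chip() * lanes


def peak_vector_ops_total(chips=CHIPS):
    """Element operations per second across all chips."""
    return peak_vector_ops_per_chip() * chips


print(f"instructions/sec per chip: {peak_instructions_per_chip() / 1e9:.1f} billion")
print(f"vector ops/sec per chip:   {peak_vector_ops_per_chip() / 1e9:.1f} billion")
print(f"vector ops/sec, both:      {peak_vector_ops_total() / 1e9:.1f} billion")
```

These are ceilings, not predictions: real code will fall short whenever instructions stall or the vector unit sits idle.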
The challenge comes in keeping the computation units fed. From reading the documentation, each core has a small amount of directly-accessible memory, and a few megabytes per core of L2 cache, but algorithms manipulating large amounts of data will need continual access to a lot more memory than that… which means rapidly communicating data from the main memory banks to the CPUs. That path appears to have 10.6 GB/sec of bandwidth for each chip (which would total 21 for both chips). That’s 2.65 GB/sec per core. A vector register is 128 bits, so each core can do 166 million register loads/saves per second, or one every 14 cycles. Thus the required arithmetic intensity to keep the chips busy is 14 in the worst case where the cache is ineffective. I’ll put together some simple benchmark programs to make sure these numbers are right.
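The memory-side arithmetic can be sketched the same way. This is only a restatement of the numbers above in Python, not a benchmark; the function names are hypothetical, and it assumes the worst case where every register load or store goes all the way to main memory.

```python
# Worst-case arithmetic intensity needed to keep one E5410 core busy,
# assuming the cache does not help at all (every load/store hits main memory).
CLOCK_HZ = 2.33e9              # E5410 clock speed
CORES_PER_CHIP = 4
BUS_BYTES_PER_SEC = 10.6e9     # memory bandwidth per chip, as quoted above
REGISTER_BYTES = 128 // 8      # one 128-bit vector register


def bandwidth_per_core(bus=BUS_BYTES_PER_SEC, cores=CORES_PER_CHIP):
    """Bytes per second available to each core if the bus is shared evenly."""
    return bus / cores


def register_transfers_per_sec():
    """Vector register loads/stores per second that each core can sustain."""
    return bandwidth_per_core() / REGISTER_BYTES


def cycles_per_register_transfer(clock_hz=CLOCK_HZ):
    """Clock cycles that pass between register transfers: the required intensity."""
    return clock_hz / register_transfers_per_sec()


print(f"bandwidth per core:   {bandwidth_per_core() / 1e9:.2f} GB/sec")
print(f"register transfers:   {register_transfers_per_sec() / 1e6:.0f} million/sec")
print(f"cycles per transfer:  {cycles_per_register_transfer():.1f}")
```

An algorithm that does fewer than about 14 vector instructions per register load or store will, in this worst case, spend its time waiting on memory rather than computing.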
So: it would be best if the core data manipulations of AGI-related algorithms could be expressed as parallel streams of numerical operations on short vectors, with a moderately high (or very high) ratio of computation to memory access.
My new computer will have 16 gigabytes of memory — an arbitrary choice based on budgetary constraints (the memory is $25 per gigabyte).
Peeking a little bit into the future, I will probably replace the machine sometime in 2011. Unless some unexpected shift occurs in the technology, I can make a pretty good guess as to what the CPUs of that machine will look like: Each one will have 8 cores and will have 2-way hyperthreading. The vector registers will increase in size to 256 bits (Intel AVX). Given a modest improvement in clock speed, this all adds up to maybe 6-8 times the performance for each CPU. I do not expect memory bandwidth to increase at the same rate, so the required arithmetic intensity will increase. I hope to be able to afford 64 gigabytes of memory for that machine.
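To see how those guesses combine, here is a rough sketch in Python. The core count and vector width ratios come from the paragraph above. The combined gain from hyperthreading and clock speed (1.5x to 2x) and the memory bandwidth growth (2x) are placeholder values I picked purely for illustration, not predictions or measurements.

```python
# Rough projection of per-CPU speedup and the resulting arithmetic intensity.
CORE_RATIO = 8 / 4            # 8 cores vs. today's 4
VECTOR_RATIO = 256 / 128      # 256-bit registers vs. today's 128-bit
CURRENT_INTENSITY = 14        # cycles per register transfer on the E5410


def projected_speedup(core_ratio, vector_ratio, other_gain):
    """Multiply the independent gains; other_gain covers hyperthreading and clock."""
    return core_ratio * vector_ratio * other_gain


def projected_intensity(current_intensity, compute_speedup, bandwidth_speedup):
    """Intensity grows by however much compute outpaces memory bandwidth."""
    return current_intensity * compute_speedup / bandwidth_speedup


# Hypothetical placeholder values, chosen only to illustrate the arithmetic.
HYPOTHETICAL_OTHER_GAINS = (1.5, 2.0)
HYPOTHETICAL_BANDWIDTH_SPEEDUP = 2.0

for other_gain in HYPOTHETICAL_OTHER_GAINS:
    speedup = projected_speedup(CORE_RATIO, VECTOR_RATIO, other_gain)
    intensity = projected_intensity(
        CURRENT_INTENSITY, speedup, HYPOTHETICAL_BANDWIDTH_SPEEDUP
    )
    print(f"other gain {other_gain}x: speedup {speedup}x, intensity {intensity}")
```

Whatever the real numbers turn out to be, the shape of the calculation is the point: if compute grows faster than bandwidth, the ratio of computation to memory access that algorithms need grows with it.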

