26 Nov 2008 @ 2:31 AM 

Along with the change in focus, I’m making a couple of changes to this blog itself, including a new visual theme.  I will be adding more content outside the main posts… along the bottom of the browser you’ll find links to some other pages, most of which are barely more than placeholders so far.

The “About” page will give an overview of the stuff I’m writing about.

The “Links” page will include links to other blogs and internet sites related to modelling that I find interesting.

Rather than just write about stuff, I really want to get to work on some hobby projects, so over the last couple of weeks I have jotted down some project ideas that I can pick from when I want to start a new project.  These projects will get their own pages as I start working on them.

I’m sure I’ll think of some other pages as well as time goes on.

And finally, I have a new domain name:  supermodelling.net.

Tags Categories: Blog Posted By: Derek
Last Edit: 07 Dec 2008 @ 09:42 PM

 26 Nov 2008 @ 2:28 AM 

It seems to me fruitful to think of intelligence as having three deeply-interrelated components: modelling, language, and “weird stuff” — where “weird stuff” is consciousness and free will and so on: the things that writers put in novels and movies to make interesting plots, and what gets discussed on mailing lists and frightens those with weak ties to reality. Much of that anthropomorphic material makes it seem like AI researchers are writing programs to create golems or voodoo dolls.

I am still fascinated by all aspects of AGI, and probably always will be, but since I want to focus on things that are not quite so broad as “AGI”, this is a good place to start chopping. Besides, I’m not really all that interested in the weird stuff any more.

I am interested in language, but not enough to focus on it.

That leaves modelling. What do I mean by that word?

Modelling is the representation of things, and methods for manipulating those representations which, when interpreted, can be put to good use. Probably the biggest such use is prediction. If the model is accurate, we can extrapolate its parameters over time to predict what the modelled thing will do in the future. We can predict the effect of performing some operation on a thing by performing an analogue of that operation on a model of the thing.

Besides prediction, there are other sorts of reasoning we can do using models — we can make guesses about the origin of a thing by manipulating models of other things (such as components). We can also model abstract sorts of things like categories by which means we can figure out how to recognize them.

And there is much more to the story, I’m sure of it. I’d like to spend time thinking about the question “What is modelling?”. It seems so much more answerable and useful than asking “What is intelligence?”.

The most interesting question of all — one that I do not yet feel qualified to even begin addressing — is how to model modelling itself. This kind of situation where questions and methods wrap around on themselves leads directly to some of the “weird stuff”… and if ever I do work up the chutzpah to study the bizarre things, it will probably be by sneaking up on the idea of modelling modelling.

But not today. Today I am curious about all the ways we have of modelling things on computers. What are they good at? What are they bad at? Why? Do the methods support construction or adjustment of models automatically or must they be completely pre-specified? How efficient are the methods? Can things be modelled using multiple techniques? If so, what is the relationship between the models?

One interesting distinction seems to be whether something is modelled from the inside or the outside. Outside models are based on, and judged purely by, external observations of the subject. A painting is an outside model (though most good artists aspire to be inside), and a neural network (or other statistical regression technique) is usually an outside model as well.

Inside models address the “why” behind the subject — a prediction using an inside model is based on interactions of sub-models of things purportedly making up the subject. It is interested in causal relationships.

Outside models do not require deep understanding, focus on surface features, and can often be easily learned. Inside models reflect fuller understanding, reflect a subject’s internal structure, are much more interesting, and are usually very difficult to acquire automatically.

I am tempted to say that animal brains only make outside models; humans have some ability to form inside models. But I haven’t thought about it that much, and it’s unclear that the conclusion is worth anything even if true.

So I’m going to learn more about modelling, and probably model a few things myself.

Tags Categories: Modelling Posted By: Derek
Last Edit: 05 Jun 2009 @ 12:09 AM

 21 Nov 2008 @ 10:05 PM 

As my final high-level post for a while on AGI, I’d like to give my opinions about “Friendly AI”.

For those who don’t know what that means, consider this: Suppose I were to have a series of conceptual breakthroughs and write a boatload of code and the result is an AGI that is “smarter” than me. Call it AGI-1. Next, suppose that it learned enough to make it more knowledgeable than me about AGI theory and about building software systems. Then, supposing that there is an AGI design significantly better than AGI-1 available within the capabilities of AGI-1, it should be able to build a smarter AGI, call it AGI-2. Then, if there is yet another better AGI design within the grasp of AGI-2, it could produce AGI-3. And so on. If this “recursive self-improvement” process proceeds rapidly (that is, if the development time for each cycle is small), the result would be an AGI system that is a LOT LOT smarter than me.

Alternatively, it could be that AGI-1 by itself might be a LOT LOT smarter than me.

Either way — one has to worry whether we could stop this superintelligent AGI from doing things we don’t want it to do. Like kill us, for example.

Certainly this idea is not obscure. The “Terminator” science fiction franchise (along with many other scifi stories) illustrates exactly this scenario. Friendly-AI cognoscenti tend to frown on that point because they disagree with the technical details and the nature of the future that results, but I think that’s irrelevant. The point is that the masses of humanity are quite aware of the potential danger. They don’t think it’s worth worrying about, because it’s a bizarre scifi thing.

My personal view is that this quite likely will be a real and possibly big problem someday. In fact, soon a very minor version of the problem will be getting a lot of attention. More and more, robotic systems will have the ability to harm people (because they will be more common and will have the ability to control more powerful and mobile physical devices). In the immediate future the issue will not be whether they become homicidal maniacs because they won’t be smart enough for that to even be possible — no way to even have the necessary concepts. First the issue will just be whether their software might be buggy in other more mundane ways. What process should we use to validate software that drives cars? How can we minimize the number of accidents caused by poor decisions in household cleaning robots?

I only mention this case because I think it will naturally bring the question (and the inevitable quips about Asimov’s three laws of robotics) into public discourse.

Now, although I think this will be a problem someday, I don’t think it’s going to be a problem soon. We simply aren’t that close to building an AGI with the appropriate level of smartness. It is possible that the solution will be simple and somebody will find it soon but it seems extremely unlikely to me. It’s true that I cannot draw a definite conclusion from the poor results produced by people who are trying very hard right now to build AGI, but it is relevant. For me to give any significant probability to this kind of scenario with multiple revolutionary inventions comprising many huge leaps of understanding, I need to see something that looks like progress in some direction.

Further, for this to be a problem with any sort of suddenness, the AGI would have to be astonishingly more intelligent than a human; able to quickly make multiple technical breakthroughs in many different fields and rapidly master every field of human expertise. To me, it is a huge stretch to posit that near-term commonly-available computer hardware will have the ability to host such a program.

Suppose I’m wrong. Suppose that some secret group somewhere is solving the problems — or more generally the time is almost right and parts of the puzzle start falling into place quickly. And suppose that it turns out that the core of intelligence is amazingly simple and hardware isn’t a limitation.

It would be desirable if this system were programmed with an “ethical system” that keeps it from harming us (or, better yet, from wanting to harm us) — no matter how many times it redesigns and improves itself, no matter how its code drifts and changes over the entire length of the indefinite future. Figuring out how that could work is the “Friendly AI” problem. Here’s a reference. It could turn out somehow that Friendliness is inherent in any superintelligence, but the fact that humans are not Friendly rather dashes that hope as far as I’m concerned.

It becomes even trickier because solving the problem isn’t enough. The first successful AGI project has to correctly include the solution in the implementation. And, all subsequent AGI projects also have to do so. Even though it seems like a good idea, how can we guarantee that? Or, lacking a guarantee, how can we at least make it very likely?

  • We can prevent any AGI from ever being built. In the case where it is extremely difficult to develop AGI and/or it requires large supercomputers to execute at dangerous speeds, an intelligence/military effort similar to how we deal with nuclear proliferation might be successful for a while, until a better solution can be found. But the “easy code, hard takeoff” scenario favored by Friendly AI folks is much more difficult to handle. It would require a terrifying level of surveillance, presumably by all governments under some sort of treaty, or imposed by one or more superpowers. Unfortunately, it might be the case that such a surveillance society is coming anyway to combat more mundane threats like bioterror. The technical capabilities for building a total-surveillance infrastructure might only be a few decades away. I don’t really want to speculate on the details because it’s too depressing. If several significant terrorist attacks occur using WMDs, I could imagine the political will to use the technology becoming real.
  • The first AGI can be built to be Friendly, then helped and encouraged to “take over the world”. After that, the AGI can take on the surveillance tasks described above. The one AGI becomes humanity’s partner. In this scenario, we do have to develop Friendliness first, then maximize the probability that it is done right. An open process (or perhaps a closed but very heavily funded process) seems much safer than having a small group work on Friendliness theory and implementation. Even so, the effort should definitely contain a rigorous mathematical/logical framework for proving the correctness of the design.

This is all very scary and unreal-sounding. It is not comfortable to think that the two above bulleted scenarios are the only ways to secure ourselves from extinction in the near future. And, as I said before, I don’t personally think superpowerful AGI will arise soon. And, even more important, I don’t think it will arise suddenly, which means that if the draconian measures listed above cannot be decided on, we might nevertheless survive a slower rise of AGI.

It might turn out that “superintelligence” is impossible… that, for some reason, no AGI can be very much smarter than the human race as a whole. If that’s true we probably don’t need to worry that much about it. But I don’t see a good reason to think that such an intelligence cap exists, and it certainly doesn’t seem prudent to bet on it being true.

So, no matter what, it seems as if solving the Friendliness problem is a good idea, and the sooner it can be solved the better. So far, not much progress is apparently being made, although few people if any are actually working hard on the problem.

The ardent believers in the near-future hard-takeoff scenario come across as rather fanatical and alarmist; to be fair, if they are right then I guess they should be. I’m surprised that they don’t seem to be working very hard on an actual solution beyond just “awareness raising”. As I noted before, the public is quite aware of the issue. The only thing the public needs to be shown is evidence of imminence, but no such case is ever attempted.

I am completely perplexed that the believers don’t have some sort of forum for discussing approaches to solving the Friendliness problem, especially technical issues and underlying concepts.

I have thought about starting such a forum myself, but it’s a lot of work to attempt serious community building, and there is no guarantee of success, especially given the idiosyncratic nuttiness of the interested parties and the apparent intractability of the problem.

Still, I might create such a forum just to see what happens.

Beyond that, though, since I’m out of the AGI game, there’s no need to worry about whether I’ll destroy the planet by letting loose a rogue UnFriendly AGI. I’m not even thinking about AGI.

I have some more tangible things in mind.

Tags Categories: AGI Posted By: Derek
Last Edit: 07 Dec 2008 @ 09:43 PM

 20 Nov 2008 @ 6:08 PM 

Bits are cool, but we’d really like more expressive data types to help us do more interesting work. We satisfy this desire by grouping bits together in bunches and inventing ways of interpreting these bit vectors.

One way to treat a vector of bits as a unit is to note that there are 2^N possible combinations of N bit values. So, just like a particular bit value can be interpreted as a choice between two options, a particular N-bit value can be interpreted as a choice among 2^N options.

There are lots of different ways to use this idea. We could use 3 bits to represent 8 different colors. We could use a bunch of bits to represent the letters of the alphabet. And so on.
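As a rough illustration of the "choice among 2^N options" idea, here is a minimal Python sketch that enumerates every 3-bit pattern and reads each one as a color. The palette and the helper names (`COLORS`, `all_patterns`, `decode_color`) are made up for this example; any 8 names in any order would work just as well.

```python
from itertools import product

# Hypothetical palette: the names and their order are an arbitrary choice.
COLORS = ["black", "blue", "green", "cyan", "red", "magenta", "yellow", "white"]

def all_patterns(n_bits):
    """Return every n-bit pattern as a string such as '010'."""
    return ["".join(bits) for bits in product("01", repeat=n_bits)]

def decode_color(pattern):
    """Interpret a 3-bit pattern as an index into the palette."""
    return COLORS[int(pattern, 2)]

patterns = all_patterns(3)
print(len(patterns))  # one pattern per color: 2**3 of them
for p in patterns:
    print(p, decode_color(p))
```

The bits themselves carry no meaning; the lookup table is what turns a pattern into a color.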

We can also use N bits to represent 2^N different numbers. The most natural way (yes, that’s a pun) is to represent a range of integers. Note that there are lots of other sensible ways of mapping bit patterns onto numbers. For example, we might like to represent fractional numbers (“floating point” or “fixed point”).

Two ways of picking a range of integers are most common: 8 bits map either to the values 0..255 or to -128..127 (“unsigned” or “signed” integers, respectively).
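To make the two readings concrete, the sketch below interprets the same 8-bit patterns both ways. It assumes the usual two's-complement convention for the signed case (which is what gives the range -128..127); the function names are just for illustration.

```python
def as_unsigned(byte_value):
    """Interpret 8 bits as an integer in 0..255."""
    return byte_value & 0xFF

def as_signed(byte_value):
    """Interpret the same 8 bits as a two's-complement integer in -128..127."""
    value = byte_value & 0xFF
    return value - 256 if value >= 128 else value

for pattern in (0b00000000, 0b01111111, 0b10000000, 0b11111111):
    print(f"{pattern:08b}  unsigned={as_unsigned(pattern):3d}  signed={as_signed(pattern):4d}")
```

The stored bits are identical in both columns; only the interpretation differs.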

It’s a pretty expensive and drastic thing to do — given a 128-bit register, as bits it can represent 128 different things. But as a single 128-bit integer (with a very large range of potential values), it represents just one thing — a two order-of-magnitude hit on memory use and possible computation speed. C’est la vie: integers are useful so we pay the price. However, there is some significant payoff in using smaller bit vectors rather than larger ones if the smaller vector can represent the range of values needed for some particular task. This is why my CPUs have lots of different instructions for dealing with 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit integers.

An XMM register can hold 16 8-bit integers, which are operated on in a pairwise fashion, just like bits were in a previous post. And also just like bits, addressing individual values is a problem. There are a LOT of instructions for loading, storing, shuffling, and swapping the integer-vectors contained in registers, but the efficiency of information processing algorithms goes up a lot if we don’t have to do that very much. It’s a lot better if we can usefully build meaningful data types that use vectors of integers instead of just individual integers, or if our algorithms can be written in a “data parallel” way — in this case performing 16 parallel computations on 16 mini-registers. I think this is really important for trying to choose efficient representations from the universe of possibilities.
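The "16 mini-registers" picture can be simulated in a few lines of Python. This is only a model of the data layout, not real SIMD code: a 128-bit register is represented as one big integer, and the lane-wise add is done in a loop (on the real hardware a single packed-add instruction such as PADDB does all 16 lanes at once). The helper names are hypothetical.

```python
def pack_lanes(values):
    """Pack sixteen 8-bit values into one 128-bit integer, lane 0 in the low byte."""
    if len(values) != 16:
        raise ValueError("expected exactly 16 lanes")
    return int.from_bytes(bytes(values), "little")

def unpack_lanes(register):
    """Split a 128-bit integer back into sixteen 8-bit lanes."""
    return list(register.to_bytes(16, "little"))

def add_lanes(a, b):
    """Add corresponding lanes; each lane wraps around modulo 256 on its own."""
    sums = [(x + y) & 0xFF for x, y in zip(unpack_lanes(a), unpack_lanes(b))]
    return pack_lanes(sums)

a = pack_lanes(list(range(16)))
b = pack_lanes([10] * 16)
print(unpack_lanes(add_lanes(a, b)))
```

Note that a carry out of one lane never spills into its neighbour, which is exactly what distinguishes 16 small integers from one 128-bit integer.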

Integers can be used for a lot of different things. They can represent counts of things, (approximate) measurements of physical properties like mass, distances along axes in multidimensional space, and many other mappings. Finally, just like having representations of bits gives us access to abstract systems like boolean logic, having representations of integers gives us access to vast abstract mathematical systems built on top of integers. To make that work, the CPUs include instructions for the basic mathematical operations such as addition, subtraction, multiplication, and division.

I won’t go into the gory details of the instructions themselves or their encoding, but I’ll make a couple of observations. First, the data types are not completely orthogonal — for example, I do not believe that there is an 8-bit multiply. That means a tedious amount of checking processor features will be necessary when making decisions about data types. Second, it’s important to note that not just the vector processors are capable of dealing with integers. The “normal” instruction set has plenty of instructions for working on values stored in “normal” CPU registers. I focus on the SSE subsystem because of the potential performance boost but shuttling data back and forth to the special XMM registers is not always the right way to do things.

Finally, there is an interesting and curious special type of operation available on 16-bit integers (and somewhat on 8-bit) called “saturated” arithmetic. Suppose we have two unsigned 8-bit numbers: 210 and 83. Adding them up should produce the answer 293, but 293 cannot be represented in 8 bits. So what happens? In normal computer arithmetic, the “overflow” is noted in a status register and the ninth bit is simply discarded, which means that the stored answer is 37 (that is, 293 - 256). It would be rare for that to be the outcome we want, so avoiding overflow is usually important. “Saturated” arithmetic says that since 255 is the biggest value representable, that value should be the result of any computation that overflows to a larger value. It was invented for use in DSP applications like audio processing, and I think this graceful nonlinear response is really interesting and maps pretty well onto intuitions about numbers. There might be interesting mathematical properties for saturated arithmetic as well that arise from the nonlinearity which could make it an interesting component of some modelling methods.
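The 210 + 83 example can be checked with a small Python sketch. The two functions below are illustrative stand-ins for wrapping and saturating unsigned 8-bit addition; they model the arithmetic only, not any particular instruction.

```python
def wrapping_add_u8(a, b):
    """Ordinary 8-bit addition: keep the low 8 bits, discard the carry."""
    return (a + b) & 0xFF

def saturating_add_u8(a, b):
    """Saturated 8-bit addition: clamp at the largest representable value."""
    return min(a + b, 255)

print(wrapping_add_u8(210, 83))    # 293 does not fit, so the carry is lost
print(saturating_add_u8(210, 83))  # clamps to the top of the range
print(saturating_add_u8(100, 27))  # no overflow, so both forms agree here
```

When nothing overflows the two operations give the same answer; they only differ at the edge of the range, which is where the nonlinearity comes from.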

Tags Categories: Data Types Posted By: Derek
Last Edit: 07 Dec 2008 @ 09:44 PM


At least it should be.

Aside from raw materials in their natural location and state, every single thing of value — physical or abstract — is the product of intelligent action. Applied intelligence creates all wealth. A natural core exists in each of us, of course, and that essence is a large part of what we love and cherish in each other and our world, but all else is the product of industrious mind.

Artificial minds will be just as potent a source of value as our natural ones — actually a great deal more. Their economic impact will be incalculable, many times greater than the sum total value of everything ever created.

On a vastly smaller scale, almost trivial, Bill Gates once said: “If you invent a breakthrough in artificial intelligence, so machines can learn, that is worth 10 Microsofts.” Literally, a couple trillion dollars, though the point of the comment was not to attempt an accurate valuation.

You get the point. Money isn’t the only way of measuring worth, but it is one interesting way.

Now, here’s the puzzle: How come AGI development has produced basically zero dollars profit so far, and why can’t AGI efforts attract even small amounts of capital investment?

I have a couple of possible answers for this.

  • Maybe nobody yet knows how to make significant progress toward AGI. Or at least, nobody can convince investors that they know how.
  • Maybe it is beyond our ability. Humans, even in groups, are only so capable. As finite creatures, there are limits to what we can do. Maybe building AGI is just too hard.
  • Maybe we are making money. There are many different overlapping and even contradictory ways of thinking about intelligence. Over the last few months I have moved toward the viewpoint that, roughly speaking, intelligence == modelling ability. Starting from there, it makes sense to think that most of the entire history of the computer industry comprises the first steps toward artificial intelligence, generating quite a lot of value and wealth in the process as we develop the means for modelling the universe on our machines. The computer industry moves forward every day; it just doesn’t call itself AGI.
  • What do you think the answer is?

If there were actual progress being made toward AGI, the field would not consist of a marginal fringe club of futurophiles debating consciousness; it would be an economic juggernaut. In our modern world, more and more of what we do is touched in some way by computer software, and that software is unbearably stupid. All of it. Not only does it malfunction with alarming frequency, but even when it does work it is completely clueless about the needs of its users, displays almost no fluency with the subject matter it is supposed to be about, and never learns. If we can make that software more knowledgeable, smarter, more adaptable, and more robust, even small steps would be huge. And that’s not even taking into account coming technological gold mines like robotics, massive recordings of video streams, ubiquitous networking, immersive virtual reality, microbilling, scientific simulation, and on and on.

Typically, the excuse given for lack of progress toward anything tangible is that all “general” intelligent tasks are AGI-Complete — meaning that the whole problem has to be solved in order to solve a piece of it.

That can’t be right. It has to be a consequence of thinking about the problem in the wrong way. So what’s a right way? I am not certain, but there are many possibilities. Here are a few:

  • AGI practitioners by and large think that by breaking up intelligence into “narrow” issues and problem domains, mainstream AI research has lost the dream. But maybe that’s wrong. Maybe general intelligence really is just relatively straightforward combinations of “narrow” intelligences. If so, AGI should be all about the process of rapidly developing narrow AI technologies and making them work together to solve problems in real-world task domains. Yes, it’s hard. So make it easier!
  • Maybe focusing on a particular application with large economic potential (e.g. natural language question answering, robotic control systems, or forecasting) from an AGI perspective would provide the right leverage for producing self-sustaining progress. Rather than starting with a system that does absolutely nothing (but does it in a completely general way) and trying to make it do something from there, it might be better to focus single-mindedly on gradually increasing the generality of a system that does only one thing.
  • What do you think is the right way to think about the problem?

Earlier I mentioned that I have been gradually coming to hold the viewpoint that intelligence == modelling ability, and I have touched on that in other blog postings about concepts being models of things in the universe. My bet on the best approach to being real, relevant, and successful is to move forward from that premise. So that’s exactly what I’m going to start to do, though I am not yet certain how to best proceed so the path will be long and winding.

I will be writing more about this anon.

Tags Categories: AGI Posted By: Derek
Last Edit: 07 Dec 2008 @ 09:44 PM

 14 Nov 2008 @ 9:50 PM 

AGI enthusiasts may be puzzled by my previous post: Since we don’t understand what exactly intelligence is or how to build it, surely it’s completely backwards to focus in such excruciating detail on the transient technological details of computer components ordered from newegg.com!

In answer, I say this:

  • I suspect that there are many different designs that can exhibit general intelligence. I don’t have any proof of that, but just about anybody fascinated by AGI accepts that at least two such designs exist — brains and one (yet to be found) computer-hosted AGI implementation. If there are two, it seems likely to me that there are more than two. So, given no strong reason to prefer one type of design over another, why not search for designs that are natural fits for the available computing machinery? And that does not mean Turing Machines.
  • I am moving away from AGI as an explicit goal, at least for the time being. Now that my head has come out of the clouds a bit, mundane technology issues seem suddenly relevant to things I might do.
  • It’s interesting. These chips are one of the pinnacles of our technical achievement as a civilization. As a techno-geek, I think the details are glorious.
  • An unhurried step-by-step look at computational building blocks might generate some useful ideas about how to usefully assemble them into more complicated structures.

So, moving on… my new computer has now been completely assembled and I installed 64-bit Windows Vista on it (curse me if you like, but I’m not really interested in operating system wars and Windows works fine for my needs). Now that it’s working, I can do some analysis and take actual measurements of instruction execution and the memory hierarchy. The figures I arrived at are simple approximations derived from reading specifications, running SiSoft Sandra and writing some short C/Assembly test programs. Detailed performance analysis is a hugely complicated task in general.

Instruction execution performance can be characterized with throughput and latency. What I mean by these:

  • Throughput: how many instructions can get executed per clock cycle, assuming no inter-instruction dependencies.
  • Latency: how long it takes for an individual instruction to complete.

Throughput is different from latency because the processor overlaps the execution of multiple instructions, so parts of several instructions can be executing at the same time.

Consider the instruction mulps xmm0, xmm1. This multiplies the four floating-point numbers stored in xmm0 element by element with the four stored in xmm1, storing the results back in xmm0 (so it does four floating-point multiplies). This instruction has a latency of 4 cycles, but if the algorithm being computed can do other work in the meantime using other resources, the throughput can reach one instruction per cycle.
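The difference between the two numbers shows up clearly in a toy cycle count. The sketch below assumes a heavily idealized pipeline (one instruction issued per cycle, a fixed latency, no other stalls), which is a simplification of any real CPU; the function names are invented for the example.

```python
def cycles_independent(n_instructions, latency):
    """Cycles to finish n independent instructions when one can issue per cycle.

    The last instruction issues on cycle n and completes `latency` cycles
    after it was issued, so the executions overlap.
    """
    if n_instructions == 0:
        return 0
    return (n_instructions - 1) + latency

def cycles_dependent(n_instructions, latency):
    """Cycles when each instruction needs the previous result: no overlap at all."""
    return n_instructions * latency

# Using the 4-cycle multiply latency quoted above:
print(cycles_independent(100, 4))  # close to one instruction per cycle
print(cycles_dependent(100, 4))    # every multiply waits for the one before it
```

In this model, 100 independent multiplies cost barely more than 100 cycles, while a chain of 100 dependent multiplies costs four times as much, which is why arranging for independent work matters.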

Now let’s look at the instruction movaps xmm0, [rax]. This means: move a vector of four floating-point numbers (16 bytes total) from the memory address stored in the 64-bit register rax into the 128-bit register xmm0. The memory address must be aligned, that is, an exact multiple of 16 bytes. The throughput of this instruction is as high as one instruction per cycle, but the instruction latency varies wildly depending on whether and where the memory data is cached.

Each core has a 32KB “level 1” data cache with a 3 cycle latency. Most algorithms can get pretty close to filling in the level-1 cache access latency delays with other instructions, so that latency isn’t too much of a problem.

The CPU has 12MB of “level 2” cache. The latency of L2 cache access is approximately 18 cycles. So getting the needed data transferred from L2 to L1 before it is needed, and then working with it for a while before moving on to more data, is important to avoid waiting for access to the L2 cache.

The 16GB of main memory on my computer is much worse. The latency for accessing it is something on the order of 160 cycles. Since most AGI-related algorithms are likely to operate on vast amounts of data, it is crucial to try hard to prefetch data before it is needed, and then work with it for a while — because the throughput of main memory is only something like 10GB/sec.
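To get a feel for how much the slow levels dominate, here is a back-of-the-envelope Python sketch that averages the three latencies quoted above (3, 18, and roughly 160 cycles). The hit fractions passed in are purely hypothetical inputs, not measurements, and the model assumes each access is served by exactly one level at that level's quoted latency.

```python
# Latencies in cycles, taken from the approximate figures in the text.
L1_LATENCY = 3
L2_LATENCY = 18
MEMORY_LATENCY = 160

def average_latency(l1_fraction, l2_fraction):
    """Average cycles per access, given the fraction served by L1 and by L2.

    Whatever is left over is assumed to go all the way to main memory.
    """
    memory_fraction = 1.0 - l1_fraction - l2_fraction
    if l1_fraction < 0 or l2_fraction < 0 or memory_fraction < -1e-12:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    return (l1_fraction * L1_LATENCY
            + l2_fraction * L2_LATENCY
            + memory_fraction * MEMORY_LATENCY)

# Hypothetical mix: 90% of accesses hit L1, 8% hit L2, 2% go to main memory.
print(round(average_latency(0.90, 0.08), 2))
```

Even with only 2% of accesses reaching main memory in this made-up mix, those accesses contribute more to the average than all the L1 hits combined, which is the argument for prefetching and for reusing data while it is cached.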

Two architectural features — Vector Computation and Memory Latency — are the most important concepts to keep in mind when trying to figure out how to make best use of these amazing CPU chips.

Tags Categories: Computer Hardware Posted By: Derek
Last Edit: 07 Dec 2008 @ 09:45 PM

 13 Nov 2008 @ 4:52 AM 
 

Bits

 

The core information processing capabilities of my under-construction new computer consist of simple (but numerous) instructions that apply transformations on stored instances of built-in primitive data types. All the complex operations of computer software are built out of those primitives.

I’ll start with the simplest primitive data type: the bit.

Bits don’t have inherent semantics, but a number of interpretations have proven useful.

  • A bit can represent a distinction between “true” and “false”. Simple logic consists of transformations of bits representing truth values: AND, OR, NOT, XOR, etc.
  • More generically, a bit can represent the presence or absence of some property… like whether something is “alive” or whether something “exists”.
  • The value of a bit can represent two different choices (possibly chosen from a larger set of options). For example, a bit can represent the numbers 0 or 1, or -1 or 1; a bit can represent yes or no, heads or tails, red or green.

Computer programs, coded from the built-in instruction set, define which interpretation is being used.

The vector processors in each core of the CPUs do have some limited support for efficient operation on bits, by which I mean that each 128-bit SSE register can be thought of as 128 individual bits. Put to maximum theoretical use, the combined CPUs can perform 2.4 trillion bit combinations per second. That’s a lot! But organizing data structures and algorithms to get anywhere near that theoretical maximum is very difficult. The biggest problem is that the bits are not convenient to refer to as individuals, but are instead grouped into larger units; that makes it difficult to combine one bit with another arbitrarily-chosen bit. An instruction like AND combines bits from two values in a specific pairwise fashion — bit 0 from value 1 combines with bit 0 from value 2, bit 1 from value 1 combines with bit 1 from value 2, and so on.

Combining arbitrary bits involves extracting the bit of interest from the larger unit then performing the logical operation on the result. Because that sequence requires several instructions per bit combination, the theoretical computing rate becomes something like 5 billion bit combinations per second — more than two orders of magnitude below the optimistic number from the previous paragraph.
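The contrast between the cheap pairwise case and the expensive arbitrary case can be sketched in Python, using big integers as stand-ins for 128-bit registers. This models the logic only; the names are hypothetical and nothing here says how many machine instructions each step costs.

```python
MASK_128 = (1 << 128) - 1  # keep results within 128 bits

def pairwise_and(a, b):
    """One operation combines bit i of a with bit i of b, for all 128 positions."""
    return a & b & MASK_128

def get_bit(value, index):
    """Extract a single bit by shifting it down to position 0 and masking."""
    return (value >> index) & 1

def and_arbitrary_bits(a, i, b, j):
    """Combine bit i of a with bit j of b: two extractions, then the AND."""
    return get_bit(a, i) & get_bit(b, j)

a = 0b1010
b = 0b0110
print(bin(pairwise_and(a, b)))         # positions are matched up for us
print(and_arbitrary_bits(a, 3, b, 1))  # positions chosen by hand, one result bit
```

The pairwise form yields 128 result bits per operation, while the arbitrary form spends several steps to produce a single bit, which is where the drop of more than two orders of magnitude comes from.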

One interesting way to work with these addressing difficulties is to think in terms of bit vectors instead of individual bits. If we can usefully define a data type that is an ordered group of individual bits, where each bit is interpreted in a consistent way (like one of the interpretations I listed earlier), then that chunk of bits can be operated on in parallel, up to 128 of them simultaneously per core. The always-evolving SSE instruction set provides some help for dealing with chunks of bits that are 8, 16, 32, 64, or 128 wide. This sounds kind of awkward, but it’s not out of the question to do useful work this way. Here’s a couple of illustrative examples:

  • If each bit represents the presence or absence of some feature (e.g. a perceptual feature), then large feature vectors can be operated on efficiently… matching, measuring differences, and so on. The “Hamming Distance” between two such vectors A and B is a measure of “difference” and can be defined as the number of “1” bits in A XOR B, sometimes written as POPCNT(A XOR B).
  • Suppose we treat each bit as the numeric value -1.0 or 1.0, then an N-bit value is a normalized vector in N-dimensional space where only certain directions are representable. The dot product of two vectors A and B can be derived from the number of 0 bits in A XOR B — or POPCNT(NOT(A XOR B)).
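Both examples can be sketched in a few lines of Python, with an integer standing in for a bit vector. This is only an illustration: the function names are hypothetical, `popcount` here is a plain software count rather than the machine instruction, and reading a 1 bit as +1.0 and a 0 bit as -1.0 is an assumed convention (the dot product comes out the same with the opposite one).

```python
def popcount(x: int) -> int:
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where a and b differ: POPCNT(A XOR B)."""
    return popcount(a ^ b)


def dot_product(a: int, b: int, n: int) -> int:
    """Dot product of two n-bit vectors whose bits stand for +1.0 / -1.0."""
    mask = (1 << n) - 1
    agreeing = popcount(~(a ^ b) & mask)  # POPCNT(NOT(A XOR B)), kept to n bits
    return 2 * agreeing - n


a, b = 0b1011, 0b1001
distance = hamming_distance(a, b)
product = dot_product(a, b, 4)
```

Note that the NOT has to be limited to the N bits actually in use, otherwise unused high bits would be counted as agreements.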

The vector units have sixteen 128-bit registers available, called XMM0 - XMM15. Half of these (XMM8 - XMM15) are only available in 64-bit mode and instructions using them are encoded differently, which seems kind of dumb, but that sort of thing is normal in the Intel instruction set, which has grown and changed over time like a gnarly old tree.

There are lots of different instructions for loading the XMM registers from the main memory bank and saving them back again. I will need to study these in some detail but will just skip over them for now.

The bit-combination instructions can take one of their operands directly from main memory, but the destination (and the other operand) must be a register. For fun, let’s dive into the details of the instruction encoding (well, fun for me… most of you poor readers will find this horrifying, but I love to sink my teeth into the gory details). Suppose we want to perform a logical AND on the 128-bit registers xmm0 and xmm1 and store the result back into xmm0. That is: xmm0 := xmm0 AND xmm1. Oddly, there are a number of different instructions that can do this, depending on how many chunks we want to break each register into. For the case where each register is treated as a single 128-bit vector, the opcode mnemonic is “PAND”. So the assembly language instruction is “PAND XMM0, XMM1”. Because the “PAND” instruction has a version that operates on the old 64-bit MMX registers, the encoding for this instruction begins with the “prefix byte” 0x66, which is a directive to use an alternate operand size. The opcode for PAND is 0x0F 0xDB. And, finally, instructions like this one use a byte to specify the addressing mode and which registers to use. In this case, the value of that byte is 0xC1, which means that the DEST is xmm0 and the SRC is xmm1. So, the complete encoding of the instruction is: 0x66 0x0F 0xDB 0xC1.

Suppose I wanted to use the registers xmm8 and xmm9 instead of xmm0 and xmm1. This is an example of how the history of the instruction set becomes visible. xmm8 - xmm15 were just recently added, and are in fact only available when the CPU is running in “64-bit mode”, meaning roughly that I need to be running a 64-bit operating system to get at those registers! And, rather than completely redesigning the instruction encoding, there is a new prefix byte (the REX prefix) that serves several such purposes. In this case, it says to use an “alternate interpretation” for the register numbers, adding 8 to each of them. That byte is 0x45. So the encoding for “PAND XMM8, XMM9” is: 0x66 0x45 0x0F 0xDB 0xC1.
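The two encodings above can be reproduced with a small Python sketch. `encode_pand` is a hypothetical helper, not part of any assembler, and it covers only the register-to-register form of the instruction; memory operands and the MMX version are left out.

```python
def encode_pand(dest: int, src: int) -> bytes:
    """Encode the register-to-register form PAND xmm<dest>, xmm<src>."""
    if not (0 <= dest <= 15 and 0 <= src <= 15):
        raise ValueError("XMM register numbers run from 0 to 15")

    out = bytearray([0x66])  # prefix byte: selects the 128-bit (non-MMX) form

    # Extra prefix for xmm8 - xmm15: 0x40 plus one bit extending the DEST
    # register number (value 4) and one bit extending the SRC number (value 1).
    prefix = 0x40 | ((dest >> 3) << 2) | (src >> 3)
    if prefix != 0x40:
        out.append(prefix)

    out += bytes([0x0F, 0xDB])  # PAND opcode

    # Addressing byte: top two bits 11 mean "both operands are registers",
    # then three bits of DEST and three bits of SRC.
    out.append(0xC0 | ((dest & 7) << 3) | (src & 7))
    return bytes(out)


low = encode_pand(0, 1).hex(" ")
high = encode_pand(8, 9).hex(" ")
```

The addressing byte is 0xC1 in both cases because only the low three bits of each register number fit in it; the fourth bit of each lives in the extra prefix byte.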

I thought for a while that I’d go through all of the instructions that could be relevant for processing bits or bit-vectors, including transferring them to and from memory and shuffling them within and between registers. Such a project would go on and on for a long while and probably would not be very entertaining. Suffice it to say that there are four logical operations: AND, OR, ANDN, and XOR. There are a LOT of bit-shuffling and load/store permutations. And there are a variety of bit testing instructions to control program flow.

One last point. The POPCNT function mentioned above is pretty useful and SSE version 4.2 has it implemented as a new machine instruction which becomes available in the exciting new “Core i7” (aka Nehalem) architecture that goes on sale in a few days. My CPUs are from the previous generation so if I end up wanting to use POPCNT in support of bit vector data types for reasons like the ones mentioned earlier, I’ll have to code up a software routine to count the bits.
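One well-known way to write such a software routine is the parallel "add adjacent fields" bit count, sketched here in Python for clarity. This is not necessarily the routine I will end up using, and the function names are hypothetical; a real version would be written against the 64-bit or SSE registers directly.

```python
M64 = 0xFFFFFFFFFFFFFFFF  # keeps Python's unbounded integers to 64 bits


def popcount64(x: int) -> int:
    """Count the 1 bits in a 64-bit value with a fixed number of operations."""
    x &= M64
    # Each 2-bit field now holds the count of 1 bits it originally contained.
    x = x - ((x >> 1) & 0x5555555555555555)
    # Add neighbouring 2-bit counts into 4-bit fields.
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    # Add neighbouring 4-bit counts into 8-bit fields.
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F
    # The multiply sums all eight byte counts into the top byte.
    return ((x * 0x0101010101010101) & M64) >> 56


def popcount128(x: int) -> int:
    """Count the 1 bits in a 128-bit value as two 64-bit halves."""
    return popcount64(x) + popcount64(x >> 64)
```

The appeal of this approach is that it has no loop over individual bits and no data-dependent branches, so its cost is the same for every input.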

Tags Categories: Data Types Posted By: Derek
Last Edit: 30 May 2009 @ 01 47 AM

 08 Nov 2008 @ 9:50 PM 
 

IA

 

I wonder how “smart” a feral human would be… what little evidence we have indicates that such an unfortunate person would not be very “intelligent” by any reasonable definition.  It seems to me that there is a difference between how intelligent an entity is, vs how intelligent that entity could be.  This means that entities (humans or perhaps AGIs) can learn to be smarter.  Intelligence itself can be learned.

The methods, skills, and facts we learn that increase our intelligence are part of our culture.  It took many thousands of years to get us from the capabilities of pure biology to our current state.  I think there are two types of learning involved:  learning better ways to think and learning to use intelligence-augmenting artifacts.

I don’t see any reason to think we humans have reached the limits of either type.

So maybe the goals for building AGI could be attained through Intelligence Augmentation:  IA instead of AI.

In his ongoing series of blog posts on Overcoming Bias, Eliezer Yudkowsky often explicitly addresses the first type, putting forth a particular vision of rationality based on Bayesianism as a successor to the Scientific Method.  Whether his specific ideas are very helpful or not in practice, it’s a very cool effort and I wish more attention were given to the idea that intelligence is trainable.  If intelligence training got as much attention as sports training, who knows what we might accomplish.

The continual rise of computers makes the second type of IA development (artifacts) a very tantalizing subject for thought, invention, and experimentation.  What sorts of artifacts have made us smarter, and what sorts might?

Lots of them!  Here are some examples:

  • Finding and reading things other people have written down.  From oral traditions to writing in all its long history to Google, we learn easily from each other.  The more ideas we don’t have to reinvent, the higher up the hill we start our climb.
  • Long term memory enhancement.  Note taking, video recording, and so on.  Refreshing and correcting our memories makes us better at using them.
  • Short term memory enhancement.  Scratch paper.  I go through several notebooks per week, and am continually trying to find software tools that improve on pen+paper for increasing the “working space” of thinking (with little success, by the way).  My computer has two display monitors totaling 6.4 million pixels, used mostly to keep relevant things at hand and visible.
  • Organization tools.  Outlines, Gantt charts, etc.  I use a program called MindManager to create “mind maps” which are somewhat helpful organizational frameworks.  Things like UML also could be thought of as organization tools.
  • Calculation.  From pocket calculators to software such as Mathcad, we become smarter if the speed and accuracy of calculations increases.
  • Simulation.  Whether building models out of clay or performing a finite element analysis of a bridge, subjecting our intuitions to accurate reality checks has many benefits to our thinking process.

Other more speculative things have been tried, and such exploration is what I find really exciting… how can humans and machines work together to be smarter than either in isolation?

  • Human as utility function of search process.  Sometimes a task can be thought of as a search through a well-defined space.  In cases where the quality of a particular solution cannot be formalized, a person can act as the “fitness” function.  Sometimes this is referred to as “generative design”.  For one narrow example, graphical textures are often generated as an interactive search.
  • Human as inference control of search process.  Sometimes a person might be able to reduce a large search space by pruning out unpromising areas of the space or focusing on possibilities that seem fruitful.
  • Automatic programming.  Long a holy grail of AI research, using computer software to generate programs from human-friendly ambiguous specifications would be a huge boost in many cases to productivity and, arguably, intelligence.
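As a rough illustration of the first idea in this list, here is a minimal Python sketch of a search loop in which the judgment of quality is handed to a callback. Everything in it is hypothetical: `ask_human` stands for whatever interface would show two candidates to a person and record which one they prefer, and `prefers_larger` is only an automated stand-in so the sketch can run by itself.

```python
import random


def interactive_search(start, mutate, ask_human, rounds=10, rng=None):
    """Hill-climb from start, keeping a candidate only when the judge prefers it."""
    rng = rng or random.Random(0)
    best = start
    for _ in range(rounds):
        candidate = mutate(best, rng)
        # The machine proposes; the person (or a stand-in) acts as the fitness function.
        if ask_human(best, candidate):
            best = candidate
    return best


def step(value, rng):
    """Toy mutation: nudge a number up or down by one."""
    return value + rng.choice([-1, 1])


def prefers_larger(current, candidate):
    """Stand-in for a person; a real judge would be asked interactively."""
    return candidate > current


result = interactive_search(0, step, prefers_larger, rounds=20)
```

The machine does the cheap part (generating variations) and the person does the part that cannot be formalized (deciding which variation is better).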


Probably the most important problem in moving the “artifact” type of IA forward is the communication issue.  The requirements and results of any automated process must be communicated accurately and efficiently (from human to computer, and from computer to human), which is a huge challenge.  Humans are used to communicating with each other, and we rely on our conversational partners to have high intelligence and a vast shared conceptual experience.

Other types of Intelligence Augmentation, such as so-called “smart drugs” and direct electrochemical brain-computer interfaces, are of some small academic interest to me personally but are beyond the scope of this blog.

Perhaps insights into the nature of Mind, brought on by our early tentative AGI efforts, will result in new and better IA tools as spin-offs.  If so, we could even conquer the possibility that we are simply not smart enough to build AGI ourselves.  A sequence of self-improving mind/machine hybrids might be the natural path to that city on the hill, just past the horizon.

I’m not aware of very much research or development involved in pushing Intelligence Augmentation to greater levels of performance.  Is there some good stuff out there that I’m missing?

Tags Categories: Uncategorized Posted By: Derek
Last Edit: 08 Nov 2008 @ 09 55 PM

 04 Nov 2008 @ 2:06 AM 

Author’s note:  If possible, you should play the song Spiraling Shape, by They Might Be Giants, while reading this blog entry.

Listening to people complain is rarely entertaining and I did feel a little bit guilty for complaining about the state of AGI a few days ago.  Nevertheless, I’m going to complain some more in this blog post.  After this, having purged my mental digestive system of accumulated toxic waste, I will be able to move forward with cheerful optimism.

I’m interested in figuring out how to build AGI systems, so I judge any claim or topic of discussion that gets bandwidth by whether (and to what extent) it imposes specific requirements on AGI implementation.  If a topic of study or discussion leads to no such requirements, or if the imposed constraints are too fuzzy to pin down specifically, the result is not helpful.  Unhelpful topics that are so interesting that they repeatedly or continually suck up large amounts of mental energy and conversation I call Mind Traps.

There’s nothing wrong with playful or speculative forays into potential dead ends, but nasty Mind Traps sap our insight, time, and sanity.

Unfortunately, those stuck in Mind Traps do not agree that the subject of their attention is a sinkhole of futility, so most AGI folk will strongly disagree with me about some or all of my list of Mind Traps.

Rather than belligerently rail against these topics, I am simply going to list them.  My hope is that some reader someday when considering one of these topics as they think about building AGI will ask:  “What specific constraint does this place on an AGI implementation?  Is it really helping me understand, design, or build an AGI?  If there is a specific impact, is it really likely to be the best way to look at the issue at hand?”

Mind Traps (when you see these words or phrases, Run Away):

  • Turing Machine
  • The Halting Problem
  • Computability Theory
  • Gödel’s Theorem
  • Kolmogorov Complexity
  • Model Theory
  • Mind As Evolution
  • Mind As Economy
  • Mind As … (hint:  Mind is Mind, not something else)
  • Consciousness
  • Qualia
  • Meaning
  • Identity
  • Game Theory
  • Evolutionary Psychology

Thank you for your attention.  Which useless dead ends did I miss?

Tags Categories: AGI Posted By: Derek
Last Edit: 07 Dec 2008 @ 09 46 PM

 02 Nov 2008 @ 2:51 AM 

As I write this, the fastest fairly normal PC CPU is the just-released Intel Xeon X7460, a 2.66 GHz 6-core monster.  Seems like I’ve been stuck in boring quad-core land forever, and six cores would be a nifty upgrade.  Unfortunately, the top-end ultra sexy chips aren’t cheap and in fact it looks like the X7460 at present costs several thousand dollars apiece.  I’m not willing to pay that, so now the question of which CPU to get is a performance/cost tradeoff.  Sparing you the boring details of my bargain hunting and comparison shopping, I settled on a set of two Intel Xeon E5410.  These are 2.33 GHz quad-core processors with 12MB of L2 cache and a 1333 MHz bus speed.

It’s remarkable to me how similar in spirit are the architectures of current mainstream processors like this Xeon and the architecture of the old Thinking Machines CM-5, which I wrote about here a while ago.  In both, the basic idea is to have a number of roughly independent computer processors operating in parallel, and each of those processors has the capability of vector processing — performing computations on several array elements at the same time.  Most commonly that involves floating point arithmetic.

The four cores of each E5410 together execute about 9.3 billion instructions per second (assuming one instruction per cycle per core).  Instructions that use the SIMD vector unit can do several operations at once.  Probably the most useful case for me involves four simultaneous operations (each operating on a 32-bit number), so that’s a max of 37.3 billion operations per second per chip, or 74.6 billion for both CPUs combined.  Nice.

The challenge comes in keeping the computation units fed.  According to the documentation, each core has a small amount of directly-accessible memory and a few megabytes per core of L2 cache, but algorithms manipulating large amounts of data will need continual access to a lot more memory than that… which means rapidly communicating data from the main memory banks to the CPUs.  That path appears to have 10.6 GB/sec of bandwidth for each chip (about 21 GB/sec for both chips).  That’s 2.65 GB/sec per core.  A vector register is 128 bits (16 bytes), so each core can do 166 million register loads/saves per second, or one every 14 cycles.  Thus the required arithmetic intensity to keep the chips busy is 14 vector operations per register load or save in the worst case, where the cache is ineffective.  I’ll put together some simple benchmark programs to make sure these numbers are right.
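The arithmetic behind these figures can be laid out in a few lines of Python. The inputs are the numbers quoted above; the variable names are my own, and the calculation assumes one instruction per cycle per core and a memory bus shared evenly among the four cores.

```python
clock_hz = 2.33e9          # E5410 clock speed
cores_per_chip = 4
chips = 2
lanes = 4                  # four 32-bit values per 128-bit register
bus_bytes_per_sec = 10.6e9 # memory bandwidth per chip
register_bytes = 128 // 8  # one vector register

# Peak compute rate.
ops_per_chip = clock_hz * cores_per_chip * lanes
ops_total = ops_per_chip * chips

# Memory side: how often can one core load or save a full register?
bandwidth_per_core = bus_bytes_per_sec / cores_per_chip
loads_per_sec = bandwidth_per_core / register_bytes
cycles_per_load = clock_hz / loads_per_sec

print(f"{ops_per_chip / 1e9:.1f} billion ops/sec per chip")
print(f"{ops_total / 1e9:.1f} billion ops/sec total")
print(f"{loads_per_sec / 1e6:.0f} million register loads/sec per core")
print(f"one register load every {cycles_per_load:.0f} cycles")
```

The last figure is the arithmetic intensity: roughly that many vector instructions have to run per register moved to or from main memory before the core, rather than the bus, becomes the limit.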

So:  it would be best if the core data manipulations of AGI-related algorithms could be expressed as parallel streams of numerical operations on short vectors, with a moderately high (or very high) ratio of computation to memory access.

My new computer will have 16 gigabytes of memory — an arbitrary choice based on budgetary constraints (the memory is $25 per gigabyte).

Peeking a little bit into the future, I will probably replace the machine sometime in 2011.  Unless some unexpected shift occurs in the technology, I can make a pretty good guess as to what the CPUs of that machine will look like:  Each one will have 8 cores and will have 2-way hyperthreading.  The vector registers will increase in size to 256 bits (Intel AVX).  Given a modest improvement in clock speed, this all adds up to maybe 6-8 times the performance for each CPU.  I do not expect memory bandwidth to increase at the same rate, so the required arithmetic intensity will increase.  I hope to be able to afford 64 gigabytes of memory for that machine.

Tags Categories: Computer Hardware Posted By: Derek
Last Edit: 07 Dec 2008 @ 09 46 PM
