13 Nov 2008 @ 4:52 AM 
 

Bits

 

The core information processing capabilities of my under-construction new computer consist of simple (but numerous) instructions that apply transformations on stored instances of built-in primitive data types. All the complex operations of computer software are built out of those primitives.

I’ll start with the simplest primitive data type: the bit.

Bits don’t have inherent semantics, but a number of interpretations have proven useful.

  • A bit can represent a distinction between “true” and “false”. Simple logic consists of transformations of bits representing truth values: AND, OR, NOT, XOR, etc.
  • More generically, a bit can represent the presence or absence of some property… like whether something is “alive” or whether something “exists”.
  • The value of a bit can represent two different choices (possibly chosen from a larger set of options). For example, a bit can represent the numbers 0 or 1, or -1 or 1; a bit can represent yes or no, heads or tails, red or green.

Computer programs, coded from the built-in instruction set, define which interpretation is being used.

The vector processors in each core of the CPUs do have some limited support for efficient operation on bits, by which I mean that each 128-bit SSE register can be thought of as 128 individual bits. Put to maximum theoretical use, the combined CPUs can perform 2.4 trillion bit combinations per second. That’s a lot! But organizing data structures and algorithms to get anywhere near that theoretical maximum is very difficult. The biggest problem is that the bits are not convenient to refer to as individuals, but are instead grouped into larger units; that makes it difficult to combine one bit with another arbitrarily-chosen bit. An instruction like AND combines bits from two values in a specific pairwise fashion — bit 0 from value 1 combines with bit 0 from value 2, bit 1 from value 1 combines with bit 1 from value 2, and so on.

Combining arbitrary bits involves extracting the bit of interest from the larger unit then performing the logical operation on the result. Because that sequence requires several instructions per bit combination, the theoretical computing rate becomes something like 5 billion bit combinations per second — more than two orders of magnitude below the optimistic number from the previous paragraph.

One interesting way to work with these addressing difficulties is to think in terms of bit vectors instead of individual bits. If we can usefully define a data type that is an ordered group of individual bits, where each bit is interpreted in a consistent way (like one of the interpretations I listed earlier), then that chunk of bits can be operated on in parallel, up to 128 of them simultaneously per core. The always-evolving SSE instruction set provides some help for dealing with chunks of bits that are 8, 16, 32, 64, or 128 wide. This sounds kind of awkward, but it’s not out of the question to do useful work this way. Here’s a couple of illustrative examples:

  • If each bit represents the presence or absence of some feature (e.g. a perceptual feature), then large feature vectors can be operated on efficiently… matching, measuring differences, and so on. The “Hamming Distance” between two such vectors A and B is a measure of “difference” and can be defined as the number of “1″ bits in A XOR B — sometimes written as POPCNT(A XOR B).
  • Suppose we treat each bit as the numeric value -1.0 or 1.0, then an N-bit value is a normalized vector in N-dimensional space where only certain directions are representable. The dot product of two vectors A and B can be derived from the number of 0 bits in A XOR B — or POPCNT(NOT(A XOR B)).

The vector units have 16 128-bit registers available, called XMM0 - XMM15. Half of these (XMM8 - XMM15) are only available in 64-bit mode and instructions using them are encoded differently, which seems kind of dumb, but that sort of thing is normal in the intel instruction set, which has grown and changed over time like a gnarly old tree.

There are lots of different instructions for loading the XMM registers from the main memory bank and saving them back again. I will need to study these in some detail but will just skip over them for now.

The bit-combination instructions can take one of their operands directly from main memory, but the destination (and the other operand) must be a register. For fun, let’s dive into the details of the instruction encoding — well, fun for me… most of you poor readers will find this horrifying, but I love to sink my teeth into the gory details. Suppose we want to perform a logical AND on the 128-bit registers xmm0 and xmm1 and store the result back into xmm0. That is: xmm0 := xmm0 AND xmm1. Oddly, there are a number of different instructions that can do this, depending on how many chunks we want to break each register into. For the case where each register is treated as a single 128-bit vector, the opcode mnemonic is “PAND”. So the assembly language instruction is “PAND XMM0, XMM1″. Because the “PAND” instruction has a version that operates on the old 64-bit MMX registers, the encoding for this instruction begins with the “prefix byte” 0×66, which is a directive to use an alternate operand size. The opcode for PAND is 0×0F 0xDB. And, finally, instructions like this one use a byte to specify the addressing mode and which registers to use. In this case, the value of that byte is 0xC1, which means that the DEST is xmm0 and the SRC is xmm1. So, the complete encoding of the instruction is: 0×66 0×0F 0xDB 0xC1.

Suppose I wanted to use the registers xmm8 and xmm9 instead of xmm0 and xmm1. This is an example of how the history of the instruction set becomes visible. xmm8 - xmm15 were just recently added, and are in fact only available when the CPU is running in “64-bit mode”, meaning roughly that I need to be running a 64-bit operating system to get at those registers! And, rather than completely redesigning the instruction encoding, there is a new prefix byte that serves several such purposes. In this case, saying to use an “alternate interpretation” for the register numbers. That byte is 0×45. So the encoding for “PAND XMM8, XMM9″ is: 0×66 0×45 0×0F 0xDB 0xC1.

I thought for a while that I’d go through all of the instructions that could be relevant for processing bits or bit-vectors, including transfering them to and from memory and shuffling them within and between registers. Such a project would go on and on for a long while and probably would not be very entertaining. Suffice it to say that there are four logical operations: AND, OR, ANDN, and XOR. There are a LOT of bit-shuffling and load/store permutations. And there are a variety of bit testing instructions to control program flow.

One last point. The POPCNT function mentioned above is pretty useful and SSE version 4.2 has it implemented as a new machine instruction which becomes available in the exciting new “Core i7″ (aka Nehalem) architecture that goes on sale in a few days. My CPUs are from the previous generation so if I end up wanting to use POPCNT in support of bit vector data types for reasons like the ones mentioned earlier, I’ll have to code up a software routine to count the bits.

Tags Categories: Data Types Posted By: Derek
Last Edit: 30 May 2009 @ 01 47 AM

E-mailPermalink
 

Responses to this post » (None)

 

Sorry, but comments are now closed. Check out another post and speak up!

 Comment Meta:
RSS Feed for comments
 

Leave A Comment ...

 

 XHTML:
You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>
\/ More Options ...
Change Theme...
  • Role »
  • Posts »
  • Comments »
Change Theme...
  • VoidVoid (Default)
  • LifeLife
  • EarthEarth
  • WindWind
  • WaterWater
  • FireFire
  • LiteLightweight
  • No Child Pages...
  • No Child Pages...
  • No Child Pages...