10 Dec 2008 @ 5:41 AM 
 

Floats

 

Suppose that we take a bunch of bits and break it up into two different pieces (A and B), then interpret the pair as A x 2B, where we interpret A as a fraction between, say, -1 and 1; and B is an integer. This rather complicated way of interpreting a bunch of bits is a powerful idea — yielding numbers with good precision and a wide range of possible values. These are called floating point numbers and are really the workhorses of modern processor chips.

This is because “real numbers” (points from a continuous line of values) are incredibly useful for modelling things, and floating point numbers are for many purposes excellent approximations to real numbers. Suppose that we treat a 32-bit floating point number as a distance measured in inches. The smallest size that number can represent is way smaller than the size of a subatomic particle and the largest size is way bigger than the entire universe. And there’s a pretty decent amount of precision available too (call it 24 bits).

There are two most-common sizes for floating point numbers — 32 bits (called single precision) and 64 bits (called double precision).  The main reason for needing double precision is that 24 bits of precision is sometimes not adequate; it only lets us distinguish between about 16 million values.  That’s plenty for a lot of computation but there are two main cases where it isn’t enough:  those cases where higher precision is simply required, and cases where we have to perform arithmetic on numbers with very different scales.  As an example of the latter, suppose that we want to add 0.001 to a 32-bit floating point value 100000.0.  You’d think that the answer would be 100000.001, but it actually doesn’t work!  With 24 bits of precision, that value cannot be represented, so the answer is just 100000.0.  That is the reason that the single precision version of the Mandelbrot Set program I wrote about in another post cannot zoom in very far.

Besides distances, floats are great at representing other physical quantities (temperature, mass, etc). They are also fine for representing probabilities (although a fixed-point representation is usually better unless the need arises for truly tiny probabilities).

Just as we saw with integers, a huge amount of mathematical structure has been built up around real numbers, and approximating real numbers with a floating point representation gives access for modelling purposes to that big pile of math. To help this along, modern processors include primitive operations for addition, multiplication, and division — and also fancier things like square root, and so on. Large portions of the silicon used to build CPUs are dedicated to fast circuitry for doing these things… if one is interested in keeping a big fraction of the transistors of a CPU busy doing work, using vectors of floating point numbers (either as vectorized data types or data parallel stream processing) is surely the way to go.

That might be a frustrating conclusion if you’re interested in AGI… most peoples’ approaches to thinking about mind are not naturally expressed as vectorized floating point arithmetic. However, as a root level building block of useful modelling abstractions, floats are often excellent (which is why they exist!).

Readers might be wondering why I am plodding along on these really basic considerations, but we have to start somewhere when thinking about using computers for modelling. Our basic fundamental building blocks are bits, integers, and floating point numbers (plus a few other things like null-terminated strings which have a few specialized opcodes). What can we build from these, and what are our motivations for the next layer of complexity on top of raw bits?

Tags Categories: Data Types Posted By: Derek
Last Edit: 10 Dec 2008 @ 01 37 PM

E-mailPermalinkComments (0)
 20 Nov 2008 @ 6:08 PM 

Bits are cool, but we’d really like more expressive data types to help us do more interesting work. We satisfy this desire by grouping bits together in bunches and inventing ways of interpreting these bit vectors.

One way to treat a vector of bits as a unit is to note that there are 2^N possible combinations of N bit values. So, just like a particular bit value can be interpreted as a choice between two options, a particular N-bit value can be interpreted as a choice among 2^N options.

There are lots of different ways to use this idea. We could use 3 bits to represent 8 different colors. We could use a bunch of bits to represent the letters of the alphabet. And so on.

We can also use N bits to represent 2^N different numbers. The most natural way (yes, that’s a pun) is to represent a range of integers. Note that there are lots of other sensible ways of mapping bit patterns onto numbers. For example, we might like to represent fractional numbers (”floating point” or “fixed point”).

Two major ways of picking a range of numbers are most common. 8 bits either map to the values 0..255 or -128..127 (”unsigned” or “signed” integers).

It’s a pretty expensive and drastic thing to do — given a 128-bit register, as bits it can represent 128 different things. But as a single 128-bit integer (with a very large range of potential values), it represents just one thing — a two order-of-magnitude hit on memory use and possible computation speed. C’est la vie: integers are useful so we pay the price. However, there is some significant payoff in using smaller bit vectors rather than larger ones if the smaller vector can represent the range of values needed for some particular task. This is why my CPUs have lots of different instructions for dealing with 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit integers.

An XMM register can hold 16 8-bit integers, which are operated on in a pairwise fashion, just like bits were in a previous post. And also just like bits, addressing individual values is a problem. There are a LOT of instructions for loading, storing, shuffling, and swapping the integer-vectors contained in registers, but the efficiency of information processing algorithms goes up a lot if we don’t have to do that very much. It’s a lot better if we can usefully build meaningful data types that use vectors of integers instead of just individual integers, or if our algorithms can be written in a “data parallel” way — in this case performing 16 parallel computations on 16 mini-registers. I think this is really important for trying to choose efficient representations from the universe of possibilities.

Integers can be used for a lot of different things. They can represent counts of things, (approximate) measurements of physical properties like mass, distances along axes in multidimensional space, and many other mappings. Finally, just like having representations of bits gives us access to abstract systems like boolean logic, having representations of integers gives us access to vast abstract mathematical systems built on top of integers. To make that work, the CPUs include instructions for the basic mathematical operations such as addition, subtraction, multiplication, and division.

I won’t go into the gory details of the instructions themselves or their encoding, but I’ll make a couple of observations. First, the data types are not completely orthogonal — for example, I do not believe that there is an 8-bit multiply. That means a tedious amount of checking processor features will be necessary when making decisions about data types. Second, it’s important to note that not just the vector processors are capable of dealing with integers. The “normal” instruction set has plenty of instructions for working on values stored in “normal” CPU registers. I focus on the SSE subsystem because of the potential performance boost but shuttling data back and forth to the special XMM registers is not always the right way to do things.

Finally, there is an interesting and curious special type of operation available on 16-bit integers (and somewhat on 8-bit) called “saturated” arithmetic. Suppose we have two unsigned 8-bit numbers: 210 and 83. Adding them up should produce the answer 293, but 293 cannot be represented in 8 bits. So what happens? In normal computer arithmetic, the “overflow” is noted in a status register and the high bit is simply discarded, which means that the stored answer is 37. It would be rare for that to be a desirable outcome, so avoiding overflow is usually highly desirable. “Saturated” arithmetic says that since 255 is the biggest value representable, that value should be the result of any computation that overflows to a larger value. Invented for use in DSP applications like audio processing, I think this graceful nonlinear response is really interesting and maps pretty well onto intuitions about numbers. There might be interesting mathematical properties for saturated arithmetic as well that arise from the nonlinearity which could make it an interesting component of some modelling methods.

Tags Categories: Data Types Posted By: Derek
Last Edit: 07 Dec 2008 @ 09 44 PM

E-mailPermalinkComments (0)
 13 Nov 2008 @ 4:52 AM 
 

Bits

 

The core information processing capabilities of my under-construction new computer consist of simple (but numerous) instructions that apply transformations on stored instances of built-in primitive data types. All the complex operations of computer software are built out of those primitives.

I’ll start with the simplest primitive data type: the bit.

Bits don’t have inherent semantics, but a number of interpretations have proven useful.

  • A bit can represent a distinction between “true” and “false”. Simple logic consists of transformations of bits representing truth values: AND, OR, NOT, XOR, etc.
  • More generically, a bit can represent the presence or absence of some property… like whether something is “alive” or whether something “exists”.
  • The value of a bit can represent two different choices (possibly chosen from a larger set of options). For example, a bit can represent the numbers 0 or 1, or -1 or 1; a bit can represent yes or no, heads or tails, red or green.

Computer programs, coded from the built-in instruction set, define which interpretation is being used.

The vector processors in each core of the CPUs do have some limited support for efficient operation on bits, by which I mean that each 128-bit SSE register can be thought of as 128 individual bits. Put to maximum theoretical use, the combined CPUs can perform 2.4 trillion bit combinations per second. That’s a lot! But organizing data structures and algorithms to get anywhere near that theoretical maximum is very difficult. The biggest problem is that the bits are not convenient to refer to as individuals, but are instead grouped into larger units; that makes it difficult to combine one bit with another arbitrarily-chosen bit. An instruction like AND combines bits from two values in a specific pairwise fashion — bit 0 from value 1 combines with bit 0 from value 2, bit 1 from value 1 combines with bit 1 from value 2, and so on.

Combining arbitrary bits involves extracting the bit of interest from the larger unit then performing the logical operation on the result. Because that sequence requires several instructions per bit combination, the theoretical computing rate becomes something like 5 billion bit combinations per second — more than two orders of magnitude below the optimistic number from the previous paragraph.

One interesting way to work with these addressing difficulties is to think in terms of bit vectors instead of individual bits. If we can usefully define a data type that is an ordered group of individual bits, where each bit is interpreted in a consistent way (like one of the interpretations I listed earlier), then that chunk of bits can be operated on in parallel, up to 128 of them simultaneously per core. The always-evolving SSE instruction set provides some help for dealing with chunks of bits that are 8, 16, 32, 64, or 128 wide. This sounds kind of awkward, but it’s not out of the question to do useful work this way. Here’s a couple of illustrative examples:

  • If each bit represents the presence or absence of some feature (e.g. a perceptual feature), then large feature vectors can be operated on efficiently… matching, measuring differences, and so on. The “Hamming Distance” between two such vectors A and B is a measure of “difference” and can be defined as the number of “1″ bits in A XOR B — sometimes written as POPCNT(A XOR B).
  • Suppose we treat each bit as the numeric value -1.0 or 1.0, then an N-bit value is a normalized vector in N-dimensional space where only certain directions are representable. The dot product of two vectors A and B can be derived from the number of 0 bits in A XOR B — or POPCNT(NOT(A XOR B)).

The vector units have 16 128-bit registers available, called XMM0 - XMM15. Half of these (XMM8 - XMM15) are only available in 64-bit mode and instructions using them are encoded differently, which seems kind of dumb, but that sort of thing is normal in the intel instruction set, which has grown and changed over time like a gnarly old tree.

There are lots of different instructions for loading the XMM registers from the main memory bank and saving them back again. I will need to study these in some detail but will just skip over them for now.

The bit-combination instructions can take one of their operands directly from main memory, but the destination (and the other operand) must be a register. For fun, let’s dive into the details of the instruction encoding — well, fun for me… most of you poor readers will find this horrifying, but I love to sink my teeth into the gory details. Suppose we want to perform a logical AND on the 128-bit registers xmm0 and xmm1 and store the result back into xmm0. That is: xmm0 := xmm0 AND xmm1. Oddly, there are a number of different instructions that can do this, depending on how many chunks we want to break each register into. For the case where each register is treated as a single 128-bit vector, the opcode mnemonic is “PAND”. So the assembly language instruction is “PAND XMM0, XMM1″. Because the “PAND” instruction has a version that operates on the old 64-bit MMX registers, the encoding for this instruction begins with the “prefix byte” 0×66, which is a directive to use an alternate operand size. The opcode for PAND is 0×0F 0xDB. And, finally, instructions like this one use a byte to specify the addressing mode and which registers to use. In this case, the value of that byte is 0xC1, which means that the DEST is xmm0 and the SRC is xmm1. So, the complete encoding of the instruction is: 0×66 0×0F 0xDB 0xC1.

Suppose I wanted to use the registers xmm8 and xmm9 instead of xmm0 and xmm1. This is an example of how the history of the instruction set becomes visible. xmm8 - xmm15 were just recently added, and are in fact only available when the CPU is running in “64-bit mode”, meaning roughly that I need to be running a 64-bit operating system to get at those registers! And, rather than completely redesigning the instruction encoding, there is a new prefix byte that serves several such purposes. In this case, saying to use an “alternate interpretation” for the register numbers. That byte is 0×45. So the encoding for “PAND XMM8, XMM9″ is: 0×66 0×45 0×0F 0xDB 0xC1.

I thought for a while that I’d go through all of the instructions that could be relevant for processing bits or bit-vectors, including transfering them to and from memory and shuffling them within and between registers. Such a project would go on and on for a long while and probably would not be very entertaining. Suffice it to say that there are four logical operations: AND, OR, ANDN, and XOR. There are a LOT of bit-shuffling and load/store permutations. And there are a variety of bit testing instructions to control program flow.

One last point. The POPCNT function mentioned above is pretty useful and SSE version 4.2 has it implemented as a new machine instruction which becomes available in the exciting new “Core i7″ (aka Nehalem) architecture that goes on sale in a few days. My CPUs are from the previous generation so if I end up wanting to use POPCNT in support of bit vector data types for reasons like the ones mentioned earlier, I’ll have to code up a software routine to count the bits.

Tags Categories: Data Types Posted By: Derek
Last Edit: 30 May 2009 @ 01 47 AM

E-mailPermalinkComments Off
\/ More Options ...
Change Theme...
  • Role »
  • Posts »
  • Comments »
Change Theme...
  • VoidVoid (Default)
  • LifeLife
  • EarthEarth
  • WindWind
  • WaterWater
  • FireFire
  • LiteLightweight
  • No Child Pages...
  • No Child Pages...
  • No Child Pages...