From Bits to Numbers

It's pretty trivial to represent the state of a light switch with a bit. You just call one of that bit's states "on" and the other "off" and you're done. But what about something more complicated or abstract, like a number?

It's trivial to label one state of a bit "zero" and the other "one", but that alone is very limited. For instance, how does one count to two? And if two is a problem, then imagine counting up to several billion. And forget about how a person is going to count that high with just two symbols - how will a computer do it?

Well, there are two parts to this problem. The first, counting to two when you've only got a zero and a one, is actually fairly easy. That problem was solved long before the first computers, and it works the same way we count past nine - we simply use more digits. This is called the binary numeral system, and you can read all about it on Wikipedia if you're interested.

The second part is where the trick is. We often think of counting as repeatedly adding one to a number, but how is a computer going to add one to a number when it doesn't even know what a number is? A computer works with bits, and "adding" isn't something it can do with a pair of arbitrary symbols.

Algorithms

Algorithms are at the very heart of every program. There are algorithms for storing and retrieving data, for drawing lines and figures, for finding specific elements in massive lists. Some algorithms are incredibly complex, while others are so trivial that we hardly think to call them "algorithms". But what is an algorithm? An algorithm is what bridges the gap between the computer, which can only slavishly manipulate bits, and the people using that computer, who expect it to manipulate abstract things like numbers, strings of text, and images.

Let's continue with our example, counting. These are the operations that can be applied to a single bit:

  • We can set its value, discarding what was there before.
  • We can copy its value into another bit, discarding that bit's old value.
  • We can check its value and, based on what that value is, do either one thing or another.

None of these operations alone is powerful enough to add one to a number. A series of them together, however, can easily manage it... and that is what an algorithm is. An algorithm is a series of simple steps which accomplish some useful task.

Counting

Now, let's take a step back and think about what it means to count. We typically think of counting as repeatedly "adding one", but this is incorrect. The process of counting actually has very little to do with addition. In fact, one doesn't even need a concept of numbers in order to count. Counting is just an algorithm.

Here, I'll prove it.

We start with an ordered series of two or more distinct symbols. I choose the following symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. Now, these look like numbers, but forget that. They're just symbols. Any other set of squiggles would do. I've picked these ones so that it's instantly apparent how the process works.

We're going to represent the current count using a series of these symbols. This series is also ordered. We can use any order we like so long as it's consistent. I'm, again, going to take the conventional approach. Symbols are going to be stored in a row, and will thus be either to the left or to the right of each other. I'm going to refer to these symbols as digits.

We start by marking down the first of our symbols, which is "0".

Every time we want to "add one", we take the following steps:

  1. Start at the right-most digit in our current count.
  2. Look at the digit's value.
  3. If it is not equal to the last of our symbols (in this case "9"), we replace it with the next symbol in the sequence. And we're done, so we skip the rest of the steps.
  4. If we got to this step, then that digit must be equal to the last of our symbols ("9"). We replace it with the first symbol in the sequence ("0").
  5. Shift the focus one digit to the left.
  6. If there's no digit there (it's an empty space), put in the first symbol in our sequence ("0").
  7. Go back to step 2 and continue.

Try it out. It really does work. Here, I'll demonstrate:

Starting with "24", we want to count up by one. Our right-most digit is "4", which is not equal to the last symbol ("9"). Therefore we swap it out with the next symbol ("5") and we're done, giving us the value "25".

What if we start at "99"? Our right-most digit is "9", which is the last of our symbols. So we replace that digit's value with the first symbol ("0"), giving us "90". Then we look at the next digit over to the left, which also happens to be "9". So, again, we swap that with a "0", giving us "00". This time, when we move one digit to the left, we find that there isn't one, so we put in a "0", and now we have "000". Back to step two again. Only this time, the digit we're looking at is not a "9", so we swap it with the next symbol, giving us "100", and call it quits.
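The steps above can be sketched in Python. Note that the digits are treated purely as symbols here: we only ever compare them and look up "the next symbol in the sequence", never do arithmetic on them.

```python
# The counting algorithm, treating digits as mere symbols.
SYMBOLS = "0123456789"

def count_up(digits):
    """Given a count as a string of symbols, return the count
    after "adding one" by the steps described above."""
    digits = list(digits)
    i = len(digits) - 1                # step 1: start at the right-most digit
    while True:
        if i < 0:                      # step 6: no digit here, put in "0"
            digits.insert(0, SYMBOLS[0])
            i = 0
        if digits[i] != SYMBOLS[-1]:   # step 3: not the last symbol,
            digits[i] = SYMBOLS[SYMBOLS.index(digits[i]) + 1]
            return "".join(digits)     #         so swap in the next one
        digits[i] = SYMBOLS[0]         # step 4: last symbol wraps to "0"
        i -= 1                         # step 5: shift one digit left

print(count_up("24"))   # 25
print(count_up("99"))   # 100
```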

Once More, in Binary

If we can count by slavishly following simple instructions, then so can a computer. Now, computers aren't making squiggles on a page; they're manipulating bits, and bits have only two possible values. So we'll use a more convenient set of symbols: 0 and 1.

The second difference is that bits can't be "blank". Now, we could grab some extra bits, one for each of our digits, and call their values "blank" or "not blank", and keep track of "blankness" that way, but it's much easier to just pick a set number of digits (bits) for our count and abort the process if we run out of digits to the left. It does mean that we'll produce the wrong value if we try to count too high, but that can be the programmer's problem.

So the steps simplify to the following:

Start by setting all of our digits (bits) to 0.

Each time we want to "add one", we:

  1. Start at the right-most digit (bit) in our current count.
  2. Look at the bit's value.
  3. If it's 0, we set it to 1 and stop further processing.
  4. If we got to this step, then that digit (bit) must be 1. Set its value to 0.
  5. Shift the focus one digit (bit) to the left.
  6. If we've run out of digits to the left, abort.
  7. Go back to step 2 and continue.

And every step in that algorithm is very easily expressed in terms of the three things we can do to bits.
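The fixed-width binary version might be sketched like so, with the count kept as a list of 0s and 1s:

```python
# The binary counting algorithm on a fixed number of bits.
def add_one(bits):
    """Add one to a fixed-width count (right-most bit last).
    Returns the new bits, or raises if we run out of digits."""
    bits = list(bits)
    i = len(bits) - 1          # step 1: start at the right-most bit
    while i >= 0:
        if bits[i] == 0:       # step 3: a 0 becomes 1 and we're done
            bits[i] = 1
            return bits
        bits[i] = 0            # step 4: a 1 wraps back to 0
        i -= 1                 # step 5: shift one bit to the left
    raise OverflowError("ran out of bits")   # step 6: abort

count = [0, 0, 0]              # start with all bits set to 0
count = add_one(count)         # [0, 0, 1]
count = add_one(count)         # [0, 1, 0]
```

Trying to count past `[1, 1, 1]` with three bits triggers the abort in step 6, which is the "counting too high" problem mentioned above.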

Other Operations

Now most of what computers do is a good deal more complex than counting. And to make it easier (and more efficient) to program complex algorithms, computers provide us with a built-in way to represent numbers and a set of operations for those numbers. The basic units of arithmetic that most computers implement include general addition, subtraction, multiplication, division, and modulus (finding the remainder) of pairs of integers. Counting is just a simple example of how such operations are implemented: as algorithms built out of simple bit manipulations.

If you aren't convinced, think back to when you first learned to add large numbers, or when you learned long division. What was the first thing you did? You wrote the numbers down. After that, everything was just a matter of going through the digits in one order or another, performing a few simple operations on each one. This is as easily done in computer memory with bits as it is with paper and a pencil.
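That pencil-and-paper addition can be sketched as an algorithm too. Here it operates on digit strings, going through the columns right to left and carrying, exactly as done on paper (the `int()` calls here are only used to add single digits):

```python
# Grade-school addition, one column at a time with a carry.
def add(a, b):
    width = max(len(a), len(b))
    a, b = a.rjust(width, "0"), b.rjust(width, "0")  # pad to equal width
    result, carry = [], 0
    for x, y in zip(reversed(a), reversed(b)):       # right to left
        total = int(x) + int(y) + carry
        result.append(str(total % 10))               # the digit we write down
        carry = total // 10                          # the carry for the next column
    if carry:
        result.append(str(carry))
    return "".join(reversed(result))

print(add("24", "38"))   # 62
print(add("99", "1"))    # 100
```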

In General

The trick is always to find a representation of your information which can easily be manipulated in the ways you'll need.

Letters on paper, for instance, are represented by a given pattern of lines. Letters in a computer are represented using a given sequence of bit values. But we can pick the specific bit sequences with which we represent each letter in order to make one operation or another more efficient. For instance, if the main thing is to quickly identify the vowels in a word, we might pick patterns where all the vowels have their last bit set to one value, and all the consonants the other. If we want to sort more efficiently, then we might pick bit sequences which, when viewed as numbers, represent numbers whose order matches that of the letters in our alphabet.
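As a sketch of the vowel idea, here's a toy encoding (entirely made up for illustration, not ASCII or any real character set) where every vowel's code ends in a 1 bit and every consonant's ends in a 0, so the vowel test is a single bit check:

```python
# A made-up letter encoding where the last bit flags vowels.
ENCODING = {}
vowels, consonants = "aeiou", "bcdfghjklmnpqrstvwxyz"
for i, letter in enumerate(vowels):
    ENCODING[letter] = (i << 1) | 1   # odd codes: last bit is 1
for i, letter in enumerate(consonants):
    ENCODING[letter] = i << 1         # even codes: last bit is 0

def is_vowel(code):
    return code & 1 == 1              # look only at the last bit

print(is_vowel(ENCODING["e"]))   # True
print(is_vowel(ENCODING["t"]))   # False
```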

Finally, keep in mind when working with a computer that it does not understand what you're having it do. All it has are symbols (bits) and a limited set of operations it can perform on them. It's always your job to keep track of what information you're representing, how you're representing it in symbols (bits), and which bits actually hold which aspects of that representation.

You might be storing the state of eight switches in a byte, but if you ask a computer to add one to that byte, it will happily interpret those bits as a number and do so. It has no way of telling apart bits which represent numbers from bits which represent switches or bits which represent letters. Tracking that is all on you!
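A quick illustration of that pitfall, with the eight switches packed into one byte:

```python
# Eight switches packed into one byte: bit 0 and bit 3 are "on".
switches = 0b00001001

# Ask the computer to "add one" and it happily treats the bits as a
# number. Afterwards switch 0 reads as off and switch 1 as on --
# probably not what we meant by the switches at all.
switches += 1
print(format(switches, "08b"))   # 00001010
```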