Instruction Set

Here is what I have for the instruction set so far. There is plenty of room for expansion. I think the weak area is the comparison instruction. I’m not sure if I’ll add comparing with a literal or comparing other registers yet.

Latching ALU arguments

My ALU design requires to clock cycles to load the arguments. In the first clock cycle, the first argument is placed on the data bus and latched by the argument latch. In the second clock cycle, the second argument is placed on the bus and the ALU is enabled.

Here is a simulation of the circuit:


Set A, 5
Set B, 7
Add A, B (2 clock cycles)
  -> moves A to latch
  -> adds B and latch; stores result in A
Result: 5 + 7 = 0xC

High level architecture


The internal CPU architecture is 4 bits, but the address space is 8 bits.

I included two buses. There is a data bus for moving data inside the CPU and a system bus for communicating with main memory and other components. The system bus can only read and write from Register A. The address to read/write to is dictated by a combination of the segment register (SR) and bus address register (BA). I/O devices will be memory mapped.

Instructions are stored in a dedicated memory with 8 bit address space. The instruction pointer (IP, 8 bits) controls the next address to load. Instructions are 8 bits each. The upper bits are loaded into the instruction register and the lower bits are loaded into the argument register. Only the control logic can access the instruction, but if the argument is a literal, it may be written to the data bus. Alternatively the argument can be used by the control logic to direct ALU or system bus activity.

Because the instruction address space is 8 bits, but an instruction can only have a 4 bit argument, I’m including a jump segment register (JSR) to hold the high bits of the jump address.

Register A is used as the accumulator for the ALU. There will be an input latch inside the ALU so two arguments can be successively loaded from the data bus. The first argument will be latched and the second will be input directly from the bus. Lastly, there is a 4 bit flags register which is controlled by the ALU and readable by the control logic.

I’m building my own CPU

I’ve decided to fulfill my childhood dream of building a CPU from scratch. I’m fascinated by the fundamentals of how computers, math, and nature work so this project is a natural fit for me. Originally I was going to use 7400 logic chips, but after finding several blogs of people who have used discrete components, I’ve decided to go that route. I know that will increase the complexity, cost, and reduce the operating speed, but it will also give me a chance to dig deeper into the subject.

The ALU will be 4 bits, but I’ll use a segment register to achieve an 8 bit address bus. If I find this too restricting, I may move to a 12 bit bus, but for starting out I’ll limit myself to 8 bits even though that doesn’t give me much address space.

The instruction memory will be separate from data memory. I’ll initially use a memory chip so I can focus on the CPU architecture. Eventually I would like to try building out a small amount of SRAM and DRAM to learn about it.

I/O operations will be memory mapped. I’ll reserve some of the address segments I/O devices.

Instructions will be 8 bits. The upper 4 bits will be instruction only and the lower 4 bits will either be a literal or a command for the ALU. The supported ALU operations will take 3 instructions to encode: addition, comparison, and the four boolean operations: not, and, or, xor. There will be four data registers (A, B, C, D) with A being used as the accumulator.

I/O operations will always read and write to the accumulator register.

There are five additional control registers:

  • Instruction Pointer (8 bits)
  • Jump Segment Register (4 bits) – upper 4 bits of jump address
  • Segment Register (4 bits) – upper 4 bits of address bus
  • Memory Address (4 bits) – lower 4 bits of address bus
  • Flags (4 bits) – Set by the ALU: overflow, greater than, less than, zero

For I/O I intend to support a serial port, GPIO ports, and two ADCs.

Inspired by the Apollo Guidance Computer, I’m going to use RTL logic. My base building block will be NOR/NOT gates. Because RTL is power hungry, my goal is to keep power consumption low at the cost of speed. At this point I don’t know what a realistic clock rate will be (50 KIPS?) or what kinds of programs I’ll be able to run, but I envision the computer to basically be more of an instrument controller than a general purpose CPU. I won’t be implementing interrupts or a stack so I’m very limited in program architecture.

In addition to instrument control, two stretch projects would be outputting vector graphics to an X-Y scope (via the ADCs) and keeping a quadcopter level. (How can I be inspired by the Apollo guidance computer and not try to land something?)

An introduction to digital computing

This article is an introduction the basic concepts of digital computers.  I’ll cover how to do arithmetic on any numbers with just 1s and 0s, what a logic gate is, how to combine logic gates to do arithmetic and store the results. I’ll also show how logic gates can be implemented with electrical circuits. Finally I’ll introduce instructions and show how computers execute them.

The most fundamental operation in arithmetic is addition. We learned how to do long hand arithmetic as students in elementary school. We can add the numbers 27 and 15 as follows.


In the ones column, add 7+5 = 12. We put the 2 from the 1s place in the result and carry the 1 from the 10s place above the 10s column. To get the 10s digit of the result we then sum the carry digit (1) with 2 and 1 to get 4. The result is 42.

The above is called base 10 arithmetic because each digit can have one of 10 values (0 through 9). For reasons that will become obvious later, modern computers do not represent numbers in base 10. Instead they use what is called base 2 (or binary) arithmetic. In base 2, each digit can only have the value 0 or 1. Instead of each number having a 1’s place, 10’s place, 100’s place, etc, we use powers of 2, so we have the 1’s place, 2’s place, 4’s place, 8’s place, etc.

Here are the same numbers from above written in expanded base 10 and base 2 forms. See how you can use the same pattern to form the same numbers in each base?


Now let’s look at what the arithmetic would look like using base 2 numbers. Note that when we add 1+1 in base 2, the result is not 2, but rather 10. We put the 0 in the result row and carry the 1 to the next column.

Using the rules 1+0=1, 1+1=10, and 1+1+1=11, we can solve the previous arithmetic in base 2.


We’ll see these rules again in a few minutes.

The next fundamental subject in computation is logic gates. Logic gates represent devices (either mechanical or electronic) which can be used to carry out simple logic. And I mean really simple. The three simplest logic operators are AND, OR, and NOT.

Logic operators operate on variables (inputs or outputs). We conventionally allow a variable to be in either the TRUE or FALSE state. When we implement an electronic logic gate we call these states HIGH and LOW. You might have already guessed, but when we combine logic gates to perform computations, we consider these states to be 1 (TRUE) and 0 (FALSE).

The NOT operator is the simplest. It has one input and one output. The output is always the negation of the input. We can write this in a truth table. A truth table shows all combinations of inputs and outputs. Each row forms a true statement.

1 0
0 1

So, the first row means “NOT TRUE is FALSE”. The second row means “NOT FALSE is TRUE”. Simple stuff.

The AND operator has two inputs (called A and B) and one output. The AND operator’s output is FALSE unless both inputs are TRUE. If one input is TRUE and one is FALSE, the output is FALSE.

0 0 0
1 0 0
0 1 0
1 1 1

The OR operator also has two inputs and one output. When either input is TRUE, the output is also TRUE. When both inputs are FALSE, the output is FALSE.

0 0 0
1 0 1
0 1 1
1 1 1

Did you notice that when both inputs of the OR operator are TRUE, the output is also TRUE? There is a second type of OR operator for which this is not the case. The exclusive OR (written XOR) is TRUE only when one input is TRUE and one input is FALSE. This is a very useful operator.

0 0 0
1 0 1
0 1 1
1 1 0


These gates can be combined to form a few additional commonly used gates.

The NAND gate is a combination of and AND gate with a NOT gate on the output. This negates the output of the AND operation.

0 0 1
1 0 1
0 1 1
1 1 0

The two remaining gates are NOR (OR + NOT) and XNOR (XOR + NOT).

We can combine logic gates to perform what is called Boolean arithmetic. For example, the following is a statement in logic.


Sometimes it can be helpful to visually express logic statements. Here are the symbols typically used for logic gates. Inputs are on the left and outputs on the right.

Screen Shot 2015-09-03 at 9.59.42 PM

So the above statement can be represented graphically as follows.

The A and B inputs go to the AND gate. The C input goes into a NOT gate. The outputs of the AND and NOT gates go into the OR gate and the output of the OR gate is the result.

Let’s look at how we can combine logic gates to form a system of equations.

Let S = A XOR B
Let C = A AND B

This can be expressed graphically as:

Half Adder

Since both statements have the same input variables (A and B), we can write them in a unified truth table.

0 0 0 0
1 0 0 1
0 1 0 1
1 1 1 0

This table is equivalent to the statement A + B = C*2^1 + S*2^0. The right hand side of the equation should look familiar.

A + B = C*2 + S
0 + 0 =  0
1 + 0 =  1
0 + 1 =  1
1 + 1 = 10

This is base 2 arithmetic implemented in logic!

Now we don’t quite have enough power to solve the original problem I gave you. The above system is called the Half Adder. The half adder accepts two input values and outputs a sum and a carry value. In order to sum numbers with multiple digits, we need to be able the accept three inputs: A, B, and a carry from the previous column. The output will be the sum plus a carry, for a total of five variables. The system that can do this is called a Full Adder. It is implemented as follows:

Let S = A XOR B XOR Cin
Let Cout = (A AND B) OR (Cin AND (A XOR B))

Which can be expressed graphically as:

Full Adder

The truth table for the full adder is repetitive, but can be simplified to the statements:

0 + 0 =  0
0 + 1 =  1
1 + 1 = 10
  1 + 1 + 1 = 11

By chaining a series of full adders together (taking Cout from the previous adder as Cin on the next), we can carry out multiple digit binary arithmetic.

4 bit adder

(Image credit: Wikipedia)

The full adder is one of the primary building blocks of computers. Though tricks of notation and clever combinations with other gates, binary logic can also be made to perform subtraction, multiplication, division, and comparing numbers.

Logic gates can also be used to control other circuits based on a fixed set of rules. A good introductory logic system is the 7 segment LED found in clocks, radios, DVD players, and many other places. The logic controlling these displays converts a binary encoded number into 7 output signals which are used to turn on or off each segment of the display. There are 4 inputs representing the 4 binary digits necessary to represent 0 through 9. So for example, when the inputs are 0110 (meaning 6), the output of the gates will turn on all the segments except the top right one.



The basic idea behind designing these circuits is to start by writing a truth table for all possible relationships between the input and output variables, and then combining logic gates to create those values. Here the inputs are labeled A-D and the outputs a-d. The decimal point above is not included.

7 segment table

(Source: Texas Instruments

It can get complicated very quickly, though several techniques exist to find simplified truth tables. Here is what the complete 7 segment control system looks like. The blanking input is used to turn off all the outputs no matter what the other 4 inputs are. It’s very useful if you’d like to show a flashing digit.

7 segment

(Source: ibid)

This concept can be extended to any system that needs to turn other circuits on and off based on an input value.

Now that you’ve become familiar with logic gates as abstract concepts, let’s look at how we can implement them using electronic components.

There are many unique ways to build logic gates from electrical components, but the component of choice for modern computers is the transistor. The transistor is a type of semiconductor discovered at Bell Labs in the 1940’s. They have properties which can be used to amplify signals (like an audio amplifier) or to switch circuits on and off very quickly. They can also be made extremely tiny. A modern CPU contains several billion transistors etched on a wafer of silicon.


A transistor has three pins called the collector, base, and emitter. The base acts as a switch to turn on a circuit connecting the collector to the emitter. When a LOW signal is sent to the base, the collector-emitter switch is off. But when a HIGH signal is sent to the base, the collector-emitter switch is on.

Transistor Switch

The other component used in these circuits is a resistor. The purpose of a resistor is to limit the flow of current so the circuit does not short out and burn up or catch fire! You must have a resistor between a HIGH point and a LOW point in a circuit, but just because a circuit has a resistor does not mean it changes a HIGH signal into a LOW one.


Each gate circuit has a HIGH source, LOW source, at least one input, and one output. We can understand how the input affects the output by looking at the complete or incomplete circuits made by the transistor switches.

Here is a resistor-transistor implementation of a NOT gate. You can ignore the resistor on the input line. It’s purpose is to protect the transistor from burning out.

NOT Gate

When the input is LOW, the transistor switch is off, breaking the connection between the resistor and the LOW source. The resulting circuit looks like this:


There is a complete circuit only from HIGH to output, so the output is HIGH. The LOW signal is disconnected from everything since no signal can pass through an open switch.

When the input is HIGH, the transistor switch is closed. In this case we have to consider the placement of the output pin relative to the resistor because one side of the resistor has a HIGH signal and the other side is LOW. The output pin is on the LOW side of the resistor so the output will be LOW.


Next let’s look at the AND gate.

AND Gate

When A or B (or both) are LOW, there is an open switch between HIGH and the resistor, thus the remaining circuit only connects output to LOW so the output is LOW. When both A and B are HIGH, then both switches are closed, connecting HIGH to the resistor. Since the output pin is on the HIGH side of the resistor, the output will be HIGH. This behavior matches the truth table shown earlier.

Similar circuits can be made for each gate. Here are schematics for the OR and NAND gates. Note that the crossed lines in the OR gate only indicate a connection if there is a dot. Doesn’t NAND look a lot like AND? Try tracing the signals through these circuits.

OR Gate

OR Gate



Everything we’ve looked at so far is called combinatorial logic. Combinatorial logic allows us to perform addition, multiplication, and turn circuit components on and off, but there is another critical function computers need. The logic gates demonstrated above only provide an output as long as an input state is provided. The input may correspond to a button press or to a value stored in the computer’s memory. But how can the computer remember a key was pressed after the key is released, and how can computers remember numbers at all? The key to computer memory is a special combination of logic gates called the flip flop.

A flip flop is a pair of coupled NOR gates which feed the output of each gate into an input of the other gate. The two remaining inputs are conventionally called Set (S) and Reset (R). Once a HIGH signal is sent to SET, the flip flop’s output will output HIGH continuously, even after the HIGH signal is removed from SET. This is called a bistable circuit because the output does not depend on an input signal being provided. When a HIGH signal is sent to the Reset input, the output state will go to the LOW state and remain there until a HIGH signal is received on the Set input.


Flip Flop

In a computer, flip flops (called registers in a CPU) sit in between groups of logic gates to act as a buffer for the results from each step of a computation. Setting and resetting of the flip flops occurs at a regular interval which is driven by a clock signal, a system-wide alternating pulse of HIGH and LOW signals. In a modern computer, the clock cycles more than a billion times per second!

So how do we get from gates and flip flops to a programmable computer?

Computers are different from fixed logic devices such as clocks because they can execute instructions written after the circuit is built. They can do this because internally they contain many separate circuits which can carry out fixed operations such as addition, multiplication, comparing numbers, and copying numbers between registers. These circuits are turned on and off by some control circuitry based on coded instructions stored in the instruction registers. The whole process is timed based on the computer’s central clock.

Here is a simple computer architecture. It’s called the Von Neumann architecture and it’s used by most computers today.

Von Neumann

  • Instructions: This is an array of registers which hold coded instructions for the computer to follow. Each instruction is coded as number.
  • Instruction pointer: This is a register which tells the CPU the address of the next instruction to execute. Each time the computer reads an instruction, it increments this value by 1 so it knows to read the next instruction on the next clock cycle.
  • Controller: This decodes the current instruction from the instruction registers and activates the other circuitry to execute the hardwired logic for that instruction, much like the LED logic shown earlier.
  • Arithmetic logic unit (ALU): This contains the logic circuits that can carry out mathematical operations.
  • Logic registers (AX, BX, CX, DX): These are registers which can be accessed by the ALU. It is much faster to access a register than main memory. Registers are referred to by their name when writing instructions.
  • Main memory/system bus: This lets the CPU talk to the main memory or other devices attached to the computer like a keyboard, mouse, monitor, or network port. Numbers stored in memory must be copied to a register before they can be used by the ALU. Only a few registers are available, but memory can hold up to billions of values.

The instructions that a computer can follow are very simple. Here are enough instructions (with arguments written in parentheses) to write simple programs.

  • INPUT (register, address)
    • Copies the value stored at address in the main memory and copies it to the named register.
  • OUTPUT (address, register)
    • Copies the value in the named register to the specified main memory address.
  • SET (register, value)
    • Sets the named register to the specified value.
  • ADD (register1, register2)
    • Adds the values of register1 and register2 and stores the result in register1.
  • JUMP (line_number)
    • Changes the instruction pointer’s value to line_number.
  • COMPARE-JUMP (register, value, line_number)
    • Compares the value in the named register with the specified value. If they are equal, it changes the instruction pointer’s value to line_number. This lets a program change the order of execution of instructions in response to input data.
  • HALT
    • Indicates that this is the last instruction. Stop. Early computers would ring a bell or beep to let the user know they were finished.

Let’s use these instructions to write a program that can add up a long series of numbers. I’ll make two assumptions about the input.

  1. The first number of the input will be the total number of items in the input.
  2. Some process which we don’t care about writes the numbers into the computer’s memory. This could be copying from a file, the internet, or a user manually typing them in.

Here is what the computer’s memory will look like before the program runs. I only included three numbers for brevity, but this could be anywhere from zero numbers to millions, limited only by how big the computer’s memory is.

Address Value
1000 3
1001 183
1002 492
1003 266

In the program, I use all four available registers. I use each register for a specific purpose.

  • AX = the running total of the list (called the “accumulator”)
  • BX = the next memory address to read
  • CX = counter for how many more memory address to read
  • DX = register to load the current list value into

The basic structure of the program is some setup code, followed by a loop which does the summing of the list, and finally writing the result to main memory. The main loop structure is created by using two COMPARE-JUMP and JUMP instructions. The COMPARE-JUMP decides when to leave the loop. The JUMP instruction restarts the loop at the end.

Here is the program code:

1 SET BX, 1000 Store the starting memory location in register BX
2 INPUT CX, BX Copy first memory location to counter register (CX)
3 ADD BX, 1 Increment BX to next memory address
4 INPUT AX, 0 Initialize an accumulator register (AX) to 0
5 COMPARE-JUMP CX, 0, 11 If the counter equals zero, jump to instruction 11
6 INPUT DX, BX Copy the number to add to register DX
7 ADD AX, DX Add DX to the accumulator register AX
8 ADD BX, 1 Increment BX to next memory address
9 ADD CX, -1 Decrement the counter register CX
10 JUMP 5 Go to instruction 5
11 OUTPUT BX, AX Copy the accumulator register to main memory

Raw computer instructions can be difficult to understand. Most programs are written in a more expressive format and then converted to computer instructions by another program called a compiler. In pseudocode the above instructions would look like:

int address = 1000;
int counter = list [ address ]; // read from list
address = address + 1;
int sum = 0;
while(counter > 0)
	int value = read [ address ];
	sum = sum + value;
	address = address + 1;
	counter = count - 1;
list [ address ] = sum; // write to list


I assembled the following table showing how the computer’s registers change with each instruction. Notice that only one register changes in each instruction. (With the exception of the instruction pointer which always increments unless there was a jump instruction.)

1 SET BX, 1000 1000 2
2 INPUT CX, BX 1000 3 3
3 ADD BX, 1 1001 3 4
4 INPUT AX, 0 0 1001 3 5
5 COMPARE-JUMP CX, 0, 11 0 1001 3 6 No jump
6 INPUT DX, BX 0 1001 3 183 7
7 ADD AX, DX 183 1001 3 183 8
8 ADD BX, 1 183 1002 3 183 9
9 ADD CX, -1 183 1002 2 183 10
10 JUMP 5 183 1002 2 183 5 Jump
5 COMPARE-JUMP CX, 0, 11 183 1002 2 183 6 No jump
6 INPUT DX, BX 183 1002 2 492 7
7 ADD AX, DX 675 1002 2 492 8
8 ADD BX, 1 675 1003 2 492 9
9 ADD CX, -1 675 1003 1 492 10
10 JUMP 5 675 1003 1 492 5 Jump
5 COMPARE-JUMP CX, 0, 11 675 1003 1 492 6 No jump
6 INPUT DX, BX 675 1003 1 266 7
7 ADD AX, DX 941 1003 1 266 8
8 ADD BX, 1 941 1004 1 266 9
9 ADD CX, -1 941 1004 0 266 10
10 JUMP 5 941 1004 0 266 5 Jump
5 COMPARE-JUMP CX, 0, 11 941 1004 0 266 11 Jump
11 OUTPUT BX, AX 941 1004 0 266 12 Write 941 to main memory at address 1004
12 HALT 941 1004 0 266 Done

The output of the program is 183 + 492 + 266 = 941.

So that’s it. I hope this gave you some insight to how computers work. Of course I brushed over many of the complexities. I did not cover how to do multiplication or division, or how to represent negative numbers (there’s a very elegant way to do this called two’s complement), or how to compare numbers with logic gates. The model of the transistor I presented is very simple and does not account for power consumption or chaining multiple gates together. Within the computer’s logic there are more complicated structures as well such as clocked flip flops, 3-state gates, counters, dividers, multiplexers, and comparators. These are all topics beyond an introductory post, but none are too difficult to understand.

If this is a topic that interests you and you’d like to learn more, check out this free online course from MIT.

Cantor’s diagonal argument

Cantor’s diagonal argument is a beautiful proof of the uncountability of the real numbers.

First we need some definitions.

Two sets are said to be one-to-one if each member of the first set maps to exactly one member of the second set. We say a set maps onto another set if each member of the first set maps onto a unique member of the second set. If two sets are both one-to-one and onto, then we say they have one-to-one correspondence. For example, the set of uppercase letters and the set of lowercase letters have one-to-one correspondence.

We say a set is countable if it has a finite number of elements or if it has a one-to-one correspondence with the set of positive integers. Essentially, if you can count the members of the set, even if you’d have to count forever, the set is countable.

Cantor’s diagonal argument uses proof by contradiction. That means we’ll start with an assumption and then demonstrate how that assumption leads to a logical contradiction.

Assume the set of real numbers is countable. The set of real numbers is infinite so it must have one-to-one correspondence with the set of positive integers.

If the set of real numbers is countable, then it follows that the subset of real numbers from zero to one is also countable. (If you can count everything in the set, then you can count a subset of the set.)

Since we are assuming the set of real numbers is countable, it must be possible to create a numbered list the real numbers. (This sounds like counting!) Let this list be designated {s1, s2, s3, …}. Here is the start of one such list. The numbers chosen are not significant.


We must accept that this infinitely long list contains all real numbers between zero and one.

Next, let’s define a number a which will be formed by using the following rule:

The n-th digit of a will not be the same number as the n-th digit of sn.

Here is one possible construction of a. Each digit of a is chosen to be different than the bracketed number in that column.


We can clearly see that a is a real number in the range zero to one, so there must be a number sn which equals a. However, by definition, the n-th digit of a cannot equal the n-th digit of sn. So a is not a member of our list. We have just shown that the set of positive integers does not map onto the set of real numbers. Thus the set of positive integers does not have one-to-one correspondence with the set of real numbers.

This is a contradiction with our original assumption that the set of real numbers is countable.

Thus the set of real numbers is not countable.

Simulating a half adder

I’m starting to simulate larger circuits in LTSpice. Here’s my attempt at a half adder. Obviously I need to work on my layout skills. It started off well, but the components were too close together.

Total current consumption of the half adder with both inputs high and LEDs on the outputs is 9.4mA.

The circuit is constructed from RTL inverter gates.

half adder

Flipping a sprite in DirectX

I have a sprite which faces to the right. I wanted to make it face left when moving left. After reading a dozen-plus forum threads without finding an answer, I’m posting this for reference. I’m a DirectX noob so this may not be the best solution, but it worked for me.

D3DXVECTOR2 position(caveman.x, caveman.y);
D3DXVECTOR2 spriteCenter(caveman.width / 2.0f, caveman.height / 2.0f);
D3DXVECTOR2 scaling(caveman.faceLeft ? -1 : 1, 1);
D3DXMATRIX matrix;
D3DXMatrixTransformation2D(&matrix, &spriteCenter, 0, &scaling, NULL, 0, &position);

Also, you will need to turn off culling on your DirectX device object.

d3ddev->SetRenderState(D3DRS_CULLMODE, D3DCULL_NONE);

The Feynman Algorithm

Excerpted from:

1. Write down the problem.

2. Think real hard.

3. Write down the solution.

How to use the Feynman Algorithm

1. Write the problem down in an unambiguous way. Often this is just as hard as the next step. Indeed, really, really understanding the problem is sometimes the only hard bit. Once you really, really understand the problem, the answer may be obvious.

2. Become convinced it’s important. Really important. Think about odd ways to solve it, things you wouldn’t tell other people for fear of being laughed into the next century. Look at simple things. Look at really complicated intricate solutions. Then talk to others. Talking to others will allow you to crystallize some of the ideas you have, and produce more ideas for you to think about. Repeat until you have an answer you can write down. If you do this right, immediately before you come up with the answer people will think you’re almost obsessed with the problem and the answer to it.

Note: If you don’t have people to talk to, write down some intermediate results or something to make them become real.

Some problems don’t have answers, only compromises, or proofs of impossibility. These are also valid answers if you can show that a real answer doesn’t exist.

3. Write the answer.