Here is what I have for the instruction set so far. There is plenty of room for expansion. I think the weak area is the comparison instruction. I’m not sure if I’ll add comparing with a literal or comparing other registers yet.

# Month: September 2015

## Latching ALU arguments

My ALU design requires to clock cycles to load the arguments. In the first clock cycle, the first argument is placed on the data bus and latched by the argument latch. In the second clock cycle, the second argument is placed on the bus and the ALU is enabled.

Here is a simulation of the circuit:

## Logisim

## High level architecture

The internal CPU architecture is 4 bits, but the address space is 8 bits.

I included two buses. There is a data bus for moving data inside the CPU and a system bus for communicating with main memory and other components. The system bus can only read and write from Register A. The address to read/write to is dictated by a combination of the segment register (SR) and bus address register (BA). I/O devices will be memory mapped.

Instructions are stored in a dedicated memory with 8 bit address space. The instruction pointer (IP, 8 bits) controls the next address to load. Instructions are 8 bits each. The upper bits are loaded into the instruction register and the lower bits are loaded into the argument register. Only the control logic can access the instruction, but if the argument is a literal, it may be written to the data bus. Alternatively the argument can be used by the control logic to direct ALU or system bus activity.

Because the instruction address space is 8 bits, but an instruction can only have a 4 bit argument, I’m including a jump segment register (JSR) to hold the high bits of the jump address.

Register A is used as the accumulator for the ALU. There will be an input latch inside the ALU so two arguments can be successively loaded from the data bus. The first argument will be latched and the second will be input directly from the bus. Lastly, there is a 4 bit flags register which is controlled by the ALU and readable by the control logic.

## I’m building my own CPU

I’ve decided to fulfill my childhood dream of building a CPU from scratch. I’m fascinated by the fundamentals of how computers, math, and nature work so this project is a natural fit for me. Originally I was going to use 7400 logic chips, but after finding several blogs of people who have used discrete components, I’ve decided to go that route. I know that will increase the complexity, cost, and reduce the operating speed, but it will also give me a chance to dig deeper into the subject.

The ALU will be 4 bits, but I’ll use a segment register to achieve an 8 bit address bus. If I find this too restricting, I may move to a 12 bit bus, but for starting out I’ll limit myself to 8 bits even though that doesn’t give me much address space.

The instruction memory will be separate from data memory. I’ll initially use a memory chip so I can focus on the CPU architecture. Eventually I would like to try building out a small amount of SRAM and DRAM to learn about it.

I/O operations will be memory mapped. I’ll reserve some of the address segments I/O devices.

Instructions will be 8 bits. The upper 4 bits will be instruction only and the lower 4 bits will either be a literal or a command for the ALU. The supported ALU operations will take 3 instructions to encode: addition, comparison, and the four boolean operations: not, and, or, xor. There will be four data registers (A, B, C, D) with A being used as the accumulator.

I/O operations will always read and write to the accumulator register.

There are five additional control registers:

- Instruction Pointer (8 bits)
- Jump Segment Register (4 bits) – upper 4 bits of jump address
- Segment Register (4 bits) – upper 4 bits of address bus
- Memory Address (4 bits) – lower 4 bits of address bus
- Flags (4 bits) – Set by the ALU: overflow, greater than, less than, zero

For I/O I intend to support a serial port, GPIO ports, and two ADCs.

Inspired by the Apollo Guidance Computer, I’m going to use RTL logic. My base building block will be NOR/NOT gates. Because RTL is power hungry, my goal is to keep power consumption low at the cost of speed. At this point I don’t know what a realistic clock rate will be (50 KIPS?) or what kinds of programs I’ll be able to run, but I envision the computer to basically be more of an instrument controller than a general purpose CPU. I won’t be implementing interrupts or a stack so I’m very limited in program architecture.

In addition to instrument control, two stretch projects would be outputting vector graphics to an X-Y scope (via the ADCs) and keeping a quadcopter level. (How can I be inspired by the Apollo guidance computer and not try to land something?)

## An introduction to digital computing

This article is an introduction the basic concepts of digital computers. I’ll cover how to do arithmetic on any numbers with just 1s and 0s, what a logic gate is, how to combine logic gates to do arithmetic and store the results. I’ll also show how logic gates can be implemented with electrical circuits. Finally I’ll introduce instructions and show how computers execute them.

The most fundamental operation in arithmetic is addition. We learned how to do long hand arithmetic as students in elementary school. We can add the numbers 27 and 15 as follows.

In the ones column, add 7+5 = 12. We put the 2 from the 1s place in the result and carry the 1 from the 10s place above the 10s column. To get the 10s digit of the result we then sum the carry digit (1) with 2 and 1 to get 4. The result is 42.

The above is called base 10 arithmetic because each digit can have one of 10 values (0 through 9). For reasons that will become obvious later, modern computers do not represent numbers in base 10. Instead they use what is called base 2 (or binary) arithmetic. In base 2, each digit can only have the value 0 or 1. Instead of each number having a 1’s place, 10’s place, 100’s place, etc, we use powers of 2, so we have the 1’s place, 2’s place, 4’s place, 8’s place, etc.

Here are the same numbers from above written in expanded base 10 and base 2 forms. See how you can use the same pattern to form the same numbers in each base?

Now let’s look at what the arithmetic would look like using base 2 numbers. Note that when we add 1+1 in base 2, the result is not 2, but rather 10. We put the 0 in the result row and carry the 1 to the next column.

Using the rules 1+0=1, 1+1=10, and 1+1+1=11, we can solve the previous arithmetic in base 2.

We’ll see these rules again in a few minutes.

The next fundamental subject in computation is logic gates. Logic gates represent devices (either mechanical or electronic) which can be used to carry out simple logic. And I mean really simple. The three simplest logic operators are AND, OR, and NOT.

Logic operators operate on variables (inputs or outputs). We conventionally allow a variable to be in either the TRUE or FALSE state. When we implement an electronic logic gate we call these states HIGH and LOW. You might have already guessed, but when we combine logic gates to perform computations, we consider these states to be 1 (TRUE) and 0 (FALSE).

The NOT operator is the simplest. It has one input and one output. The output is always the negation of the input. We can write this in a truth table. A truth table shows all combinations of inputs and outputs. Each row forms a true statement.

INPUT | OUTPUT |

1 | 0 |

0 | 1 |

So, the first row means “NOT TRUE is FALSE”. The second row means “NOT FALSE is TRUE”. Simple stuff.

The AND operator has two inputs (called A and B) and one output. The AND operator’s output is FALSE unless both inputs are TRUE. If one input is TRUE and one is FALSE, the output is FALSE.

A | B | OUTPUT |

0 | 0 | 0 |

1 | 0 | 0 |

0 | 1 | 0 |

1 | 1 | 1 |

The OR operator also has two inputs and one output. When either input is TRUE, the output is also TRUE. When both inputs are FALSE, the output is FALSE.

A | B | OUTPUT |

0 | 0 | 0 |

1 | 0 | 1 |

0 | 1 | 1 |

1 | 1 | 1 |

Did you notice that when both inputs of the OR operator are TRUE, the output is also TRUE? There is a second type of OR operator for which this is not the case. The exclusive OR (written XOR) is TRUE only when one input is TRUE and one input is FALSE. This is a very useful operator.

A | B | OUTPUT |

0 | 0 | 0 |

1 | 0 | 1 |

0 | 1 | 1 |

1 | 1 | 0 |

These gates can be combined to form a few additional commonly used gates.

The NAND gate is a combination of and AND gate with a NOT gate on the output. This negates the output of the AND operation.

A | B | OUTPUT |

0 | 0 | 1 |

1 | 0 | 1 |

0 | 1 | 1 |

1 | 1 | 0 |

The two remaining gates are NOR (OR + NOT) and XNOR (XOR + NOT).

We can combine logic gates to perform what is called Boolean arithmetic. For example, the following is a statement in logic.

```
( A AND B ) OR NOT C
```

Sometimes it can be helpful to visually express logic statements. Here are the symbols typically used for logic gates. Inputs are on the left and outputs on the right.

So the above statement can be represented graphically as follows.

The A and B inputs go to the AND gate. The C input goes into a NOT gate. The outputs of the AND and NOT gates go into the OR gate and the output of the OR gate is the result.

Let’s look at how we can combine logic gates to form a system of equations.

Let S = A XOR B Let C = A AND B

This can be expressed graphically as:

Since both statements have the same input variables (A and B), we can write them in a unified truth table.

A | B | C | S |

0 | 0 | 0 | 0 |

1 | 0 | 0 | 1 |

0 | 1 | 0 | 1 |

1 | 1 | 1 | 0 |

This table is equivalent to the statement A + B = C*2^1 + S*2^0. The right hand side of the equation should look familiar.

`A + B = C*2 + S`

`0 + 0 = 0`

`1 + 0 = 1`

`0 + 1 = 1`

`1 + 1 = 10`

This is base 2 arithmetic implemented in logic!

Now we don’t quite have enough power to solve the original problem I gave you. The above system is called the Half Adder. The half adder accepts two input values and outputs a sum and a carry value. In order to sum numbers with multiple digits, we need to be able the accept three inputs: A, B, and a carry from the previous column. The output will be the sum plus a carry, for a total of five variables. The system that can do this is called a Full Adder. It is implemented as follows:

```
Let S = A XOR B XOR Cin
Let Cout = (A AND B) OR (Cin AND (A XOR B))
```

Which can be expressed graphically as:

The truth table for the full adder is repetitive, but can be simplified to the statements:

`0 + 0 = 0`

`0 + 1 = 1`

`1 + 1 = 10`

` 1 + 1 + 1 = 11`

By chaining a series of full adders together (taking Cout from the previous adder as Cin on the next), we can carry out multiple digit binary arithmetic.

(Image credit: Wikipedia)

The full adder is one of the primary building blocks of computers. Though tricks of notation and clever combinations with other gates, binary logic can also be made to perform subtraction, multiplication, division, and comparing numbers.

Logic gates can also be used to control other circuits based on a fixed set of rules. A good introductory logic system is the 7 segment LED found in clocks, radios, DVD players, and many other places. The logic controlling these displays converts a binary encoded number into 7 output signals which are used to turn on or off each segment of the display. There are 4 inputs representing the 4 binary digits necessary to represent 0 through 9. So for example, when the inputs are 0110 (meaning 6), the output of the gates will turn on all the segments except the top right one.

(Source: http://www.digitalcounter.eu/)

The basic idea behind designing these circuits is to start by writing a truth table for all possible relationships between the input and output variables, and then combining logic gates to create those values. Here the inputs are labeled A-D and the outputs a-d. The decimal point above is not included.

(Source: Texas Instruments http://www.ti.com/lit/ds/symlink/sn5447a.pdf)

It can get complicated very quickly, though several techniques exist to find simplified truth tables. Here is what the complete 7 segment control system looks like. The blanking input is used to turn off all the outputs no matter what the other 4 inputs are. It’s very useful if you’d like to show a flashing digit.

(Source: ibid)

This concept can be extended to any system that needs to turn other circuits on and off based on an input value.

Now that you’ve become familiar with logic gates as abstract concepts, let’s look at how we can implement them using electronic components.

There are many unique ways to build logic gates from electrical components, but the component of choice for modern computers is the transistor. The transistor is a type of semiconductor discovered at Bell Labs in the 1940’s. They have properties which can be used to amplify signals (like an audio amplifier) or to switch circuits on and off very quickly. They can also be made extremely tiny. A modern CPU contains several billion transistors etched on a wafer of silicon.

A transistor has three pins called the collector, base, and emitter. The base acts as a switch to turn on a circuit connecting the collector to the emitter. When a LOW signal is sent to the base, the collector-emitter switch is off. But when a HIGH signal is sent to the base, the collector-emitter switch is on.

The other component used in these circuits is a resistor. The purpose of a resistor is to limit the flow of current so the circuit does not short out and burn up or catch fire! You must have a resistor between a HIGH point and a LOW point in a circuit, but just because a circuit has a resistor does not mean it changes a HIGH signal into a LOW one.

Each gate circuit has a HIGH source, LOW source, at least one input, and one output. We can understand how the input affects the output by looking at the complete or incomplete circuits made by the transistor switches.

Here is a resistor-transistor implementation of a NOT gate. You can ignore the resistor on the input line. It’s purpose is to protect the transistor from burning out.

When the input is LOW, the transistor switch is off, breaking the connection between the resistor and the LOW source. The resulting circuit looks like this:

There is a complete circuit only from HIGH to output, so the output is HIGH. The LOW signal is disconnected from everything since no signal can pass through an open switch.

When the input is HIGH, the transistor switch is closed. In this case we have to consider the placement of the output pin relative to the resistor because one side of the resistor has a HIGH signal and the other side is LOW. The output pin is on the LOW side of the resistor so the output will be LOW.

Next let’s look at the AND gate.

When A or B (or both) are LOW, there is an open switch between HIGH and the resistor, thus the remaining circuit only connects output to LOW so the output is LOW. When both A and B are HIGH, then both switches are closed, connecting HIGH to the resistor. Since the output pin is on the HIGH side of the resistor, the output will be HIGH. This behavior matches the truth table shown earlier.

Similar circuits can be made for each gate. Here are schematics for the OR and NAND gates. Note that the crossed lines in the OR gate only indicate a connection if there is a dot. Doesn’t NAND look a lot like AND? Try tracing the signals through these circuits.

OR Gate

NAND Gate

Everything we’ve looked at so far is called combinatorial logic. Combinatorial logic allows us to perform addition, multiplication, and turn circuit components on and off, but there is another critical function computers need. The logic gates demonstrated above only provide an output as long as an input state is provided. The input may correspond to a button press or to a value stored in the computer’s memory. But how can the computer remember a key was pressed after the key is released, and how can computers remember numbers at all? The key to computer memory is a special combination of logic gates called the flip flop.

A flip flop is a pair of coupled NOR gates which feed the output of each gate into an input of the other gate. The two remaining inputs are conventionally called Set (S) and Reset (R). Once a HIGH signal is sent to SET, the flip flop’s output will output HIGH continuously, even after the HIGH signal is removed from SET. This is called a bistable circuit because the output does not depend on an input signal being provided. When a HIGH signal is sent to the Reset input, the output state will go to the LOW state and remain there until a HIGH signal is received on the Set input.

In a computer, flip flops (called registers in a CPU) sit in between groups of logic gates to act as a buffer for the results from each step of a computation. Setting and resetting of the flip flops occurs at a regular interval which is driven by a clock signal, a system-wide alternating pulse of HIGH and LOW signals. In a modern computer, the clock cycles more than a billion times per second!

So how do we get from gates and flip flops to a programmable computer?

Computers are different from fixed logic devices such as clocks because they can execute instructions written after the circuit is built. They can do this because internally they contain many separate circuits which can carry out fixed operations such as addition, multiplication, comparing numbers, and copying numbers between registers. These circuits are turned on and off by some control circuitry based on coded instructions stored in the instruction registers. The whole process is timed based on the computer’s central clock.

Here is a simple computer architecture. It’s called the Von Neumann architecture and it’s used by most computers today.

- Instructions: This is an array of registers which hold coded instructions for the computer to follow. Each instruction is coded as number.
- Instruction pointer: This is a register which tells the CPU the address of the next instruction to execute. Each time the computer reads an instruction, it increments this value by 1 so it knows to read the next instruction on the next clock cycle.
- Controller: This decodes the current instruction from the instruction registers and activates the other circuitry to execute the hardwired logic for that instruction, much like the LED logic shown earlier.
- Arithmetic logic unit (ALU): This contains the logic circuits that can carry out mathematical operations.
- Logic registers (AX, BX, CX, DX): These are registers which can be accessed by the ALU. It is much faster to access a register than main memory. Registers are referred to by their name when writing instructions.
- Main memory/system bus: This lets the CPU talk to the main memory or other devices attached to the computer like a keyboard, mouse, monitor, or network port. Numbers stored in memory must be copied to a register before they can be used by the ALU. Only a few registers are available, but memory can hold up to billions of values.

The instructions that a computer can follow are very simple. Here are enough instructions (with arguments written in parentheses) to write simple programs.

- INPUT (register, address)
- Copies the value stored at
*address*in the main memory and copies it to the named*register*.

- Copies the value stored at
- OUTPUT (address, register)
- Copies the value in the named
*register*to the specified main memory*address*.

- Copies the value in the named
- SET (register, value)
- Sets the named
*register*to the specified*value*.

- Sets the named
- ADD (register1, register2)
- Adds the values of
*register1*and*register2*and stores the result in*register1*.

- Adds the values of
- JUMP (line_number)
- Changes the instruction pointer’s value to
*line_number*.

- Changes the instruction pointer’s value to
- COMPARE-JUMP (register, value, line_number)
- Compares the value in the named
*register*with the specified*value*. If they are equal, it changes the instruction pointer’s value to*line_number*. This lets a program change the order of execution of instructions in response to input data.

- Compares the value in the named
- HALT
- Indicates that this is the last instruction. Stop. Early computers would ring a bell or beep to let the user know they were finished.

Let’s use these instructions to write a program that can add up a long series of numbers. I’ll make two assumptions about the input.

- The first number of the input will be the total number of items in the input.
- Some process which we don’t care about writes the numbers into the computer’s memory. This could be copying from a file, the internet, or a user manually typing them in.

Here is what the computer’s memory will look like before the program runs. I only included three numbers for brevity, but this could be anywhere from zero numbers to millions, limited only by how big the computer’s memory is.

Address | Value |

1000 | 3 |

1001 | 183 |

1002 | 492 |

1003 | 266 |

In the program, I use all four available registers. I use each register for a specific purpose.

- AX = the running total of the list (called the “accumulator”)
- BX = the next memory address to read
- CX = counter for how many more memory address to read
- DX = register to load the current list value into

The basic structure of the program is some setup code, followed by a loop which does the summing of the list, and finally writing the result to main memory. The main loop structure is created by using two COMPARE-JUMP and JUMP instructions. The COMPARE-JUMP decides when to leave the loop. The JUMP instruction restarts the loop at the end.

Here is the program code:

1 | SET BX, 1000 | Store the starting memory location in register BX |

2 | INPUT CX, BX | Copy first memory location to counter register (CX) |

3 | ADD BX, 1 | Increment BX to next memory address |

4 | INPUT AX, 0 | Initialize an accumulator register (AX) to 0 |

5 | COMPARE-JUMP CX, 0, 11 | If the counter equals zero, jump to instruction 11 |

6 | INPUT DX, BX | Copy the number to add to register DX |

7 | ADD AX, DX | Add DX to the accumulator register AX |

8 | ADD BX, 1 | Increment BX to next memory address |

9 | ADD CX, -1 | Decrement the counter register CX |

10 | JUMP 5 | Go to instruction 5 |

11 | OUTPUT BX, AX | Copy the accumulator register to main memory |

Raw computer instructions can be difficult to understand. Most programs are written in a more expressive format and then converted to computer instructions by another program called a compiler. In pseudocode the above instructions would look like:

int address = 1000; int counter = list [ address ]; // read from list address = address + 1; int sum = 0; while(counter > 0) { int value = read [ address ]; sum = sum + value; address = address + 1; counter = count - 1; } list [ address ] = sum; // write to list

I assembled the following table showing how the computer’s registers change with each instruction. Notice that only one register changes in each instruction. (With the exception of the instruction pointer which always increments unless there was a jump instruction.)

AX | BX | CX | DX | IP | Note | ||

1 | SET BX, 1000 | — | 1000 | — | — | 2 | |

2 | INPUT CX, BX | — | 1000 | 3 | — | 3 | |

3 | ADD BX, 1 | — | 1001 | 3 | — | 4 | |

4 | INPUT AX, 0 | 0 | 1001 | 3 | — | 5 | |

5 | COMPARE-JUMP CX, 0, 11 | 0 | 1001 | 3 | — | 6 | No jump |

6 | INPUT DX, BX | 0 | 1001 | 3 | 183 | 7 | |

7 | ADD AX, DX | 183 | 1001 | 3 | 183 | 8 | |

8 | ADD BX, 1 | 183 | 1002 | 3 | 183 | 9 | |

9 | ADD CX, -1 | 183 | 1002 | 2 | 183 | 10 | |

10 | JUMP 5 | 183 | 1002 | 2 | 183 | 5 | Jump |

5 | COMPARE-JUMP CX, 0, 11 | 183 | 1002 | 2 | 183 | 6 | No jump |

6 | INPUT DX, BX | 183 | 1002 | 2 | 492 | 7 | |

7 | ADD AX, DX | 675 | 1002 | 2 | 492 | 8 | |

8 | ADD BX, 1 | 675 | 1003 | 2 | 492 | 9 | |

9 | ADD CX, -1 | 675 | 1003 | 1 | 492 | 10 | |

10 | JUMP 5 | 675 | 1003 | 1 | 492 | 5 | Jump |

5 | COMPARE-JUMP CX, 0, 11 | 675 | 1003 | 1 | 492 | 6 | No jump |

6 | INPUT DX, BX | 675 | 1003 | 1 | 266 | 7 | |

7 | ADD AX, DX | 941 | 1003 | 1 | 266 | 8 | |

8 | ADD BX, 1 | 941 | 1004 | 1 | 266 | 9 | |

9 | ADD CX, -1 | 941 | 1004 | 0 | 266 | 10 | |

10 | JUMP 5 | 941 | 1004 | 0 | 266 | 5 | Jump |

5 | COMPARE-JUMP CX, 0, 11 | 941 | 1004 | 0 | 266 | 11 | Jump |

11 | OUTPUT BX, AX | 941 | 1004 | 0 | 266 | 12 | Write 941 to main memory at address 1004 |

12 | HALT | 941 | 1004 | 0 | 266 | — | Done |

The output of the program is 183 + 492 + 266 = **941**.

So that’s it. I hope this gave you some insight to how computers work. Of course I brushed over many of the complexities. I did not cover how to do multiplication or division, or how to represent negative numbers (there’s a very elegant way to do this called two’s complement), or how to compare numbers with logic gates. The model of the transistor I presented is very simple and does not account for power consumption or chaining multiple gates together. Within the computer’s logic there are more complicated structures as well such as clocked flip flops, 3-state gates, counters, dividers, multiplexers, and comparators. These are all topics beyond an introductory post, but none are too difficult to understand.

If this is a topic that interests you and you’d like to learn more, check out this free online course from MIT.