Main Points

Let's start at the bottom & work up.

Machine Language - Binary

A computer chip (CPU) has a language so primitive, it's difficult for most people to recognize it as a language. It's called an “instruction set” or “machine language”; this is the complete list of all the operations that it can perform. This machine language is composed of very simple instructions, like add, subtract, multiply, and divide. It also includes instructions to move data around in memory, and to compare two numbers to see if one is larger than the other. Most of these instructions will handle different types of operands, like bytes, integers, floating point, logical (Boolean), etc. The CPU contains small chunks of super fast memory, called registers, usually 32 or 64 bits long. The arithmetic and other operations are done here.

I know it seems primitive, but at the lowest level, this is all a computer can do. All the fearful and wonderful things that computers do must be expressed as a series of these simple instructions. It takes a whole bunch of instructions to do anything useful. This is the main reason why programs are so large, and why debugging (fixing errors) is such a pain.

Each instruction is expressed as a binary number telling the CPU what to do (add, compare, etc.) and the data (or the address thereof) to be operated on.
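To make that concrete, here is a minimal sketch in Python of how an operation and its operands might be packed into a single binary number. The 32-bit layout, the opcode values, and the names OPCODES, encode, and decode are all made up for illustration; no real chip is being described.

```python
# Hypothetical 32-bit instruction layout (an assumption, not any real chip):
#   bits 31-24: opcode
#   bits 23-12: address of the first operand
#   bits 11-0 : address of the second operand
OPCODES = {"ADD": 0x01, "SUB": 0x02, "CMP": 0x03}


def encode(op, a, b):
    """Pack an opcode and two 12-bit addresses into one 32-bit word."""
    if not (0 <= a < 4096 and 0 <= b < 4096):
        raise ValueError("addresses must fit in 12 bits")
    return (OPCODES[op] << 24) | (a << 12) | b


def decode(word):
    """Split a 32-bit word back into (opcode name, address a, address b)."""
    names = {code: name for name, code in OPCODES.items()}
    return names[word >> 24], (word >> 12) & 0xFFF, word & 0xFFF


word = encode("ADD", 0x0AB, 0x0CD)
print(f"{word:032b}")   # the instruction as a string of 32 bits
print(f"{word:08X}")    # the same instruction in hex
print(decode(word))     # and pulled back apart into its fields
```

The point is only that "what to do" and "what to do it to" end up as bit fields inside one number, which the CPU picks apart again when it runs the instruction.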

Assembly Language

Assembly language is one notch up from machine language. It is still one instruction at a time, but it now has names for the instructions and the operands. It is much easier to write an add operation like:

ADD X1, X2

This expression adds the value of X2 to X1, leaving the result in X1 (see the Note at the end).

Rather than something like this in binary:

01000010101010110000000010101011 or even 42AB00AB as hex (hexadecimal notation, i.e. base-16 math).

An “assembler” is a program that translates assembly language into actual binary machine language instructions. In the above example, it would convert the ADD to the equivalent binary code, and figure out the binary addresses of X1 and X2. It would then assemble these parts into a completed machine language instruction, saving the programmer MUCH work and mind-numbing detail. This is a fairly simple process (for the assembler) since each assembly language statement translates directly to one machine instruction.
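The job described above can be sketched in a few lines of Python. This is a toy, not a real assembler: the opcode numbers, the 32-bit word layout, and the rule that each new name simply gets the next free address are all assumptions made for the example.

```python
# Toy assembler for a made-up machine. The opcodes and the word layout
# (8-bit opcode, two 12-bit addresses) are assumptions for illustration.
OPCODES = {"ADD": 0x01, "SUB": 0x02, "CMP": 0x03}


def assemble(lines):
    """Translate lines like 'ADD X1, X2' into 32-bit machine words."""
    symbols = {}  # variable name -> address, assigned on first use
    words = []
    for line in lines:
        mnemonic, rest = line.split(maxsplit=1)
        operands = [name.strip() for name in rest.split(",")]
        if len(operands) != 2:
            raise ValueError(f"expected two operands: {line!r}")
        # Look up each name, giving a new name the next free address.
        addrs = [symbols.setdefault(name, len(symbols)) for name in operands]
        # One assembly statement becomes exactly one machine word.
        words.append((OPCODES[mnemonic.upper()] << 24) | (addrs[0] << 12) | addrs[1])
    return words, symbols


words, symbols = assemble(["ADD X1, X2", "SUB X2, X1"])
for w in words:
    print(f"{w:032b}  {w:08X}")
print(symbols)
```

Notice that the loop body does the three things the paragraph mentions: look up the code for the operation, work out the addresses of the operands, and assemble the pieces into one word.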


Higher Level Languages

One notch up from assembly language is what might be called higher level languages, like BASIC, C, C++, Java, etc. The programs are composed of statements, or lines of code, each of which may translate into many machine instructions. Translator programs at this level are called “compilers”, because they must compile each statement into many machine instructions.

These languages are a great labor saver for programmers. It is much easier and more readable to write a statement like:

X = Y + 3 * Z

Rather than a stack of assembly code like this:

LOAD R1, Y (Loads Y into register 1)
LOAD R2, Z (Loads Z into register 2)
MULT R2, 3 (Multiplies register 2 by 3)
ADD R1, R2 (Adds register 2 to register 1)
STOR R1, X (Stores register 1 into X)

I don’t even want to think about what the equivalent binary bit strings of machine code would look like.
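To check that the five assembly lines really do the same job as X = Y + 3 * Z, they can be traced with a tiny simulator. The sketch below is in Python and only understands the four made-up instructions used above; the run function and the idea of holding variables in a dictionary are assumptions for the example, not a model of any real chip.

```python
def run(program, memory):
    """Execute the toy LOAD/MULT/ADD/STOR instructions against a dict of variables."""
    regs = {}  # register name -> current value

    def value(operand):
        # An operand is either a register that has been loaded or an integer literal.
        return regs[operand] if operand in regs else int(operand)

    for line in program:
        op, rest = line.split(maxsplit=1)
        dst, src = [part.strip() for part in rest.split(",")]
        if op == "LOAD":      # copy a variable from memory into a register
            regs[dst] = memory[src]
        elif op == "MULT":    # multiply a register by a register or a literal
            regs[dst] = regs[dst] * value(src)
        elif op == "ADD":     # add a register or a literal to a register
            regs[dst] = regs[dst] + value(src)
        elif op == "STOR":    # copy a register back out to a variable in memory
            memory[src] = regs[dst]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


program = [
    "LOAD R1, Y",
    "LOAD R2, Z",
    "MULT R2, 3",
    "ADD R1, R2",
    "STOR R1, X",
]
memory = run(program, {"Y": 10, "Z": 4})
print(memory["X"])  # same value as 10 + 3 * 4
```

One high level statement, five instructions, and every one of them had to be in the right order with the right registers. That is the bookkeeping the compiler takes off your hands.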

This automatic compilation of a fairly readable programming language into almost totally unreadable binary machine code is both the strength and weakness of the compiler. Most of the programs being developed today would be totally impossible without the high level languages. Assembly code is simply not up to the complexity, and would take 10 to 20 times longer if you tried to use it. And nobody in their right mind writes real programs in machine code (binary). However, being able to read them is sometimes a help in debugging.

The downside of a compiler is that you (the programmer) and your program are now separated from the instructions that the computer is actually executing. You assume/hope that the machine code is an accurate representation of your program, and usually this is a fairly good assumption. However, the compiler may change the order of operations or substitute “equivalent” operations, if it thinks this may improve things. This is especially true of optimizing compilers, which can move large chunks of code around to try to make the program run faster or use less memory. When the compiler does a good job, it creates code that runs very fast and doesn't use a lot of memory. If the compiler makes a bad guess, it can create bugs that are the very dickens to find and fix.
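One small illustration of why “equivalent” deserves its quotation marks is floating point arithmetic, where regrouping an addition can change the last digit of the answer. The Python sketch below does the regrouping by hand; it is not showing what any particular compiler does, only that two orderings that look identical on paper need not give identical results.

```python
def left_to_right(a, b, c):
    """Add in the order the programmer wrote it: (a + b) + c."""
    return (a + b) + c


def regrouped(a, b, c):
    """The same sum with the additions regrouped: a + (b + c)."""
    return a + (b + c)


x = left_to_right(0.1, 0.2, 0.3)
y = regrouped(0.1, 0.2, 0.3)
print(x)
print(y)
print(x == y)  # the two "equivalent" orderings can be compared directly
```

The difference here is tiny, but a program that compares such values for exact equality can behave differently depending on which ordering actually ran.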

The output from a compiler can be any of a number of things, depending on how you set the options:

  • A file of machine instructions, binary, ready to run, like an exe file.
  • A file of assembly code, which may be “tweaked” by hand, then passed to an assembler. Many FORTRAN, C, and C++ compilers will generate either of these.
  • A file of intermediate level code that is always the same, no matter what type of computer it is compiled on. Then you need another program, called a “run time environment” (RTE), that is written just for that type of machine. The RTE acts like a virtual computer to execute the intermediate code. This is how the internet programming language, Java, works. The theory is that, once you compile a Java program on any machine (PC, Mac, Sun, etc.), it will run on any other computer, as long as the person who runs it has the RTE for that machine. It never works quite that well in actual practice, of course, but that is the theory.
  • Then there are the “interpreters”, which don’t produce any output file at all. There is an interpreter program (kind of like the RTE) which scans your program and performs the operations immediately. The downside is that each statement is scanned each time it is executed. If a statement is inside a loop and executed 1000 times, it is scanned 1000 times. Many of the cheaper/older flavors of BASIC are like this. They are R – e – a – l – l – y S – l – o – o – o – o – w.
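The cost of re-scanning described in the last bullet can be sketched in Python. The statement format ("NAME = NAME + NUMBER") and the functions scan, interpret, and translate_then_run are invented for this example; real interpreters and compilers are far more involved, but the counting works the same way.

```python
def scan(statement):
    """Parse a statement of the form 'NAME = NAME + NUMBER' into its parts."""
    target, expr = [s.strip() for s in statement.split("=")]
    source, amount = [s.strip() for s in expr.split("+")]
    return target, source, int(amount)


def interpret(statement, times, variables):
    """Interpreter style: scan the statement again on every pass through the loop."""
    scans = 0
    for _ in range(times):
        target, source, amount = scan(statement)
        scans += 1
        variables[target] = variables[source] + amount
    return scans


def translate_then_run(statement, times, variables):
    """Compiler style: scan once up front, then just run the translated form."""
    target, source, amount = scan(statement)
    scans = 1
    for _ in range(times):
        variables[target] = variables[source] + amount
    return scans


a = {"X": 0}
b = {"X": 0}
print(interpret("X = X + 1", 1000, a), a["X"])
print(translate_then_run("X = X + 1", 1000, b), b["X"])
```

Both versions leave the same value in X. The only difference is how many times the text of the statement had to be picked apart along the way, and that is where the interpreter loses its time.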

Personal Notes

Occasionally, there are legitimate reasons for using assembly language, usually to increase speed or reduce memory. Once I had a TRS-80, a REALLY old computer. It had BASIC built into it, and usually I used that. However, I used assembly language for a Space Invaders game, and a program for Conway’s Game of Life, just for the speed. I actually wrote Conway in both BASIC and assembler, for comparison. The BASIC version took nearly half an hour for each generation. The assembly version took about 1 second, a speed-up of about 1500 to 1.

Also, while I was at Tektronix, I spent several years writing firmware code for the Tek graphics terminals. This was all done in assembly language, simply to increase the speed and reduce the memory requirements. These were really large programs, hundreds of K bytes, and debugging was an absolute nightmare! I suspect that Tek does their firmware in C nowadays.

I never had to program in actual binary machine code (thank goodness!). The nearest I ever came was a simple debugger that I wrote for debugging assembly programs on the TRS-80. It would dump the registers and memory to the screen, in hex of course. You could see the actual machine code instructions as well as stored data. However, it was the data and the registers that I was mostly interested in, so it was helpful to know a little bit of binary for that.

Note: The assembly instructions are meant as an example, rather than to illustrate the language for any particular chip. However, they are reminiscent of the little I can remember of the assembly language on my old TRS-80.