
the Lobdegg's Comprehensive Guidebook to DOS Programming

Chapter 1 - Introduction

Section 1: Intel Assembly

Introduction

This section is not intended to be comprehensive; it is a starting point for newcomers to the world of Intel Assembly. The information here will therefore never be complete and should be supplemented with other texts. It also focuses primarily on the capabilities of the original 8086, and thus ignores most of the extensions to the instruction set that have been added in subsequent generations of the '86 series of microprocessors.

The Registers

Name	Low 8 bits	Hi 8 bits	16 bit	32 bit
The general purpose Registers
Accumulator	AL	AH	AX	EAX
Base	BL	BH	BX	EBX
Counter	CL	CH	CX	ECX
Data	DL	DH	DX	EDX
String Indices
Source Index			SI	ESI
Destination Index			DI	EDI
Stack Pointers
Stack Pointer			SP	ESP
Base Pointer			BP	EBP
Segments
Code Segment			CS
Data Segment			DS
Extra Segment			ES
Stack Segment			SS

The 32 bit registers (80386 and later) are included for reference, as 32 bit assembly can be paired with 32 bit C projects in Open Watcom. This guide will attempt to offer code that works in both 16 bit land and 32 bit land.

Basic Operations

Although for the most part the 4 primary registers and their 8 half registers are referred to as general purpose registers, they do have some special usages, as specified below.

AX - Accumulator

The Accumulator is the implied operand of several of the more advanced math instructions, like MUL and DIV. There are also special, shorter encodings of instructions such as ADD and SUB for applying an immediate value to AX (or AL). Using AX in order to get these versions of ADD and SUB can produce slightly smaller programs when applicable.
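To make the size difference concrete, here is a small sketch with the machine code bytes an assembler would typically emit in 16 bit mode shown in the comments. The immediate values are arbitrary, and 1000h was picked on purpose because it does not fit in a sign-extended byte, where assemblers may choose yet another encoding.

```asm
    add ax, 1000h       ;05 00 10     (3 bytes, short accumulator form)
    add bx, 1000h       ;81 C3 00 10  (4 bytes, general form)
    sub al, 5           ;2C 05        (2 bytes, short accumulator form)
    sub bl, 5           ;80 EB 05     (3 bytes, general form)
```

One byte per instruction is not much, but it adds up in tight loops and size limited programs.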

BX - Base

The Base can be used as a general purpose pointer. SI/DI and SP/BP can also hold addresses, but each pair has a specific job (string operations and the stack, respectively). When BX is used as an address, it is expected to point at any conceivable place without restrictions of purpose.
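A minimal sketch of BX as a pointer follows. It assumes NASM syntax (MASM-style assemblers usually want "offset table" in the first line), that DS already points at the segment holding the data, and the label table and its contents are made up for illustration.

```asm
    mov bx, table       ;bx = offset of table
    mov al, [bx]        ;al = 10, the byte at ds:bx
    mov al, [bx+2]      ;al = 30, the byte two past the start
    inc bx              ;a pointer is just a number, so it can be stepped
    mov al, [bx]        ;al = 20

;Hypothetical data, which must live outside the path of execution
table:
    db 10, 20, 30, 40
```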

CX - Counter

The Counter register is used by the loop instruction, which automatically decrements CX and then jumps if CX is not yet zero. The flags are neither tested nor changed by loop.
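As a small sketch of the loop instruction, the following adds up the numbers 5 down to 1. The label name sum_loop is just an example.

```asm
    mov cx, 5           ;number of iterations
    xor ax, ax          ;running total starts at 0
sum_loop:
    add ax, cx          ;adds 5, then 4, 3, 2, 1
    loop sum_loop       ;cx = cx - 1, jump back while cx is not 0
    ;ax = 15 and cx = 0 here
```

Be careful when CX might start at 0: the decrement happens before the test, so the loop would wrap around and run 65536 times. The jcxz instruction can be used to skip the loop in that case.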

DX - Data

Certain math instructions pair AX with DX. With 16 bit operands, MUL uses DX to extend AX to twice the bit depth, leaving the 32 bit product in DX:AX. DIV goes the other way: it divides the 32 bit value in DX:AX, then places the quotient in AX and the remainder in DX.
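A short sketch of both instructions with 16 bit operands. The values are arbitrary, and BX is used as the second operand only because it is free.

```asm
    mov ax, 1000h
    mov bx, 20h
    mul bx              ;dx:ax = 1000h * 20h = 00020000h, so dx = 0002h and ax = 0000h

    mov ax, 100         ;low word of the dividend
    xor dx, dx          ;high word of the dividend, must be set even when it is 0
    mov bx, 7
    div bx              ;ax = 14 (quotient), dx = 2 (remainder)
```

Forgetting to set DX before a 16 bit DIV is a classic bug: whatever was left in DX becomes the top half of the dividend, and if the quotient no longer fits in AX the processor raises a divide error.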

The "String" Functions

The 8086 and subsequent chips support a series of instructions referred to as String Instructions. An incomplete list would be movsb, lodsb, stosb, cmpsb, and so on. The last letter in the instruction name is the bit width desired.

'b' for bytes (8 bits)
'w' for words (16 bits)
'd' for double words (32 bits, 80386 and later)

Assemblers that use AT&T syntax, such as the GNU assembler, replace the letter 'd' for 32 bit transactions with 'l' for long, an old term for 32 bit integers. Intel syntax assemblers keep the 'd'.

SI - Source Index

The Source Index register is used primarily by the string functions as the offset into the segment held in the DS register, so DS:SI forms the source address.

DI - Destination Index

The Destination Index register is used primarily by the string functions as the offset into the segment held in the ES register, so ES:DI forms the destination address.

You might be wondering why these are referred to as "String" instructions. Generally strings are what we call arbitrary-length arrays of characters, like "Hello, world!" Well, that's what these instructions are designed for: arbitrary-length chunks of data. To automatically loop over an entire block of data to either copy it, set it to an initial value, or even compare it against another block of data, set CX to the number of iterations and put the prefix "rep" in front of the instruction. The processor will then repeat that single instruction, decrementing CX each time, until CX equals 0. (The comparing instructions use the "repe" and "repne" forms of the prefix, which can also stop early.)

Example:

rep movsb
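On its own that line does nothing useful, so here is a fuller sketch of a block copy followed by a block fill. It assumes NASM syntax, that DS and ES already point at the segments holding the data, and the labels source and dest are hypothetical. It also uses cld to clear the direction flag, which makes SI and DI count upwards after each step.

```asm
    cld                 ;clear the direction flag so si and di count upwards
    mov si, source      ;ds:si = where to copy from
    mov di, dest        ;es:di = where to copy to
    mov cx, 13          ;number of bytes to copy
    rep movsb           ;copy a byte, step si and di, cx = cx - 1, until cx = 0

    ;Fill the same 13 bytes with zeroes
    mov di, dest        ;di was advanced by the copy, so reset it
    mov cx, 13
    xor al, al          ;al = the value to store
    rep stosb           ;store al at es:di, step di, cx = cx - 1, until cx = 0

;Hypothetical data, which must live outside the path of execution
source:
    db "Hello, world!"  ;13 bytes
dest:
    times 13 db 0       ;NASM: reserve 13 bytes
```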

The Stack

In general, computer memory is split into two logical blocks by programmers: the Heap and the Stack. The Heap grows up from the bottom of memory and the Stack grows down from the top. The Heap is used for long term memory allocation, and the Stack is used for short term data storage, often storing values for only a handful of instructions before the data is popped off the Stack back into one of our registers.

SP - Stack Pointer

The address composed by SS:SP points to the current top of the stack, which is the most recently pushed value. When something is pushed to the stack, SP will decrease, and pops will increase it again.

Note: Push and Pop always work on whole 16 bit words (or 32 bit double words in 32 bit code); there is no 8 bit push. So for 8 bit values, either expand the value to 16 bits or pack two 8 bit values into a single push/pop combo.
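A short sketch of both ideas, using arbitrary registers: saving and restoring two 16 bit registers, then packing two 8 bit registers into a single push.

```asm
    push ax             ;sp = sp - 2, then ax is stored at ss:sp
    push bx             ;sp = sp - 2 again
    ;... code that changes ax and bx ...
    pop bx              ;restores bx, sp = sp + 2
    pop ax              ;restores ax, sp = sp + 2

    ;Saving two 8 bit values with one push
    mov ah, cl          ;pack cl into the high half of ax
    mov al, dl          ;and dl into the low half
    push ax
    ;... code that changes cl and dl ...
    pop ax
    mov cl, ah          ;unpack
    mov dl, al
```

Note that pops happen in the reverse order of the pushes, since the last value pushed is the first one to come back off.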

BP - Base Pointer

In order to maintain an easy framework for functions to track where their stack space starts, the Base Pointer exists to hold a copy of the Stack Pointer taken at the start of the function. This means accessing local variables and parameters is safe and simple no matter how much the function has had to push to the stack.
A common usage follows this format:

16 bit Assembly

    ;A normal function opener
function_name:
    push bp
    mov bp, sp
    ;Feel free to start modifying the stack!

    ;And to restore the stack state:
    mov sp, bp
    pop bp
    ret

32 bit Assembly

    ;A normal function opener
function_name:
    push ebp
    mov ebp, esp
    ;Feel free to start modifying the stack!

    ;And to restore the stack state:
    mov esp, ebp
    pop ebp
    ret

The practical upshot of this wrapper is that at any time during the function, parameters can be accessed by referencing the memory location [bp+(4+x)] (in 16 bit mode), where x is the offset in bytes from the first parameter to the one we want to retrieve.
In 32 bit land, parameters start from [ebp+(8+x)].

Note: In 16 bit mode the address of [bp] will return the value of BP pushed to the stack at the start of the function and [bp+2] will have the return address. In 32 bit mode [ebp] will have the pushed value of ebp, and [ebp+4] contains the return address.

* This assumes near function calls. For 16 bit far function calls add another 2 bytes for the saved segment register, so parameters start from [bp+(6+x)].
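Putting the wrapper and the parameter offsets together, here is a sketch of a 16 bit near function that adds its two word sized parameters. The names add_two and call_site are made up, and the calling convention (parameters pushed in reverse order, caller cleans up the stack afterwards) is an assumption for the example, not necessarily what your compiler uses.

```asm
;Hypothetical near function: returns first + second in ax
add_two:
    push bp
    mov bp, sp
    mov ax, [bp+4]      ;first parameter (x = 0)
    add ax, [bp+6]      ;second parameter (x = 2)
    mov sp, bp
    pop bp
    ret

;Caller, pushing the parameters in reverse order
call_site:
    mov ax, 20
    push ax             ;second parameter
    mov ax, 10
    push ax             ;first parameter, ends up closest to the return address
    call add_two        ;ax = 30 on return
    add sp, 4           ;caller removes the 4 bytes of parameters
```

The values go through AX before being pushed because the original 8086 cannot push an immediate value directly.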

16 Bit Land and Segmentation

The original 8086 supported a 20 bit address space. In olden days we would call this a Megabyte, but after a small tiff between engineers and the SI over what the prefixes mean, binary data sizes were renamed from the generally understood Kilobyte / Megabyte / Gigabyte to Kibibyte / Mebibyte / Gibibyte, so those 2^20 bytes are now properly a Mebibyte. Looking back at older software can lead to some miscommunication because of this relatively recent (read: just over 20 years old) redefinition.

Terminology notwithstanding, there was a problem arising from a 16 bit processor, where all available registers are 16 bits wide, trying to address 20 bits of memory address space. Intel's solution was effective, if a little confusing. To expand its 16 bit registers to 20 bits it combined two 16 bit values: the Segment and the Offset. This is commonly written in the format SEGMENT:OFFSET. Unfortunately, here's where it gets a bit obscure. Instead of the segment register supplying the bits above the 16 bit offset, Intel chose to shift the segment left by 4 bits (multiply it by 16) and add the offset to it, so that the maximum address space available was 1 Mebibyte. One side effect is that many different SEGMENT:OFFSET pairs refer to the same address. The equation to combine the two is:

Address = (Segment * 16) + Offset
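As a worked example, 1234h:0010h gives (1234h * 16) + 0010h = 12340h + 10h = 12350h. The same sum can be sketched in assembly; since the result needs 20 bits, it is built in the DX:AX pair. The choice of BX and SI as inputs is arbitrary.

```asm
    ;in:  bx = segment, si = offset
    ;out: dx:ax = 20 bit address
    mov bx, 1234h
    mov si, 0010h
    mov ax, bx
    mov dx, 16
    mul dx              ;dx:ax = 1234h * 16 = 00012340h
    add ax, si          ;add the offset to the low word
    adc dx, 0           ;carry into the high word if the add overflowed
    ;dx = 0001h and ax = 2350h, so the address is 12350h
```

Note that 1235h:0000h works out to exactly the same address, which is the overlap between segments in action.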

Note: In several older documents the 16 byte steps between consecutive Segment values are usually called "Paragraphs". This term will not be used in this guide too often, but it will help to bear this in mind during research into other documents.

CS - Code Segment

The Code Segment can't be set with a move or math instruction. Instead it is loaded by a far control transfer, such as what's called a "far jump" (sometimes a "long jump"), a "far call", or the matching far return.

DS - Data Segment

The Data Segment is what is used in most cases where an instruction loads data from a memory address, whether the address is given as an immediate or in a register.

There are two main exceptions. The string instructions that use DI (stosb/stosw/stosd and movsb/movsw/movsd, as well as the scas and cmps families) access that operand through ES:DI, and any address that uses BP as its base defaults to the SS register.

ES - Extra Segment

The Extra Segment can be used to specify an alternative segment for data retrieval and storage as opposed to the usual DS register.
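A brief sketch of how the default segment and an ES override look in practice. The bracket placement of the override follows NASM syntax (MASM-style assemblers write es:[bx] instead), and the segment value 2000h is purely hypothetical.

```asm
    mov ax, 2000h       ;hypothetical segment value
    mov es, ax          ;segment registers cannot take an immediate, so go through ax

    mov al, [bx]        ;default: reads the byte at ds:bx
    mov al, [es:bx]     ;override: reads the byte at es:bx
    mov al, [bp]        ;bp based addresses default to ss:bp
    stosb               ;always writes al to es:di, then steps di
```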

SS - Stack Segment

The Stack Segment is used by the stack instructions Push and Pop, by Call and Ret for the return address, and by default for any memory address based on BP.