the Lobdegg's Comprehensive Guidebook to DOS Programming
Chapter 1 - Introduction
Section 1: Intel Assembly
Introduction
This section is not intended to be comprehensive, but rather a
starter for newbies to the world of Intel Assembly. The information
here will therefore never be complete and should be supplemented
with other texts. It also focuses primarily on the
capabilities of the original 8086 itself, and thus ignores most
of the extensions to the instruction set that have been added
in subsequent generations of the '86 series of microprocessors.
The Registers
The 32 bit registers are included for reference, as 32 bit assembly can be paired with 32 bit C
projects in Open Watcom. This guide will attempt to offer code that works in both
16 bit land and 32 bit land.
Basic Operations
Although for the most part the 4 primary registers and their 8 half registers
are referred to as general purpose registers, they do have some special
usages, as specified below.
AX - Accumulator
The Accumulator is the implicit operand for several advanced math instructions,
like MUL and DIV. There are also special, shorter opcodes for adding an immediate
value to AX or subtracting one from it. Using AX in order to use these versions of
ADD and SUB can produce slightly more compact programs when applicable.
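As a rough illustration of the shorter AX forms, the sketch below lists the machine code bytes for a 16 bit immediate that does not fit in a signed byte. It assumes NASM syntax and 16 bit code; the bytes in the comments follow the standard 8086 opcode encodings.

```asm
    add ax, 1234h   ; 05 34 12      (3 bytes, AX-only short form)
    add bx, 1234h   ; 81 C3 34 12   (4 bytes, general form)
    sub ax, 1234h   ; 2D 34 12      (3 bytes, AX-only short form)
    sub bx, 1234h   ; 81 EB 34 12   (4 bytes, general form)
```

The saving is a single byte per instruction, so it only matters in tight or size-constrained code.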
BX - Base
The Base can be used as a general purpose pointer. Unlike SI/DI and SP/BP, whose roles
in addressing memory are tied to specific uses, when BX is used as an address it may
point to any conceivable place without restrictions of purpose.
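A minimal sketch of BX as a pointer follows. The routine and label names are hypothetical, NASM syntax and 16 bit code are assumed, and DS is assumed to already point at the segment that holds the table (as it would in a .COM program).

```asm
read_table:
    mov bx, table       ; BX = offset of table within the data segment
    mov al, [bx]        ; AL = 10, read from DS:BX
    mov ah, [bx+2]      ; AH = 30, read from DS:BX+2
    ret

table:
    db 10, 20, 30
```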
CX - Counter
The Counter register is used by the loop instruction, which will auto decrement it
and then jump if CX is not yet zero. The loop instruction itself does not change the flags.
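The loop instruction can be sketched with a small counting routine. The label names are hypothetical, and NASM syntax with 16 bit code is assumed.

```asm
; Adds 10 + 9 + ... + 1 and leaves the total in AX.
sum_to_ten:
    xor ax, ax      ; AX = 0, the running total
    mov cx, 10      ; CX = number of iterations
.next:
    add ax, cx      ; add the current counter value
    loop .next      ; CX = CX - 1, then jump to .next while CX != 0
    ret             ; AX = 55 here
```

Note that if CX is 0 on entry to the loop, the decrement wraps around to FFFFh and the body runs 65536 times, so a check such as jcxz before the loop is a common safeguard.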
DX - Data
Certain math instructions pair AX with DX. A 16 bit MUL uses DX to extend AX to
twice the bit depth, leaving the product in DX:AX. A 16 bit DIV takes its dividend
from DX:AX, then places the quotient in AX and the remainder in DX.
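A short worked sketch of the DX:AX pairing, assuming NASM syntax and 16 bit code. The values are arbitrary and chosen so the product no longer fits in 16 bits.

```asm
    ; 16 bit MUL: DX:AX = AX * operand
    mov ax, 1000
    mov bx, 100
    mul bx          ; 1000 * 100 = 100000 = 186A0h -> DX = 0001h, AX = 86A0h

    ; 16 bit DIV: AX = DX:AX / operand, DX = DX:AX mod operand
    mov bx, 7
    div bx          ; 100000 / 7 -> AX = 14285 (quotient), DX = 5 (remainder)
```

When dividing a plain 16 bit value, DX should be zeroed first, since DIV always reads the full DX:AX pair. If the quotient does not fit in AX, the processor raises a divide error instead of returning a result.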
The "String" Functions
The 8086 and subsequent chips support a series of instructions referred to as
String Instructions. An incomplete list would be movsb, lodsb, stosb, cmpsb,
and so on. The last letter in the instruction name is the operand size desired.
- 'b' for bytes (8 bits)
- 'w' for words (16 bits)
- 'd' for double words (32 bits, 80386 and later)
In assemblers that use AT&T syntax, such as the GNU assembler, the letter 'd' for
32 bit transactions is replaced by 'l' for long, an old term for 32 bit integers.
SI - Source Index
The Source Index register is used primarily by the string functions as
the offset paired with the DS register to create a source address (DS:SI).
DI - Destination Index
The Destination Index register is used primarily by the string functions as
the offset paired with the ES register to create a destination address (ES:DI).
You might be wondering why these are referred to as "String" instructions. Generally
strings are what we call arbitrary length arrays of characters, like "Hello, world!"
Well, that's what these instructions are designed for: arbitrary length chunks of data.
To automatically loop over an entire block of data to either copy it or set it to an initial
value, add the "rep" prefix to the instruction and set CX to the number of iterations,
and the processor will repeat that single instruction until CX equals 0. To compare one
block against another, use the "repe" or "repne" prefix with cmpsb, which also stops as
soon as a mismatch (repe) or a match (repne) is found.
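The copy and fill cases can be sketched as below. The routine and buffer names are hypothetical, NASM syntax and 16 bit code are assumed, and DS and ES are both assumed to point at the segment holding the buffers. The cld instruction is included because the string instructions step SI and DI upward only while the direction flag is clear.

```asm
copy_and_clear:
    cld                 ; direction flag clear: SI and DI count upward
    mov si, src         ; DS:SI = source address
    mov di, dst         ; ES:DI = destination address
    mov cx, 13          ; "Hello, world!" is 13 bytes long
    rep movsb           ; copy CX bytes from DS:SI to ES:DI

    mov di, dst         ; point ES:DI back at the start of dst
    mov al, 0           ; fill value
    mov cx, 13
    rep stosb           ; store AL into CX bytes starting at ES:DI
    ret

src:
    db "Hello, world!"
dst:
    times 13 db 0       ; NASM directive: reserve 13 zero bytes
```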
The Stack
In general, computer memory is split into two logical blocks by programmers:
the Heap and the Stack. The Heap grows up from the bottom of memory and the
Stack grows down. The Heap is used for long term memory allocation, and the
Stack is used for short term data storage, often times storing values for
only a handful of instructions before the data is popped off the Stack back
into use by one of our registers.
SP - Stack Pointer
The address composed by SS:SP points to the current top of
the stack. When something is pushed to the stack, SP will decrease,
and pops will increase it again.
Note: Push and Pop must operate on a piece of data that is a multiple
of 16 bits wide. So for 8 bit pushes, either expand the value to 16 bits
or pack two 8 bit values into a single push/pop combo.
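Packing two 8 bit values into one push might look like the sketch below (NASM syntax and 16 bit code assumed; the register choices are arbitrary).

```asm
    mov al, 12h     ; first 8 bit value
    mov ah, 34h     ; second 8 bit value, so AX = 3412h
    push ax         ; SP = SP - 2, the word is stored at SS:SP

    ; AX is free for other work here

    pop bx          ; BX = 3412h and SP = SP + 2, so BL = 12h and BH = 34h
```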
BP - Base Pointer
In order to maintain an easy framework for functions to track where their
stack space starts, the Base Pointer exists to store the Stack Pointer.
This means accessing local variables and parameters is safe and simple no
matter how much the function has had to push to the stack.
A common usage follows this format:
16 bit Assembly:

    ;A normal function opener
    function_name:
        push bp
        mov bp, sp
    ;Feel free to start modifying the stack!

    ;And to restore the stack state:
        mov sp, bp
        pop bp
        ret

32 bit Assembly:

    ;A normal function opener
    function_name:
        push ebp
        mov ebp, esp
    ;Feel free to start modifying the stack!

    ;And to restore the stack state:
        mov esp, ebp
        pop ebp
        ret
The practical upshot of this wrapper is that at any time during
the function, parameters can be accessed by referencing the memory
location
[bp+(4+x)] (in 16 bit mode)
where x is the offset in bytes of the parameter we want to retrieve,
counted from the first parameter.
In 32 bit land, start parameters from
[ebp+(8+x)].
Note: In 16 bit mode [bp] holds the value of
BP pushed to the stack at the start of the function and *[bp+2] will
have the return address. In 32 bit mode [ebp] will have the pushed
value of ebp, and *[ebp+4] contains the return address.
* This assumes near function calls. For far function calls
add another 2 bytes for the segment register.
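Putting the frame layout to work, here is a minimal 16 bit sketch of a function that reads two word parameters. The names add_two and caller are hypothetical, NASM syntax is assumed, and so is a C-style convention where the caller pushes parameters right to left, makes a near call, and cleans up the stack afterwards.

```asm
add_two:
    push bp
    mov bp, sp
    mov ax, [bp+4]      ; first parameter (x = 0)
    add ax, [bp+6]      ; second parameter (x = 2)
    mov sp, bp
    pop bp
    ret                 ; result is returned in AX

caller:
    mov ax, 5
    push ax             ; second parameter, pushed first
    mov ax, 3
    push ax             ; first parameter, pushed last so it sits at [bp+4]
    call add_two        ; AX = 3 + 5 = 8 on return
    add sp, 4           ; discard the two word parameters
    ret
```

The values go through AX before each push because pushing an immediate directly is not available on the original 8086.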
16 Bit Land and Segmentation
The original 8086 supported a 20 bit address space, or 1,048,576 bytes.
In olden days we would call this a Megabyte, but a small tiff between
engineers and the SI led to data sizes being renamed from the generally
understood Kilobyte / Megabyte / Gigabyte to Kibibyte / Mebibyte / Gibibyte.
Looking back at older software can lead to some miscommunication
because of this relatively recent (read: just over 20 years old)
redefinition.
Terminology notwithstanding, there was a problem arising from a
16 bit processor, where all available registers are 16 bits wide,
trying to address 20 bits of memory address space. Intel's solution
was effective, if a little confusing. To expand its 16 bit registers
to 20 bits it combined two 16 bit values: the Segment and the Offset.
This is commonly written in the format SEGMENT:OFFSET. Unfortunately
here's where it gets a bit obtuse. Instead of the segment register
supplying the bits above the 16 bit offset, Intel chose to shift the
segment left by 4 bits (multiply it by 16) and add the offset to it,
so that the maximum address space available was 1 Mebibyte. The
equation to combine the two is:
Address = (Segment * 16) + Offset
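As a worked example of the equation, B800h:0010h gives (B800h * 16) + 0010h = B8010h. Different pairs can also name the same byte: 1234h:0010h and 1235h:0000h both come out to 12350h. The sketch below performs the same arithmetic in 16 bit code; the routine name and its register interface are hypothetical, and NASM syntax is assumed.

```asm
; In:  AX = segment, BX = offset
; Out: DX:AX = linear address (DX holds the upper bits)
linear_address:
    mov dx, 16
    mul dx          ; DX:AX = segment * 16
    add ax, bx      ; add the offset to the low word
    adc dx, 0       ; carry any overflow into the high word
    ret             ; for AX = 0B800h, BX = 0010h: DX = 000Bh, AX = 8010h
```

For pairs near the top of memory, such as FFFFh:FFFFh, this routine returns a 21 bit result, whereas the 8086 itself has only 20 address lines and wraps around to the bottom of memory.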
Note: In several older documents the 16 byte units that segment values
count in are usually called "Paragraphs". This term will not be used in
this guide too often, but it will help to bear this in mind during research
into other documents.
CS - Code Segment
The Code Segment can't be set with a move or math instruction.
Instead it can only be loaded by a control transfer, such as what's
called a "far jump" (sometimes a "long jump"), a "far call", or a
far return.
DS - Data Segment
The Data Segment is what is used in most cases where an instruction
loads data from a memory address, whether given as an immediate or
a register.
The main exceptions are the stosb/stosw/stosd and the movsb/movsw/movsd
instructions, which use the ES register for their write actions, and
addresses formed with BP, which default to the SS register.
ES - Extra Segment
The Extra Segment can be used to specify an alternative segment for
data retrieval and storage as opposed to the usual DS register.
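A minimal sketch of using ES for storage through a segment override follows. NASM syntax and 16 bit code are assumed, and the segment value B800h (colour text mode video memory on a typical PC) is used purely as an illustrative target.

```asm
    mov ax, 0B800h          ; a segment register cannot be loaded with an immediate,
    mov es, ax              ; so the value goes through a general register first
    mov byte [es:0], 'A'    ; write to ES:0000h rather than the default DS:0000h
```

Other assemblers place the override differently, for example es:[0] in MASM-style syntax.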
SS - Stack Segment
The Stack Segment is used by the stack instructions, Push and Pop, as well as
by CALL and RET, which push and pop the return address.