Chapter 3
Using Addresses and Pointers
MASM applications running in real mode require segmented addresses to access code and data. The address of the code or data in a segment is relative to a segment address in a segment register. You can also use pointers to access data in assembly language programs. (A pointer is a variable that contains an address as its value.)
The first section of this chapter describes how to initialize default segment registers to access near and far addresses. The next section describes how to access code and data. It also describes related operators, syntax, and displacements. The discussion of memory operands lays the foundation for the third section, which describes the stack.
The fourth section of this chapter explains how to use the TYPEDEF directive to declare pointers and the ASSUME directive to give the assembler information about registers containing pointers. This section also shows you how to do typical pointer operations and how to write code that works for pointer variables in any memory model.
Before you use segmented addresses in your programs, you need to initialize the segment registers. The initialization process depends on the registers used and on your choice of simplified segment directives or full segment definitions. The simplified segment directives (introduced in Chapter 2) handle most of the initialization process for you. This section explains how to inform the assembler and the processor of segment addresses, and how to access the near and far code and data in those segments.
The segmented architecture of the 8086 family of processors does not require that you specify two addresses every time you access memory. As explained in Chapter 2, "Organizing Segments," the 8086 family of processors uses a system of default segment registers to simplify access to the most commonly used data and code.
The segment registers DS, SS, and CS are normally initialized to default segments at the beginning of a program. If you write the main module in a high-level language, the compiler initializes the segment registers. If you write the main module in assembly language, you must initialize the segment registers yourself. Follow these steps to initialize segments:
1. Tell the assembler which segment is associated with a register. The assembler must know the default segments at assembly time.
2. Tell the processor which segment is associated with a register by writing the necessary code to load the correct segment value into the segment register on the processor.
These steps are discussed separately in the following sections.
The first step in initializing segments is to tell the assembler which segment to associate with a register. You do this with the ASSUME directive. If you use simplified segment directives, the assembler automatically generates the appropriate ASSUME statements. If you use full segment definitions, you must code the ASSUME statements for registers other than CS yourself. (ASSUME can also be used on general-purpose registers, as explained in "Defining Register Types with ASSUME" later in this chapter.)
The .STARTUP directive generates startup code that sets DS equal to SS (unless you specify FARSTACK), allowing default data to be accessed through either SS or DS. This can improve efficiency in the code generated by compilers. The DS-equals-SS convention may not work with certain applications, such as memory-resident programs in MS-DOS and Windows dynamic-link libraries (see Chapter 10). The code generated for .STARTUP is shown in "Starting and Ending Code with .STARTUP and .EXIT" in Chapter 2. You can use similar code to set DS equal to SS in programs using full segment definitions.
Here is an example of ASSUME using full segment definitions:
ASSUME cs:_TEXT, ds:DGROUP, ss:DGROUP
This example is equivalent to the ASSUME statement generated with simplified segment directives in small model with NEARSTACK. Note that DS and SS are part of the same segment group. It is also possible to have different segments for data and code, and to use ASSUME to set ES, as shown here:
ASSUME cs:MYCODE, ds:MYDATA, ss:MYSTACK, es:OTHER
Correct use of the ASSUME statement can help find addressing errors. With .CODE, the assembler assumes CS is the current segment. When you use the simplified segment directives .DATA, .DATA?, .CONST, .FARDATA, or .FARDATA?, the assembler automatically assumes CS is the ERROR segment. This prevents instructions from appearing in these segments. If you use full segment definitions, you can accomplish the same by placing ASSUME CS:ERROR in a data segment.
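With full segment definitions, the technique might look like the following sketch. The segment names MYDATA and MYCODE and the variable total are hypothetical, chosen only for illustration.

```asm
MYDATA  SEGMENT WORD PUBLIC 'DATA'
        ASSUME  cs:ERROR        ; Any instruction assembled in this
                                ;   segment is reported as an error
total   WORD    0               ; Data definitions are not affected
MYDATA  ENDS

MYCODE  SEGMENT WORD PUBLIC 'CODE'
        ASSUME  cs:MYCODE       ; Give CS a valid assumption again
                                ;   before any instructions appear
        .
        .
        .
MYCODE  ENDS
```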
With simplified segment directives or full segment definitions, you can cancel the control of an ASSUME statement by assuming NOTHING. You can cancel the previous assumption for ES with the following statement:
ASSUME es:NOTHING
Prior to the .MODEL statement (or in its absence), the assembler sets the ASSUME statement for DS, ES, and SS to the current segment.
The second and final step in initializing segments is to inform the processor of segment values at run time. How segment values are initialized at run time differs for each segment register and depends on the operating system and on your use of simplified segment directives or full segment definitions.
A program's starting address determines where execution begins. After the operating system loads a program, it simply jumps to the starting address, giving processor control to the program. The true starting address is known only to the loader; the linker determines only the offset of the address within an undetermined code segment. This is why a normal application is often referred to as relocatable code: it runs regardless of where the loader places it in memory.
The offset of the starting address depends on the program type. Programs with an .EXE extension contain a header from which the loader reads the offset and combines it with a segment to form the starting address. Programs with a .COM extension (tiny model) have no such header, so by convention the loader jumps to the first byte of the program.
In either case, the .STARTUP directive identifies where execution begins, provided you use simplified segment directives. For an .EXE program, place .STARTUP immediately before the instruction where you want execution to start. In a .COM program, place .STARTUP before the first assembly instruction in your source code.
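As a minimal sketch of the simplified-directive case, the following skeleton for an .EXE program places .STARTUP immediately before the first instruction to execute. The variable name value, the stack size, and the return code are assumptions made for illustration.

```asm
        .MODEL  small           ; .STARTUP requires a .MODEL statement
        .STACK  100h            ; 256-byte stack (size chosen arbitrarily)
        .DATA
value   WORD    5               ; Hypothetical near variable in DGROUP
        .CODE
        .STARTUP                ; Execution begins here; DS is initialized
        mov     ax, value       ; First instruction of the program
        .EXIT   0               ; Return to the operating system
        END                     ; No start label is needed with .STARTUP
```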
If you use full segment directives or prefer not to use .STARTUP, you must identify the starting instruction in two steps:
1. Label the starting instruction.
2. Provide the same label in the END directive.
These steps tell the linker where execution begins in the program. The following example illustrates the two steps for a tiny model program:
_TEXT SEGMENT WORD PUBLIC 'CODE'
ORG 100h ; Use this declaration for .COM files only
start: . ; First instruction here
.
.
_TEXT ENDS
END start ; Name of starting label
Notice the ORG statement in this example. This statement is mandatory in a tiny model program without the .STARTUP directive. It places the first instruction at offset 100h in the code segment to create space for a 256-byte (100h) data area called the Program Segment Prefix (PSP). The operating system takes care of initializing the PSP, so you need only make sure the area exists. (For a description of what data resides in the PSP, refer to the Tables chapter in the Reference.)
The DS register is automatically initialized to the correct value (DGROUP) if you use .STARTUP or if you are writing a program for Windows. If you do not use .STARTUP with MS-DOS, you must initialize DS using the following instructions:
mov ax, DGROUP
mov ds, ax
The initialization requires two instructions because the segment name is a constant and the assembler does not allow a constant to be loaded directly to a segment register. The previous example loads DGROUP, but you can load any valid segment or group.
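For instance, a program that uses full segment definitions might combine the ASSUME step and the load step as in the following sketch. The segment names MYDATA, MYSTACK, and MYCODE and the variable counter are hypothetical, and the exit sequence assumes MS-DOS.

```asm
MYDATA  SEGMENT WORD PUBLIC 'DATA'
counter WORD    0               ; Hypothetical variable
MYDATA  ENDS

MYSTACK SEGMENT PARA STACK 'STACK'
        WORD    80h DUP (?)     ; 256-byte stack; the loader sets SS:SP
MYSTACK ENDS

MYCODE  SEGMENT WORD PUBLIC 'CODE'
        ASSUME  cs:MYCODE, ds:MYDATA, ss:MYSTACK ; Step 1: inform assembler
start:  mov     ax, MYDATA      ; Step 2: inform the processor by
        mov     ds, ax          ;   loading the segment value into DS
        inc     counter         ; DS-relative access now works
        mov     ax, 4C00h       ; MS-DOS terminate function,
        int     21h             ;   return code 0
MYCODE  ENDS
        END     start           ; Starting label
```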
The SS and SP registers are initialized automatically if you use the .STACK directive with simplified segments or if you define a segment that has the STACK combine type with full segment definitions. Either method initializes SS to the stack segment. If you want SS to be equal to DS, use .STARTUP or its equivalent. (See "Combining Segments," page 45.) For an .EXE file, the stack address is encoded into the executable header and resolved at load time. For a .COM file, the loader sets SS equal to CS and initializes SP to 0FFFEh.
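One possible equivalent of the DS-equals-SS setup is sketched below. It is a reconstruction rather than a quotation of the code .STARTUP generates, and it assumes that the stack segment belongs to DGROUP and lies at or above the start of DGROUP, so that the same stack memory remains addressable after SS changes.

```asm
        mov     dx, DGROUP      ; DX = segment address of DGROUP
        mov     ds, dx          ; DS now points to DGROUP
        mov     bx, ss          ; BX = stack segment set by the loader
        sub     bx, dx          ; BX = distance from DGROUP to the
                                ;   stack segment, in paragraphs
        shl     bx, 1           ; Multiply by 16 to convert paragraphs
        shl     bx, 1           ;   to bytes, using four single-bit
        shl     bx, 1           ;   shifts that work on every
        shl     bx, 1           ;   8086-family processor
        cli                     ; No interrupts while SS:SP is changing
        mov     ss, dx          ; SS = DGROUP, so SS now equals DS
        add     sp, bx          ; SP addresses the same stack memory
        sti                     ; Allow interrupts again
```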
If your program does not access far data, you do not need to initialize the ES register. If you choose to initialize, use the same technique as for the DS register. You can initialize SS to a far stack in the same way.
Addresses that have an implied segment name or segment registers associated with them are called near addresses. Addresses that have an explicit segment associated with them are called far addresses. The assembler handles near and far code automatically, as described in the following sections. You must specify how to handle far data.
The Microsoft segment model puts all near data and the stack in a group called DGROUP. Near code is put in a segment called _TEXT. Each module's far code or far data is placed in a separate segment. This convention is described in "Controlling the Segment Order" in Chapter 2.
The assembler cannot determine the address for some program components; these are said to be relocatable. The assembler generates a fixup record and the linker provides the address once it has determined the location of all segments. Usually a relocatable operand references a label, but there are exceptions. Examples in the next two sections include information about relocating near and far data.
Control transfers within near code do not require changes to segment registers. The processor automatically handles changes to the offset in the IP register when control-flow instructions such as JMP, CALL, and RET are used. The statement
call nearproc ; Change code offset
changes the IP register to the new address but leaves the segment unchanged. When the procedure returns, the processor resets IP to the offset of the next instruction after the CALL instruction.
The processor automatically handles segment register changes when dealing with far code. The statement
call farproc ; Change code segment and offset
automatically moves the segment and offset of the farproc procedure to the CS and IP registers. When the procedure returns, the processor sets CS to the original code segment and sets IP to the offset of the next instruction after the call.
A program can access near data directly, because a segment register already holds the correct segment for the data item. The term near data is often used to refer to the data in the DGROUP group.
After the first initialization of the DS and SS registers, these registers normally point into DGROUP. If you modify the contents of either of these registers during the execution of the program, you must reload the register with DGROUP's address before referencing any DGROUP data.
The processor assumes all memory references are relative to the segment in the DS register, with the exception of references using BP or SP. The processor associates these registers with the SS register. (You can override these assumptions with the segment override operator, described in "Direct Memory Operands," on page 62.)
The following lines illustrate how the processor accesses either the DS or SS segments, depending on whether the pointer operand contains BP or SP. Note that the distinction loses significance when DS and SS are equal.
nearvar WORD 0
.
.
.
mov ax, nearvar ; Reads from DS:[nearvar]
mov di, [bx] ; Reads from DS:[bx]
mov [di], cx ; Writes to DS:[di]
mov [bp+6], ax ; Writes to SS:[bp+6]
mov bx, [bp] ; Reads from SS:[bp]
To read or modify a far address, a segment register must point to the segment of the data. This requires two steps. First, load the segment register (normally either ES or DS) with the correct value. Then, optionally, use ASSUME to tell the assembler that the segment register points to the segment of the address.
Note: Flat model does not require far addresses. By default, all addressing is relative to the initial values of the segment registers. Therefore, this section on far addressing does not apply to flat model programs.
One method commonly used to access far data is to initialize the ES segment register. This example shows two ways to do this:
; First method
mov ax, SEG farvar ; Load segment of the
mov es, ax ; far address into ES
mov ax, es:farvar ; Provide an explicit segment
; override on the addressing
; Second method
mov ax, SEG farvar2 ; Load the segment of the
mov es, ax ; far address into ES
ASSUME ES:SEG farvar2 ; Tell the assembler that ES points
; to the segment containing farvar2
mov ax, farvar2 ; The assembler provides the ES
; override since it knows that
; the label is addressable
After loading the segment of the address into the ES segment register, you can explicitly override the segment register so that the addressing is correct (method 1) or allow the assembler to insert the override for you (method 2). The assembler uses ASSUME statements to determine which segment register can be used to address a segment of memory. To use the segment override operator, the left operand must be a segment register, not a segment name. (For more information on segment overrides, see "Direct Memory Operands" on page 62.)
If an instruction needs a segment override, the resulting code is slightly larger and slower, since the override must be encoded into the instruction. However, the resulting code may still be smaller than the code for multiple loads of the default segment register for the instruction.
The DS, SS, FS, and GS segment registers (FS and GS are available only on the 80386/486 processors) may also be used for addressing through other segments.
If a program uses ES to access far data, it need not restore ES when finished (unless the program uses flat model). However, some compilers require that you restore ES before returning to a module written in a high-level language.
Another method is to access far data through DS: save the original DS, set DS to the far segment, and restore the original DS when finished. Use the ASSUME directive to let the assembler know that DS no longer points to the default data segment, as shown here:
push ds ; Save original segment
mov ax, SEG fararray ; Move segment into data register
mov ds, ax ; Initialize segment register
ASSUME ds:SEG fararray ; Tell assembler where data is
mov ax, fararray[0] ; Set DX:AX = dword variable
mov dx, fararray[2] ; fararray
.
.
.
pop ds ; Restore segment
ASSUME ds:@DATA ; and default assumption
"Direct Memory Operands," on page 62, describes an alternative method for accessing far data. The technique of resetting DS as shown in the previous example is best for a lengthy series of far data references. The segment override method described in "Direct Memory Operands" serves best when accessing only one or two far variables.
If your program changes DS to access far data, it should restore DS when finished. This allows procedures to assume that DS is the segment for near data. Many compilers, including Microsoft compilers, use this convention.
With few exceptions, assembly language instructions work on sources of data called operands. In a listing of assembly code (such as the examples in this book), operands appear in the operand field immediately to the right of the instructions.
This section describes the four kinds of instruction operands: register, immediate, direct memory, and indirect memory. Some instructions, such as POPF and STI, have implied operands which do not appear in the operand field. Otherwise, an implied operand is just as real as one stated explicitly.
Certain other instructions such as NOP and WAIT deserve special mention. These instructions affect only processor control and do not require an operand.
The following four types of operands are described in the rest of this section:
Operand Type | Addressing Mode
Register | An 8-bit or 16-bit register on the 8086–80486; can also be 32-bit on the 80386/486.
Immediate | A constant value contained in the instruction itself.
Direct memory | A fixed location in memory.
Indirect memory | A memory location determined at run time by using the address stored in one or two registers.
Instructions that take two or more operands always work right to left. The right operand is the source operand. It specifies data that will be read, but not changed, in the operation. The left operand is the destination operand. It specifies the data that will be acted on and possibly changed by the instruction.
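The following lines sketch the source and destination roles described above. The registers and constant are chosen arbitrarily for illustration.

```asm
        mov     ax, bx          ; AX (destination) receives a copy of
                                ;   BX (source); BX is unchanged
        add     ax, 4           ; AX = AX + 4; the constant is the source
        sub     cx, ax          ; CX = CX - AX; AX is read but unchanged
```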
Register operands refer to data stored in registers. The following examples show typical register operands:
mov bx, 10 ; Load constant to BX
add ax, bx ; Add BX to AX
jmp di ; Jump to the address in DI
An offset stored in a base or index register often serves as a pointer into memory. You can store an offset in one of the base or index registers, then use the register as an indirect memory operand. (See "Indirect Memory Operands," following.) For example:
mov [bx], dl ; Store DL in indirect memory operand
inc bx ; Increment register operand
mov [bx], dl ; Store DL in new indirect memory operand
This example moves the value in DL to two consecutive bytes of memory, starting at the location pointed to by BX. Any instruction that changes the register value also changes the data item pointed to by the register.
An immediate operand is a constant or the result of a constant expression. The assembler encodes immediate values into the instruction at assembly time. Here are some typical examples showing immediate operands:
mov cx, 20 ; Load constant to register
add var, 1Fh ; Add hex constant to variable
sub &n