Chapter 5
Defining and Using Complex Data Types
With the complex data types available in MASM 6.1 (arrays, strings, records, structures, and unions), you can access data as a unit or as the individual elements that make up a unit. The individual elements of complex data types are often the integer types discussed in Chapter 4, "Defining and Using Simple Data Types."
"Arrays and Strings" reviews how to declare, reference, and initialize arrays and strings. This section summarizes the general steps needed to process arrays and strings and describes the MASM instructions for moving, comparing, searching, loading, and storing.
"Structures and Unions" covers similar information for structures and unions: how to declare structure and union types, how to define structure and union variables, and how to reference structures and unions and their fields.
"Records" explains how to declare record types, define record variables, and use record operators.
An array is a sequential collection of variables, all of the same size and type, called elements. A string is an array of characters. For example, in the string "ABC", each letter is an element. You can access the elements in an array or string relative to the first element. This section explains how to handle arrays and strings in your programs.
Array elements occupy memory contiguously, so a program references each element relative to the start of the array. To declare an array, supply a label name, the element type, and a series of initializing values or ? placeholders. The following examples declare the arrays warray and xarray:
warray WORD 1, 2, 3, 4
xarray DWORD 0FFFFFFFFh, 789ABCDEh
Initializer lists of array declarations can span multiple lines. The first initializer must appear on the same line as the data type, all entries must be initialized, and, if you want the array to continue to the new line, the line must end with a comma. These examples show legal multiple-line array declarations:
big      BYTE 21, 22, 23, 24, 25,
              26, 27, 28
somelist WORD 10,
              20,
              30
If you do not use the LENGTHOF and SIZEOF operators discussed later in this section, an array may span more than one logical line, although a separate type declaration is needed on each logical line:
var1 BYTE 10, 20, 30
     BYTE 40, 50, 60
     BYTE 70, 80, 90
You can also declare an array with the DUP operator. This operator works with any of the data allocation directives described in "Allocating Memory for Integer Variables" in Chapter 4. In the syntax
count DUP (initialvalue [[, initialvalue]]...)
the count value sets the number of times to repeat all values within the parentheses. The initialvalue can be an integer, character constant, or another DUP operator, and must always appear within parentheses. For example, the statement
barray BYTE 5 DUP (1)
allocates the integer 1 five times for a total of 5 bytes.
The following examples show various ways to allocate data elements with the DUP operator:
array   DWORD 10 DUP (1)                      ; 10 doublewords
                                              ;   initialized to 1
buffer  BYTE  256 DUP (?)                     ; 256-byte buffer
masks   BYTE  20 DUP (040h, 020h, 04h, 02h)   ; 80-byte buffer
                                              ;   with bit masks
three_d DWORD 5 DUP (5 DUP (5 DUP (0)))       ; 125 doublewords
                                              ;   initialized to 0
Each element in an array is referenced with an index number, beginning with zero. The array index appears in brackets after the array name, as in
array[9]
Assembly-language indexes differ from indexes in high-level languages, where the index number always corresponds to the element's position. In C, for example, array[9] references the array's tenth element, regardless of whether each element is 1 byte or 8 bytes in size.
In assembly language, an element's index refers to the number of bytes between the element and the start of the array. This distinction can be ignored for arrays of byte-sized elements, since an element's position number matches its index. For example, defining the array
prime BYTE 1, 3, 5, 7, 11, 13, 17
gives a value of 1 to prime[0], a value of 3 to prime[1], and so forth.
However, in arrays with elements larger than 1 byte, index numbers (except zero) do not correspond to an element's position. You must multiply an element's position by its size to determine the element's index. Thus, for the array
wprime WORD 1, 3, 5, 7, 11, 13, 17
wprime[4] represents the third element (5), which is 4 bytes from the beginning of the array. Similarly, the expression wprime[6] represents the fourth element (7) and wprime[10] represents the sixth element (13).
The following example determines an index at run time. It multiplies the position by two (the size of a word element) by shifting it left:
mov si, cx ; CX holds position number
shl si, 1 ; Scale for word referencing
mov ax, wprime[si] ; Move element into AX
The offset required to access an array element can be calculated with the following formula:
nth element of array = array[(n-1) * size of element]
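As a small sketch of this formula, the TYPE operator (described later in this section) can supply the element size, so the assembler computes the byte offset at assembly time. The fragment reuses the wprime array from above; choosing the fourth element is only an illustration.

```asm
        .DATA
wprime  WORD    1, 3, 5, 7, 11, 13, 17
        .CODE
        ; Fourth element: (4-1) * 2 = byte offset 6, same as wprime[6]
        mov     ax, wprime[(4-1) * TYPE wprime]   ; AX = 7
```

Writing the offset this way keeps the reference correct if the element type of the array is later changed, for example from WORD to DWORD (with a matching 32-bit destination register).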
Referencing an array element by distance rather than position is not difficult to master, and is actually very consistent with how assembly language works. Recall that a variable name is a symbol that represents the contents of a particular address in memory. Thus, if the array wprime begins at address DS:2400h, the reference wprime[6] means to the processor "the word value contained in the DS segment at offset 2400h plus 6 bytes."
As described in "Direct Memory Operands" in Chapter 3, you can substitute the plus operator (+) for brackets, as in:
wprime[9]
wprime+9
Since brackets simply add a number to an address, you don't need them when referencing the first element. Thus, wprime and wprime[0] both refer to the first element of the array wprime.
If your program runs only on an 80186 processor or higher, you can use the BOUND instruction to verify that an index value is within the bounds of an array. For a description of BOUND, see the Reference.
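The following fragment is one possible sketch of a BOUND check, not a complete program. It assumes a hypothetical two-word variable named limits that holds the lowest and highest legal byte offsets into wprime, and it assumes CX holds a zero-based position number. BOUND compares the register against both words and raises interrupt 5 if the value falls outside that range.

```asm
        .186                                    ; BOUND needs an 80186 or later
        .DATA
wprime  WORD    1, 3, 5, 7, 11, 13, 17
limits  WORD    0                               ; Lowest legal byte offset
        WORD    SIZEOF wprime - TYPE wprime     ; Highest legal byte offset (12)
        .CODE
        mov     si, cx                  ; CX holds zero-based position (assumed)
        shl     si, 1                   ; Scale for word referencing
        bound   si, DWORD PTR limits    ; Interrupt 5 if SI is outside 0..12
        mov     ax, wprime[si]          ; Reached only if SI is in range
```

The check covers the byte offset rather than the position number, so it is placed after the scaling step.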
When applied to arrays, the LENGTHOF, SIZEOF, and TYPE operators return information about the length and size of the array and about the type of the initializers.
The LENGTHOF operator returns the number of elements in the array. The SIZEOF operator returns the number of bytes used by the initializers in the array definition. TYPE returns the size of the elements of the array. The following examples illustrate these operators:
array WORD 40 DUP (5)
larray EQU LENGTHOF array ; 40 elements
sarray EQU SIZEOF array ; 80 bytes
tarray EQU TYPE array ; 2 bytes per element
num DWORD 4, 5, 6, 7, 8, 9, 10, 11
lnum EQU LENGTHOF num ; 8 elements
snum EQU SIZEOF num ; 32 bytes
tnum EQU TYPE num ; 4 bytes per element
warray WORD 40 DUP (40 DUP (5))
len EQU LENGTHOF warray ; 1600 elements
siz EQU SIZEOF warray ; 3200 bytes
typ EQU TYPE warray ; 2 bytes per element
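These operators are most useful for keeping loop counts and index steps in line with the array definition. The fragment below is a minimal sketch that sums the 40-word array declared in the first example; the label addnext is a name chosen here for illustration.

```asm
        .DATA
array   WORD    40 DUP (5)
        .CODE
        mov     cx, LENGTHOF array      ; Loop count: 40 elements
        xor     bx, bx                  ; BX = byte offset into array
        xor     ax, ax                  ; AX = running sum
addnext:
        add     ax, array[bx]           ; Add current element
        add     bx, TYPE array          ; Advance by one element (2 bytes)
        loop    addnext                 ; AX = 200 when the loop ends
```

Because the count and the step come from LENGTHOF and TYPE, changing the number of elements in the declaration does not require changes to the loop.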
A string is an array of characters. Initializing a string like "Hello, there" allocates and initializes 1 byte for each character in the string. An initialized string can be no longer than 255 characters.
For data directives other than BYTE, a string may initialize only the first element. The initializer value must fit into the specified size and conform to the expression word size in effect (see "Integer Constants and Constant Expressions" in Chapter 1), as shown in these examples:
wstr WORD "OK"
dstr DWORD "DATA" ; Legal under EXPR32 only
As with arrays, string initializers can span multiple lines. The line must end with a comma if you want the string to continue to the next line.
str1 BYTE "This is a long string that does not ",
          "fit on one line."
You can also have an array of pointers to strings.
PBYTE TYPEDEF PTR BYTE
.DATA
msg1 BYTE "Operation completed successfully."
msg2 BYTE "Unknown command"
msg3 BYTE "File not found"
pmsg PBYTE msg1 ; pmsg is an array
     PBYTE msg2 ; of pointers to
     PBYTE msg3 ; above messages
Strings must be enclosed in single (') or double (") quotation marks. To put a single quotation mark inside a string enclosed by single quotation marks, use two single quotation marks. Likewise, if you need quotation marks inside a string enclosed by double quotation marks, use two sets. These examples show the various uses of quotation marks:
char BYTE 'a'
message BYTE "That's the message." ; That's the message.
warn BYTE 'Can''t find file.' ; Can't find file.
string BYTE "This ""value"" not found." ; This "value" not found.
You can always use single quotation marks inside a string enclosed by double quotation marks, as the initialization for message shows, and vice versa.
You do not have to initialize an array. The ? operator lets you allocate space for the array without placing specific values in it. Object files contain records for initialized data. Unspecified space left in the object file means that no records contain initialized data for that address. The actual values stored in arrays allocated with ? depend on certain conditions. The ? initializer is treated as a zero in a DUP statement that contains initializers in addition to the ? initializer. If the ? initializer does not appear in a DUP statement, or if the DUP statement contains only ? initializers, the assembler leaves the allocated space unspecified.
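The three cases described above can be sketched as follows. The variable names are hypothetical and serve only to label each case.

```asm
        .DATA
buffer  BYTE    256 DUP (?)     ; Only ? inside DUP: space left unspecified
total   WORD    ?               ; ? outside DUP: space left unspecified
pairs   BYTE    10 DUP (1, ?)   ; Mixed initializers: each ? is treated as 0
```

A program should not rely on the contents of buffer or total until it has stored values there.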
Because strings are simply arrays of byte elements, the LENGTHOF, SIZEOF, and TYPE operators behave as you would expect, as illustrated in this example:
msg BYTE "This string extends ",
         "over three ",
         "lines."
lmsg EQU LENGTHOF msg ; 37 elements
smsg EQU SIZEOF msg ; 37 bytes
tmsg EQU TYPE msg ; 1 byte per element
The 8086-family instruction set has seven string instructions for fast and efficient processing of entire strings and arrays. The term "string" in "string instructions" refers to a sequence of elements, not just character strings. These instructions work directly only on arrays of bytes and words on the 8086 through 80486 processors, and on arrays of bytes, words, and doublewords on the 80386/486 processors. Processing larger elements must be done indirectly with loops.
The following list gives capsule descriptions of the five instructions discussed in this section.
Instruction   Description
MOVS          Copies a string from one location to another
STOS          Stores contents of the accumulator register to a string
CMPS          Compares one string with another
LODS          Loads values from a string to the accumulator register
SCAS          Scans a string for a specified value
All of these instructions use registers in a similar way and have a similar syntax. Most are used with the repeat instruction prefixes REP, REPE (or REPZ), and REPNE (or REPNZ). REPZ is a synonym for REPE (Repeat While Equal) and REPNZ is a synonym for REPNE (Repeat While Not Equal).
This section first explains the general procedures for using all string instructions. It then illustrates each instruction with an example.
The string instructions have specific requirements for the location of strings and the use of registers. To operate on any string, follow these three steps:
1. Set the direction flag to indicate the direction in which you want to process the string. The STD instruction sets the flag, while CLD clears it.
If the direction flag is clear, the string is processed upward (from low addresses to high addresses, which is from left to right through the string). If the direction flag is set, the string is processed downward (from high addresses to low addresses, or from right to left). Under MS-DOS, the direction flag is normally clear if your program has not changed it.
2. Load the number of iterations for the string instruction into the CX register.
If you want to process 100 elements in a string, move 100 into CX. If you wish the string instruction to terminate conditionally (for example, during a search when a match is found), load the maximum number of iterations that can be performed without an error.
3. Load the starting offset address of the source string into DS:SI and the starting address of the destination string into ES:DI. Some string instructions take only a destination or source, not both (see Table 5.1).
Normally, the segment address of the source string should be DS, but you can use a segment override to specify a different segment for the source operand. You cannot override the segment address for the destination string. Therefore, you may need to change the value of ES. For information on changing segment registers, see "Programming Segmented Addresses" in Chapter 3.
Note: Although you can use a segment override on the source operand, a segment override combined with a repeat prefix can cause problems in certain situations on all processors except the 80386/486. If an interrupt occurs during the string operation, the segment override is lost and the rest of the string operation processes incorrectly. Segment overrides can be used safely when interrupts are turned off or with the 80386/486 processors.
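The three setup steps can be sketched with a simple block copy. This is a fragment under stated assumptions, not a complete program: the names source and destin and the string contents are hypothetical, and both strings are assumed to live in the default data segment, so DS and ES are loaded with the same value.

```asm
        .MODEL  small
        .DATA
source  BYTE    "Sample text"
destin  BYTE    LENGTHOF source DUP (?)
        .CODE
        mov     ax, @data               ; Load the data segment address
        mov     ds, ax                  ;   into DS (source segment)
        mov     es, ax                  ;   and ES (destination segment)
        cld                             ; Step 1: process upward
        mov     cx, LENGTHOF source     ; Step 2: iteration count (11)
        mov     si, OFFSET source       ; Step 3: DS:SI points to source
        mov     di, OFFSET destin       ;         ES:DI points to destination
        rep     movsb                   ; Copy CX bytes from source to destin
```

No segment override is used here, so the caution in the note above does not apply to this fragment.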