Chapter 5

Defining and Using Complex
Data Types


With the complex data types available in MASM 6.1 arrays, strings, records, structures, and unions you can access data as a unit or as individual elements that make up a unit. The individual elements of complex data types are often the integer types discussed in Chapter 4, “Defining and Using Simple Data Types.”

“Arrays and Strings” reviews how to declare, reference, and initialize arrays and strings. This section summarizes the general steps needed to process arrays and strings and describes the MASM instructions for moving, comparing, searching, loading, and storing.

“Structures and Unions” covers similar information for structures and unions: how to declare structure and union types, how to define structure and union variables, and how to reference structures and unions and their fields.

“Records” explains how to declare record types, define record variables, and use record operators.

Arrays and Strings

An array is a sequential collection of variables, all of the same size and type, called “elements.” A string is an array of characters. For example, in the string “ABC,” each letter is an element. You can access the elements in an array or string relative to the first element. This section explains how to handle arrays and strings in your programs.

Declaring and Referencing Arrays

Array elements occupy memory contiguously, so a program references each element relative to the start of the array. To declare an array, supply a label name, the element type, and a series of initializing values or ? placeholders. The following examples declare the arrays warray and xarray:

warray  WORD    1, 2, 3, 4
xarray  DWORD   0FFFFFFFFh, 789ABCDEh

Initializer lists of array declarations can span multiple lines. The first initializer must appear on the same line as the data type, all entries must be initialized, and, if you want the array to continue to the new line, the line must end with a comma. These examples show legal multiple-line array declarations:

big             BYTE    21, 22, 23, 24, 25,
                        26, 27, 28 

somelist        WORD    10,
                        20,
                        30

If you do not use the LENGTHOF and SIZEOF operators discussed later in this section, an array may span more than one logical line, although a separate type declaration is needed on each logical line:

var1    BYTE    10, 20, 30
        BYTE    40, 50, 60
        BYTE    70, 80, 90

The DUP Operator

You can also declare an array with the DUP operator. This operator works with any of the data allocation directives described in “Allocating Memory for Integer Variables” in Chapter 4. In the syntax

count DUP (initialvalue [[, initialvalue]]...)

the count value sets the number of times to repeat all values within the parentheses. The initialvalue can be an integer, character constant, or another DUP operator, and must always appear within parentheses. For example, the statement

barray  BYTE    5 DUP (1)

allocates the integer 1 five times for a total of 5 bytes.

The following examples show various ways to allocate data elements with the DUP operator:

array   DWORD   10 DUP (1)                    ; 10 doublewords
                                              ;   initialized to 1
buffer  BYTE    256 DUP (?)                   ; 256-byte buffer

masks   BYTE    20 DUP (040h, 020h, 04h, 02h) ; 80-byte buffer
                                              ;   with bit masks
three_d DWORD   5 DUP (5 DUP (5 DUP (0)))     ; 125 doublewords
                                              ;   initialized to 0

Referencing Arrays

Each element in an array is referenced with an index number, beginning with zero. The array index appears in brackets after the array name, as in

array[9]

Assembly-language indexes differ from indexes in high-level languages, where the index number always corresponds to the element’s position. In C, for example, array[9] references the array’s tenth element, regardless of whether each element is 1 byte or 8 bytes in size.

In assembly language, an element’s index refers to the number of bytes between the element and the start of the array. This distinction can be ignored for arrays of byte-sized elements, since an element’s position number matches its index. For example, defining the array

prime  BYTE 1, 3, 5, 7, 11, 13, 17

gives a value of 1 to prime[0], a value of 3 to prime[1], and so forth.

However, in arrays with elements larger than 1 byte, index numbers (except zero) do not correspond to an element’s position. You must multiply an element’s position by its size to determine the element’s index. Thus, for the array

wprime  WORD 1, 3, 5, 7, 11, 13, 17

wprime[4] represents the third element (5), which is 4 bytes from the beginning of the array. Similarly, the expression wprime[6] represents the fourth element (7) and wprime[10] represents the sixth element (13).

The following example determines an index at run time. It multiplies the position by two (the size of a word element) by shifting it left:

        mov     si, cx          ; CX holds position number
        shl     si, 1           ; Scale for word referencing
        mov     ax, wprime[si]  ; Move element into AX

The offset required to access an array element can be calculated with the following formula:

nth element of array = array[(n-1) * size of element]

Referencing an array element by distance rather than position is not difficult to master, and is actually very consistent with how assembly language works. Recall that a variable name is a symbol that represents the contents of a particular address in memory. Thus, if the array wprime begins at address DS:2400h, the reference wprime[6] means to the processor “the word value contained in the DS segment at offset 2400h-plus-6-bytes.”

As described in “Direct Memory Operands,” Chapter 3, you can substitute the plus operator (+) for brackets, as in:

wprime[9]
wprime+9

Since brackets simply add a number to an address, you don’t need them when referencing the first element. Thus, wprime and wprime[0] both refer to the first element of the array wprime.

If your program runs only on an 80186 processor or higher, you can use the BOUND instruction to verify that an index value is within the bounds of an array. For a description of BOUND, see the Reference.

LENGTHOF, SIZEOF, and TYPE for Arrays

When applied to arrays, the LENGTHOF, SIZEOF, and TYPE operators return information about the length and size of the array and about the type of the
initializers.

The LENGTHOF operator returns the number of elements in the array. The SIZEOF operator returns the number of bytes used by the initializers in the array definition. TYPE returns the size of the elements of the array. The following examples illustrate these operators:

array   WORD    40 DUP (5)

larray  EQU     LENGTHOF array    ; 40 elements
sarray  EQU     SIZEOF   array    ; 80 bytes
tarray  EQU     TYPE     array    ;  2 bytes per element

num     DWORD   4, 5, 6, 7, 8, 9, 10, 11

lnum    EQU     LENGTHOF num      ;  8 elements
snum    EQU     SIZEOF   num      ; 32 bytes
tnum    EQU     TYPE     num      ;  4 bytes per element

warray  WORD    40 DUP (40 DUP (5)) 

len     EQU     LENGTHOF warray   ; 1600 elements
siz     EQU     SIZEOF   warray   ; 3200 bytes
typ     EQU     TYPE     warray   ;    2 bytes per element

Declaring and Initializing Strings

A string is an array of characters. Initializing a string like "Hello, there" allocates and initializes 1 byte for each character in the string. An initialized string can be no longer than 255 characters.

For data directives other than BYTE, a string may initialize only the first element. The initializer value must fit into the specified size and conform to the expression word size in effect (see “Integer Constants and Constant Expressions” in Chapter 1), as shown in these examples:

wstr    WORD    "OK"
dstr    DWORD   "DATA"  ; Legal under EXPR32 only

As with arrays, string initializers can span multiple lines. The line must end with a comma if you want the string to continue to the next line.

str1    BYTE    "This is a long string that does not ",
                "fit on one line."

You can also have an array of pointers to strings.

PBYTE   TYPEDEF PTR BYTE
        .DATA
msg1    BYTE    "Operation completed successfully."
msg2    BYTE    "Unknown command"
msg3    BYTE    "File not found"
pmsg    PBYTE   msg1            ; pmsg is an array
        PBBYTE  msg2            ;   of pointers to
        PBYTE   msg3            ;   above messages

Strings must be enclosed in single (') or double (") quotation marks. To put a single quotation mark inside a string enclosed by single quotation marks, use two single quotation marks. Likewise, if you need quotation marks inside a string enclosed by double quotation marks, use two sets. These examples show the various uses of quotation marks:

char    BYTE    'a'
message BYTE    "That's the message."       ; That's the message.
warn    BYTE    'Can''t find file.'         ; Can't find file.
string  BYTE    "This ""value"" not found." ; This "value" not found.

You can always use single quotation marks inside a string enclosed by double quotation marks, as the initialization for message shows, and vice versa.

The ? Initializer

You do not have to initialize an array. The ? operator lets you allocate space for the array without placing specific values in it. Object files contain records for initialized data. Unspecified space left in the object file means that no records contain initialized data for that address. The actual values stored in arrays allocated with ? depend on certain conditions. The ? initializer is treated as a zero in a DUP statement that contains initializers in addition to the ? initializer. If the ? initializer does not appear in a DUP statement, or if the DUP statement contains only ? initializers, the assembler leaves the allocated space unspecified.

LENGTHOF, SIZEOF, and TYPE for Strings

Because strings are simply arrays of byte elements, the LENGTHOF, SIZEOF, and TYPE operators behave as you would expect, as illustrated in this example:

msg     BYTE    "This string extends ",
                "over three ",
                "lines."

lmsg    EQU     LENGTHOF msg    ; 37 elements
smsg    EQU     SIZEOF   msg    ; 37 bytes
tmsg    EQU     TYPE     msg    ;  1 byte per element

Processing Strings

The 8086-family instruction set has seven string instructions for fast and efficient processing of entire strings and arrays. The term “string” in “string instructions” refers to a sequence of elements, not just character strings. These instructions work directly only on arrays of bytes and words on the 8086–80486 processors, and on arrays of bytes, words, and doublewords on the 80386/486 processors. Processing larger elements must be done indirectly with loops.

The following list gives capsule descriptions of the five instructions discussed in this section.

Instruction

Description

MOVS

Copies a string from one location to another

STOS

Stores contents of the accumulator register to a string

CMPS

Compares one string with another

LODS

Loads values from a string to the accumulator register

SCAS

Scans a string for a specified value

 

All of these instructions use registers in a similar way and have a similar syntax. Most are used with the repeat instruction prefixes REP, REPE (or REPZ), and REPNE (or REPNZ). REPZ is a synonym for REPE (Repeat While Equal) and REPNZ is a synonym for REPNE (Repeat While Not Equal).

This section first explains the general procedures for using all string instructions. It then illustrates each instruction with an example.

Overview of String Instructions

The string instructions have specific requirements for the location of strings and the use of registers. To operate on any string, follow these three steps:

1.  Set the direction flag to indicate the direction in which you want to process the string. The STD instruction sets the flag, while CLD clears it.

If the direction flag is clear, the string is processed upward (from low addresses to high addresses, which is from left to right through the string). If the direction flag is set, the string is processed downward (from high addresses to low addresses, or from right to left). Under MS-DOS, the direction flag is normally clear if your program has not changed it.

2.  Load the number of iterations for the string instruction into the CX register.

If you want to process 100 elements in a string, move 100 into CX. If you wish the string instruction to terminate conditionally (for example, during a search when a match is found), load the maximum number of iterations that can be performed without an error.

3.  Load the starting offset address of the source string into DS:SI and the starting address of the destination string into ES:DI. Some string instructions take only a destination or source, not both (see Table 5.1).

Normally, the segment address of the source string should be DS, but you can use a segment override to specify a different segment for the source operand. You cannot override the segment address for the destination string. Therefore, you may need to change the value of ES. For information on changing segment registers, see “Programming Segmented Addresses” in Chapter 3.

 

Note

Although you can use a segment override on the source operand, a segment override combined with a repeat prefix can cause problems in certain situations on all processors except the 80386/486. If an interrupt occurs during the string operation, the segment override is lost and the rest of the string operation processes incorrectly. Segment overrides can be used safely when interrupts are turned off or with the 80386/486 processors.