
Chapter 2: Advanced HLA Programming
Note: This text is being made available in alpha form for proofreading and evaluation purposes only. This book contains known problems; use the information in this book at your own discretion. Please do not distribute this book. Send all corrections to rhyde@cs.ucr.edu.
©2003, Randall Hyde, All Rights Reserved
2.1: Using Advanced HLA Features
This book uses several advanced features in the HLA language. Chances are pretty good that unless you've spent a lot of time studying HLA prior to reading this book, you're not going to be familiar with many of these features. Now before you complain about the choice of using these features, and how this book should have avoided them in order to make the material more accessible to "less than expert HLA programmers," just keep in mind that many of these advanced features were added to the HLA language specifically to support Windows programming. Therefore, avoiding these features would prove to be counter-productive.
Much of the advanced HLA programming we're going to wind up doing involves HLA compile-time language statements (including the macro processor), data declarations, high-level control structures, and high-level procedure call syntax. While The Art of Assembly Language covers each of these topics, this book isn't going to make the assumption that you've carefully read those sections of The Art of Assembly Language (or even read "AoA" at all). Whether or not you've studied this material in that book, you'll definitely want to carefully read this chapter. It contains important information that the rest of this book is going to use, with an emphasis on those HLA features that are important to Windows programmers.
2.2: HLA's High-Level Data Structure Facilities
One of the primary differences between high level languages and "traditional" assemblers is the support for abstract data types in high level languages versus only supporting low-level machine data types (e.g., bytes, words, and double words) in assembly language. One of the hallmarks of a high level assembler (like HLA, MASM, or TASM) is support for abstract data types and user-defined data types. The HLA assembler provides about the richest set of built-in data types you'll find in any assembler plus the ability to easily create new data types as the need arises. Because Windows uses sophisticated data structures in calls to Win32 API functions, you'll want to familiarize yourself with some of HLA's structured data types in order to write code that interfaces better with Windows.
2.2.1: Basic Data Types
Whereas a traditional assembler provides a few machine-level data types based on the sizes of those types, and then leaves it up to the assembly language programmer to manually interpret those types in a program, HLA provides a rich set of basic data types that denote the purpose of the object in addition to its size. For example, with a traditional assembler you might declare a variable to be a byte object; within your assembly language source code it is up to you to decide whether you want to treat that byte as a character variable, an unsigned integer, a boolean object, a signed integer, an enumerated data type, or some other value that you can represent with eight bits. With HLA, you can actually declare your variables to be char, boolean, enum, int8, uns8, etc., and make the intent of that variable's usage much clearer; HLA will even do some type checking when you use these data types to help find logical errors in your programs.
The first thing to note about HLA's abstract data types is that HLA is an assembly language. At the machine instruction level, the 80x86 CPU works with bytes, words, double-words, and other such primitive machine data types. Once you've loaded an int8 object into AL, for example, it is up to you to choose the correct machine instructions to treat that object as a signed integer. There is nothing stopping you from treating this as an unsigned integer object, as a character value, as a boolean value, etc. Do not, however, get the impression that HLA's abstract data typing facilities serve little purpose beyond providing documentative services that make your programs a little easier to read. HLA provides high-level control structures and procedure calling syntax, as well as a built-in compile-time language, that can take advantage of HLA's type-checking facilities. To achieve maximum benefit from HLA, therefore, you should choose the proper data types for your variables, constants, and other objects in HLA.
Although there are some very good reasons for using abstract (high-level) data types in an assembly language program, HLA doesn't lose sight of the fact that it is an assembler. Sometimes the use of high-level data abstractions interferes with the assembly language programming paradigm. In such situations, you'll want to use low-level machine data types to make your code more efficient. To support such situations, HLA supports the full range of low-level machine data types you'll find on the 80x86 CPU, including bytes, words, double words, quad words, ten-byte words, and long words (128-bit objects/16-byte objects). HLA uses the following type names for these primitive data types:
byte 8-bit (one byte) objects
word 16-bit (two byte) objects
dword 32-bit (four byte) double word objects
qword 64-bit (eight byte) quad word objects
tbyte 80-bit (ten byte) ten byte objects
lword 128-bit (16 byte) long word objects
Note that some individuals might consider HLA's real32, real64, and real80 data types to be primitive because the 80x86 CPU directly supports these types. However, HLA treats these types as abstract data types and performs type checking on them because of the special nature of their data.
The low-level data types provide a "type-escape" mechanism for HLA. The only type checking that HLA does on these machine level types is to ensure that the size of the object is appropriate for its current use (e.g., HLA will report an error if you try to load a byte object into a word-sized register). Other than the size check, which you can override if you desire, HLA will allow the use of any similarly-sized scalar object in place of one of these data types and you can use an object that has one of these data types in place of a similarly sized scalar object.
Note the use of the term "scalar" in the previous paragraph. What this means is that you can substitute any like-size single variable for a byte, word, dword, qword, tbyte, or lword object in an HLA statement. You may not, however, directly substitute an array, record, union, class, or other composite type in place of one of these data types, even if the size of the composite object is the same as the primitive data type. For example, you cannot directly substitute a two-byte array in place of a word object (though, using coercion, even this is possible in HLA; you'll find out more about coercion in just a little bit).
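As a minimal sketch of the distinction just described, the following program coerces a two-byte array to a word. The names coerceDemo, twoBytes, and w are hypothetical and chosen only for illustration; the commented-out instruction is the form HLA is expected to reject because an array is not a scalar.

```hla
program coerceDemo;
#include( "stdlib.hhf" )

static
    twoBytes :byte[2] := [ $34, $12 ];  // hypothetical two-byte array
    w        :word;

begin coerceDemo;

    // mov( twoBytes, ax );             // composite object: no direct substitution
    mov( (type word twoBytes), ax );    // explicit coercion treats the two bytes as a word
    mov( ax, w );
    stdout.put( "w = $", w, nl );

end coerceDemo;
```

The coercion changes only how HLA types the operand; the instruction it emits is an ordinary 16-bit mov.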
2.2.1.1: Eight Bit Data Types
HLA supports the following eight-bit data types, all of which are directly compatible with the byte type:
- boolean
- enum
- uns8
- int8
- char
Directly compatible means that you may supply a byte object anywhere one of these other types is required and you may also supply any one of these other types wherever a byte type is required. Note that these data types are not compatible amongst themselves; that is, you cannot always supply, say, a char data type where an uns8 object is required. The byte type is the only data type that is universally compatible with all of these types.
Note that HLA assigns the byte type to the 80x86 eight-bit registers, AL, AH, BL, BH, CL, CH, DL, and DH. Therefore, these registers are automatically compatible with any byte-sized object using one of HLA's single byte types.
HLA uses the boolean type to represent true/false values. Technically, a boolean variable requires only a single bit to represent the two possible values; however, the 80x86 CPU can only efficiently access objects that are multiples of eight bits. Therefore, HLA sets aside a whole byte for a boolean object. HLA represents false with the value zero and true with the value one. Traditionally, programmers have used zero for false and any non-zero value to represent true; therefore, HLA's definition is upwards compatible to the traditional (i.e., "C") definition. In general, when using boolean values, you should always store a zero or one into a boolean variable and test for zero/not zero when checking boolean values. This will provide the maximum compatibility with all uses of this data type. Note that arithmetic expressions involving boolean operands may produce strange results when combining operands using the AND and OR operators. The next chapter discusses this issue in detail. Just be aware of this for the time being.
Enumerated data types are data types that associate sequential numeric values with symbolic constants. Internally, HLA treats enum objects as unsigned integers (e.g., for purposes of comparison), though enum objects are not directly compatible with uns8 objects. An enumerated data type in HLA consists of a list of identifiers to which HLA assigns sequential unsigned integer values, starting with zero. For example, consider the following HLA type declaration:
    type color_t :enum{ red, green, blue };

HLA internally associates the value zero with red, the value one with green, and the value two with blue. It is important to note, however, that red, green, and blue are not uns8 objects. They are of type color_t and this will make a difference in certain circumstances. For more details on enumerated data types in HLA, please consult the HLA Reference Manual.
HLA uses the uns8 data type to represent unsigned integer values in the range 0..255. This data type is fundamentally equivalent to the byte type except that uns8 objects are not automatically compatible with the other eight-bit data types. In boolean expressions, HLA always defaults to using unsigned comparisons and conditional jump instructions when the operands are uns8 operands.
HLA uses the int8 type to represent signed integers in the range -128..+127. In boolean expressions, HLA always uses signed comparisons and conditional jumps when at least one of the operands is signed (note that if you mix an unsigned and a signed operand across a relational operator, HLA defaults to a signed comparison).
Internally, HLA treats the char data type as an eight-bit unsigned integer value. Though not directly compatible with the uns8 type, HLA uses unsigned comparisons and conditional jumps whenever it encounters a char operand in a run-time boolean expression.
As noted earlier, the byte type is fundamentally compatible with all other scalar eight-bit types. When two byte operands appear in a run-time boolean expression around the same relational operator, HLA compares the two values using an unsigned comparison. If a byte operand appears with another eight-bit type around some relational operator in a run-time boolean expression, then HLA uses whatever type of comparison is appropriate for that other operand (i.e., unsigned for enum, char, boolean, and uns8 operands or signed for int8 operands).
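The effect of the declared type on a run-time boolean expression can be sketched as follows. The program and variable names (cmpDemo, u, s) are hypothetical. Both variables hold the same bit pattern, $C8, and only the declared type determines which kind of comparison HLA emits.

```hla
program cmpDemo;
#include( "stdlib.hhf" )

static
    u :uns8 := 200;     // bit pattern $C8, interpreted as unsigned
    s :int8 := -56;     // same bit pattern $C8, interpreted as signed

begin cmpDemo;

    // uns8 operand: HLA emits an unsigned comparison and jump.
    if( u > 100 ) then

        stdout.put( "u is greater than 100", nl );

    endif;

    // int8 operand: HLA emits a signed comparison and jump.
    if( s < 0 ) then

        stdout.put( "s is negative", nl );

    endif;

    // A register is just a byte object; coerce it to request a signed test.
    mov( s, al );
    if( (type int8 al) < 0 ) then

        stdout.put( "al, viewed as int8, is negative", nl );

    endif;

end cmpDemo;
```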
There are three major areas where HLA differentiates the types of eight-bit objects: in run-time boolean expressions, in procedure call parameter lists, and in compile-time (constant) expressions. We'll address each of these areas separately in later sections of this chapter.
2.2.1.2: Sixteen-Bit Data Types
HLA supports the following 16-bit data types, all of which are directly compatible with the word type:
- int16
- uns16
- wchar
HLA uses int16 objects to represent signed values in the range -32768..+32767. Whenever HLA encounters an int16 object in a run-time boolean expression, it uses signed comparisons and conditional jump instructions, even if the operand opposite the int16 object in a relational expression is an unsigned operand.
HLA uses uns16 objects to represent unsigned values in the range 0..65535. Whenever HLA encounters an uns16 object in a run-time boolean expression, and the corresponding object opposite the relational operator is not an int16 value, HLA uses unsigned comparison and jump instructions.
HLA uses wchar objects to represent unicode characters. These are unsigned objects (though not directly compatible with uns16 values) so HLA uses unsigned comparisons and conditional jump instructions whenever you compare wchar objects in a run-time boolean expression.
Objects of type word are directly compatible with any of the other 16-bit ordinal data types. If an object of type word appears in a run-time boolean expression, then HLA will emit code that uses the type of comparison required by the other operand in the relational expression. If the other operand is also of type word, HLA uses an unsigned comparison and jump to compare the two operands.
2.2.1.3: Thirty-Two-Bit Data Types
HLA supports the following 32-bit data types, all of which are directly compatible with the dword data type:
- int32
- uns32
- pointer types
- procedure pointer types
HLA also supports the following 32-bit data types, though these types are not directly compatible with the dword type:
- string
- unicode
- real32
The behavior of the uns32 and int32 types, with respect to the dword type, is similar to the relationship between uns8/int8 and byte, and uns16/int16 and word. Take a look at the previous sections for more details.
HLA generally treats pointer types as dword objects within most run-time expressions. In particular, HLA will perform an unsigned comparison when comparing two pointer objects.
Procedure pointer types are also treated just like dwords by HLA with one major exception: HLA provides implicit syntax for calling a procedure indirectly using HLA's high level procedure call syntax. So although you can treat a procedure pointer just like a dword object within a run-time boolean/relational expression, you cannot directly call a procedure via a pointer to a procedure in a dword variable using HLA's high-level procedure call syntax. Such activity is only available when using procedure pointers (or when specifying an explicit coercion operator). We'll take another look at HLA's procedure pointer objects in a few sections.
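The difference between a procedure pointer and a plain dword holding the same address can be sketched like this. The names procPtrDemo, sayHello, pp, and dw are hypothetical. The sketch assumes the low-level call instruction accepts a dword memory operand, as it does in the examples in The Art of Assembly Language.

```hla
program procPtrDemo;
#include( "stdlib.hhf" )

procedure sayHello;
begin sayHello;

    stdout.put( "hello", nl );

end sayHello;

static
    pp :procedure;      // procedure pointer (no parameters)
    dw :dword;          // same address, but typed as a plain dword

begin procPtrDemo;

    mov( &sayHello, eax );  // address of the procedure
    mov( eax, pp );
    mov( eax, dw );

    pp();           // high-level call syntax works through a procedure pointer
    call( dw );     // a dword requires the low-level call instruction
    // dw();        // not accepted: dw is not a procedure pointer

end procPtrDemo;
```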
Although HLA string variables are pointers, HLA treats strings as a unique data type for the purposes of type checking. In particular, dword and string objects aren't always interchangeable nor are they even type compatible in all cases. To confuse the issue even further, certain Win32 API function calls allow you to pass a small integer value (16 bits or less) in place of a string pointer as a function parameter. Therefore, in certain cases, HLA will allow you to substitute a small integer constant in place of an actual string pointer when calling a function (this use, however, is being deprecated, so don't count on it in future versions of HLA). Generally, though, you can use a string variable (not a constant) wherever a dword object is expected; you may not, however, always use a dword object where a string object is expected. This same discussion applies to the unicode data type (which is a string of unicode characters).
Because floating point values on the 80x86 are fundamentally different than integer values, HLA does not allow you to mix real32 and dword objects in the same expression. Indeed, HLA doesn't even allow real32 objects in a boolean/relational expression (though you may pass real32 objects as procedure parameters). For that reason, we'll not consider real32 objects here. See The Art of Assembly Language for more details concerning floating point arithmetic on the 80x86.
2.2.1.4: Sixty-Four-Bit Data Types
HLA supports several 64-bit data types:
- qword
- int64
- uns64
- real64
As with the 32-bit data types, qword is directly compatible with the int64 and uns64 types. Compatibility between int64 and uns64 is less of a concern because the 80x86 doesn't directly support 64-bit comparisons with the integer instruction set. Also like the 32-bit data types, the real64 type is not directly compatible with the qword type, nor may you compare real64 objects within an HLA run-time boolean expression.
2.2.1.5: Eighty-Bit Data Types
HLA supports two 80-bit data types - the tbyte type and the real80 type. Neither type is directly compatible with the other and HLA doesn't allow the comparison of 80-bit objects within a run-time boolean expression. Tbyte objects usually hold BCD values and real80 objects hold floating point values. We'll not consider these two types any further here.
2.2.1.6: One Hundred Twenty-Eight Bit Data Types
HLA supports several 128-bit data types:
- cset
- int128
- lword
- real128
- uns128
Though HLA does not support the use of int128, lword, and uns128 objects in a run-time boolean expression (because the 80x86 CPU doesn't support the direct comparison of 128-bit objects using the integer instruction set), you can still pass these objects as procedure parameters and use them in compile-time expressions, so it's worth mentioning that the relationship they share is similar to the dword/uns32/int32 relationship sans the ability to compare them. The cset data type is, technically, a composite data type (character sets can be thought of as an array of 128 bits); therefore, there is no fundamental relationship between an lword and a cset other than their size. Similarly, real128 objects are also composite objects (each is an array of two real64 objects), so they aren't directly compatible with lwords, either. Because Windows doesn't use 128-bit data types very often, we'll not consider these data types any further.
2.2.2: Composite Data Types
HLA supports a fair number of composite data types as well as scalar data types. As the name suggests, a composite data type is one that is composed (or made up) of other data types. HLA provides direct support for the following composite data types: arrays, records (structures), unions, classes, pointers, procedure pointers, character sets, and thunks. Character sets, pointers, and procedure pointers are a special case - HLA provides direct support for these composite data types so that you can often treat these types as scalar (though not ordinal) types. Though it's sometimes convenient to treat these objects as scalars, there are times when it's more convenient to think of them as composite objects. Thus we'll consider them again in the following subsections.
2.2.2.1: HLA Array Types
HLA uses a high-level-like syntax for the declaration of arrays in your assembly language programs. A typical (run-time) array variable declaration might look like the following:
    static
        runTimeArray :int32[16];

This declaration sets aside storage for a sequence of 16 32-bit integers in memory. The name runTimeArray refers to the address of the first element of this array. The next variable immediately following this array in memory will be at an address that is (at least) 64 bytes later (16 array elements times four bytes for each element is 64 bytes). You can even declare multi-dimensional arrays in HLA using a high level like syntax, e.g.,
    type
        two_dim_array   : char[ 4, 16 ];            // 64 bytes of storage
        three_dim_array : dword[ 4, 2, 2 ];         // Also 64 bytes of storage
        four_dim_array  : real32[ 2, 2, 2, 2 ];     // Also 64 bytes of storage

Note that multi-dimensional arrays consume as much storage as the product of their dimensions, also multiplied by the size of a single array element.
As is true in all assembly languages, indexes into an array start at zero and the last element (of a particular dimension or index) ends at the array bounds minus one. For example, the elements of the runTimeArray example given earlier range from index zero through 15.
One very fundamental point, which is the cause of much confusion to assembly language programmers and especially to HLA programmers, is that you do not index into an array using the same syntax and concepts as you do in a high level language. The fact that HLA uses a high-level language-like syntax for array declarations falsely propagates this idea. The fact that HLA's compile-time language works the way high level languages do, in direct contrast to the run-time (i.e., 80x86 assembly) language also aids in this confusion. Don't get caught in this trap!
Although both The HLA Reference Manual and The Art of Assembly Language Programming discuss accessing elements of arrays, emails and posts to assembly language related newsgroups indicate that this is one area that causes problems for assembly programmers (especially beginning assembly programmers). So it's worthwhile repeating some of that information here.
The most fundamental mistake people make when indexing into an array in assembly language code is that they forget that they must multiply the index into the array by the size of an array element. More often than not, a beginning assembly language programmer will translate code like the following:

    eax = runTimeArray[ i ];    // "C" example

into assembly code that looks like this:

    mov( i, ebx );      // Must load array index into a 32-bit register to use indexed addressing
    mov( runTimeArray[ ebx ], eax );

Nice try, but completely incorrect. The problem is that the 80x86 indexed addressing modes compute byte offsets from some base address, not element indexes into an array. That is, the "runTimeArray[ebx]" addressing mode computes the effective address obtained by adding the current value in EBX to the address associated with the runTimeArray label. If EBX contains five and the memory address associated with the runTimeArray label is $40_0000, then this addressing mode references the object at address $40_0005 in memory. Note that this is not the address of the sixth element of the runTimeArray array. Each element of the runTimeArray object consumes four bytes, so the elements appear in memory as follows:

    runTimeArray[0]     $40_0000
    runTimeArray[1]     $40_0004
    runTimeArray[2]     $40_0008
    runTimeArray[3]     $40_000C
    runTimeArray[4]     $40_0010
    runTimeArray[5]     $40_0014
    runTimeArray[6]     $40_0018
    runTimeArray[7]     $40_001C
    runTimeArray[8]     $40_0020
    runTimeArray[9]     $40_0024
    runTimeArray[10]    $40_0028
    runTimeArray[11]    $40_002C
    runTimeArray[12]    $40_0030
    runTimeArray[13]    $40_0034
    runTimeArray[14]    $40_0038
    runTimeArray[15]    $40_003C
The address $40_0005 (obtained as the effective address of the expression runTimeArray[ebx] when EBX contains five) is actually the address of the second byte of the runTimeArray[1] element. This probably isn't what the programmer had in mind.
As you can see in this list, each element of runTimeArray is located four bytes in memory apart. This makes perfect sense because each element of the array consumes four bytes of memory and we'd like the array elements to occupy adjacent memory cells (that is, each element of the array should immediately follow the previous element of the array in memory). In order to achieve this, you must multiply the array index by the size of each array element (in bytes). Because each element of the array is a double word (four bytes), we have to multiply the index into the array by four and add the base address of the array (that is, the address of the first element) to obtain the actual address of the desired element. In the current example, if we want to access the element at index five (that is, the sixth element of the array), then we need to multiply the index (five) by four before adding in the base address. Five times four is 20 ($14), adding this to the base address produces $40_0014 in the current example. If you'll look at the list for runTimeArray element addresses, you'll find that runTimeArray[5] does, indeed, sit at address $40_0014 in memory.
Because programs commonly access elements of arrays whose element sizes are one, two, four, or eight bytes long, the 80x86 CPU provides a special set of addressing modes, the scaled indexed addressing modes, that let you easily compute the index into these arrays. These addressing modes use syntax like the following:
    byteArray[ eax*1 ]      -- note that this is functionally identical to byteArray[ eax ]
    wordArray[ ebx*2 ]
    dwordArray[ ecx*4 ]
    qwordArray[ edx*8 ]

The register you specify in the scaled indexed addressing mode must be a general-purpose 32-bit integer register; see The Art of Assembly Language for more details.
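Applied to the earlier runTimeArray example, the scaled indexed addressing mode gives a correct translation of the "C" statement eax = runTimeArray[ i ]. The following is a minimal sketch. The program name, the initial values of the array, and the value of i are assumptions made only so that the program is self-contained.

```hla
program scaledIndexDemo;
#include( "stdlib.hhf" )

static
    // Illustrative contents: element n holds n*10.
    runTimeArray :int32[16] :=
        [ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150 ];

    i :uns32 := 5;

begin scaledIndexDemo;

    mov( i, ebx );                      // element index, not a byte offset
    mov( runTimeArray[ ebx*4 ], eax );  // the CPU scales the index by the element size (4)

    stdout.put( "runTimeArray[ i ] = ", (type int32 eax), nl );

end scaledIndexDemo;
```

With EBX equal to five, the effective address is the base address plus 20 bytes, which matches the $40_0014 calculation worked through above.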
If you need to access an element of an array whose element sizes are not one, two, four, or eight bytes, then you've got to manually multiply the index by the size of an array element (typically using a shl or intmul instruction). For example, if you've got an array of real80 objects you'll need to multiply the index into the array by 10 (10 bytes = 80 bits) before accessing the particular element. Here's some sample code that demonstrates this:
    // Push element "i" of real80Array onto the FPU stack:

    intmul( 10, i, ecx );           // ecx = 10 * i
    fld( real80Array[ ecx ] );

Note that you do not use a scaled indexed addressing mode when you manually multiply the index by the element size. You've already done the multiplication to obtain a byte offset into the array.
Accessing an element of a multidimensional array requires the use of a more complex calculation. In the Windows API world, all two-dimensional arrays are stored in a form known as row-major order. Rather than get into the complexities of row-major (versus column-major) addressing here, we'll just use the HLA Standard Library array.index function that does multi-dimensional array computations for you automatically. Those who are interested in the low-level implementation of multi-dimensional array access should take a look at the chapter covering arrays in The Art of Assembly Language Programming.
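For orientation only, here is a minimal hand-written sketch of the row-major calculation for a 4x16 array of characters: the byte offset of element [i, j] is (i * 16 + j) times the element size, which is one byte here. The names rowMajorDemo, grid, i, and j are hypothetical, and the sketch is not meant to reproduce the exact code any library routine generates.

```hla
program rowMajorDemo;
#include( "stdlib.hhf" )

static
    grid :char[ 4, 16 ];    // hypothetical 4-row by 16-column array
    i    :uns32 := 2;       // row index
    j    :uns32 := 5;       // column index

begin rowMajorDemo;

    mov( i, ebx );
    shl( 4, ebx );              // ebx = i * 16 (16 elements per row)
    add( j, ebx );              // ebx = i * 16 + j
                                // element size is one byte, so no further scaling

    mov( 'X', grid[ ebx ] );    // grid[ i, j ] = 'X'
    mov( grid[ ebx ], al );     // read the element back

    stdout.put( "grid[ i, j ] = ", (type char al), nl );

end rowMajorDemo;
```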
The HLA Standard Library array.index macro automatically computes the index into an array (with any number of dimensions). Because this is a macro, it directly generates code that is almost as good as hand-written assembly language code (in a few special cases you might be able to generate slightly better code by hand, but most of the time this macro generates exactly the same code you'd manually write). To use the array.index macro, you must either include stdlib.hhf at the beginning of your source file or, at least, include arrays.hhf. The array.index macro uses the following invocation syntax:
    array.index( <32-bit register>, <array name>, <<list of array indices>> );

The first argument must be a 32-bit general purpose integer register (i.e., EAX, EBX, ECX, EDX, ESI, EDI, EBP, or ESP). Generally, you would not use EBP or ESP here (because they have special purposes and they generally aren't available for use in accessing array elements).
The second argument to the array.index macro is the name of the array whose elements you want to access. Although HLA supports dynamically allocated multi-dimensional arrays in the arrays.hhf module, we'll not worry about those here (see the HLA Standard Library documentation for more details); instead, this book will simply assume that you've supplied the name of an array you've declared in a static, readonly, storage, or var declaration section, or as a procedure pass by value parameter. Any of the array examples appearing in this section would work fine, for example.
The third (through nth) parameter(s) specify the constants, variables, or registers that you want to use as the indexes into this array. The index objects must all be 32-bit integer values (presumably unsigned, though the macro accepts signed values too).
The array.index macro computes the address of the specified array element and leaves that address sitting in the 32-bit register you specify as the first argument in the array.index argument list. To access the specified array element, all you need to do is reference the memory location pointed at by that 32-bit register after the array.index invocation. Note that you do not have to multiply this value by the size of the array element (that's already done for you) nor do you have to add in the base address of the array (that's also done for you). Here are some examples of accessing array elements using the array.index macro:
    // j = runTimeArray[ i ];

    array.index( ebx, runTimeArray, i );
    mov( [ebx], eax );
    mov( eax, j );

    // c = two_dim_array[ i, j ];

    array.index( edx, two_dim_array, i, j );
    mov( [edx], al );
    mov( al, c );

    // esi = three_dim_array[ i, j, k ]

    array.index( ecx, three_dim_array, i, j, k );
    mov( [ecx], esi );

    // push four_dim_array[ eax, ebx, ecx, edx ] onto the FPU stack:

    array.index( edi, four_dim_array, eax, ebx, ecx, edx );
    fld( (type real32 [edi]) );

The HLA compile-time language provides some built-in compile-time functions that return values of interest to an HLA programmer using arrays in an HLA program. During compilation, HLA replaces each of these compile-time functions with a constant that is the result of computing the compile-time function's result. You may use these compile-time functions anywhere a constant is legal in an HLA program. See Table 2-1 for a list of the applicable functions.
Table 2-1: HLA Compile-Time Functions That Provide Useful Array Information
You can use the compile-time functions in Table 2-1 to automate much of your array calculations. This can make your programs much easier to maintain by avoiding "hard-coded" constants in your program. For example, suppose you have an array of some type (myType_t) whose definition is subject to change as you modify the program. Computing an index into this array is a bit of a problem; for example, if you know that the size of an element in the array is six bytes, you could (manually) compute an index into that array using code like the following:
    intmul( 6, index, ebx );
    lea( eax, myArray[ebx] );   // Load eax with the address of myArray[index].

The problem with this approach is that it fails if you decide to change the definition of myType_t (myArray's type) without manually adjusting this code to deal with the fact that the size of a myArray element is no longer six bytes. Now consider the following implementation of this code sequence:
    intmul( @elementsize( myArray ), index, ebx );
    lea( eax, myArray[ ebx ] );

This code sequence automatically recomputes the appropriate index into the array regardless of the size of a myArray element (because HLA automatically substitutes the correct size for @elementsize). About the only drawback to this scheme is that it hides the fact that certain element sizes (1, 2, 4, and 8) might be better handled using an 80x86 scaled indexed addressing mode, though it is possible to fix this problem by using a macro (the HLA Standard Library array.index function, for example, automatically checks for these special element sizes and adjusts the code it outputs in an appropriate fashion for these special cases).
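One way to recover the scaled addressing mode without hard-coding the element size is to test @elementsize with HLA's compile-time #if statement. The following is a minimal sketch of that idea, not the code the Standard Library emits. The definition of myType_t as a dword, the array length, and the value of index are assumptions made so that the program is complete, and only the four-byte case is special-cased.

```hla
program elemSizeDemo;
#include( "stdlib.hhf" )

type
    myType_t :dword;        // assumed definition; change it and the code below adapts

static
    myArray :myType_t[8];
    index   :uns32 := 3;

begin elemSizeDemo;

    #if( @elementsize( myArray ) = 4 )

        // Four-byte elements: let the CPU do the scaling.
        mov( index, ebx );
        lea( eax, myArray[ ebx*4 ] );

    #else

        // Any other element size: multiply by hand.
        intmul( @elementsize( myArray ), index, ebx );
        lea( eax, myArray[ ebx ] );

    #endif

    // eax now holds the address of myArray[ index ].
    stdout.put( "address of myArray[ index ] = $", eax, nl );

end elemSizeDemo;
```

Because #if is evaluated during compilation, only one of the two instruction sequences appears in the object code.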
HLA is interesting insofar as it allows you to declare and use compile-time constant arrays as well as run-time memory arrays. Few other languages, much less assemblers, provide the concept of a compile-time array. HLA's compile-time language makes extensive use of these compile-time arrays and you can use compile-time arrays (also known as array constants) to initialize run-time arrays at program load time (i.e., arrays you declare in the static and readonly sections). There is, however, one important issue of which you need to be aware - HLA compile-time arrays/array constants use semantics that are appropriate for the HLA compile-time language (which is a high level language). These semantics are different than the semantics for a run-time (assembly language) array. In particular, indexes into array constants are actual array indexes, not byte offsets. Therefore, you do not multiply the index of a constant array by the element size. Furthermore, HLA automatically computes the row-major index into a constant array for you; you do not have to use array.index or manually compute the index yourself. We'll return to this subject a little later in this chapter when we discuss the HLA compile-time language. Just be aware of the fact that compile-time array constants are fundamentally different than run-time assembly language arrays.
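The contrast between the two kinds of array can be sketched side by side. The names constArrayDemo, primes, and rtPrimes are hypothetical. The constant array is indexed with a true element index at compile time, while the run-time copy needs a scaled byte offset.

```hla
program constArrayDemo;
#include( "stdlib.hhf" )

const
    primes :uns32[4] := [ 2, 3, 5, 7 ];     // compile-time array constant

static
    rtPrimes :uns32[4] := primes;           // run-time array initialized from the constant

begin constArrayDemo;

    // Compile-time indexing: primes[2] is an element index, and HLA
    // replaces the expression with the constant stored there.
    mov( primes[2], eax );

    // Run-time indexing: the index must be scaled to a byte offset.
    mov( 2, ebx );
    mov( rtPrimes[ ebx*4 ], ecx );

    stdout.put
    (
        "constant element: ", (type uns32 eax),
        ", run-time element: ", (type uns32 ecx),
        nl
    );

end constArrayDemo;
```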
2.2.2.2: HLA Union Types
A discriminant union is a data type that combines several different objects so that they use the same storage in memory. In theory, accesses to the variables in a discriminant union are mutually exclusive; because the variables share the same storage in memory, storing a value into one of these variables disturbs the values of the other variables sharing that same memory address. Some high level language programmers use discriminant union types as a sneaky way to retype data in memory. Obviously, this feature is of little value in an assembly language program, where it is trivially easy to recast data in memory as some other type. However, there are two primary purposes for using a discriminant union, both of which apply in assembly language as well as in high level languages: memory conservation and the creation of variant types. Because many assembler authors perceive discriminant unions as a type casting trick, very few assemblers support the union data type. Unfortunately (for those assemblers), many Windows data types use unions and this makes using such assemblers inconvenient when working with these Windows data structures. Fortunately, HLA provides full support for discriminant union data types, so this isn't a problem at all in HLA.
HLA implements the discriminant union type using the union..endunion reserved words. The following HLA type declaration demonstrates a union declaration:
    type
        allInts:
            union
                i8  :int8;
                i16 :int16;
                i32 :int32;
            endunion;
Each variable declaration appearing between the union and endunion keywords is a field of that union. All fields in a union have the same starting address in memory. The size of a union object is the size of the largest field in the union. The fields of a union may have any type that is legal in a variable (HLA var) declaration section.
Given a union object, say i of type allInts, you access the fields of the union by specifying the union variable name, a period ("dot"), and the field name. The following 80x86 mov instructions demonstrate how to access each of the fields of an i variable of type allInts:
    mov( i.i8, al );
    mov( i.i16, ax );
    mov( i.i32, eax );
A good example of a union type in action is the Windows w.CHARTYPE data type:
    type
        CHARTYPE:
            union
                UnicodeChar: word;
                AsciiChar: byte;
            endunion;
The Windows w.CHARTYPE is a perfect example of a union data structure that encapsulates a couple of mutually exclusive values (that is, you wouldn't use the UnicodeChar and AsciiChar fields of this union simultaneously). Most Windows applications use either ASCII (ANSI) characters or Unicode characters; few applications would attempt to use both character sets in the same program (other than to convert one form to the canonical form that the application uses). Windows, as a general rule, provides two sets of API functions that deal with characters - one for ASCII and one for Unicode. By playing games with C preprocessor macros and accompanying data types (like w.CHARTYPE), Windows allows C programmers to call either set of functions while passing the same set of parameters. Of course, the programmer has to know which character type to supply (i.e., whether to store their data in the UnicodeChar or AsciiChar field), but the interface is the same either way.
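Returning to the allInts union declared earlier, the following minimal sketch illustrates what "sharing the same storage" means in practice. It assumes the HLA Standard Library for output; the program name and the test value are arbitrary choices for illustration.

```hla
program unionDemo;
#include( "stdlib.hhf" )

type
    allInts:
        union
            i8:  int8;
            i16: int16;
            i32: int32;
        endunion;

static
    i: allInts;

begin unionDemo;

    // The union is as large as its largest field (i32, four bytes).
    stdout.put( "size of allInts: ", @size( allInts ), nl );

    // A store through one field changes what the other fields see,
    // because all three fields start at the same address.
    mov( $1234_5678, i.i32 );

    mov( i.i16, ax );   // low-order 16 bits of the dword ($5678)
    mov( i.i8, bl );    // low-order 8 bits of the dword ($78)

    stdout.put( "i16 (hex): ", ax, nl );
    stdout.put( "i8  (hex): ", bl, nl );

end unionDemo;
```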
Most of the unions you'll find in Windows data structures represent mutually exclusive data fields, like the fields belonging to w.CHARTYPE. Depending on your circumstances (e.g., whether your program works with ASCII or Unicode characters) you use one or the other of these two fields in your program and you will probably not use the other field at all. Another common use of discriminant union types is to create variant objects. A variant object is a variable that has a dynamic type; that is, the program determines the type of the object at run-time and, in fact, the variable's type can change under program control. Dynamically typed objects are great when you cannot anticipate exactly what type of data the user will enter while the program is running. The typical implementation of a variant data type consists of two components: a tag value that describes the current (dynamic) type of the object and a discriminant union value that can hold any of the possible types. During program execution, the software uses a switch/case statement (or similar logic) to execute appropriate code based on the current type of the dynamically typed object. Although the use of variant types is interesting and useful, Windows doesn't make much use of this advanced feature, so we won't discuss it any further here.
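For readers who have not seen the tag-plus-union arrangement before, here is one possible sketch of it. Everything in it (the TaggedValue type, the tagInt and tagChar constants, and the field names) is hypothetical and chosen only for illustration; an if..elseif chain stands in for the switch/case logic mentioned above.

```hla
program variantDemo;
#include( "stdlib.hhf" )

const
    // Hypothetical tag values identifying the current dynamic type.
    tagInt  := 0;
    tagChar := 1;

type
    TaggedValue:
        record
            tag: byte;          // which field of payload is currently valid
            payload:
                union
                    i: int32;
                    c: char;
                endunion;
        endrecord;

static
    v: TaggedValue;

begin variantDemo;

    // Store an integer and record that fact in the tag.
    mov( tagInt, v.tag );
    mov( 42, v.payload.i );

    // Dispatch on the tag before touching the payload.
    if( v.tag = tagInt ) then

        stdout.put( "integer: ", v.payload.i, nl );

    elseif( v.tag = tagChar ) then

        stdout.put( "character: ", v.payload.c, nl );

    endif;

end variantDemo;
```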
2.2.2.3: HLA Record (Structure) Types
Records (structures) are probably the most important high level data structure an assembler can provide for Win32 assembly language programmers. In fact, if an assembler does not provide decent, built-in capabilities for declaring and using records, you simply don't want to use that assembler for developing Win32 programs. Windows APIs make extensive use of records (C structs), and attempting to write assembly code with an assembler that doesn't properly support records is going to be a burden. Fortunately, HLA is not one of these assemblers - HLA provides excellent support for records. In this section we'll explore HLA's support for this important user-definable data type.
The record data type provides a mechanism for collecting together different variables into a physical unit so the program can treat this disparate collection of objects as a single entity. At the most basic level an array is also such a data type. However, the difference between an array and a record is that all the elements of an array must have the same type; this restriction does not exist for the elements (fields) of a record. Another difference between an array and a record is that you reference the elements of an array using a numeric index, whereas you access the fields of a record by the fields' names.
HLA's records allow programmers to create data types whose fields can be different types. The following HLA type declaration defines a simple record with four fields:
    type
        Planet:
            record
                x       :int32;
                y       :int32;
                z       :int32;
                density :real64;
            endrecord;
Objects of type Planet will consume 20 bytes of storage at run-time.
The fields of a record may be of any legal HLA data type including other composite data types. Like unions, anything that is legal in a var section is a legal field of a record. Also like unions, you use the dot-notation to access fields of a record object. For example, you could use the following 80x86 instructions to access an object plnt of type Planet:
    mov( plnt.x, eax );
    mov( edx, plnt.y );
    add( plnt.z, ebx );
    fld( plnt.density );
There are several advantages to using records versus simply creating separate variables for each of the fields of the record. First of all, a record type provides a template that makes it very easy to create new instances (copies) of that record type. For example, if you need nine planets, you can easily create them in an HLA static section as follows:
    static
        Mercury :Planet;
        Venus   :Planet;
        Earth   :Planet;
        Mars    :Planet;
        Jupiter :Planet;
        Saturn  :Planet;
        Uranus  :Planet;
        Neptune :Planet;
        Pluto   :Planet;
Imagine the mess you would have if you had to create four separate variables for each of these nine planets (e.g., Mercury_x, Mercury_y, Mercury_z, Mercury_density, etc.).
Another solution, of course, is to create an array of records, with one array element for each planet. You can easily do this as follows:
    static
        Planets :Planet[9];
Each element of the array will have all the fields associated with the Planet data type. Because HLA treats an array name as though it were the first element of the array, you can access the fields of the Planets array using the same dot-notation syntax you use to access fields of a single Planet object, e.g.,
    mov( Planets.x, eax ); // Moves field "x" of Planets[0] into eax.
If you want to access a field of some element other than the first element of an array of records, you tack on the indexed addressing mode and supply an appropriate index into the array, e.g.,
    intmul( @elementsize( Planets ), index, ebx );
    mov( Planets.x[ebx], eax );
Note that the indexed addressing mode specification follows the entire variable's name, unlike most high level languages, like C, where you would inject the array index into the name immediately after Planets. Keep in mind that you can also use HLA's array.index macro to index into an array of records, though you will need to coerce the resulting pointer to the Planet type in order to access the fields of that record, e.g.,
    array.index( ebx, Planets, index );
    mov( (type Planet [ebx]).x, eax );
In addition to saving you the effort of creating separate variables for each field, records also let you encapsulate related data, making it easier to copy record variables as single entities, manipulate record objects via pointers, and pass records as parameters to procedures. For example, it's much easier to pass a single parameter of type Planet to some procedure rather than passing the four separate fields that the Planet type encapsulates (if you don't think passing four parameters is a big deal, just keep in mind that many Win32 structures, which often appear as procedure parameters, have dozens of fields).
Beyond the issues of convenience and efficiency, there is one other important reason for using records in your assembly language programs: they're easier to read and maintain. Assembly language code is hard enough to read as it is; every opportunity you get to produce more readable assembly code is an opportunity you should take.
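The array-of-records access described above can be pulled together into a small, complete program. This is a sketch only: the index variable and the value stored are hypothetical, and @size( Planet ) is used here in place of @elementsize( Planets ) to obtain the same element size in bytes.

```hla
program planetArrayDemo;
#include( "stdlib.hhf" )

type
    Planet:
        record
            x:       int32;
            y:       int32;
            z:       int32;
            density: real64;
        endrecord;

static
    Planets: Planet[9];
    index:   int32 := 2;    // hypothetical: selects the third element

begin planetArrayDemo;

    // ebx := index * size of one Planet record, in bytes.
    intmul( @size( Planet ), index, ebx );

    // Store to, then load from, the x field of Planets[index].
    // The indexed addressing mode follows the complete field name.
    mov( 100, Planets.x[ ebx ] );
    mov( Planets.x[ ebx ], eax );

    stdout.put( "Planets[", index, "].x = ", (type int32 eax), nl );

end planetArrayDemo;
```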
Record types may inherit fields from other record types. Consider the following two HLA type declarations:
    type
        Pt2D:
            record
                x: int32;
                y: int32;
            endrecord;

        Pt3D:
            record inherits( Pt2D )
                z: int32;
            endrecord;
In this example, Pt3D (point 3-D) inherits all the fields from the Pt2D (point 2-D) type. The inherits keyword tells HLA to copy all the fields from the specified record (Pt2D in this example) to the beginning of the current record declaration (Pt3D in this example). Therefore, the declaration of Pt3D above is equivalent to:
    type
        Pt3D:
            record
                x: int32;
                y: int32;
                z: int32;
            endrecord;
In some special situations you may want to override a field from a previous field declaration. For example, consider the following record declarations:
    type
        BaseRecord:
            record
                a: uns32;
                b: uns32;
            endrecord;

        DerivedRecord:
            record inherits( BaseRecord )
                b: boolean; // New definition for b!
                c: char;
            endrecord;
Normally, HLA will report a "duplicate" symbol error when attempting to compile the declaration for DerivedRecord since the b field is already defined via the inherits( BaseRecord ) option. However, in certain cases it's quite possible that the programmer wishes to make the original field inaccessible in the derived record by reusing its name for a new field. That is, perhaps the programmer intends to actually create the following record:
    type
        DerivedRecord:
            record
                a: uns32;   // Derived from BaseRecord
                b: uns32;   // Derived from BaseRecord, but inaccessible here.
                b: boolean; // New definition for b!
                c: char;
            endrecord;
HLA allows a programmer to explicitly override the definition of a particular field by using the overrides keyword before the field they wish to override. So while the previous declarations for DerivedRecord produce errors, the following is acceptable to HLA:
    type
        BaseRecord:
            record
                a: uns32;
                b: uns32;
            endrecord;

        DerivedRecord:
            record inherits( BaseRecord )
                overrides b: boolean; // New definition for b!
                c: char;
            endrecord;
Normally, HLA aligns each field on the next available byte offset in a record. If you wish to align fields within a record on some other boundary, you may use the align directive to achieve this. Consider the following record declaration as an example:
    type
        AlignedRecord:
            record
                b:boolean;  // Offset 0
                c:char;     // Offset 1
                align(4);
                d:dword;    // Offset 4
                e:byte;     // Offset 8
                w:word;     // Offset 9
                f:byte;     // Offset 11
            endrecord;
Note that field d is aligned at a four-byte offset while w is not aligned. We can correct this problem by sticking another align directive in this record:
    type
        AlignedRecord2:
            record
                b:boolean;  // Offset 0
                c:char;     // Offset 1
                align(4);
                d:dword;    // Offset 4
                e:byte;     // Offset 8
                align(2);
                w:word;     // Offset 10
                f:byte;     // Offset 12
            endrecord;
Be aware of the fact that the align directive in a record only aligns fields in memory if the record object itself is aligned on an appropriate boundary. For example, if an object of type AlignedRecord2 appears in memory at an odd address, then the d and w fields will also be misaligned (that is, they will appear at odd addresses in memory). Therefore, you must ensure appropriate alignment of any record variable whose fields you're assuming are aligned.
Note that the AlignedRecord2 type consumes 13 bytes. This means that if you create an array of AlignedRecord2 objects, every other element will be aligned on an odd address and three out of four elements will not be double-word aligned (so the d field will not be aligned on a four-byte boundary in memory). If you are expecting fields in a record to be aligned on a certain byte boundary, then the size of the record must be an integral multiple of that alignment factor if you have arrays of the record. This means that you must pad the record with extra bytes at the end to ensure proper alignment. For the AlignedRecord2 example, we need to pad the record with three bytes so that the size (16 bytes) is a multiple of four bytes. This is easily achieved by using an align directive as the last declaration in the record:
    type
        AlignedRecord2:
            record
                b:boolean;  // Offset 0
                c:char;     // Offset 1
                align(4);
                d:dword;    // Offset 4
                e:byte;     // Offset 8
                align(2);
                w:word;     // Offset 10
                f:byte;     // Offset 12
                align(4);   // Ensures we're padded to a multiple of four bytes.
            endrecord;
Note that you should only use values that are integral powers of two in the align directive.
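One way to see the effect of the trailing align directive is to ask HLA for the size of the record with and without it. The sketch below declares the same layout twice under the hypothetical names Unpadded and Padded (so both can coexist in one program) and prints the two sizes. Going by the offsets listed above, the two values should be 13 and 16, but treat the program as a way to check that on your own HLA installation rather than as a guarantee.

```hla
program padDemo;
#include( "stdlib.hhf" )

type
    // Same fields as AlignedRecord2, without trailing padding.
    Unpadded:
        record
            b:boolean;
            c:char;
            align(4);
            d:dword;
            e:byte;
            align(2);
            w:word;
            f:byte;
        endrecord;

    // Same fields again, padded at the end to a multiple of four bytes.
    Padded:
        record
            b:boolean;
            c:char;
            align(4);
            d:dword;
            e:byte;
            align(2);
            w:word;
            f:byte;
            align(4);
        endrecord;

begin padDemo;

    // @size is evaluated at compile time; the values are printed at run time.
    stdout.put( "Unpadded size: ", @size( Unpadded ), nl );
    stdout.put( "Padded size:   ", @size( Padded ), nl );

end padDemo;
```

The padded size is also the distance between consecutive elements in an array of such records, which is why it matters for keeping every element's d field double-word aligned.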
If you want to ensure that all fields are appropriately aligned on some boundary within the record, but you don't want to have to manually insert align directives throughout the record, HLA provides a second alignment option to solve your problem. Consider the following syntax:
    type
        alignedRecord3 :
            record[4]
                << Set of fields >>
            endrecord;
The "[4]" immediately following the record reserved word tells HLA to start all fields in the record at offsets that are multiples of four, regardless of the object's size (and the size of the objects preceding the field). HLA allows any integer expression that produces a value in the range 1..4096 inside these brackets. If you specify the value one (which is the default), then all fields are packed (aligned on a byte boundary). For values greater than one, HLA will align each field of the record on the specified boundary. For arrays, HLA will align the field on a boundary that is a multiple of the array element's size. The maximum boundary HLA will round any field to is a multiple of 4096 bytes.
Note that if you set the record alignment using this syntactical form, any align directive you supply in the record may not produce the desired results. When HLA sees an align directive in a record that is using field alignment, HLA will first align the current offset to the value specified by align and then align the next field's offset to the global record align value.
Nested record declarations may specify a different alignment value than the enclosing record, e.g.,
    type
        alignedRecord4 :
            record[4]
                a:byte;
                b:byte;
                c:
                    record[8]
                        d:byte;
                        e:byte;
                    endrecord;
                f:byte;
                g:byte;
            endrecord;
In this example, HLA aligns fields a, b, f, and g on double word boundaries; it aligns d and e (within c) on eight-byte boundaries. Note that the alignment of the fields in the nested record is true only within that nested record. That is, if c turns out to be aligned on some boundary other than an eight-byte boundary, then d and e will not actually be on eight-byte boundaries; they will, however, be on eight-byte boundaries relative to the start of c.
In addition to letting you specify a fixed alignment value, HLA also lets you specify a minimum and maximum alignment value for a record. The syntax for this is the following:
    type
        recordname :
            record[maximum : minimum]
                << fields >>
            endrecord;
Whenever you specify a maximum and minimum value as above, HLA will align all fields on a boundary that is at least the minimum alignment value. However, if the object's size is greater than the minimum value but less than or equal to the maximum value, then HLA will align that particular field on a boundary that is a multiple of the object's size. If the object's size is greater than the maximum size, then HLA will align the object on a boundary that is a multiple of the maximum size. As an example, consider the following record:
    type
        r:
            record[ 4:1 ]
                a:byte;     // offset 0
                b:word;     // offset 2
                c:byte;     // offset 4
                d:dword[2]; // offset 8
                e:byte;     // offset 16
                f:byte;     // offset 17
                g:qword;    // offset 20
            endrecord;
Note that HLA aligns g on a double word boundary (not quad word, which would be offset 24) since the maximum alignment size is four. Note that since the minimum size is one, HLA allows the f field to be aligned on an odd boundary (since it's a byte).
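If you would like to confirm offsets such as these for yourself, one simple approach is to subtract the address of a record variable from the addresses of its fields at run time. The sketch below does this for three of the fields; the variable name rv is hypothetical, and the output is whatever your version of HLA assigns (the comments in the listing above predict 2, 8, and 20).

```hla
program minMaxAlignDemo;
#include( "stdlib.hhf" )

type
    r:
        record[ 4:1 ]
            a:byte;
            b:word;
            c:byte;
            d:dword[2];
            e:byte;
            f:byte;
            g:qword;
        endrecord;

static
    rv: r;

begin minMaxAlignDemo;

    lea( ebx, rv );             // ebx := base address of the record

    lea( eax, rv.b );
    sub( ebx, eax );            // eax := address of b - base address
    stdout.put( "offset of b: ", (type uns32 eax), nl );

    lea( eax, rv.d );
    sub( ebx, eax );
    stdout.put( "offset of d: ", (type uns32 eax), nl );

    lea( eax, rv.g );
    sub( ebx, eax );
    stdout.put( "offset of g: ", (type uns32 eax), nl );

end minMaxAlignDemo;
```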
If an array, record, or union field appears within a record, then HLA uses the size of an array element or the largest field of the record or union to determine the alignment size. That is, HLA will align the field within the outermost record on a boundary that is compatible with the size of the largest element of the nested array, union, or record.
HLA's sophisticated record alignment facilities let you specify record field alignments that match those used by most major high level language compilers. This lets you easily access data types used in those HLLs without resorting to inserting lots of align directives inside the record. We'll take a look at this feature in the next chapter when we discuss how to translate C structs into HLA records.
By default, the first field of a record is assigned offset zero within that record. If you would like to specify a different starting offset, you can use the following syntax for a record declaration:
    type
        Pt3D:
            record := 4;
                x: int32;
                y: int32;
                z: int32;
            endrecord;
The constant expression specified after the assignment operator (":=") specifies the starting offset of the first field in the record. In this example x, y, and z will have the offsets 4, 8, and 12, respectively.
Warning: setting the starting offset in this manner does not add padding bytes to the record. This record is still a 12-byte object. If you declare variables using a record declared in this fashion, you may run into problems because the field offsets do not match the actual offsets in memory. This option is intended primarily for mapping records to pre-existing data structures in memory. Only really advanced assembly language programmers should use this option.
Note: Windows carefully defines the data fields in most of the record data structures so that you don't have to worry about field alignment. As long as you ensure that a record variable is sitting at a double-word aligned address in memory, you can be confident that the record's fields will be aligned at an appropriate address within that record.
2.2.3: Nested and Anonymous Unions and Records
The fields of a record or union may take on any legal data type. Such fields can be primitive types (as has been the case in the examples up to this point), arrays, or even other record and union types. Support for nestable (recursive) data types in an assembly language is very important to Win32 assembly programmers because many Win32 data structures employ this scheme.
Consider, for example, the following record type:
    type
        recWrecAndUnion:
            record
                r1Field: byte;
                r2Field: word;
                r3Field: dword;
                rRecField:
                    record
                        rr4Field: byte[2];
                        rr5Field: word[3];
                    endrecord;
                r6Field: byte;
                rUnionField:
                    union
                        r7Field:dword;
                        r8Field:real64;
                    endunion;
            endrecord;
To access the nested record and union fields found within the recWrecAndUnion type, you use an extended form of the "dot-notation". For example, if you have a variable, r, of type recWrecAndUnion, you can access the various fields of this variable using the following identifiers:
    r.r1Field
    r.r2Field
    r.r3Field
    r.rRecField.rr4Field
    r.rRecField.rr5Field
    r.r6Field
    r.rUnionField.r7Field
    r.rUnionField.r8Field
In a similar fashion, unions may contain records and other unions as fields, e.g.,
    type
        unionWrecord:
            union
                u1Field: byte;
                u2Field: word;
                u3Field: dword;
                urField:
                    record
                        u4Field: byte[2];
                        u5Field: word[3];
                    endrecord;
                u6Field: byte;
            endunion;
Again, you would use the "extended dot-notation" syntax to access the fields of this union. Assuming that you have a variable u of type unionWrecord, you would access u's fields as follows:
    u.u1Field
    u.u2Field
    u.u3Field
    u.urField.u4Field
    u.urField.u5Field
    u.u6Field
Whenever a record appears within a union, the total size of that record is what HLA uses to determine the largest data object in the union for the purposes of determining the union's size. Also note that the fields of a record within a union all begin at separate offsets; it is the record object in the union that begins at the same offset as the other fields in the union. The fields of the record, however, begin at unique offsets within that record, just as for any record.
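A short program can make both points concrete: the nested record determines the union's size, and the record as a whole overlays the other union fields. This is only a sketch, assuming the HLA Standard Library for output. With the default packed layout the nested record occupies 2 + 6 = 8 bytes, which exceeds the four bytes of u3Field.

```hla
program nestedDemo;
#include( "stdlib.hhf" )

type
    unionWrecord:
        union
            u1Field: byte;
            u2Field: word;
            u3Field: dword;
            urField:
                record
                    u4Field: byte[2];
                    u5Field: word[3];
                endrecord;
            u6Field: byte;
        endunion;

static
    u: unionWrecord;

begin nestedDemo;

    // The nested record is the largest field, so it sets the union's size.
    stdout.put( "size of unionWrecord: ", @size( unionWrecord ), nl );

    // urField starts at the same address as u3Field, so writing the first
    // byte of u4Field overwrites the low-order byte of u3Field.
    mov( $FFFF_FFFF, u.u3Field );
    mov( 0, u.urField.u4Field[0] );
    mov( u.u3Field, eax );          // eax now holds $FFFF_FF00

    stdout.put( "u3Field (hex): ", eax, nl );

end nestedDemo;
```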
Unions also support a special field type known as an anonymous record. The following example demonstrates the syntax for an anonymous record in a union:
    type
        unionWrecord:
            union
                u1Field: byte;
                u2Field: word;
                u3Field: dword;
                record
                    u4Field: byte[2];
                    u5Field: word[3];
                endrecord;
                u6Field: byte;
            endunion;
Fields appearing within the anonymous record do not necessarily start at offset zero in the data structure. In the example above, u4Field starts at offset zero while u5Field immediately follows it two bytes later. The fields in the union outside the anonymous record all start at offset zero. If the size of the anonymous record is larger than any other field in the union, then the record's size determines the size of the union. This is true for the example above, so the union's size is eight bytes since the anonymous record consumes eight bytes (two bytes for u4Field plus six bytes for u5Field).
To access a field of an anonymous record in a union, you use the standard dot-notation syntax. Because the anonymous record does not have an explicit field name associated with the record, you access the fields of the record using a single-level dot notation, just like the other fields in the union. For example, if you have a variable named uwr of type unionWrecord, you'd access the fields of uwr as follows:
    uwr.u1Field
    uwr.u2Field
    uwr.u3Field
    uwr.u4Field -- a field of the anonymous record
    uwr.u5Field -- a field of the anonymous record
    uwr.u6Field
You may also declare anonymous unions within a record. An anonymous union is a union declaration without a field name associated with the union, e.g.,
    type
        DemoAU:
            record
                x: real32;
                union
                    u1:int32;
                    r1:real32;
                endunion;
                y:real32;
            endrecord;
In this example, x, u1, r1, and y are all fields of DemoAU. To access the fields of a variable D of type DemoAU, you would use the following names: D.x, D.u1, D.r1, and D.y. Note that D.u1 and D.r1 share the same memory locations at run-time, while D.x and D.y have unique addresses associated with them.
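The sharing between D.u1 and D.r1 can be observed directly by storing a value through one name and loading it through the other. The following sketch stores the floating point value 1.0 via D.r1 and then reads the same four bytes back as an integer via D.u1; it assumes the HLA Standard Library for output.

```hla
program anonUnionDemo;
#include( "stdlib.hhf" )

type
    DemoAU:
        record
            x: real32;
            union
                u1:int32;
                r1:real32;
            endunion;
            y:real32;
        endrecord;

static
    D: DemoAU;

begin anonUnionDemo;

    // Store 1.0 through the real32 view of the shared storage.
    fld1();
    fstp( D.r1 );

    // Read the same four bytes back through the int32 view.
    // 1.0 in IEEE single precision has the bit pattern $3F80_0000.
    mov( D.u1, eax );

    stdout.put( "D.u1 (hex): ", eax, nl );

end anonUnionDemo;
```

Note that both fields are reached with single-level dot notation; the anonymous union contributes no name of its own.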
2.2.4: Pointer Types
HLA allows you to declare a pointer to some other type using syntax like the following:
    pointer to base_type
The following example demonstrates how to create a pointer to a 32-bit integer within the type declaration section:
    type
        pi32: pointer to int32;
HLA pointers are always 32-bit pointers.
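Because a pointer is simply a 32-bit address, using one comes down to loading it into a 32-bit register and applying a register-indirect addressing mode. The sketch below shows one way to do this with the pi32 type; the variable names target and p are hypothetical.

```hla
program ptrDemo;
#include( "stdlib.hhf" )

type
    pi32: pointer to int32;

static
    target: int32 := 1234;      // hypothetical object to point at
    p:      pi32;

begin ptrDemo;

    // Initialize the pointer with the address of target.
    lea( eax, target );
    mov( eax, p );

    // Load through the pointer.
    mov( p, ebx );
    mov( [ebx], eax );
    stdout.put( "value read through p: ", (type int32 eax), nl );

    // Store through the pointer. A type coercion tells HLA the operand size.
    mov( 5678, (type int32 [ebx]) );
    stdout.put( "target is now: ", target, nl );

end ptrDemo;
```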
HLA also allows you to define pointers to existing procedures using syntax like the following:
    procedure someProc( parameter_list );
        << procedure options, followed by @external, @forward, or procedure body>>
        .
        .
        .
    type
        p : pointer to procedure someProc;
The p procedure pointer "inherits" all the parameters and other procedure options associated with the original procedure. This is really just shorthand for the following:
    procedure someProc( parameter_list );
        << procedure options, followed by @external, @forward, or procedure body>>
        .
        .
        .
    type
        p : procedure ( Same_Parameters_as_someProc ); <<same options as someProc>>
The former version, however, is easier to maintain since you don't have to keep the parameter lists and procedure options in sync.
Note that HLA provides the reserved word null (or NULL; reserved words are case insensitive) to represent the nil pointer. HLA replaces NULL with the value zero. The NULL pointer is compatible with any pointer type (including strings, which are pointers).
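As a rough illustration of procedure pointers and NULL working together, consider the sketch below. The procedure showValue, the type pShow, and the variable pp are all hypothetical names. To stay on safe ground, the call through the pointer is written in low-level form (push the parameter, then make an indirect call), which assumes showValue uses HLA's default calling convention, in which the procedure removes its own parameters from the stack.

```hla
program procPtrDemo;
#include( "stdlib.hhf" )

procedure showValue( n: int32 );
begin showValue;

    stdout.put( "n = ", n, nl );

end showValue;

type
    pShow: pointer to procedure showValue;

static
    pp: pShow;

begin procPtrDemo;

    // Start with the nil pointer.
    mov( NULL, pp );

    // Only fill in the pointer if it has not been set yet.
    if( pp = NULL ) then

        mov( &showValue, eax );
        mov( eax, pp );

    endif;

    // Low-level indirect call: push the single parameter, then call
    // through the pointer variable.
    pushd( 5 );
    call( pp );

end procPtrDemo;
```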
2.2.5: Thunk Types
A "thunk" is an eight-byte variable that contains a pointer to a piece of code to execute and an execution environment pointer (i.e., a pointer to an activation record). The code associated with a thunk is, essentially, a small procedure that (generally) uses the activation record of the surrounding code rather than creating its own activation record. HLA uses thunks to implement the iterator yield statement as well as pass by name and pass by lazy evaluation parameters. In addition to these uses of thunks, HLA allows you to declare your own thunk objects and use them for any purpose you desire. Windows also uses the term "thunk", but the Windows meaning is different from the HLA meaning. Fortunately, most Windows thunks involve calling 16-bit code in older versions of Windows. In modern Windows systems, you use thunks much less often than in older versions of Windows.
For more information about HLA thunks, please consult the HLA Reference Manual. Thunks are mentioned here only because of the possible confusion that might exist with Windows' thunks. This book will not use HLA thunks, so there is little need to discuss this subject further.
2.2.6: Type Coercion