
Chapter 3: The C - Assembly Connection

3.1: Why are We Reading About C?

You probably purchased this book to learn assembly language programming under Windows (after all, that's what the title promises). This chapter is going to spend considerable time talking about the C programming language. Now assembly language programmers fall into two camps with respect to the C programming language: those who already know it and don't really need to learn a whole lot more about C, and those who don't know C and probably don't want to learn it, either. Unfortunately, as the last chapter points out, the vast majority of Windows programming documentation assumes that the reader is fluent in C. This book cannot begin to provide all the information you may need to write effective Win32 applications; therefore, this chapter does the next best thing - it describes how you can translate that C documentation for use in your assembly language programs.

This chapter contains two main sections. The first section provides a basic description of the C programming language for those readers who are not familiar with the C/C++ programming language. It describes various statements in C/C++ and provides their HLA equivalents. Though far from a complete course on the C programming language, this section will provide sufficient information to read some common Win32 programming examples in C and translate them into assembly language. Experienced C/C++ programmers can elect to skip this section (though if you're not comfortable with HLA, you may want to skim over this section because it will help you learn HLA from a C perspective). The second portion of this chapter deals with the Win32 interface and how C passes parameter data to and from Windows. Unless you're well-versed in compiler construction, mixed language calling sequences, and you've examined a lot of compiler code, you'll probably want to take a look at this material.

3.2: Basic C Programming From an Assembly Perspective

The C programming language is a member of the group of programming languages known as the imperative or procedural programming languages. Languages in this family include FORTRAN, BASIC, Pascal (Delphi/Kylix), Ada, Modula-2, and, of course, C. Generally, if you've learned to write programs in one of these languages, it's relatively easy to learn one of the other languages in the same category. When you attempt to learn a new language from a different class of languages (i.e., you switch programming paradigms), it's almost like you're learning to program all over again; learning a new language that is dissimilar to the one(s) you already know is a difficult task. A recent trend in programming language design has been the hybrid language. A hybrid language bridges the gap between two different programming paradigms. For example, the C++ language is a hybrid language that shares attributes common to both procedural/imperative languages and object-oriented languages. Although hybrid languages often present some compromises on one side or the other of the gulf they span, the advantage of a hybrid language is that it is easy to learn a new programming paradigm if you're already familiar with one of the programming methodologies that the language presents. For example, programmers who already know C find it much easier to learn object-oriented programming via C++ rather than learning the object-oriented programming paradigm from scratch, say by learning Smalltalk (or some other "pure" object-oriented language). So hybrid languages are good in the sense that they help you learn a new way of programming by leveraging your existing knowledge.

The High Level Assembler, HLA, is a good example of a hybrid programming language. While a true assembly language, allowing you to do just about anything that is possible with a traditional (or low-level) assembler, HLA also inherits some syntax and many other features from various high-level imperative programming languages. In particular, HLA borrows several control and data structures from the C, Pascal, Ada, and Modula-2 programming languages. The original intent for this design choice was to make it easier to learn assembly language if you already knew a high-level language like Pascal or C/C++. By borrowing heavily from the syntax of these high-level programming languages, a new assembly language programmer could learn assembly programming much more rapidly by leveraging their C/C++/Pascal knowledge during the early phase of their assembly education.

Note, however, that the reverse is also true. Someone who knows HLA well but doesn't know C can use their HLA knowledge to help them learn the C programming language. HLA's high level control structures are strongly based on languages like C and Modula-2 (or Ada); therefore, if you're familiar with HLA's high level control structures, then learning C's control structures will be a breeze. The sections that immediately follow use this concept to teach some basic C syntax. For those programmers who are not comfortable or familiar with HLA's high level control structures, the following subsections will also describe how to convert between "pure" assembly language and various C control structures. The ultimate goal here is to show you how to convert C code to HLA assembly code; after all, when reading some Win32 programming documentation, you're going to need to convert the examples you're reading in C into assembly language. Although it is always possible (and very easy) to convert any C control structure directly into assembly language, the reverse is not true. That is, it is possible to devise some control flow scheme in assembly language that does not translate directly into a high level language like C. Fortunately, for our purposes, you generally won't need to go in that direction. So even though you're learning about C from an assembly perspective (that is, you're being taught how to read C code by studying the comparable assembly code), this is not a treatise on converting assembly into C (which can be a very difficult task if the assembly code is not well structured).

3.2.1: C Scalar Data Types

The C programming language provides three basic scalar data types: integers and a couple of floating point types. Other data types you'd find in a traditional imperative programming language (e.g., character or boolean values) are generally implemented with integer types in C. Although C only provides three basic scalar types, it does provide several variations of the integer and floating point types. Fortunately, every C scalar data type maps directly to an HLA data type, so conversion from C to HLA data types is a trivial process.

3.2.1.1: C and Assembler Integer Data Types

The C programming language specifies (up to) four different integer types: char (which, despite its name, is a special case of an integer value), short, int, and long. A few compilers support a fifth size, "long long". In general, the C programming language does not specify the size of the integer values; that decision is left to whoever implements a specific compiler. However, when working under Windows (Win32), you can make the following assumptions about integer sizes:

	char       8 bits (one byte)
	short      16 bits (two bytes)
	int        32 bits (four bytes)
	long       32 bits (four bytes)
	long long  64 bits (eight bytes), on compilers that support it

The C programming language also specifies two types of integers: signed and unsigned. By default, all integer values are signed. You can explicitly specify unsigned by prefacing one of these types with the keyword unsigned. Therefore, C's integral types map to HLA's types as shown in Table 3-1.

Table 3-1: Integer Type Correspondence Between HLA and C

C Type                Corresponding HLA Types
char                  char, byte, int8 (see note)
short                 word, int16
int                   dword, int32
long                  dword, int32
long long             qword, int64
unsigned char         char, byte, uns8
unsigned short        word, uns16
unsigned              dword, uns32
unsigned int          dword, uns32
unsigned long         dword, uns32
unsigned long long    qword, uns64

Note: Some compilers have an option that lets you specify the use of unsigned char as the default. In this case, the corresponding HLA type is uns8.
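
If you want C code whose integer sizes are pinned down regardless of the compiler, one option is the C99 fixed-width types from <stdint.h>. This is an addition on our part rather than something the Win32 documentation relies on, and the helper names below are hypothetical; the sketch simply pairs each fixed-width type with the HLA type of the same size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper names. Each one reports the size, in bytes, of a
   C99 fixed-width type that pairs with the HLA type named in the comment. */
size_t size_of_int8(void)  { return sizeof(int8_t);   }  /* HLA int8  (byte)  */
size_t size_of_int16(void) { return sizeof(int16_t);  }  /* HLA int16 (word)  */
size_t size_of_int32(void) { return sizeof(int32_t);  }  /* HLA int32 (dword) */
size_t size_of_int64(void) { return sizeof(int64_t);  }  /* HLA int64 (qword) */
size_t size_of_uns32(void) { return sizeof(uint32_t); }  /* HLA uns32 (dword) */
```

Unlike int and long, whose sizes the C language leaves to the compiler, these types have the same size everywhere they exist.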

Generic integer literal constants in C take several forms. C uses standard decimal representation for base 10 integer constants, just like most programming languages (including HLA). For example, the sequence of digits:

128
 

represents the literal integer constant 128.

If a literal integer constant begins with a zero (followed by one or more octal digits in the range 0..7), then C treats the literal constant as a base-8 (octal) value. HLA doesn't support octal constants, so you will have to manually convert such constants to decimal or hexadecimal prior to using them in an assembly language program. Fortunately, you rarely see octal constants in modern C programs (especially in Win32 programs).

C integer literal constants that begin with "0x" are hexadecimal (base-16) constants. You will replace the "0x" prefix with a "$" prefix when converting the value from C to HLA. For example, the C literal constant "0x1234ABCD" becomes the HLA literal constant "$1234ABCD".

C also allows the use of an "L" suffix on a literal integer constant to tell the compiler that this should be a long integer value. HLA automatically adjusts all literal constants to the appropriate size, so there is no need to tell HLA to extend a smaller constant to a long (32-bit) value. If you encounter an "L" suffix in a C literal constant, just drop the suffix when translating the value to assembly.
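
As a quick check on these rules, here is a small C sketch (the function names are hypothetical) that returns one literal of each form. The comments give the HLA spelling you would use after translation.

```c
#include <stdint.h>

/* Hypothetical function names, chosen only for illustration. */

/* Hexadecimal: C 0x1234ABCD becomes HLA $1234ABCD. */
uint32_t hex_literal(void)   { return 0x1234ABCD; }

/* Octal: the leading zero makes this base 8, so 0123 is decimal 83.
   HLA has no octal form; write 83 or $53 instead. */
int      octal_literal(void) { return 0123; }

/* Long suffix: C 128L becomes plain 128 in HLA. */
long     long_literal(void)  { return 128L; }
```

Note how easy it is to misread 0123 as one hundred twenty-three; convert octal constants with care.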

3.2.1.2: C and Assembly Character Types

As the previous section notes, C treats character variables and constants as really small (one-byte) integer values. There are some non-intuitive aspects to using C character variables that can trip you up; hence the presence of this section.

The first place to start is with a discussion of C and HLA literal character constants. The two literal forms are quite similar, but there are just enough differences to trip you up if you're not careful. The first thing to note is that both HLA and C treat a character constant differently than a string containing one character. We'll cover character strings a little later in this chapter, but keep in mind that character objects are not a special case of a string object.

A character literal constant in C and HLA usually consists of a single character surrounded by apostrophe characters. E.g., 'a' is a character constant in both of these languages. However, HLA and C differ when dealing with non-printable (i.e., control) and a couple of other characters. C uses an escape character sequence to represent the apostrophe character, the backslash character, and the control characters. For example, to represent the apostrophe character itself, you'd use the C literal constant '\''. The backslash tells C to treat the following value specially; in this particular case, the backslash tells C to treat the following apostrophe character as a regular character rather than using it to terminate the character constant. Likewise, you use '\\' to tell C that you want a single backslash character constant. C also uses a backslash followed by a single lowercase alphabetic character to denote common control characters. Table 3-2 lists the escape character sequences that C defines.

Table 3-2: C Escape Character Sequences

C Escape Sequence    Control Character
'\n'                 New line (carriage return/line feed under Windows, though C encodes this as a single line feed)
'\r'                 Carriage return
'\b'                 Backspace
'\a'                 Alert (bell character, control-G)
'\f'                 Form feed (control-L)
'\t'                 Tab character (control-I)
'\v'                 Vertical tab character (control-K)

C also allows the specification of the character's numeric code by following a backslash with an octal constant, or with an "x" and a hexadecimal constant, in the range 0..0xff, e.g., '\033' or '\x1b'.

HLA does not support escape character sequences using the backslash character. Instead, HLA uses a pound sign ('#') followed immediately by a numeric constant to specify the ASCII character code. Table 3-3 shows how to translate various C escape sequences to their corresponding HLA literal character constants.

Table 3-3: Converting C Escape Sequences to HLA Character Constants

C Escape Sequence    HLA Character Constant    Description
'\n'                 #$d #$a                   Note that the end of line sequence under Windows is not a character, but rather a string consisting of two characters. If you need to represent newline with a single character, use a linefeed (as C does), whose ASCII code is $A; linefeed is also defined in the HLA Standard Library as stdio.lf. Note that the "nl" symbol an HLA user typically uses for newline is a two-character string containing carriage return followed by line feed.
'\r'                 #$d                       Carriage return character. This is defined in the HLA Standard Library as stdio.cr.
'\b'                 #8                        Backspace character. This is defined in the HLA Standard Library as stdio.bs.
'\a'                 #7                        Alert (bell) character. This is defined in the HLA Standard Library as stdio.bell.
'\f'                 #$c                       Form feed character.
'\t'                 #9                        Tab character. This is defined in the HLA Standard Library as stdio.tab.
'\v'                 #$b                       Vertical tab character.
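
If you are ever unsure which numeric code to put after the "#" in HLA, you can ask the C compiler. The following sketch (escape_code is a hypothetical helper, and the values assume an ASCII execution character set, which is what Windows compilers use) maps the letter of each escape sequence to its character code.

```c
/* Hypothetical helper: given the letter that follows the backslash in a
   C escape sequence, return the character code that the sequence denotes,
   or -1 if the letter is not one of the control escapes in Table 3-2.
   Assumes an ASCII execution character set. */
int escape_code(char letter)
{
    switch (letter)
    {
        case 'n': return '\n';   /* line feed,       HLA #$a */
        case 'r': return '\r';   /* carriage return, HLA #$d */
        case 'b': return '\b';   /* backspace,       HLA #8  */
        case 'a': return '\a';   /* alert (bell),    HLA #7  */
        case 'f': return '\f';   /* form feed,       HLA #$c */
        case 't': return '\t';   /* tab,             HLA #9  */
        case 'v': return '\v';   /* vertical tab,    HLA #$b */
        default:  return -1;
    }
}

/* The numeric escape forms: octal 033 and hexadecimal 1b are both 27. */
int escape_from_octal(void) { return '\033'; }
int escape_from_hex(void)   { return '\x1b'; }
```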

Because C treats character values as single-byte integer values, there is another interesting aspect to character values in C - they can be negative. One might wonder what "minus 'z'" means, but the truth is, there really is no such thing as a negative character; C simply uses signed characters to represent small integer values in the range -128..+127 (versus unsigned characters that represent values in the range 0..255). For the standard seven-bit ASCII characters, the values are always positive, regardless of whether you're using a signed character or an unsigned character object. Note, however, that many C functions return a signed character value to specify certain error conditions or other states that you cannot normally represent within the ASCII character set. For example, many functions return the character value minus one to indicate end of file.
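
The following sketch shows the same byte read both ways. The function names are hypothetical, and the conversion of an out-of-range value to signed char is implementation-defined in C; the results noted here assume the two's complement behavior of the x86 compilers this chapter is concerned with.

```c
/* Hypothetical helpers, for illustration only. */

/* Reinterpret a byte as a signed character (HLA int8).
   Assumes two's complement conversion, as on x86 compilers. */
int as_signed(unsigned char bits)
{
    return (signed char)bits;
}

/* Reinterpret a signed character as an unsigned byte (HLA uns8). */
int as_unsigned(signed char value)
{
    return (unsigned char)value;
}
```

A byte containing $FF is therefore minus one as a signed character and 255 as an unsigned character, while a seven-bit ASCII code such as 'z' (122) reads the same either way.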

3.2.1.3: C and Assembly Floating Point (Real) Data Types

The C programming language defines three different floating point sizes: float, double, and long double. Like the integer data type sizes, the C language makes no guarantees about the size or precision of floating point values other than to claim that double is at least as large as float and long double is at least as large as double. However, while nearly every C/C++ compiler that generates code for Windows uses the same sizes for integers (8, 16, and 32 bits for char, short, and int/long), there are differences in the sizes of floating point objects among compilers. In particular, some compilers use a 10-byte extended precision format for long double (e.g., Borland) while others use an eight-byte double precision format (e.g., Microsoft). Fortunately, all (reasonable) compilers running under Windows use the IEEE 32-bit single-precision format for float and the IEEE 64-bit double-precision format for double. If you encounter a long double object in C code, you will have to check with the compiler's vendor to determine the size of the object in order to convert it to assembly language. Of course, for many applications it won't really matter if you go ahead and use a 10-byte real value whenever you encounter a long double object. After all, the original programmer is probably expecting something bigger than a double object anyway. Do keep in mind, however, that some algorithms involving real arithmetic may not be stable when run with a floating point size other than the size used when they were developed.

Table 3-4 shows the correspondence between C/C++ floating point data types and HLA's floating point data types.

Table 3-4: Real Type Correspondence Between C and Assembly

C Real Type    Corresponding HLA Type    Comment
float          real32                    32-bit IEEE format floating point value.
double         real64                    64-bit IEEE format floating point value.
long double    real64                    64-bit IEEE format floating point value on certain compilers (e.g., Microsoft).
long double    real80                    80-bit IEEE format floating point value on certain compilers (e.g., Borland).

C and HLA floating point literal constants are quite similar. They may begin with an optional sign, followed by one or more decimal digits. Then you can have an optional decimal point followed by another optional sequence of decimal digits; following that, you can have an optional exponent specified as an 'e' or 'E', an optional sign, and one or more decimal digits. The final result must not look like an integer constant (i.e., the decimal point or an exponent must be present).

C allows an optional "F" suffix on a floating point constant to specify single precision, e.g., 1.2f. Similarly, you can attach an "L" suffix to a float value to indicate long double, e.g., 1.2L. By default, C literal floating point constants are always double precision values. HLA always maintains all constants as 80-bit floating point values internally and converts them to 64 or 32 bits as necessary, so there is no need for such a suffix. So by dropping the "F" or "L" suffix, if it is present, you can use any C floating point literal constant as-is in the HLA code.
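
You can see the effect of the suffixes by asking the compiler for the size of each literal. This is only a sketch with hypothetical function names; the size of the "L" form depends on the compiler, as the discussion of long double above explains.

```c
#include <stddef.h>

/* Hypothetical helpers reporting the storage size of each literal form. */
size_t size_with_f_suffix(void) { return sizeof(1.2f); }  /* float:  HLA real32 */
size_t size_with_no_suffix(void) { return sizeof(1.2); }  /* double: HLA real64 */
size_t size_with_l_suffix(void) { return sizeof(1.2L); }  /* long double: real64 or real80,
                                                             depending on the compiler */
```

In HLA you would write 1.2 in all three cases and let the destination's type decide the precision.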

3.2.2: C and Assembly Composite Data Types

C provides four major composite (or aggregate) data types: arrays, structures, unions, and pointers. C++ adds the class type as well. A composite type is a type that is built up from smaller types (e.g., an array is a collection of smaller objects, each element having the same type as all other elements in the array). In the following sub-sections, we'll explore these composite data types in C and provide the corresponding type in HLA.

As is the case throughout this chapter, this section assumes that you already know assembly language and may not know C. This section provides the correspondence between C and HLA, but doesn't attempt to teach you how to do things like access an array in assembly language; the previous chapter already covered that material.

3.2.2.1: C and Assembly Array Types

Since HLA's syntax was based on the C and Pascal programming languages, it should come as no surprise that array declarations in HLA are very similar to those in C. This makes the translation from C to HLA very easy.

The syntax for an array declaration in C is the following:

elementType  arrayName[ elements ] <<additional dimension information>>;
 

elementType is the type of an individual element of the array. arrayName is the name of the array object. elements is an integer constant value that specifies the number of array elements. Here are some sample C array declarations:


 
	int intArray[4];
	char grid[3][3];  // 3x3 two-dimensional array
	double flts[16];
	userType userData[2][2][2];
 

 

In HLA, multiple dimension arrays use a comma-delimited list to separate each of the maximum bounds (rather than using separate sets of brackets). Here are the corresponding declarations in HLA:


 
	intArray :int32[ 4 ];
	grid     :char[3,3];
	flts     :real64[16];
	userData :userType[2,2,2];
 

 

Both C and HLA index arrays from zero to n-1, where n is the value specified as the array bound in the declaration.

C stores arrays in memory using row-major ordering. Therefore, when accessing elements of a multi-dimensional array, always use the row-major ordering algorithm (see The Art of Assembly Language for details if you're unfamiliar with accessing elements of a multi-dimensional array in assembly language).
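
As a reminder of what row-major ordering means for a two-dimensional array, here is a minimal C sketch. The function names are hypothetical; the formula is the standard one, byte offset = (row * number of columns + column) * element size, and the second function checks it against the layout the compiler actually produces for the grid array declared earlier.

```c
#include <stddef.h>

/* Hypothetical helper: byte offset of element [row][col] in a row-major
   two-dimensional array with num_cols columns of elem_size-byte elements. */
size_t row_major_offset(size_t row, size_t col, size_t num_cols, size_t elem_size)
{
    return (row * num_cols + col) * elem_size;
}

/* Compare the formula with the compiler's own layout for char grid[3][3].
   Returns 1 if they agree, 0 otherwise. */
int matches_compiler_layout(void)
{
    static char grid[3][3];
    size_t actual = (size_t)((char *)&grid[2][1] - (char *)grid);
    return actual == row_major_offset(2, 1, 3, sizeof grid[0][0]);
}
```

In assembly you perform the same computation yourself, typically with a multiply (or shifts and adds) followed by a scaled indexed addressing mode.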

In C, it is possible to provide an initial value for an array when declaring an array. The following C example demonstrates the syntax for this operation:


 
	int iArray[4] = {1,2,3,4};
 

 

HLA also allows initialization of array variables, but only for static objects. Here's the HLA version of this C code:


 
static
	iArray :int32[4] := [1,2,3,4];
 

 

C allows the same syntax for an array initialization to automatic variables. However, you cannot initialize an automatic variable at compile time (this is true for C and HLA); therefore, the C compiler automatically emits code to copy the data from some static memory somewhere into the automatic array, e.g., a C declaration like the following appearing in a function (i.e., as an automatic local variable):


 
	int autoArray[4] = {1,2,3,4};
 

 

gets translated into machine code that looks something like the following:


 
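readonly
	staticInitializerData :dword[4] := [1,2,3,4];  // Initial values, stored once.
		.
		.
		.
var
	autoArray: int32[4];
		.
		.
		.
	mov( &staticInitializerData, esi );  // Source: the static initial data.
	lea( edi, autoArray );               // Destination: the automatic array.
	mov( 4, ecx );                       // Number of dwords to copy.
	rep.movsd();                         // Copy the data on every function entry.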
 

 

(yes, compilers really do generate code like this). This code is pretty disgusting. If you see an automatic array variable with an initializer, it's probably a better idea to try and figure out if you really need a new copy of the initial data every time you enter the function.
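
To make the alternative concrete, here is a sketch in C with hypothetical function names. The first function uses an initialized automatic array, so the compiler must (conceptually, at least; an optimizer may do something smarter) set up a fresh copy on each call. The second uses a static const table that is initialized once at compile time, which is the right choice whenever the function never modifies the data.

```c
/* Hypothetical example functions; both compute 1+2+3+4. */

/* Initialized automatic array: a new copy is built on every call. */
int sum_with_auto_copy(void)
{
    int autoArray[4] = {1, 2, 3, 4};
    int sum = 0;
    for (int i = 0; i < 4; ++i)
        sum += autoArray[i];
    return sum;
}

/* Static read-only table: initialized at compile time, never copied. */
int sum_with_static_table(void)
{
    static const int table[4] = {1, 2, 3, 4};
    int sum = 0;
    for (int i = 0; i < 4; ++i)
        sum += table[i];
    return sum;
}
```

When translating to HLA, the second form corresponds to putting the array in a readonly (or static) section and skipping the copy entirely.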

3.2.2.2: C and Assembly Record/Structure Types

C implements the equivalent of HLA's records using the struct keyword. Structure declarations in C look just like standard variable declarations sandwiched inside a "struct {...}" block. Conversion to HLA is relatively simple: just stick the HLA equivalent of those field declarations in a record..endrecord block. The only point of confusion is C's syntax for declaring tags, types, and variables of some struct type.

C allows you to declare structure variables and types several different ways. First, consider the following structure variable declaration:


 
	struct
	{
		int fieldA;
		float fieldB;
		char fieldC;
	}
		structVar;
 

 

Assuming this is a global variable in C (i.e., not within a function) then this creates a static variable structVar that has three fields: structVar.fieldA, structVar.fieldB, and structVar.fieldC. The corresponding HLA declaration is the following (again, assuming a static variable):


 
static
	structVar:
		record
			fieldA  :int32;
			fieldB  :real32;
			fieldC  :char;
		endrecord;
 

 

In an HLA program, you'd access the fields of the structVar variable using the exact same syntax as C, specifically: structVar.fieldA, structVar.fieldB, and structVar.fieldC.

There are two ways to declare structure types in C: using typedef and using tag fields. Here's the version using C's typedef facility (along with a variable declaration of the structure type):


 
typedef struct
	{
		int fieldA;
		float fieldB;
		char fieldC;
	}
		structType;

	structType structVar;
 

 

Once again, you access the fields of structVar as structVar.fieldA, structVar.fieldB, and structVar.fieldC.

The typedef keyword was added to the C language well after its original design. In the original C language, you'd declare a structure type using a structure tag as follows:


 
struct structType
	{
		int fieldA;
		float fieldB;
		char fieldC;
	} /* Note: you could actually declare structType variables here */ ;

	struct structType structVar;
 

 

HLA provides a single mechanism for declaring a record type - by declaring the type in HLA's type section. The syntax for an HLA record in the type section takes this form:


 
type
	structType:
		record
			fieldA  :int32;
			fieldB  :real32;
			fieldC  :char;
		endrecord;

static
	structVar  :structType;
 

 

C also allows the initialization of structure variables by using initializers in a variable declaration. The syntax is similar to that for an array (i.e., a list of comma separated values within braces). C associates the values in the list with each of the fields by position. Here is an example of the structVar declaration given earlier with an initializer:


 
	struct structType structVar = { 1234, 5.678, '9' };
 

 

Like the array initializers you read about in the previous section, the C compiler will initialize these fields at compile time if the variable is a static or global variable. However, if it's an automatic variable, then the C compiler emits code to copy the data into the structure upon each entry into the function defining the initialized structure (i.e., this is an expensive operation).

As with arrays, HLA allows the initialization of record objects you define in a static or readonly section (variables in a storage section are uninitialized by definition). Here is the previous C example translated to HLA:


 
static
	structVar  :structType := structType:[1234, 5.678, '9' ];
 

 

One major difference between HLA and C with respect to structures is the alignment of the fields within the structure. By default, HLA packs all the fields in a record so that there are no extra bytes within a structure. The structType structure, for example, would consume exactly nine bytes in HLA (four bytes each for the int32 and real32 fields and one byte for the char field). C structures, on the other hand, generally adhere to certain alignment and padding rules. Although the rules vary by compiler implementation, most Win32 C/C++ compilers align each field of a struct on an offset boundary that is an even multiple of that field's size (up to four bytes). In the current structType example, fieldA would fall at offset zero, fieldB would fall on offset four within the structure, and fieldC would appear at offset eight in the structure. Since each field already has an offset that is an even multiple of the object's size, C doesn't need to adjust the field offsets for this structure. Suppose, however, that we have the following C declaration in our code:


 
typedef struct
	{
		char fieldC;
		int fieldA;
		short fieldD;
		float fieldB;
	}
		structType2;
 

 

If C, like HLA, didn't align the fields by default, you'd find fieldC appearing at offset zero, fieldA at offset 1, fieldD at offset five, and fieldB at offset seven. Unfortunately, these fields would appear at less than optimal addresses in memory. C, however, tends to insert padding between fields so that each field is aligned at an offset within the structure that is an even multiple of the object's size (up to four bytes, which is the maximum field alignment that compilers support under Win32). This padding is generally transparent to the C programmer; however, when converting C structures to assembly language records, the assembly language programmer must take this padding into consideration in order to correctly communicate information between an assembly language program and Windows. The traditional way to handle this (in assembly language) has been to explicitly add padding fields to the assembly language structure, e.g.,


 
type
	structType2:
		record
			fieldC  :char;
			pad0    :byte[3]; // Three bytes of padding.
			fieldA  :int32;
			fieldD  :int16;
			pad1    :byte[2]; // Two bytes of padding.
			fieldB  :real32;
		endrecord;
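
If you have access to a C compiler, you don't have to work out the padding by hand; the standard offsetof macro will report where the compiler really put each field. The sketch below uses hypothetical helper names and the structType2 declaration from above. The offsets in the comments are what the alignment rule just described produces on typical 32-bit and 64-bit x86 compilers; a compiler with different packing options will report different values.

```c
#include <stddef.h>

typedef struct
{
    char  fieldC;
    int   fieldA;
    short fieldD;
    float fieldB;
} structType2;

/* Hypothetical helpers that report the layout the compiler chose. */
size_t offset_of_fieldC(void) { return offsetof(structType2, fieldC); }  /* 0  */
size_t offset_of_fieldA(void) { return offsetof(structType2, fieldA); }  /* 4: three pad bytes precede it */
size_t offset_of_fieldD(void) { return offsetof(structType2, fieldD); }  /* 8  */
size_t offset_of_fieldB(void) { return offsetof(structType2, fieldB); }  /* 12: two pad bytes precede it */
size_t size_of_structType2(void) { return sizeof(structType2); }         /* 16 */
```

These values match the offsets of the fields in the padded HLA record above.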
 

 

HLA provides a set of features that automatically and semi-automatically align record fields. First of all, you can use the HLA align directive to align a field to a given offset within the record; this mechanism is great for programs that need absolute control over the alignment of each field. However, this kind of control is not necessary when implementing C records in assembly language, so a better approach is to use HLA's automatic record field alignment facilities. The rules for C/C++ (under Windows, at least) are pretty standard among compilers; the rule is this: each object (up to four bytes) is aligned at a starting offset that is an even multiple of that object's size. If the object is larger than four bytes, then it gets aligned to an offset that is an even multiple of four bytes. In HLA, you can easily do this using a declaration like the following:


 
type
	structType2:
		record[4:1];
			fieldC  :char;    // Offset 0.
			fieldA  :int32;   // Offset 4 (HLA inserts three bytes of padding).
			fieldD  :int16;   // Offset 8.
			fieldB  :real32;  // Offset 12 (HLA inserts two bytes of padding).
		endrecord;
 

 

The "[4:1];" appendage to the record keyword tells HLA to align objects between one and four bytes on their natural boundary and larger objects on a four-byte boundary, just like C. This is the only record alignment option you should need for C/C++ code (and, in fact, you should always use this alignment option when converting C/C++ structs to HLA records). If you would like more information on this alignment option, please see the HLA reference manual.

Specifying the field alignment via the "[4:1];" option only ensures that each field in the record starts at an appropriate offset. It does not guarantee that the record's length is an even multiple of four bytes (and most C/C++ compilers will add padding to the end of a struct to ensure the length is an even multiple of four bytes). However, you can easily tell HLA to ensure that the record's length is an even multiple of four bytes by adding an align directive to the end of the field list in the record, e.g.,


 
type
	structType2:
		record[4:1];
			fieldC  :char;    // Offset 0.
			fieldA  :int32;   // Offset 4 (HLA inserts three bytes of padding).
			fieldD  :int16;   // Offset 8.
			fieldB  :real32;  // Offset 12 (HLA inserts two bytes of padding).
			align(4);         // Pad the record's length to a multiple of four bytes.
		endrecord;
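
The padding at the end of a structure is easy to overlook because no field follows it. The original structType shows the effect: packed, as HLA lays it out by default, its fields occupy nine bytes, but a typical x86 C compiler reports twelve. The helper names in this sketch are hypothetical, and the twelve-byte figure assumes the compiler's default packing.

```c
#include <stddef.h>

typedef struct
{
    int   fieldA;
    float fieldB;
    char  fieldC;
} structType;

/* Sum of the field sizes: what a packed (default HLA) record consumes. */
size_t packed_size(void)
{
    return sizeof(int) + sizeof(float) + sizeof(char);
}

/* What the C compiler allocates, including any trailing padding. */
size_t padded_size(void)
{
    return sizeof(structType);
}
```

The three extra bytes are the trailing padding that the align(4) directive reproduces on the HLA side.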
 

 

Another important thing to remember is that the fields of a record or structure are only properly aligned in memory if the starting address of the record or structure is aligned at an appropriate address in memory. Specifying the alignment of the fields in an HLA record does not guarantee this. Instead, you will have to use HLA's align directive when declaring variables in your HLA programs. The best thing to do is ensure that an instance of a record (i.e., a record variable) begins at an address that is an even multiple of four bytes. You can do this by declaring your record variables as follows:


 
static
	align(4);  // Align following record variable on a four-byte address in memory.
	recVar   :structType2;
 

 

3.2.2.3: C and Assembly Union Types

A C union is a special type of structure where all the fields have the same offset (in C, the offset is zero; in HLA, you can actually select the offset, although the default is zero). That is, all the fields of an instance of a union overlay one another in memory. Because the fields of a union all have the same starting address, there are no issues regarding field offset alignment between C and HLA. Therefore, all you really need to do is directly convert the C syntax to the HLA syntax for a union declaration.

In C, union type declarations can take one of two forms:


 
union unionTag
{
	<<fields that look like variable declarations>>
} <<optional union variable declarations>>;

typedef union
{
	<<fields that look like variable declarations>>
}
	unionTag2; /* The type declaration */
 

 

You may also declare union variables directly using syntax like the following:


 
union
{
	<< fields that look like variable declarations >>
}
	unionVariable1, unionVariable2, unionVariable3;

	union unionTag unionVar3, unionVar4;
	unionTag2 unionVar5;
 

 

In HLA, you declare a union type in HLA's type section using the union/endunion reserved words as follows:


 
type
	unionType:
		union
			<< fields that look like HLA variable declarations >>
		endunion;
 

 

You can declare actual HLA union variables using declarations like the following:


 
storage
	unionVar1:
		union
			<< fields that look like HLA variable declarations >>
		endunion;

	unionvar2: unionType;
 

 

The size of a union object in HLA is the size of the largest field in the union declaration. You can round the size of the union object up to a multiple of some fixed size by placing an align directive at the end of the union declaration. For example, the following HLA type declaration defines a union type that consumes an even multiple of four bytes:


 
type
	union4_t:
		union
			b    :boolean;
			c    :char[3];
			w    :word;
			align(4);
		endunion;
 

 

Without the "align(4);" field in this union declaration, HLA would only allocate three bytes for an object of type union4_t because the single largest field is c, which consumes three bytes. The presence of the "align(4);" directive, however, tells HLA to align the end of the union on a four-byte boundary (that is, make the union's size an even multiple of four bytes). You'll need to check with your compiler to see what it will do with unions, but most compilers probably extend the union so that its size is an even multiple of the largest scalar (e.g., non-array) object in the field list (in the example above, a typical C compiler would probably ensure that an instance of the union4_t type is an even multiple of two bytes long).
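
Here is a sketch of how you might check this with a C compiler. The C union below is our own rendering of union4_t (unsigned char stands in for HLA's one-byte boolean and unsigned short for word), and the helper names are hypothetical. On typical x86 compilers the largest field is three bytes and the strictest alignment is two, so the union occupies four bytes.

```c
#include <stddef.h>

/* A C rendering of the HLA union4_t type (an assumption on our part). */
typedef union
{
    unsigned char  b;     /* HLA boolean (one byte) */
    char           c[3];  /* HLA char[3]            */
    unsigned short w;     /* HLA word               */
} union4_t;

/* Every field of a union starts at offset zero. */
size_t union_offset_b(void) { return offsetof(union4_t, b); }
size_t union_offset_c(void) { return offsetof(union4_t, c); }
size_t union_offset_w(void) { return offsetof(union4_t, w); }

/* Size the compiler allocates: the largest field (three bytes), rounded
   up to a multiple of the most strictly aligned field. */
size_t union_size(void) { return sizeof(union4_t); }
```

In this particular case the C size happens to agree with the HLA declaration that ends in "align(4);", but that is a coincidence of the field sizes, not a general rule.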

As for records, the fact that a union type is an even multiple of some size does not guarantee that a variable of that type (i.e., an instance) is aligned on that particular boundary in memory. As with records, if you want to ensure that a union variable begins at a desired address boundary, you need to stick an align directive before the declaration, e.g.,


 
var
 
	align(4);
 
	u  :union4_t;
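If you want to see what your own C compiler does with a comparable union, a small check along the following lines may help. This is only a sketch: the names union4_c, union4_size, and union4_alignment are invented for this illustration, the mapping of HLA's boolean and word types onto unsigned char and unsigned short is an assumption about typical 32-bit Windows compilers, and the _Alignof operator requires a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C counterpart of the HLA union4_t type (no align(4) here). */
union union4_c
{
    unsigned char  b;      /* stands in for HLA boolean (assumed one byte) */
    char           c[3];   /* three-byte character array                   */
    unsigned short w;      /* stands in for HLA word (assumed two bytes)   */
};

/* Size the compiler actually chose for the union, padding included. */
size_t union4_size(void)
{
    return sizeof(union union4_c);
}

/* Alignment the compiler requires for variables of the union type. */
size_t union4_alignment(void)
{
    return _Alignof(union union4_c);
}
```

With a compiler where unsigned short occupies two bytes, you would expect union4_size() to report four rather than three, but the only way to be sure is to run the check with the compiler you actually use.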
 

 

3.2.2.4: C and Assembly Character String Types

The C/C++ programming language does not support a true string type. Instead, C/C++ uses an array of characters with a zero terminating byte to represent a character string. C/C++ uses a pointer to the first character of the character sequence as a "string object". HLA defines a true character string type, though its internal representation is a little different than C/C++'s. Fortunately, HLA's string format is upwards compatible with the zero-terminated string format that C/C++ uses, so it's very easy to convert HLA strings into the C/C++ format (indeed, the conversion is trivial). There are a few problems going in the other direction (at least, at run time). So a special discussion of HLA versus C/C++ strings is in order.

Character string declarations are somewhat confused in C/C++ because C/C++ often treats pointers to an object and arrays of that same object type equivalently. For example, in an arithmetic expression, C/C++ does not differentiate between the use of a char* object (a character pointer) and an array of characters. Because of this syntactical confusion in C/C++, you'll often see strings in this language declared one of two different ways: as an array of characters or as a pointer to a character. For example, the following are both typical string variable declarations in C/C++:


 
	char stringA[ 256 ]; // Holds up to 255 characters plus a zero terminating byte.
 
	char *stringB;       // Must allocate storage for this string dynamically.
 

 

The big difference between these two declarations is that the stringA declaration actually reserves the storage for the character string while the stringB declaration only reserves storage for a pointer. Later, when the program is running, the programmer must allocate storage for the string associated with stringB (or assign the address of some previously allocated string to stringB). Interestingly enough, once you have two declarations like stringA and stringB in this example, you can access characters in either string variable using either pointer or array syntax. That is, all of the following are perfectly legal given these declarations for stringA and stringB (and they all do basically the same thing):


 
	char ch;
 

 
	ch = stringA[i];       // Access the char at position i in stringA.
 
	ch = *(stringB + i);   // Access the char at position i in stringB.
 
	ch = stringB[i];       // Access the char at position i in stringB.
 
	ch = *(stringA + i);   // Access the char at position i in stringA.
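The following sketch packages the same idea into something you can compile. The function names are made up for this illustration. It checks that all four access forms fetch the same character, and it also reports how much storage each kind of declaration reserves, which is where the two declarations really differ.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when array syntax and pointer syntax fetch the same character
   from both kinds of "string" variable at position i, 0 if i is out of range. */
int access_forms_agree(size_t i)
{
    char  stringA[256] = "hello";   /* array: 256 bytes of storage reserved here */
    char *stringB = stringA;        /* pointer: only holds an address            */

    if (i >= sizeof stringA)        /* stay inside the array */
        return 0;

    return stringA[i] == *(stringA + i)
        && stringB[i] == *(stringB + i)
        && stringA[i] == stringB[i];
}

/* Storage reserved by the array declaration. */
size_t array_storage(void)
{
    char stringA[256];
    return sizeof stringA;
}

/* Storage reserved by the pointer declaration (just the pointer itself). */
size_t pointer_storage(void)
{
    char *stringB;
    return sizeof stringB;
}
```

Here array_storage() reports 256 bytes, while pointer_storage() reports only the size of a pointer on the target machine.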
 

 

String literal constants in C/C++ are interesting. Syntactically, a C/C++ string literal looks very similar to an HLA string literal: it consists of a sequence of zero or more characters surrounded by quotes. The big difference between HLA string literals and C/C++ string literals is that C/C++ uses escape sequences to represent control characters and non-graphic characters within a string (as well as the backslash and quote characters). HLA does not support the escape character sequences (see the section on character constants for more details on C/C++ escape character sequences). To convert a C/C++ string literal that contains escape sequences into an HLA character string, there are four rules you need to follow:

"hello" #$d #$a "world" #$d #$a    // HLA automatically splices all this together.
 
"hello" #$a "world" #$a
 
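As a quick check on the byte values involved, the sketch below compares a C literal written with escape sequences against the explicit byte values that the HLA form "hello" #$d #$a "world" #$d #$a spells out. The function name is invented for this example, and the comparison assumes an ASCII execution character set, where \r is $0d and \n is $0a.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 when the C literal holds exactly the bytes that the HLA form
   "hello" #$d #$a "world" #$d #$a describes, zero terminator included. */
int crlf_literal_matches(void)
{
    static const char c_literal[] = "hello\r\nworld\r\n";
    static const unsigned char expected[] =
    {
        'h', 'e', 'l', 'l', 'o', 0x0d, 0x0a,
        'w', 'o', 'r', 'l', 'd', 0x0d, 0x0a,
        0x00                          /* zero terminating byte */
    };

    return sizeof c_literal == sizeof expected
        && memcmp(c_literal, expected, sizeof expected) == 0;
}
```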

Whenever a C/C++ compiler sees a string literal constant in a source file, the compiler will allocate storage for each character in that constant plus one extra byte to hold the zero terminating byte5 and, of course, the compiler will initialize each byte of this storage with the successive characters in the string literal constant. C/C++ compilers will typically (though not always) place this string they've created in a read-only section of memory. Once the compiler does this, it replaces the string literal constant in the program with the address of the first character it has created in this read-only memory segment. To see this in action, consider the following C/C++ code fragment:


 
	char* str;
 

 
	str = "Some Literal String Constant";
 

 

This does not copy the character sequence "Some Literal String Constant" into str. Instead, the compiler creates a sequence of bytes somewhere in memory, initializes them with the characters in this string (plus a zero terminating byte), and then stores the address of the first character of this sequence into the str character pointer variable. This is very efficient, as C/C++ needs to only move a four-byte pointer around at run-time rather than moving a 29 byte sequence.
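A short sketch makes the two sizes in this discussion concrete. The function names are hypothetical. The first function reports the storage the compiler sets aside for the literal (28 characters plus the zero byte), and the second shows that assigning one character pointer to another copies only the address, so both pointers refer to the same bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Storage the compiler reserves for the literal: its characters
   plus one zero terminating byte. */
size_t literal_storage_size(void)
{
    return sizeof("Some Literal String Constant");
}

/* Returns 1 when assigning the pointer leaves both variables
   referring to the very same character data (no characters copied). */
int assignment_copies_pointer_only(void)
{
    const char *str   = "Some Literal String Constant";
    const char *other = str;          /* moves an address, not 29 bytes */

    return other == str && other[0] == 'S';
}
```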

The HLA language provides an explicit string data type. Internally, however, HLA represents strings using a four-byte pointer variable, just as C/C++ does. There is an important difference, however, between the type of data that an HLA string variable references and a C/C++ character pointer references: HLA includes some additional information with its string data: specifically, a length field and a maximum length field. Like C/C++ string objects, the address held in an HLA string variable points at the first character of the character string (and HLA strings also end with a zero terminating byte). Unlike C/C++ however, the four bytes immediately preceding the first character of the string contain the current length of the string (as an uns32 value). The four bytes preceding the length field contain the maximum possible length of the string (that is, how much memory is reserved for the string variable, which can be larger than the current number of characters actually found in the string). Figure 3-1 shows the HLA string format in memory.

Figure 3-1: HLA String Format in Memory

The interesting thing to note about HLA strings is that they are downwards compatible with C/C++ strings. That is, if you've got a function, procedure, or some other piece of code that operates on a C/C++ string, you can generally pass it a pointer to an HLA string and the operation will work as expected6. This was done by design and is one of the reasons HLA interfaces so well with the Win32 API. The Win32 API generally expects C/C++ zero terminated strings when you pass it string data. Usually, you can pass an HLA string directly to a Win32 API function and everything will work as expected (few Win32 API function calls actually change the string data).
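To make the layout concrete, here is a rough C model of a string laid out this way. It is not HLA's actual implementation. Every name in it (hla_header_model, model_alloc, and so on) is invented for this illustration, and it only mirrors the layout described above: maximum length at offset -8, current length at offset -4, then the characters and a zero terminating byte.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the two fields that precede the character data. */
typedef struct
{
    uint32_t max_len;   /* lives at offset -8 from the first character */
    uint32_t length;    /* lives at offset -4 from the first character */
} hla_header_model;

/* Allocate a model string that can hold max_len characters. Returns a pointer
   to the first character (what a string variable would hold), or NULL. */
char *model_alloc(uint32_t max_len)
{
    hla_header_model hdr;
    unsigned char *block =
        (unsigned char *)malloc(sizeof hdr + (size_t)max_len + 1);

    if (block == NULL)
        return NULL;
    hdr.max_len = max_len;
    hdr.length  = 0;                       /* start out as the empty string */
    memcpy(block, &hdr, sizeof hdr);
    block[sizeof hdr] = '\0';
    return (char *)(block + sizeof hdr);
}

/* Read the current length field stored just before the characters. */
uint32_t model_length(const char *s)
{
    uint32_t len;
    memcpy(&len, s - 4, sizeof len);
    return len;
}

/* Read the maximum length field stored before the length field. */
uint32_t model_max_len(const char *s)
{
    uint32_t max_len;
    memcpy(&max_len, s - 8, sizeof max_len);
    return max_len;
}

/* Copy text into the model string and update the length field.
   Returns 0 on success, -1 if text does not fit. */
int model_assign(char *s, const char *text)
{
    size_t   n = strlen(text);
    uint32_t len;

    if (n > model_max_len(s))
        return -1;
    memcpy(s, text, n + 1);                /* characters plus zero terminator */
    len = (uint32_t)n;
    memcpy(s - 4, &len, sizeof len);
    return 0;
}

/* Release the whole block, header included. */
void model_free(char *s)
{
    if (s != NULL)
        free(s - sizeof(hla_header_model));
}
```

Because the pointer refers to the first character and the data ends with a zero byte, ordinary C routines such as strlen work on the model string without knowing about the header, which is the compatibility this section describes.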

Going the other direction, converting a C/C++ string to an HLA string, is not quite as trivial (though still easy to accomplish using appropriate code in the HLA Standard Library). Before describing how to convert C/C++ strings to the HLA format, however, it is important to point out that HLA is assembly language and assembly language is capable of working with any string format you can dream up (including C/C++ strings). If you've got some code that produces a zero-terminated C/C++ string, you don't have to convert that string to an HLA string in order to manipulate it within your HLA programs. Although the HLA string format is generally more efficient (faster) because the algorithms that process HLA strings can be written more efficiently, you can write your own zero-terminated string functions and, in fact, there are several common zero-terminated string functions found in the HLA Standard Library. However, as just noted, HLA strings are generally more efficient, so if you're going to be doing a bit of processing on a zero terminated string, it's probably wise to convert it to an HLA string first. On the other hand, if you're only going to do one or two trivial operations on a zero-terminated string that the Win32 API returns, you're better off manipulating it directly as a zero-terminated string and not bothering with the conversion (of course, if convenience is your goal rather than efficiency, it's probably always better to convert the string to the HLA string format upon return from a Win32 API call just so you don't have to work with two different string formats in your assembly code). Of course, the other option open to you is to work exclusively with zero-terminated strings in your assembly code (though the HLA Standard Library doesn't provide anywhere near the number of zero-terminated string functions).

The purpose of this chapter is to explain how to convert C/C++ based Windows documentation to a form suitable for assembly language programmers and describe how assembly language programmers can call the C-based functions that make up the Win32 API set. Therefore, we won't get into the details of converting string function calls in C/C++ to their equivalent (or similar) HLA Standard Library calls. Rest assured, the HLA Standard Library provides a much richer set of string functions than does the C Standard Library; so anything you find someone doing in C/C++ can be done just as easily (and usually more efficiently) using an HLA Standard Library function. Please consult the HLA Standard Library documentation for more details. In this section, we'll simply cover those routines that are necessary for converting between the two different string formats and copying string data from one location to another.

In C/C++ code, it is very common to see the program declare a string object (character pointer) and initialize the pointer with the address of a literal string constant all in one operation, e.g.,


 
	char *str = "Literal String Constant";
 

 

Remember, this doesn't actually copy any character data; it simply loads the character pointer variable str with the address of the `L' character in this string literal constant. The compiler places the literal string data somewhere in memory and either initializes str with the address of the start of that character data (if str is a global or static variable) or it generates a short (generally one instruction) machine code sequence to initialize a run-time variable with the address of this string constant. Note, and this is very important, many compilers will allocate the actual string data in read-only memory. Therefore, as long as str continues to point at this literal string constant's data in memory, any attempt to store data into the string will likely produce a memory protection fault. Some very old C programs may have made the assumption that literal string data appears in read/write memory and would overwrite the characters in the string literal constant at run-time. This is generally not allowed by modern C/C++ compilers. HLA, by default, also places string literal data in write-protected memory. So if you encounter some older C code that overwrites the characters in a string literal constant, be aware of the fact that you will not (generally) be able to get away with this in HLA.

If str (in the previous example) is a static or global variable, converting this declaration to assembly language is very easy; you can use a declaration like the following:


 
static
 
	HLAstr  :string := "Literal String Constant";
 

 

Remember, like C/C++, HLA implements string objects using pointers. So HLAstr is actually a four-byte pointer variable and this code initializes that pointer with the address of the first character in the literal string constant.

Because HLA string variables are pointers that (when properly initialized) point at a sequence of characters that end with a zero byte (among other things), you can use an HLA string variable just about anywhere C/C++ expects a string object. So, for example, if you want to make some Win32 API call that requires a string parameter, you can pass in HLAstr as that parameter, e.g.,

Win32APIcall( HLAstr );
 

Of course, you don't have to assign the address of a string literal constant to an HLA string variable to make this function call; HLA allows you to pass the literal string constant directly:

Win32APIcall( "Literal String Constant" );
 

Note that this call does not pass the string data directly to the function; like C/C++, HLA continues to allocate and initialize the literal string constant in write-protected memory somewhere else and simply passes the address of the literal string constant as the function's parameter.

For actual string variable objects (that is, strings whose character values you can change at run-time), you'll typically find two different ways of variable declaration/allocation in a C/C++ program: the programmer will either create a character array with sufficient storage to hold the characters in the string, or the programmer will simply declare a character pointer variable and allocate storage for the string at run-time using a function like malloc (C) or new (C++). We'll look at both of these mechanisms in HLA in the following paragraphs.

One way to allocate storage for a C string variable is to simply declare a character array with sufficient room to hold the longest possible string value you will assign plus an extra byte for the zero terminator. To do this, a C/C++ programmer simply declares a character array with n+1 elements, where n is the maximum number of characters they expect to put into the string. The following is a typical "string" declaration in C/C++ that creates a string capable of holding up to 255 characters:

char myStr[256];  // 255 chars plus a zero terminating byte
 

Although HLA most certainly allows you to create arrays of characters, such arrays are not directly compatible with HLA strings (because they don't reserve extra storage prior to the string object for the length and maximum length fields that are present in the HLA string format). Therefore, you cannot allocate storage for an HLA string variable by simply doing the following:


 
static
 
		myStr :char[256];  // Does NOT create an HLA string!
 

 

You can use this technique to pre-allocate storage for a zero-terminated string in HLA, but this does not create the proper storage for an HLA string variable.

In general, HLA expects you to allocate storage for string objects dynamically (that is, at run-time). However, for static strings (i.e., non-automatic string variables) the HLA Standard Library does provide a macro, str.strvar, that lets you declare a string variable and allocate storage for the string variable at compile-time. Here is an example of the use of this macro:


 
static  //Note: str.strvar only makes sense in the STATIC section
 
	preAllocated :str.strvar( 255 ); // Allows strings up to 255 characters long.
 

 

There are a couple of important things to note about this declaration. First of all, note that you surround the size of the string with parentheses, not square brackets. This is because str.strvar is actually a macro, not a type and the 255 that appears in this example is a macro argument. The second thing to note is that the use of the str.strvar macro only makes sense in the static declaration section. It is illegal in a var or storage section because this macro initializes the variable you're declaring (something you can't do in a var or storage declaration section). While the use of the str.strvar macro is technically legal in HLA's readonly declaration section, it doesn't make sense to use it there since doing so would allocate the string data storage in write-protected memory, so you'd never be able to store any character data into the string. Also note that the str.strvar macro doesn't provide any mechanism for initializing the character data in the string within the declaration (not that it would be that hard to create a new macro that does this). Finally, note that you specify the maximum number of characters you want to allow in the string as the str.strvar argument. You do not have to account for any zero-terminating bytes, the current string length field, or the maximum length field.

Although the str.strvar macro is very convenient to use, it does have a couple of serious limitations. Specifically, it only makes sense to use it in the static section of an HLA program and you must know the maximum length of the string before you can use the str.strvar macro to declare a string object. Both of these issues are easy to resolve by using dynamically allocated string objects. C/C++ programmers can also create dynamically allocated strings by declaring their string variables as pointers to characters, e.g.,


 
	char *strAsPtr;
 
		.
 
		.
 
		.
 
	// Allocate storage for a 255 character string at run-time
 

 
	strAsPtr = malloc( 256 );  // 255 chars + zero terminating byte
 

 
	// Or (in C++):
 

 
	strAsPtr = new char[256];
 

 

In HLA, the declaration and allocation is very similar:


 
var  // Could also be static or storage section
 
	strAsPtr :string;
 
		.
 
		.
 
		.
 
	stralloc( 255 );
 
	mov( eax, strAsPtr );
 

 

Do note a couple of things about the HLA usage. First, note that you specify the actual number of characters you wish to allow in the string; you don't have to worry about the zero terminating byte. Second, you should call the HLA Standard Library stralloc function (rather than malloc, which HLA also provides) in order to allocate storage for a string at run time; in addition to allocating storage for the character data in the string, stralloc also allocates (additional) storage for the zero terminating byte, the maximum length value, and the current length value. The stralloc function also initializes the string to the empty string (by setting the current length to zero and storing a zero in the first character position of the string data area). Finally, note that stralloc returns the address of the first character of the string in EAX so you must store that address into the string variable. HLA does support a feature known as instruction composition, so you could actually write these two instructions as follows:

mov( stralloc( 255 ), strAsPtr );
 

The drawback to this form, however, is that it hides the fact that this code sequence wipes out the EAX register. So be careful if you decide to use this form.

Once you've initialized an HLA string variable so that it points at valid string data, you can generally pass that string variable to a C/C++ function (i.e., a Win32 API function) that expects a string value. The only restriction is that if the C/C++ function modifies the string data in such a way that it changes the length of the string, HLA will not recognize the change if you continue to treat the string as an HLA type string upon return from the C/C++ function. Likewise, if a C/C++ function returns a pointer to a string that the function has created itself, your HLA functions cannot treat that string as an HLA string because it's probably just a zero-terminated string. Fortunately, there are only a small number of Win32 API functions that return a string or modify an existing string, so you don't have to worry about this very often. However, that's a problem in and of itself; you may have to deal with this problem so infrequently that it slips your mind whenever you call a function that returns such a string. As a concrete example, consider the following (HLA) prototype for the Win32 GetFullPathName function:


 
static
 
	GetFullPathName: procedure
 
	(
 
	       lpFileName    : string;
 
	       nBufferLength : dword;
 
	   var lpBuffer      : var;
 
	   var lpFilePart    : var
 
	);
 
	@stdcall; @returns( "eax" ); @external( "__imp__GetFullPathNameA@16" );
 

 

This function, given the address of a zero-terminated string containing a filename (lpFileName), the length of a buffer (nBufferLength), the address of a buffer (lpBuffer), and the address of a pointer variable (lpFilePart) will convert the filename you pass in to a full drive letter/path/filename string. Specifically, this function returns a string in the buffer whose address you specify in the lpBuffer parameter. However, this function does not create an HLA-compatible string; it simply creates a zero-terminated string.

Fortunately, most functions in the Win32 API that return string data do two things for you that will help make your life easier. First of all, Win32 API functions that store string data into your variables generally require that you pass a maximum length for the string (e.g., the nBufferLength parameter in the prototype above). The function will not write any characters to the buffer beyond this maximum length (doing so could corrupt other data in memory). The other important thing this function does (which is true for most Win32 API functions that return string data) is that it returns the length of the string in the EAX register; the function returns zero if there was some sort of error. Because of the way this function works, converting the return result to an HLA string is nearly trivial. Consider the following call to GetFullPathName:


 
static
 
	fullName :string;
 
	namePtr  :pointer to char;
 
		.
 
		.
 
		.
 
	stralloc( 256 );       // Allocate sufficient storage to hold the string data.
 
	mov( eax, fullName );
 
		.
 
		.
 
		.
 
	mov( fullName, edx );
 
	GetFullPathName
 
	(
 
		"myfile.exe",                          // File to get the full path for.
 
		(type str.strRec [edx]).MaxStrLen,     // Maximum string size
 
		[edx],                                 // Pointer to buffer
 
		namePtr                                // Address of base name gets stored here
 
	);
 
	mov( fullName, edx );                     // Note: Win32 calls don't preserve EDX
 
	mov( eax, (type str.strRec [edx]).length );  // Set the actual string length
 

 

The nice thing about most Win32 API calls that return string data is that they guarantee that they won't overwrite a buffer you pass in (you also pass in the maximum string length, which is available in the MaxStrLen field of the string object, that is, at offset -8 from the string pointer). These string functions also return the string length as the function's result, so you can shove this into the HLA string's length field (at offset -4 from the string's base address) immediately upon return. This is a very efficient way to convert C/C++ strings to HLA format strings upon return from a function.

Of course, converting a C/C++ string to an HLA string is only easy if the C/C++ function you're calling returns the length of the string it has processed. It also helps if the function guarantees that it won't overstep the bounds of the string variable you've passed it (i.e., it accepts a MaxStrLen parameter and won't write any data beyond the maximum buffer size you've specified). Although most Win32 API functions that return string data operate this way (respect a maximum buffer size and return the length of the actual string), there are many C/C++ functions you may need to call that won't do this. In such a case, you've got to compute the length of the string yourself (and guarantee that your character buffer is large enough to hold the maximum possible string the function will produce). Fortunately, there is a function in the HLA Standard Library, str.zlen, that will compute the length of a zero terminated string so you can easily update the length field of an HLA string object that a C/C++ function has changed (without respect to the HLA string's length field). For example, suppose you have a C/C++ function named fstr that expects the address of a character buffer where it can store (or modify) a zero-terminated string. Since HLA strings are zero-terminated, you can pass an HLA string to this function. However, if fstr changes the length of the string, the function will not update the HLA string's length field and the result will be inconsistent. You can easily correct this by computing the length of the string yourself and storing the length into the HLA string's length field, e.g.,


 
static
 
	someStr  :str.strvar( 255 );  // Allow strings up to 255 characters in length.
 
		.
 
		.
 
		.
 
	fstr( someStr );     // Assume this changes the length of someStr.
 

 
	mov( someStr, edx ); // Get the pointer to the string data structure.
 
	str.zlen( edx );     // Compute the new length of someStr.
 
	mov( eax, (type str.strRec [edx]).length );  // Update the length field.
 

 

One thing to keep in mind is that str.zlen has to search past every character in the string to find the zero-terminating byte in order to compute the string's length. This is not a particularly efficient operation, particularly if the string is long. Although str.zlen uses a fairly decent algorithm to search for the terminating zero byte, the amount of time it takes to execute is still proportional to the length of the string. Therefore, you want to avoid calling str.zlen if at all possible. Fortunately, many C/C++ string functions (that modify string data) return the length as the function result, so calling str.zlen isn't always necessary.
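The C sketch below illustrates why a returned length saves work. The make_path function is hypothetical. It merely imitates the behavior described above, honoring a maximum buffer size and returning the number of characters it produced (or zero on failure). The length it hands back matches what a full scan with strlen would find, so the caller has no need to rescan the string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical producer: builds "dir\file" in buf without writing past
   buf_size bytes. Returns the number of characters produced (not counting
   the zero terminator), or 0 if the result does not fit or an error occurs. */
size_t make_path(char *buf, size_t buf_size, const char *dir, const char *file)
{
    int n = snprintf(buf, buf_size, "%s\\%s", dir, file);

    if (n < 0 || (size_t)n >= buf_size)
        return 0;                  /* error or truncated result */
    return (size_t)n;              /* same value strlen(buf) would compute */
}
```

A caller that keeps a separate length field can store the returned value directly; calling strlen (or str.zlen in HLA) is only necessary when the function doesn't report the length.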

Although not explicitly shown in the example above, do keep in mind that many string functions (especially Win32 API functions) assign double-duty to the function return result; they'll return a positive value if the function successfully produces a string and they'll return a negative result (or sometimes a zero result) if there was an error. For example, the GetFullPathName function in the earlier example returns zero if there was a problem producing the string. Your code should check for errors on return from these functions to prevent problems. While shoving a zero into a string length field isn't cause for concern (indeed, that's perfectly reasonable), a negative number will create all kinds of problems (since the HLA string functions treat the length field as an uns32 value, those functions will interpret a negative number as a really large positive value).
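The following C sketch shows the hazard numerically and one way you might guard against it. Both function names are invented for this illustration, and treating a result larger than the buffer's maximum length as a failure is an assumption of this sketch rather than a documented rule for any particular API.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a signed 32-bit result as unsigned, the way code that reads
   an uns32 length field would see it. */
uint32_t as_uns32(int32_t result)
{
    return (uint32_t)result;
}

/* Returns 1 if result looks like a usable string length: strictly positive
   and no larger than the maximum length of the destination string. */
int result_is_valid_length(int32_t result, uint32_t max_len)
{
    return result > 0 && (uint32_t)result <= max_len;
}
```

An error code of -1 read back as an unsigned value becomes 4,294,967,295, which is why you want a check like this before storing a return result into a length field.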

3.2.2.5: C++ and Assembly Class Types

C++ and HLA both support classes. Since HLA is an assembly language, it should be obvious that anything you can do with a C++ class you can do in HLA (since C++ compilers convert C++ source code into x86 machine code, it should be obvious that you can achieve anything possible in C++ using assembly language). However, directly translating C++ classes into HLA classes is not a trivial matter. Many constructs transfer across directly. Some constructs in C++, however, do not have a direct correspondence in HLA. Quite honestly, the conversion of C++ classes to HLA classes is beyond the scope of this book; fortunately, we're dealing with the Win32 API in this book so we won't see much Windows source code that uses classes in typical documentation we're interested in. There is quite a bit of C++-based object-oriented code out there for Windows but most of it uses the Microsoft Foundation Classes (MFC) class library, and that's not applicable to assembly language programming (at least, not today; someday there might be an "Assembly Language Foundation Class" library but until that day arrives we don't have to worry about MFC). Since this book covers Win32 API programming, there really is no need to worry about converting C++ classes into assembly - the Win32 API documentation doesn't use classes. This book may very well use HLA classes and objects in certain programming examples, but that will be pure assembly language, not the result of translating C++ code into HLA code.

3.2.3: C and Assembly Pointer Types7

When running under 32-bit versions of the Windows operating system, C pointers are always 32-bit values and map directly to 32-bit flat/linear machine addresses. This is true regardless of what the pointer references (including both data and functions in C). At one level, the conversion of a pointer from C to HLA/assembly is fairly trivial: just create a dword variable in your HLA program and keep the pointer there. However, HLA (like C) supports typed pointers and procedure pointers; generally, it's a good idea to map C pointers to their HLA typed-pointer equivalent.

In C, you can apply the address-of operator, `&' to an object (e.g., a static variable or a function name) to take the address of that object. In HLA, you may use this same operator to take the address of a static object (a static/storage/readonly variable, statement label, or procedure name). The result is a 32-bit value that you may load into a register or 32-bit memory variable. For example, if in C you have a statement like the following (pi is a pointer to an integer and i is a static integer variable):

pi = &i;
 

You convert this to HLA syntax as follows:

mov( &i, pi );
 

If the object you're taking the address of with the `&' operator is not a function name or a static (non-indexed) variable, then you must compute the address of the object at run-time using the lea instruction. For example, automatic variables (non-static local variables) fall into this class. Consider the following C function fragment:


 
	int f( void )
 
	{
 
		int *pi;   // Declares a variable that is a pointer to an integer.
 
		int i;     // Declares a local integer variable using automatic allocation.
 
 
 
		pi = &i;
 
			.
 
			.
 
			.
 
	}
 

 

Because i is an automatic variable (the default storage class for local variables in a function), HLA cannot statically compute this address for you at compile time. Instead, you would use the lea instruction as follows:


 
	procedure f;
 
	var
 
		pi: pointer to int32;  // Declare HLA pointers this way.
 
		i:  int32;             // Declares an automatic variable in HLA (in the VAR section)
 
			.
 
			.
 
			.
 
		lea( eax, i );    // Compute the run-time address of i
 
		mov( eax, pi );   // Save address in pi.
 
			.
 
			.
 
			.
 

 

You should also note that you need to use the lea instruction when accessing an indexed object, even if the base address is a static variable. For example, consider the following C/C++ code that takes the address of an array element:

	int *pi;
 
	static int array[ 16 ];
 
		.
 
		.
 
		.
 
	pi = &array[i];
 

 

In order to access an array element in assembly language, you will need to use a 32-bit register as an index register and compute the actual element address at run-time rather than at compile-time (assuming that i in this example is a variable rather than a constant). Therefore, you cannot use the `&' operator to statically compute the address of this array element at compile-time; instead, you should use the lea instruction as follows:


 
	mov( i, ebx );               // Move index into a 32-bit register.
 
	lea( eax, array[ ebx*4 ] );  // int objects are four bytes under Win32
 
	mov( eax, pi );
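To see that the scaled-index computation matches what C does, the sketch below computes the element address both ways: once with `&array[i]' and once by hand as the base address plus i times the element size. The function names are hypothetical. Under Win32, sizeof(int) is four, which corresponds to the *4 scale factor in the lea instruction; the sketch uses sizeof(int) so that it remains correct on other targets.

```c
#include <assert.h>
#include <stddef.h>

static int array[16];

/* Address of element i as the C compiler computes it. */
int *element_address(size_t i)
{
    return &array[i];
}

/* Same address computed by hand: base + i * element size. This mirrors
   lea( eax, array[ ebx*4 ] ) when int objects are four bytes long. */
int *element_address_by_hand(size_t i)
{
    return (int *)((unsigned char *)array + i * sizeof(int));
}
```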
 

 

As you've seen in these simple examples, C uses the unary `*' operator to declare pointer objects. Anytime you see a declaration like the following in C:


 
	typename *x;
 

 

you can convert it to a comparable HLA declaration as follows:


 
	x:pointer to hla_typename;  // hla_typename is the HLA equivalent of C's typename
 

 

Of course, since all pointers are simply 32-bit objects, you can also convert to assembly language using a statement like the following:


 
	x:dword;
 

 

Both forms are equivalent in assembly language, though the former version is preferable since it's a little more descriptive of what's going on here.

Function pointers in C are one area where the C syntax can get absolutely weird. A simple C function pointer declaration takes the following form:


 
	returnType (*functionPtrName) ( parameters );
 

 

For example,


 
	int (*ptrToFuncReturnsInt) ( int i, int j );
 

 

This example declares a pointer to a function that takes two integer parameters and returns an integer value. Note that the following is not equivalent to this example:


 
	int *ptrToFuncReturnsInt( int i, int j );  //Not a function pointer declaration!
 

 

This example is a prototype for a function that returns a pointer to an integer, not a function pointer. C's syntax is a little messed up (i.e., it wasn't thought out properly during the early stages of the design), so you can get some real interesting function pointer declarations; some are nearly indecipherable.
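The difference between the two declarations is easier to see in a small compilable sketch. The functions add, returnsPtrToInt, and call_through_pointer are invented for this illustration, and the function that returns a pointer is given its own name here so that both declarations can coexist in one file.

```c
#include <assert.h>

/* An ordinary function with two int parameters and an int result. */
int add(int i, int j)
{
    return i + j;
}

static int product;   /* storage that returnsPtrToInt hands back a pointer to */

/* NOT a function pointer: an ordinary function that returns a pointer to int. */
int *returnsPtrToInt(int i, int j)
{
    product = i * j;
    return &product;
}

/* A function pointer: the parentheses around *ptrToFuncReturnsInt make the
   variable itself a pointer to a function that returns int. */
int (*ptrToFuncReturnsInt)(int i, int j);

/* Point the variable at add, then call add through the pointer. */
int call_through_pointer(int a, int b)
{
    ptrToFuncReturnsInt = add;
    return ptrToFuncReturnsInt(a, b);
}
```

Calling returnsPtrToInt yields an address you must dereference to get at the integer, whereas ptrToFuncReturnsInt holds an address you call.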

Of course, at the assembly language level all pointers are just dword variables. So all you really need to implement the C function pointer in HLA is a statement like the following:


 
	ptrToFuncReturnsInt: dword;
 

 

You can call this function in HLA using the call instruction as follows:


 
	call( ptrToFuncReturnsInt );
 

 

As with data pointers, however, HLA provides a better solution: procedure variables. A procedure variable is a pointer object that (presumably) contains the address of some HLA procedure. The advantage of a procedure variable over a dword variable is that you can use HLA's high-level calling syntax with procedure variables, something you cannot do with a dword variable. Here's an example of a procedure variable declaration in HLA:


 
	ptrToFuncReturnsInt: procedure( i:int32; j:int32 );
 

 

HLA allows you to call the code whose address this variable contains using the standard HLA procedure call syntax, e.g.,


 
	ptrToFuncReturnsInt( 5, intVar );
 

 

To make this same call using a dword variable8, you'd have to manually pass the parameters yourself as follows (assuming you pass the parameters on the stack):


 
		push( 5 );       // Pass first parameter (i)
 
		push( intVar );  // Pass second parameter.
 
		call( ptrToFuncReturnsInt );
 

 

In C, any time a function name appears without the call operator attached to it (the "call operator" is a set of parentheses that may contain optional parameters), C substitutes the address of the function in place of the name. You do not have to supply the address-of operator (`&') to extract the function's address (though it is legal to go ahead and do so). So if you see something like the following in a C program:

ptrToFuncReturnsInt = funcReturnsInt;
 

where funcReturnsInt is the name of a function that is compatible with ptrToFuncReturnsInt's declaration (e.g., it returns an integer result and has two integer parameters in our current examples), this code is simply taking the address of the function and shoving it into ptrToFuncReturnsInt exactly as though you'd stuck the `&' operator in front of the whole thing. In HLA, you can use the `&' operator to take the address of a function (they are always static objects as far as the compiler is concerned) and move it into a 32-bit register or variable (e.g., a procedure variable). Here's the code above rewritten in HLA:

mov( &funcReturnsInt, ptrToFuncReturnsInt );
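
If you'd like to convince yourself that the `&' is optional in C, a small sketch along these lines should do it. The body of funcReturnsInt and the names ptrA, ptrB, and assign_both_ways are assumptions made up for this example.

```c
/* Hypothetical function body, for illustration only. */
int funcReturnsInt(int i, int j)
{
    return i * j;
}

int (*ptrA)(int, int);
int (*ptrB)(int, int);

void assign_both_ways(void)
{
    ptrA = funcReturnsInt;    /* Name alone: C substitutes the address.   */
    ptrB = &funcReturnsInt;   /* Explicit address-of: the same address.   */
}
```

After assign_both_ways runs, ptrA and ptrB compare equal, and either one can be used to call funcReturnsInt.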
 

Both C and HLA allow you to initialize static variables (including pointer variables) at compile time. In C, you could do this as follows:

static int (*ptrToFuncReturnsInt) ( int i, int j) = funcReturnsInt;
 

The comparable statement in HLA looks like this:


 
procedure funcReturnsInt( i:int32; j:int32 );
 
begin funcReturnsInt;
 
		.
 
		.
 
		.
 
end funcReturnsInt;
 
		.
 
		.
 
		.
 
static
 
	ptrToFuncReturnsInt:procedure( i:int32; j:int32 ) := &funcReturnsInt;
 

 

Especially note that in HLA, unlike C, you still have to use the `&' operator when taking the address of a function name.

Note that you cannot initialize HLA automatic variables (var variables) using a statement like the one in this example. Instead, you must move the address of the function into the pointer variable using the mov instruction given a little earlier.

Arrays are another object that C treats specially with respect to pointers. As with functions, C will automatically supply the address of an array if you specify the name of an array variable without the corresponding index operator (the square brackets). HLA requires that you explicitly take the address of the array variable. If the array is a static object (static/readonly/storage) then you may use the (static) address-of operator, `&'; however, if the variable is an automatic (var) object, then you have to take the address of the object at run-time using the lea instruction:

static
 
	staticArray  :byte[10];
 
var
 
	autoArray    :byte[10];
 
		.
 
		.
 
		.
 
	mov( &staticArray, eax );   // Can use `&' on static objects
 
	lea( ebx, autoArray );      // Must use lea on automatic (VAR) objects.
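
On the C side, the same two arrays might be handled as in the following sketch. The function names are hypothetical; the array names simply mirror the HLA declarations above. Note that C uses the bare array name in both cases and leaves it to the compiler to decide whether the address is a link-time constant or a run-time (stack-relative) value.

```c
static char staticArray[10];

/* Returns nonzero if the bare array name and &staticArray[0]
   denote the same address. */
int array_name_is_first_element(void)
{
    char *viaName = staticArray;       /* No `&' and no index needed.  */
    char *viaAddr = &staticArray[0];   /* Explicit form, same address. */
    return viaName == viaAddr;
}

/* Fills an automatic array through a pointer and sums its elements. */
int sum_auto_array(void)
{
    char autoArray[10];
    char *p = autoArray;   /* Address is computed at run time. */
    int i;
    int sum = 0;

    for (i = 0; i < 10; ++i)
        p[i] = (char)i;
    for (i = 0; i < 10; ++i)
        sum += autoArray[i];
    return sum;            /* 0 + 1 + ... + 9 */
}
```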
 

 

C doesn't automatically substitute the address of a structure or union whenever it encounters a struct or union variable. You have to explicitly take the address of the object using the `&' operator. In HLA, taking the address of a structure or union variable is very easy: if it's a static object you can use the `&' operator; if it's an automatic (var) object, you have to use the lea instruction to compute the address of the object at run-time, just like other variables in HLA.
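
Here is a short C sketch of the structure case. The type struct point and the functions set_origin and struct_address_demo are hypothetical names invented for this example.

```c
/* Hypothetical structure type, for illustration only. */
struct point
{
    int x;
    int y;
};

/* Expects the ADDRESS of a struct point. */
void set_origin(struct point *p)
{
    p->x = 0;
    p->y = 0;
}

int struct_address_demo(void)
{
    struct point pt = { 3, 4 };

    /* set_origin( pt ) would not compile: unlike an array name, a
       struct name stands for the whole object, not its address.   */
    set_origin(&pt);
    return pt.x + pt.y;
}
```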

There are a couple of different ways that C/C++ allows you to dereference a pointer variable9. First, as you've already seen, to dereference a function pointer you simply "call" the function pointer the same way you would directly call a function: you append the call operator (parentheses and possible parameters) to the pointer name. As you've already seen, you can do the same thing in HLA as well. Technically, you could also dereference a function pointer in C/C++ as follows:

(*ptrToFuncReturnsInt)( 5, intVar );
 

However, you'll rarely see this syntax in an actual C/C++ source file. You may convert an indirect function call in C/C++ to either HLA's high-level or low-level syntax, e.g., the following two calls are equivalent:


 
		push( 5 );
 
		push( intVar );
 
		call( ptrToFuncReturnsInt );
 

 
// -or-
 

 
		ptrToFuncReturnsInt( 5, intVar );
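
The two C call forms can be checked against each other with a sketch like the following. The helper sumOfTwo and the wrapper call_both_ways are assumptions introduced only for this illustration.

```c
/* Hypothetical target function, for illustration only. */
static int sumOfTwo(int i, int j)
{
    return i + j;
}

/* Calls through the pointer both ways; returns the common result,
   or -1 if the two forms ever disagreed. */
int call_both_ways(int intVar)
{
    int (*ptrToFuncReturnsInt)(int i, int j) = sumOfTwo;

    int a = ptrToFuncReturnsInt(5, intVar);      /* Usual form.          */
    int b = (*ptrToFuncReturnsInt)(5, intVar);   /* Explicit dereference. */

    return (a == b) ? a : -1;
}
```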
 

 

Dereferencing a pointer to a data object is a bit more exciting in C/C++. There are several ways to dereference pointers depending upon the underlying data type that the pointer references. If you've got a pointer that refers to a scalar data object in memory, C/C++ uses the unary `*' operator. For example, if pi is a pointer that contains the address of some integer in memory, you can reference the integer value using the following C/C++ syntax:


 
		*pi = i+2;  // Store the sum i+2 into the integer that pi points at.
 
		j = *pi;    // Grab a copy of the integer's value and store it into j.
 

 

Converting an indirect reference to assembly language is fairly simple. The only gotcha, of course, is that you must first move the pointer value into an 80x86 register before dereferencing the pointer. The following HLA examples demonstrate how to convert the two C/C++ statements in this example to their equivalent HLA code:


 
	// *pi = i + 2;
 

 
	mov( pi, ebx );    // Move pointer into a 32-bit register first!
 
	mov( i, eax );     // Compute i + 2 and leave sum in EAX
 
	add( 2, eax );
 
	mov( eax, [ebx] ); // Store i+2's sum into the location pointed at by EBX.
 

 
	// j = *pi;
 

 
	mov( pi, ebx );    // Only necessary if EBX no longer contains pi's value!
 
	mov( [ebx], eax ); // Only necessary if EAX no longer contains *pi's value!
 
	mov( eax, j );     // Store the value of *pi into j.
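
For reference, the two C statements only make sense once pi actually points at something. A minimal, self-contained C sketch (the variable target and the function deref_demo are hypothetical) might look like this:

```c
/* Stores i+2 through a pointer and reads it back through the same pointer. */
int deref_demo(int i)
{
    int target = 0;      /* The integer that pi will point at.     */
    int *pi = &target;   /* pi must hold a valid address first.    */
    int j;

    *pi = i + 2;         /* Store the sum into the pointed-at int. */
    j = *pi;             /* Fetch it back through the pointer.     */
    return j;
}
```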
 

 

If you've got a pointer that holds the address of a sequence of data values (e.g., an array), then there are two completely different (but equivalent) ways you can indirectly access those values. One way is to use C/C++'s pointer arithmetic syntax, the other is to use array syntax. Assuming pa is a pointer to an array of integers, the following example demonstrates these two different syntactical forms in action:


 
	*(pa+i) = j;   // Stores j into the ith object beyond the address held in pa.
 
	pa[i] = j;     // Stores j into the ith element of the array pointed at by pa.
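
A small C sketch can be used to check that the two forms above touch the same memory. The array storage and the function store_both_ways are hypothetical names used only here.

```c
/* Returns 1 if pointer-arithmetic and array syntax refer to the
   same elements, 0 otherwise. */
int store_both_ways(void)
{
    int storage[4] = { 0, 0, 0, 0 };
    int *pa = storage;

    *(pa + 1) = 11;   /* Pointer arithmetic syntax. */
    pa[2] = 22;       /* Array syntax.              */

    /* Read each element back using the OTHER syntax. */
    return (pa[1] == 11) && (*(pa + 2) == 22) && (&pa[3] == pa + 3);
}
```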
 

 

The important thing to note here is that both forms are absolutely equivalent and almost every C/C++ compiler on the planet generates exactly the same code for these two statements. Since compilers generally produce exactly the same code for these two statements, it should come as no surprise that you would manually convert these two statements to the same assembly code sequence. Conversion to assembly language is slightly complicated by the fact th