Chapter 12
Mixed-Language Programming
Mixed-language programming allows you to combine the unique strengths of Microsoft Basic, C, C++, and FORTRAN with your assembly-language routines. Any one of these languages can call MASM routines, and you can call any of these languages from within your assembly-language programs. This makes virtually all the routines from high-level-language libraries available to a mixed-language program.
MASM 6.1 provides mixed-language features similar to those in high-level languages. For example, you can use the INVOKE directive to call high-level-language procedures, and the assembler handles the argument-passing details for you. You can also use H2INC to translate C header files to MASM include files, as explained in Chapter 20 of Environment and Tools.
The mixed-language features of MASM 6.1 do not make older methods of defining mixed-language interfaces obsolete. In most cases, mixed-language programs written with earlier versions of MASM will assemble and link correctly under MASM 6.1. (For more information, see Appendix A.)
This chapter explains how to write assembly routines that can be called from high-level-language modules and how to call high-level-language routines from MASM. You should already understand the languages you want to combine and should know how to write, compile, and link multiple-module programs with these languages.
This chapter covers only the assembly-language interface with C, C++, Basic, and FORTRAN; it does not cover mixed-language programming between high-level languages. The focus here is the Microsoft versions of C, C++, Basic, and FORTRAN, but the same principles apply to other languages and compilers. Many of the techniques used in this chapter are explained in the material in Chapter 7 on writing procedures in assembly language, and in Chapter 8 on multiple-module programming.
The first section of this chapter discusses naming and calling conventions. The next section, Writing an Assembly Procedure for a Mixed-Language Program, provides a template for writing an assembly-language procedure that can be called from another module written in a high-level language. This represents the essence of mixed-language programming. Assembly language is often used for creating fast secondary routines in a large program written in a high-level language.
The third section describes specific conventions for linking assembly-language procedures with modules in C, C++, Basic, and FORTRAN. These language-specific sections also provide details on how the language manages various data structures so that your MASM programs are compatible with the data from the high-level language.
Each language has its own set of conventions, which fall into two categories:
The naming convention specifies how or if the compiler or assembler alters the name of an identifier before placing it into an object file.
The calling convention determines how a language implements a call to a procedure and how the procedure returns to the caller.
MASM supports several different conventions. The assembler uses C convention when you specify a language type (langtype) of C, and Pascal convention for language types PASCAL, BASIC, or FORTRAN. To the assembler, the keywords BASIC, PASCAL, and FORTRAN are synonymous. MASM also supports the SYSCALL and STDCALL conventions, which mix elements of the C and Pascal conventions.
MASM gives you several ways to set the naming and calling conventions in your assembly-language program. Using .MODEL with a langtype sets the default for the module. The OPTION LANGUAGE directive sets the same default, as does the /Gc (Pascal-style) or /Gd (C-style) option on the command line. Procedure prototypes and declarations can specify a langtype to override the default.
When you write mixed-language routines, the easiest way to ensure convention compatibility is to adopt the conventions of the called procedure's language. However, Microsoft languages can change the naming and calling conventions for different procedures. If your program must call a procedure that uses an argument-passing method different from that of the default language, prototype the procedure first with the desired language type. This tells the assembler to override the conventions of the default language and assume the proper conventions for the prototyped procedure. The MASM/High-Level-Language Interface section in this chapter explains how to change the default conventions. The following sections provide more detail on the information summarized in Table 12.1.
Table 12.1 Naming and Calling Conventions
Convention | C | SYSCALL | STDCALL | BASIC | FORTRAN | PASCAL
Leading underscore | X | | X | | |
Capitalize all | | | | X | X | X
Arguments pushed left to right | | | | X | X | X
Arguments pushed right to left | X | X | X | | |
Caller stack cleanup | X | X | * | | |
:VARARG allowed | X | X | X | | |
* The STDCALL language type uses caller stack cleanup if the :VARARG parameter is used. Otherwise, the called routine must clean up the stack.
Naming convention refers to the way a compiler or assembler stores the names of identifiers. The first two rows of Table 12.1 show how each language type affects symbol names. SYSCALL leaves symbol names as they appear in the source code, but C and STDCALL add an underscore prefix. PASCAL, BASIC, and FORTRAN change symbols to all uppercase.
The following list describes how these naming conventions affect a variable called BigTime in your source code:
Langtype Specified | Characteristics
SYSCALL | Leaves the name unmodified. The linker sees the variable as BigTime.
C, STDCALL | The assembler (or compiler) adds a leading underscore to the name, but does not change case. The linker sees the variable as _BigTime.
PASCAL, FORTRAN, BASIC | Converts all names to uppercase. The linker sees the variable as BIGTIME.
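The naming rules above can be modeled in a few lines of C. This is only an illustrative sketch: the `LangType` enum and the `decorate_name` function are hypothetical names invented for this example and are not part of MASM or any Microsoft library. It applies only the two transformations listed in Table 12.1 (leading underscore, uppercase conversion).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum for illustration; PASCAL also stands for BASIC and
   FORTRAN, which the assembler treats as synonyms. */
typedef enum { LANG_C, LANG_SYSCALL, LANG_STDCALL, LANG_PASCAL } LangType;

/* Writes the linker-visible form of `name` into `out` (capacity `cap`).
   Returns 0 on success, or -1 if the buffer is too small. */
int decorate_name(const char *name, LangType lang, char *out, size_t cap)
{
    size_t len = strlen(name);
    /* C and STDCALL add one leading underscore; the others add nothing. */
    size_t prefix = (lang == LANG_C || lang == LANG_STDCALL) ? 1 : 0;
    size_t i;

    assert(out != NULL);
    if (len + prefix + 1 > cap)
        return -1;

    if (prefix)
        out[0] = '_';
    for (i = 0; i < len; i++) {
        unsigned char ch = (unsigned char)name[i];
        /* Only the Pascal family converts the name to uppercase. */
        out[prefix + i] = (lang == LANG_PASCAL) ? (char)toupper(ch) : (char)ch;
    }
    out[prefix + len] = '\0';
    return 0;
}
```

Calling `decorate_name` with a name such as BigTime and each language type in turn reproduces the three forms the linker would see.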
Specify the C language type for assembly-language procedures called from programs that assume the C calling convention. Note that such programs are not necessarily written in C, since other languages can mimic C conventions.
With the C calling convention, the caller pushes arguments from right to left as they appear in the caller's argument list. The called procedure returns without removing the arguments from the stack. It is the caller's responsibility to clean the stack after the call, either by popping the arguments or by adding an appropriate value to the stack pointer SP.
The called routine must return with the original values in BP, SI, DI, DS, and SS. It must also preserve the direction flag.
The additional overhead of cleaning the stack after each call has compensations. It frees the caller from having to pass a set number of arguments to the called procedure each time. Because the first argument in the list is always the last one pushed, it is always on the top of the stack. Thus, it has the same address relative to the frame pointer, regardless of how many arguments were actually passed.
For example, consider the C library function printf, which accepts different numbers of arguments. A C program calls the function like this:
printf( "Numbers: %f %f %.2f\n", n1, n2, n3 );
printf( "Also: %f", n4 );
The first line passes four arguments (including the string in quotes) and the second line passes only two arguments. Notice that printf has no reliable way of determining how many arguments the caller has pushed. Therefore, the function returns without adjusting the stack. The C calling convention requires the caller to take responsibility for removing the arguments from the stack, since only the caller knows how many arguments it passed.
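A variadic C function shows the same idea from the callee's side. The sketch below is illustrative only; `sum_ints` is a hypothetical helper, not a library function. Like printf, it cannot see how many arguments were pushed, so it relies on its first (fixed) argument to learn the count, and the compiler-generated call site cleans the stack afterward.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: returns the sum of `count` int arguments.
   The function trusts `count`; it has no independent way to check
   how many arguments the caller actually passed. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;
    int i;

    assert(count >= 0);
    va_start(args, count);          /* start after the last fixed argument */
    for (i = 0; i < count; i++)
        total += va_arg(args, int); /* fetch the next int argument */
    va_end(args);
    return total;
}
```

A call such as `sum_ints(3, 10, 20, 30)` and a call such as `sum_ints(1, -5)` use the same function body, which is possible only because the caller, not the callee, removes the arguments.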
Use INVOKE to call a C-callable function from your assembly-language program, since INVOKE automatically generates the necessary stack-cleaning code after the call. You must also prototype the function with the VARARG keyword if appropriate, as explained in Procedures in Chapter 7. Similarly, when you write a C-callable procedure that accepts a varying number of arguments, include VARARG in the procedure's PROC statement.
By default, the langtype for FORTRAN, BASIC, and PASCAL selects the Pascal calling convention. This convention pushes arguments left to right so that the last argument is lowest on the stack, and it requires that the called routine remove arguments from the stack.
Arguments are placed on the stack in the same order in which they appear in the source code. The first argument is highest in memory (because it is also the first argument to be placed on the stack), and the stack grows downward.
A routine that uses the Pascal calling convention must preserve SI, DI, BP, DS, and SS. For 32-bit code, the EBX, ES, FS, and GS registers must be preserved as well as EBP, ESI, and EDI. The direction flag is also cleared upon entry and must be preserved.
Passing a variable number of arguments is not possible with the Pascal calling convention.
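The effect of push order on argument addresses can be worked out with simple arithmetic. The following sketch makes several assumptions that are not stated in the text above: 16-bit code, every argument exactly one word wide, and the standard prologue (push bp, then mov bp, sp). The names `PushOrder` and `arg_offset_from_bp` are hypothetical.

```c
#include <assert.h>

/* Hypothetical enum: ORDER_C pushes right to left, ORDER_PASCAL left to right. */
typedef enum { ORDER_C, ORDER_PASCAL } PushOrder;

/* Returns the offset from BP of argument `index` (0 = first in source order)
   out of `count` word-sized arguments. `is_far` selects a 4-byte (far) or
   2-byte (near) return address. */
int arg_offset_from_bp(PushOrder order, int index, int count, int is_far)
{
    int base = 2 + (is_far ? 4 : 2); /* saved BP plus return address */
    int slot;

    assert(count > 0 && index >= 0 && index < count);
    /* The last argument pushed sits closest to BP. Under the C order that
       is the first argument; under the Pascal order it is the last one. */
    slot = (order == ORDER_C) ? index : (count - 1 - index);
    return base + 2 * slot;
}
```

Under these assumptions the first argument of a near C procedure is always at BP+4, whatever the argument count, while under the Pascal order its offset depends on how many arguments follow it. This is one way to see why a variable argument count does not fit the Pascal convention.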
A STDCALL procedure adopts the C naming and calling conventions when prototyped with the VARARG keyword. Refer to the section Declaring Parameters with the PROC Directive in Chapter 7. Without VARARG, the procedure uses the C naming and Pascal calling conventions. STDCALL provides compatibility with 32-bit versions of Microsoft compilers.
As Table 12.1 shows, SYSCALL is identical to the C calling convention, but does not add an underscore prefix to symbols.
Argument passing order for both STDCALL and SYSCALL is the same as the C calling convention. The caller pushes the arguments from right to left and must remove the parameters from the stack after the call. However, STDCALL requires the called procedure to clean the stack if the procedure does not accept a variable number of arguments.
Both conventions require the called procedure to preserve the registers BP, SI, DI, DS, and SS. Under STDCALL, the direction flag is clear on entry and must be returned clear.
SYSCALL allows a variable number of arguments in the same way as the C calling convention. STDCALL also mimics the C convention when VARARG appears in the called procedure's declaration or definition. It allows a varying number of arguments and requires the caller to clean the stack. If not declared or defined with VARARG, the called procedure does not accept a variable argument list and must clean the stack before it returns.
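The stack-cleanup rules for the four conventions can be summarized as a small decision function. This is a sketch for illustration; `CallConv` and `callee_cleanup_bytes` are hypothetical names. The return value models the byte count that the called routine removes when it returns (the operand of a RET n instruction); zero means the caller cleans up.

```c
#include <assert.h>

/* Hypothetical enum; CONV_PASCAL also covers BASIC and FORTRAN. */
typedef enum { CONV_C, CONV_SYSCALL, CONV_STDCALL, CONV_PASCAL } CallConv;

/* Returns how many argument bytes the called routine removes on return.
   `has_vararg` is nonzero when the procedure is declared with VARARG. */
unsigned callee_cleanup_bytes(CallConv conv, int has_vararg, unsigned arg_bytes)
{
    switch (conv) {
    case CONV_C:
    case CONV_SYSCALL:
        return 0;                            /* caller always cleans up */
    case CONV_STDCALL:
        return has_vararg ? 0 : arg_bytes;   /* callee cleans unless VARARG */
    case CONV_PASCAL:
        assert(!has_vararg);                 /* VARARG is not allowed here */
        return arg_bytes;                    /* callee always cleans up */
    }
    return 0;
}
```

For example, a STDCALL procedure taking two word arguments without VARARG would return with four bytes removed, while the same procedure declared with VARARG would leave them for the caller.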
MASM 6.1 simplifies the coding required for linking MASM routines to high-level language routines. You can use the PROTO directive to write procedure prototypes, and the INVOKE directive to call external routines. MASM simplifies procedure-related tasks in the following ways:
The PROTO directive improves error checking on argument types.
INVOKE pushes arguments onto the stack and converts argument types to types expected when possible. These arguments can be referenced by their parameter label, rather than as offsets of the stack pointer.
The LOCAL directive following the PROC statement saves places on the stack for local variables. These variables can also be referenced by name, rather than as offsets of the stack pointer.
PROC sets up the appropriate stack frame according to the processor mode.
The USES keyword preserves registers given as arguments.
The C calling conventions specified in the PROC syntax allow for a variable number of arguments to be passed to the procedure.
The RET keyword adjusts the stack upward by the number of bytes in the argument list, removes local variables from the stack, and pops saved registers.
The PROC statement lists parameter names and types. The parameters can be referenced by name inside the procedure.
The complete syntax and parameter descriptions for these procedure directives are explained in Procedures in Chapter 7. This section provides a template that you can use for writing a MASM routine to be called from a high-level language.
The template looks like this:
Label PROC [[distance langtype visibility <prologueargs> USES reglist parmlist]]
LOCAL varlist
.
.
.
RET
Label ENDP
Replace the italicized words with appropriate keywords, registers, or variables as defined by the syntax in Declaring Parameters with the PROC Directive in Chapter 7.
The distance (NEAR or FAR) and visibility (PUBLIC, PRIVATE, or EXPORT) that you give in the procedure declaration override the current defaults. In some languages, the model can also be specified with command-line options.
The langtype determines the calling convention for accessing arguments and restoring the stack. For information on calling conventions, see Naming and Calling Conventions earlier in this chapter.
The types for the parameters listed in the parmlist must be given. Also, if any of the parameters are pointers, the assembler does not generate code to get the value of the pointer references. You must write this code yourself. An example of how to write such code is provided in Declaring Parameters with the PROC Directive in Chapter 7.
If you need to code your own stack-frame setup manually, or if you do not want the assembler to generate the standard stack setup and cleanup, see Passing Arguments on the Stack and User-Defined Prologue and Epilogue Code in Chapter 7.
Since high-level-language programs require initialization, you must write the main routine of a mixed-language program in the high-level language, or link with the startup code supplied by the high-level-language compiler. This gives the assembly code access to high-level routines or library functions. The next section explains how to link an assembly-language program with C-language startup code.
For procedures with prototypes, INVOKE makes calls from MASM to high-level-language programs, much like procedure or function calls in the high-level language. INVOKE generates the code to push arguments in the order specified by the procedure's calling convention and, where that convention requires it, to remove the arguments from the stack after the call returns.
INVOKE can also do type checking and data conversion for the argument types so that the procedure receives compatible data. For explanations of how to write procedure prototypes and several examples of procedure declarations and the corresponding prototypes, see Declaring Procedure Prototypes in Chapter 7.
For programs that mix assembly language and C, the H2INC utility makes it easy to write prototypes and data declarations for the C procedures you want to call from MASM. H2INC translates the C prototypes and declarations into the corresponding MASM prototypes and declarations, which INVOKE can use to call the procedure. The use of H2INC is explained in Chapter 20 in Environment and Tools.
Mixed-language programming also allows the main program or a routine to use external data, that is, data defined in the other module. External data is stored in a set place in memory (unlike dynamic and local data, which are allocated on the heap and the stack) and is visible to other modules.
External data is shared by all routines. One of the modules must define the static data, which causes the compiler to allocate storage for the data. The other modules that access the data must declare the data as external.
Each language has its own convention for how an argument is actually passed. If the argument-passing conventions of your routines do not agree, then a called routine receives bad data. Microsoft languages support three different methods for passing an argument:
Near reference. Passes a variable's near (offset) address, expressed as an offset from the default data segment. This method gives the called routine direct access to the variable itself. Any change the routine makes to the parameter is reflected in the calling routine.
Far reference. Passes a variable's far (segmented) address. Though slower than passing a near reference, this method is necessary for passing data that lies outside the default data segment. (This is not an issue in Basic unless you have specifically requested far memory.)
Value. Passes only a copy of the variable, not its address. With this method, the called routine gets a copy of the argument on the stack, but has no access to the original variable. The copy is discarded when the routine returns, and the variable retains its original value.
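The difference between passing by value and passing by reference can be seen in a short C sketch. The function names are hypothetical. Standard C has no way to express the near/far distinction, so a plain pointer stands in here for either kind of reference.

```c
#include <assert.h>
#include <stddef.h>

/* By value: the routine works on its own copy of the argument.
   The caller's variable is not affected. */
int add_one_by_value(int n)
{
    n += 1;
    return n;
}

/* By reference: the routine receives the variable's address and
   changes the caller's variable directly. */
void add_one_by_reference(int *n)
{
    assert(n != NULL);
    *n += 1;
}
```

If a variable holding 5 is passed to `add_one_by_value`, the function returns 6 but the variable still holds 5; passing its address to `add_one_by_reference` changes the variable itself to 6.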
When you pass arguments between routines written in different languages, you must ensure that the caller and the called routine use the same conventions for passing and receiving arguments. In most cases, you should check the argument-passing defaults used by each language and make any necessary adjustments. Most languages have features that allow you to change argument-passing methods.
A procedure called from any high-level language should preserve the direction flag and the values of BP, SI, DI, SS, and DS. Routines called from MASM must not alter SI, DI, SS, DS, or BP.
Microsoft high-level languages push segment addresses before offsets. This lets the called routine use the LES and LDS instructions to read far addresses from the stack. Furthermore, each word of an argument is placed on the stack in order of significance. Thus, the high word of a long integer is pushed first, followed by the low word.
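The push order described above can be modeled with a small simulated stack. This is a sketch under stated assumptions: `WordStack` and its functions are hypothetical, the stack holds 16-bit words and grows toward lower addresses, and memory is little-endian as on 8086-family processors. Pushing the high word first leaves the low word at the lower address, which is exactly the in-memory layout of a doubleword. In the same way, pushing a segment before its offset leaves the offset at the lower address, the layout that LES and LDS expect.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical model of a 16-bit stack. mem[i] represents the word at
   byte address 2*i, so a smaller index means a lower address. */
typedef struct {
    uint16_t mem[STACK_WORDS];
    int sp;                     /* index of the current top-of-stack word */
} WordStack;

void stack_init(WordStack *s)
{
    s->sp = STACK_WORDS;        /* empty stack: SP just past the highest word */
}

void push_word(WordStack *s, uint16_t w)
{
    assert(s->sp > 0);
    s->mem[--s->sp] = w;        /* the stack grows downward */
}

/* Pushes the high word first, then the low word. */
void push_long(WordStack *s, uint32_t value)
{
    push_word(s, (uint16_t)(value >> 16));
    push_word(s, (uint16_t)(value & 0xFFFFu));
}

/* Reads the two words at the top of the stack as one little-endian
   doubleword: low word at the lower address, high word above it. */
uint32_t read_long_at_top(const WordStack *s)
{
    assert(s->sp + 1 < STACK_WORDS);
    return ((uint32_t)s->mem[s->sp + 1] << 16) | s->mem[s->sp];
}
```

After `push_long` stores a value, `read_long_at_top` recovers it unchanged, which shows that the called routine can treat the stacked argument as an ordinary doubleword in memory.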
Most high-level-language compilers store arrays in row-major order. This means that all elements of a row are stored consecutively. The first five elements of an array with four rows and three columns are stored in row-major order as
A[1, 1], A[1, 2], A[1, 3], A[2, 1], A[2, 2]
In column-major order, the column elements are stored consecutively. For example, this same array would be stored in column-major order as
A[1, 1], A[2, 1], A[3, 1], A[4, 1], A[1, 2], A[2, 2]
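The two storage orders correspond to two index formulas. The sketch below uses hypothetical function names and assumes 1-based subscripts, as in the listings above; it returns the 0-based position of an element within the array's storage.

```c
#include <assert.h>

/* Row-major: all elements of a row are stored consecutively. */
int row_major_offset(int row, int col, int rows, int cols)
{
    assert(row >= 1 && row <= rows && col >= 1 && col <= cols);
    return (row - 1) * cols + (col - 1);
}

/* Column-major: all elements of a column are stored consecutively. */
int column_major_offset(int row, int col, int rows, int cols)
{
    assert(row >= 1 && row <= rows && col >= 1 && col <= cols);
    return (col - 1) * rows + (row - 1);
}
```

For the array with four rows and three columns, A[2, 1] is the fourth element (offset 3) in row-major order, while in column-major order the fourth element is A[4, 1] and A[1, 2] follows it at offset 4.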
This section summarizes the characteristics of the interface between MASM and Microsoft C and QuickC compilers. With the default naming and calling convention, the assembler (or compiler) pushes arguments right to left and adds a leading underscore to routine names.
This list shows the 16-bit C data types and equivalent data types in MASM 6.1. For 32-bit C compilers, int and unsigned int are equivalent to the MASM types SDWORD and DWORD, respectively.