Chapter 2

Organizing Segments


Understanding segments is an essential part of programming in assembly language. In the family of 8086-based processors, the term segment has two meanings:

  A block of memory of discrete size, called a “physical segment.” The number of bytes in a physical memory segment is 64K for 16-bit processors or 4 gigabytes for 32-bit processors.

  A variable-sized block of memory, called a “logical segment,” occupied by a program’s code or data.

 

As you read this chapter, the distinction between the two definitions will become clear. The adjectives “physical” and “logical” are not often used when speaking of segments. The beginning programmer is left to infer from context which definition applies. Fortunately, this is not difficult, and a distinction is often not required.

This chapter begins with a close look at physical memory segments. This lays the foundation for understanding logical segments, which form the subject of most of the following sections.

The section “Using Simplified Segment Directives” explains how to begin, end, and organize segments. It also explains how to access far data and code with simplified segment directives.

The next section, “Using Full Segment Definitions,” describes how to order, combine, and divide segments, and how to use the SEGMENT directive to define full segments. It also explains how to create a segment group so that you can use one segment address to access all the data.

Most of the information in this chapter also applies to writing modules to be called from other programs. Exceptions are noted when they apply. For more information about multiple-module programming, see Chapter 8, “Sharing Data and Procedures Among Modules and Libraries.”

Physical Memory Segments

As explained in Chapter 1, a physical segment can begin only at memory locations evenly divisible by 16, including address 0. Intel calls such locations “paragraphs.” You can easily recognize a paragraph location because its hexadecimal address always ends with 0, as in 10000h or 2EA70h. The 8086/286 processors allow segments 64K in size, the largest number 16 bits can represent. The 80386/486 processors still adhere to the 64K limit when running in real mode. In protected mode, however, they use 32-bit registers that can hold addresses up to 4 gigabytes.

Segmented architecture presents certain hurdles for the assembly-language programmer. For small programs, the limitations lose importance. Code and data each occupy less than 64K and reside in individual segments. A simple offset locates each variable or instruction within a segment.

Larger programs, however, must contend with problems of segmented memory areas. If data occupies two or more segments, the program must specify both segment and offset to access a variable. When the data forms a continuous stream across segments such as the text in a word processor’s workspace the problems become more acute. Whenever it adds or deletes text in the first segment, the word processor must seamlessly move data back and forth over the boundaries of each following segment.

The problem of segment boundaries disappears in the so-called flat address space of 32-bit protected mode. Although segments still exist, they easily hold all the code and data of the largest programs. Even a very large program becomes in effect a small application, able to reach all code and data with a single offset address.

Logical Segments

Logical segments contain the three components of a program: code, data, and stack. MASM organizes the three parts for you so they occupy physical segments of memory. The segment registers CS, DS, and SS contain the addresses of the physical memory segments where the logical segments reside.

You can define segments in two ways: with simplified segment directives and with full segment definitions. You can also use both kinds of segment definitions in the same program.

Simplified segment directives hide many of the details of segment definition and assume the same conventions used by Microsoft high-level languages. (See the following section, “Using Simplified Segment Directives.”) The simplified segment directives generate necessary code, specify segment attributes, and arrange segment order.

Full segment definitions require more complex syntax but provide more complete control over how the assembler generates segments. (See “Using Full Segment Definitions” later in this chapter.) If you use full segment definitions, you must write code to handle all the tasks performed automatically by the simplified segment directives.

Using Simplified Segment Directives

Structuring a MASM program using simplified segments requires use of several directives to assign standard names, alignment, and attributes to the segments in your program. These directives define the segments in such a way that linking with Microsoft high-level languages is easy.

The simplified segment directives are .MODEL, .CODE, .CONST, .DATA, .DATA?, .FARDATA, .FARDATA?, .STACK, .STARTUP, and .EXIT. The following sections discuss these directives and the arguments they take.

MASM programs consist of modules made up of segments. Every program written only in MASM has one main module, where program execution begins. This main module can contain code, data, or stack segments defined with all of the simplified segment directives. Any additional modules should contain only code and data segments. Every module that uses simplified segments must, however, begin with the .MODEL directive.

The following example shows the structure of a main module using simplified segment directives. It uses the default processor (8086) and the default stack distance (NEARSTACK). Additional modules linked to this main program would use only the .MODEL, .CODE, and .DATA directives and the END statement.

; This is the structure of a main module
; using simplified segment directives

        .MODEL small, c ; This statement is required before you
                        ;   can use other simplified segment directives

        .STACK          ; Use default 1-kilobyte stack
 
        .DATA           ; Begin data segment
 
                        ; Place data declarations here
 
        .CODE           ; Begin code segment
        .STARTUP        ; Generate start-up code
 
                        ; Place instructions here
 
        .EXIT           ; Generate exit code
        END

The .DATA and .CODE statements do not require any separate statements to define the end of a segment. They close the preceding segment and then open a new segment. The .STACK directive opens and closes the stack segment but does not close the current segment. The END statement closes the last segment and marks the end of the source code. It must be at the end of every module.

Defining Basic Attributes with .MODEL

The .MODEL directive defines the attributes that affect the entire module: memory model, default calling and naming conventions, operating system, and stack type. This directive enables use of simplified segments and controls the name of the code segment and the default distance for procedures.

You must place .MODEL in your source file before any other simplified segment directive. The syntax is:

.MODEL memorymodel [[, modeloptions ]]

The memorymodel field is required and must appear immediately after the .MODEL directive. The use of modeloptions, which define the other attributes, is optional. The modeloptions must be separated by commas. You can also use equates passed from the ML command line to define the modeloptions.

The following list summarizes the memorymodel field and the modeloptions fields, which specify language and stack distance:

Field

Description

Memory model

TINY, SMALL, COMPACT, MEDIUM, LARGE, HUGE, or FLAT. Determines size of code and data pointers. This field is required.

Language

C, BASIC, FORTRAN, PASCAL, SYSCALL, or STDCALL. Sets calling and naming conventions for procedures and public symbols.

Stack distance

NEARSTACK or FARSTACK. Specifying NEARSTACK groups the stack segment into a single physical segment (DGROUP) along with data. SS is assumed to equal DS. FARSTACK does not group the stack with DGROUP; thus SS does not equal DS.

 

 

You can use no more than one reserved word from each field. The following examples show how you can combine various fields:

         .MODEL   small                  ; Small memory model
         .MODEL   large, c, farstack     ; Large memory model,
                                         ;   C conventions,
                                         ;   separate stack
         .MODEL   medium, pascal         ; Medium memory model,
                                         ;   Pascal conventions,
                                         ;   near stack (default)

The next four sections give more detail on each field.

Defining the Memory Model

MASM supports the standard memory models used by Microsoft high-level languages tiny, small, medium, compact, large, huge, and flat. You specify the memory model with attributes of the same name placed after the .MODEL directive. With the exception of the flat model, which requires instructions specific to the 80386/486, your choice of a memory model does not limit the kind of instructions you can write. The memory model does, however, control segment defaults and determine whether data and code are near or far by default, as indicated in the following table.

Table 2.1    Attributes of Memory Models

Memory Model

Default Code

Default Data

Operating
System

Data and Code Combined

Tiny

Near

Near

MS-DOS

Yes

Small

Near

Near

MS-DOS, Windows

No

Medium

Far

Near

MS-DOS, Windows

No

Compact

Near

Far

MS-DOS, Windows

No

Large

Far

Far

MS-DOS, Windows

No

Huge

Far

Far

MS-DOS, Windows

No

Flat

Near

Near

Windows NT

Yes

 

When writing assembler modules for a high-level language, you should use the same memory model as the calling language. Choose the smallest memory model available that can contain your data and code, since near references operate more efficiently than far references.

The predefined symbol hidden@Model returns the memory model, encoding memory models as integers 1 through 7. For more information on predefined symbols, see “Predefined Symbols” in Chapter 1. For an example of how to use them, see Help.

The seven memory models supported by MASM 6.1 fall into three groups, described in the following paragraphs.

Small, Medium, Compact, Large, and Huge Models

The traditional memory models recognized by many languages are small, medium, compact, large, and huge. Small model supports one data segment and one code segment. All data and code are near by default. Large model supports multiple code and multiple data segments. All data and code are far by default. Medium and compact models are in-between. Medium model supports multiple code and single data segments; compact model supports multiple data segments and a single code segment.

Huge model implies individual data items larger than a single segment, but the implementation of huge data items must be coded by the programmer. Since the assembler provides no direct support for this feature, huge model is essentially the same as large model.

In each of these models, you can override the default. For example, you can make large data items far in small model, or internal procedures near in large model.

Tiny Model

Tiny-model programs run only under MS-DOS. Tiny model places all data and code in a single segment. Therefore, the total program file size can occupy no more than 64K. The default is near for code and static data items; you cannot override this default. However, you can allocate far data dynamically at run time using MS-DOS memory allocation services.

Tiny model produces MS-DOS .COM files. Specifying .MODEL tiny automatically sends the /TINY argument to the linker. Therefore, the /AT argument is not necessary with .MODEL tiny. However, /AT does not insert a .MODEL directive. It only verifies that there are no base or pointer fixups, and sends /TINY to the linker.

Flat Model

The flat memory model is a nonsegmented configuration available in 32-bit operating systems. It is similar to tiny model in that all code and data go in a single 32-bit segment.

To write a flat model program, specify the .386 or .486 directive before .MODEL FLAT. All data and code (including system resources) are in a single 32-bit segment. The operating system automatically initializes segment registers at load time; you need to modify them only when mixing 16-bit and 32-bit segments in a single application. CS, DS, ES, and SS all occupy the supergroup FLAT. Addresses and pointers passed to system services are always 32-bit near addresses and pointers.

Choosing the Language Convention

The language option facilitates compatibility with high-level languages by determining the internal encoding for external and public symbol names, the code generated for procedure initialization and cleanup, and the order that arguments are passed to a procedure with INVOKE. It also facilitates compatibility with high-level language modules. The PASCAL, BASIC, and FORTRAN conventions are identical. C and SYSCALL have the same calling convention but different naming conventions. Functions in the Windows API use the Pascal calling convention.

Procedure definitions (PROC) and high-level procedure calls (INVOKE) automatically generate code consistent with the calling convention of the specified language. The PROC, INVOKE, PUBLIC, and EXTERN directives all use the naming convention of the language. These directives follow the default language conventions from the .MODEL directive unless you specifically override the default. Use of these directives is explained in “Controlling Program Flow,” Chapter 7. You can also use the OPTION directive to set the language type. (See “Using the OPTION Directive” in Chapter 1.) Not specifying a language type in either the .MODEL, OPTION, EXTERN, PROC, INVOKE, or PROTO statement causes the assembler to generate an error.

The predefined symbol hidden@Interface provides information about the language parameters. For a description of the bit flags, see Help.