Open Architecture Handbook
The Borland Developer's Technical Guide
_________________________________________________

BORLAND INTERNATIONAL, INC. 100 BORLAND WAY
P.O. BOX 660001 SCOTTS VALLEY, CA 95067-0001

Copyright © 1991, 1993 by Borland International. All rights reserved. All Borland products are trademarks or registered trademarks of Borland International, Inc. Windows, as used in this manual, shall refer to Microsoft's implementation of a windows system. Other brand and product names are trademarks or registered trademarks of their respective holders.

PRINTED IN THE USA.

INTRODUCTION
________________________________________________________________________________

This book presents technical information about several of Borland's language tools, including internal functions, implementation details, file formats, and other technical specifications. It is for advanced users and corporate developers who want to utilize the "behind the scenes" features of Borland's products to develop their own customized tools and environments, and to provide better compatibility with existing code and tools from other vendors.

Why open architecture?

At the beginning of the PC's second decade, one word has captured the spirit and attention of the entire computer industry. It is the word open. Today we hear more and more about open systems, open standards, open tools . . . and open architectures. Along with object-oriented design, the open architecture movement heralds a new era of modular software that is designed to be shareable, extensible and compatible.
Just as today's users want a database that integrates smoothly with their word processor, their spreadsheet and their company's mainframe database, so today's software developers demand editors, compilers, debuggers, application frameworks and other tools that they can "mix and match," tools that they can extend or enhance themselves, open development environments that they can customize to work the way they want. The age of closed environments and inaccessible proprietary architectures is coming to an end. With more open and compatible software tools, programmers are better able to create the exciting, reliable and cost-effective software that the nineties will demand.

Borland language tools

As a leader in the object-oriented design revolution, Borland maintains an unqualified commitment to the open architecture movement. This book, as part of that commitment, provides detailed technical information about the "guts" of Borland's language development tools: internal file formats, compiler implementation details, debugger record structures and much more. This information will enable programmers to extend Borland's tools to meet their own needs, help third-party developers spawn compatible add-on tools, aid software engineers in squeezing out the utmost performance levels from their code by taking advantage of implementation-specific features, and give all programmers greater control and independence over their development environment.

How to use this book

This book, as befits its subject, is not for the novice user or the technically unsophisticated. Written largely by the Borland developers who actually created the tools described, its style is terse and technical. Every effort has been made to present the topics clearly and in an easy-to-read manner, but the presentation is not a "tutorial," nor are basic concepts of the tools discussed at great length.
It is best viewed as a collection of technical papers by developers for developers, presenting hard-to-find information in a convenient and readily accessible form.

In the chapters which follow, individual specifications will be presented for these Borland tools and standards:

Tools discussed

    C++ object mapping: a detailed description of the Borland C++ implementation's internal strategy for representing objects of various types. Included are the compiler's name mangling rules and discussions of class data and function members, object initialization, hidden parameters, RTL helper functions, virtual tables and vtable pointers, and dynamically dispatched virtual tables (DDVTs).

    Object file format: a listing of the structure and content of each type of record emitted by Borland C++ when it produces object files.

    VIRDEF records: a discussion and format listing of Borland's VIRDEF record type. VIRDEF records are utilized by the linker to support virtual definitions for some C++ types. A VIRDEF record is otherwise similar to a COMDEF record.

    Symbol table format: a brief discussion and layout of the general symbol table which appears at the head of each .EXE file. The symbol table contains TLINK debugger and browser information.

    Project file format: a detailed layout of the Borland C++ Project file format, used by the IDE's Project Make facility.

    Borland Graphics Interface: a description of the BGI driver architecture, headers, status and vector tables, and structure, together with a cookbook and examples.

    ObjectWindows: Borland's ObjectWindows Library (OWL) is a complete application framework for Windows developers. This chapter presents the technical specification for the library, including class structure, protocol and behavior, as well as implementation notes.

    Borland Windows Custom Controls: the technical specifications, usage conventions, and a listing of notification messages in the BWCC custom controls and dialog classes.
    Borland Help System: defines the Borland Help System, including the source text file format, binary Help file format, and the run-time Help engine.

Accompanying software

The accompanying Examples and Supplementary Software disk contains a number of brief example programs utilizing the information contained in this book. The examples are referenced in the chapter(s) to which each example applies.

A brief disclaimer

The information presented in this guide is for the benefit of advanced developers who wish to take advantage of various internal features and formats of the Borland tools. We hope this information is helpful to you and enhances the usefulness of Borland's language development products. Due to its highly technical nature, this material is not documented in the product manuals, and cannot be supported by our customer service staff.

CHAPTER 1
________________________________________________________________________________

C++ object mapping

This chapter describes how Turbo C++ and Borland C++ handle memory for C++ objects. The following applies both to the 16-bit (segmented address space) and the 32-bit versions of BCC. Whenever the text mentions a near or far pointer, the description applies to the 16-bit version; for the 32-bit version, substitute a 32-bit (flat) pointer. When the text describes both near and far flavors of the same data structure, the 32-bit product uses a single version with 32-bit flat pointers.

Nonstatic data members

Borland C++ compilers allocate space for nonstatic data members in order of declaration, regardless of access specifiers. When the word alignment compiler option is turned on, all members larger than 1 byte are aligned on a word boundary (the 32-bit compiler can align members on offsets that are multiples of either 2 or 4, depending on the state of the alignment options and/or the presence of #pragma pack).
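As a rough, compiler-independent illustration of the declaration-order rule, the following sketch uses offsetof to check that each member starts after the one declared before it and to measure the padding inserted for alignment. The struct Sample and the helper names are hypothetical, and on a present-day compiler the amount of padding is governed by that compiler's own packing options rather than by the Borland word-alignment switch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record, used only to illustrate declaration-order layout.
struct Sample
{
    char tag;    // first member, offset 0
    int  count;  // may be preceded by padding, depending on alignment options
    char flag;
    long total;
};

// True when each member starts at or after the end of the member declared before it.
inline bool members_in_declaration_order()
{
    return offsetof(Sample, tag) == 0
        && offsetof(Sample, count) >= offsetof(Sample, tag) + sizeof(char)
        && offsetof(Sample, flag) >= offsetof(Sample, count) + sizeof(int)
        && offsetof(Sample, total) >= offsetof(Sample, flag) + sizeof(char);
}

// Number of padding bytes inserted between tag and count.
inline std::size_t padding_before_count()
{
    return offsetof(Sample, count) - sizeof(char);
}

// Sanity check that can be called from any test driver.
inline void check_sample_layout()
{
    assert(members_in_declaration_order());
    assert(padding_before_count() < alignof(int));
}
```

With byte packing in effect, padding_before_count() would be expected to return 0; with word or larger alignment it reports the bytes skipped before count.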
Nonvirtual base classes

Nonvirtual base class members, including compiler-defined members such as vtable pointers, always precede any derived class members, and are allocated in order of declaration, as shown in the following example. Padding is inserted if dictated by the state of compiler alignment options.

    class B
    {
        int b1, b2;
    };

    class D:B
    {
        int d;
    };

The following diagram represents an instance of D:

    ┌─────────────────┐
    │ B::b1           │
    ├─────────────────┤
    │ B::b2           │
    ├─────────────────┤
    │ D::d            │
    └─────────────────┘

Virtual base classes

A virtual base class pointer is stored at the point where the base would occur in the object if it weren't virtual (as the base does in the previous example). All of the virtual bases for an object follow all of the nonvirtual bases and the derived class itself, in the order of construction specified by the language.

By default, the virtual base class pointer is a 16-bit offset pointer, because a class instance can't span a segment boundary. A compiler option offered for backward compatibility with previous releases of Turbo C++ and Borland C++ allows the virtual base class pointer to be either a near or a far pointer, depending on the size of the this pointer for that class (16-bit compiler only; with the 32-bit compiler, the virtual base pointers are always flat 32-bit pointers).

The compiler will insert a hidden 'unsigned int' (i.e.
16-bit for the 16-bit compiler, 32-bit for the 32-bit compiler) displacement member immediately preceding the virtual base class sub-object when the following conditions exist:

    a class has either a user-defined constructor or destructor, or both

    the derived class overrides a virtual function defined in one of its virtual bases

The displacement member always equals zero, with the following exception, which occurs during construction or destruction of the derived class object: if the derived object is embedded in another class and the virtual base class isn't at the same offset from the derived class as it would be in an object of the derived class, then the displacement member is nonzero. The nonzero displacement member is then used in virtual function thunks to ensure that a correct value is passed to the virtual function for the this parameter.

For compatibility with older versions of Turbo C++ and Borland C++, a compiler option disables the addition of the hidden displacement member on a per-class basis.

When a class derives from a virtual base that itself has a virtual base, the compiler ensures that the derived class also has that 'indirect' virtual base as one of its own virtual bases, for the following reasons:

    to represent member pointers capable of pointing to members of virtual base classes in a compact and efficient way

    to limit 'derived*' to 'base*' casts to just one virtual base class pointer indirection

The compiler adds such virtual base classes following any user-specified base classes, in the order of construction, but the addition occurs only when the particular virtual base can't already be reached from the derived class through only one level of virtual inheritance.

The presence of compiler-added virtual base classes doesn't have side effects such as changing visibility rules. A compiler-added virtual base class is used for casts of pointers and for pointers to members of the virtual base. The representation of member pointers is discussed later.
The following example shows the declaration of the simplest virtual base:

    class VB
    {
        int vb;
    };

    class D:virtual VB
    {
        int d;
    };

The instance of D has the following layout:

    ┌─────────────────┐
    │ VB sub-obj ptr  ├───┐
    ├─────────────────┤   │
    │ D::d            │   │
    ├─────────────────┤<──┘
    │ VB::vb          │
    └─────────────────┘

The following example shows the declaration of an indirect (or doubly) virtual base:

    class VB1
    {
        int vb1;
    };

    class VB2
    {
        int vb2;
    };

    class A:virtual VB1
    {
        int a;
    };

    class B:virtual VB2
    {
        int b;
    };

    class C:virtual VB2
    {
        int c;
    };

    class D:virtual A, virtual B, C
    {
        int d;
    };

An instance of D has the following layout. The label to the left of a row marks where a sub-object starts; the label to the right of a pointer row names the sub-object the pointer refers to:

    D      ──> ┌─────────────────┐
               │ A sub-obj ptr   │ ──> D::A
               ├─────────────────┤
               │ B sub-obj ptr   │ ──> D::B
    D::C   ──> ├─────────────────┤
               │ VB2 sub-obj ptr │ ──> D::VB2
               ├─────────────────┤
               │ C::c            │
    *****  ──> ├─────────────────┤
               │ VB1 sub-obj ptr │ ──> D::VB1
               ├─────────────────┤
               │ D::d            │
    D::VB1 ──> ├─────────────────┤
               │ VB1::vb1        │
    D::A   ──> ├─────────────────┤
               │ VB1 sub-obj ptr │ ──> D::VB1
               ├─────────────────┤
               │ A::a            │
    D::VB2 ──> ├─────────────────┤
               │ VB2::vb2        │
    D::B   ──> ├─────────────────┤
               │ VB2 sub-obj ptr │ ──> D::VB2
               ├─────────────────┤
               │ B::b            │
               └─────────────────┘

The virtual base VB2 is reachable from D through only one level of virtual inheritance due to the base class C; therefore, VB2 isn't added by the compiler as a virtual base of D. The diagram shows the VB1 base pointer (see *****), which is added by the compiler.
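The simplest layout above (class D with a single virtual base VB) can be imitated by hand to show why a 'derived*' to 'base*' conversion costs one pointer indirection. This is only a model written in portable C++: DModel, vb_ptr, vb_part and the helper functions are hypothetical names, and an ordinary pointer stands in for the 16-bit offset pointer the compiler actually stores.

```cpp
#include <cassert>

struct VB
{
    int vb;
};

// Hand-written model of "class D : virtual VB { int d; }".
struct DModel
{
    VB* vb_ptr;   // stands in for the virtual base class pointer
    int d;        // D::d
    VB  vb_part;  // the VB sub-object, placed after the derived-class members

    DModel() : vb_ptr(&vb_part), d(0), vb_part() {}

    // Copying would duplicate a pointer into another object, so it is disabled.
    DModel(const DModel&) = delete;
    DModel& operator=(const DModel&) = delete;
};

// A derived-to-base conversion is a single load of the stored pointer.
inline VB* to_virtual_base(DModel* obj)
{
    return obj->vb_ptr;
}

// Writes through the base pointer and reads the value back through the derived object.
inline int roundtrip(int value)
{
    DModel obj;
    to_virtual_base(&obj)->vb = value;
    return obj.vb_part.vb;
}

// Confirms that the stored pointer refers to the embedded sub-object.
inline void check_model()
{
    DModel obj;
    assert(to_virtual_base(&obj) == &obj.vb_part);
}
```

Because the conversion goes through the stored pointer, the same code keeps working when the VB sub-object sits at a different distance from the start of the object, which is the situation the more complex layouts above create.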
The following example shows a hidden displacement member:

    class B
    {
        B();
        virtual void f();
        int b;
    };

    class X:virtual B
    {
        int x;
    };

    class Y:X
    {
        Y();
        virtual void f();
        int y;
    };

    class Z:Y
    {
        int z;
    };

An instance of Y has the following layout (the unlabeled cell of the original diagram, immediately preceding the B sub-object, is marked here as the displacement member):

    Y ──> ┌─────────────────┐
          │ B sub-obj ptr   │ ──> B
          ├─────────────────┤
          │ X::x            │
          ├─────────────────┤
          │ X/Y vtable ptr  │
          ├─────────────────┤
          │ Y::y            │
          ├─────────────────┤
          │ (displacement)  │
    B ──> ├─────────────────┤
          │ B::b            │
          ├─────────────────┤
          │ B vtable ptr    │
          └─────────────────┘

An instance of Z has the following layout:

    Y ──> ┌─────────────────┐
          │ B sub-obj ptr   │ ──> B
          ├─────────────────┤
          │ X::x            │
          ├─────────────────┤
          │ X/Y vtable ptr  │
          ├─────────────────┤
          │ Y::y            │
          ├─────────────────┤
          │ Z::z            │
          ├─────────────────┤
          │ (displacement)  │
    B ──> ├─────────────────┤
          │ B::b            │
          ├─────────────────┤
          │ B vtable ptr    │
          └─────────────────┘

As shown by the diagrams, the displacement between the B sub-object and the base of Y differs by 2 bytes, depending on whether the object is of type Y or Z. The displacement member will be set to -2 in the constructor Z::Z before X::X is called. The displacement member is reset to zero after the call to X::X has been completed. The virtual table thunk for the Y::f entry in the B part of X/Y's vtable will adjust the value of this by the current value of the displacement member, which is zero unless the current object is being constructed or destructed.

Empty classes

A class without any nonstatic data members is allocated 1 or 2 bytes (1, 2, or 4 bytes with the 32-bit compiler), depending on alignment options selected. Exception: if the class has virtual functions, the instance simply consists of the vtable pointer, and no padding is added.

Addressing of class instances and this

The address of a class instance is always the first byte allocated.
For derived classes, the address is typically that of the first member of the 'root' class (the base-most base class).

The size of the this pointer defaults to the default pointer size for the memory model in effect. Declaring the class itself as near or far overrides this default. A derived class inherits the size of this from the first base, and all of the following bases (if any) must use the same this size.

Virtual table pointers

When a vtable pointer is introduced in a class, it's inserted before any user-defined members of that class and after any base class sub-objects; a compiler option forces the vtable pointer member to be added after all user-defined members of the class, allowing many C++ structures with virtual function members to be easily shared with other languages, such as C.

In the huge memory model, the vtable pointer is always far, while in all other memory models, the vtable pointer defaults to near. Declaring a class as huge or _export has the following consequences:

    overrides the default, making the vtable pointer far

    allocates the vtable either in the code segment or in a data segment specified by compiler options

A vtable pointer in a derived class is shared with the first base if the base is nonvirtual and if it already contains a vtable pointer.

Virtual tables

A virtual table is a table of function pointers; near and far pointers can be arbitrarily mixed. No padding is added to align the far pointers.

Virtual function calls, virtual thunks

When a virtual function is called using the virtual mechanism, the value passed for this always points to the appropriate sub-object by the time execution arrives at the virtual function. When multiple inheritance is involved, any virtual functions inherited from a virtual base (or from a base that isn't the first base) and overridden in the derived class are dispatched through a virtual thunk.
The pointer in the virtual table points to this thunk, which then adjusts the this value on the stack and jumps to the function body itself.

A virtual thunk in a virtual base's vtable adds the value of the hidden 16-bit (32-bit for the 32-bit compiler) displacement member (described previously) to the this value under the following condition: a virtual function overrides a virtual in a virtual base class containing one or more user-defined constructors or a user-defined destructor. This technique ensures passage of the correct this value to the function regardless of the relative distance between the derived class and the virtual base. That distance might differ from the corresponding distance in a 'pure' instance of the derived class.

Calling conventions for member functions

The default calling convention for member functions is cdecl, with user arguments pushed right to left, followed by the this pointer; the caller pops the arguments from the stack. With the pascal calling convention, any user arguments are pushed first from left to right, this is again pushed last, and the callee pops the arguments from the stack. A compiler option is available that passes this as the first argument to pascal member functions (for compatibility with other compilers and previous versions of Turbo C++ and Borland C++).

Pointers to class members

There are three general categories of member pointers: single inheritance (SI), multiple inheritance without virtual bases (MI), and the most general (VB). See the Borland C++ User's Guide for more information on how the compiler (or the user) chooses the effective representation for a specific member pointer type.

Sometimes when a VB member pointer, which is capable of pointing to members of virtual bases, is cast to another member pointer type, the cast can't be carried out using inline code, and the compiler generates a call to an RTL helper function, which is discussed in more detail later.
Note that for all categories of member pointers, a NULL pointer value has all the fields of the member pointer equal to zero. When testing a member pointer value for NULL, the compiler might test only some fields of the member pointer.

Pointers to data members

The following internal representation of pointers to data members describes two categories of pointers: unrestricted pointers, which can point to any member of any class, and pointers that can't point to members of virtual base classes:

    SI/MI data member pointer
    _________________________
    size_t member_offset;

The SI/MI data member pointer holds the offset, within the class instance, of the member being pointed to, with one added to it so that zero can be used as a NULL pointer.

    VB data member pointer
    _________________________
    size_t member_offset;
    size_t vbcptr_offset;

The VB data member pointer consists of two offsets. If vbcptr_offset is nonzero, the pointer points to a virtual base class member, and vbcptr_offset gives the offset of the virtual base class pointer within the object plus one; member_offset then specifies the offset within that virtual base class to the member being pointed to. When vbcptr_offset is zero, the pointer is treated just like the "SI/MI" data member pointer.

Pointers to function members

Pointers to member functions resemble pointers to data members, except they always contain a function pointer. The function pointer points either to the member function itself (if the function is nonvirtual) or to a virtual call thunk that uses the virtual mechanism to transfer control to an appropriate virtual function. The compiler creates such thunks automatically.

When calling through a pointer to a member function, the appropriate value must be passed to the function for the this parameter. When calling using the ->* operator, the value equals the object pointer, while when calling using the .* operator, the value equals the address of the object.
The address must be adjusted based on additional fields in the member pointer:

    SI function member pointer
    __________________________
    void (*func_addr)();

The SI function member pointer contains the address of the member function. The function pointer is appropriately typed, based on the member pointer type.

    MI function member pointer
    __________________________
    void (*func_addr)();
    size_t member_offset;

The MI function member pointer adjusts the this value passed to the member function by member_offset - 1.

    VB function member pointer
    __________________________
    void (*func_addr)();
    size_t member_offset;
    size_t vbcptr_offset;

The VB function member pointer adjusts the this value passed to the member function with an algorithm similar to the one that adjusts offsets for VB data member pointers.

Static data members

Static data members default to near in all memory models except the huge model; however, static data members of classes declared _export always default to far in all memory models.

__export/__import classes

Declaring a class __export causes all of its noninline member functions and static data members to be exported. It also makes the vtable pointer far and allocates the virtual table for the class in the code segment. Moreover, declaring a class __export or __import causes all of the static data members and member functions of the class to default to far.

Passing classes by value

When a function accepts an argument whose class has constructors, the actual argument value is copy-constructed into its place on the stack, and the called routine calls the destructor for the argument if its class has a destructor. For compatibility with older versions of Turbo C++ and Borland C++, a compiler option causes the compiler to convert such arguments to reference-to-class arguments; in that case the compiler creates temporary storage at the calling site to hold the argument value.
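The observable part of the pass-by-value rule (one copy construction for the call, one matching destruction) can be checked with a small counting class. The sketch below is portable C++ rather than anything Borland-specific; Tracked, next_value and the other names are hypothetical, and it does not reveal whether the caller or the callee runs the destructor, only that the destructor has run once the call statement is complete.

```cpp
#include <cassert>

// Hypothetical class that counts copy constructions and destructions.
struct Tracked
{
    static int copies;
    static int destroyed;
    int value;

    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
    ~Tracked() { ++destroyed; }
};

int Tracked::copies = 0;
int Tracked::destroyed = 0;

// Receives its argument by value, so a copy is constructed for the call.
int next_value(Tracked t)
{
    return t.value + 1;
}

// True when one by-value call produced exactly one copy and one destruction.
bool one_copy_one_destruction()
{
    Tracked original(41);
    const int copies_before = Tracked::copies;
    const int destroyed_before = Tracked::destroyed;
    const int result = next_value(original);
    return result == 42
        && Tracked::copies - copies_before == 1
        && Tracked::destroyed - destroyed_before == 1;
}

// Sanity check that can be called from any test driver.
inline void check_by_value()
{
    assert(one_copy_one_destruction());
}
```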
Initialization and finalization of nonlocal static objects

The compiler initializes and finalizes nonlocal static objects in each compilation as required. The functions included for initialization and finalization are registered through the standard Turbo C++ and Borland C++ #pragma startup/exit mechanism.

Conventions for constructors and destructors

When the compiler passes a hidden parameter in addition to this (for example, when calling a constructor), the parameter is passed as if it were to the right of this and to the left of the first user argument, if such an argument exists.

Constructors

The compiler passes a constructor the address of the object memory to be constructed, or it passes a zero for this, in which case the constructor allocates the memory for the object through operator new. If the allocation fails, the constructor immediately returns zero; in all other cases, the constructor returns the address of the object constructed.

The compiler gives constructors for classes with any virtual bases (direct or indirect) an extra int parameter to indicate the following action:

    A zero means the constructor should construct all virtual base classes. (The class is known to be the most-derived class, and the location of all virtual bases within the object is known at compile time.)

    A nonzero value means virtual bases have already been constructed by a derived class constructor.

Destructors

A destructor tests this for NULL before taking other action on an object. If this is NULL, the destructor immediately returns. All destructors are passed an extra int parameter that contains two bit flags:

    0x01    When this bit is on, the destructor calls operator delete to deallocate the memory taken up by the object, then the destructor returns.

    0x02    When this bit is on, all virtual bases are destroyed. This bit is only used for classes with virtual bases.
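The destructor convention can be summarized as a small decision procedure. The following sketch models it in portable C++ by returning a trace of the steps taken instead of destroying a real object. The enumerator names and destructor_trace are hypothetical, and the relative order of the "vbases" and "delete" steps is an assumption (deallocation is taken to be the last action, since the text says the destructor returns after calling operator delete).

```cpp
#include <cassert>
#include <string>

// Flag values taken from the text; the enumerator names are hypothetical.
enum DtorFlags
{
    kDeallocate          = 0x01,  // call operator delete after destruction
    kDestroyVirtualBases = 0x02   // also destroy the virtual bases
};

// Returns a trace of what a destructor following the convention would do.
// null_this models a call made with this == NULL.
std::string destructor_trace(bool null_this, int flags)
{
    std::string trace;
    if (null_this)
        return trace;                    // NULL this: return immediately
    trace += "body;";                    // user-written destructor body
    if (flags & kDestroyVirtualBases)
        trace += "vbases;";              // destroy virtual bases when asked to
    if (flags & kDeallocate)
        trace += "delete;";              // release the storage last (assumed order)
    return trace;
}

// Sanity check that can be called from any test driver.
inline void check_destructor_model()
{
    assert(destructor_trace(false, kDeallocate | kDestroyVirtualBases)
           == "body;vbases;delete;");
}
```

Under this model, a flags value of 0x03 corresponds to deleting a most-derived object with virtual bases, and 0 to destroying a base sub-object in place.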
RTL helper functions

The run-time library supplies several helper functions to the compiler for allocating, deleting, and copying certain arrays of classes. The following functions, _vector_apply_ and _vector_applyv_, have "C" linkage.

    extern "C" void _vector_apply_
    (
        void far * dest,    // address of destination array
        void far * src,     // address of source array
        size_t size,        // size of each object
        unsigned count,     // number of objects
        unsigned mode,      // type of function to call
        ...                 // operator=/copy-constructor address here
    );

    extern "C" void _vector_applyv_
    (
        void far * dest,
        void far * src,
        size_t size,
        unsigned count,
        unsigned mode,
        ...
    );

_vector_apply_ and _vector_applyv_ assign or copy-construct the elements of arrays of class type. Since the operator= or the copy-constructor might be a near or far function, and take a near or far this value, mode is passed to determine how to cast this. A near pointer must be passed for near functions and a far pointer for far functions, and it's impossible to determine the argument type until runtime; consequently, varargs is used to resolve the problem. The compiler guarantees that source and destination are both near or both far.

The version with the v suffix passes a second argument of zero for copy-constructors of classes with virtual bases.

The following list shows the interpretation of the mode for _vector_apply_ and _vector_applyv_:

    far function    0x01
    pascal call     0x02
    far pointer     0x04

The following functions, _vector_new_ and _vector_vnew_, which return near pointers to void, have C++ linkage. They are used only in the tiny, small, and medium memory models:

    void near * _vector_new_
    (
        void near * ptr,    // address of array (0 means allocate)
        size_t size,        // size of each object
        unsigned count,     // how many objects
        unsigned mode,      // mode bits (see below)
        ...                 // constructor address passed here
    );

    void near * _vector_vnew_
    (
        void near * ptr,
        size_t size,
        unsigned count,
        unsigned mode,
        ...
    );

The following functions, which return far pointers to void, exist in all memory models:

    void far * _vector_new_
    (
        void far * ptr,
        size_t size,
        unsigned long count,
        unsigned mode,
        ...
    );

    void far * _vector_vnew_
    (
        void far * ptr,
        size_t size,
        unsigned long count,
        unsigned mode,
        ...
    );

The following list shows the interpretation of the mode for _vector_new_ and _vector_vnew_:

    far function                0x01
    pascal call                 0x02
    far pointer                 0x04
    store element count         0x10
    huge array (array > 64K)    0x40

The _vector_new_ and _vector_vnew_ routines construct arrays of class type. If ptr is NULL, the routines allocate the space for the array. If mode has 0x10 set, allocated space includes a count field stored at the beginning. If mode has 0x40 set, the array pointer is adjusted to prevent a class instance from crossing a 64K boundary, and the adjusted address is the one passed back.

Since the constructor for the class might be a near or a far function, and take a near or far this value, mode is passed to allow correct casting. A near pointer must be passed for near functions and a far pointer for far functions, and it's impossible to determine the argument type until runtime; consequently, varargs is used to resolve the problem.

The far versions of _vector_new_ and _vector_vnew_ are used in the small data memory models for arrays of far classes, regardless of whether or not they're huge.

The far and near versions of _vector_vnew_ pass a second argument, zero, to the constructor. These versions are used for classes with virtual bases.

The following version of function _vector_delete_ is used only in the tiny, small, and medium memory models:

    void _vector_delete_
    (
        void near * ptr,    // address of array
        size_t size,        // size of each object
        unsigned count,     // how many objects
        unsigned mode,      // how to call
        ...
                            // destructor address passed here
    );

The following version of function _vector_delete_ exists in all memory models:

    void _vector_delete_
    (
        void far * ptr,
        size_t size,
        unsigned long count,
        unsigned mode,
        ...
    );

The following list shows the interpretation of the mode for _vector_delete_:

    far function                0x01
    pascal call                 0x02
    far pointer                 0x04
    deallocate                  0x08
    stored element count        0x10
    huge array (array > 64K)    0x40

The _vector_delete_ routines destroy arrays of class type. If mode has 0x08 set, the routines deallocate the space for the array after destroying the elements. When mode has 0x18 set, causing deallocation to occur and the stored count to be used, the count is retrieved from the count field stored in a 16-bit word just below the array.

Since the destructor for the class might be a near or a far function, and take a near or far this value, mode is passed to allow correct casting. A near pointer must be passed for near functions and a far pointer for far functions, and it's impossible to determine the argument type until runtime; consequently, varargs is used to resolve the problem.

The far version of _vector_delete_ is used in the small data memory models for arrays of far classes, regardless of whether or not they're huge.

Name mangling

There are four basic forms of encoded names in Borland C++:

1. @className@functionName$args

This encoding denotes a member function functionName belonging to class className and having arguments args. Class names are encoded directly. The following example shows a className in an encoded name:

    @className@...
The class name may be followed by a single digit; the digit value contains the following bits (these can be combined):

    0x01    the class uses a far vtable
    0x02    the class uses the -po calling convention
    0x04    the class has an RTTI-compatible virtual table; this bit is only used when encoding the name of the virtual table for the class

The digit is encoded as an ASCII representation of the bit mask value, with 1 subtracted (so that, for example, the class prefix for a class 'foo' that uses far vtables would be '@foo@0').

See the next section on the encoding of function names and argument types.

2. @functionName$args

This form of encoding denotes a function functionName with arguments args.

3. @className@dataMember

This form of encoding denotes a static data member dataMember belonging to class className. Names of classes and data members are encoded directly. The following example shows a member myMember in class myClass:

    @myClass@myMember

4. @className@

This name denotes a virtual table for a class className. As mentioned previously, class names are encoded directly.

Encoding of nested and template classes

The following form encodes a name of a class lexically nested within another class:

    @outer@inner@...

A template instance class encodes the name of the template class, along with the actual template arguments, in the following way:

    %templateName$arg1$arg2 ..... $argn%

Each actual argument starts with a letter, specifying the kind of argument it is:

    t    type argument
    i    nontype integral argument
    g    nontype nonmember pointer argument
    m    nontype member pointer argument

The first letter is followed by the encoded type of the argument. For a type argument, this code also represents the argument's actual value. For other kinds of arguments, the type code is followed by $ and the argument value, encoded as an ASCII number or symbol name.
An instance of a template whose name is vector is encoded as shown in the following example:

    %vector$tl$ii$100%

Encoding of function names

The encoded functionName might denote an ordinary function name, a special function such as a constructor or destructor, an overloaded operator, or a type conversion.

Ordinary functions

Ordinary function names are encoded directly, as shown in the following examples:

    foo(int)         -->  @foo$qi
    sna::foo(void)   -->  @sna@foo$qv

The string $qi denotes the integer argument of function foo; '$qv' denotes no arguments in sna::foo. Argument encoding is covered in more detail later.

Constructors, destructors, and overloaded operators

Constructors, destructors, and overloaded operators are encoded with a $b character sequence, followed by a character sequence from the following table:

    Character
    sequence    Meaning
    _______________________________
    ctr         constructor
    dtr         destructor
    add         +
    adr         &  (address-of)
    and         &  (bitwise AND)
    arow        ->
    arwm        ->*
    asg         =
    call        ()
    cmp         ~
    coma        ,
    dec         --
    dele        delete
    div         /
    eql         ==
    geq         >=
    gtr         >
    inc         ++
    ind         *  (indirection)
    land        &&
    lor         ||
    leq         <=
    lsh         <<
    lss         <
    mod         %
    mul         *  (multiplication)
    neq         !=
    new         new
    not         !
    or          |
    rand        &=
    rdiv        /=
    rlsh        <<=
    rmin        -=
    rmod        %=
    rmul        *=
    ror         |=
    rplu        +=
    rrsh        >>=
    rsh         >>
    rxor        ^=
    sub         -
    subs        []
    xor         ^
    nwa         new []
    dla         delete []
    _______________________________

The following examples show how names are encoded with the character sequences add, ctr, and dtr from the previous table:

    operator+(int)   -->  @$badd$qi
    plot::plot()     -->  @plot@$bctr$qv
    plot::~plot()    -->  @plot@$bdtr$qv

The string $qv denotes no arguments in the plot constructor or destructor.

Type conversions

Encoding of type conversions is accomplished with the $o character sequence, followed by the distinguishing return type of the conversion as part of the function name. The return type follows the rules for argument encoding, explained later.
The lack of arguments in a conversion is made explicit in the mangling by adding $qv to the end of the encoded string. Examples:

    foo::operator int()      -->  @foo@$oi$qv
    foo::operator char *()   -->  @foo@$opzc$qv

The i following $o in the first example denotes int; the pzc in the second example denotes a near pointer (p) to a signed char (zc).

Encoding of arguments

The number and combinations of function arguments make argument encoding the most complex aspect of name mangling. Argument lists for functions begin with the characters $q. Type qualifiers are then encoded as shown in the following table:

    Character
    Sequence    Meaning
    _______________________
    up          huge
    ur          _seg
    u           unsigned
    z           signed
    x           const
    w           volatile

Built-in types are encoded after any applicable type qualifiers, in accordance with the following table:

    Character
    Sequence    Meaning
    _______________________
    v           void
    c           char
    s           short
    i           int
    l           long
    f           float
    d           double
    g           long double
    e           ...

Non-built-in types are encoded after any applicable type qualifiers, in accordance with the following table:

    Character
    Sequence    Meaning
    _______________________
    (digits)    an enumeration or class name
    p           near *
    r           near &
    m           far &
    n           far *
    a           array
    M           member pointer (followed by class and base type)

The appearance of one or more digits indicates that an enumeration or class name follows; the value of the digits denotes the length of the name, as shown in the following examples:

    foo::myfunc(myClass near&)        is mangled as  @foo@myfunc$qr7myClass
    foo::myfunc(anotherClass near&)   is mangled as  @foo@myfunc$qr12anotherClass

A character x or w may appear after p, r, m, or n to denote a const or volatile type qualifier, respectively. The character q appearing after one of these characters denotes a function whose arguments follow in the encoded name, up to the appearance of a $ character, after which the return type is encoded. The following examples show how these encoding rules are applied:

    @foo@myfunc$qpxzc     is unmangled to  foo::myfunc(const char near*)
    @func1$qxi            is unmangled to  func1(const int)
    @foo@myfunc$qpqii$i   is unmangled to  foo::myfunc(int (near*)(int,int))

Array types are encoded as a, followed by a dimension encoded as an ASCII decimal number and a $, and finally the element type, as shown in the following example:

    foo( int (*x)[20] )   is mangled as  @foo$qpa20$i

Encoded arguments are concatenated in the order in which they appear in the function's argument list. When several function arguments have the same non-built-in type, the repeated arguments are encoded as the character t followed by an ASCII character.
The ASCII character, ranging from 31H to 39H and from 61H to 7FH (1 to 9 and a onward), denotes which argument type to duplicate, as shown in the following example:

    @plot@func1$qdddiiilllpzctata

is unmangled to

    plot::func1(double, double, double, int, int, int, long, long, long, char near*, char near*, char near*)

The two ta character sequences at the end of the encoded name each duplicate the tenth argument, which is encoded as pzc.

Dynamically dispatchable virtual tables

The dynamically dispatchable virtual table (DDVT) always precedes the regular virtual table for the given class. The DDVT is located at negative offsets from the virtual table pointer. The following layout shows the format of the DDVT:

    void     (far *fpt[count])();
    unsigned idt[count];
    unsigned count;
    void     *basep;
    /* the regular virtual table starts here: */
    void     (*vtab[])();

The fpt and idt tables contain the addresses and IDs, respectively, of all DDVT functions introduced or overridden in the class. count holds the number of entries in each table. basep holds the address of the virtual table for the base class, or zero if the class has no base; the base class pointer is the same size as the virtual table pointer for the class. The pointer is a far pointer for huge classes.
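As a rough illustration of how the fpt, idt, count, and basep fields relate, the following sketch models a DDVT in ordinary C++ and searches it for a dispatch ID, moving to the base class table when the ID is not found. The search order is an assumption, since the layout above describes only the tables and not the dispatch routine. DdvtModel and find_dynamic are hypothetical names, and mangled names stand in for the far function addresses that the real table stores at negative offsets from the virtual table pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of one class's DDVT. The real table stores far function
// pointers; mangled names stand in for the addresses here.
struct DdvtModel {
    std::vector<std::string>   fpt;    // targets, parallel to idt
    std::vector<std::uint16_t> idt;    // dispatch IDs
    const DdvtModel*           basep;  // base class table, or nullptr if none
};

// Assumed lookup order: search the class's own entries, then follow basep.
// Returns nullptr when no class in the chain defines the ID.
const std::string* find_dynamic(const DdvtModel* table, std::uint16_t id) {
    for (; table != nullptr; table = table->basep) {
        assert(table->fpt.size() == table->idt.size());  // count covers both tables
        for (std::size_t i = 0; i < table->idt.size(); ++i) {
            if (table->idt[i] == id) {
                return &table->fpt[i];
            }
        }
    }
    return nullptr;
}
```

Under this model, a derived class that overrides one ID and inherits another finds the overridden entry in its own table and the inherited entry through basep.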
For example, consider the following two classes:

    struct base {
        virtual f() = [11];
        virtual g() = [22];
        virtual h();
    };

    struct der:base {
        f();
        virtual i() = [33];
        h();
    };

The following table is the DDVT/virtual table for class base:

                dd  @base@f$qv      ; addr of base::f()
                dd  @base@g$qv      ; addr of base::g()
                dw  11              ; ID for f()
                dw  22              ; ID for g()
                dw  2               ; 2 entries in DDVT
                dw  0               ; no base class
    base_vtable:
                dd  @base@h$qv      ; addr of base::h()

The following table is the DDVT/virtual table for class der:

                dd  @der@f$qv       ; addr of der::f()
                dd  @der@i$qv       ; addr of der::i()
                dw  11              ; ID for f()
                dw  33              ; ID for i()
                dw  2               ; 2 entries in DDVT
                dw  base_vtable     ; base class vtable addr
    der_vtable:
                dd  @der@h$qv       ; addr of der::h()


CHAPTER 2

Object file contents

This chapter covers the comment records sent to the object file by Borland C++ version 4.0. Other Borland compilers may not emit all of the records described here.

The comment records are Intel Object Module Format (OMF-86) Comment records with the following layout:

    Value or
    Length      Description
    _________________________________________________
    0x88        COMMENT record byte
    2 bytes     Record length
    0x00        A control byte (always zero)
    1 byte      Comment record class (see below)
    n bytes     Data (depends on Comment record class)
    1 byte      Checksum

For fields described in this document, strings are stored as Pascal-style strings with a leading length byte, which might be zero. A zero length byte indicates a null string.

An index is an OMF-86 index field. That is, if the value is below 128, the index is a single byte holding the index value; otherwise, the field is two bytes. The first byte has the high bit set, and its remaining bits are the seven high-order bits of the index. The second byte holds the low-order 8 bits of the index.

Type indices are the type indices defined for the .EXE file tables.
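The string and index field rules above can be sketched as two small readers over a byte buffer. This is a minimal illustration rather than code from any Borland tool: read_index and read_string are hypothetical names, and the bounds checks simply assert instead of reporting errors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Reads one index field starting at data[pos] and advances pos.
// One byte if the value is below 128; otherwise two bytes, where the first
// byte carries the high bit plus the seven high-order bits of the index.
std::uint16_t read_index(const std::uint8_t* data, std::size_t size, std::size_t& pos) {
    assert(pos < size);
    std::uint8_t first = data[pos++];
    if ((first & 0x80) == 0) {
        return first;
    }
    assert(pos < size);
    std::uint8_t low = data[pos++];
    return static_cast<std::uint16_t>(((first & 0x7F) << 8) | low);
}

// Reads a Pascal-style string: a length byte followed by that many characters.
// A zero length byte yields an empty (null) string.
std::string read_string(const std::uint8_t* data, std::size_t size, std::size_t& pos) {
    assert(pos < size);
    std::size_t len = data[pos++];
    assert(pos + len <= size);
    std::string s(reinterpret_cast<const char*>(data + pos), len);
    pos += len;
    return s;
}
```

For example, the single byte 0x05 decodes to index 5, while the byte pair 0x81 0x02 decodes to index 0x0102.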
Immediate indices 0 to 23 refer to scalar types. Type index 0 indicates an unknown type. Any type index higher than 23 indicates the index of a type record defined in the current file. Each type record contains its own index, since type records are not necessarily written in index order. The official Intel type index fields are always zero, because MS-Link uses them for special purposes.

The order of comment records inside the object file is fairly flexible. Unless the description of a comment record specifies ordering requirements, the comment record might appear anywhere between the module header and module end records. The must appear immediately after the module header record and before any other type records. The compiler identification need not be for Turbo C++, nor is it absolutely necessary that a compiler id record appear at all.

Turbo object file comment records

This section dissects each comment record class. Each entry gives the record class value and its name, followed by a description of the record and its fields.

0x00  Compiler identification string

    string   A descriptive name reflecting the name of the translator used to generate this object file, for instance "Turbo Assembler Version 2.0".

0xe0  External symbol type index

    index    The type index of an external symbol. External symbols must be placed one symbol per EXTDEF record. This comment record supplies the type index of the external symbol located just before it in the object file.

    If the debug information version record 0xf9 appears in the object file, the following fields are present:

    index    The index of the source file that caused the record to be emitted.

    word     The line number of the source code line that caused the record to be emitted. The word is present only if the previous index is nonzero.

    If the debug information version record 3.1 appears in the object file, the following fields are present:

    First is a 16-bit file index. If this is zero, there is no reference info. If this is nonzero, a set of line numbers with special encodings follows. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number and the 7th bit being a toggle that specifies whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, it is a special encoding with the following meaning:

        0xff   Next byte/word is a file index, with an absolute word line number following.
        0xfe   The absolute line number is stored in the next word, and this is a reference.
        0xfd   The absolute line number is stored in the next word, and this is an assignment.

    The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

0xe1  Public symbol type index

    index    The type index of a public symbol. Public symbols must be placed one symbol per PUBDEF record. This comment record supplies the type index of the public symbol located just before it in the object file.

    byte     If the symbol is a function with a valid BP, this byte has bit 3 set (hex 0x08), and its upper four bits hold the number of words between the BP value and the return address.

    If the debug information version record (0xf9) appears in the object file, the following fields are present:

    index    The index of the source file that caused this record to be emitted.

    word     The line number of the source code line that caused the record to be emitted. The word is present only if the previous index is nonzero.

    If the debug information version record 3.1 appears in the object file, the following fields are present:

    First is a 16-bit file index. If this is zero, there is no reference info. If this is nonzero, a set of line numbers with special encodings follows. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number and the 7th bit being a toggle that specifies whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, it is a special encoding with the following meaning:

        0xff   Next byte/word is a file index, with an absolute word line number following.
        0xfe   The absolute line number is stored in the next word, and this is a reference.
        0xfd   The absolute line number is stored in the next word, and this is an assignment.

    The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

0xe2  Structure member definition

    Typically, all the members of a single structure are written to a single member record. If the number of members is so great that the OMF record exceeds 8K, or if the OMF record exceeds 8K for some other reason, the members of a single structure might span multiple records. Only the last member of the structure has the terminating bit (the high bit of the first byte) set. No more than one structure can appear in a member definition record. If the debug version record 0xf9 doesn't appear in the object file, then one or more member definition records for a structure are written immediately before the type record for that structure; otherwise, the structure member definition records must appear after the type record for the structure, and after all the types that the member definition records reference.

    The data consists of a consecutive set of member definition records.
    Each record consists of the following information:

    1st byte:

        0x60   Static member
        0x50   Conversion
        0x48   Member function, which might be combined with the following bits:

                   0x01   destructor
                   0x02   constructor
                   0x03   static member function
                   0x04   virtual member function

    If none of the previous values are present, the following interpretation of the byte applies:

        low six bits   If the member is a bit field, this field represents the number of bits in the field; otherwise, the field is set to zero.

        seventh bit    This bit is set to zero if what follows is a normal member, or to one if it is a New Offset record.

        high bit       This bit is set to zero if there are more members in the current structure, or to one if this is the last member in the structure.

    For normal members, the following fields apply:

    string        The member name. A zero byte is used for unnamed members. Since no explicit offset for each member is given, offsets are computed by counting the length of each member. When holes exist because bit fields do not fill a byte, or when word alignment is used, an unnamed member is emitted. Such a member is always a bit field member with the appropriate number of pad bits. Although the compiler currently behaves according to this description, it accepts unnamed members that are not bit fields.

    index         The member type. For conversions, this index specifies the target type of the conversion, for example int for "operator int();".

    For New Offset members, the following field applies:

    double word   The new byte offset of the records that follow it. This field allows variant records, since each variant portion can be started with a New Offset member. As a double word, this field is suitable for large structures.

0xe3  Type definition

    One type is defined in each type definition record. The format of the type record depends on the type identification (TID) byte. See TID values defined in the EXE debug table format beginning on page 51.
    TLINK defines a set of universal scalar types to save space in the object files. For integer range types, the type is stored with the maximum range for each type. If an index of less than twenty-four (decimal) appears in the object file, one of the pre-assigned types is indicated, and no type definition appears in the object file. The following list shows the set of pre-assigned types and their assigned indices:

        Index   Type
        ___________________________________
        1       void
        2       signed char
        4       signed short int
        6       signed long int
        8       unsigned char
        10      unsigned short int
        12      unsigned long int
        14      float
        15      double
        16      long double
        17      Pascal 6-byte real
        18      Pascal boolean
        19      Pascal character type
        21      8-byte signed range
        22      8-byte unsigned range
        23      10-byte value (tbyte)

    index    The index of the type being defined. All types must have a valid index of twenty-four (decimal) or greater, and the indices must be unique within the object file. There's no requirement to write types in any particular order. All of the type indices for a given file form a contiguous block beginning at twenty-four and proceeding to the highest numbered index. Since some types occupy eight bytes and others sixteen bytes in the .EXE file, the TID values requiring sixteen bytes reserve their own type index as well as the next higher type index.

    string   The type name, if any exists. For C, the type name is used only for structure, union, and enum tags. For Pascal, any type might be named.

    word     The size in bytes of the type.

    TID byte This is the TID of the type being defined. The following list shows the TID values:

        Name              Value   Description
        _________________________________________________
        TID_VOID          0x00    Unknown.
        TID_LSTR          0x01    Basic literal string.
        TID_DSTR          0x02    Basic dynamic string.
        TID_PSTR          0x03    Pascal style string.
        TID_SCHAR         0x04    1 byte signed integer range.
        TID_SINT          0x05    2 byte signed integer range.
        TID_SLONG         0x06    4 byte signed integer range.
        TID_SQUAD         0x07    8 byte signed integer.
        TID_UCHAR         0x08    1 byte unsigned integer range.
        TID_UINT          0x09    2 byte unsigned integer range.
        TID_ULONG         0x0A    4 byte unsigned integer range.
        TID_UQUAD         0x0B    8 byte unsigned integer.
        TID_PCHAR         0x0C    Pascal character range (no arithmetic).
        TID_FLOAT         0x0D    IEEE 32-bit real.
        TID_TPREAL        0x0E    Turbo Pascal 6-byte real.
        TID_DOUBLE        0x0F    IEEE 64-bit real.
        TID_LDOUBLE       0x10    IEEE 80-bit real.
        TID_BCD4          0x11    4 byte BCD.
        TID_BCD8          0x12    8 byte BCD.
        TID_BCD10         0x13    10 byte BCD.
        TID_BCDCOB        0x14    COBOL BCD.
        TID_NEAR          0x15    Near pointer.
        TID_FAR           0x16    Far pointer.
        TID_SEG           0x17    Segment pointer.
        TID_NEAR386       0x18    386 32-bit offset pointer.
        TID_FAR386        0x19    386 48-bit far pointer.
        TID_CARRAY        0x1A    C array - 0 based.
        TID_VLARRAY       0x1B    Very Large 0 based array.
        TID_PARRAY        0x1C    Pascal array.
        TID_ADESC         0x1D    Basic array descriptor.
        TID_STRUCT        0x1E    Structure.
        TID_UNION         0x1F    Union.
        TID_VLSTRUCT      0x20    Very Large Structure.
        TID_VLUNION       0x21    Very Large Union.
        TID_ENUM          0x22    Enumerated range.
        TID_FUNCTION      0x23    Function or procedure.
        TID_LABEL         0x24    Goto label.
        TID_SET           0x25    Pascal set.
        TID_TFILE         0x26    Pascal text file.
        TID_BFILE         0x27    Pascal binary file.
        TID_BOOL          0x28    Pascal boolean.
        TID_PENUM         0x29    Pascal enumerated range (no arithmetic).
        TID_PWORD         0x2A    Pword
        TID_TBYTE         0x2B    Tbyte
        TID_SPECIALFUNC   0x2D    Member/Duplicate function
        TID_CLASS         0x2E    C++ Class
        TID_HANDLEPTR     0x30    Handle based ptr
        TID_MEMBERPTR     0x33    Type pointed to by a class member pointer.
        TID_NREF          0x34    Near reference
        TID_FREF          0x35    Far reference
        TID_NEWMEMPTR     0x38    New style member ptr

    The format of the remainder of the type record depends on the TID byte, as described in the following sections.

    Simple types

        TID_VOID     TID_FLOAT     TID_BCD8     TID_TFILE
        TID_LSTR     TID_TPREAL    TID_BCD10    TID_BOOL
        TID_DSTR     TID_DOUBLE    TID_ADESC    TID_SCHAR
        TID_SQUAD    TID_LDOUBLE   TID_STRUCT   TID_PWORD
        TID_UQUAD    TID_BCD4      TID_UNION    TID_TBYTE

    Pascal string type

        TID_PSTR

        byte     The maximum size of the string.

    Labels

        TID_LABEL

        byte     Zero if near, one if far.

    Integral range types

        TID_SCHAR    TID_SLONG    TID_UINT     TID_PCHAR
        TID_SINT     TID_UCHAR    TID_ULONG

        The integral range types are a hierarchy of related types that form a tree. The root of the tree is the general type, which is stored explicitly as a range. The parent type is zero, and the lower and upper bounds are the entire range of values storable in the size of memory indicated by the TID. The bound values are interpreted as signed or unsigned according to the TID. The Pascal character TID (TID_PCHAR) is stored as an unsigned character-sized range, except that arithmetic isn't allowed on objects of Pascal character type. For all types, a 4-byte upper and lower bound value allows standard treatment of range checking. The sub-fields are stored as shown in the following list:

        index         The parent type index
        double word   The lower bound of the range
        double word   The upper bound of the range

    Cobol-style BCD

        TID_BCDCOB

        byte     The position of the decimal point. The total number of digits is determined from the size, using 2 digits per byte, except for the last byte, which has one digit and a sign. The decimal position is the number of digits to the right of the decimal point.

    Pointer types

        TID_NEAR     TID_SEG       TID_FAR386   TID_FREF
        TID_FAR      TID_NEAR386   TID_NREF

        All pointer types have an index field for the pointed-to type. All have an additional byte field following the pointed-to type field that holds extra information as follows:

        TID_NEAR and TID_NEAR386

            The segment base of the pointer:

            Value   Segment register
            _______________________________________
            0x0     Segment register unspecified
            0x1     ES relative
            0x2     CS relative
            0x3     SS relative
            0x4     DS relative
            0x5     FS relative
            0x6     GS relative

        TID_FAR and TID_FAR386

            0x0     far pointer arithmetic (no segment adjustments).
            0x1     huge pointer arithmetic (segment adjustments to avoid offset wraparound).

        TID_SEG

            0x0     ignored

        TID_NREF

            0x0     ignored

        TID_FREF

            0x0     ignored

    Array types

        TID_CARRAY

        index    The index of the element type. The dimension of the array is determined by dividing the size of the overall array by the size of each element. No padding is assumed between array elements.

        TID_VLARRAY

        word     The upper 16 bits of the array size. This word is placed so that the normal type size field and this one can be considered a double word size.

        index    The index of the element type. The dimension of the array is determined as it is for normal C arrays.

        TID_PARRAY

        index    The element type.

        index    The type of the dimension. The number of elements in the array and their indices are determined by the dimension type, which is normally some sort of integral or enum range.

    Very large structure types

        TID_VLSTRUCT and TID_VLUNION

        word     The upper 16 bits of the size of the struct or union. This word is placed so that the upper 16 bits and the normal type size can be considered a double word size.

    Enumerated types

        TID_ENUM and TID_PENUM

        index    The index of the parent type.

        word     The lower bound of the range (considered a signed integer range).

        word     The upper bound of the range (considered a signed integer range).

        If the debug information version record (0xf9) has appeared in the object file, then the following field is present:

        index    Index to the first member of the enum. That is, this is an index to the structure member definition record that defines the members of the enum.

    Function types

        TID_FUNCTION

        index    The type index of the type returned.

        byte     The language modifier byte:

                     0x0   Near C function
                     0x1   Near Pascal function
                     0x2   Unused.
                     0x3   Unused.
                     0x4   Far C function
                     0x5   Far Pascal function
                     0x6   Unused.
                     0x7   Interrupt function.

        byte     This byte is set to one if the function accepts a variable number of arguments; otherwise, it is zero.

    Sets

        TID_SET

        index    The parent type.

    Binary files

        TID_BFILE

        index    The element type.
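The size arithmetic described earlier for TID_CARRAY and for the very large types (TID_VLARRAY, TID_VLSTRUCT, TID_VLUNION) can be sketched as follows. This is only an illustration of the stated rules: full_size and array_dimension are hypothetical helper names, and the element size is assumed to have been looked up already from the element type's own record.

```cpp
#include <cassert>
#include <cstdint>

// Combines the normal 16-bit type size field with the extra upper word stored
// in a very large type record into one double word size.
std::uint32_t full_size(std::uint16_t size_word, std::uint16_t upper_word) {
    return (static_cast<std::uint32_t>(upper_word) << 16) | size_word;
}

// Dimension of a C array: overall size divided by element size, with no
// padding assumed between elements.
std::uint32_t array_dimension(std::uint32_t total_size, std::uint32_t element_size) {
    assert(element_size != 0);
    return total_size / element_size;
}
```

For instance, a 40-byte TID_CARRAY of 2-byte elements has a dimension of 20, and a TID_VLARRAY with size word 0x86A0 and upper word 0x0001 has a total size of 100000 bytes.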

    Member/duplicate functions

        TID_SPECIALFUNC

        index    The type index of the return type.

        byte     Language modifier byte (same as regular functions).

        byte     Bit 0 is set to indicate a member function; bit 1 is set to indicate a duplicate function; bit 2 is set to indicate an operator function; bit 3 is set to indicate internal linkage; bit 4 is set to indicate that this is a Pascal function passing 'this' as its last parameter.

        index    The type index of the class if the function is a member function.

        index    Word offset in the virtual table if the function is a member function.

        name     if the function is a nonlocal member function.

        The symbol 'this' should appear as a local symbol in the second inner scope of a member function, not in the outermost (parameter) scope.

    C++ Class

        TID_CLASS

        index    The class index for this class.

    Pointed-to members

        TID_MEMBERPTR

        index    The type index of the pointed-to type.

        index    The class index of the class whose members are pointed to.

    New style pointed-to members

        TID_NEWMEMBERPTR

        byte     Member pointer flags.

        index    The type index of the pointed-to type.

        index    The class index of the class whose members are pointed to.

0xe4  Enum member definition

    Typically, all members of a single enum are written to a single member record. If the number of members is so great that the OMF record exceeds 8K, or if the OMF record exceeds 8K for some other reason, the members of a single enum might span multiple records. Only the last member of the enum has the terminating bit (the high bit of the first byte) set. No more than one enum can appear in a member definition record. If the debug version record 0xf9 has not appeared in the object file, then one or more member definition records for an enum are written immediately before the type record for that enum; otherwise, the enum member definition record must appear after the type record for the enum.

    Each record in a consecutive set of member definition records consists of the following data:

    byte     0x80 for the last member of the enum; otherwise, this byte is set to zero.

    string   The member name.

    word     The member value.

0xe5  Begin scope record

    Scopes are defined by a pair of begin-scope and end-scope records. The relationships of nested scopes are specified by enclosing the begin/end records of one scope between the begin/end records of another. Local symbols are defined for a scope by placing the locals definition records between the begin/end records of that scope.

    index    The segment index of the segment containing the scope. This segment must be the same as the segment of the starting address.

    word     The offset, relative to the code segment, of the start of this scope.

0xe6  Locals definition record

    This record consists of a set of symbol definitions, all local to the innermost enclosing scope. The following list shows the contents for each symbol:

    string   The symbol name.

    index    The symbol type index.

    byte     The symbol class byte.

    The remainder of the symbol depends on the value of the symbol class byte:

    SC_TYPEDEF (6) and SC_TAG (7)

        If the debug information version record 3.1 appears in the object file, the following fields are present:

        First is a 16-bit file index. If this is zero, there is no reference info. If this is nonzero, a set of line numbers with special encodings follows. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number and the 7th bit being a toggle that specifies whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, it is a special encoding with the following meaning:

            0xff   Next byte/word is a file index, with an absolute word line number following.
            0xfe   The absolute line number is stored in the next word, and this is a reference.
            0xfd   The absolute line number is stored in the next word, and this is an assignment.

        The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

    SC_STATIC (0)

        index    The group index of the symbol.

        index    The segment index of the segment containing the symbol. For an absolute symbol this must be an absolute segment.

        word     The offset of the symbol, relative to the given segment.

        If the debug information version record 3.1 appears in the object file, the following fields are present:

        First is a 16-bit file index. If this is zero, there is no reference info. If this is nonzero, a set of line numbers with special encodings follows. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number and the 7th bit being a toggle that specifies whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, it is a special encoding with the following meaning:

            0xff   Next byte/word is a file index, with an absolute word line number following.
            0xfe   The absolute line number is stored in the next word, and this is a reference.
            0xfd   The absolute line number is stored in the next word, and this is an assignment.

        The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

    SC_ABSOLUTE (1)

        index    The segment index of the segment containing the symbol. For an absolute symbol the index must be an absolute segment.

        word     The offset of the symbol, relative to the given segment.

        If the debug information version record 3.1 appears in the object file, the following fields are present:

        First is a 16-bit file index. If this is zero, there is no reference info.
        If this is nonzero, a set of line numbers with special encodings follows. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number and the 7th bit being a toggle that specifies whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, it is a special encoding with the following meaning:

            0xff   Next byte/word is a file index, with an absolute word line number following.
            0xfe   The absolute line number is stored in the next word, and this is a reference.
            0xfd   The absolute line number is stored in the next word, and this is an assignment.

        The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

    SC_AUTO (2) and SC_PASVAR (3)

        word     The signed offset, relative to BP, of the symbol. For Pascal variable parameter symbols, the location contains the address of the symbol.

        If the debug information version record 3.1 appears in the object file, the following fields are present:

        First is a 16-bit file index. If this is zero, there is no reference info. If this is nonzero, a set of line numbers with special encodings follows. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number and the 7th bit being a toggle that specifies whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, it is a special encoding with the following meaning:

            0xff   Next byte/word is a file index, with an absolute word line number following.
            0xfe   The absolute line number is stored in the next word, and this is a reference.
            0xfd   The absolute line number is stored in the next word, and this is an assignment.

        The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

    SC_REGISTER (4)

        byte     A register id. Register ids map to registers as follows:

                     0x00 AX    0x01 CX    0x02 DX    0x03 BX
                     0x04 SP    0x05 BP    0x06 SI    0x07 DI
                     0x08 AL    0x09 CL    0x0A DL    0x0B BL
                     0x0C AH    0x0D CH    0x0E DH    0x0F BH
                     0x10 ES    0x11 CS    0x12 SS    0x13 DS
                     0x14 FS    0x15 GS
                     0x18 EAX   0x19 ECX   0x1A EDX   0x1B EBX
                     0x1C ESP   0x1D EBP   0x1E ESI   0x1F EDI

                 If the register id value is greater than 0x28, the field instead specifies an offset (after subtracting 0x28) into the optimized symbols table, which holds the live range information for this variable.

        If the debug information version record 3.1 appears in the object file, the following fields are present:

        First is a 16-bit file index. If this is zero, there is no reference info. If this is nonzero, a set of line numbers with special encodings follows. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number and the 7th bit being a toggle that specifies whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, it is a special encoding with the following meaning:

            0xff   Next byte/word is a file index, with an absolute word line number following.
            0xfe   The absolute line number is stored in the next word, and this is a reference.
            0xfd   The absolute line number is stored in the next word, and this is an assignment.

        The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

    SC_CONST (5)

        dword    The 32-bit constant value.

        If the debug information version record 3.1 appears in the object file, the following fields are present:

        First is a 16-bit file index. If this is zero, there is no reference info.
        If this is nonzero, a set of line numbers with special encodings follows. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number and the 7th bit being a toggle that specifies whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, it is a special encoding with the following meaning:

            0xff   Next byte/word is a file index, with an absolute word line number following.
            0xfe   The absolute line number is stored in the next word, and this is a reference.
            0xfd   The absolute line number is stored in the next word, and this is an assignment.

        The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

    SC_OPT (8)

        index    The number of entries for this local. Each entry represents a different location for the local for a different set of code offsets; hence, a single SC_OPT sub-record represents a complete list of optimized symbol records for the debugger. Each entry has the following format:

        word     Starting offset of the live range of the variable. The offset is relative to the offset of the outermost enclosing scope.

        word     Ending offset of the live range of the variable. The offset is relative to the offset of the outermost enclosing scope.

        byte     One of SC_AUTO, SC_PASVAR, or SC_REGISTER.

        If the debug information version record 3.1 appears in the object file, the following fields are present:

        First is a 16-bit file index. If this is zero, there is no reference info. If this is nonzero, a set of line numbers with special encodings follows. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0.
By default, the delta is stored in a byte, with the low 6 bits being the line number, and the 7th bit being a toggle to specify whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, then it is a special encoding with the following meaning:

0xff Next byte/word is a file index, with an absolute word line number following.
0xfe The absolute line number is stored in the next word, and this is a reference.
0xfd The absolute line number is stored in the next word, and this is an assignment.

The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

SC_AUTO and SC_PASVAR

* word The signed offset, relative to BP, of the symbol. For Pascal variable parameter symbols, the location contains the address of the symbol.

If the debug information version record 3.1 appears in the object file, the following fields are present:

First is a 16-bit file index. If this is zero, there is no reference info. If this is non-zero, there follows a set of line numbers with special encodings. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number, and the 7th bit being a toggle to specify whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, then it is a special encoding with the following meaning:

0xff Next byte/word is a file index, with an absolute word line number following.
0xfe The absolute line number is stored in the next word, and this is a reference.
0xfd The absolute line number is stored in the next word, and this is an assignment.

The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

SC_REGISTER

* byte The register id.
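The reference info encoding that accompanies these sub-records can be decoded roughly as follows. This is only a sketch under stated assumptions, not Borland's implementation: the names ref_entry, rd16 and decode_refs are hypothetical, words are assumed to be little-endian, decoding is assumed to start just after a nonzero file index, and an absolute line number is assumed to become the base for later deltas. The text does not fully describe the 0xff file switch or bytes in the range 0x80 to 0xef, so the sketch reports those as unhandled rather than guessing.

```c
#include <stddef.h>

/* One decoded entry: a source line, and whether the symbol is assigned there. */
struct ref_entry {
    unsigned line;
    int is_assignment;
};

/* Read a 16-bit word (assumed little-endian). */
static unsigned rd16(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/*
 * Decode the line entries that follow a nonzero file index.
 * Returns the number of entries stored in out, or -1 if the input is
 * truncated, overflows out, or uses an encoding this sketch does not handle.
 */
static int decode_refs(const unsigned char *p, size_t len,
                       struct ref_entry *out, int max_out)
{
    size_t i = 0;
    unsigned line = 0;          /* starting line after a file index is 0 */
    int n = 0;

    while (i + 1 < len) {
        unsigned char b = p[i];
        int assign;

        if (b == 0 && p[i + 1] == 0)
            return n;           /* two zero bytes terminate the object */

        if (b == 0xfe || b == 0xfd) {
            /* Absolute line number in the next word. */
            if (i + 2 >= len)
                return -1;
            line = rd16(p + i + 1);
            assign = (b == 0xfd);
            i += 3;
        } else if (b < 0x80) {
            /* Default form: low 6 bits are the delta, 7th bit marks assignment. */
            line += b & 0x3f;
            assign = (b & 0x40) != 0;
            i += 1;
        } else {
            /* 0xff file switch, reserved values, or bytes the text leaves open. */
            return -1;
        }

        if (n >= max_out)
            return -1;
        out[n].line = line;
        out[n].is_assignment = assign;
        n++;
    }
    return -1;                  /* ran out of data before the terminator */
}
```

A caller would read the 16-bit file index first and call decode_refs only when that index is nonzero, passing a pointer to the byte that follows it.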
SC_OPT is complex so that it can handle a variable that lives in a register, is spilled to the stack, and is then moved to a register again. This complexity does not arise in Borland C++ Version 4.0, because split live ranges are not implemented; however, this specification was written with the intent of covering all contingencies, such as the compiler getting smarter with live ranges.

If the debug information version record 0xf9 appears in the object file, the following fields are present:

* index Index of the source file that caused this record to be emitted.
* word Line number of the source code that caused this record to be emitted. This word is present only if the previous index is nonzero.

If the debug information version record 3.1 appears in the object file, the following fields are present:

First is a 16-bit file index. If this is zero, there is no reference info. If this is non-zero, there follows a set of line numbers with special encodings. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number, and the 7th bit being a toggle to specify whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, then it is a special encoding with the following meaning:

0xff Next byte/word is a file index, with an absolute word line number following.
0xfe The absolute line number is stored in the next word, and this is a reference.
0xfd The absolute line number is stored in the next word, and this is an assignment.

The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

0xe7 End of scope

word The offset relative to the code segment of the end of the scope.

0xe8 Select source file

This comment is placed before any line numbers for a particular file.
It's not needed if line numbers aren't generated before the next source file is encountered.

index The source-file index of the new source file. If no further data exists in this record, then this index refers to an existing source file specified in a Select Source File record; otherwise, it is followed by the source file name and time stamp.
string The source file name, relative to the current path.
dword The DOS date and time stamp for the file.

0xe9 Dependency file definition

This comment is included for each distinct source and include file in the object module. The records should be placed near the top of the object file, since a MAKE utility must scan the file for dependency records. The first dependency record must precede any noncomment record other than the THEADR record.

dword The DOS date and time stamp for the file.
string The name of the source file. The string is the name used to open the file. For Turbo C, if an include file is found in a -I directory, the directory name is prepended to the filename, allowing the MAKE utility to check dependencies by simply retrieving the file time stamp without searching through a path.

If the record has zero length, then there are no more dependency records in the object file.

0xea Compile parameters record

1st byte The source language for this object file. If an assembler source contains debugging information, the language is the one specified in the source, not assembly language. The following language types are defined:

0 - unspecified
1 - C
2 - Pascal
3 - Basic
4 - Assembly
5 - C++

2nd byte

1 bit This bit is one if underbars were prepended to C language source symbols; otherwise, it's zero.
3 bits These bits specify the memory model and, therefore, the default pointer sizes for this source:

0 - Tiny
1 - Small
2 - Medium
3 - Compact
4 - Large
5 - Huge
6 - 80386 Small
7 - 80386 Medium
8 - 80386 Compact
9 - 80386 Large

Code pointers are near in the Tiny, Small, and Compact models, and far otherwise.
Data pointers are near in the Tiny, Small and Medium models, and far otherwise. The 80386 models are analogous to the corresponding 8086 models: a near 80386 pointer has a 32-bit offset, and a far 80386 pointer is a 48-bit pointer.

0xeb External symbol matched type index

The following fields are repeated as many times as necessary to fit in the record.

* string The symbol name itself.
* index The type index of the symbol.

If the debug information version record 0xf9 appears in the object file, the following fields are present:

* index Index of the source file that caused this record to be emitted.
* word Line number of the source code line that caused the record to be emitted. This word is present only if the previous index is nonzero.

If the debug information version record 3.1 appears in the object file, the following fields are present:

First is a 16-bit file index. If this is zero, there is no reference info. If this is non-zero, there follows a set of line numbers with special encodings. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number, and the 7th bit being a toggle to specify whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, then it is a special encoding with the following meaning:

0xff Next byte/word is a file index, with an absolute word line number following.
0xfe The absolute line number is stored in the next word, and this is a reference.
0xfd The absolute line number is stored in the next word, and this is an assignment.

The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

0xec Public symbol matched type index

The following fields are repeated as many times as necessary to fit in the record.

* string The name of the public symbol.
* index The type index for the symbol.
* byte This byte contains the same information as the valid BP byte previously defined.

If the debug information version record 0xf9 appears in the object file, the following fields are present:

* index Index of the source file that caused this record to be emitted.
* word Line number of the source code line that caused this record to be emitted. This word is present only if the previous index is nonzero.

If the debug information version record 3.1 appears in the object file, the following fields are present:

First is a 16-bit file index. If this is zero, there is no reference info. If this is non-zero, there follows a set of line numbers with special encodings. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number, and the 7th bit being a toggle to specify whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, then it is a special encoding with the following meaning:

0xff Next byte/word is a file index, with an absolute word line number following.
0xfe The absolute line number is stored in the next word, and this is a reference.
0xfd The absolute line number is stored in the next word, and this is an assignment.

The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

0xed Class definition

This record describes classes. The class definition records have the following format:

* byte 0 = class description

Class descriptions

Class descriptions have the following format:

* index Class index for the class. TID_CLASS and TID_MEMBERPTR type records refer to this index.
* word Offset (in bytes) of the .
If the debug information version record 0xf9 appears in the object file, the following field is present:

* index Index to the first member of the structure; that is, an index to the structure member definition record that defines the members of this class.
* byte Info bits:
    bit 0: Class declared as 'struct'
    bit 1: 'huge' class (far vtable pointer)
    bit 2: 'far' class (far 'this' pointer)
    bit 3: 'far' class that uses 'near' vbase pointers
    bit 4: a union
* index The number of parent indices that follow.
* word(s) Indices of parent classes (repeated); the highest bit is set for virtual base classes.

Note: If a class definition appears between begin-scope and end-scope records, it is interpreted as a locally defined class.

0xee Coverage offset record

To aid in profiling, the compiler emits offsets to delimit the start and end of basic blocks. The offsets, if taken pairwise, define the beginning and end of a basic block. The offsets are relative to the specified logical segment defined in the object file.

* index The segment index of the segment, corresponding to the offsets that follow.
* array of words Each word corresponds to an offset. The length of the array is dictated by the length of the OMF record.

0xf5 Begin large scope record

Scopes are defined by a pair of begin-scope and end-scope records. The relationships of nested scopes are specified by enclosing the begin/end records of one scope between the begin/end records of another. Local symbols are defined for a scope by placing the locals definition records between the begin/end records of the scope.

* index The segment index of the segment containing the scope. This segment must be the same as the segment of the starting address.
* double word The large offset, relative to the code segment, of the start of this scope.

0xf6 Large offset locals definition record

This record consists of a set of symbol definitions, all local to the innermost enclosing scope.
The following list shows the contents for each symbol:

* string The symbol name.
* index The symbol type index.
* byte The symbol class byte.

The remainder of the symbol depends on the value of the symbol class byte:

SC_STATIC (0)

* index The group index of the segment containing the symbol.
* index The segment index of the segment containing the symbol.
* double word The large offset relative to the given segment of the symbol.

SC_ABSOLUTE (1)

* index The segment index of the segment containing the symbol. For an absolute symbol the segment must be absolute.
* double word The large offset relative to the given segment of the symbol.

SC_AUTO (2) and SC_PASVAR (3)

* double word The signed large offset, relative to BP, of the symbol. For Pascal variable parameter symbols, the location contains the address of the symbol.

0xf7 Large end of scope

double word The large offset relative to the code segment of the end of the scope.

0xf8 Member function

This record must be located immediately after the outermost begin-scope record for every member function. It contains one field:

* string Mangled name of member function.

0xf9 Debug Information Version

This record immediately follows the compiler identification comment record. It specifies the major and minor version numbers of the debug information present in this file. If the major version of the debug information is higher than the major version that the linker understands, then all debug information is ignored. The minor version is ignored by the linker and is used only by diagnostic tools such as TDUMP. Borland C++ Version 4.0 emits version 4.01 debugging information. The record contains two fields:

* Major byte Major version of the debug information.
* Minor byte Minor version of the debug information.

0xfa Module optimization flags

This record presents the module optimization flags previously described.
The compiler is responsible for emitting flags, which the linker passes unchanged to the debug information in the .EXE file.

* dword Optimization flags.

The following flags are currently defined:

#define MO_globalCSEs    0x0001
#define MO_localCSEs     0x0002
#define MO_inductVars    0x0004
#define MO_codeMotion    0x0008
#define MO_regAlloc      0x0010
#define MO_loadOptim     0x0020
#define MO_loopOpt       0x0040
#define MO_intrinsics    0x0080
#define MO_deadStorElim  0x0100
#define MO_copyProp      0x0200
#define MO_jumpOpt       0x0400
#define MO_speed_size    0x0800
#define MO_noAliasing    0x1000

.OBJ extensions for 32 bits

The .OBJ spec was originally designed for the 16-bit world. Fortunately, its designers allotted only even-numbered record types to the standard 16-bit records. The following extension uses the odd-numbered record types to represent the 32-bit equivalents where needed.

SEGD32 (99h) Size field is 32 bits.
LEDA32 (A1h) Offset field is 32 bits.
LIDA32 (A3h) Offset field is 32 bits, iteration count fields are 32 bits.
PUBD32 (91h) Offset field is 32 bits.
MODE32 (8Bh) Starting offset field is 32 bits.
LINN32 (95h) Offset is 32 bits.
FIXU32 (9Dh) Offset and displacement are 32 bits.

In the SEGDEF and SEGD32 records, the ACBP byte is redefined as follows: Bit 0 (formerly InPage) now means USE32 when set. The align types are extended to include DWORD alignment after PAGE alignment.

This specification can be extended to include other record types, as needed. The 16-bit equivalent of any record can be used until one or more fields exceed the 16-bit size limitation. TASM uses such a minimalist approach in generating records to save space.

VIRDEF Records

The following modified record is provided for the linker to support unique instantiation of virtual tables, "out of line inlines" and various thunks the compiler generates. The mechanism is called "VIRDEF", and it is similar to an initializable COMDEF. It begins with a change to the .
A VIRDEF is identical to a COMDEF record, with the exception that the "segment type" must be a number in the range 1..0x5F (instead of the 0x61 and 0x62 far and near COMDEF types). This number is interpreted as a segment index and may refer to any SegDef in the current module, with the meaning that the VIRDEF is to be appended to that segment if it is instantiated. The record format is like that for a near COMDEF, with a single length count.

The VIRDEF defines both a Public name and an External Index in the same way as a COMDEF does. VIRDEFs cannot be resolved onto a Public or a COMDEF of the same name: any attempt to mix them is a link-time error. All VIRDEFs of the same name are taken to be identical.

When all source files have been read and the linker has decided which modules are to be kept and which are to be discarded, it scans the list of instances of each VIRDEF. It ignores instances that are in discarded modules, and selects the first of the largest instances (or the first instance if all are equal in size). That instance is updated as the actual public symbol. Its segment is chosen (in the case where the VIRDEFs do not all attach to the same segment) and its module is noted. Only the LEDATA records from that module are used; the others are ignored.

VIRDEFs may be attached to either data or code segments. If a uniform choice of segment is not made, and the code generated to reference the VIRDEF cannot reach the target, then it generates fixup overflows in the usual way: it is not an error to have a single VIRDEF name with different segments unless it results in overflows.

A COMDEF may be seen as a "special case" of a VIRDEF, one which is attached to either BSS or an invented FAR segment, and which is never initialized with LEDATA.

When a reference is made to a VIRDEF from other object file records, the index that refers to the VIRDEF will be greater than 0x4000.
To use the index, subtract 0x4000, and use it as a normal index.

These changes are not compatible with Microsoft's LINK, but they occur only in C++ code.

CHAPTER 3
_________________________________________________

Symbol table format

TLINK's debugging output is written at the end of the load image in the .EXE file. An image that does not include extra information beyond the image size has no debug information. If extra data is written beyond the load image, check the first word for the number 0x52fb.

The debug information begins with a header describing the sizes of the remaining tables. This header is defined as follows:

struct debug_header {
    unsigned short magic_number;   /* To be sure who we are */
    unsigned short version_id;     /* In case we change things */
    unsigned long names;           /* Names pool size in bytes */
    unsigned long names_count;     /* Number of names in pool */
    unsigned long types_count;     /* Number of type entries */
    unsigned long members_count;   /* Structure members table */
    unsigned long symbols_count;   /* Number of symbols */
    unsigned long globals_count;   /* Number of global symbols */
    unsigned long modules_count;   /* Number of modules (units)*/