Storing Build Time Information In ELF Binary Files In a Format Suitable for Static Analysis A specification by Nick Clifton. Version 3.0. Dec 20, 2017. ChangeLog: For v3.0: Add end-of-range addresses to description field, to allow for gaps in the coverage. For v2.0: Added an "owner" string at the start of the name field so that naive processors of Elf Notes are less likely to complain. ---------------------------------------------------------------------- * The information is stored in a new section in the file using the ELF NOTE format. This format is already well defined, and supported by binary tools. Creator tools (compilers, assemblers) place the notes into the binary files. Consumer tools (readelf, built-by) read the notes and answer questions about the binaries concerned. Processing tools (linker, objcopy) combine and merge the notes as needed. Note - this specification could have used the gnu_attributes format instead, but this has a few drawbacks: - Support for section and symbol tagged gnu_attributes is not currently implemented anywhere, and these features are needed. Adding support for them to the binutils would mean that the specification is not backwards compatible with older tools. * The gnu attributes specification requires that when merging attributes from multiple input files the tool performing the merge must create new, section-relative attributes where conflicts occur. Similarly conflicts between section-relative attributes must be resolved by the creation of new symbol-relative attributes. Adding support for this requirement would have again introduced a barrier to backwards compatiblity, and would also result in the attribute section being larger than an equivalent section using this proposal. * The information is stored in a new section called .gnu.build.attributes. (The name can be changed - it is basically irrelevant anyway, it is the new section flag (defined below) that matters). The section has the type SHT_NOTE. The section has a new flag set: SHF_GNU_BUILD_ATTRIBUTES (suggested value: 0x00100000). The sh_link and sh_info fields of this section should be set to 0. The sh_align field should be set to 4, even on 64-bit systems. This does mean that 8-byte values inside the notes might have to be relocated using unaligned relocs, and that they might have to be read and written using unaligned loads and stores. * The type field of a note is used to distinguish the range of memory over which an attribute applies. The name field identifies the attribute and gives it a value. The description field specifies the starting and ending addresses for where the attribute is applied. Two new note types are defined: NT_GNU_BUILD_ATTRIBUTE_OPEN (0x100) and NT_GNU_BUILD_ATTRIBUTE_FUNC (0x101). These are used by the description field (see below). The description field of the note is either 0-bytes long, or else a pair of 4-byte wide (for 32-bit targets) or 8-byte wide (for 64-bit targets) addresses which indicate the starting and ending location for the attribute. If the description field is empty, the note should be treated as if it applies to the same region as the nearest preceeding note of the same type (ie either OPEN or FUNC). In unrelocated files the addresses should instead be zero, with a relocation present to set the actual value once the file is linked. The numbers are stored in the same endian format as that specified in the EI_DATA field of the ELF header of the file containing the note. The size of the numbers is dictated by the EI_CLASS field of the ELF header. The name field identifies the type and value of the attribute. The name starts with the string "GA", which is an abbreviation for GNU Attribute. The abbreviation is used in order to save space. The string is there so that tools that do not know about these notes will still be able to parse the note structure. The character following the identifier string indicates the kind of attribute, based upon the following table: * - The attribute takes a numeric value. Numbers are stored in little endian binary format. $ - The attribute takes a string value. ! - The attribute takes a boolean value, and the value is false. + - The attribute takes a boolean value, and the value is true. The next character indicates the specific attribute: Character Allowed Meaning --------------- Types ------- 0 Reserved for future use. 1 $ Version of the specification supported and producer(s) of the notes (see below). 2 * Stack protector 3 !+ Relro 4 * Stack size 5 $ Build tool & version 6 $* ABI 7 * Position Independence Status: 0 => static, 1 => pic, 2 => PIC, 3 => pie 8 !+ Short enums 9..31 Reserved for future use. 32..126 $*!+ An annotation type not explicitly defined by this specification. 127+ Reserved for future use. For * and $ type attributes the value is then appended. Per the ELF note spec the name must end with a NUL byte. Some examples: GA*foo\0\001\0\002\0 Attribute 'foo' with numeric value 0x20001 GA*bar\00\0 Attribute 'bar' with numeric value 0 GA$fred\0hello\0 Attribute 'fred' with string value "hello" GA*4\377\377\0 Attribute stack size with numeric value 0xffff GA*21\0 Atrribute -fstack-protector. GA*24\0 Atrribute -fstack-protector-explicit. GA$\0013p5\0 Supports spec version 3, created by plugin version 5. GA$5gcc v7.0\0 Attribute build tool "gcc v7.0" Multiple notes for the same attribute can exist, providing that they have different values and that their description address ranges do not overlap. The exception to this rule is that NT_GNU_BUILD_ATTRIBUTE_FUNC attributes are allowed to overlap NT_GNU_BUILD_ATTRIBUTE_OPEN attributes. The first note should be a version note. The version note string should consist of an odd number of characters. The first character is the ascii code for the number of the version of this protocol supported by the notes. The next pair of characters indicate who produced the notes and which version of this producer has been used. A 'p' character indicates a compiler plugin. An 'l' character indicates the linker. An 'a' character indicates the assembler. Other characters may be defined in the future. Multiple producers can contribute to the notes. Their identifying pair of characters should be appended to the version note. The description field for the version note should not be empty. This note serves as the base address for other open notes that follow, allowing them to use an empty description field. * When the linker merges two or more files containing these notes it should ensure that the above rules are maintained. Simply concatenating the incoming note sections should ensure this. The linker can, if it wishes, create its own notes and append, or insert them into the note section. Eg to indicate that -z relro is enabled. The order of the notes from an incoming section must be preserved in the outgoing section. Notes do not have to be sorted by address range although this often happens automatically when sections are concatenated. If this is a final link, then relocations on the notes should of course be resolved. The linker, or another tool, may wish to eliminate redundant notes in the note section. When doing this the following rules must be observed: 0. [Optional] If relocations exist against the notes then they should not be merged. 1. Preserve the ordering of the notes. 2. Preserve any NT_GNU_BUILD_ATTRIBUTE_FUNC notes. 3. Eliminate any NT_GNU_BUILD_ATTRIBUTE_OPEN notes that have the same full name field as the immediately preceeding note with the same type of name and whoes address ranges coincide. 4. Combine the numeric value of any NT_GNU_BUILD_ATTRIBUTE_OPEN notes of type GNU_BUILD_ATTRIBUTE_STACK_SIZE. 5. If an NT_GNU_BUILD_ATTRIBUTE_OPEN note is going to be preserved and its description field is empty then the nearest preceeding OPEN note with a non-empty description field must also be preserved *OR* the description field of the note must be changed to contain the starting address to which it refers. * Note - this specification is intended for storing information for use by static tools. There is another, similar specification for storing information for use by run-time tools, specifically the dynamic loader. This specifcation also uses the ELF Note format, but it is intended to be a lot smaller, only storing information essential to the loader, and a lot faster to process. ToDo: Check if -grecord-gcc-switches preserved per-translation-unit info after linking. Handle -ffunction-sections.