Currently lld sets a size of zero on all the __start_foo and foo_start symbols. This is fine for most architectures, but not for CHERI. On CHERI, pointers are capabilities that contain not only an address but also bounds and access permissions. When taking the address of a global, we set the length of the resulting capability from the ELF symbol size, and if that size is zero the CPU will trap as soon as we dereference the capability. An expression like _DYNAMIC[0].d_tag will therefore fail at runtime if we don't set the ELF symbol size, because we load 8 bytes from a capability with length zero.
Details
- Reviewers: ruiu
Event Timeline
Can you share a bit more about how CHERI uses the symbol size? Is it only for known linker-generated sections like in your example above, or to support arbitrary user mysection_start symbols? Is the mysection_start symbol special-cased within CHERI, or is it just an arbitrary ELF symbol?
I'm a bit curious about the implications of setting the symbol size. For special sections like .dynamic we would only expect the _start, _end, and _DYNAMIC symbols to be defined, but in an arbitrary mysection there could be multiple other symbols defined as well. Setting the size of mysection_start to the section size would mean its [0, size) range overlaps all other symbols in the section, which could confuse other ELF processing tools that skip over functions/data based on symbol size.
I'm also curious how the symbol size is used at runtime. Does the OS/loader require the static symbol table to be present? What if it has been stripped?
On the assumption that there is a good reason for CHERI to use the symbol size I have some suggestions for the code.
- Given that addOptionalRegular is only used for start and end symbols and only uses the (now two) special values of 0 and -1, I think we should rename the function to make it more specific to the purpose it is being used for.
- Would it be better to use a similar trick for Size as for Value? For example, if Size is -1 and the symbol is defined with respect to an OutputSection, then Size is the size of that OutputSection. You would then not need to track the state of the section symbols and make a potentially fragile one-off update.
We don't do any special-casing based on symbol names; we only look at the information from the symbol table. I think having special cases based on the symbol name is very fragile, so I'd rather set a size on all linker-defined symbols that are used as dereferenceable pointers.
I hadn't thought about the potential to confuse other ELF processing tools. However, wouldn't that also break on code like this?
.global foo_with_setup
.ent foo_with_setup
foo_with_setup:
  ....
  ....
foo_fast:
  ....
  ....
.end foo_with_setup

  jal foo_with_setup
  ...
  jal foo_fast
In CHERI each pointer (capability) is architecturally 256 bits (compressed to 128 bits in memory): a 64-bit base address + a 64-bit offset + a 64-bit length, plus fewer than 64 bits of attributes (read data / store data / executable / load capability / store capability, etc.). During the initial startup we have to initialize all global capabilities using the information from the ELF symbols.
Currently we generate a special ELF section __cap_relocs containing a capability "relocation" for every global pointer. For historical reasons we don't use ELF relocations but an array of structs that is processed by crt_init_globals() in the initial startup code.
This separate section also has the advantage that we don't need any symbol tables at runtime but can just loop over this array containing addresses resolved at static link time.
We need sizes on every ELF symbol that can be used as a pointer in order to set the correct runtime-bounds on the resulting capability so that we can find all bounds violations.
Say we have the following code:
char x[20];
char* y = &x[20];

int foo(void) {
  char c = y[-1]; // okay
  c = y[0];       // trap at runtime
  return c;
}

__attribute__((weak)) extern Elf_Dyn* _DYNAMIC;

int parse_dynamic(void) {
  Elf_Dyn* dynp = _DYNAMIC;
  for (; dynp->d_tag != DT_NULL; dynp++) {
    // ...
  }
  return 0;
}
Currently we can generate a __cap_relocs section with the following contents (the load address will be added to these addresses before the globals are initialized):
CAPABILITY RELOCATION RECORDS:
0x0000000000020000   Base: x (0x0000000000040000)          Offset: 0x0000000000000014   Length: 0x0000000000000014   Permissions: 0x00000000
0x0000000000030000   Base: y (0x0000000000020000)          Offset: 0x0000000000000000   Length: 0x0000000000000010   Permissions: 0x00000000
0x0000000000030010   Base: _DYNAMIC (0x00000000000002b0)   Offset: 0x0000000000000000   Length: 0x0000000000000000   Permissions: 0x00000000

SYMBOL TABLE:
0000000000000000          *UND*       00000000
00000000000002b0          .dynamic    00000000 .hidden _DYNAMIC
0000000000028010          .got        00000000 .hidden _gp
0000000000030000 l     d  .cap_table  00000000 .hidden .cap_table
0000000000030000 l        .cap_table  00000010 .hidden y@CAPTABLE
0000000000030010 l        .cap_table  00000010 .hidden _DYNAMIC@CAPTABLE
0000000000010000 g     F  .text       0000001c foo
0000000000010020 g     F  .text       00000024 parse_dynamic
0000000000040000 g        .bss        00000014 x
0000000000020000 g        .data       00000010 y
The problem here is that, because LLD says _DYNAMIC has size 0, we also create a capability at runtime that has a base of _DYNAMIC but a length of zero, so we will crash as soon as we load a single byte through this capability.
In order to not crash while iterating over _DYNAMIC we need the __cap_reloc for _DYNAMIC to be this instead:
CAPABILITY RELOCATION RECORDS:
0x0000000000030010   Base: _DYNAMIC (0x00000000000002b0)   Offset: 0x0000000000000000   Length: 0x0000000000000120   Permissions: 0x00000000

SYMBOL TABLE:
00000000000002b0          .dynamic    00000120 .hidden _DYNAMIC
To generate those values we use Symbol->getSize<ELFT>(), which currently returns zero for _DYNAMIC, __init_array_start, etc.
So far I have found this to be a problem with _DYNAMIC and the init+fini array start symbols, as well as sections whose names are valid C identifiers, since code assumes it can simply dereference __start_foo.
I like the idea of using Size == -1 as the output section size, as this avoids the fragile state update. I will update the patch if you agree that it makes sense to include this change in upstream LLD.
I hope my explanation made some sense. I know that compared to most architectures CHERI is rather exotic and therefore has some additional requirements. However, I feel like having sizes defined on symbols that are generally used as pointers would be a good patch to upstream.
I think I understand your motivation for making this change, but it still feels too specific to your target. Traditionally, these start/end symbols are NOTYPE symbols with size zero, and they are used as markers for where their corresponding sections start and end. So my understanding is that start symbols just point to where sections begin; they do not "cover" sections, as the name implies. This change therefore doesn't feel semantically correct to me.
In addition, because these symbols have been used in the wild since (I believe) the early Unix era, there's a chance that compatibility issues would arise if you change the symbol values. It seems a bit too risky. It's probably an unavoidable consequence of choosing ELF as a file format that you'll have start/end symbols with size zero.
I wonder if you have thought about doing this in your loader. Your loader could identify start/end symbols and fix their values, couldn't it?
Thanks for the detailed answer. Does the static linker create the __cap_relocs section, or is it some standalone tool? If it is the static linker itself, then presumably there could be alternative ways of obtaining the size for the capability that would not involve propagating the change into the symbol table. For example, set a flag on the symbol definition so that when you are writing out __cap_relocs and encounter a flagged symbol with size 0, you can calculate the size from the OutputSection. Not as elegant for the __cap_relocs section, but it might be OK to upstream the flagging mechanism?
Originally we used a standalone tool that extracted the information from the symbol table, because we had to use BFD for linking. Now we can use LLD (as long as we apply the multi-GOT patch D31528) for building all of FreeBSD (CheriBSD), so I have integrated the __cap_relocs generation into our fork of LLD. For some builds we still use the standalone tool, but we are now migrating towards using the integrated one for everything.
Setting a flag on the symbol so that we can calculate the real size when we fill in the __cap_relocs sections would be a great solution. @ruiu would adding such a flag to Symbol be acceptable?
We could do it in the loader; however, we would then have to hardcode a list of all special symbol names there. Also, this will not work for static linking because we don't have a loader there and need LLD to fill in that information.
I feel like having this information in Symbol but not adding it to the symbol table would be ideal for us.
I think hardcoding a list of all special symbol names is not necessarily bad because the number of such symbols is small. Also, that's what lld does after all.
You still have a loader even for statically-linked executables, no? You don't need a dynamic linker to load a statically-linked executable, but whether it is in a kernel or not, you still need a loader to map an executable into memory.