Dont emit mapping symbols for sections that contain only data.
Details
- Reviewers
rengolin t.p.northover kparzysz weimingz peter.smith - Commits
- rG74a7fa059484: Reland r298901 with modifications (reverted in r298932)
rG320848458b5d: Dont emit Mapping symbols for sections that contain only data.
rL299392: Reland r298901 with modifications (reverted in r298932)
rL298901: Dont emit Mapping symbols for sections that contain only data.
Diff Detail
Event Timeline
The ARM ELF ABI says that every section which contains data and code must have a mapping symbol, but sections containing only data may omit it. Keying this off the "allocatable" attribute seems wrong.
When I checked with GCC (Using tests) the behavior seems to be non allocatable sections do not have mapping symbols.
Why would you want mapping symbols for non allocatable sections anyways ? Isnt the mapping symbols meant only for proper disassembly ?
What would be an alternative to key off ? The next option is keying off section names, but this would make it odd.
Why would you want mapping symbols for non allocatable sections anyways ? Isnt the mapping symbols meant only for proper disassembly ?
Depends entirely on usage (embedded developers in particular do really weird things and I could see them creating a non-allocatable section that they read in later for giggles). In principle I think it's reasonable that any section which contains mixed data and code needs mapping symbols. There's no other way to unambiguously know what you're looking at.
What would be an alternative to key off ? The next option is keying off section names, but this would make it odd.
You'd keep track of whether a section contained only data and decide at the end whether you need to emit mapping symbols based on that. This actually appears to be what GCC does, if you try a .s file like this:
@ No mapping symbols: only contains data .section .debug_info,"",%progbits .word 42 @ Mixed stuff, so mapping symbols emitted .section .debug_lines,"",%progbits mov r0, #3 .word 42
Ah ok. Thanks for the info. If the input is 'C', the extent to which a non allocatable section that can contain data and code mixed is really a corner case, unless the author does inline assembly ?
From your description, It looks like the section that is being written needs to have a state what kind of contents does it contain.
Is there a record of that information that can be queried about a section in the current infrastructure ?
It looks like gcc doesnot purely go by whether section has data or not. It also checks for section permissions.
For example :-
$ cat b.s .section .foobar,"ax",%progbits .word 32 $ readelf -s b.o Symbol table '.symtab' contains 7 entries: Num: Value Size Type Bind Vis Ndx Name 0: 00000000 0 NOTYPE LOCAL DEFAULT UND 1: 00000000 0 SECTION LOCAL DEFAULT 1 2: 00000000 0 SECTION LOCAL DEFAULT 2 3: 00000000 0 SECTION LOCAL DEFAULT 3 4: 00000000 0 SECTION LOCAL DEFAULT 4 5: 00000000 0 NOTYPE LOCAL DEFAULT 4 $d 6: 00000000 0 SECTION LOCAL DEFAULT 5
Thanks for the review.
Ah ok. Thanks for the info. If the input is 'C', the extent to which a non allocatable section that can contain data and code mixed is really a corner case, unless the author does inline assembly ?
Yep, and weird inline assembly at that. But we're writing an assembler not just a compiler so we should deal with it properly.
I'm pretty strongly against a hack like using "allocate" here: we're already in conformance with the ABI so the only purpose could be working around broken tools elsewhere and any improvement we make would be QoI. So we should actually implement it properly if we're going to.
Is there a record of that information that can be queried about a section in the current infrastructure ?
The information is only known at the end of the section (after the symbols have been emitted in the current scheme). Unfortunately I haven't come up with a good way of delaying it yet, I'll see if I can think a bit more about it.
lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp | ||
---|---|---|
479 | Now this is looking a lot more promising! But I don't see the need for any exceptions to an EMS_Data default. That seems to be exactly what the ABI says should happen. |
lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp | ||
---|---|---|
479 | EmitDataMappingSymbol relies on the value of LastEMS to emit a $d or not. If the stream had any instructions, LastEMS points to what the stream would be (EMS_ARM or EMS_Thumb). |
I'm afraid I've spotted a bug in the scheme. If a section starts with data and then gets an instruction there won't be an initial $d:
.section .debug_info .word 42 ldr r0, [r0]
We want $d at 0 and $a/$t at 4, but I think defaulting to EMS_Data skips the $d.
lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp | ||
---|---|---|
479 | Yep, I get how it works. But why does an SHF_EXECINSTR section deserve special handling here? As soon as an instruction is emitted the correct mapping symbol will be put in whether LastEMS was EMS_None or EMS_Data. And if for some strange reason no instruction is emitted, well, that still falls under the ABI permission not to emit mapping symbols. |
I see what you are saying.
The reason I put the SHF_EXECINSTR was to make the below test to work.
.section .foobar,"ax",%progbits .word 32
Though, The testcase that you have mentioned is a genuine test and GCC supports it!.
We can support that only if we can set the state of the section, that the section is going to contain instructions. This unfortunately doesnot exist in the infrastructure and GCC supports it!!.
Please let me know if you find a way of setting an initial state of the section prior to emitting an instruction. This could be the only thing that can fix the issue that you raised IMO.
I was discussing with Colin L and found a way to handle your case. Will post a patch soon.
Sorry about the delay, I got overtaken by events.
It's probably a good idea to make the same change to AArch64, if only to avoid divergence between the two backends.
lib/MC/MCObjectStreamer.cpp | ||
---|---|---|
174 | Why doesn't this use F? | |
lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp | ||
532–533 | I'm not convinced splitting these two functions up makes much sense. They interact in really weird ways based on the LastEMS properties. I think it would be clearer to have a single function that delays if LastEMS is EMS_None. void EmitDataMappingSymbol() { if (LastEMS == EMS_Data) return; else if (LastEMS == EMS_None) { // This is a tentative symbol, it won't really be emitted until it's actually needed. ElfMappingSymbolInfo *EMS = LastEMSInfo.get(); EMS->Loc = SMLoc(); EMS->F = getCurrentFragment(); auto *DF = dyn_cast_or_null<MCDataFragment>(EMS->F); if (DF) EMS->Offset = DF->getContents().size(); LastEMS = EMS_Data; return; } EmitMappingSymbol("$d"); LastEMS = EMS_Data; } | |
536 | FlushPendingMappingSymbol instead? | |
621–622 | I still don't see why we need this UseCodeAlign special case, I'm afraid. | |
683 | I think you'd just as well put the ElfMappingSymbol into the ElfMappingSymbolInfo at this point, and make it clear that it's the whole mapping-symbol state for a given section. |
Does AARCH64 use Mapping symbols anytime ? I havent seen GCC doing that.
Yep, $x and $d (see http://infocenter.arm.com/help/topic/com.arm.doc.ihi0056b/IHI0056B_aaelf64.pdf).
lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp | ||
---|---|---|
621–622 | Why? The ABI says nothing about an executable bit: "A section must have a mapping symbol defined at the beginning of the section; however, if the section contains only data then the mapping symbol may be omitted." |
lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp | ||
---|---|---|
621–622 | If the section is marked as containing instructions(on elf using the section flag, UseCodeAlign checks for section permission) and it contains data, dont we need a $d to specifiy beginning of data. GCC does it as the below test shows :- The section foobar has a section flag below to specify that it contains instructions. $ cat b.s .section .foobar,"ax",%progbits .word 32 $ readelf -s b.o Symbol table '.symtab' contains 7 entries: Num: Value Size Type Bind Vis Ndx Name 0: 00000000 0 NOTYPE LOCAL DEFAULT UND 1: 00000000 0 SECTION LOCAL DEFAULT 1 2: 00000000 0 SECTION LOCAL DEFAULT 2 3: 00000000 0 SECTION LOCAL DEFAULT 3 4: 00000000 0 SECTION LOCAL DEFAULT 4 5: 00000000 0 NOTYPE LOCAL DEFAULT 4 $d 6: 00000000 0 SECTION LOCAL DEFAULT 5 |
lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp | ||
---|---|---|
621–622 | Not according to how I read the ABI. It's quite possible GNU as adds unneeded symbols in that case, but I don't think "because GCC does it" is reason enough for us to add this complexity when we've got an actual document telling us what we should be doing. |
lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp | ||
---|---|---|
621–622 | How would objdump or anytool be able to find out that an executable section is having data ? Without adding a mapping symbol I believe it would be hard for objdump to figure out that information. |
lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp | ||
---|---|---|
621–622 | Sure, We can decide if we would want to mimic that behavior probably later then. |
The changes are causing a problem as tests in lld have been written with the assumption that objdump will be able to dump data inside of a text section. I am hence uploading a new patch that checks for section permissions as well. Please let me know how you want to go about it ?
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/1108
.text .word patatino(target1) patatino:
for llvm-objdump to figure out there is data being referenced in a 'Code' section, the permissions would need to be checked.
Check this bot out, http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/1108, seems like the test is broken by this revision.
I think the lld test in question is expecting a $d.0 mapping symbol because up to now this is what llvm-objdump has been outputting. It is not strictly necessary for the test which is checking the behaviour of the R_ARM_TARGET1 relocation in response to the --target1-rel flag. It just so happens that .text has been used for the test for what looks like convenience. Strictly speaking we would never see a R_ARM_TARGET1 relocation in an executable section (Should only be present in SHT_INIT_ARRAY or SHT_FINI_ARRAY) sections so I don't see any problem with fixing the lld test to match the new llvm-objdump output.
Why doesn't this use F?