This is an archive of the discontinued LLVM Phabricator instance.

Dont emit Mapping symbols for sections that contain only data.
ClosedPublic

Authored by shankare on Mar 7 2017, 4:26 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

shankare created this revision.Mar 7 2017, 4:26 PM

The ARM ELF ABI says that every section which contains data and code must have a mapping symbol, but sections containing only data may omit it. Keying this off the "allocatable" attribute seems wrong.

When I checked with GCC (Using tests) the behavior seems to be non allocatable sections do not have mapping symbols.

Why would you want mapping symbols for non allocatable sections anyways ? Isnt the mapping symbols meant only for proper disassembly ?

What would be an alternative to key off ? The next option is keying off section names, but this would make it odd.

Why would you want mapping symbols for non allocatable sections anyways ? Isnt the mapping symbols meant only for proper disassembly ?

Depends entirely on usage (embedded developers in particular do really weird things and I could see them creating a non-allocatable section that they read in later for giggles). In principle I think it's reasonable that any section which contains mixed data and code needs mapping symbols. There's no other way to unambiguously know what you're looking at.

What would be an alternative to key off ? The next option is keying off section names, but this would make it odd.

You'd keep track of whether a section contained only data and decide at the end whether you need to emit mapping symbols based on that. This actually appears to be what GCC does, if you try a .s file like this:

@ No mapping symbols: only contains data
.section .debug_info,"",%progbits
.word 42

@ Mixed stuff, so mapping symbols emitted
.section .debug_lines,"",%progbits
mov r0, #3
.word 42

Ah ok. Thanks for the info. If the input is 'C', the extent to which a non allocatable section that can contain data and code mixed is really a corner case, unless the author does inline assembly ?

From your description, It looks like the section that is being written needs to have a state what kind of contents does it contain.

Is there a record of that information that can be queried about a section in the current infrastructure ?

It looks like gcc doesnot purely go by whether section has data or not. It also checks for section permissions.

For example :-

$ cat b.s
        .section        .foobar,"ax",%progbits
         .word 32

$ readelf -s b.o

Symbol table '.symtab' contains 7 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 SECTION LOCAL  DEFAULT    1
     2: 00000000     0 SECTION LOCAL  DEFAULT    2
     3: 00000000     0 SECTION LOCAL  DEFAULT    3
     4: 00000000     0 SECTION LOCAL  DEFAULT    4
     5: 00000000     0 NOTYPE  LOCAL  DEFAULT    4 $d
     6: 00000000     0 SECTION LOCAL  DEFAULT    5

Thanks for the review.

Ah ok. Thanks for the info. If the input is 'C', the extent to which a non allocatable section that can contain data and code mixed is really a corner case, unless the author does inline assembly ?

Yep, and weird inline assembly at that. But we're writing an assembler not just a compiler so we should deal with it properly.

I'm pretty strongly against a hack like using "allocate" here: we're already in conformance with the ABI so the only purpose could be working around broken tools elsewhere and any improvement we make would be QoI. So we should actually implement it properly if we're going to.

Is there a record of that information that can be queried about a section in the current infrastructure ?

The information is only known at the end of the section (after the symbols have been emitted in the current scheme). Unfortunately I haven't come up with a good way of delaying it yet, I'll see if I can think a bit more about it.

shankare updated this revision to Diff 91017.Mar 8 2017, 7:40 AM
shankare edited the summary of this revision. (Show Details)

Updated with review comments from Tim.

t.p.northover added inline comments.Mar 8 2017, 8:02 AM
lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp
479 ↗(On Diff #91017)

Now this is looking a lot more promising! But I don't see the need for any exceptions to an EMS_Data default. That seems to be exactly what the ABI says should happen.

shankare added inline comments.Mar 8 2017, 8:33 AM
lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp
479 ↗(On Diff #91017)

EmitDataMappingSymbol relies on the value of LastEMS to emit a $d or not. If the stream had any instructions, LastEMS points to what the stream would be (EMS_ARM or EMS_Thumb).

I'm afraid I've spotted a bug in the scheme. If a section starts with data and then gets an instruction there won't be an initial $d:

.section .debug_info
.word 42
ldr r0, [r0]

We want $d at 0 and $a/$t at 4, but I think defaulting to EMS_Data skips the $d.

lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp
479 ↗(On Diff #91017)

Yep, I get how it works. But why does an SHF_EXECINSTR section deserve special handling here?

As soon as an instruction is emitted the correct mapping symbol will be put in whether LastEMS was EMS_None or EMS_Data. And if for some strange reason no instruction is emitted, well, that still falls under the ABI permission not to emit mapping symbols.

I see what you are saying.

The reason I put the SHF_EXECINSTR was to make the below test to work.

.section        .foobar,"ax",%progbits
 .word 32

Though, The testcase that you have mentioned is a genuine test and GCC supports it!.

We can support that only if we can set the state of the section, that the section is going to contain instructions. This unfortunately doesnot exist in the infrastructure and GCC supports it!!.

Please let me know if you find a way of setting an initial state of the section prior to emitting an instruction. This could be the only thing that can fix the issue that you raised IMO.

I was discussing with Colin L and found a way to handle your case. Will post a patch soon.

shankare updated this revision to Diff 91115.Mar 8 2017, 5:24 PM
shankare retitled this revision from Dont emit Mapping symbols for Non allocatable sections. to Dont emit Mapping symbols for sections that contain only data..

Update with comments from Tim and address the issue that he mentioned.

Sorry about the delay, I got overtaken by events.

It's probably a good idea to make the same change to AArch64, if only to avoid divergence between the two backends.

lib/MC/MCObjectStreamer.cpp
174 ↗(On Diff #91115)

Why doesn't this use F?

lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp
535–536 ↗(On Diff #91115)

I'm not convinced splitting these two functions up makes much sense. They interact in really weird ways based on the LastEMS properties.

I think it would be clearer to have a single function that delays if LastEMS is EMS_None.

void EmitDataMappingSymbol() {
  if (LastEMS == EMS_Data)
    return;
  else if (LastEMS == EMS_None) {
    // This is a tentative symbol, it won't really be emitted until it's actually needed.
    ElfMappingSymbolInfo *EMS = LastEMSInfo.get();
    EMS->Loc = SMLoc();
    EMS->F = getCurrentFragment();
    auto *DF = dyn_cast_or_null<MCDataFragment>(EMS->F);
    if (DF)
      EMS->Offset = DF->getContents().size();
    LastEMS = EMS_Data;
    return;
  }
  EmitMappingSymbol("$d");
  LastEMS = EMS_Data;
}
540 ↗(On Diff #91115)

FlushPendingMappingSymbol instead?

630–631 ↗(On Diff #91115)

I still don't see why we need this UseCodeAlign special case, I'm afraid.

701 ↗(On Diff #91115)

I think you'd just as well put the ElfMappingSymbol into the ElfMappingSymbolInfo at this point, and make it clear that it's the whole mapping-symbol state for a given section.

shankare added inline comments.Mar 14 2017, 2:31 PM
lib/MC/MCObjectStreamer.cpp
174 ↗(On Diff #91115)

This is a good catch. Thanks!

lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp
535–536 ↗(On Diff #91115)

Sure.

540 ↗(On Diff #91115)

Sure, I will do that.

630–631 ↗(On Diff #91115)

The only time that we want to emit a Data symbol is if the section was set to be executable but it contains data.

701 ↗(On Diff #91115)

Ok.

Does AARCH64 use Mapping symbols anytime ? I havent seen GCC doing that.

Does AARCH64 use Mapping symbols anytime ? I havent seen GCC doing that.

Yep, $x and $d (see http://infocenter.arm.com/help/topic/com.arm.doc.ihi0056b/IHI0056B_aaelf64.pdf).

lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp
630–631 ↗(On Diff #91115)

Why? The ABI says nothing about an executable bit:

"A section must have a mapping symbol defined at the beginning of the section; however, if the section contains only data then the mapping symbol may be omitted."
shankare added inline comments.Mar 14 2017, 2:48 PM
lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp
630–631 ↗(On Diff #91115)

If the section is marked as containing instructions(on elf using the section flag, UseCodeAlign checks for section permission) and it contains data, dont we need a $d to specifiy beginning of data.

GCC does it as the below test shows :-

The section foobar has a section flag below to specify that it contains instructions.

$ cat b.s

.section        .foobar,"ax",%progbits
 .word 32

$ readelf -s b.o

Symbol table '.symtab' contains 7 entries:

Num:    Value  Size Type    Bind   Vis      Ndx Name
  0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
  1: 00000000     0 SECTION LOCAL  DEFAULT    1
  2: 00000000     0 SECTION LOCAL  DEFAULT    2
  3: 00000000     0 SECTION LOCAL  DEFAULT    3
  4: 00000000     0 SECTION LOCAL  DEFAULT    4
  5: 00000000     0 NOTYPE  LOCAL  DEFAULT    4 $d
  6: 00000000     0 SECTION LOCAL  DEFAULT    5
t.p.northover added inline comments.Mar 15 2017, 9:58 AM
lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp
630–631 ↗(On Diff #91115)

Not according to how I read the ABI. It's quite possible GNU as adds unneeded symbols in that case, but I don't think "because GCC does it" is reason enough for us to add this complexity when we've got an actual document telling us what we should be doing.

shankare added inline comments.Mar 21 2017, 10:18 AM
lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp
630–631 ↗(On Diff #91115)

How would objdump or anytool be able to find out that an executable section is having data ? Without adding a mapping symbol I believe it would be hard for objdump to figure out that information.

shankare added inline comments.Mar 21 2017, 10:28 AM
lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp
630–631 ↗(On Diff #91115)

Sure, We can decide if we would want to mimic that behavior probably later then.

shankare updated this revision to Diff 92692.Mar 22 2017, 12:57 PM
shankare edited the summary of this revision. (Show Details)

Address comments from Tim. I will make the AArch64 changes as a separate commit.

shankare updated this revision to Diff 92698.Mar 22 2017, 1:20 PM
t.p.northover accepted this revision.Mar 27 2017, 2:37 PM

Thanks for all the updates Shankar. I think this looks reasonable now.

This revision is now accepted and ready to land.Mar 27 2017, 2:37 PM

Thanks Tim for the detailed review.

This revision was automatically updated to reflect the committed changes.
shankare reopened this revision.Mar 28 2017, 9:21 AM

The changes are causing a problem as tests in lld have been written with the assumption that objdump will be able to dump data inside of a text section. I am hence uploading a new patch that checks for section permissions as well. Please let me know how you want to go about it ?

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/1108

.text
  .word patatino(target1)
  patatino:

for llvm-objdump to figure out there is data being referenced in a 'Code' section, the permissions would need to be checked.

This revision is now accepted and ready to land.Mar 28 2017, 9:21 AM
shankare updated this revision to Diff 93253.Mar 28 2017, 9:57 AM

Check for section permissions to account for test cases that rely on the behavior.

Check this bot out, http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/1108, seems like the test is broken by this revision.

Ah, ok, it was already reverted. r298932

peter.smith edited edge metadata.Mar 29 2017, 6:36 AM

I think the lld test in question is expecting a $d.0 mapping symbol because up to now this is what llvm-objdump has been outputting. It is not strictly necessary for the test which is checking the behaviour of the R_ARM_TARGET1 relocation in response to the --target1-rel flag. It just so happens that .text has been used for the test for what looks like convenience. Strictly speaking we would never see a R_ARM_TARGET1 relocation in an executable section (Should only be present in SHT_INIT_ARRAY or SHT_FINI_ARRAY) sections so I don't see any problem with fixing the lld test to match the new llvm-objdump output.

Thanks Peter, I will revise the lld test and recommit the change.

shankare updated this revision to Diff 93688.Mar 31 2017, 11:46 AM

Updated patch with reverting the previous change.

This revision was automatically updated to reflect the committed changes.