Page MenuHomePhabricator

[MC][ARM] make Thumb function also if type attribute is set
Needs ReviewPublic

Authored by falstaff84 on Feb 20 2020, 1:23 PM.

Details

Summary

Make sure to set the bottom bit of the symbol even when the type
attribute of a label is set after the label.

This fixes:
https://bugs.llvm.org/show_bug.cgi?id=44860

Diff Detail

Event Timeline

falstaff84 created this revision.Feb 20 2020, 1:23 PM
llvm/lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp
590

If it's overriding a virtual method, please mark it override.

597

This cast<MCSymbolElf looks redundant.

MaskRay added inline comments.
llvm/lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp
590

I am very sure this is emitSymbolAttribute (and the reason I got blamed)

I think this is close, but there are a couple of test failures with ninja check-llvm

llvm/lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp
591

Yes, to both of the above. This will need to be

bool EmitSymbolAttribute(MCSymbol *S, MCSymbolAttr Attribute) override
593

Sadly I don't think that this is going to work as IsThumb is the state at the point emitSymbolAttribute(S, Attribute) is called, but not necessarily the state at the declaration of S. With this change I see a couple of test failures:

Failing Tests (2):
    LLVM :: MC/ARM/mixed-arm-thumb-bl-fixup.ll
    LLVM :: MC/ARM/thumb2-beq-fixup.s

An extract from one points out the problem:

  .code  16
  .thumb_func
thumb_caller:
  beq.w internal_arm_fn
  beq.w global_arm_fn
  beq.w global_thumb_fn
  beq.w internal_thumb_fn

  .type  internal_arm_fn,%function
  .code  32
internal_arm_fn:
  bx  lr

At the point .type internal_arm_fn, %function is called IsThumb is true, emitSymbolAttribute will call setIsThumbFunc which is then cached. When internal_arm_fn is defined the state is Arm but the cache of ThumbFuncs remains.

I think this particular case might be fixed by only calling setIsThumbFunc when S is a definition. When internal_arm_fn is defined IsThumb will be false so we'll be ok.

It won't fix something like:

.thumb
corner_case_thumb:
 bx lr
.arm
some_arm_function:
.type corner_case_thumb, %function

I think that this is much less likely in practice, and would require the ArmBackend to track IsThumb for each symbol whether it was a function or not.

falstaff84 marked 2 inline comments as done.Feb 23 2020, 4:14 PM
falstaff84 added inline comments.
llvm/lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp
590

Not sure I understand, there is no definition of emitSymbolAttribute anywhere.

593

Hm, I see, I guess the only alternative is to actually store the Thumb information for every label. I was wondering whether we could change the semantics of ThumbFuncs in MCAssembler to contain all "potential" functions. My idea was to setIsThumbFunc whenever a label is emitted in a Thumb section, and then check where isThumbFunc is called whether that symbol was an actual function. But this seems not entirely trivial since isThumbFunc() is used in several places, and also in location where the ELF STT types are not available... So I gave up on this approach.

Your suggestion seems to pass all tests with the exception of the corner case you mentioned. However, it really seems to me that if such code exists, then the .type should be moved, also for readability :-)

  • Check if Symbol has previously been defined
  • Remove unnecessary cast
  • Mark function as override

I think you'll need to rebase and change to emitSymbolAttribute. I also think we could do with a comment. Other that that I think this is looking good. Adding some more reviewers to see if there are any objections, or alternative suggestions. Personally I'm inclined to accept as we get the following case wrong in the linux kernel (https://github.com/torvalds/linux/blob/master/arch/arm/kernel/entry-common.S) which has an idiom:

label:
   ...
   ENDPROC(label)

Where the ENDPROC has the .type label, %function

I think that the only case where we'd have got the type right and this change would break is:

.arm
label:
  ...
.thumb
.type label, %function

I think that changing state mid-function and setting the type afterwards is sufficiently unlikely that it is worth the change. The alternative is to track the state for every label with emitLabel, then lookup that state when .type is encountered. Even then this isn't ideal as we could have:

label:
.thumb
...
.type label, %function

Which would imply tracking state per address.

llvm/lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp
590

Can we add a comment explaining why we are overriding (extending) EmitSymbolAttribute? Something like:

// If a label is defined before the .type directive sets the label's type then the label can't be recorded as thumb function when the label is defined. We override emitSymbolAttribute() which is called as part of the parsing of .type so that if the symbol has already been defined we
can record the label as Thumb. FIXME: there is a corner case where the state is changed in between the label definition and the .type
directive, this is not expected to occur in practice and handling it would require the backend to track IsThumb for every label.
591

Apologies I typed the above wrong. it should be:

bool emitSymbolAttribute(MCSymbol *S, MCSymbolAttr Attribute) override

I think that this changed relatively recently. I think that if you rebased on top of master then you'd need to change this for LLVM to build.

I did not realize implicit .thumb_func was a thing.

I'm worried we're going to fix this edge case, and introduce some other weird edge case. I would prefer that we infer the marking based on whether the first instruction in the function is Thumb, as opposed to the state of the assembler when some label/directive is written. And if first instruction isn't known to be ARM or Thumb, we should error. (We can figure this out the same way we generate mapping symbols.)

I think using .type symbol, %function is more common when a file has to assemble in both Arm or Thumb state depending on whether -marm or -mthumb is given to the assembler.

Some experiments with GNU as imply that it does:

  • .thumb_func always results in a symbol with type STT_FUNC and bit 0 set to 1, regardless of the state at the point.
  • .type sym, %function before definition of sym, results in a symbol with type STT_FUNC and bit 0 set if the state at the point of the definition of sym is Thumb, 1 if the state is Arm at the point
  • .type sym, %function after the definition of sym will use the state of the first instruction to determine, initial data, such as .word is skipped over.
  • If a symbol is set to Thumb ahead of definition, a .type sym, %function after the definition will not correct it.

    For the case where .type comes after the definition. I think that using mapping symbols could work well, although I don't think that they could be used directly as we only record the last mapping symbol and we need the first mapping symbol after the definition. I think that if the pending labels have been flushed then a reverse search up the Symbols defined in MCAssembler looking for the closest mapping symbol in the same fragment as the definition would work.

I think an error message could be a bit more complicated to implement, especially if the type is set at the point of definition as in effect we'd need to catch the next mapping symbol and check with the definition, there is more than one code-path to track down. Maybe a follow up patch.

Test code for GNU as.

  .syntax unified

  .section .text.1, "ax", %progbits
  .global sym      
  .thumb_func
sym:
  .arm
  bx lr      

  .section .text.2, "ax", %progbits        
  .thumb
  .global sym2
  .type sym2, %function
sym2:
  .arm
  bx lr

  .section .text.3, "ax", %progbits        
  .thumb
  .global sym3
sym3:
  bx lr
  .arm
  bx lr
  .type sym3, %function      

  .section .text.4, "ax", %progbits        
  .arm
  .global sym4
sym4:
  bx lr
  .thumb
  bx lr
  .type sym4, %function      
        
 .section .text.5, "ax", %progbits
 .thumb
sym5:
 bx lr
 .pushsection .text
 .arm
sym6:
 bx lr
 .type sym5, %function   
 .type sym6, %function
 .popsection .text
        
 .section .text.7, "ax", %progbits
 .arm
 .type sym7, %function       
sym7:
 .thumb
 bx lr

 .thumb
 .section .text.8, "ax", %progbits
 .thumb_func
sym8:
 .arm
 bx lr
 .type sym8, %function       

 .thumb
 .section .text.9, "ax", %progbits
 .type sym9, %function        
sym9:
 .arm
 bx lr
 .type sym9, %function

 .section .text.10, "ax", %progbits
 .arm
 .type sym10, %function
 .thumb       
sym10:
 .arm
 bx lr
 .type sym10, %function

 .section .text.11, "ax", %progbits
 .thumb

sym11:
 .word 0
 bx lr
 .type sym11, %function

Gives:

16: 00000001     0 FUNC    LOCAL  DEFAULT    8 sym5
18: 00000000     0 FUNC    LOCAL  DEFAULT    1 sym6
21: 00000000     0 FUNC    LOCAL  DEFAULT    9 sym7
24: 00000001     0 FUNC    LOCAL  DEFAULT   10 sym8
27: 00000001     0 FUNC    LOCAL  DEFAULT   11 sym9
30: 00000001     0 FUNC    LOCAL  DEFAULT   12 sym10
33: 00000001     0 FUNC    LOCAL  DEFAULT   13 sym11
37: 00000001     0 FUNC    GLOBAL DEFAULT    4 sym
38: 00000001     0 FUNC    GLOBAL DEFAULT    5 sym2
39: 00000001     0 FUNC    GLOBAL DEFAULT    6 sym3
40: 00000000     0 FUNC    GLOBAL DEFAULT    7 sym4
MaskRay added a comment.EditedFeb 25 2020, 10:00 AM

The rules seem complex and error-prone. Are these any cases we should emit a warning? Will be nice to communicate that with binutils people as well.

On the Linux kernel side, it'd be nice to make a change anyway if that moves us away from an edge case.

The rules seem complex and error-prone. Are these any cases we should emit a warning? Will be nice to communicate that with binutils people as well.

Strictly speaking I'd say that the only explicit contradiction is where .thumb_func is used and yet the state is actually ARM. When .type symbol, %function is used, the user is implicitly relying on the Assembler to set the type of the symbol appropriately. If the assembler knows enough to warn then it knows enough to set the type of the symbol correctly. This requires a bit more tracking of state than the current rather arbitrary set of heuristics that both GNU as and MC are using. Actual practical usage of state changes are small as most files tend to be entirely Arm or entirely Thumb so the simple heuristics work for 99% of the cases. In practice I suspect that using a similar set of heuristics in both GNU and LLVM will aid portability, this would mean warning if the assembler detected that the ARM/Thumb state of the symbol disagreed with the ARM/Thumb state of the mapping symbols.

falstaff84 marked 2 inline comments as done and an inline comment as not done.Feb 25 2020, 3:12 PM

[...]
On the Linux kernel side, it'd be nice to make a change anyway if that moves us away from an edge case.

I tried changing Linux first, but it is not entirely trivial as there are different cases (where only the ENDPROC macro is used and hence would require more invasive changes). Also I think the Linux case is quite straight forward, as it is always defined within the label and at a point where the assembler is still in Thumb mode. Hence my current approach using the mode when the type directive is found works...

[...]

For the case where .type comes after the definition. I think that using mapping symbols could work well, although I don't think that they could be used directly as we only record the last mapping symbol and we need the first mapping symbol after the definition. I think that if the pending labels have been flushed then a reverse search up the Symbols defined in MCAssembler looking for the closest mapping symbol in the same fragment as the definition would work.

From the comments I guess an approach using mapping symbol is requested? While I understand conceptually what we want I am a bit lost how to implement this exactly. From what I understand we generate mapping symbols of some kind already? I tried to find them in code but wasn't really successful... Can you maybe give me a more concrete pointer?

The mapping symbols $a, $t and $d are emitted whenever there is a state change. See the EmitMappingSymbol() functions in ARMELFStreamer.cpp . As an example:

.thumb
label:
bx lr
.arm
bx lr
.type label, %function

would result in an object file with:

label:
$t
bx lr
$a
bx lr

The last mapping symbol emitted is cached in LastEMSInfo, and the last mapping symbol for a section is cached in LastMappingSymbols for when ChangeSection() is called. IIUC this doesn't help us very much as LastEMSInfo will only give us the same information as IsThumb. For the example above at the point that the .type label, %function directive were encountered we'd have LastEMSInfo pointing to the $a and IsThumb = false. What we need to do is find the closest mapping symbol at or after label, the "$t" in this case. I can think of a couple of approaches:
1.) Set up a map from MCSymbol to EmitMappingSymbolInfo that records the first "$a" or "$t" mapping symbol seen after each Symbol, if we then encounter a .type symbol, %function we can look up the state.
2.) All symbols, including mapping symbols will be stored in MCAssembler::Symbols. These are accessible via symbol_begin(), symbol_end() and symbols(). In theory we could walk the symbols in reverse on encountering the .type directive, stopping when we encounter the label we are setting the .type for. There are some details that will need to be resolved; symbols, including mapping symbols that are pending, will need to be flushed. I'm making the assumption that symbols are added to MCAssembler::Symbols in source order, but this will need to be checked.

Apologies for all the details, there may be a simpler way of doing this.

[..]
Apologies for all the details, there may be a simpler way of doing this.

No worries, thanks a lot for the pointer that definitely will help. Will try to tackle this but will probably take some days.