Make sure to set the bottom bit of the symbol even when the type
attribute of a label is set after the label.
I think this is close, but there are a couple of test failures with ninja check-llvm
Yes, to both of the above. This will need to be
bool EmitSymbolAttribute(MCSymbol *S, MCSymbolAttr Attribute) override
Sadly I don't think that this is going to work as IsThumb is the state at the point emitSymbolAttribute(S, Attribute) is called, but not necessarily the state at the declaration of S. With this change I see a couple of test failures:
Failing Tests (2): LLVM :: MC/ARM/mixed-arm-thumb-bl-fixup.ll LLVM :: MC/ARM/thumb2-beq-fixup.s
An extract from one points out the problem:
.code 16 .thumb_func thumb_caller: beq.w internal_arm_fn beq.w global_arm_fn beq.w global_thumb_fn beq.w internal_thumb_fn .type internal_arm_fn,%function .code 32 internal_arm_fn: bx lr
At the point .type internal_arm_fn, %function is called IsThumb is true, emitSymbolAttribute will call setIsThumbFunc which is then cached. When internal_arm_fn is defined the state is Arm but the cache of ThumbFuncs remains.
I think this particular case might be fixed by only calling setIsThumbFunc when S is a definition. When internal_arm_fn is defined IsThumb will be false so we'll be ok.
It won't fix something like:
.thumb corner_case_thumb: bx lr .arm some_arm_function: .type corner_case_thumb, %function
I think that this is much less likely in practice, and would require the ArmBackend to track IsThumb for each symbol whether it was a function or not.
Not sure I understand, there is no definition of emitSymbolAttribute anywhere.
Hm, I see, I guess the only alternative is to actually store the Thumb information for every label. I was wondering whether we could change the semantics of ThumbFuncs in MCAssembler to contain all "potential" functions. My idea was to setIsThumbFunc whenever a label is emitted in a Thumb section, and then check where isThumbFunc is called whether that symbol was an actual function. But this seems not entirely trivial since isThumbFunc() is used in several places, and also in location where the ELF STT types are not available... So I gave up on this approach.
Your suggestion seems to pass all tests with the exception of the corner case you mentioned. However, it really seems to me that if such code exists, then the .type should be moved, also for readability :-)
I think you'll need to rebase and change to emitSymbolAttribute. I also think we could do with a comment. Other that that I think this is looking good. Adding some more reviewers to see if there are any objections, or alternative suggestions. Personally I'm inclined to accept as we get the following case wrong in the linux kernel (https://github.com/torvalds/linux/blob/master/arch/arm/kernel/entry-common.S) which has an idiom:
label: ... ENDPROC(label)
Where the ENDPROC has the .type label, %function
I think that the only case where we'd have got the type right and this change would break is:
.arm label: ... .thumb .type label, %function
I think that changing state mid-function and setting the type afterwards is sufficiently unlikely that it is worth the change. The alternative is to track the state for every label with emitLabel, then lookup that state when .type is encountered. Even then this isn't ideal as we could have:
label: .thumb ... .type label, %function
Which would imply tracking state per address.
Can we add a comment explaining why we are overriding (extending) EmitSymbolAttribute? Something like:
// If a label is defined before the .type directive sets the label's type then the label can't be recorded as thumb function when the label is defined. We override emitSymbolAttribute() which is called as part of the parsing of .type so that if the symbol has already been defined we can record the label as Thumb. FIXME: there is a corner case where the state is changed in between the label definition and the .type directive, this is not expected to occur in practice and handling it would require the backend to track IsThumb for every label.
Apologies I typed the above wrong. it should be:
bool emitSymbolAttribute(MCSymbol *S, MCSymbolAttr Attribute) override
I think that this changed relatively recently. I think that if you rebased on top of master then you'd need to change this for LLVM to build.
I did not realize implicit .thumb_func was a thing.
I'm worried we're going to fix this edge case, and introduce some other weird edge case. I would prefer that we infer the marking based on whether the first instruction in the function is Thumb, as opposed to the state of the assembler when some label/directive is written. And if first instruction isn't known to be ARM or Thumb, we should error. (We can figure this out the same way we generate mapping symbols.)
I think using .type symbol, %function is more common when a file has to assemble in both Arm or Thumb state depending on whether -marm or -mthumb is given to the assembler.
Some experiments with GNU as imply that it does:
I think an error message could be a bit more complicated to implement, especially if the type is set at the point of definition as in effect we'd need to catch the next mapping symbol and check with the definition, there is more than one code-path to track down. Maybe a follow up patch.
Test code for GNU as.
.syntax unified .section .text.1, "ax", %progbits .global sym .thumb_func sym: .arm bx lr .section .text.2, "ax", %progbits .thumb .global sym2 .type sym2, %function sym2: .arm bx lr .section .text.3, "ax", %progbits .thumb .global sym3 sym3: bx lr .arm bx lr .type sym3, %function .section .text.4, "ax", %progbits .arm .global sym4 sym4: bx lr .thumb bx lr .type sym4, %function .section .text.5, "ax", %progbits .thumb sym5: bx lr .pushsection .text .arm sym6: bx lr .type sym5, %function .type sym6, %function .popsection .text .section .text.7, "ax", %progbits .arm .type sym7, %function sym7: .thumb bx lr .thumb .section .text.8, "ax", %progbits .thumb_func sym8: .arm bx lr .type sym8, %function .thumb .section .text.9, "ax", %progbits .type sym9, %function sym9: .arm bx lr .type sym9, %function .section .text.10, "ax", %progbits .arm .type sym10, %function .thumb sym10: .arm bx lr .type sym10, %function .section .text.11, "ax", %progbits .thumb sym11: .word 0 bx lr .type sym11, %function
16: 00000001 0 FUNC LOCAL DEFAULT 8 sym5 18: 00000000 0 FUNC LOCAL DEFAULT 1 sym6 21: 00000000 0 FUNC LOCAL DEFAULT 9 sym7 24: 00000001 0 FUNC LOCAL DEFAULT 10 sym8 27: 00000001 0 FUNC LOCAL DEFAULT 11 sym9 30: 00000001 0 FUNC LOCAL DEFAULT 12 sym10 33: 00000001 0 FUNC LOCAL DEFAULT 13 sym11 37: 00000001 0 FUNC GLOBAL DEFAULT 4 sym 38: 00000001 0 FUNC GLOBAL DEFAULT 5 sym2 39: 00000001 0 FUNC GLOBAL DEFAULT 6 sym3 40: 00000000 0 FUNC GLOBAL DEFAULT 7 sym4
The rules seem complex and error-prone. Are these any cases we should emit a warning? Will be nice to communicate that with binutils people as well.
On the Linux kernel side, it'd be nice to make a change anyway if that moves us away from an edge case.
Strictly speaking I'd say that the only explicit contradiction is where .thumb_func is used and yet the state is actually ARM. When .type symbol, %function is used, the user is implicitly relying on the Assembler to set the type of the symbol appropriately. If the assembler knows enough to warn then it knows enough to set the type of the symbol correctly. This requires a bit more tracking of state than the current rather arbitrary set of heuristics that both GNU as and MC are using. Actual practical usage of state changes are small as most files tend to be entirely Arm or entirely Thumb so the simple heuristics work for 99% of the cases. In practice I suspect that using a similar set of heuristics in both GNU and LLVM will aid portability, this would mean warning if the assembler detected that the ARM/Thumb state of the symbol disagreed with the ARM/Thumb state of the mapping symbols.
I tried changing Linux first, but it is not entirely trivial as there are different cases (where only the ENDPROC macro is used and hence would require more invasive changes). Also I think the Linux case is quite straight forward, as it is always defined within the label and at a point where the assembler is still in Thumb mode. Hence my current approach using the mode when the type directive is found works...
From the comments I guess an approach using mapping symbol is requested? While I understand conceptually what we want I am a bit lost how to implement this exactly. From what I understand we generate mapping symbols of some kind already? I tried to find them in code but wasn't really successful... Can you maybe give me a more concrete pointer?
The mapping symbols $a, $t and $d are emitted whenever there is a state change. See the EmitMappingSymbol() functions in ARMELFStreamer.cpp . As an example:
.thumb label: bx lr .arm bx lr .type label, %function
would result in an object file with:
label: $t bx lr $a bx lr
The last mapping symbol emitted is cached in LastEMSInfo, and the last mapping symbol for a section is cached in LastMappingSymbols for when ChangeSection() is called. IIUC this doesn't help us very much as LastEMSInfo will only give us the same information as IsThumb. For the example above at the point that the .type label, %function directive were encountered we'd have LastEMSInfo pointing to the $a and IsThumb = false. What we need to do is find the closest mapping symbol at or after label, the "$t" in this case. I can think of a couple of approaches:
1.) Set up a map from MCSymbol to EmitMappingSymbolInfo that records the first "$a" or "$t" mapping symbol seen after each Symbol, if we then encounter a .type symbol, %function we can look up the state.
2.) All symbols, including mapping symbols will be stored in MCAssembler::Symbols. These are accessible via symbol_begin(), symbol_end() and symbols(). In theory we could walk the symbols in reverse on encountering the .type directive, stopping when we encounter the label we are setting the .type for. There are some details that will need to be resolved; symbols, including mapping symbols that are pending, will need to be flushed. I'm making the assumption that symbols are added to MCAssembler::Symbols in source order, but this will need to be checked.
Apologies for all the details, there may be a simpler way of doing this.