This is an archive of the discontinued LLVM Phabricator instance.

[LLD][ELF][ARM] Do not substitute BL/BLX for non STT_FUNC symbols.
ClosedPublic

Authored by peter.smith on Jan 28 2020, 4:03 AM.

Details

Summary

D73474 disabled the generation of interworking thunks for branch relocations to non STT_FUNC symbols. This patch handles the case of BL and BLX instructions to non STT_FUNC symbols. LLD would normally look at the state of the caller and the callee and write a BL if the states are the same and a BLX if the states are different.

This patch disables BL/BLX substitution when the destination symbol does not have type STT_FUNC. This brings our behavior in line with GNU ld which may prevent difficult to diagnose runtime errors when switching to lld.

This is related to clang-built-linux https://github.com/ClangBuiltLinux/linux/issues/773 although in that case it looks like all the problem instances were caught by D73474.

Diff Detail

Event Timeline

peter.smith created this revision.Jan 28 2020, 4:03 AM
MaskRay accepted this revision.Jan 29 2020, 12:23 AM
MaskRay added inline comments.
lld/ELF/Arch/ARM.cpp
460

excess pair of ()

(The !rel.sym case is not covered by a test, but I think it does not matter.)

461

One simplification is interwork ? val & 1 : isBlx.

470–471

else can be dropped.

507

(The !rel.sym case is not covered by a test, but I think it does not matter.)

This revision is now accepted and ready to land.Jan 29 2020, 12:23 AM

Thanks for the suggestions, I've applied them prior to commit.

This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Feb 5 2020, 10:52 AM

Hello, this causes a bunch of chromium binaries to crash during startup: https://bugs.chromium.org/p/chromium/issues/detail?id=1047531

Looks like most of them contain assembly code. Do you have any idea from this fuzzy description what might be up? What's data that I can collect that's useful for debugging?

Hello, this causes a bunch of chromium binaries to crash during startup: https://bugs.chromium.org/p/chromium/issues/detail?id=1047531

Looks like most of them contain assembly code. Do you have any idea from this fuzzy description what might be up? What's data that I can collect that's useful for debugging?

The crash will be an illegal instruction because there was a state change from Arm to Thumb or vice-versa expected. This change makes LLD stricter when it comes to complying with the ABI for the Arm architecture. For debugging what we'd need to know is the assembly compliant with the ABI and I've got this patch wrong. Or is the assembly non-compliant and this increase in strictness triggered a problem.

The specific change is that the linker should only "interwork" which in this patch means changing a BL to a BLX when source and target are in different state for symbols with type STT_FUNC, and a BLX to bl if source and target are the same state.

To give an example:

        .syntax unified
        .global arm1
        .global arm2
        .global thumb1
        .global thumb2

        // uncomment these to get blx from Arm to Thumb and vice versa
        // .type arm1, %function
        // .type arm2, %function
        // .type thumb1, %function
        // .type thumb2, %function
        
        .section .text.01, "ax", %progbits
        .arm
arm1:
        bl arm2
        bl thumb1

        .section .text.02, "ax", %progbits
        .arm
arm2:   bx lr

        .section .text.03, "ax", %progbits
        .thumb
thumb1:
        bl arm1
        bl thumb2
        bx lr
        .section .text.04, "ax", %progbits
        .thumb
thumb2: 
        bx lr

In this example all the symbols have type STT_NOTYPE so the linker does not interwork and does not change bl to blx. When the .type lines are uncommented two of the bl instructions (thumb to arm, and arm to thumb) are changed to blx.

At least in this example ld.bfd and (with this patch) ld.lld agree.

What I'd suggest is:
The crash will be an illegal instruction, this instruction will be the target of a BL and is highly likely that in a non-stripped binary there will be a symbol at that address. What is the type of that symbol? If it is not STT_FUNC then the assembly code needs to add .type <symbol name>, %function.

I had hoped that as I was bringing LLD inline with the GNU linker there wouldn't be a problem in tightening up LLD to the specification. There is a chance that code that has only been linked with LLD, and unfortunately, it looks like ld.gold, will expect interworking for all symbol types. If this is going to be a giant legacy problem, then I think we may need a strict switch or potentially a warning.

We haven't used bfd.ld for chromium/android in many many years, and we haven't used gold in years. I don't know what the linker defaults are in the Android NDK, but I'd guess that they default to lld nowadays. (I haven't checked though.)

If this affects us, it likely affects others.

How would a warning for this work though?

I left some comments in https://bugs.chromium.org/p/chromium/issues/detail?id=1047531 Can non-Google users comment there?

We haven't used bfd.ld for chromium/android in many many years, and we haven't used gold in years. I don't know what the linker defaults are in the Android NDK, but I'd guess that they default to lld nowadays. (I haven't checked though.)

If this affects us, it likely affects others.

How would a warning for this work though?

This behavior change will require a new option. I hope there are only a few places needing updates so that we don't have to create a new option.

https://bugs.chromium.org/p/chromium/issues/detail?id=1047531#c19 anc #c20 suggest that no assembly is involved and that this instead related to jumps to attribute((weak)) functions.

Here's a reduced repro:

thakis@thakis:~/src/chrome/src$ cat test.cc
void foo(int) __attribute__((weak));

int main() {
  if (foo)
    foo(4);
}
thakis@thakis:~/src/chrome/src$ third_party/llvm-build/Release+Asserts/bin/clang --target=arm-linux-androideabi16 -fuse-ld=lld test.cc --sysroot=third_party/android_ndk/toolchains/llvm/prebuilt/linux-x86_64/sysroot   -fPIC -Bthird_party/android_ndk/toolchains/llvm/prebuilt/linux-x86_64 -mthumb -o a.out.good
ld.lld: warning: lld uses blx instruction, no object with architecture supporting feature detected
thakis@thakis:~/src/chrome/src$ ../../llvm-project/out/gn/bin/clang --target=arm-linux-androideabi16 -fuse-ld=lld test.cc --sysroot=third_party/android_ndk/toolchains/llvm/prebuilt/linux-x86_64/sysroot   -fPIC -Bthird_party/android_ndk/toolchains/llvm/prebuilt/linux-x86_64 -mthumb -o a.out.bad
ld.lld: warning: lld uses blx instruction, no object with architecture supporting feature detected
thakis@thakis:~/src/chrome/src$ diff -u <(objdump -d a.out.good) <(objdump -d a.out.bad)
--- /dev/fd/63	2020-02-05 15:46:00.567207286 -0500
+++ /dev/fd/62	2020-02-05 15:46:00.567207286 -0500
@@ -1,5 +1,5 @@
 
-a.out.good:     file format elf32-littlearm
+a.out.bad:     file format elf32-littlearm
 
 
 Disassembly of section .text:
@@ -59,7 +59,7 @@
     1422:	d004      	beq.n	142e <main+0x1e>
     1424:	e7ff      	b.n	1426 <main+0x16>
     1426:	2004      	movs	r0, #4
-    1428:	f000 e82a 	blx	1480 <main+0x70>
+    1428:	f000 f82a 	bl	1480 <main+0x70>
     142c:	e7ff      	b.n	142e <main+0x1e>
     142e:	9801      	ldr	r0, [sp, #4]
     1430:	b002      	add	sp, #8

I think that repro shows that this patch is likely incomplete. Unless the fix is easy to fix, it'd be nice if we could revert this to green up trunk.

psmith added a subscriber: psmith.Feb 5 2020, 12:59 PM

https://bugs.chromium.org/p/chromium/issues/detail?id=1047531#c19 anc #c20 suggest that no assembly is involved and that this instead related to jumps to attribute((weak)) functions.

Thanks for the details. I've just got home, sorry it is getting late in UK. It looks like the linker was outputting BL and is now outputting BLX. If I'm reading the diff right then - is bad and + is good

-  11d3e2:	f12c fe8d 	bl	24a100 <linker_script_end_of_text+0x828>
+  11d3e2:	f12c ee8e 	blx	24a100 <linker_script_end_of_text+0x828>

Given the PLT is always in ARM state then BLX is what I would expect.
The key line seems to be:

bool interwork = (rel.sym && rel.sym->isFunc()) || rel.type == R_PLT_PC;
if (interwork ? (val & 1) == 0 : isBlx) {

It looks like interwork is coming out false for weak symbols.

Just seen your latest message, thanks for the reproducer, it seems to match. I will try and fix as soon as possible. Given my schedule over the next few days (catching a flight tomorrow afternoon) I suggest reverting. It is likely a quick fix, but I'd prefer not to it late at night. I'll submit a new patch for review as soon as I can.

psmith added a comment.Feb 5 2020, 1:00 PM

My apologies for the breakage.

psmith added a comment.Feb 5 2020, 1:21 PM

I know what the problem is, it is a stupid typo on my part that looks superficially correct and the non-strong enum hasn't saved me at the type checker.

bool interwork = (rel.sym && rel.sym->isFunc()) || rel.type == R_PLT_PC;

Spot the deliberate mistake, this should be rel.expr == R_PLT_PC. There is a similar error at the R_ARM_CALL change, but we are saved by ARM state to ARM state being correct with the non-interwork case. So this needs two changes and a test. I can do this tomorrow morning, but given the possible delay in getting the patch approved, I still think reverting and I'll resubmit makes sense as I don't trust myself with detail this late.

I am also investigating it, calls to the undefined weak getauxval need to use BLX. I'll try fixing it.

psmith added a comment.Feb 5 2020, 1:52 PM

I am also investigating it, calls to the undefined weak getauxval need to use BLX. I'll try fixing it.

Thanks. It should just be a matter of changing rel.type to rel.expr on line 457 and line 412.

I am also investigating it, calls to the undefined weak getauxval need to use BLX. I'll try fixing it.

Thanks. It should just be a matter of changing rel.type to rel.expr on line 457 and line 412.

Fixed by 5461fa2b1fcfcfcd8e28e3ac3383d2245d5d90bf . R_ARM_THM_CALL tests were added to arm-thumb-interwork-shared.s

There is no R_ARM_CALL test.

MaskRay added inline comments.Feb 5 2020, 2:21 PM
lld/ELF/Arch/ARM.cpp
507
-     bool interwork = (rel.sym && rel.sym->isFunc()) || rel.type == R_PLT_PC;
+     bool interwork = (rel.sym && rel.sym->isFunc()) || rel.expr == R_PLT_PC;

When no IFUNC is involved, R_PLT_PC->R_PC optimization is entirely decided by sym.isPreemptible.
Can we use sym.isPreemptible instead?

When IFUNC is involved (there is also a FreeBSD extension -z ifunc-noplt), I get more confused.

// Relocations.cpp:1323
  if (!sym.isPreemptible && (!sym.isGnuIFunc() || config->zIfuncNoplt)) {

(I thought about IFUNC, but apologies that I did not raise the point in the review.)

thakis added a comment.Feb 5 2020, 4:59 PM

My apologies for the breakage.

No worries! Thanks for being very responsive and fixing quickly :) (To both you and maskray.)

MaskRay added a subscriber: hans.EditedFeb 5 2020, 5:39 PM

My apologies for the breakage.

No worries! Thanks for being very responsive and fixing quickly :) (To both you and maskray.)

<del>Please cherry pick 5461fa2b1fcfcfcd8e28e3ac3383d2245d5d90bf to release/10.x :)</del>

edit: We don't need it. D73542 is not in release/10.x

My apologies for the breakage.

No worries! Thanks for being very responsive and fixing quickly :) (To both you and maskray.)

<del>Please cherry pick 5461fa2b1fcfcfcd8e28e3ac3383d2245d5d90bf to release/10.x :)</del>

edit: We don't need it. D73542 is not in release/10.x

5461fa2b1fcfcfcd8e28e3ac3383d2245d5d90bf looks good; thanks very much! As you've pointed out the original fault isn't on 10.x so no need to cherry pick that to the branch.

hans added a comment.Feb 6 2020, 2:06 AM

5461fa2b1fcfcfcd8e28e3ac3383d2245d5d90bf looks good; thanks very much! As you've pointed out the original fault isn't on 10.x so no need to cherry pick that to the branch.

Thanks both for checking and keeping 10.x in mind!

We're possibly not yet out of the woods, there's a single additional failing test that also regressed in the ~200 revision window eaabaf7e..5e416ba9

I'm currently running another bisect at https://bugs.chromium.org/p/chromium/issues/detail?id=1049649 . It's in a bigger binary so that'll take a day or so to bisect and it's possible it's something else, but I figured I'd mention it.

thakis added a comment.Feb 7 2020, 4:39 AM

Yup, it bisected to this change as well.

It's now in a much bigger binary and analysis will probably take a bit longer than last time.

Would you mind reverting this while we figure out what exactly is going on? We haven't had a working trunk version in over a week now, and it'd be good if we could update chromium's llvm to make sure there are no other new regressions.

psmith added a comment.Feb 7 2020, 5:19 AM

Yup, it bisected to this change as well.

It's now in a much bigger binary and analysis will probably take a bit longer than last time.

Would you mind reverting this while we figure out what exactly is going on? We haven't had a working trunk version in over a week now, and it'd be good if we could update chromium's llvm to make sure there are no other new regressions.

I don't mind reverting for now as this is mostly a bring the tools to conform to the ld.bfd as the Linux kernel has had problems with at least branch (not branch with link) case. I'm away at a conference and don't have easy access to a terminal, would you be able to revert this and the fix? A test case would be great. I don't mind if it is an lld --reproduce (probably need a google drive link) and a pointer to which call is wrong?

lld/ELF/Arch/ARM.cpp
507

I'll need to check tomorrow. We're using knowledge that a PLT entry is always Arm state, perhaps sym.isInPlt() would be the clearest way to express that intent. I think sym.isPreemptible is a good proxy for that but not as descriptive. That would also avoid potential problems with the FreeBSD extension if it resulted in a R_PLT_PC relocation but no PLT entry.

thakis added a comment.Feb 7 2020, 5:56 AM

Thanks! I've reverted this (and maskray's fix follow-up) in c29003813ab . I'll work on getting you something actionable today.

I debugged this a bit. This time around, it's the assembly issue we originally expected :/

https://bugs.chromium.org/p/chromium/issues/detail?id=1049649#c7
https://bugs.chromium.org/p/chromium/issues/detail?id=1049649#c8
https://bugs.chromium.org/p/chromium/issues/detail?id=1049649#c9

I'll land that crashpad change, but this seems like a risky change to me. Maybe we could teach clang's integrated assembler to warn on .globls without an explicit .type when writing arm output before relanding or something? We (chromium) will be happy after the crashpad change I assume, but other projects out there that don't track upstream as closely might run into problems.

At the very least, this should have a release notes entry :)

Here's a post where the discussion is basically:

Q: Hey, what's .type?
A: Don't know, probably don't need it
A2: As of recently you do need it for some linkers, with a link to https://sourceware.org/ml/binutils/2012-03/msg00125.html

https://stackoverflow.com/questions/4423211/when-are-gas-elf-the-directives-type-thumb-size-and-section-needed

(The discussion is from 2012.)

This is hit 2 for the query "gas type arm assembly".

So I'm not super hopeful about .type annotations in assembly code usually being correct.

Sorry, one more pile-on comment: We were a bit lucky that we had a random test act as system test for our crash processing. If that hadn't happened, we wouldn't have seen this breakage and only eventually would have noticed that we no longer get crashes from the field. (I'm surprised we don't have dedicated crash tests! I'm sure we do somewhere and I'll follow up on why those didn't run.) But missing crash reports would've been a pain to track down. I could see assembly in other projects having a similar edge-case role, so this might cause similar problems there.

MaskRay added a comment.EditedFeb 7 2020, 4:57 PM

The test change did not need to be deleted by c29003813ab9bd6ea7b6de40ea8f1fe21979f13f. I added it back in f08099b9c26be463aca096f73e9863921dddf792.

Do you mean that lld's behavior is correct and the patch should reland?

@peter.smith and I may need to think more about .isGnuIFunc() || config->zIfuncNoplt)
@thakis So don't reland it too fast

thakis added a comment.Feb 7 2020, 6:27 PM

Well, it's technically correct, but it breaks real-world code in subtle ways none the less.

I think my suggestion of having clang's integrated assembler warn on .globls that are followd by assembly and that don't have an explicit .type might be a decent way forward.

Thanks for checking out the problem. I agree that this is a change in behaviour that isn't obvious to new authors of Arm assembler, and we have enough of a legacy of binaries that other people may be affected.

I think an assembler warning would probably not be as helpful as a linker one as the assembler can only catch the problem when there is a definition and a reference in the same .s file. For example:

.text
.global foo

bl foo

foo has type STT_NOTYPE (it's just a reference), what is important is that the definition of foo has type STT_FUNC, which may not have any references to it at all in the same file.

The linker knows the merged symbol type and can give a better warning. There are a couple of ways that I think we could do it:

  • Give a warning when LLD will change its behaviour from the previous version, and a hint on how to fix it, I think that this is only possible when there is a thumb BL to a STT_NOTYPE symbol, as STT_NOTYPE symbols that are labels have the bottom bit clear and look to LLD like ARM.
  • A more expensive check that uses the mapping symbols ($t, $a and $d) these symbols denote the Thumb, Arm and Literal data ranges, if we get a STT_NOTYPE symbol we can lookup the closest previous mapping symbol, we can either autocorrect the STT_NOTYPE feature or give a warning if they disagree. This does require a scan through the objects symbol table, but it given that we don't expect this to happen often then it would be worth it as part of an exceptional case.

I'm happy to work on either, although I think starting with the smallest change would be a good way to start.

thakis added a comment.Feb 8 2020, 4:42 PM

I'm happy to work on either, although I think starting with the smallest change would be a good way to start.

Sounds good. Happy to test the simple warning on our code (with the fix to crashpad's asm reverted, to make sure the warning would find that).

I've updated the diff to add a warning about no interworking for the cases when the behaviour changes from lld 10.0. I've set the details in a large comment in front of warnOnStateMismatch.

I've added checks for the warning to the existing test, and added a new test that checks the case where there is a non STT_FUNC symbol with the bottom bit set. I expect that case to be much rarer.

peter.smith reopened this revision.Feb 9 2020, 1:16 PM

Reopening as the addition of the warning is a non-trivial change.

This revision is now accepted and ready to land.Feb 9 2020, 1:16 PM

I'm happy to work on either, although I think starting with the smallest change would be a good way to start.

Sounds good. Happy to test the simple warning on our code (with the fix to crashpad's asm reverted, to make sure the warning would find that).

I have some questions. Are the two warnings added for migration convenience for some projects? Are they still useful after the migration? Do they have false positives? i.e. Is it not allowed to emit BLX referencing a Thumb STT_NONE symbol in ARM state? Is it not allowed to emit BL referencing an ARM STT_NONE symbol in Thumb state? If there are legitimate use cases, will the warnings break their builds if they are configured with -Wl,--fatal-warnings?

lld/ELF/Arch/ARM.cpp
407

bit0 != isBlx

408

bit0 == isBlx

412

const auto &d = cast<Defined>(s); works

414

cast<Defined>(s).section->name

Delete the variable d

459

When rel.sym == nullptr, is the R_ARM_CALL invalid? If yes, we can guard these statements with a single rel.sym nullness check.

461

Is the condition (including the checks performed in warnOnStateMismatch) simply: rel.sym && !rel.sym->isFunc() && isBlx?

508

Is the condition (including the checks performed in warnOnStateMismatch) simply: rel.sym && !rel.sym->isFunc() && !isBlx?

lld/test/ELF/arm-thumb-interwork-abs.s
4

--defsym sym=0x13001 can avoid %S/Inputs/arm-thumb-abs.s

lld/test/ELF/arm-thumb-interwork-notfunc.s
3

Is --no-threads significant?

78

Synthesize debug info with llvm-mc -g

Then use something like # CHECK: {{.*}}.s:[[# @LINE+1]]:1: warning: to check the line numbers.

I have some questions. Are the two warnings added for migration convenience for some projects? Are they still useful after the migration? Do they have false positives? i.e. Is it not allowed to emit BLX referencing a Thumb STT_NONE symbol in ARM state? Is it not allowed to emit BL referencing an ARM STT_NONE symbol in Thumb state? If there are legitimate use cases, will the warnings break their builds if they are configured with -Wl,--fatal-warnings?

The warnings are for migration convenience. They are still useful after the migration as very few people coming to Arm and Thumb assembly know that writing .type symbol, %function is important. I think most existing files in projects have a macro for declaring functions and this usually takes care of the type, but it could still save some new projects some pain.

The ABI rule is essentially symbol type is STT_FUNC, linker's responsibility for interworking, all other cases are the programmer's responsibility.

False positives are possible, although most of these are cases where LLD will now get the interworking right, and LLD 10.0 would get it wrong, so in theory they should be rare in code bases already linked by LLD.

.thumb
// foo has STT_NOTYPE, LLD 10.0 will treat as ARM and incorrectly turn this into a BLX
// LLD 11.0 will warn with false positive
foo: bx lr
        bl foo
// foo has STT_NOTYPE, LLD 10.0 will treat as ARM and incorrectly turn this into a BL
// LLD 11.0 will warn with false positive
.thumb
foo: bx lr
.arm
        blx foo

The bl to a "section symbol + offset" is the only case where the warning can't be fixed with .type symbol, %function.

.section .foo, "ax", %progbits
.thumb
nop
bx lr

bl .foo + 4

Yes the false positives could fail a build if fatal warnings was enabled. The only case I can think of that would do this is if someone wrote something like https://github.com/ClangBuiltLinux/linux/issues/325#issuecomment-457160379 but with the b changed to a bl. I don't it would be possible to set the type on the local label or section symbol as the programmer wouldn't know what it was.

#define __futex_atomic_ex_table(err_reg)
"3:\n"
" .pushsection __ex_table,"a"\n"
" .align 3\n"
" .long 1b, 4f, 2b, 4f\n"
" .popsection\n"
" .pushsection .text.fixup,"ax"\n"
" .align 2\n"
"4: mov %0, " err_reg "\n"
" bl 3b\n"
" .popsection"

We could eliminate the false positives if we use the mapping symbols the disassembler uses to cross check the state. LLD isn't set up to use the mapping symbols efficiently or easily though, essentially a linear scan through the objects local symbols. I'm happy to write something using them, but thought that it was better to try the minimal change first.

Will upload another diff with the changes.

lld/ELF/Arch/ARM.cpp
407

Thanks, for spotting, will apply and the ones below.

459

At the moment yes. If we ever need relocateNoSym with either R_ARM_CALL or R_ARM_THM_CALL we'll need to revisit. I've left an assert for the moment.

461

I think so. I've rewritten to have warnOnStateMismatch be just a warning, and have integrated the logic to trigger it into R_ARM_CALL and R_ARM_THM_CALL.

508

I've separated the logic from the warning as in R_ARM_CALL.

lld/test/ELF/arm-thumb-interwork-abs.s
4

Thanks, have applied.

lld/test/ELF/arm-thumb-interwork-notfunc.s
3

Not in the test, it was very helpful when writing as there are two sections with similar error messages, but they could come out interleaved. I've removed it and from the other test.

78

Thanks for the tip, I wasn't aware of that.

Updated to split the logic for the warning from the output. This permits some simplification. Updated tests.

While I'm here it is worth mentioning that I'll soon be no longer using peter.smith@linaro.org. My assignment from Arm to Linaro has ended so I'm back with peter.smith@arm.com (user psmith). This unfortunately means I won't have a lot of, if any, work time to spend on LLD so I'll most likely be slower to respond. I don't intend to disappear, so by all means ping if I'm late on a review.

MaskRay added a comment.EditedFeb 11 2020, 4:30 PM

I have some questions. Are the two warnings added for migration convenience for some projects? Are they still useful after the migration? Do they have false positives? i.e. Is it not allowed to emit BLX referencing a Thumb STT_NONE symbol in ARM state? Is it not allowed to emit BL referencing an ARM STT_NONE symbol in Thumb state? If there are legitimate use cases, will the warnings break their builds if they are configured with -Wl,--fatal-warnings?

The warnings are for migration convenience. They are still useful after the migration as very few people coming to Arm and Thumb assembly know that writing .type symbol, %function is important. I think most existing files in projects have a macro for declaring functions and this usually takes care of the type, but it could still save some new projects some pain.

The ABI rule is essentially symbol type is STT_FUNC, linker's responsibility for interworking, all other cases are the programmer's responsibility.

False positives are possible, although most of these are cases where LLD will now get the interworking right, and LLD 10.0 would get it wrong, so in theory they should be rare in code bases already linked by LLD.

.thumb
// foo has STT_NOTYPE, LLD 10.0 will treat as ARM and incorrectly turn this into a BLX
// LLD 11.0 will warn with false positive
foo: bx lr
        bl foo
// foo has STT_NOTYPE, LLD 10.0 will treat as ARM and incorrectly turn this into a BL
// LLD 11.0 will warn with false positive
.thumb
foo: bx lr
.arm
        blx foo

The bl to a "section symbol + offset" is the only case where the warning can't be fixed with .type symbol, %function.

.section .foo, "ax", %progbits
.thumb
nop
bx lr

bl .foo + 4

Yes the false positives could fail a build if fatal warnings was enabled. The only case I can think of that would do this is if someone wrote something like https://github.com/ClangBuiltLinux/linux/issues/325#issuecomment-457160379 but with the b changed to a bl. I don't it would be possible to set the type on the local label or section symbol as the programmer wouldn't know what it was.

#define __futex_atomic_ex_table(err_reg)
"3:\n"
" .pushsection __ex_table,"a"\n"
" .align 3\n"
" .long 1b, 4f, 2b, 4f\n"
" .popsection\n"
" .pushsection .text.fixup,"ax"\n"
" .align 2\n"
"4: mov %0, " err_reg "\n"
" bl 3b\n"
" .popsection"

We could eliminate the false positives if we use the mapping symbols the disassembler uses to cross check the state. LLD isn't set up to use the mapping symbols efficiently or easily though, essentially a linear scan through the objects local symbols. I'm happy to write something using them, but thought that it was better to try the minimal change first.

Will upload another diff with the changes.

I asked because I wonder whether a warning is the right approach. GNU ld does not warn. If the projects that are affected is just Chromium, do we still need a warning? It seems @thakis has already fixed the bug.

Is it not allowed to emit BLX referencing a Thumb STT_NONE symbol in ARM state? Is it not allowed to emit BL referencing an ARM STT_NONE symbol in Thumb state?

A warning can help users catch bugs, but it can also make a user with a legitimate use case frustrated.

My impression is that it's an easy enough bug to make with hand-written assembly. I don't know what the probability of a false positive is vs. actual issues being caught.

From the perspective of someone who works on a huge codebase that uses LLD to link for ARM, I'd much prefer a link-time warning (and we use --fatal-warnings so it'll be an error) to a runtime crash. That has to be balanced against the annoyance from false positives, of course, but I don't know what the real-world probability of hitting a false positive is.

I asked because I wonder whether a warning is the right approach. GNU ld does not warn. If the projects that are affected is just Chromium, do we still need a warning? It seems @thakis has already fixed the bug.
A warning can help users catch bugs, but it can also make a user with a legitimate use case frustrated.

Reasons for a warning

  • We are making a change in behavior to match bfd, projects only linked with gold or LLD for some portion of time could suffer runtime errors.
  • As thakis points out, Chromium has extensive testing, bisection to find problems, and someone like thakis to report and fix, not all projects will be so well served.
  • The author of the problem in Chromium didn't know that they needed to set the type of the symbol to STT_FUNC, I think that this situation is common.
  • We can limit the warning to cases where LLD would change behaviour, hence catch problems due to this change.
  • All compiled code will be correctly typed, it is only assembly language we need to worry about.

Reasons for not warning

  • There are false positives, which in one corner case (bl to section symbol + offset) could be difficult to resolve.
  • It makes the code more complicated

At the moment I'd prefer a warning. My suggestion for how to proceed:

  • Try with a warning, see if there are false positives reported
  • If no then keep it, if yes then write a more complex check using the mapping symbols

I asked because I wonder whether a warning is the right approach. GNU ld does not warn. If the projects that are affected is just Chromium, do we still need a warning? It seems @thakis has already fixed the bug.
A warning can help users catch bugs, but it can also make a user with a legitimate use case frustrated.

Reasons for a warning

  • We are making a change in behavior to match bfd, projects only linked with gold or LLD for some portion of time could suffer runtime errors.
  • As thakis points out, Chromium has extensive testing, bisection to find problems, and someone like thakis to report and fix, not all projects will be so well served.
  • The author of the problem in Chromium didn't know that they needed to set the type of the symbol to STT_FUNC, I think that this situation is common.
  • We can limit the warning to cases where LLD would change behaviour, hence catch problems due to this change.
  • All compiled code will be correctly typed, it is only assembly language we need to worry about.

Reasons for not warning

  • There are false positives, which in one corner case (bl to section symbol + offset) could be difficult to resolve.
  • It makes the code more complicated

At the moment I'd prefer a warning. My suggestion for how to proceed:

  • Try with a warning, see if there are false positives reported
  • If no then keep it, if yes then write a more complex check using the mapping symbols

Well said. Let's give it a try!

lld/test/ELF/Inputs/arm-thumb-abs.s
2

This file can be deleted.

MaskRay accepted this revision.Feb 12 2020, 8:08 AM
This revision was automatically updated to reflect the committed changes.