Page MenuHomePhabricator

x86 interrupt calling convention: Fix argument offsets
AbandonedPublic

Authored by phil-opp on Jan 3 2019, 8:46 AM.

Details

Reviewers
craig.topper
rnk
Summary

The x86 interrupt calling convention uses custom stack offsets for the parameters because they are pushed directly by the CPU. Thus the calling convention requires MFI.setObjectOffset(FI, Offset) calls before all returns in LowerMemArgument in order to set the correct offset. Commit 4c3428b604324ed1528a78dbc31c8c8805d3c3e6 introduced new code paths without MFI.setObjectOffset(FI, Offset), which broke the argument passing for the x86 interrupt calling convention in some cases.

For example, it resulted in incorrect error code arguments in Rust binaries, but only when compiled without optimizations: https://github.com/rust-lang/rust/issues/57270

This patch fixes this bug by adding the setObjectOffset calls to the two new code paths.

Diff Detail

Event Timeline

phil-opp created this revision.Jan 3 2019, 8:46 AM
nikic added a subscriber: nikic.Jan 3 2019, 9:35 AM

Test?

I tried to add additional checks to the x86-64-intrcc.ll test, but the patched version of llc generates the exact same assembly as the unpatched version for this test (with and without -O0). I think this is because the test doesn't use the relevant code paths. I'm not sure whether it's practical to construct new tests for every code path and how to construct such tests.

Test?

I tried to add additional checks to the x86-64-intrcc.ll test, but the patched version of llc generates the exact same assembly as the unpatched version for this test (with and without -O0). I think this is because the test doesn't use the relevant code paths. I'm not sure whether it's practical to construct new tests for every code path and how to construct such tests.

Either something is 'broken', and this patch fixes things, or the patch is not needed.

How do you know that this patch helps/does the right thing?
There is probably some code that now is not broken?
Can you write an automated check for that?
If yes, you could try to creduce/bugpoint that code to the minimal snippet that still shows the miscompile(?) fix.

rnk added a subscriber: aaboud.Jan 3 2019, 10:09 AM
rnk added inline comments.
lib/Target/X86/X86ISelLowering.cpp
2956–2970

We should really be fixing this so that the CCValAssign VA object has the right offset as soon as AnalyzeArguments in returns. Then all the generic code would work find without updating the stack offset after we create the fixed stack object.

It looks like @aaboud added it here.

2992

The existing test doesn't exercise this condition here. I think all you have to do to exercise it is to set up a function with this convention that copies this argument to an alloca, takes its address, and forwards it to another function.

Test?

I tried to add additional checks to the x86-64-intrcc.ll test, but the patched version of llc generates the exact same assembly as the unpatched version for this test (with and without -O0). I think this is because the test doesn't use the relevant code paths. I'm not sure whether it's practical to construct new tests for every code path and how to construct such tests.

Either something is 'broken', and this patch fixes things, or the patch is not needed.

How do you know that this patch helps/does the right thing?
There is probably some code that now is not broken?
Can you write an automated check for that?
If yes, you could try to creduce/bugpoint that code to the minimal snippet that still shows the miscompile(?) fix.

Yes, there is some broken Rust code that multiple people are able to reproduce: https://github.com/phil-opp/blog_os/issues/513. With this patch applied, the issue no longer occurs.

I spent my entire day trying to create a test case from the LLVM IR that the Rust code is translated to, but I could not reproduce the failing behavior in a test. I'm not sure why, maybe because of the different properties of the compilation target (e.g. disabled SSE). I'm sure that it would be relatively simple for someone with more LLVM experience, but I would need to invest much more time to familiarize myself with LLVM first, and I don't have the time to do so currently. If someone wants to take this from me or fix it in a different way (e.g. how @rnk suggested), that's fine with me, I just want this issue to be fixed.

That being said, this patch doesn't break any tests and only affects the x86 interrupt calling convention. Also, it does not introduce any new behavior, but just reverts the behavior change that was done in 4c3428b604324ed1528a78dbc31c8c8805d3c3e6. This behavior change was most likely accidental, because it was not mentioned in the commit and no new tests for the x86 interrupt calling convention were added either. So even though I understand the policy of requiring a test, it makes more sense to me to merge this without a test than to keep the wrong behavior until someone invests the needed time to construct a failing test.

rnk added a comment.Jan 4 2019, 3:14 PM

@phil-opp, sorry for the breakage, I understand your frustration. From my perspective, the x86 interrupt calling convention was hacked into the x86 backed in a way that doesn't play nice with orthogonal features, like the copy elision I added. Your fix proliferates the wrong direction of the existing design of the interrupt convention. Yes, it fixes the problem, but if I don't push back, require a test, contact Intel, the original authors of this feature, and pressure them to do it in a more proper way, LLVM will continue to become more unmaintanable.

I would promise to take responsibility and fix this properly myself, but I am going on vacation next week, and I don't plan to bring a laptop. I hesitate to make promises for freely given open source work with a time horizon beyond a week, but if you ping this in a week and a half, I can try to make a test and proper fix then.

In any case, thanks for the report, I'm aware of the problem now.

@rnk Thanks for your thoughtful reply! Sorry for voicing my frustration, I just feared that the issue remains unfixed because no one wants to invest the time to fix it properly. But I understand your reasoning and I see how the current code is somewhat hacky.

Thanks a lot for offering your help! It would be great if this could be fixed in a clean way, so that it won't break on each new code path in the future.

Ping @rnk: Do you have some time to fix this?

rnk added a comment.Jan 16 2019, 11:46 AM

Ping @rnk: Do you have some time to fix this?

Yep, I plan to pick it up this week.

Yep, I plan to pick it up this week.

Awesome, thanks a lot!

wxiao3 added a subscriber: wxiao3.Jan 17 2019, 6:11 PM
wxiao3 added a comment.EditedJan 17 2019, 6:17 PM

Hi,

I have narrowed the issue to a small test case as below and hope it can help you fix it:

%struct.interrupt_frame = type { i64, i64, i64, i64, i64 }

declare void @foo(%struct.interrupt_frame*, i64*)
define x86_intrcc void @test_fn_ecode(%struct.interrupt_frame* %frame, i64 %ecode) {
  %x.addr = alloca i64, align 4
  store i64 %ecode, i64* %x.addr, align 4
  call void @foo(%struct.interrupt_frame* %frame, i64* %x.addr)
  ret void
}

Wei

rnk added a comment.Jan 18 2019, 11:06 AM

I ended up putting together this small cleanup around calling conventions first:
https://reviews.llvm.org/D56883

phil-opp abandoned this revision.Jan 24 2019, 9:55 AM

Replaced by D56944