This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Bail out of casting calls when a conversion from/to byval is involved.
ClosedPublic

Authored by glandium on Oct 11 2022, 6:39 PM.

Diff Detail

Event Timeline

glandium created this revision.Oct 11 2022, 6:39 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 11 2022, 6:39 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
glandium requested review of this revision.Oct 11 2022, 6:39 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 11 2022, 6:39 PM
arsenm added a subscriber: arsenm.Oct 11 2022, 7:30 PM

Needs test case

glandium updated this revision to Diff 466997.Oct 11 2022, 7:45 PM

Added testcase.

glandium updated this revision to Diff 466999.Oct 11 2022, 7:48 PM

Refreshed test case to match what's in the issue.

arsenm added inline comments.Oct 11 2022, 7:57 PM
llvm/test/LTO/X86/cast-to-byval.ll
3 ↗(On Diff #466999)

Why does this test involve LTO or multiple modules? Why can't you just run instcombine?

glandium added inline comments.Oct 11 2022, 8:03 PM
llvm/test/LTO/X86/cast-to-byval.ll
3 ↗(On Diff #466999)

Because I have no idea how to reproduce without LTO. I failed with a bitcast of the function.

glandium added inline comments.Oct 11 2022, 8:14 PM
llvm/test/LTO/X86/cast-to-byval.ll
3 ↗(On Diff #466999)

Scrap that, just calling without a bitcast works.

glandium updated this revision to Diff 467002.Oct 11 2022, 8:17 PM

InstCombine test.

glandium updated this revision to Diff 467003.Oct 11 2022, 8:19 PM

Adjustment to the comment.

glandium updated this revision to Diff 467004.Oct 11 2022, 8:22 PM

Added missing datalayout and target.

(Sorry for the update noise)

I don't think your patch completely solves the issue you're describing. What happens if you have a struct where the only member is a pointer? clang will lower it to "ptr", rust will lower it to "ptr byval", and whatever LTO side-effects you're seeing will happen without this instcombine transform ever getting involved.

Not sure what the right solution looks like here; the current CallBase::getCalledFunction() isn't really designed to consider this case.

glandium added a comment.EditedOct 12 2022, 1:31 AM

It looks like it's not a problem:

cat <<EOF | opt --O3 -S
target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-f64:32:64-f80:32-n8:16:32-S128"
target triple = "i686-unknown-linux-gnu"

define ptr @bar(ptr %0) {
  %2 = tail call ptr @foo(ptr %0)
  ret ptr %2
}

%Foo = type { ptr }
define ptr @foo (ptr byval(%Foo) %foo) {
  %1 = load ptr, ptr %foo, align 4
  ret ptr %1
}
EOF
; ModuleID = '<stdin>'
source_filename = "<stdin>"
target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-f64:32:64-f80:32-n8:16:32-S128"
target triple = "i686-unknown-linux-gnu"

%Foo = type { ptr }

; Function Attrs: argmemonly mustprogress nofree norecurse nosync nounwind readonly willreturn
define ptr @bar(ptr nocapture readonly %0) local_unnamed_addr #0 {
  %2 = load ptr, ptr %0, align 4
  ret ptr %2
}

; Function Attrs: argmemonly mustprogress nofree norecurse nosync nounwind readonly willreturn
define ptr @foo(ptr nocapture readonly byval(%Foo) %foo) local_unnamed_addr #0 {
  %1 = load ptr, ptr %foo, align 4
  ret ptr %1
}

attributes #0 = { argmemonly mustprogress nofree norecurse nosync nounwind readonly willreturn }

(that's without the patch applied, btw ; with or without the patch makes no difference in this case)

nikic added a reviewer: efriedma.EditedOct 12 2022, 1:40 AM

Example for the case @efriedma mentions: https://llvm.godbolt.org/z/oPzWv9sr4

I believe this bug is on the rustc side and needs to be fixed there. There is an ABI mismatch on the IR level here, and I don't think we can ask IR transforms to reason about post-legalization ABI changes. We all know that LLVM's current way of dealing with platform ABIs is problematic, but we have to deal with what we have right now, and that requires all frontends to emit the same IR signature.

IR optimizations accommodate mismatches to some extent; if the types of the caller and the callee don't match, we treat the call as opaque, which effectively makes the call work if the actual registers match. But that doesn't detect ABI-modifying attributes.

It might be possible to adjust optimizations to handle this conservatively without any deep infrastructure changes. Basically, treat calls as opaque if there are significant ABI attributes that don't match... but I'm not sure what exactly that check would include, or how we make it fast.


That said, I don't have any objection to this patch... I mean, it can't really do any harm. I just want to be sure we're on the same page about what it does, and does not accomplish.

arsenm added inline comments.Oct 12 2022, 10:53 AM
llvm/test/Transforms/InstCombine/cast-to-byval.ll
4–5 ↗(On Diff #467004)

Shouldn't need this

IR optimizations accommodate mismatches to some extent; if the types of the caller and the callee don't match, we treat the call as opaque, which effectively makes the call work if the actual registers match. But that doesn't detect ABI-modifying attributes.

It might be possible to adjust optimizations to handle this conservatively without any deep infrastructure changes. Basically, treat calls as opaque if there are significant ABI attributes that don't match... but I'm not sure what exactly that check would include, or how we make it fast.


That said, I don't have any objection to this patch... I mean, it can't really do any harm. I just want to be sure we're on the same page about what it does, and does not accomplish.

My problem with this patch is that it does not address the problem in a principled way: It works around one specific instance of this problem, while retaining it in many other places. As you already pointed out, inlining for the struct containing a pointer case is broken. From a quick try, the same is true for argument promotion (https://llvm.godbolt.org/z/8We7dvYvT), we mix up the levels of indirection. The same is probably true for other IPO transforms as well. There's probably also issues with attribute interpretation that are much more wide-spread (e.g. does a nonnull refer to the byval pointer or its payload?)

If we wanted to support this, I think we would have to make an effort to support it more broadly. And I personally don't think this is justified, especially as this will need to be fixed on the rustc side anyway (if nothing else then to enable cross-language inlining after these changes).

glandium added inline comments.Oct 12 2022, 2:12 PM
llvm/test/Transforms/InstCombine/cast-to-byval.ll
4–5 ↗(On Diff #467004)

The test passes with the unpatched code without this.

From a quick try, the same is true for argument promotion (https://llvm.godbolt.org/z/8We7dvYvT), we mix up the levels of indirection.

This one doesn't happen without internal, which wouldn't happen in a real cross-module case, would it?

Example for the case @efriedma mentions: https://llvm.godbolt.org/z/oPzWv9sr4

This one is interesting because apart if you had a struct Foo { struct Foo* }, it seems to be a problem created by opaque pointers.

nikic added a comment.Oct 13 2022, 1:17 AM

From a quick try, the same is true for argument promotion (https://llvm.godbolt.org/z/8We7dvYvT), we mix up the levels of indirection.

This one doesn't happen without internal, which wouldn't happen in a real cross-module case, would it?

It can happen after fat LTO internalization. For thin LTO, I don't think it's possible right now (though I believe someone is working on supporting ThinLTO arg promotion, so that may not hold up in the future...)

Would you reconsider this patch? Sure, it doesn't address the issue generally, but it does fix the cases that were actively broken by 6c8adc5.

FWIW, https://llvm.godbolt.org/z/oPzWv9sr4 was already miscompiled with LLVM 14. As for https://llvm.godbolt.org/z/8We7dvYvT, it crashes LLVM 14.

nikic added a comment.Oct 20 2022, 6:44 AM

Okay, I looked a bit closer into what clang is doing here with the intention of implementing the same in rustc. This turns out to be more complicated than I originally thought (some notes in https://github.com/rust-lang/rust/issues/102174#issuecomment-1285502667). I believe this would require an entirely new PassMode, which can model the unpacked struct plus a possibly needed inreg padding register.

With that in mind, I'm okay with landing this mitigation in the meantime, especially as this does not lose any practical transformation power.

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
3400

Do we need to prevent the reverse direction as well? I.e. call-site has byval, function doesn't.

llvm/test/Transforms/InstCombine/cast-to-byval.ll
2 ↗(On Diff #467004)

Use update_test_checks.py.

4–5 ↗(On Diff #467004)

You need to change to an i64 argument when dropping the data layout, so it matches the ptr size.

glandium added inline comments.Oct 20 2022, 1:55 PM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
3400

In theory, I guess we do. At least from the "better safe than sorry" perspective. But in practice it doesn't seem to be a problem: no conversion from byval to integer is attempted, and for pointers, the call is not changed in the first place. https://llvm.godbolt.org/z/5ov489MxP

glandium added inline comments.Oct 20 2022, 2:03 PM
llvm/test/Transforms/InstCombine/cast-to-byval.ll
2 ↗(On Diff #467004)

Interestingly, the CHECKs it produces fail...

nikic added inline comments.Oct 20 2022, 2:07 PM
llvm/test/Transforms/InstCombine/cast-to-byval.ll
2 ↗(On Diff #467004)

At a guess, you need to reorder the foo and bar functions or pass --function-signature.

glandium added inline comments.Oct 20 2022, 2:28 PM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
3400

Preventing conversions from byval actually breaks Transforms/InstCombine/byval.ll. This is what update_test_checks.py would update:

--- a/llvm/test/Transforms/InstCombine/byval.ll
+++ b/llvm/test/Transforms/InstCombine/byval.ll
@@ -7,8 +7,7 @@ declare void @add_byval_callee_2(double* byval(double))
 
 define void @add_byval(i64* %in) {
 ; CHECK-LABEL: @add_byval(
-; CHECK-NEXT:    [[TMP1:%.*]] = bitcast i64* [[IN:%.*]] to double*
-; CHECK-NEXT:    call void @add_byval_callee(double* byval(double) [[TMP1]])
+; CHECK-NEXT:    call void bitcast (void (double*)* @add_byval_callee to void (i64*)*)(i64* byval(i64) [[IN:%.*]])
 ; CHECK-NEXT:    ret void
 ;
   %tmp = bitcast void (double*)* @add_byval_callee to void (i64*)*

This change however doesn't feel wrong. Something seems off with the original test.

llvm/test/Transforms/InstCombine/cast-to-byval.ll
2 ↗(On Diff #467004)

Reordering made it work. Thanks.

glandium updated this revision to Diff 469365.Oct 20 2022, 2:50 PM

Removed datalayout/Ran update_test_checks

nikic accepted this revision.Oct 21 2022, 1:22 AM

LGTM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
3400

That change would be fine (in the sense that it is in line with the restriction you're introducing here), though it looks like this ultimately doesn't matter with opaque pointers.

llvm/test/Transforms/InstCombine/cast-to-byval.ll
3 ↗(On Diff #469365)

Drop the mtriple here -- that shouldn't be needed and would require a REQUIRES to avoid breakage when the target is not built.

This revision is now accepted and ready to land.Oct 21 2022, 1:22 AM
glandium added inline comments.Oct 21 2022, 1:43 PM
llvm/test/Transforms/InstCombine/cast-to-byval.ll
3 ↗(On Diff #469365)

does opt default to 64-bits on 32-bits hosts or does running tests on 32-bits not supported?

nikic added inline comments.Oct 21 2022, 1:46 PM
llvm/test/Transforms/InstCombine/cast-to-byval.ll
3 ↗(On Diff #469365)

Opt uses a 64-bit data layout by default, independent of host.

glandium updated this revision to Diff 469756.Oct 21 2022, 1:54 PM

Removed -mtriple ; Restricted conversions from byval as well.

glandium retitled this revision from [InstCombine] Bail out of casting calls when a conversion to byval is involved. to [InstCombine] Bail out of casting calls when a conversion from/to byval is involved..Oct 21 2022, 1:54 PM

@nikic can you push this for me?