Page MenuHomePhabricator

[ArgPromotion] Copy !range metadata for loads.
AcceptedPublic

Authored by wecing on Dec 29 2020, 5:06 PM.

Details

Summary

Current ArgPromotion implementation does not copy it: https://godbolt.org/z/zzTKof

Diff Detail

Event Timeline

wecing created this revision.Dec 29 2020, 5:06 PM
wecing requested review of this revision.Dec 29 2020, 5:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 29 2020, 5:06 PM
fhahn added a comment.Jan 7 2021, 6:12 AM

This looks reasonable to me. The load gets only moved, exactly the same value should get loaded. What about other metadata on loads?

wecing updated this revision to Diff 315275.Jan 7 2021, 4:38 PM

Copy all metadata.

ping -- @fhahn , does the new change look reasonable to you?

Copy all metadata.

Sorry for the delay. I am not sure if it is safe to copy all metadata, it would probably be better to add them one-by-one. I was mostly asking to check which other kinds would be safe. For example, debug metadata is function specific, so it can't simply be copied I think. Could you add a test with !dbg metadata?

wecing updated this revision to Diff 329845.Mar 10 2021, 10:02 PM

Only copy seleted metadata.

Thank you @fhahn for the response. I went through the metadata types mentioned in https://llvm.org/docs/LangRef.html#load-instruction, and those listed in https://llvm.org/docs/LangRef.html#metadata that mentiones could be used on 'load', and hand picked those I think should be safe to copy.

Since I do not have write access to the repo, if this change looks good to you, would you want to commit it on my behalf?

Only copy seleted metadata.

Thank you @fhahn for the response. I went through the metadata types mentioned in https://llvm.org/docs/LangRef.html#load-instruction, and those listed in https://llvm.org/docs/LangRef.html#metadata that mentiones could be used on 'load', and hand picked those I think should be safe to copy.

The attributes in question look like a good start to me, thanks.

llvm/test/Transforms/ArgumentPromotion/metadata.ll
6

If you are checking the full IR, you may want to use llvm/utils/update_test_checks.py to auto-generate the checks. The auto-generated checks use regular expressions + variables for value names, so they are less prone to break when value names change due to unrelated changes.

11

Would a simple pointer type also work, or do we need a struct?

wecing updated this revision to Diff 330086.Mar 11 2021, 3:35 PM

update test cases:

  • remove irrelevant constructions in test cases
  • use update_test_checks.py to generate regex-based checks
wecing marked 2 inline comments as done.Mar 11 2021, 3:37 PM
wecing added inline comments.
llvm/test/Transforms/ArgumentPromotion/metadata.ll
11

Thank you. Yes, a simple pointer is enough. The getelementptr is also unnecessary, so I also removed that.

fhahn accepted this revision.Mar 12 2021, 2:16 AM

LGTM, thanks! Just a few small remaining edits for the test. Please also let me know which name + email to use when I commit the change after the updates.

This revision is now accepted and ready to land.Mar 12 2021, 2:16 AM
fhahn added inline comments.Mar 12 2021, 2:18 AM
llvm/test/Transforms/ArgumentPromotion/metadata.ll
8

can you add a manual check line at the end of the file to check that the !0 metadata matches the expected range?

33

please also add a test for dereferenceable

wecing updated this revision to Diff 330390.Mar 12 2021, 3:45 PM
wecing marked an inline comment as done.
  • add test for !dereferenceable
  • check metadata objects after optimization
wecing marked 2 inline comments as done.Mar 12 2021, 3:46 PM

Please use "Chenguang Wang" and w3cing[at]gmail.com. Thank you @fhahn!

This revision was automatically updated to reflect the committed changes.

This is causing a runtime failure in one of our tests. Specifically due to the nonnull metadata. It appears that this should not be cloned because the circumstances under which it is valid and applied to the load in the callee may not exist in the caller. I'm not sure about whether this is an issue for some of the other metadata. I'm going to try to create a smaller test case. But perhaps this change should be reverted.

This is causing a runtime failure in one of our tests. Specifically due to the nonnull metadata. It appears that this should not be cloned because the circumstances under which it is valid and applied to the load in the callee may not exist in the caller. I'm not sure about whether this is an issue for some of the other metadata. I'm going to try to create a smaller test case. But perhaps this change should be reverted.

Here is a small test case. Note I have to split the invocation into 2 to get the problem to reproduce, due to phase ordering between InstCombine Calls that creates the !nonnull metadata and ArgPromotion that propagates it during promotion. We are hitting this in a ThinLTO build, where the !nonnull metadata is added during the initial compile, and after some additional inlining in the ThinLTO post link pipeline it then gets the argument promotion.

I am not sure if the other attributes being propagated could also hit issues like this. Can you revert while it is being investigated?

$ cat c.cc
struct Bar {
  int i;
};
struct Foo final {
  Bar *b;
  bool x;
};

__attribute__((noinline)) static int func(const Foo &f) {
  // This initially looked like:
  // __builtin_assume(!f.x || f.b != nullptr);
  // and in the original test case jump threading transformed it into the form
  // below with the assume split. Since I couldn't get jump threading to do this
  // for my small test case I have manually made the transformation here.
  if (!f.x)
    __builtin_assume(true);
  else
    __builtin_assume(f.b != nullptr);
  if (f.x)
    // Because this is essentially under the same condition as the
    // __builtin_assume(f.b != nullptr), the load of f.b gets the
    // !nonnull metadata.
    return f.b->i;
  return 0;
}

int caller(const Foo &f) {
  return func(f);
}

$ clang++ -O1 c.cc -flto=thin -S
# c.s has !nonnull on dereference in func()
$ opt < c.s -argpromotion -S
# !nonnull has been propagated to dereference now in caller(). However, this load is no longer guarded by the check of "if (f.x)"

Sorry about the inconveniences, @tejohnson. I am fine with rolling this back, but since I do not have write access to the repo, @fhahn, would you like to roll it back?


I found a much shorter program that could reproduce this issue:

%struct.Foo = type { %struct.Bar*, i8 }
%struct.Bar = type { i32 }

; Function Attrs: nounwind uwtable
define i32 @caller(%struct.Foo* dereferenceable(16) %0) local_unnamed_addr #0 {
  %2 = call i32 @func(%struct.Foo* dereferenceable(16) %0)
  ret i32 %2
}

; Function Attrs: nounwind uwtable
define internal i32 @func(%struct.Foo* dereferenceable(16) %0) unnamed_addr #0 {
  %2 = getelementptr inbounds %struct.Foo, %struct.Foo* %0, i64 0, i32 1
  %3 = load i8, i8* %2, align 8
  %4 = icmp eq i8 %3, 0
  br i1 %4, label %10, label %5

5:                                                ; preds = %1
  %6 = getelementptr inbounds %struct.Foo, %struct.Foo* %0, i64 0, i32 0
  %7 = load %struct.Bar*, %struct.Bar** %6, align 8, !nonnull !0
  %8 = getelementptr inbounds %struct.Bar, %struct.Bar* %7, i64 0, i32 0
  %9 = load i32, i32* %8, align 4
  br label %10

10:                                               ; preds = %1, %5
  %11 = phi i32 [ %9, %5 ], [ 0, %1 ]
  ret i32 %11
}

attributes #0 = { nounwind uwtable }

!0 = !{}

Running opt < arg_promo.ll -argpromotion -S > arg_promo.opt2.ll produces:

; ModuleID = '<stdin>'
source_filename = "<stdin>"

%struct.Foo = type { %struct.Bar*, i8 }
%struct.Bar = type { i32 }

; Function Attrs: nounwind uwtable
define i32 @caller(%struct.Foo* dereferenceable(16) %0) local_unnamed_addr #0 {
  %.idx = getelementptr %struct.Foo, %struct.Foo* %0, i64 0, i32 0
  %.idx.val = load %struct.Bar*, %struct.Bar** %.idx, align 8, !nonnull !0
  %.idx1 = getelementptr %struct.Foo, %struct.Foo* %0, i64 0, i32 1
  %.idx1.val = load i8, i8* %.idx1, align 8
  %2 = call i32 @func(%struct.Bar* %.idx.val, i8 %.idx1.val)
  ret i32 %2
}

; Function Attrs: nounwind uwtable
define internal i32 @func(%struct.Bar* %.0.0.val, i8 %.0.1.val) unnamed_addr #0 {
  %1 = icmp eq i8 %.0.1.val, 0
  br i1 %1, label %5, label %2

2:                                                ; preds = %0
  %3 = getelementptr inbounds %struct.Bar, %struct.Bar* %.0.0.val, i64 0, i32 0
  %4 = load i32, i32* %3, align 4
  br label %5

5:                                                ; preds = %2, %0
  %6 = phi i32 [ %4, %2 ], [ 0, %0 ]
  ret i32 %6
}

attributes #0 = { nounwind uwtable }

!0 = !{}

Copying !nonnull isn't the issue here; the root problem is, the second load (%7 from block %5) should not be promoted, since block %5 will not always be executed.

I can try to fix this once I get a chance, but I cannot guarantee an ETA.

nikic reopened this revision.Fri, Mar 26, 2:04 PM
nikic added a subscriber: nikic.

Here is a further reduction:

define i32* @caller(i1 %c, i32** dereferenceable(8) %arg) {
  %p = call i32* @func(i1 %c, i32** dereferenceable(8) %arg)
  ret i32* %p
} 

define internal i32* @func(i1 %c, i32** dereferenceable(8) %arg) {
bb:
  br i1 %c, label %if, label %else
  
if:
  %p = load i32*, i32** %arg, align 8, !nonnull !{}
  ret i32* %p 

else:
  ret i32* null                                    
}

It's okay to promote the load, because the pointer is dereferenceable, but it's not okay to transfer the metadata. There are two pre-existing bugs in this respect: We were already transfering the AA metadata, which is also incorrect. And we were transferring the load alignment (this probably shouldn't be transferred at all but rather a precondition). We can only transfer metadata if the load is guaranteed-to-execute from the entry of the function.

This revision is now accepted and ready to land.Fri, Mar 26, 2:04 PM

We were already transfering the AA metadata, which is also incorrect.

IIUC, lack of AA metadata wouldn't result in incorrect code, so that SGTM. But for this:

And we were transferring the load alignment (this probably shouldn't be transferred at all but rather a precondition)

I have an counterexample:

int func(int *p) {
    if (p % 4 != 0) {
        return load(p, align=1);
    }
    return 0;
}

From LLVM lang ref:

an omitted align argument means that the operation has the ABI alignment for the target... Overestimating the alignment results in undefined behavior... An alignment of 1 is always safe.

Not copying the align info should cause LLVM to fall back to load align 4, but since p % 4 != 0, it would produce undefined behavior.

So my guess is we should always copy align and !align; for the rest metadata items, as you suggested, only copy if they are guaranteed to be executed.

wecing updated this revision to Diff 334030.Mon, Mar 29, 6:30 PM

Conditionally copy most metadata info.

Updated this patch according to my previous comment. Would you mind taking another look, @fhahn and @nikic ? Thanks!