Current ArgPromotion implementation does not copy it: https://godbolt.org/z/zzTKof
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This looks reasonable to me. The load gets only moved, exactly the same value should get loaded. What about other metadata on loads?
Sorry for the delay. I am not sure if it is safe to copy all metadata, it would probably be better to add them one-by-one. I was mostly asking to check which other kinds would be safe. For example, debug metadata is function specific, so it can't simply be copied I think. Could you add a test with !dbg metadata?
Only copy seleted metadata.
Thank you @fhahn for the response. I went through the metadata types mentioned in https://llvm.org/docs/LangRef.html#load-instruction, and those listed in https://llvm.org/docs/LangRef.html#metadata that mentiones could be used on 'load', and hand picked those I think should be safe to copy.
Since I do not have write access to the repo, if this change looks good to you, would you want to commit it on my behalf?
The attributes in question look like a good start to me, thanks.
llvm/test/Transforms/ArgumentPromotion/metadata.ll | ||
---|---|---|
6 | If you are checking the full IR, you may want to use llvm/utils/update_test_checks.py to auto-generate the checks. The auto-generated checks use regular expressions + variables for value names, so they are less prone to break when value names change due to unrelated changes. | |
11 | Would a simple pointer type also work, or do we need a struct? |
update test cases:
- remove irrelevant constructions in test cases
- use update_test_checks.py to generate regex-based checks
llvm/test/Transforms/ArgumentPromotion/metadata.ll | ||
---|---|---|
11 | Thank you. Yes, a simple pointer is enough. The getelementptr is also unnecessary, so I also removed that. |
LGTM, thanks! Just a few small remaining edits for the test. Please also let me know which name + email to use when I commit the change after the updates.
This is causing a runtime failure in one of our tests. Specifically due to the nonnull metadata. It appears that this should not be cloned because the circumstances under which it is valid and applied to the load in the callee may not exist in the caller. I'm not sure about whether this is an issue for some of the other metadata. I'm going to try to create a smaller test case. But perhaps this change should be reverted.
Here is a small test case. Note I have to split the invocation into 2 to get the problem to reproduce, due to phase ordering between InstCombine Calls that creates the !nonnull metadata and ArgPromotion that propagates it during promotion. We are hitting this in a ThinLTO build, where the !nonnull metadata is added during the initial compile, and after some additional inlining in the ThinLTO post link pipeline it then gets the argument promotion.
I am not sure if the other attributes being propagated could also hit issues like this. Can you revert while it is being investigated?
$ cat c.cc struct Bar { int i; }; struct Foo final { Bar *b; bool x; }; __attribute__((noinline)) static int func(const Foo &f) { // This initially looked like: // __builtin_assume(!f.x || f.b != nullptr); // and in the original test case jump threading transformed it into the form // below with the assume split. Since I couldn't get jump threading to do this // for my small test case I have manually made the transformation here. if (!f.x) __builtin_assume(true); else __builtin_assume(f.b != nullptr); if (f.x) // Because this is essentially under the same condition as the // __builtin_assume(f.b != nullptr), the load of f.b gets the // !nonnull metadata. return f.b->i; return 0; } int caller(const Foo &f) { return func(f); } $ clang++ -O1 c.cc -flto=thin -S # c.s has !nonnull on dereference in func() $ opt < c.s -argpromotion -S # !nonnull has been propagated to dereference now in caller(). However, this load is no longer guarded by the check of "if (f.x)"
Sorry about the inconveniences, @tejohnson. I am fine with rolling this back, but since I do not have write access to the repo, @fhahn, would you like to roll it back?
I found a much shorter program that could reproduce this issue:
%struct.Foo = type { %struct.Bar*, i8 } %struct.Bar = type { i32 } ; Function Attrs: nounwind uwtable define i32 @caller(%struct.Foo* dereferenceable(16) %0) local_unnamed_addr #0 { %2 = call i32 @func(%struct.Foo* dereferenceable(16) %0) ret i32 %2 } ; Function Attrs: nounwind uwtable define internal i32 @func(%struct.Foo* dereferenceable(16) %0) unnamed_addr #0 { %2 = getelementptr inbounds %struct.Foo, %struct.Foo* %0, i64 0, i32 1 %3 = load i8, i8* %2, align 8 %4 = icmp eq i8 %3, 0 br i1 %4, label %10, label %5 5: ; preds = %1 %6 = getelementptr inbounds %struct.Foo, %struct.Foo* %0, i64 0, i32 0 %7 = load %struct.Bar*, %struct.Bar** %6, align 8, !nonnull !0 %8 = getelementptr inbounds %struct.Bar, %struct.Bar* %7, i64 0, i32 0 %9 = load i32, i32* %8, align 4 br label %10 10: ; preds = %1, %5 %11 = phi i32 [ %9, %5 ], [ 0, %1 ] ret i32 %11 } attributes #0 = { nounwind uwtable } !0 = !{}
Running opt < arg_promo.ll -argpromotion -S > arg_promo.opt2.ll produces:
; ModuleID = '<stdin>' source_filename = "<stdin>" %struct.Foo = type { %struct.Bar*, i8 } %struct.Bar = type { i32 } ; Function Attrs: nounwind uwtable define i32 @caller(%struct.Foo* dereferenceable(16) %0) local_unnamed_addr #0 { %.idx = getelementptr %struct.Foo, %struct.Foo* %0, i64 0, i32 0 %.idx.val = load %struct.Bar*, %struct.Bar** %.idx, align 8, !nonnull !0 %.idx1 = getelementptr %struct.Foo, %struct.Foo* %0, i64 0, i32 1 %.idx1.val = load i8, i8* %.idx1, align 8 %2 = call i32 @func(%struct.Bar* %.idx.val, i8 %.idx1.val) ret i32 %2 } ; Function Attrs: nounwind uwtable define internal i32 @func(%struct.Bar* %.0.0.val, i8 %.0.1.val) unnamed_addr #0 { %1 = icmp eq i8 %.0.1.val, 0 br i1 %1, label %5, label %2 2: ; preds = %0 %3 = getelementptr inbounds %struct.Bar, %struct.Bar* %.0.0.val, i64 0, i32 0 %4 = load i32, i32* %3, align 4 br label %5 5: ; preds = %2, %0 %6 = phi i32 [ %4, %2 ], [ 0, %0 ] ret i32 %6 } attributes #0 = { nounwind uwtable } !0 = !{}
Copying !nonnull isn't the issue here; the root problem is, the second load (%7 from block %5) should not be promoted, since block %5 will not always be executed.
I can try to fix this once I get a chance, but I cannot guarantee an ETA.
Here is a further reduction:
define i32* @caller(i1 %c, i32** dereferenceable(8) %arg) { %p = call i32* @func(i1 %c, i32** dereferenceable(8) %arg) ret i32* %p } define internal i32* @func(i1 %c, i32** dereferenceable(8) %arg) { bb: br i1 %c, label %if, label %else if: %p = load i32*, i32** %arg, align 8, !nonnull !{} ret i32* %p else: ret i32* null }
It's okay to promote the load, because the pointer is dereferenceable, but it's not okay to transfer the metadata. There are two pre-existing bugs in this respect: We were already transfering the AA metadata, which is also incorrect. And we were transferring the load alignment (this probably shouldn't be transferred at all but rather a precondition). We can only transfer metadata if the load is guaranteed-to-execute from the entry of the function.
We were already transfering the AA metadata, which is also incorrect.
IIUC, lack of AA metadata wouldn't result in incorrect code, so that SGTM. But for this:
And we were transferring the load alignment (this probably shouldn't be transferred at all but rather a precondition)
I have an counterexample:
int func(int *p) { if (p % 4 != 0) { return load(p, align=1); } return 0; }
From LLVM lang ref:
an omitted align argument means that the operation has the ABI alignment for the target... Overestimating the alignment results in undefined behavior... An alignment of 1 is always safe.
Not copying the align info should cause LLVM to fall back to load align 4, but since p % 4 != 0, it would produce undefined behavior.
So my guess is we should always copy align and !align; for the rest metadata items, as you suggested, only copy if they are guaranteed to be executed.
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp | ||
---|---|---|
319 | Without taking a closer look at the overall patch, this check is insufficient, because not all instructions in the entry block are guaranteed to execute (e.g. due to unwinding). There is isGuaranteedToTransferExecutionToSuccessor to check this. |
Hey, I was just about to submit something like your first patch version here independently (and with the same bugs).
Anyway the latest version of the diff makes sense to me. I added some nitpicks but this is essentially good to go.
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp | ||
---|---|---|
320–327 | avoid check for isOrigLoadGuaranteedToExecute in every loop iteration. | |
330–333 | This already happens in line 312. | |
335–339 | How about using llvm::copyMetadataForLoad from Transforms/Utils/Local.h at a first glance it seems to work fine even for loads in different functions. |
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp | ||
---|---|---|
330–333 | Please ignore my comment, just realized 312 is about alignment not AAInfo. |
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp | ||
---|---|---|
335–339 | Quoting @fhahn's comment above:
llvm::copyMetadataForLoad also copies debug metadata, so I think it's not safe to use here. It might be better to just hand pick a few ones that are safe to copy. |
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp | ||
---|---|---|
320–327 | We need to check the result of llvm::isGuaranteedToTransferExecutionToSuccessor() in every iteration anyways. The difference between my code and yours is minimal. :) |
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp | ||
---|---|---|
320–327 | Yep the difference is minuscule. Though as long as the code doesn't get more complicated, why not... Anyway no strong opinion if you want to keep as-is. | |
335–339 | Maintaining a custom list of metadata to copy is not great for future extension. Whenever someone adds a new kind of metadata he has to be aware of this list, understand the ArgumentPromotion pass and then add the new metadata to the list or the new metadata will get lost by the optimization. I wonder what makes debug info function-dependent. At a first glance it doesn't look like it is (given that the DILocations has a scope pointing to a DISubprogram which specifies the relevant function). Anyway I'm not too experienced here, so I may miss something. Well I'm gonna accept the changes as-is, but I think it's not ideal to have a custom list. |
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp | ||
---|---|---|
314–320 | Why is it okay to copy align metadata unconditionally? I would have expected it to get the same treatment as other metadata. |
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp | ||
---|---|---|
314–320 | Not an expert on this but I would expect alignments for the same type should never change? The logic has been there for 11 years: https://github.com/llvm/llvm-project/commit/81c03447fc6358197258903e3fc6ee65b16ee768 |
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp | ||
---|---|---|
314–320 | These are two separate things: The align attribute specifies the alignment of the load operand. The align metadata specifies the alignment of the load result (for pointer results). |
Without taking a closer look at the overall patch, this check is insufficient, because not all instructions in the entry block are guaranteed to execute (e.g. due to unwinding). There is isGuaranteedToTransferExecutionToSuccessor to check this.