This patch extends the load merge/widen in AggressiveInstCombine() defined in
https://reviews.llvm.org/D127392
to handle reverse load patterns.
Differential D135137: [AggressiveInstCombine] Load merge the reverse load pattern of consecutive loads.
Closed, Public. Authored by bipmis on Oct 4 2022, 3:28 AM.
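To illustrate the summary above, here is a minimal sketch of a reverse load pattern, assuming "reverse" means the or-chain combines the bytes from the most-shifted element down to the least. The function name and comments are hypothetical, modeled on the or-load.ll tests in this diff; on a little-endian target a chain like this should now fold to a single i32 load:

define i32 @loadCombine_4consecutive_rev(ptr %p) {
  %p1 = getelementptr i8, ptr %p, i32 1
  %p2 = getelementptr i8, ptr %p, i32 2
  %p3 = getelementptr i8, ptr %p, i32 3
  %l1 = load i8, ptr %p
  %l2 = load i8, ptr %p1
  %l3 = load i8, ptr %p2
  %l4 = load i8, ptr %p3
  %e1 = zext i8 %l1 to i32
  %e2 = zext i8 %l2 to i32
  %e3 = zext i8 %l3 to i32
  %e4 = zext i8 %l4 to i32
  %s2 = shl i32 %e2, 8
  %s3 = shl i32 %e3, 16
  %s4 = shl i32 %e4, 24
  ; same bytes as the forward pattern, but combined high-to-low:
  ; (((s4 | s3) | s2) | e1)
  %o1 = or i32 %s4, %s3
  %o2 = or i32 %o1, %s2
  %o3 = or i32 %o2, %e1
  ret i32 %o3
}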
Event Timeline

Comment Actions Nice addition. I find the recursion a bit difficult to reason through, so forgive my questions below.
bipmis added inline comments.
Comment Actions This improves the pattern-matching, but it's still not complete, right? I.e., we should have a PhaseOrdering test with all 24 (4!) combinations of four i8-loads spliced into a 32-bit value, so we know the pattern is matched no matter what order the values are put back together with 'or'. We can probably rely on instcombine to canonicalize half of those patterns, but I'm not sure there's anything else there to reduce the matching space.

define i32 @loadCombine_4consecutive_1234(ptr %p) {
  %p1 = getelementptr i8, ptr %p, i32 1
  %p2 = getelementptr i8, ptr %p, i32 2
  %p3 = getelementptr i8, ptr %p, i32 3
  %l1 = load i8, ptr %p
  %l2 = load i8, ptr %p1
  %l3 = load i8, ptr %p2
  %l4 = load i8, ptr %p3
  %e1 = zext i8 %l1 to i32
  %e2 = zext i8 %l2 to i32
  %e3 = zext i8 %l3 to i32
  %e4 = zext i8 %l4 to i32
  %s2 = shl i32 %e2, 8
  %s3 = shl i32 %e3, 16
  %s4 = shl i32 %e4, 24
  %o1 = or i32 %e1, %s2
  %o2 = or i32 %o1, %s3
  %o3 = or i32 %o2, %s4
  ret i32 %o3
}

define i32 @loadCombine_4consecutive_1243(ptr %p) {
  %p1 = getelementptr i8, ptr %p, i32 1
  %p2 = getelementptr i8, ptr %p, i32 2
  %p3 = getelementptr i8, ptr %p, i32 3
  %l1 = load i8, ptr %p
  %l2 = load i8, ptr %p1
  %l3 = load i8, ptr %p2
  %l4 = load i8, ptr %p3
  %e1 = zext i8 %l1 to i32
  %e2 = zext i8 %l2 to i32
  %e3 = zext i8 %l3 to i32
  %e4 = zext i8 %l4 to i32
  %s2 = shl i32 %e2, 8
  %s3 = shl i32 %e3, 16
  %s4 = shl i32 %e4, 24
  %o1 = or i32 %e1, %s2
  %o2 = or i32 %o1, %s4
  %o3 = or i32 %o2, %s3
  ret i32 %o3
}

define i32 @loadCombine_4consecutive_1324(ptr %p) {
  %p1 = getelementptr i8, ptr %p, i32 1
  %p2 = getelementptr i8, ptr %p, i32 2
  %p3 = getelementptr i8, ptr %p, i32 3
  %l1 = load i8, ptr %p
  %l2 = load i8, ptr %p1
  %l3 = load i8, ptr %p2
  %l4 = load i8, ptr %p3
  %e1 = zext i8 %l1 to i32
  %e2 = zext i8 %l2 to i32
  %e3 = zext i8 %l3 to i32
  %e4 = zext i8 %l4 to i32
  %s2 = shl i32 %e2, 8
  %s3 = shl i32 %e3, 16
  %s4 = shl i32 %e4, 24
  %o1 = or i32 %e1, %s3
  %o2 = or i32 %o1, %s2
  %o3 = or i32 %o2, %s4
  ret i32 %o3
}

define i32 @loadCombine_4consecutive_1342(ptr %p) {
...
Comment Actions Right, but won't these cover the most commonly seen patterns in real application scenarios, if not all of them, considering they all belong to a pattern which generates a wider load?

Comment Actions
In dealing with the related case of trying to match bswap patterns, we've found that eventually we will see every possible pattern in source code somewhere.
I expect that we'll end up using cooperation between "reassociate", "instcombine", and "aggressive-instcombine" to get the optimal loads. That's why I suggest PhaseOrdering tests - it's not clear which pass will be responsible for canonicalizing everything, but we do want to make sure that a complete trip through the optimizer will catch all patterns. There's also an open question of when and how often to invoke aggressive-instcombine (for example, it's only enabled at -O3 currently).

Comment Actions The tests for completeness look as expected, but we want those to live in tests/Transforms/PhaseOrdering and use "RUN: opt -O3 -S < %s" or similar. That way, we can verify that the various passes that are expected to alter this IR are working cooperatively (and some seemingly unrelated change in another pass won't break the optimization that we want to enable with this patch).

Comment Actions Move the loadCombine.ll test, which has the combinations of four i8-loads spliced into a 32-bit value, to Transforms/PhaseOrdering.
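For concreteness, a minimal sketch of one such PhaseOrdering test; the RUN and CHECK lines here are illustrative assumptions (they presume a little-endian target where the whole chain collapses to one i32 load):

; RUN: opt -O3 -S < %s | FileCheck %s
; Hypothetical PhaseOrdering test: the full optimizer pipeline, not a
; single pass, is expected to reduce the byte-splice to one wide load.

define i32 @loadCombine_4consecutive_1234(ptr %p) {
; CHECK-LABEL: @loadCombine_4consecutive_1234(
; CHECK:       load i32, ptr %p
; CHECK:       ret i32
  %p1 = getelementptr i8, ptr %p, i32 1
  %p2 = getelementptr i8, ptr %p, i32 2
  %p3 = getelementptr i8, ptr %p, i32 3
  %l1 = load i8, ptr %p
  %l2 = load i8, ptr %p1
  %l3 = load i8, ptr %p2
  %l4 = load i8, ptr %p3
  %e1 = zext i8 %l1 to i32
  %e2 = zext i8 %l2 to i32
  %e3 = zext i8 %l3 to i32
  %e4 = zext i8 %l4 to i32
  %s2 = shl i32 %e2, 8
  %s3 = shl i32 %e3, 16
  %s4 = shl i32 %e4, 24
  %o1 = or i32 %e1, %s2
  %o2 = or i32 %o1, %s3
  %o3 = or i32 %o2, %s4
  ret i32 %o3
}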
Comment Actions LGTM.
This revision is now accepted and ready to land. Oct 18 2022, 9:10 AM
Comment Actions Thanks.

bipmis retitled this revision from [AggressiveInstCombine] Load merge the reverse load pattern and mixed load sizes. to [AggressiveInstCombine] Load merge the reverse load pattern of consecutive loads. Oct 19 2022, 3:14 AM
Closed by commit rG38f3e44997f0: [AggressiveInstCombine] Load merge the reverse load pattern of consecutive… (authored by bipmis). Oct 19 2022, 3:23 AM
This revision was automatically updated to reflect the committed changes.

Comment Actions
Thanks for checking. So there are 3 possible follow-ups to make the transform robust:
Comment Actions Heads up! We are having some tests break due to this patch. We are working on a reproducer on our end and will post it as soon as we get it.

Comment Actions Here is the reproducer - https://godbolt.org/z/7bcjzfYK4

The following program returns 33 on trunk and 0 on clang 14:

#include <utility>

template <typename ValueType>
class Cell {
 public:
  Cell() = default;

  void Set(ValueType value) {
    value_ = value;
    has_data_ = true;
  }

  std::pair<ValueType, bool> GetAndMarkRead() {
    bool fresh = has_data_;
    has_data_ = false;
    return std::make_pair(value_, fresh);
  }

  bool GetAndMarkRead2() {
    bool fresh = has_data_;
    has_data_ = false;
    return fresh;
  }

  ValueType value_;
  bool has_data_ = false;
};

bool foo() {
  Cell<bool> cell;
  cell.Set(true);
  return cell.GetAndMarkRead().second;
}

int main() {
  if (foo())
    return 0;
  return 33;
}

GetAndMarkRead returns false instead of 'true'. Returning std::pair matters.

bipmis mentioned this in D137201: [AggressiveInstCombine] Handle the insert point of the merged load correctly. Nov 1 2022, 3:12 PM
Comment Actions Thanks for the test case. The issue is likely inserting the new merged load at the right insert point. I have done a patch to handle this. Can you please test the patch and confirm that it works correctly?
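For reference, a minimal hypothetical reduction of that insert-point problem, not taken from the reproducer: if the merged i16 load were inserted at %l1's position, it would execute before the store and read a stale byte at %p1, whereas the original %l2 observes the store.

define i16 @insert_point_hazard(ptr %p) {
  %p1 = getelementptr i8, ptr %p, i32 1
  %l1 = load i8, ptr %p
  store i8 7, ptr %p1            ; clobbers the byte that %l2 must observe
  %l2 = load i8, ptr %p1
  %e1 = zext i8 %l1 to i16
  %e2 = zext i8 %l2 to i16
  %s2 = shl i16 %e2, 8
  %o1 = or i16 %e1, %s2
  ret i16 %o1
}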
Comment Actions Hello, this is blocking our releases at Google. I see the fix patch hasn't had an update since Tuesday. Can you please give it higher priority?

Comment Actions Hello, as mentioned above there is a fix in https://reviews.llvm.org/D137201. We found another case that was incorrect though, so we're looking into fixing that too. If it does take too long, perhaps a revert is in order in the meantime. Did you get a chance to verify the fix worked for your case? Thanks.
Comment Actions I think as a quick fix we can switch off load merge when we encounter stores, and eventually come up with a proper fix with additional test cases. I will put up a patch for that so this does not block us.
Comment Actions We unblocked ourselves for now. Will test the fix once it lands.
Comment Actions OK, sounds good. @bipmis was on training this week, but did manage to put https://reviews.llvm.org/D137333 together, which as far as we understand should fix all the known issues. Hopefully that will mean no more issues on your end, and we can work on a more substantial fix when we have more time. Please let us know if more issues do come up. Thanks.
Comment Actions The test passed with the fix. Thanks!
Revision Contents
Diff 468298 llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
llvm/test/Transforms/AggressiveInstCombine/AArch64/loadcombine-phaseordering.ll
llvm/test/Transforms/AggressiveInstCombine/AArch64/or-load.ll
llvm/test/Transforms/AggressiveInstCombine/X86/loadcombine-phaseordering.ll
llvm/test/Transforms/AggressiveInstCombine/X86/or-load.ll
typo: Offest -> Offset