This patch extends the load merge/widen in AggressiveInstCombine() defined in
https://reviews.llvm.org/D127392
to handle reverse load patterns.
Differential D135137
[AggressiveInstCombine] Load merge the reverse load pattern of consecutive loads. bipmis on Oct 4 2022, 3:28 AM. Authored by
Details This patch extends the load merge/widen in AggressiveInstCombine() defined in to handle reverse load patterns.
Diff Detail
Unit Tests
Event TimelineComment Actions Nice addition. I find the recursion a bit difficult to reason through, so forgive my questions below.
Comment Actions This improves the pattern-matching, but it's still not complete, right? Ie, we should have a PhaseOrdering test with all 24 (4!) combinations of four i8-loads spliced into a 32-bit value, so we know the pattern is matched no matter what order the values are put back together with 'or'. We can probably rely on instcombine to canonicalize half of those patterns, but I'm not sure there's anything else there to reduce the matching space. define i32 @loadCombine_4consecutive_1234(ptr %p) { %p1 = getelementptr i8, ptr %p, i32 1 %p2 = getelementptr i8, ptr %p, i32 2 %p3 = getelementptr i8, ptr %p, i32 3 %l1 = load i8, ptr %p %l2 = load i8, ptr %p1 %l3 = load i8, ptr %p2 %l4 = load i8, ptr %p3 %e1 = zext i8 %l1 to i32 %e2 = zext i8 %l2 to i32 %e3 = zext i8 %l3 to i32 %e4 = zext i8 %l4 to i32 %s2 = shl i32 %e2, 8 %s3 = shl i32 %e3, 16 %s4 = shl i32 %e4, 24 %o1 = or i32 %e1, %s2 %o2 = or i32 %o1, %s3 %o3 = or i32 %o2, %s4 ret i32 %o3 } define i32 @loadCombine_4consecutive_1243(ptr %p) { %p1 = getelementptr i8, ptr %p, i32 1 %p2 = getelementptr i8, ptr %p, i32 2 %p3 = getelementptr i8, ptr %p, i32 3 %l1 = load i8, ptr %p %l2 = load i8, ptr %p1 %l3 = load i8, ptr %p2 %l4 = load i8, ptr %p3 %e1 = zext i8 %l1 to i32 %e2 = zext i8 %l2 to i32 %e3 = zext i8 %l3 to i32 %e4 = zext i8 %l4 to i32 %s2 = shl i32 %e2, 8 %s3 = shl i32 %e3, 16 %s4 = shl i32 %e4, 24 %o1 = or i32 %e1, %s2 %o2 = or i32 %o1, %s4 %o3 = or i32 %o2, %s3 ret i32 %o3 } define i32 @loadCombine_4consecutive_1324(ptr %p) { %p1 = getelementptr i8, ptr %p, i32 1 %p2 = getelementptr i8, ptr %p, i32 2 %p3 = getelementptr i8, ptr %p, i32 3 %l1 = load i8, ptr %p %l2 = load i8, ptr %p1 %l3 = load i8, ptr %p2 %l4 = load i8, ptr %p3 %e1 = zext i8 %l1 to i32 %e2 = zext i8 %l2 to i32 %e3 = zext i8 %l3 to i32 %e4 = zext i8 %l4 to i32 %s2 = shl i32 %e2, 8 %s3 = shl i32 %e3, 16 %s4 = shl i32 %e4, 24 %o1 = or i32 %e1, %s3 %o2 = or i32 %o1, %s2 %o3 = or i32 %o2, %s4 ret i32 %o3 } define i32 @loadCombine_4consecutive_1342(ptr %p) { ... Comment Actions Right but won't these cover the most commonly seen patterns in the real application scenario, if not all considering it belongs to a pattern which generates a wider load. Comment Actions In dealing with the related case of trying to match bswap patterns, we've found that eventually we will see every possible pattern in source code somewhere.
I expect that we'll end up using cooperation between "reassociate", "instcombine", and "aggressive-instcombine" to get the optimal loads. That's why I suggest PhaseOrdering tests - it's not clear which pass will be responsible for canonicalizing everything, but we do want to make sure that a complete trip through the optimizer will catch all patterns. There's also an open question of when and how often to invoke aggressive-instcombine (for example, it's only enabled at -O3 currently). Comment Actions The tests for completeness look as expected, but we want those to live in tests/Transforms/PhaseOrdering and use "RUN: opt -O3 -S < %s" or similar. That way, we can verify that the various passes that are expected to alter this IR are working cooperatively (and some seemingly unrelated change in another pass won't break the optimization that we want to enable with this patch). Comment Actions Move the loadCombine.ll which has combinations of four i8-loads spliced into a 32-bit value test to Transforms/PhaseOrdering
Comment Actions LGTM.
Comment Actions Thanks. Comment Actions Thanks for checking. So there are 3 possible follow-ups to make the transform robust:
Comment Actions Heads up! We are having some tests break due to this patch. We are working on a reproducer on our end and will post as soon as we get it. Comment Actions Here is the reproducer - https://godbolt.org/z/7bcjzfYK4 The following program returns 33 on trunk and 0 on clang 14: #include <utility> template <typename ValueType> class Cell { public: Cell() = default; void Set(ValueType value) { value_ = value; has_data_ = true; } std::pair<ValueType, bool> GetAndMarkRead() { bool fresh = has_data_; has_data_ = false; return std::make_pair(value_, fresh); } bool GetAndMarkRead2() { bool fresh = has_data_; has_data_ = false; return fresh; } ValueType value_; bool has_data_ = false; }; bool foo() { Cell<bool> cell; cell.Set(true); return cell.GetAndMarkRead().second; } int main() { if (foo()) return 0; return 33; } GetAndMarkRead returns false instead of 'true`. Returning std::pair matters. Comment Actions Thanks for the test case. The issue likely is inserting the new merged load at a right insert point. I have done a patch to handle this. Can you please test the patch and confirm if this works correctly. Comment Actions Hello, This is blocking our releases in Google. I see the fix patch didn't have an update since Tuesday. Can you please give it higher priority ? Comment Actions Hello, as mentioned above there is a fix in https://reviews.llvm.org/D137201. We found another case that was incorrect though, so were looking into fixing that too. If it does take too long, perhaps a revert is in order in the meantime. Did you get a chance to verify the fix worked for your case? Thanks. Comment Actions I think as a quick fix we can switch off load merge when we encounter stores and eventually come with a proper fix with additional test cases. I will do the patch on the same so that this does not block us. Comment Actions
We unblocked ourselves for now. Will test the fix once it lands. Comment Actions
OK sounds good. @bipmis was on training this week, but did manage to put https://reviews.llvm.org/D137333 together which as far as we understand should fix all the known issues. Hopefully that will mean no more issues your end and we can work on a more substantial fix as we have more time. Please let us know if more issues do come up. Thanks. |
typo: Offest -> Offset