If the upper bits of the SHL result aren't used, we might be able to use a narrower shift. For example, on X86 this can turn a 64-bit shift into a 32-bit shift, enabling a smaller encoding.
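As a hedged illustration (not taken from the patch's tests; names are made up), this is the kind of pattern the combine targets: only the low 32 bits of the 64-bit shift survive the mask, so the shift only needs 32-bit arithmetic, which on x86 avoids the REX.W prefix:

define i64 @narrow_shl(i64 %x) {
  %shl = shl i64 %x, 3
  %low = and i64 %shl, 4294967295   ; upper 32 bits of the shift are unused
  ret i64 %low
}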
Event Timeline
llvm/test/CodeGen/X86/zext-logicop-shift-load.ll, line 25:
Are you looking at this as a followup?
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp, lines 11041–11045:
It would be better to split this change off on its own while adding a test specifically for this pattern. IIUC, we can modify the existing test slightly and show the missing fold:

declare void @t()

define void @tbz_zext(i32 %in) {
  %shl = shl i32 %in, 3
  %t = zext i32 %shl to i64
  %and = and i64 %t, 32
  %cond = icmp eq i64 %and, 0
  br i1 %cond, label %then, label %end
then:
  call void @t()
  br label %end
end:
  ret void
}
Remove the AArch64 code change. Show the regression instead. I'll work on the separate patch and rebase accordingly depending on what order they get committed.
I think D60482 should go in 1st, so we avoid that known regression. There's still an open question about the x86 LEA matching. I've seen that or similar matching failures in other tests, so it would be nice to catch it first too.
I wonder if losing the wrapping flags is hurting, although we should be able to use known bits to restore that knowledge. Something like this?
define i64 @lea(i64 %t0, i32 %t1) {
  %t4 = add nuw nsw i32 %t1, 8
  %sh = shl nsw i32 %t4, 2
  %t5 = zext i32 %sh to i64
  %t6 = add i64 %t5, %t0
  ret i64 %t6
}
Produces:
leal 32(,%rsi,4), %eax
addq %rdi, %rax
Instead of:
leaq 32(%rdi,%rsi,4), %rax
For the LEA regression here, we need to teach foldMaskedShiftToScaledMask to look through the any_extend to find the shift and reinsert the any_extend in the new ordering.
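As a hedged illustration (not from the existing test file; function and value names are made up), the shape foldMaskedShiftToScaledMask has to handle after the narrowing roughly corresponds to IR where the extend sits between the mask and the shift, so the existing (and (shl X, C1), C2) -> (shl (and X, C2 >> C1), C1) reassociation no longer sees the shift directly:

define i64 @masked_scaled_index(i64 %base, i32 %i) {
  %shl = shl i32 %i, 2
  %ext = zext i32 %shl to i64           ; extend now separates the and from the shl
  %masked = and i64 %ext, 1020
  %addr = add i64 %base, %masked        ; shl 2 should become the *4 LEA scale
  ret i64 %addr
}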