Add DemandedElts support inside the TRUNCATE analysis.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
| test/CodeGen/ARM/lowerMUL-newload.ll | ||
|---|---|---|
| 28 ↗ | (On Diff #180459) | This just looks like we're missing something for the ARMISD::VMULL lowering | 
| test/CodeGen/X86/avx512-any_extend_load.ll | ||
| 53 ↗ | (On Diff #180459) | Simplifying to ANY_EXTEND prevents PACKSS/PACKUS from working | 
| test/CodeGen/X86/combine-sra.ll | ||
| 252 ↗ | (On Diff #180459) | We'd been relying on the v4i64 ashr expansion | 
| test/CodeGen/X86/vector-blend.ll | ||
| 956 ↗ | (On Diff #180459) | Haven't worked out the problem here yet | 
| test/CodeGen/X86/vector-trunc-widen.ll | ||
| 77 ↗ | (On Diff #180459) | We'd been relying on the v8i64 ashr expansion | 
| test/CodeGen/X86/vector-blend.ll | ||
|---|---|---|
| 956 ↗ | (On Diff #180459) | I think we need to call SimplifyDemandedBits on Conditions of SHRUNKBLEND. We only do it when we convert from VSELECT to SHRUNKBLEND. | 
| test/CodeGen/ARM/lowerMUL-newload.ll | ||
|---|---|---|
| 28 ↗ | (On Diff #180459) | Using "CHECK-NEXT" and matching with the exact register names will make this test cast very sensitive to scheduling and register allocation changes. | 
| test/CodeGen/ARM/lowerMUL-newload.ll | ||
|---|---|---|
| 28 ↗ | (On Diff #180459) | But it stops people missing/hiding codegen changes that need to be kept an eye on, including register allocation changes. This argument has been going on for years now, and we've tended to see that the benefits of update_llc_test_checks.py outweighs any difficulties. More importantly, do you have any insights as to how to improve ARMISD::VMULL lowering? | 
Add support for ANY_EXTEND ops to ARM's LowerMUL.
This fixes the main MULL regression but I'm not sure how to fix the ADDW regression which seems to be a purely isel pattern - @t.p.northover @efriedma @huihuiz any thoughts?
The pattern in question comes out of https://github.com/llvm/llvm-project/blob/0fa0cf8638b0777a1a44feebf78a63865e48ecf6/llvm/lib/Target/ARM/ARMInstrNEON.td#L3100 , and it traces out to https://github.com/llvm/llvm-project/blob/0fa0cf8638b0777a1a44feebf78a63865e48ecf6/llvm/lib/Target/ARM/ARMInstrNEON.td#L4216 .
Probably we want to do what the Hexagon backend does: def asext: PatFrags<(ops node:$Rs), [(sext node:$Rs), (anyext node:$Rs)]>;.
| llvm/lib/Target/ARM/ARMISelLowering.cpp | ||
|---|---|---|
| 8693–8695 ↗ | (On Diff #291969) | return getOpcode() == ANY_EXTEND but I'm guessing this is just a placeholder function anyway | 
| llvm/test/CodeGen/AArch64/lowerMUL-newload.ll | ||
|---|---|---|
| 362 | ||
@dmgreen I've added ANY_EXTEND matching in isSignExtended to give you an indication of effect on codegen
I only looked at the ARM equivalent. From what I remember, the sequence of events was something like:
- One of the two operands to the mul was converted from a sext to an anyext. The other was not due to having multiple uses.
- That anyext was folded into a load to produce a zextload (we don't produce a vector anyext load)
- We couldn't match anything due one operand being a sext and the other being a zextload.
So in that case we would either need to use demanded bits know the top bits are not needed when converting it to a mull, create an anyextload instead of a zextload or handle multiple uses so both inputs turn into anyext or zextloads.
I'm happy for the isSignExtended change, as far as I understand that should be fine. The ARM side may be harder to fix, and as the test seems to only added for correctness - it doesn't seem like something that should hold up this patch. We should have fixed the majority of cases and if more come up we can tackle them as needed. I would be happy with this patch so long as the X86 changes are OK.
| llvm/test/CodeGen/X86/combine-sra.ll | ||
|---|---|---|
| 248–249 | Looking at this now - the set of combines that was necessary to get to the old codegen is pretty impressive..... | |
Seems fine to me, thanks.
| llvm/test/CodeGen/X86/vector-trunc.ll | ||
|---|---|---|
| 388–399 | Looks like for pre-SSE41 we still fail to detect high bits as zeros? | |
| llvm/test/CodeGen/X86/vector-trunc.ll | ||
|---|---|---|
| 388–399 | pre-SSE41 we don't have packusdw (I've no idea why this wasn't included in SSE2 with the rest of them...) so we have a fallback to continue to use packssdw | |
This caused asserts in Chromium. See https://bugs.chromium.org/p/chromium/issues/detail?id=1168629#c2 for a reproducer.
I've reverted in a51226057fc30510ac86b32a36a9769ddbf4c318 in the meantime.
@dmgreen What do you think is the best way to extend D93833 to handle multiply-add/sub as well? Handle in DAG or refactor the isel patterns to accept sanyext/zanyext (I didn't get very far with my initial attempt with this approach as a lot of the patfrags were hardcodded)?