This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::EXTRACT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.
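As a rough illustration of the idea (a toy model only, not the actual SelectionDAG code), the sketch below shows how an extract of a subvector can look through a multi-use insert whose written elements are not demanded. `Node`, `makeLeaf`, `makeInsert` and `peekThroughForExtract` are hypothetical names invented for this example and do not correspond to LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Toy stand-in for a DAG node. This is NOT LLVM's SDNode/SDValue.
struct Node {
  enum Kind { Leaf, InsertSubvector } kind;
  std::size_t numElts;         // number of vector elements produced
  std::shared_ptr<Node> base;  // InsertSubvector: vector being inserted into
  std::shared_ptr<Node> sub;   // InsertSubvector: subvector being inserted
  std::size_t insertIdx;       // InsertSubvector: first element overwritten
};
using NodePtr = std::shared_ptr<Node>;

NodePtr makeLeaf(std::size_t numElts) {
  auto n = std::make_shared<Node>();
  n->kind = Node::Leaf;
  n->numElts = numElts;
  return n;
}

NodePtr makeInsert(NodePtr base, NodePtr sub, std::size_t idx) {
  assert(idx + sub->numElts <= base->numElts && "insert out of range");
  auto n = std::make_shared<Node>();
  n->kind = Node::InsertSubvector;
  n->numElts = base->numElts;
  n->base = base;
  n->sub = sub;
  n->insertIdx = idx;
  return n;
}

// Find a simpler source for extracting elements [idx, idx + n) of src.
// Inserts whose written range does not overlap the demanded range are
// skipped. src itself is never modified, so this stays valid even when
// src has other users.
NodePtr peekThroughForExtract(NodePtr src, std::size_t idx, std::size_t n) {
  assert(idx + n <= src->numElts && "extract out of range");
  while (src->kind == Node::InsertSubvector) {
    std::size_t lo = src->insertIdx;
    std::size_t hi = lo + src->sub->numElts;
    bool overlaps = idx < hi && lo < idx + n;
    if (overlaps)
      break;          // some demanded element comes from the inserted part
    src = src->base;  // demanded elements are untouched by this insert
  }
  return src;
}
```

In this model, extracting the low half of an 8-element insert that only overwrites the high half resolves directly to the original base vector, while an extract that touches the inserted range keeps the insert as its source.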
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/X86/pr31956.ll | ||
---|---|---|
13–14 | Looked at this a bit, and I think this is ok. We are intentionally being aggressive about duplicating multi-use loads because eliminating the dependency and reducing register pressure (assuming load-folding) is probably better for perf if this code is in a loop. In this particular case, there seems to be an opportunity to commute the shufps masks in lowerShuffleWithSHUFPS() in the case where we create 2 shufps ops. I'm guessing that's a very rare occurrence, so not sure if it's worth a TODO comment/bug report. |
llvm/test/CodeGen/X86/pr31956.ll | ||
---|---|---|
13–14 | That shouldn't be a problem, I'll deal with that first. |