As detailed in PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs, with the same latency. It therefore doesn't make sense to replace a shl+lshr with a shift+and pair on these targets, as the AND requires an additional mask constant (with the extra constant pool, load and register pressure costs).
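For readers less familiar with the fold in question, the equivalence can be sketched in plain C++ as below. This is only an illustration, not code from the patch: the `V4` type and both function names are hypothetical, and a 4 x i32 vector is modelled as a `std::array` rather than a real XMM register. Shift amounts are assumed to be in the range 0 to 31.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 4 x i32 vector register.
using V4 = std::array<std::uint32_t, 4>;

// Form 1: shl followed by lshr. Both shift amounts are immediates,
// so no constant has to be loaded.
V4 shl_then_lshr(V4 v, unsigned c1, unsigned c2) {
    for (auto &lane : v)
        lane = (lane << c1) >> c2;
    return v;
}

// Form 2: a single shift by the difference, followed by an AND.
// The mask is an extra vector constant (constant pool entry, load,
// and one more live register).
V4 shift_then_and(V4 v, unsigned c1, unsigned c2) {
    const std::uint32_t mask = 0xFFFFFFFFu >> c2;
    for (auto &lane : v) {
        std::uint32_t shifted =
            (c1 >= c2) ? (lane << (c1 - c2)) : (lane >> (c2 - c1));
        lane = shifted & mask;
    }
    return v;
}
```

Both forms use two operations per vector and produce the same result; the second form additionally needs the mask, which is the cost the summary refers to on targets where shifts and ANDs are equally cheap.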
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Target/X86/X86.td | ||
---|---|---|
427–430 ↗ | (On Diff #196446) | Is there a possibility that we would use this for scalar transforms too? If not, better to make this explicitly about vectors: |
lib/Target/X86/X86.td | ||
---|---|---|
427–430 ↗ | (On Diff #196446) | Yes, AMD targets at least should benefit from the scalar case as well; I'll investigate. |