Like other memory instructions, ds instructions add latency even when
exec is zero, so jumping over them when exec=0 is cheaper than executing
them.
With this change, the branch instruction that skips over a basic block
if exec=0 is not removed when the block contains a ds instruction.
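
The rule described above can be sketched as a small standalone model. This is only an illustration, not the code from the patch: the Kind enum, the keepExeczSkipBranch name, and the exact comparison against the limit are assumptions. The limit of 12 comes from the review discussion; treating vector memory ops the same way follows the "same as other memory instructions" remark.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction classes; not LLVM's MachineInstr.
enum class Kind { Valu, Salu, VMem, DS };

// Assumed limit: blocks of up to this many instructions may lose their
// skip branch (the review mentions "we allow 12 instructions").
constexpr std::size_t SkipThreshold = 12;

// Returns true if the branch that jumps over `block` when exec=0 should
// be kept, and false if it may be removed.
bool keepExeczSkipBranch(const std::vector<Kind> &block) {
  // Long blocks are always worth skipping.
  if (block.size() > SkipThreshold)
    return true;
  // Memory instructions, including ds instructions, add latency even
  // with exec=0, so keep the branch if the block contains one.
  for (Kind k : block)
    if (k == Kind::VMem || k == Kind::DS)
      return true;
  return false;
}
```

Under these assumptions, a short block of plain ALU instructions loses its skip branch, while the same block with a single ds instruction keeps it.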
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Comment Actions
If I remember my cycle counts correctly, with an untaken branch this will be about breakeven for a single DS op?
Comment Actions
I believe this is about right.
However, since we allow 12 instructions and rarely encounter a DS op in isolation, I think the branch is generally cheaper (assuming a cache hit).
Comment Actions
Yes, that’s also what Carl concluded. Taking an s_cbranch_execz should cost about the same number of cycles as a ds_ instruction.
If we have more than a single ds_ instruction, taking the branch should be cheaper.