This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Keep skip branch for ds instructions
ClosedPublic

Authored by sebastian-ne on Mar 4 2021, 1:45 AM.

Details

Summary

Like other memory instructions, ds instructions add latency even if
exec is zero. Jumping over them if exec=0 is cheaper than executing
them.
With this change, the branch instruction that skips over a basic block
if exec=0 is no longer removed when the block contains a ds instruction.
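
The rule described in the summary can be sketched roughly as follows. This is an illustrative stand-alone model, not the code from the patch: the `InstrKind` enum, the function names, and the exact threshold comparison are all assumptions made for the example. Only the idea (memory-like instructions, now including ds, force the skip branch to stay) and the limit of 12 instructions mentioned in the discussion come from this review.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instruction classes; illustrative stand-ins, not LLVM's
// MachineInstr or SIInstrInfo API.
enum class InstrKind { ScalarALU, VectorALU, ScalarMem, VectorMem, Flat, DS };

// Assumed limit on how many cheap instructions may run with exec=0 before
// the skip branch is kept anyway (the review discussion mentions 12).
constexpr unsigned kSkipThreshold = 12;

// Returns true for instruction kinds that add latency even when exec is zero.
bool isExpensiveWithZeroExec(InstrKind Kind) {
  switch (Kind) {
  case InstrKind::ScalarMem:
  case InstrKind::VectorMem:
  case InstrKind::Flat:
  case InstrKind::DS: // the case this revision is about
    return true;
  default:
    return false;
  }
}

// Decides whether the branch that skips the block on exec=0 must be kept.
bool mustKeepSkipBranch(const std::vector<InstrKind> &Block) {
  // Too many instructions: skipping is worthwhile regardless of their kind.
  if (Block.size() > kSkipThreshold)
    return true;
  // Any instruction that is costly with exec=0 also keeps the branch.
  for (InstrKind Kind : Block)
    if (isExpensiveWithZeroExec(Kind))
      return true;
  return false;
}

// Usage example: a short block with one ds instruction keeps its branch,
// while a short block of plain ALU instructions does not.
void exampleUsage() {
  assert(mustKeepSkipBranch({InstrKind::VectorALU, InstrKind::DS}));
  assert(!mustKeepSkipBranch({InstrKind::VectorALU, InstrKind::ScalarALU}));
}
```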


Event Timeline

sebastian-ne created this revision. Mar 4 2021, 1:45 AM
sebastian-ne requested review of this revision. Mar 4 2021, 1:45 AM
Herald added a project: Restricted Project. Mar 4 2021, 1:45 AM
arsenm added a comment. Mar 4 2021, 5:59 AM

If I remember my cycle counts correctly, with an untaken branch this will be about breakeven for a single DS op?

If I remember my cycle counts correctly, with an untaken branch this will be about breakeven for a single DS op?

I believe this is about right.
However, since we allow 12 instructions and rarely encounter a DS op in isolation, I think the branch is generally cheaper (assuming a cache hit).

Yes, that’s also what Carl concluded. Taking an s_cbranch_execz should cost about the same number of cycles as a ds_ instruction.
If we have more than one ds_ instruction, taking the branch should be cheaper.
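
The break-even argument in this thread can be written down as a tiny cost comparison. This is only a sketch: the `CycleCosts` struct and `branchIsCheaper` function are hypothetical, and no real cycle counts are given in this review, so the costs are left as parameters.

```cpp
#include <cassert>

// Hypothetical cycle costs for the exec=0 case; the review gives no concrete
// numbers, only that the two costs are roughly equal.
struct CycleCosts {
  unsigned TakenBranch;    // cycles for a taken s_cbranch_execz
  unsigned DSWithZeroExec; // cycles for one ds_ instruction run with exec=0
};

// Returns true when jumping over the block is strictly cheaper than
// executing NumDSOps ds_ instructions with exec=0.
bool branchIsCheaper(const CycleCosts &Costs, unsigned NumDSOps) {
  return Costs.TakenBranch < NumDSOps * Costs.DSWithZeroExec;
}

// Usage example with placeholder costs that are equal, matching the
// "about the same" estimate above: one ds_ op breaks even, two favor the branch.
void exampleBreakEven() {
  const CycleCosts Equal{/*TakenBranch=*/1, /*DSWithZeroExec=*/1};
  assert(!branchIsCheaper(Equal, 1));
  assert(branchIsCheaper(Equal, 2));
}
```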

rampitec accepted this revision. Mar 4 2021, 11:20 AM
This revision is now accepted and ready to land. Mar 4 2021, 11:20 AM
This revision was automatically updated to reflect the committed changes.