This is an archive of the discontinued LLVM Phabricator instance.

[RISCV][ISel] improved compressed instruction use
Needs Review · Public

Authored by dybv-sc on Aug 22 2022, 2:09 AM.

Details

Summary

Help emit a compressed BNEZ when comparing against a constant, where possible.

Addressing the following issue: https://github.com/llvm/llvm-project/issues/56393
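
As a rough illustration (a hypothetical example, not taken from the linked issue), the targeted pattern is a conditional branch guarded by a comparison with a small constant, where the constant would otherwise be materialized with li just to feed the branch:

void g(void);                 // hypothetical callee, only to keep the branch live

void f(int *p, int **q) {
    int x = *p;               // lw
    int *r = *q;              // ld (on RV64)
    *r = 3 * x;               // slliw + addw + sw
    if (x < 101)              // roughly: li + blt today; slti + bnez with the patch
        g();
}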

Diff Detail

Event Timeline

dybv-sc created this revision.Aug 22 2022, 2:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2022, 2:09 AM
dybv-sc requested review of this revision.Aug 22 2022, 2:09 AM
dybv-sc updated this revision to Diff 454424.Aug 22 2022, 2:14 AM

Fixed missing commit

dybv-sc edited the summary of this revision. (Show Details)Aug 22 2022, 2:29 AM
dybv-sc added reviewers: reames, asb.

As noted in the bug, this increases the critical path length in some cases. Have you benchmarked this?

For immediates that fit in c.li, the new sequence might be larger. c.li works for all registers; c.bnez only works for x8-x15 and short distances.
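
For illustration (assuming the standard encodings, where compressed instructions are 2 bytes and base instructions 4): c.li + blt is 2 + 4 = 6 bytes, while slti + c.bnez is 4 + 2 = 6 bytes at best, and 4 + 4 = 8 bytes whenever the condition register falls outside x8-x15 or the branch target is out of c.bnez's range.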

Sorry for the long silence.
I benchmarked SPEC with llvm-test-suite on an Alibaba THead machine and found a slight performance regression with this substitution. I've isolated one case:

lw      s0, 0(a0)
li      a2, 101
ld      a0, 0(a1)
slliw   a1, s0, 1
addw    a1, a1, s0
sw      a1, 0(a0)
blt     s0, a2, .LBB0_2

transforms to:

lw      s0, 0(a0)
ld      a0, 0(a1)
slti    a2, s0, 101
slliw   a1, s0, 1
addw    a1, a1, s0
sw      a1, 0(a0)
bnez    a2, .LBB0_2

When placed in a hot loop, the latter adds 7 more cycles per iteration. I found that it does not affect the branch predictor or cache, so there must be a pipeline stall happening here. I'll investigate this further.

So, after running more SPEC tests in different modes (train and ref) on different RISC-V boards (SiFive and THead), I got mixed performance results. The performance increase on a number of tests was insignificant, while on others there was a slight decrease; on average, performance declined by 0.5%. On the other hand, the size reduction can be seen uniformly across all tests: on average it is 20 fewer bytes, or a 0.04% size reduction. I don't think these amounts can justify the performance cost.
Considering that some of the performance regressions are platform-specific (like the one I mentioned in the previous comment) and depend on internal microarchitectural features, it does not seem possible to come up with a general solution here, and more specialized ones will require more time and effort. A possible 0.04% code size reduction is just not worth it.
What do you think?

Thanks for collecting the data. I agree, it sounds like it's not worth it.