This is an archive of the discontinued LLVM Phabricator instance.

[LoongArch] Optimize bitwise and with immediates
ClosedPublic

Authored by benshi001 on Mar 31 2023, 9:33 PM.

Details

Summary

Optimize bitfield extractions retaining bit positions
from lu12i + addi + and to bstrpick + slli.

Diff Detail

Event Timeline

benshi001 created this revision.Mar 31 2023, 9:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 31 2023, 9:33 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
benshi001 requested review of this revision.Mar 31 2023, 9:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 31 2023, 9:33 PM
benshi001 added inline comments.Mar 31 2023, 9:38 PM
llvm/test/CodeGen/LoongArch/ir-instruction/and.ll
287–288

This is not optimized, since 0xff000 can be composed by a single lu12i.w.

325–326

This is not optimized, since 0xff000 can be composed by a single lu12i.w.

328–329

This is not optimized, since the constant is used twice.

384

This is not optimized, since the value -31 can be composed by a single ADDI.W.

benshi001 updated this revision to Diff 510175.Mar 31 2023, 9:52 PM
benshi001 added inline comments.Mar 31 2023, 9:55 PM
llvm/test/CodeGen/LoongArch/ir-instruction/and.ll
384

This is not optimized, since the value -2048 can be composed by a single ADDI.W.

xen0n added inline comments.Mar 31 2023, 11:56 PM
llvm/test/CodeGen/LoongArch/ir-instruction/and.ll
319–321

Seems wrong? The intended operation is to retain only a[15:4] so we should bstrpick.d $a0, $a0, 15, 4 to retain bits, then slli.d $a0, $a0, 4 to restore the LSB position. (LoongArch bstrpick invariably stores to rd's LSB side, and will not retain the original relative position.)

benshi001 updated this revision to Diff 510188.Apr 1 2023, 12:18 AM
benshi001 marked an inline comment as done.
benshi001 added inline comments.
llvm/test/CodeGen/LoongArch/ir-instruction/and.ll
319–321

Thanks. I should not make such as mistake :(

benshi001 marked an inline comment as done.Apr 1 2023, 12:18 AM
benshi001 updated this revision to Diff 510190.Apr 1 2023, 12:22 AM
xen0n accepted this revision.Apr 1 2023, 12:23 AM

Good catch, thanks!

IMO you could include as comments your explanations to existing cases where this optimization has not taken place (e.g. "This is not optimized into bstrpick + slli because the constant has multiple uses."), so future readers wouldn't have to do archaeology to see them. The code changes LGTM.

This revision is now accepted and ready to land.Apr 1 2023, 12:23 AM
xen0n added a comment.Apr 1 2023, 12:33 AM

I've just found there's no commit message. In general it could be helpful to include one or two short sentences describing the actual change, e.g. "Optimize bitfield extractions retaining bit positions with bstrpick + slli", because "optimize foo" otherwise doesn't carry useful information about the actual improvements.

llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
1386

nit: "constant"

Good catch, thanks!

IMO you could include as comments your explanations to existing cases where this optimization has not taken place (e.g. "This is not optimized into bstrpick + slli because the constant has multiple uses."), so future readers wouldn't have to do archaeology to see them. The code changes LGTM.

Thanks. I have updated https://reviews.llvm.org/D147367 with your suggestion !

benshi001 edited the summary of this revision. (Show Details)Apr 1 2023, 12:46 AM
benshi001 marked an inline comment as done.
benshi001 added inline comments.
llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
1386

I have fixed this typo in my local repo, and will be correct when committing. Thanks!

benshi001 marked an inline comment as done.Apr 1 2023, 12:47 AM
SixWeining added inline comments.Apr 2 2023, 6:03 AM
llvm/test/CodeGen/LoongArch/ir-instruction/and.ll
339–344

It's also a win if optimized to bstrpick + slli since we can save a register $a2. Right?

; LA64-NEXT:    bstrpick.d $a0, $a0, 15, 4
; LA64-NEXT:    slli.d $a0, $a0, 4
; LA64-NEXT:    bstrpick.d $a1, $a1, 15, 4
; LA64-NEXT:    slli.d $a1, $a1, 4
; LA64-NEXT:    mul.d $a0, $a0, $a1

But if the immediate 0xfff0 is used 3 times, we have 2 choices:

  1. Use less registers but one more instruction.
  2. Use less instructions but one more register.

I'm not sure how to balance this.

benshi001 added inline comments.Apr 2 2023, 7:19 AM
llvm/test/CodeGen/LoongArch/ir-instruction/and.ll
339–344

I am also not sure about the case of more then 2 uses. Maybe we make a convervative choice, just make it unchanged? And we only handle 1 and 2 uses ?

SixWeining added inline comments.Apr 2 2023, 6:51 PM
llvm/test/CodeGen/LoongArch/ir-instruction/and.ll
339–344

I agree with you.

benshi001 updated this revision to Diff 510377.Apr 2 2023, 6:59 PM
benshi001 marked 2 inline comments as done.
benshi001 added inline comments.
llvm/test/CodeGen/LoongArch/ir-instruction/and.ll
339–344

I have updated my code

  1. 1 and 2 uses are handled, more uses are rejected;
  2. corresponding tests are added for the above logic.
benshi001 marked an inline comment as done.Apr 2 2023, 7:01 PM
benshi001 added inline comments.
llvm/test/CodeGen/LoongArch/ir-instruction/and.ll
392

This is the case the immediate has 3 uses, which is not optimized.

xen0n accepted this revision.Apr 2 2023, 7:09 PM

LGTM from a cursory look, thanks! Although putting your informative replies into the code as comments would be even better, as others would see the rationale along with the code and not have to do archaeology themselves.

llvm/test/CodeGen/LoongArch/ir-instruction/and.ll
339–344

Fine with me. If we were to aim for the maximum performance possible then that'll have to be backed with actual micro-architecture details, so the optimizer can do the right thing for the right models. Less instructions executed usually can't hurt after all. (In my Go benchmarking experiences, loop alignment could have far more profound influence on the numbers than micro-optimization like this, so it's probably fine to not care as much here.)

Changes to test like llvm/test/CodeGen/LoongArch/alloca.ll are recovered?

benshi001 updated this revision to Diff 510381.Apr 2 2023, 7:35 PM

Changes to test like llvm/test/CodeGen/LoongArch/alloca.ll are recovered?

I have supplemented llvm/test/CodeGen/LoongArch/alloca.ll and other affected tests.

I am working in multiple threads mode, sorry for so many typos and spots. :(

LGTM from a cursory look, thanks! Although putting your informative replies into the code as comments would be even better, as others would see the rationale along with the code and not have to do archaeology themselves.

Thanks for your suggestion. I have added some informative comments to both c++ code and the test code.

xen0n added a comment.Apr 2 2023, 7:54 PM

LGTM from a cursory look, thanks! Although putting your informative replies into the code as comments would be even better, as others would see the rationale along with the code and not have to do archaeology themselves.

Thanks for your suggestion. I have added some informative comments to both c++ code and the test code.

Ah. I actually meant putting the justification for the "> 2 uses" (being conservative when faced with register pressure vs. instruction count blah blah), and it's actually fine to not comment what you did when the code is self-documenting. In general just document why and not what you do for a piece of code that may warrant such explanation.

benshi001 updated this revision to Diff 510390.Apr 2 2023, 8:45 PM

LGTM from a cursory look, thanks! Although putting your informative replies into the code as comments would be even better, as others would see the rationale along with the code and not have to do archaeology themselves.

Thanks for your suggestion. I have added some informative comments to both c++ code and the test code.

Ah. I actually meant putting the justification for the "> 2 uses" (being conservative when faced with register pressure vs. instruction count blah blah), and it's actually fine to not comment what you did when the code is self-documenting. In general just document why and not what you do for a piece of code that may warrant such explanation.

I have commented in the c++ code as follow

// Omit if the constant has more than 2 uses. This a conservative
// decision. Whether it is a win depends on the HW microarchitecture.
// However it should always be better for 1 and 2 uses.
if (CN->use_size() > 2)
  return SDValue();
xen0n accepted this revision.Apr 2 2023, 8:53 PM

LGTM from a cursory look, thanks! Although putting your informative replies into the code as comments would be even better, as others would see the rationale along with the code and not have to do archaeology themselves.

Thanks for your suggestion. I have added some informative comments to both c++ code and the test code.

Ah. I actually meant putting the justification for the "> 2 uses" (being conservative when faced with register pressure vs. instruction count blah blah), and it's actually fine to not comment what you did when the code is self-documenting. In general just document why and not what you do for a piece of code that may warrant such explanation.

I have commented in the c++ code as follow

// Omit if the constant has more than 2 uses. This a conservative
// decision. Whether it is a win depends on the HW microarchitecture.
// However it should always be better for 1 and 2 uses.
if (CN->use_size() > 2)
  return SDValue();

Looks good, thanks!

SixWeining accepted this revision.Apr 2 2023, 9:04 PM
This revision was landed with ongoing or failed builds.Apr 2 2023, 9:11 PM
This revision was automatically updated to reflect the committed changes.