This is an archive of the discontinued LLVM Phabricator instance.

[LoongArch] Add codegen support for the bitwise binary operations and part of other operations
ClosedPublic

Authored by SixWeining on Jun 7 2022, 4:39 AM.

Details

Summary

Reference:
https://llvm.org/docs/LangRef.html#bitwise-binary-operations
https://llvm.org/docs/LangRef.html#other-operations

The reason why other operations are implemented here is that some
bitwise binary operations depend on them. For example, on loongarch32,
shl over i64 data requires select.

Depends on D127200

Diff Detail

Event Timeline

SixWeining created this revision.Jun 7 2022, 4:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 7 2022, 4:39 AM
SixWeining requested review of this revision.Jun 7 2022, 4:39 AM
SixWeining added inline comments.Jun 7 2022, 11:47 PM
llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
578

Typo. Should be sra.

579

Ditto. Should be srl.

SixWeining updated this revision to Diff 435069.Jun 8 2022, 1:12 AM

Add the missing update to shift-masked-shamt.ll

xen0n added a comment.Jun 8 2022, 11:31 PM

This is big, and I hope to finish reviewing within this week. Just commenting to let you know ;-)

This is big, and I hope to finish reviewing within this week. Just commenting to let you know ;-)

Thanks for your reviewing. Yes. This change is a bit big due to some dependencies.

SixWeining updated this revision to Diff 435763.Jun 9 2022, 7:00 PM

Remove the patch number in the title.

SixWeining retitled this revision from [LoongArch 3/n] Add codegen support for the bitwise binary operations and part of other operations to [LoongArch] Add codegen support for the bitwise binary operations and part of other operations.Jun 9 2022, 7:00 PM

adjust the dependency

A polite ping. Hi @xen0n, do you have any comments? Thanks.

The implementation and test cases LGTM in terms of LoongArch semantics, only some minor nits.

Other people may have to check for general LLVM coding practices and style though.

llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td
153

nit: one space before TODO

llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td
168

nit: ditto

llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
226

nit: Chinglish; maybe just "Don't know how to legalize this operation" would suffice.

llvm/test/CodeGen/LoongArch/ir-instruction/and.ll
83

I suppose you mean removing the redundant test cases with known out-of-bounds operands here; do you plan to just remove them before landing?

llvm/test/CodeGen/LoongArch/ir-instruction/select-bare-flt.ll
11–12

movgr2cf only looks at the LSB, so the andi is unnecessary and we could save a cycle here. However this may well be because of some codegen details (I think it's due to codegen wanting to first extend i1 to ensure the upper bits have well-defined content), and this can be improved later. This patch is too large after all...

llvm/test/CodeGen/LoongArch/ir-instruction/select-bare-int.ll
10

Similarly here; this andi is unnecessary too in this case. ($a0 is the i1 %a, so I think it should be UB if upper bits contain interesting content; I'm not quite sure about this though. If it's indeed UB then the suggestion is appropriate.)

SixWeining marked an inline comment as done.Jun 15 2022, 11:42 PM
SixWeining added inline comments.
llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td
153

OK. Thanks.

llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
226

OK, thanks.

llvm/test/CodeGen/LoongArch/ir-instruction/and.ll
83

Yes. Let me remove some redundant tests. Thanks.

llvm/test/CodeGen/LoongArch/ir-instruction/select-bare-flt.ll
11–12

Good idea! Let me record and improve it in future patch.

Sorry. Yes, this patch is big. How about let me try to split it into 3~5 small ones firstly?

llvm/test/CodeGen/LoongArch/ir-instruction/select-bare-int.ll
10

I think the andi 1 is necessary here because maskeqz and masknez will look at if $a0 is 0 or not. So we must make sure the upper bits are all zeros.

However, if we indicate the codegen that the parameter i1 %a is [[ https://llvm.org/docs/LangRef.html#parameter-attributes | zero-extended ]]before passed in, the andi 1 would be omitted.

xen0n added inline comments.Jun 16 2022, 12:44 AM
llvm/test/CodeGen/LoongArch/ir-instruction/and.ll
83

Thanks. This should make the diff much shorter ;-)

llvm/test/CodeGen/LoongArch/ir-instruction/select-bare-flt.ll
11–12

Regarding splitting commits, I think either is fine, as the majority of changes here are actually test cases. You can of course do so if not in a hurry. ;-)

llvm/test/CodeGen/LoongArch/ir-instruction/select-bare-int.ll
10

Hmm, my reasoning is that if the incoming register $a0 actually has non-zero bits set in the upper bits, the function signature is i1 %a for this parameter, so it would be UB if that's the case; hence doing anything is permissible, and we choose to ignore the bits (codegen-ing as if the non-0-or-1 inputs weren't possible) and just straight maskeqz/masknez/or away.

If this sounds okay, then we won't need the zero-extended marking after all.

Address @xen0n's comments that removing redundant tests. Thanks.

SixWeining added inline comments.Jun 16 2022, 2:17 AM
llvm/test/CodeGen/LoongArch/ir-instruction/xor.ll
82

Sorry, I forget to handle this. Wait me.

Remove redundant tests in xor.ll and add missing test or.ll.

SixWeining added inline comments.Jun 16 2022, 11:17 PM
llvm/test/CodeGen/LoongArch/ir-instruction/fcmp-dbl.ll
27

Sorry, this instruction is not necessary as the ISA says:

MOVCF2GR writes the value of the condition flag register cj into the lowest bit of the general register rd and clears the other bits.

Let me remove it.

remove unnecessary andi 1 after movcf2gr

xen0n added inline comments.Jun 16 2022, 11:33 PM
llvm/test/CodeGen/LoongArch/ir-instruction/fcmp-dbl.ll
27

Ah, I checked the original text, and unfortunately there seems to be a translation error:

MOVCF2GR 将条件标志寄存器 cj 的值写入通用寄存器 rd 的最低一比特。

GR[rd][0] = CFR[cj]

which means this instead, for anyone knowing Chinese:

MOVCF2GR writes the value of conditional flag register cj to the lowest bit of GPR rd.

this also agrees with the pseudo-code.

So, if we don't want UB in case the destination of movcf2gr cannot be statically proven to have a sane value, the andi is still needed... And the translation error should be fixed.

Thinking about the discrepancy harder, you may want to do an experiment (movcf2gr into a register pre-filled with some non-trivial content) to see which version is correct. I don't know if the translators actually *fixed* the original manual during translation (because they have no permission to alter the original document AFAIK).

Thinking about the discrepancy harder, you may want to do an experiment (movcf2gr into a register pre-filled with some non-trivial content) to see which version is correct. I don't know if the translators actually *fixed* the original manual during translation (because they have no permission to alter the original document AFAIK).

I alreay did such experiment before removing the andi. The actual result is of movcf2gr is GR[rd][0] = CFR[cj] && GR[rd][GRLen-1 .. 1] = 0 and I have confirmed with the author of the original Chinese document that it is realy an ambiguity. He originally wanted to express GR[rd][0] = CFR[cj],GR[rd][GRLEN-1:1]=0. This will be fixed in future document updates.

Okay. I've checked the QEMU sources for MOVCF2GR, it seems the English translation is the accurate one, and the Chinese original is ambiguous... Actually, according to common sense, one really doesn't want to depend on, and write to the same GPR in one single insn either. So the removal of andi is correct optimization.

Thinking about the discrepancy harder, you may want to do an experiment (movcf2gr into a register pre-filled with some non-trivial content) to see which version is correct. I don't know if the translators actually *fixed* the original manual during translation (because they have no permission to alter the original document AFAIK).

I alreay did such experiment before removing the andi. The actual result is of movcf2gr is GR[rd][0] = CFR[cj] && GR[rd][GRLen-1 .. 1] = 0 and I have confirmed with the author of the original Chinese document that it is realy an ambiguity. He originally wanted to express GR[rd][0] = CFR[cj],GR[rd][GRLEN-1:1]=0. This will be fixed in future document updates.

Yeah exactly. We're all fine then ;-)

Does anyone have more comments on this patch? We still have a lot of patches waiting to be submitted after this one getting approved and landed. Thanks. ;-)

xen0n accepted this revision.Jun 17 2022, 2:31 AM

I've previously went through the testcases a few days ago, and it seems no new problems cropped up. So LGTM LoongArch-wise; someone else should take a look to check the LLVM coding practices.

This revision is now accepted and ready to land.Jun 17 2022, 2:31 AM
This revision was landed with ongoing or failed builds.Jun 19 2022, 6:57 PM
This revision was automatically updated to reflect the committed changes.