This is an archive of the discontinued LLVM Phabricator instance.

[LoongArch] Add codegen support for division operations
ClosedPublic

Authored by SixWeining on Jun 24 2022, 8:37 PM.

Details

Summary

These operations include sdiv/udiv/srem/urem.

As the ISA [https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#_div_wudu_mod_wudu]
described, when the divisor is 0, the result can be any value, but no
exception will be triggered. Unlike gcc, which by default emit code
that checks divide-by-zero after the division or modulus instruction,
we only emit this check when the -loongarch-check-zero-division
option is passed.

Diff Detail

Event Timeline

SixWeining created this revision.Jun 24 2022, 8:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 24 2022, 8:37 PM
SixWeining requested review of this revision.Jun 24 2022, 8:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 24 2022, 8:37 PM
xry111 added inline comments.Jun 24 2022, 10:45 PM
llvm/test/CodeGen/LoongArch/ir-instruction/sdiv-udiv-srem-urem.ll
133

It looks suboptimal: "div.w $a0, $a0, $a1" should work so these two sign-extensions are not needed.

I'm not sure if it's easy to optimize this. If an optimization is not suitable for this revision, we can do it later.

Trapping division/modulus operations are signatures of MIPS codegen, and indeed here the trapping-by-default behavior and the flag seem to come from MIPS. However, as division-by-zero in LLVM IR is undefined behavior, why can't we just omit the trapping behavior altogether (and match RISCV in this regard), or at least disable the trapping by default?

SixWeining added inline comments.Jun 25 2022, 1:02 AM
llvm/test/CodeGen/LoongArch/ir-instruction/sdiv-udiv-srem-urem.ll
133

Yes. 32bit division can be optimized to div.w but we must make sure the inputs are sign extend values. This limitation is marked in Chinese ISA document but not in the English document. Maybe the English version is outdated.

define i32 @sdiv_i32(i32 %a, i32 %b) {
entry:
  %r = sdiv i32 %a, %b
  ret i32 %r
}
=>
; LA64-NOTRAP-NEXT:    addi.w $a1, $a1, 0
; LA64-NOTRAP-NEXT:    addi.w $a0, $a0, 0
; LA64-NOTRAP-NEXT:    div.w $a0, $a0, $a1
define i32 @sdiv_i32(i32 signext %a, i32 signext %b) {
entry:
  %r = sdiv i32 %a, %b
  ret i32 %r
}
=>
; LA64-NOTRAP-NEXT:    div.w $a0, $a0, $a1

Since this is an improvement to the codegen, let me implement it with seperate patch in future. Thanks.

xen0n added inline comments.Jun 25 2022, 1:20 AM
llvm/test/CodeGen/LoongArch/ir-instruction/sdiv-udiv-srem-urem.ll
133

Indeed; however the limitation seems hugely likely a hardware erratum, because almost any other 32-bit operation on LA64 silently ignores the upper bits. I think it's better to confirm with the hardware team, in case this is another error fixed in the translation.

Trapping division/modulus operations are signatures of MIPS codegen, and indeed here the trapping-by-default behavior and the flag seem to come from MIPS. However, as division-by-zero in LLVM IR is undefined behavior, why can't we just omit the trapping behavior altogether (and match RISCV in this regard), or at least disable the trapping by default?

Good question! This is what I have ever thought. In fact I don't know why mips needs to check zero-divide by default. riscv and aarch64 do not check and I'm not sure it is because their ISAs define zero-divide having an fixed result while Mips and LoongArch do not.
Currently the upstramed gcc checks zero-divide by default, so here I do as it to keep consistent.

xen0n added a comment.Jun 25 2022, 1:39 AM

I just experimented on 3A5000 and it seems the "undefined result" is all-zeros, and unfortunately the div.w/mod.w UB when input operand is non-canonical (i.e. non-sign-extended) 32-bit is indeed present; in this case the output is all-zeros too.

So the sign-extension is indeed necessary for inputs not statically known to be signext. The all-zeros in case of UB is less useful than RISCV's all-ones, in terms of expected semantics (we want ideally something near "infinity"), but it's UB after all, and 0 is equally okay here.

xen0n added a comment.Jun 25 2022, 1:41 AM

Trapping division/modulus operations are signatures of MIPS codegen, and indeed here the trapping-by-default behavior and the flag seem to come from MIPS. However, as division-by-zero in LLVM IR is undefined behavior, why can't we just omit the trapping behavior altogether (and match RISCV in this regard), or at least disable the trapping by default?

Good question! This is what I have ever thought. In fact I don't know why mips needs to check zero-divide by default. riscv and aarch64 do not check and I'm not sure it is because their ISAs define zero-divide having an fixed result while Mips and LoongArch do not.
Currently the upstramed gcc checks zero-divide by default, so here I do as it to keep consistent.

Might be better to just flip the switch on gcc upstream. @xry111 has kindly forwarded my comment to the Loongson GCC issue tracker so we could continue discussion there regarding gcc.

xen0n added a comment.Jun 25 2022, 1:45 AM

riscv and aarch64 do not check and I'm not sure it is because their ISAs define zero-divide having an fixed result while Mips and LoongArch do not.

Also, technically LoongArch could be revised to produce similarly defined results as RISC-V or AArch64, because anything is compatible with "undefined" so we won't break compatibility with the v1.00 ISA semantics. You could discuss this with your hardware team too if you want; IMO the RISC-V approach seems very reasonable, for example.

This comment was removed by xry111.
llvm/test/CodeGen/LoongArch/ir-instruction/sdiv-udiv-srem-urem.ll
133

GCC simply generates a div.w instruction.

xen0n added a comment.Jun 25 2022, 1:53 AM

I just experimented on 3A5000 and it seems the "undefined result" is all-zeros, and unfortunately the div.w/mod.w UB when input operand is non-canonical (i.e. non-sign-extended) 32-bit is indeed present; in this case the output is all-zeros too.

So the sign-extension is indeed necessary for inputs not statically known to be signext. The all-zeros in case of UB is less useful than RISCV's all-ones, in terms of expected semantics (we want ideally something near "infinity"), but it's UB after all, and 0 is equally okay here.

My mistake: I saw GCC generated "div.w $a0, $a0, $a1", but it only works because the ABI has made sure that the parameters are already signed-extended.

Indeed. I experimented further and it's a whole can of worms with non-canonical inputs to {div/mod}.w:

# a = 4294967396 (0x0000000100000064)
# b = 5 (0x0000000000000005)
div.w(a, b) = 0  # 0x0000000000000000
mod.w(a, b) = 0  # 0x0000000000000000

# a = 4294967396 (0x0000000100000064)
# b = -5 (0xfffffffffffffffb)
div.w(a, b) = 0  # 0x0000000000000000
mod.w(a, b) = 0  # 0x0000000000000000

# a = 100 (0x0000000000000064)
# b = 8589934597 (0x0000000200000005)
div.w(a, b) = 85899345920  # 0x0000001400000000
mod.w(a, b) = 100  # 0x0000000000000064

# a = 100 (0x0000000000000064)
# b = -8589934597 (0xfffffffdfffffffb)
div.w(a, b) = -85899345920  # 0xffffffec00000000
mod.w(a, b) = 100  # 0x0000000000000064

The result is even more unintelligible if the divisor is non-canonical... let's just ensure signext one way or another (ABI, signext, legalization, etc.) and never invoke the UB. It's not fun.

It seems I can't remove an inline comment...

llvm/test/CodeGen/LoongArch/ir-instruction/sdiv-udiv-srem-urem.ll
133

My mistake: I saw GCC generated "div.w $a0, $a0, $a1", but it only works because in the ABI parameters are already signed-extended.

disable trapping by default

SixWeining edited the summary of this revision. (Show Details)Jun 27 2022, 6:30 AM
xen0n accepted this revision.Jun 27 2022, 11:03 PM

LGTM, thanks!

(I've checked harder and apparently the weird "erratum" is in fact a wart carried over from MIPS. The MIPS ISA manual contain the very same wording regarding non-canonical inputs to 32-bit division/modulus operations. Let's just hope this gets fixed in future models...)

This revision is now accepted and ready to land.Jun 27 2022, 11:03 PM
xry111 accepted this revision.Jun 28 2022, 3:12 AM

I'll submit a patch to change GCC behavior later (after fix PR106096, which is blocking bootstrap).

LGTM, thanks!

(I've checked harder and apparently the weird "erratum" is in fact a wart carried over from MIPS. The MIPS ISA manual contain the very same wording regarding non-canonical inputs to 32-bit division/modulus operations. Let's just hope this gets fixed in future models...)

I guess the ALU of 3A5000 is not deviated too much from 3A4000, and it limits LA v1.00 to keep such division results undefined. It's usually a good thing to replace components one by one (instead of replacing many components in one shot) in a complex product like CPU though.

xen0n added inline comments.Jul 2 2022, 5:33 PM
llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
401

didn't notice before; this should be "mod".

SixWeining added inline comments.Jul 3 2022, 5:47 PM
llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
401

Thanks. Let me fix it.

SixWeining updated this revision to Diff 441988.Jul 3 2022, 5:54 PM

Change 'rem' to 'mod' in comments.

This revision was landed with ongoing or failed builds.Jul 6 2022, 2:55 AM
This revision was automatically updated to reflect the committed changes.