This is an archive of the discontinued LLVM Phabricator instance.

builtins: Add ARM Thumb1 implementation for uidiv and uidivmod
ClosedPublic

Authored by weimingz on Dec 1 2016, 10:19 AM.

Details

Summary

The current uidiv supports archs without clz. However, the asm is for thumb2/arm.
For uidivmod, the existing code calls the C version of uidivmodsi4, which then calls uidiv. The extra push/pop/bl makes it less efficient.

Diff Detail

Repository
rL LLVM

Event Timeline

weimingz updated this revision to Diff 79943.Dec 1 2016, 10:19 AM
weimingz retitled this revision from to builtins: Add ARM Thumb1 implementation for uidiv and uidivmod.
weimingz updated this object.
weimingz added reviewers: rengolin, jmolloy.
weimingz added a subscriber: llvm-commits.
compnerd added inline comments.Dec 1 2016, 6:02 PM
lib/builtins/arm/aeabi_uidivmod.S
29 ↗(On Diff #79943)

Nice using the shrink wrapping.

Do we want to maintain stack alignment and push the frame pointer as well?

31 ↗(On Diff #79943)

Its there a benefit to popping into r3? Or does the thumb 1 encoding only allow popping into low registers?

lib/builtins/arm/udivsi3.S
80 ↗(On Diff #79943)

Not your change, but do you mind creating a macro for the thumb2 case too?

103 ↗(On Diff #79943)

Shouldn't the label be aligned, not the code?

106 ↗(On Diff #79943)

Can we use or instead?

174 ↗(On Diff #79943)

Please use defined.

weimingz added inline comments.Dec 1 2016, 6:39 PM
lib/builtins/arm/aeabi_uidivmod.S
29 ↗(On Diff #79943)

Since thumb1 has no ldrd/strd, I think it's OK. And I see existing code like aeabi_idivmod.S, aeabi_cdcmp.S pushes odd number of regs.

31 ↗(On Diff #79943)

right. can only pop to R0-R7 and PC

lib/builtins/arm/udivsi3.S
80 ↗(On Diff #79943)

yes. I did plan to do.

103 ↗(On Diff #79943)

yes, I thought the same. But I tried and didn't work. Will do more digging.

106 ↗(On Diff #79943)

in Thumb1, ORR doesn't support immediate values.

174 ↗(On Diff #79943)

will do

weimingz updated this revision to Diff 80095.Dec 2 2016, 10:37 AM

Updated per Saleem's comments. For adc label alignment, not sure if it's the best way to do.

compnerd added inline comments.Dec 2 2016, 8:48 PM
lib/builtins/arm/udivsi3.S
200 ↗(On Diff #80095)

Can you format this please? (clang-format).

254 ↗(On Diff #80095)

Any reason to not just use the single definition of this label? (as in just move it for both cases.

106 ↗(On Diff #79943)

Hmm, do we have some way to guarantee that the thumb bit would never be set by the linker? (at least with link, the linker will set the thumb bit on the relocation target if it is in the code section on a thumb block).

weimingz added inline comments.Dec 3 2016, 10:30 PM
lib/builtins/arm/udivsi3.S
200 ↗(On Diff #80095)

Will do. I thought clang-format was only for C/C++.

254 ↗(On Diff #80095)

Right, we don't need two definitions. Will remove this.

106 ↗(On Diff #79943)

this is the same as the thumbe2 version (lne70-71)

if __ARM_ARCH_ISA_THUMB == 2

adr  ip, LOCAL_LABEL(div0block) + 1

Since it's PC-relative, there should be no relocation entry for this.

weimingz updated this revision to Diff 80213.Dec 4 2016, 11:45 AM

clang-formatted the .h file but it doesn't support .S. Even if I force it to format, the output looks not very nice.

compnerd edited edge metadata.Dec 4 2016, 7:51 PM

A .S file is run through the C preprocessor. You are defining a macro. That macro should be something that clang-format can format just fine (in fact, I had done just that for the Thumb2 case).

weimingz updated this revision to Diff 80225.Dec 4 2016, 8:16 PM
weimingz edited edge metadata.

Thanks. I just picked the change on the macro def from git-clang-format output. I also replace the space with tab as the rest of the file uses tab for indention.

compnerd accepted this revision.Dec 5 2016, 7:47 AM
compnerd edited edge metadata.

Thanks for the various revisions. I don't think that the tests for the built-ins are run on the bots, so please do run them locally again before committing.

This revision is now accepted and ready to land.Dec 5 2016, 7:47 AM
rengolin edited edge metadata.Dec 5 2016, 8:04 AM

Thanks for the various revisions. I don't think that the tests for the built-ins are run on the bots, so please do run them locally again before committing.

Wait, aren't them executed when "make check-all" is used on compiler-rt?

Thanks for the various revisions. I don't think that the tests for the built-ins are run on the bots, so please do run them locally again before committing.

Wait, aren't them executed when "make check-all" is used on compiler-rt?

The default CMake file uses -march=armv7, so the "thumb1" path is not tested by check-all.
Locally, I changed the CMake to let udivsi3.S to be used for armv6m.

The default CMake file uses -march=armv7, so the "thumb1" path is not tested by check-all.
Locally, I changed the CMake to let udivsi3.S to be used for armv6m.

Ah, of course!

I manually build and run udivmodsi4_test.c and udivsi3_test.c, no error

This revision was automatically updated to reflect the committed changes.
dim added a subscriber: dim.Mar 7 2017, 10:50 AM

Apparently this and it's followup commit rL288777 cause problems for different ARM ISAs on FreeBSD, and __udivsi3 is even completely broken: https://github.com/strejda/tegra/commit/30914ba688cb987240e22b2c7641463446527783

Did someone revert this? I think we should until we're sure how to fix the problem.

dim added a subscriber: strejda.Mar 11 2017, 5:44 AM

Did someone revert this? I think we should until we're sure how to fix the problem.

In FreeBSD, @strejda committed this rather big hammer to work around it: https://reviews.freebsd.org/rS314907, basically disabling all thumb support. :(

I would certainly not object to reverting rL288710 for now, and put up a new review to handle all concerns. @strejda, you already had a bunch of suggestions in your github commit, right?

In D27309#698618, @dim wrote:

In FreeBSD, @strejda committed this rather big hammer to work around it: https://reviews.freebsd.org/rS314907, basically disabling all thumb support. :(

Ouch, that's not good.

I would certainly not object to reverting rL288710 for now, and put up a new review to handle all concerns. @strejda, you already had a bunch of suggestions in your github commit, right?

We can do that offline. I'm OOO, so if anyone could revert the patch(es) for now, that'd be great. If not, I'll do when I'm back home.

--renato

Most of the changes are guarded by __ARM_ARCH_ISA_THUMB == 1 and I reviewed again of the "else" part and they look OK except for one suspect:
https://reviews.llvm.org/D30867

Most of the changes are guarded by __ARM_ARCH_ISA_THUMB == 1 and I reviewed again of the "else" part and they look OK except for one suspect:
https://reviews.llvm.org/D30867

No, unfortunately, _ _ARM_ARCH_ISA_THUMB == n can't be used as selector for Thumb1 (or Thumb2) code.
Value of _ _ARM_ARCH_ISA_THUMB isn't based on actual compilation mode (-mthumb, -marm), its based on given CPU (or arch) capabilities.
Thus, if we compile library with '-marm -march=armv6' then _ _ARM_ARCH_ISA_THUMB is set to '1'. So we ends with Thumb1 code compiled using ARM ISA. Not a nice result, right?
So D30867 cannot help us.

I found these problems:

  • all Thumb1 and some Thumb2 functions are not properly decorated by DEFINE_COMPILERRT_THUMB_FUNCTION() This breaks debugging.
  • usage of '.thumb' directive is inconsistent across files
  • Thumb 1 version of __udivsi3 is broken and doesn't not work at all. See:
	cmp	r0, r1, lsl IMM shift
 	addhs	r3, r3, IMM (1 << shift)
 	subhs	r0, r0, r1, lsl IMM shift
and
	lsls	r2, r1, IMM shift
	cmp	r0, r2
	blo	1f
	subs	r0, r0, r2
1:
	adcs	r3, r3
  • There is no Thumb1 variant of __udivmodsi4

Most of the changes are guarded by __ARM_ARCH_ISA_THUMB == 1 and I reviewed again of the "else" part and they look OK except for one suspect:
https://reviews.llvm.org/D30867

No, unfortunately, _ _ARM_ARCH_ISA_THUMB == n can't be used as selector for Thumb1 (or Thumb2) code.
Value of _ _ARM_ARCH_ISA_THUMB isn't based on actual compilation mode (-mthumb, -marm), its based on given CPU (or arch) capabilities.
Thus, if we compile library with '-marm -march=armv6' then _ _ARM_ARCH_ISA_THUMB is set to '1'. So we ends with Thumb1 code compiled using ARM ISA. Not a nice result, right?
So D30867 cannot help us.

I found these problems:

  • all Thumb1 and some Thumb2 functions are not properly decorated by DEFINE_COMPILERRT_THUMB_FUNCTION() This breaks debugging.
  • usage of '.thumb' directive is inconsistent across files
  • Thumb 1 version of __udivsi3 is broken and doesn't not work at all. See:
	cmp	r0, r1, lsl IMM shift
 	addhs	r3, r3, IMM (1 << shift)
 	subhs	r0, r0, r1, lsl IMM shift
and
	lsls	r2, r1, IMM shift
	cmp	r0, r2
	blo	1f
	subs	r0, r0, r2
1:
	adcs	r3, r3
  • There is no Thumb1 variant of __udivmodsi4

Great! Thanks for finding out the root cause. Will you pull the github patch into upstream repo?
Regarding udivmodi4.S, we can work on it. Currently, the thumb1 lib can use the C version (udivmodsi4.c ), which simply calls __udivsi3 with extra mul and sub . Since we implemented _udivsi3 in asm, the perf won't be too bad.

Sure. Give me some time to rebase (and polish) patch to the current upstream. I prefer to create a new review for it because I'm still not sure if all bits of patch are acceptable for upstream.