The current uidiv supports archs without clz. However, the asm is for thumb2/arm.
For uidivmod, the existing code calls the C version of uidivmodsi4, which then calls uidiv. The extra push/pop/bl makes it less efficient.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/builtins/arm/aeabi_uidivmod.S | ||
---|---|---|
29 ↗ | (On Diff #79943) | Nice using the shrink wrapping. Do we want to maintain stack alignment and push the frame pointer as well? |
31 ↗ | (On Diff #79943) | Its there a benefit to popping into r3? Or does the thumb 1 encoding only allow popping into low registers? |
lib/builtins/arm/udivsi3.S | ||
80 ↗ | (On Diff #79943) | Not your change, but do you mind creating a macro for the thumb2 case too? |
103 ↗ | (On Diff #79943) | Shouldn't the label be aligned, not the code? |
106 ↗ | (On Diff #79943) | Can we use or instead? |
174 ↗ | (On Diff #79943) | Please use defined. |
lib/builtins/arm/aeabi_uidivmod.S | ||
---|---|---|
29 ↗ | (On Diff #79943) | Since thumb1 has no ldrd/strd, I think it's OK. And I see existing code like aeabi_idivmod.S, aeabi_cdcmp.S pushes odd number of regs. |
31 ↗ | (On Diff #79943) | right. can only pop to R0-R7 and PC |
lib/builtins/arm/udivsi3.S | ||
80 ↗ | (On Diff #79943) | yes. I did plan to do. |
103 ↗ | (On Diff #79943) | yes, I thought the same. But I tried and didn't work. Will do more digging. |
106 ↗ | (On Diff #79943) | in Thumb1, ORR doesn't support immediate values. |
174 ↗ | (On Diff #79943) | will do |
Updated per Saleem's comments. For adc label alignment, not sure if it's the best way to do.
lib/builtins/arm/udivsi3.S | ||
---|---|---|
200 ↗ | (On Diff #80095) | Can you format this please? (clang-format). |
254 ↗ | (On Diff #80095) | Any reason to not just use the single definition of this label? (as in just move it for both cases. |
106 ↗ | (On Diff #79943) | Hmm, do we have some way to guarantee that the thumb bit would never be set by the linker? (at least with link, the linker will set the thumb bit on the relocation target if it is in the code section on a thumb block). |
lib/builtins/arm/udivsi3.S | ||
---|---|---|
200 ↗ | (On Diff #80095) | Will do. I thought clang-format was only for C/C++. |
254 ↗ | (On Diff #80095) | Right, we don't need two definitions. Will remove this. |
106 ↗ | (On Diff #79943) | this is the same as the thumbe2 version (lne70-71) if __ARM_ARCH_ISA_THUMB == 2adr ip, LOCAL_LABEL(div0block) + 1 Since it's PC-relative, there should be no relocation entry for this. |
clang-formatted the .h file but it doesn't support .S. Even if I force it to format, the output looks not very nice.
A .S file is run through the C preprocessor. You are defining a macro. That macro should be something that clang-format can format just fine (in fact, I had done just that for the Thumb2 case).
Thanks. I just picked the change on the macro def from git-clang-format output. I also replace the space with tab as the rest of the file uses tab for indention.
Thanks for the various revisions. I don't think that the tests for the built-ins are run on the bots, so please do run them locally again before committing.
The default CMake file uses -march=armv7, so the "thumb1" path is not tested by check-all.
Locally, I changed the CMake to let udivsi3.S to be used for armv6m.
Apparently this and it's followup commit rL288777 cause problems for different ARM ISAs on FreeBSD, and __udivsi3 is even completely broken: https://github.com/strejda/tegra/commit/30914ba688cb987240e22b2c7641463446527783
In FreeBSD, @strejda committed this rather big hammer to work around it: https://reviews.freebsd.org/rS314907, basically disabling all thumb support. :(
I would certainly not object to reverting rL288710 for now, and put up a new review to handle all concerns. @strejda, you already had a bunch of suggestions in your github commit, right?
Ouch, that's not good.
I would certainly not object to reverting rL288710 for now, and put up a new review to handle all concerns. @strejda, you already had a bunch of suggestions in your github commit, right?
We can do that offline. I'm OOO, so if anyone could revert the patch(es) for now, that'd be great. If not, I'll do when I'm back home.
--renato
Most of the changes are guarded by __ARM_ARCH_ISA_THUMB == 1 and I reviewed again of the "else" part and they look OK except for one suspect:
https://reviews.llvm.org/D30867
No, unfortunately, _ _ARM_ARCH_ISA_THUMB == n can't be used as selector for Thumb1 (or Thumb2) code.
Value of _ _ARM_ARCH_ISA_THUMB isn't based on actual compilation mode (-mthumb, -marm), its based on given CPU (or arch) capabilities.
Thus, if we compile library with '-marm -march=armv6' then _ _ARM_ARCH_ISA_THUMB is set to '1'. So we ends with Thumb1 code compiled using ARM ISA. Not a nice result, right?
So D30867 cannot help us.
I found these problems:
- Thumb[1][2] code selection is broken, "(_ _ARM_ARCH_ISA_THUMB == n && (defined(_ _THUMBEL_ _) || defined(_ _THUMBEB_ _))" is right condition. See: https://github.com/strejda/tegra/commit/91057bb46e54e350da87e354aed3189d469da458#diff-a8e5a4d82eadfcc89b4f54d6755a46b4L97
This one is fatal for FreeBSD.
- all Thumb1 and some Thumb2 functions are not properly decorated by DEFINE_COMPILERRT_THUMB_FUNCTION() This breaks debugging.
- usage of '.thumb' directive is inconsistent across files
- Thumb 1 version of __udivsi3 is broken and doesn't not work at all. See:
cmp r0, r1, lsl IMM shift addhs r3, r3, IMM (1 << shift) subhs r0, r0, r1, lsl IMM shift
and
lsls r2, r1, IMM shift cmp r0, r2 blo 1f subs r0, r0, r2 1: adcs r3, r3
- There is no Thumb1 variant of __udivmodsi4
Great! Thanks for finding out the root cause. Will you pull the github patch into upstream repo?
Regarding udivmodi4.S, we can work on it. Currently, the thumb1 lib can use the C version (udivmodsi4.c ), which simply calls __udivsi3 with extra mul and sub . Since we implemented _udivsi3 in asm, the perf won't be too bad.
Sure. Give me some time to rebase (and polish) patch to the current upstream. I prefer to create a new review for it because I'm still not sure if all bits of patch are acceptable for upstream.