This is an archive of the discontinued LLVM Phabricator instance.

Differential D77912

[builtins] Make __umodsi3/__udivdi3/__umoddi3 standalone (shift and subtract)
Closed · Public

Authored by MaskRay on Apr 10 2020, 3:34 PM.

Details

Reviewers

kamleshbhalui
scanon

Commits

rGb541196eb45d: [builtins] Make __umodsi3/__udivdi3/__umoddi3 standalone (shift and subtract)

Summary

@kamleshbhalui reported that when the Standard Extension M
(Multiplication and Division) is disabled for RISC-V,
__udivdi3 calls __udivmoddi4, which in turn calls __udivdi3.

This patch moves the __udivsi3 implementation (shift and subtract) into
int_div_impl.inc as __udivXi3, optimizes it a bit, adds __umodXi3, and uses
__udivXi3 and __umodXi3 to define __udivsi3, __umodsi3, __udivdi3, and __umoddi3.
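
As a rough illustration of the idea in the summary (one width-generic shift-and-subtract body that is instantiated per integer width and never calls another division routine), here is a minimal sketch. It is not the patch's code: the macro SKETCH_DEFINE_UDIVMOD and the names sketch_udivmod32/sketch_udivmod64 are hypothetical, the macro merely stands in for the "typedef, then #include the .inc" pattern, and the loop is the plain restoring variant without the clz shortcut.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical macro: stamps out one restoring shift-and-subtract routine
   per unsigned type. Only shifts, compares and subtractions are used, so the
   generated function cannot recurse into another division helper. */
#define SKETCH_DEFINE_UDIVMOD(NAME, UTYPE)                                   \
  static UTYPE NAME(UTYPE n, UTYPE d, UTYPE *rem) {                          \
    const int bits = (int)(sizeof(UTYPE) * CHAR_BIT);                        \
    UTYPE q = 0, r = 0;                                                      \
    assert(d != 0); /* d == 0 is left unspecified, as in the patch */        \
    for (int i = bits - 1; i >= 0; --i) {                                    \
      /* Bring the next dividend bit into the partial remainder. */          \
      r = (UTYPE)(r << 1) | (UTYPE)((n >> i) & 1u);                          \
      if (r >= d) { /* the divisor fits: subtract and set a quotient bit */  \
        r -= d;                                                              \
        q |= (UTYPE)1 << i;                                                  \
      }                                                                      \
    }                                                                        \
    if (rem)                                                                 \
      *rem = r;                                                              \
    return q;                                                                \
  }

SKETCH_DEFINE_UDIVMOD(sketch_udivmod32, uint32_t)
SKETCH_DEFINE_UDIVMOD(sketch_udivmod64, uint64_t)
```

Calling sketch_udivmod32(100, 7, &r) yields quotient 14 with r set to 2. The real patch additionally skips leading zero bits with clz and replaces the inner branch with a mask.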

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

MaskRay created this revision. Apr 10 2020, 3:34 PM

Herald added a project: Restricted Project. Apr 10 2020, 3:34 PM

Herald added subscribers: Restricted Project, luismarques, s.egerton and 3 others.

MaskRay mentioned this in D77744: __udivdi3 and __udivmoddi4 call each other and stuck in loop. Apr 10 2020, 3:35 PM

MaskRay edited the summary of this revision.

Harbormaster failed remote builds in B52748: Diff 256686! Apr 10 2020, 4:42 PM

The code under review in D77744 performs better than this.

This revision now requires changes to proceed. Apr 10 2020, 6:53 PM

Add __builtin_clzll to speed up

Harbormaster failed remote builds in B52784: Diff 256743! Apr 10 2020, 11:09 PM

It's still less optimal.

This revision now requires changes to proceed. Apr 11 2020, 12:59 AM

In D77912#1975893, @kamleshbhalui wrote:

it's still less optimal.

How?

For this input,
a=3325549423267495579, b=2351357301745665781,
compiled with -O2 using clang-9,
it takes 299 ns,
while D77744 takes only 91 ns.

Here are the details of the compiler I tried it with.

clang version 9.0.0 (tags/RELEASE_900/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/clang_9.0.0/bin

Re-purpose

In D77912#1976373, @kamleshbhalui wrote:

Here are the details of the compiler I tried it with.

clang version 9.0.0 (tags/RELEASE_900/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/clang_9.0.0/bin

OK. I benchmarked __udivsi3. That version is actually slightly faster than D77744. I moved that function to int_div_impl.inc

Harbormaster failed remote builds in B52863: Diff 256868! Apr 12 2020, 12:48 PM

Thanks, LGTM.

This revision is now accepted and ready to land. Apr 12 2020, 6:33 PM

In D77912#1976893, @MaskRay wrote:

OK. I benchmarked __udivsi3. That version is actually slightly faster than D77744. I moved that function to int_div_impl.inc

What was the benchmark methodology?
I'm wondering if PC benchmark results make sense for the kinds of targets that are likely to benefit from these software implementations. For microcontrollers I also wonder if a simpler but smaller implementation might be a better trade-off, but I understand that the code came from udivsi3.c so that concern is arguably outside the scope of this patch.
Ideally this patch should have test changes that show its impact and how it fixes the problem.

In D77912#1978786, @luismarques wrote:

In D77912#1976893, @MaskRay wrote:

OK. I benchmarked __udivsi3. That version is actually slightly faster than D77744. I moved that function to int_div_impl.inc

What was the benchmark methodology?

int main() {
#ifdef B
  du_int s=1;
  for (int i=0; i <100000000; i++) {
    du_int a = xorshift64(&s);
    du_int b = xorshift64(&s);
    du_int c = div(a, b);
    asm volatile("" : : "r"(c));
  }
#else

  for (int a = 0; a < 2000; a++)
    for (int b = 1; b < 2000; b++) {
      if (div(a, b) != a/b) {
        printf("%d/%d\n", a, b);
        return 1;
      }
      if (mod(a, b) != a%b) {
        printf("%d%%%d\n", a, b);
        return 1;
      }
    }
#endif
}

Build with -DB for the benchmark and run it with perf stat -r 30 ./a
The default build checks correctness. compiler-rt/test/builtins/Unit/ has some large number tests.
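
The snippet above calls xorshift64, div and mod without showing them. A self-contained sketch of the same kind of harness might look like the following. Everything here is an assumption for illustration: the xorshift64 body is Marsaglia's generator with the (13, 7, 17) shift triple, which may differ from what was actually used, and div_fn, reference_div and count_div_mismatches are hypothetical names.

```c
#include <stdint.h>

// Hypothetical function-pointer type for the routine under test.
typedef uint64_t (*div_fn)(uint64_t, uint64_t);

// Assumed definition: xorshift with the (13, 7, 17) triple.
// A non-zero state never becomes zero, so outputs are never zero either.
static uint64_t xorshift64(uint64_t *state) {
  uint64_t x = *state;
  x ^= x << 13;
  x ^= x >> 7;
  x ^= x << 17;
  *state = x;
  return x;
}

// Placeholder for the routine under test; swap in the real builtin.
static uint64_t reference_div(uint64_t a, uint64_t b) { return a / b; }

// Feeds pseudo-random operand pairs to f and counts disagreements with the
// compiler's own division. The divisor is never zero (see xorshift64).
static unsigned count_div_mismatches(div_fn f, unsigned iterations) {
  uint64_t s = 1;
  unsigned bad = 0;
  for (unsigned i = 0; i < iterations; i++) {
    uint64_t a = xorshift64(&s);
    uint64_t b = xorshift64(&s);
    if (f(a, b) != a / b)
      ++bad;
  }
  return bad;
}
```

Note that on a target without a hardware divider, the `a / b` used as the oracle is itself lowered to a builtin call, so a check like this is only meaningful when run on a host that has native division.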

I'm wondering if PC benchmark results make sense for the kinds of targets that are likely to benefit from these software implementations. For microcontrollers I also wonder if a simpler but smaller implementation might be a better trade-off, but I understand that the code came from udivsi3.c so that concern is arguably outside the scope of this patch.

I don't think the performance matters that much. compiler-rt/lib/builtins as is is not optimized for every architecture which may lack division instructions. We probably want to go the libgcc extreme route.

Ideally this patch should have test changes that show its impact and how it fixes the problem.

compiler-rt/test/builtins/Unit/ has some tests. This is a runtime issue which only arises in some configurations. I believe riscv32 without the M extension can trigger the problem. It is not really feasible to have a unittest.

Closed by commit rGb541196eb45d: [builtins] Make __umodsi3/__udivdi3/__umoddi3 standalone (shift and subtract) (authored by MaskRay). Apr 14 2020, 10:45 AM

This revision was automatically updated to reflect the committed changes.

bjope added a subscriber: bjope. Apr 16 2020, 2:01 AM

bjope added inline comments.

compiler-rt/lib/builtins/int_div_impl.inc
2	This broke things for us downstream (with different char size). Do you mind if I change the condition to `sizeof(a) * CHAR_BIT == 64`? Btw, shouldn't there be a file header in this file with copyright/license information etc. (see http://llvm.org/docs/CodingStandards.html#file-headers)? How could that slip through review?

bjope added inline comments. Apr 16 2020, 7:32 AM

compiler-rt/lib/builtins/int_div_impl.inc
2	Suggested fixup: https://reviews.llvm.org/D78300

MaskRay marked an inline comment as done. Apr 16 2020, 8:25 AM

MaskRay added inline comments.

compiler-rt/lib/builtins/int_div_impl.inc
2	> This broke things for us downstream (with different char size).

Can you clarify the CHAR_BIT value on your platform?

`sizeof(a) * CHAR_BIT == 64` does not make the code clearer. A better version than the current one would be `sizeof(a) == sizeof(unsigned long long)`.

Lots of places may implicitly assume that CHAR_BIT == 8. Importantly, POSIX (CX) requires CHAR_BIT to be 8. Supporting CHAR_BIT != 8 may be impractical...

For your particular problem, if `sizeof(a) == sizeof(unsigned long long)` looks good to you, I can fix that. I am more concerned that supporting CHAR_BIT != 8 would require numerous fixups everywhere in LLVM and would impose a huge mental burden wherever people use bitwise arithmetic and many other operations.

bjope added inline comments. Apr 17 2020, 12:57 AM

compiler-rt/lib/builtins/int_div_impl.inc
2	We certainly got some other fixes downstream for CHAR_BIT==16, so this was just "yet another" problem that popped up for something that has been working in the past. I can keep this as yet-another-size-of-char-fixup downstream, but thought it wouldn't hurt to make this change.

I suggested using CHAR_BIT for two reasons:
1. I was a bit fooled by the defines in int_lib.h that __builtin_clzll was expecting uint64_t as input. But now I realize that "unsigned long long" is the correct type for the argument.
2. I got the feeling that `sizeof(...) * CHAR_BIT` was a quite common pattern in compiler-rt/lib/builtins.

I have updated https://reviews.llvm.org/D78300 according to your suggestion. It makes sense considering (1) above. Maybe we can continue the discussion in that review.
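
For readers following the inline thread, the selection suggested there (compare against the size of unsigned long long, the argument type of __builtin_clzll, instead of hard-coding 8 bytes) could be sketched like this. The macro name sketch_clz is hypothetical, and this is not necessarily the exact form that landed in D78300.

```c
#include <limits.h>

// Hypothetical macro: pick the builtin whose parameter type matches the
// width of the argument. __builtin_clzll takes unsigned long long and
// __builtin_clz takes unsigned int (both are GCC/Clang builtins), so the
// test is phrased in terms of those types rather than a byte count.
// The result is undefined for a == 0, as with the builtins themselves.
#define sketch_clz(a)                                                        \
  (sizeof(a) == sizeof(unsigned long long) ? __builtin_clzll(a)              \
                                           : __builtin_clz(a))
```

Because the comparison uses sizeof on both sides, it does not depend on how many bits a char holds, which is the property the downstream CHAR_BIT == 16 target needed.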

MaskRay mentioned this in rG17772995d48b: [builtins] Add missing header in D77912 and make __builtin_clzll more robust. Apr 17 2020, 8:38 AM

Unfortunately, this patch violates copyright of The PowerPC Compiler Writer's Guide. Please revert this patch, remove the copyrighted code, and you can submit a new patch for the other changes. For future patches that use guides with copyright, please consult with the LLVM Foundation board before committing. Thank you!

Herald added a project: Restricted Project. Apr 21 2022, 12:06 PM

Herald added subscribers: luke957, StephenFan, arichardson.

In D77912#3465400, @tonic wrote:

Unfortunately, this patch violates copyright of The PowerPC Compiler Writer's Guide. Please revert this patch, remove the copyrighted code, and you can submit a new patch for the other changes. For future patches that use guides with copyright, please consult with the LLVM Foundation board before committing. Thank you!

If this request is related to "// Translated from Figure 3-40 of The PowerPC Compiler Writer's Guide" in the code, then I think it should apply to the very first commit
fd089990f76ec6392dbd0138ab874d90c632d2a4 (2009) by @ddunbar .

This patch merely copied this old comment. I don't think I am the appropriate one to take any action here.

Please also read the nature of the patch :)

In D77912#3465553, @MaskRay wrote:

In D77912#3465400, @tonic wrote:

Unfortunately, this patch violates copyright of The PowerPC Compiler Writer's Guide. Please revert this patch, remove the copyrighted code, and you can submit a new patch for the other changes. For future patches that use guides with copyright, please consult with the LLVM Foundation board before committing. Thank you!

If this request is related to "// Translated from Figure 3-40 of The PowerPC Compiler Writer's Guide" in the code, then I think it should apply to the very first commit
fd089990f76ec6392dbd0138ab874d90c632d2a4 (2009) by @ddunbar .

This patch merely copied this old comment. I don't think I am the appropriate one to take any action here.

Please also read the nature of the patch :)

My apologies for missing that. I will be creating a patch to remove this then and other references.

In D77912#3465615, @tonic wrote:

In D77912#3465553, @MaskRay wrote:

In D77912#3465400, @tonic wrote:

Unfortunately, this patch violates copyright of The PowerPC Compiler Writer's Guide. Please revert this patch, remove the copyrighted code, and you can submit a new patch for the other changes. For future patches that use guides with copyright, please consult with the LLVM Foundation board before committing. Thank you!

If this request is related to "// Translated from Figure 3-40 of The PowerPC Compiler Writer's Guide" in the code, then I think it should apply to the very first commit
fd089990f76ec6392dbd0138ab874d90c632d2a4 (2009) by @ddunbar .

This patch merely copied this old comment. I don't think I am the appropriate one to take any action here.

Please also read the nature of the patch :)

My apologies for missing that. I will be creating a patch to remove this then and other references.

I believe that the comment may be referring to the algorithm rather than the implementation itself as the book provides assembly listings for PPC.

In D77912#3465723, @compnerd wrote:

In D77912#3465615, @tonic wrote:

In D77912#3465553, @MaskRay wrote:

In D77912#3465400, @tonic wrote:

Unfortunately, this patch violates copyright of The PowerPC Compiler Writer's Guide. Please revert this patch, remove the copyrighted code, and you can submit a new patch for the other changes. For future patches that use guides with copyright, please consult with the LLVM Foundation board before committing. Thank you!

If this request is related to "// Translated from Figure 3-40 of The PowerPC Compiler Writer's Guide" in the code, then I think it should apply to the very first commit
fd089990f76ec6392dbd0138ab874d90c632d2a4 (2009) by @ddunbar .

This patch merely copied this old comment. I don't think I am the appropriate one to take any action here.

Please also read the nature of the patch :)

My apologies for missing that. I will be creating a patch to remove this then and other references.

I believe that the comment may be referring to the algorithm rather than the implementation itself as the book provides assembly listings for PPC.

The LLVM Foundation legal counsel has advised us that it violates copyright.

In D77912#3465615, @tonic wrote:

My apologies for missing that. I will be creating a patch to remove this then and other references.

I believe that the comment may be referring to the algorithm rather than the implementation itself as the book provides assembly listings for PPC.

The LLVM Foundation legal counsel has advised us that it violates copyright.

Okay, thank you for checking.

lkail added a subscriber: lkail. Apr 22 2022, 1:20 AM

I'm working with the IBM legal team to see if there is any way we can get an exception or workaround for the copyright (these probably aren't the correct legal terms, but hopefully people understand the intention). I can update here once I have more information.

thesamesam added a subscriber: thesamesam. Apr 22 2022, 5:24 PM

lenary removed a subscriber: lenary. Apr 28 2022, 8:52 AM

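Revision Contents

Path                                         Size
compiler-rt/lib/builtins/int_div_impl.inc    58 lines
compiler-rt/lib/builtins/udivdi3.c           6 lines
compiler-rt/lib/builtins/udivsi3.c           47 lines
compiler-rt/lib/builtins/umoddi3.c           8 lines
compiler-rt/lib/builtins/umodsi3.c           6 lines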

Diff 256868

compiler-rt/lib/builtins/int_div_impl.inc

This file was added.

#define clz(a) (sizeof(a) == 8 ? __builtin_clzll(a) : __builtin_clz(a))

// Adapted from Figure 3-40 of The PowerPC Compiler Writer's Guide
static __inline fixuint_t __udivXi3(fixuint_t n, fixuint_t d) {
  const unsigned N = sizeof(fixuint_t) * CHAR_BIT;
  // d == 0 cases are unspecified.
  unsigned sr = (d ? clz(d) : N) - (n ? clz(n) : N);
  // 0 <= sr <= N - 1 or sr is very large.
  if (sr > N - 1) // n < d
    return 0;
  if (sr == N - 1) // d == 1
    return n;
  ++sr;
  // 1 <= sr <= N - 1. Shifts do not trigger UB.
  fixuint_t r = n >> sr;
  n <<= N - sr;
  fixuint_t carry = 0;
  for (; sr > 0; --sr) {
    r = (r << 1) | (n >> (N - 1));
    n = (n << 1) | carry;
    // Branch-less version of:
    // carry = 0;
    // if (r >= d) r -= d, carry = 1;
    const fixint_t s = (fixint_t)(d - r - 1) >> (N - 1);
    carry = s & 1;
    r -= d & s;
  }
  n = (n << 1) | carry;
  return n;
}

// Mostly identical to __udivXi3 but the return values are different.
static __inline fixuint_t __umodXi3(fixuint_t n, fixuint_t d) {
  const unsigned N = sizeof(fixuint_t) * CHAR_BIT;
  // d == 0 cases are unspecified.
  unsigned sr = (d ? clz(d) : N) - (n ? clz(n) : N);
  // 0 <= sr <= N - 1 or sr is very large.
  if (sr > N - 1) // n < d
    return n;
  if (sr == N - 1) // d == 1
    return 0;
  ++sr;
  // 1 <= sr <= N - 1. Shifts do not trigger UB.
  fixuint_t r = n >> sr;
  n <<= N - sr;
  fixuint_t carry = 0;
  for (; sr > 0; --sr) {
    r = (r << 1) | (n >> (N - 1));
    n = (n << 1) | carry;
    // Branch-less version of:
    // carry = 0;
    // if (r >= d) r -= d, carry = 1;
    const fixint_t s = (fixint_t)(d - r - 1) >> (N - 1);
    carry = s & 1;
    r -= d & s;
  }
  return r;
}
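
The branch-less update inside the loops above can be looked at in isolation. The following is a minimal sketch on 32-bit values, not code from the patch: sketch_cond_sub is a hypothetical helper, and it assumes (as GCC and Clang do) that converting an out-of-range unsigned value to a signed type wraps and that right-shifting a negative signed value is an arithmetic shift. Both behaviours are implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: subtract d from r only when r >= d, without a branch.
// Stores 1 in *carry when the subtraction happened, else 0.
static uint32_t sketch_cond_sub(uint32_t r, uint32_t d, uint32_t *carry) {
  // The trick needs the true value of d - r - 1 to fit in int32_t. In this
  // sketch the condition is checked explicitly.
  assert(r >= d ? (r - d) < 0x80000000u : (d - r - 1) < 0x80000000u);
  // d - r - 1 is negative exactly when r >= d, so the arithmetic shift
  // produces an all-ones mask in that case and zero otherwise.
  const int32_t s = (int32_t)(d - r - 1) >> 31;
  *carry = (uint32_t)s & 1u;       // 1 if r >= d, else 0
  return r - (d & (uint32_t)s);    // r - d if r >= d, else r unchanged
}
```

For example, sketch_cond_sub(7, 5, &c) returns 2 with c set to 1, while sketch_cond_sub(3, 5, &c) returns 3 with c set to 0.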

compiler-rt/lib/builtins/udivdi3.c

 //===-- udivdi3.c - Implement __udivdi3 -----------------------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // This file implements __udivdi3 for the compiler_rt library.
 //
 //===----------------------------------------------------------------------===//

 #include "int_lib.h"

+typedef du_int fixuint_t;
+typedef di_int fixint_t;
+#include "int_div_impl.inc"
+
 // Returns: a / b

 COMPILER_RT_ABI du_int __udivdi3(du_int a, du_int b) {
-  return __udivmoddi4(a, b, 0);
+  return __udivXi3(a, b);
 }

compiler-rt/lib/builtins/udivsi3.c

 //===-- udivsi3.c - Implement __udivsi3 -----------------------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // This file implements __udivsi3 for the compiler_rt library.
 //
 //===----------------------------------------------------------------------===//

 #include "int_lib.h"

+typedef su_int fixuint_t;
+typedef si_int fixint_t;
+#include "int_div_impl.inc"
+
 // Returns: a / b

-// Translated from Figure 3-40 of The PowerPC Compiler Writer's Guide
-
-// This function should not call __divsi3!
-COMPILER_RT_ABI su_int __udivsi3(su_int n, su_int d) {
-  const unsigned n_uword_bits = sizeof(su_int) * CHAR_BIT;
-  su_int q;
-  su_int r;
-  unsigned sr;
-  // special cases
-  if (d == 0)
-    return 0; // ?!
-  if (n == 0)
-    return 0;
-  sr = __builtin_clz(d) - __builtin_clz(n);
-  // 0 <= sr <= n_uword_bits - 1 or sr large
-  if (sr > n_uword_bits - 1) // d > r
-    return 0;
-  if (sr == n_uword_bits - 1) // d == 1
-    return n;
-  ++sr;
-  // 1 <= sr <= n_uword_bits - 1
-  // Not a special case
-  q = n << (n_uword_bits - sr);
-  r = n >> sr;
-  su_int carry = 0;
-  for (; sr > 0; --sr) {
-    // r:q = ((r:q) << 1) | carry
-    r = (r << 1) | (q >> (n_uword_bits - 1));
-    q = (q << 1) | carry;
-    // carry = 0;
-    // if (r.all >= d.all)
-    // {
-    // r.all -= d.all;
-    // carry = 1;
-    // }
-    const si_int s = (si_int)(d - r - 1) >> (n_uword_bits - 1);
-    carry = s & 1;
-    r -= d & s;
-  }
-  q = (q << 1) | carry;
-  return q;
+COMPILER_RT_ABI su_int __udivsi3(su_int a, su_int b) {
+  return __udivXi3(a, b);
 }

 #if defined(__ARM_EABI__)
 COMPILER_RT_ALIAS(__udivsi3, __aeabi_uidiv)
 #endif

compiler-rt/lib/builtins/umoddi3.c

 //===-- umoddi3.c - Implement __umoddi3 -----------------------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // This file implements __umoddi3 for the compiler_rt library.
 //
 //===----------------------------------------------------------------------===//

 #include "int_lib.h"

+typedef du_int fixuint_t;
+typedef di_int fixint_t;
+#include "int_div_impl.inc"
+
 // Returns: a % b

 COMPILER_RT_ABI du_int __umoddi3(du_int a, du_int b) {
-  du_int r;
-  __udivmoddi4(a, b, &r);
-  return r;
+  return __umodXi3(a, b);
 }

compiler-rt/lib/builtins/umodsi3.c

 //===-- umodsi3.c - Implement __umodsi3 -----------------------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // This file implements __umodsi3 for the compiler_rt library.
 //
 //===----------------------------------------------------------------------===//

 #include "int_lib.h"

+typedef su_int fixuint_t;
+typedef si_int fixint_t;
+#include "int_div_impl.inc"
+
 // Returns: a % b

 COMPILER_RT_ABI su_int __umodsi3(su_int a, su_int b) {
-  return a - __udivsi3(a, b) * b;
+  return __umodXi3(a, b);
 }