This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Add support for div, ldiv, lldiv, and imaxdiv folding
Needs ReviewPublic

Authored by msebor on Jul 8 2022, 1:22 PM.

Details

Summary

The div function and its kin were originally introduced in C89 for portability and efficiency. The portability argument has mostly gone by the wayside in C99 when the semantics of division and modulo were tightened up; as compilers have become better at optimizing integer division the efficiency argument for this API has become weaker as well. Nevertheless, the functions are still in the standards and being used, albeit not very often. This change adds basic support for folding calls to these functions with constant arguments. It assumes that struct div members are laid out in the order they're described. The standards don't require this but I don't know of any systems where the assumption doesn't hold. If there's a concern with this assumption then the code could be made conditional based on the result of a configury test.

This simplistic solutions leaves room for improvements, such as:

  1. Fold div(X, 1) to {X, 0} for any X (and possibly also div(X, -1) to {-X, 0}).
  2. Fold div(X, Y).quot to X / Y and div(X, Y).rem to X % Y for any X and Y when only one of the struct members is used.

I would have wanted to implement (1) in the initial patch, mostly as a learning exercise, but couldn't find a simple way to create an initializer for a nonconstant struct like there is for a constant one with ConstantStruct::get(). Any suggestions?

(2) seems outside the scope of this initial solution but it too looks like an interesting learning opportunity, so I'd welcome pointers here as well.

Diff Detail

Event Timeline

msebor created this revision.Jul 8 2022, 1:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 8 2022, 1:22 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
msebor requested review of this revision.Jul 8 2022, 1:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 8 2022, 1:22 PM
nikic added a comment.Jul 8 2022, 1:33 PM

Uh, does this actually work in end-to-end tests? From a quick check, on x86_64 the return ABI for div() is actually a packed i64 value: https://c.godbolt.org/z/EvTEdKMYG On aarch64 ldiv() actually returns a [2 x i64]: https://c.godbolt.org/z/W9v1YYbsM

This seems like a morass of different ABIs that your code doesn't account for.

nikic added a comment.Jul 8 2022, 1:38 PM

I would have wanted to implement (1) in the initial patch, mostly as a learning exercise, but couldn't find a simple way to create an initializer for a nonconstant struct like there is for a constant one with ConstantStruct::get(). Any suggestions?

The general pattern for this is to start with a PoisonValue and then use a chain of CreateInsertValue to insert values into the struct. https://github.com/llvm/llvm-project/blob/adf1ffe95854a245cbc48bbaea55f60b003d5f76/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp#L794 is very similar to your case (this one creates and {poison, 0} struct and then insertvalues into the first value).

msebor added a comment.EditedJul 8 2022, 3:27 PM

Thanks for the pointer to D65457!

I didn't expect calling convention differences to be visible this early in the middle end (they're not in GCC). That seems unfortunate, but at least I learned something new. That said, on some of the targets mentioned in D129396 such as arm or i386, LibCallSimplifier::optimizeDiv isn't invoked for calls to div because they pass the function three arguments rather than the expected two (a pointer to struct div_t to fill with the result, followed by the two expected integers), and so those calls are not recognized (TargetLibraryInfo::getLibFunc() returns false). So while those targets don't benefit from the change I wouldn't expect them to be adversely affected by it. Unless this isn't the only other calling convention, presumably support for it could be added in a follow-up. If there are others then adding a built-in for it might be an option to explore. Or, if div et al. aren't worth the trouble, maybe I should just add tests to catch this futile attempt for an enhancement early in the future and keep others from wasting time with it.

The patch does cause a couple of test failures that I neglected to mention:

Failed Tests (2):
  LLVM :: tools/llvm-tli-checker/ps4-tli-check.yaml
  LLVM-Unit :: Analysis/./AnalysisTests/TargetLibraryInfoTest/ValidProto

They don't look like they're obviously due to ABI differences but I haven't looked into them in any detail to fully understand them yet. My guess is that not all targets (e.g., PlayStation 4?) support all four functions and that the patch is missing some conditionals to avoid assuming all four have the expected semantics.

nikic added a comment.Jul 11 2022, 2:12 AM

Thanks for the pointer to D129396!

I didn't expect calling convention differences to be visible this early in the middle end (they're not in GCC). That seems unfortunate, but at least I learned something new. That said, on some of the targets mentioned in D129396 such as arm or i386, LibCallSimplifier::optimizeDiv isn't invoked for calls to div because they pass the function three arguments rather than the expected two (a pointer to struct div_t to fill with the result, followed by the two expected integers), and so those calls are not recognized (TargetLibraryInfo::getLibFunc() returns false). So while those targets don't benefit from the change I wouldn't expect them to be adversely affected by it. Unless this isn't the only other calling convention, presumably support for it could be added in a follow-up.

In my first comment above I shared two examples that use a different two argument ABI. In one case, i64 is used instead of { i32, i32 } and in the other [2 x i64] is used instead of { i64, i64 }. I would expect your current code to crash for this due to a RAUW type mismatch. Now, you could add a return type check to getLibFunc() and ensure that the return type is indeed a struct, in which case I think your code would work for the platforms that do use the straightforward ABI.

Solving this on the Clang side sounds a good bit more robust, but I'm not familiar with how that would look like from a technical perspective.

msebor added a comment.EditedJul 11 2022, 9:57 AM

I see. It's valid, for example, to declare { i32, i32 } @div(i32, i32) like the test does but that's not necessarily the same as the div declaration that Clang might emit for div_t div(int, int) on some targets. This lets the test pass even on the same target as with an incompatible div and even though the corresponding C test would not (Clang would presumably issue -Wincompatible-library-redeclaration). To catch these problems tests that exercise the C interface to a library would need to be written in C and ideally run for all supported targets.

I suppose one lesson here is that the functions that are handled in SimplifyBuiltins.cpp are treated as just ordinary functions, not as special intrinsics like they are in GCC where most of them have the same declarations across all targets and where it isn't possible to declare one that doesn't match.

Let me look into the idea of declaring div as a true intrinsic and see if that simplifies things.

msebor updated this revision to Diff 453754.Aug 18 2022, 1:04 PM

Revision 2 with the following changes:

  • Enhance the signature validation to cover known forms for all supported targets.
  • Likewise, enhance the folder code to handle all known forms of the function signatures.
  • Add tests.
  • Adjust TargetLibraryInfoTest.cpp to cover the functions.

I manually verified the sanity of the approach with a number of targets representative of the recognized forms of the function declaration and tried to capture them all in the comprehensive tests. The patch passes all tests on x86_64-linux with one failure:

Failed Tests (1):
  LLVM :: tools/llvm-tli-checker/ps4-tli-check.yaml

I can guess the failure has to do with what's hardcoded somewhere for the ps4 target but not much more. There's a comment in ps4-tli-check.yaml that reads:

## The -COUNT suffix doesn't care if there are too many matches, so check
## the exact count first; the two directives should add up to that.
## Yes, this means additions to TLI will fail this test, but the argument
## to -COUNT can't be an expression.

There's no suggestion on how the failure should be fixed in this instance and nothing I tried worked (e.g., the simple approach from D114881) so I'm afraid I'll need some handholding here.

majnemer added inline comments.
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
2703

nit: you can use isa here instead of dyn_cast

xbolva00 added inline comments.Aug 18 2022, 2:02 PM
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
2668

Strong assumption? Is this order defined by C standard?

msebor added inline comments.Aug 25 2022, 8:06 AM
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
2668

No (as noted in the Summary and in D65457#1606877).

The implementations I reviewed (Android, Darwin, FreeBSD, Glibc, HP-UX, Illumos/Solaris, Newlib, OpenBSD, QNX, and Windows), as well as LLVM's own libc, all define the struct as it's commonly documented. A Code Search query also didn't reveal anything unexpected. But to be absolutely sure the transformation could either be parameterized on the result of a configury test or enabled only for the systems where the layout is known. I don't know how to do the former yet but I'm open to either if someone feels it's necessary.

2703

Right, will do, thanks!

Ping: Are there any comments on the substance of this change?

nikic added a subscriber: efriedma.

This looks reasonable to me at a glance. Adding @efriedma for some additional feedback on this kind of very ABI dependent fold.

Given the ABI-dependent nature, it might be worthwhile to add some C tests for these functions to llvm-test-suite, so we know if some of the assumptions here break down.

llvm/lib/Analysis/TargetLibraryInfo.cpp
1018

Any reason why this one branch checks bitwidth >= 16, but we accept any for the others?

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
2664

I would consider it slightly more elegant to call do a !RetTy check instead on the result of getParamStructRetType(), as that is the value we end up relying on.

2670

Shouldn't we check that StructTy has the expected form? The libfunc verification doesn't check how the sret type looks like.

2678

This could be an assert (to make sure the TLI check is in sync).

2712

Arg0Idx -> LHSIdx or NumIdx or something? It's confusing that Arg0 and Arg0Idx don't refer to the same argument.

2720

Missing return?

2732

If I'm seeing this right, we don't guard against > 64-bit integers anywhere (should probably be part of the libcall check? Or else, this code could just keep working on APInt and not care about bit widths -- that would be more idiomatic.)

There really isn't any reason to preserve the call to div() in the IR, even if we can't fold it. If you use the normal LLVM IR div/rem instructions, usually we can optimize them to a more efficient sequence. Worst-case, the backend will just generate a call to __divmodsi4(), which is basically the same thing as div().

Given that, the lowering would be a lot simpler to implement in clang because we wouldn't need to worry about the ABI stuff. (And there shouldn't be code in languages other than C using div().)

msebor updated this revision to Diff 462574.Sep 23 2022, 1:06 PM
msebor marked an inline comment as done.

Changes in revision 3 of the patch addressing review comments:

  • Add missing return statement.
  • Remove test for argument bit width >= 16.
  • Use isa instead of dyn_cast.

I'm willing to look into handling this in the front end instead if/when I have time after this is done. But I'd prefer to land this change first, and then replace it, to abandoning it on the assumption that I will have the time. I believe this is an improvement as it is (and after D65457 the second such attempt to improve things). Let's not let perfect be the enemy of the good.

llvm/lib/Analysis/TargetLibraryInfo.cpp
1018

Probably either because I forgot to duplicate it in the others, or because I forgot to remove if from here when I remembered that none of the APIs that take long (or long long) is checked for its size because the middle end doesn't expose it (I noted it in this comment on D129915).

Beside this, none of these checks (including IntX) also test for the size being a power of 2. They all should be added, but rather than doing it piecemeal, it would be better handled comprehensively in a change of its own.

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
2670

The check is on lines 1052 through 1054 in isValidProtoForDivFunc.

2712

Ag0Idx corresponds to Arg0. There's no LHS in the code, so I think the name is just fine the way it is.

2720

Whoops! There sure is! Thank you!

2732

See my comment about Long (and LLong) checking above.

Since no target (that I know of) supports these functions with an argument wider than 64 bits, declaring and calling it with one would be undefined, so any slicing in that case would be fine.

I agree that (eventually) the TLI checking should enforce the limit(s) for all APIs that take long (and long long, and intmax_t), so using APInt for what in practice is never bigger than int64_t now would be unnecessary and inefficient (and inconsistent with what's used in most other places I've seen).