This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add more addcarry tests
ClosedPublic

Authored by chfast on Nov 14 2019, 5:35 AM.

Event Timeline

chfast created this revision.Nov 14 2019, 5:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 14 2019, 5:35 AM
davezarzycki added a subscriber: craig.topper.

A few comments / questions:

  1. I'm not a regular contributor to LLVM, so please wait for somebody else (like @craig.topper, @RKSimon, or @spatel) to sign off on this.
  2. Once one has "add carry", one immediately wants "sub carry/borrow". Please consider adding tests to subcarry.ll.
  3. What is add256_1 trying to do and how is it different than add256_2?
  4. Finally, can you rename your examples to be more descriptive? In particular, if you look at early parts of the file, you'll see that there are a bunch of definitions with simple names like "add256" where 256 (etc) refers to native LLVM types, not aggregates. Perhaps names like the tests I added "sub_U320_without_i128_or" which strongly implies at the implementation strategy.
chfast updated this revision to Diff 229462.Nov 15 2019, 1:08 AM

Add comments more descriptive test names

A few comments / questions:

  1. I'm not a regular contributor to LLVM, so please wait for somebody else (like @craig.topper, @RKSimon, or @spatel) to sign off on this.
  2. Once one has "add carry", one immediately wants "sub carry/borrow". Please consider adding tests to subcarry.ll.

Yes, I can prepare similar tests for subtraction. Let me know if they should be include in this changeset.

  1. What is add256_1 trying to do and how is it different than add256_2?

They are now described in comments (also the order has been swapped). Let me know if the descriptions are good enough.

  1. Finally, can you rename your examples to be more descriptive? In particular, if you look at early parts of the file, you'll see that there are a bunch of definitions with simple names like "add256" where 256 (etc) refers to native LLVM types, not aggregates. Perhaps names like the tests I added "sub_U320_without_i128_or" which strongly implies at the implementation strategy.

Done.

Yes, please update subcarry.ll too.

And thanks, I was trying to see why the AND instruction was used by "add_U256_without_i128_or_recursive". It's not wrong but it is more complicated than it needs to be. If you are somebody else wants to add more pattern matching as a refinement, let's handle that in a subsequent patch.

My patch is designed to recognize the following pattern (which clang happens to generate but I didn't know that going in):

temp[n] = x[n] + y[n];
result[n] = temp[n] + ((temp[n-1] < x[n-1]) | (result[n-1] < temp[n-1]));

In English: "either the previous addition overflowed or the previous carry propagation overflowed (but they cannot both overflow)."

Because the two carries cannot both overflow at the same time, you can also use XOR or ADD to merge the carry flags, but ADD is more difficult to pattern match because the compiler can commute the additions.

Yes, please update subcarry.ll too.

I will.

And thanks, I was trying to see why the AND instruction was used by "add_U256_without_i128_or_recursive". It's not wrong but it is more complicated than it needs to be. If you are somebody else wants to add more pattern matching as a refinement, let's handle that in a subsequent patch.

Yes, that's reasonable. This implementation comes from my small library for 256-bit arithmetic. The 255-bit addition is defined in terms of uint128 type (custom, not __int128). I just inlined all the functions manually. Maybe there is a way to change my implementation in a way it will benefit from this PR.
Anyway, the only question for is if I should keep this test.

davezarzycki added a comment.EditedNov 15 2019, 5:20 AM

When you update the tests, please remove the URLs. Also, what do you see the tests adding above and beyond what already exist in the file? For example, the nested structures design only changes the operands to getelementptr, but it doesn't change how the add/sub carry optimization works. And the by-reference versus by-value tests are also independent of the add/sub carry optimization.

chfast updated this revision to Diff 229786.Nov 18 2019, 3:01 AM

Update addcarry tests and add a subcarry test

I ended up with optimizing my bigint library. The end result is - add_U256_without_i128_or_recursive which works better in current LLVM release and is also nicely reduced by D70079.
The analogous implementations of subtraction is added as sub_U256_without_i128_or_recursive test. This one should be interested to you @davezarzycki, your changes make it better, but it looks it is not perfect yet.

I ended up with optimizing my bigint library. The end result is - add_U256_without_i128_or_recursive which works better in current LLVM release and is also nicely reduced by D70079.
The analogous implementations of subtraction is added as sub_U256_without_i128_or_recursive test. This one should be interested to you @davezarzycki, your changes make it better, but it looks it is not perfect yet.

Thanks for the update. Where is the original source for sub_U256_without_i128_or_recursive? The IR is strange. It seems to use two different strategies for merging carry flags.

And as a reminder, please remove the URLs from the test files.

chfast updated this revision to Diff 229826.Nov 18 2019, 6:05 AM

Remove URLs

Thanks for the update. Where is the original source for sub_U256_without_i128_or_recursive? The IR is strange. It seems to use two different strategies for merging carry flags.

I don't really have a standalone implementation for it at the moment. It is code for this procedure for uint256 type: https://github.com/chfast/intx/blob/master/include/intx/intx.hpp#L528-L532. It is the same as the one for add, just + replaced with - and < with > for checking the carry flag. But probably LLVM has disturbed it a bit because add is proffered over sub in some places.

Thanks for the update. Where is the original source for sub_U256_without_i128_or_recursive? The IR is strange. It seems to use two different strategies for merging carry flags.

I don't really have a standalone implementation for it at the moment. It is code for this procedure for uint256 type: https://github.com/chfast/intx/blob/master/include/intx/intx.hpp#L528-L532. It is the same as the one for add, just + replaced with - and < with > for checking the carry flag. But probably LLVM has disturbed it a bit because add is proffered over sub in some places.

Okay. I'm okay with these tests now. Please wait for somebody more experienced / authoritative too sign off though. Thanks!

Also, the tree-like recursion that generated sub_U256_without_i128_or_recursive will be extremely difficult to optimize. Please consider linear recursion instead.

spatel accepted this revision.Nov 18 2019, 7:59 AM

LGTM. We can always adjust/add tests in the follow-ups to the code for overflow/carry.

llvm/test/CodeGen/X86/addcarry.ll
967

curry -> carry

This revision is now accepted and ready to land.Nov 18 2019, 7:59 AM
This revision was automatically updated to reflect the committed changes.