This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Fixes *absdiff* intrinsic: LangRef doc/test case improvement and corresponding code change
ClosedPublic

Authored by ashahid on Jul 31 2015, 5:19 AM.

Details

Summary

This patch fixes the condition code for 'compare' llvm IR in the expansion of *absdiff* intrinsic.

LangRef doc is updated to reflect this change.

Test case is divided into two, based on the data width(128/256 bit). Also updated the tests to make it non-fragile.

Diff Detail

Repository
rL LLVM

Event Timeline

ashahid retitled this revision from to [CodeGen] Fixes *absdiff* intrinsic: LangRef doc/test case improvement and corresponding code change.
ashahid updated this object.
ashahid set the repository for this revision to rL LLVM.
ashahid added a subscriber: llvm-commits.
mzolotukhin requested changes to this revision.Jul 31 2015, 1:26 PM
mzolotukhin edited edge metadata.

Hi Shahid,

Please find some comments inline:

docs/LangRef.rst
10387–10390 ↗(On Diff #31118)

What's the difference between llvm.uabsdiff and llvm.sabsdiff then?

lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
737 ↗(On Diff #31118)

AFAIU, this should be ISD::SETLE.

test/CodeGen/X86/absdiff_128.ll
150–152 ↗(On Diff #31118)

This CHECK-DAG doesn't make much sense, since it's limited by CHECK and CHECK-NEXT from both sides.

Moreover, I think the right way to make the tests less bristle is to not check for everything, but just look for key instructions.
For example, we definitely expect to see psubd, then, maybe after several other instructions, we want to see pcmpgt, then we want to see pand, pandn, and por. Thus, I'd write this test something like this:

CHECK: psubd
CHECK: pcmpgt
CHECK-DAG: pand // BTW, why do you have two `pandn` here?
CHECK-DAG: pandn
CHECK: por
CHECK: ret
206 ↗(On Diff #31118)

If we don't want to match any specific register here, we need to get rid of comments # xmm5 = xmm4... too.

test/CodeGen/X86/absdiff_256.ll
33 ↗(On Diff #31118)

This is still fragile.
Imagine that register allocator for some strange reason begins to use xmm5 instead of xmm6 and vice versa - this test will immediately fail.

Also, if you want to match pxor %xmmN, %xmmN, the correct way to write the regexp for it would be:

pxor [[SOMENAME:%xmm[0-9]+]], [[SOMENAME]]

This will ensure that pxor operates on the same register.

This revision now requires changes to proceed.Jul 31 2015, 1:26 PM
ashahid added inline comments.Aug 2 2015, 12:16 AM
docs/LangRef.rst
10387–10390 ↗(On Diff #31118)

The difference is the presence of NSW flag in case of llvm.sabsdiff.

lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
737 ↗(On Diff #31118)

My bad... intention was to use Tmp1 instead of Tmp2. I will use the proper variable names to reflect the operations.

test/CodeGen/X86/absdiff_128.ll
150–152 ↗(On Diff #31118)

Thanks, this is a very important input.

For ISD::SETGE, X86 swaps the operand, consequently, in context of VSELECT it uses two "pandn".

206 ↗(On Diff #31118)

Ok

test/CodeGen/X86/absdiff_256.ll
33 ↗(On Diff #31118)

Ok

Hi James,

That is right, for Uint_max the current comparison will not be proper.

With larger data type do you mean promoting the given data type to larger type ex: MVT:i32 to MVT:i64
And then doing the expansion?

Regards,
Shahid

From: James Molloy [mailto:james@jamesmolloy.co.uk]
Sent: Monday, August 03, 2015 1:02 AM
To: reviews+D11678+public+e92bec0f352bb617@reviews.llvm.org; Shahid, Asghar-ahmad; james.molloy@arm.com; hfinkel@anl.gov; mzolotukhin@apple.com
Cc: llvm-commits@cs.uiuc.edu
Subject: Re: [PATCH] D11678: [CodeGen] Fixes *absdiff* intrinsic: LangRef doc/test case improvement and corresponding code change

I think uabsdiff needs to be expanded using a larger data type. If we have uabsdiff(uint_max, uint_max) , a signed comparison won't return the right result unless the bitwidth is expanded, right?

James

mzolotukhin added inline comments.Aug 3 2015, 12:55 PM
docs/LangRef.rst
10387–10390 ↗(On Diff #31118)

I still don't think it's correct. NSW is just a hint to optimizers, but it doesn't add any additional logic. It does assert that the expression won't overflow, but the operations we execute are still the same. That is, currently the only difference between signed and unsigned version is that for signed version we could get an undefined behavior in some cases. This is clearly incorrect, because we should get different results without undefined behavior in some cases (e.g. <-1,-1,-1,-1> and <1,1,1,1> - it should give <254,254,254,254> for uabsdiff.v4i8 and <2,2,2,2> for sabsdiff.v4i8).

What really should be the difference, as far is I understand, is condition code in the comparison:

%ispos = icmp sge <4 x i32> %sub, zeroinitializer

As far as I understand, we should use uge for unsigned and sge for signed case.

ashahid edited edge metadata.
ashahid removed rL LLVM as the repository for this revision.

Updated the patch to define the behavior of llvm.uabsdiff intrinsic. The corresponding doc, code & test cases are updated accordingly.

bruno added a subscriber: bruno.Aug 18 2015, 6:13 AM
bruno added inline comments.
docs/LangRef.rst
10713 ↗(On Diff #31646)

Space after the dot

lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
747 ↗(On Diff #31646)

Remove the curly braces

ashahid edited edge metadata.

Updated the patch for Bruno's comments.

Hey there... ping.

Hi Shahid,

Thanks for working on this! Please find some questions/comments below:

docs/LangRef.rst
10750–10751 ↗(On Diff #32814)

Please specify what happens if the result overflows (e.g. llvm.sabsdiff.v4i8(<4 x i32> <-128, -128, -128, -128>, <4 x i32> <127, 127, 127, 127>)).

10756 ↗(On Diff #32814)

While we are here, could you please fix the typo here? (space before 'it' and capitalize the first letter)

test/CodeGen/X86/absdiff_256.ll
9 ↗(On Diff #32814)

A single CHECK-DAG between two CHECK statements has no effect (it works as a plain CHECK). Please fix that.

ashahid added inline comments.Aug 26 2015, 1:13 AM
docs/LangRef.rst
10750–10751 ↗(On Diff #32814)

Thanks for the catch, in this case the behavior is undefined and targets can define their own behavior. Does this make sense?

Hi Shahid,

Please see my replies below:

Thanks,
Michael

docs/LangRef.rst
10750–10751 ↗(On Diff #32814)

I think that totally makes sense, but we need to explicitly state that in the documentation.

test/CodeGen/X86/absdiff_256.ll
1 ↗(On Diff #32814)

CHECK is the default prefix, so you don't need to specify it.

ab added a subscriber: ab.Aug 26 2015, 1:38 PM

I'm not a fan of the way the tests are written: they seem both too brittle and too strict.

FWIW, I'd start by using utils/update_llc_test_checks.py, and then refining away the non-ABI-specified registers into regexes.

lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
726 ↗(On Diff #32814)

Can you define these at the point of initialization instead?

735 ↗(On Diff #32814)

Again, please initialize variables when defining them.

Given line 745, NVT is unnecessary, I think. The block at l736 could just set VT instead.

737–738 ↗(On Diff #32814)

This logic is fishy, IMO: it doesn't make sense for targets to set the element type promotion rule, and asking for the promoted vector type might get us a widened version.

What about using:

VT.widenIntegerVectorElementType(*DAG.getContext());

which sounds like what you want, given that there's no other sensible thing the target could do.

lib/Target/X86/X86ISelLowering.cpp
318–323 ↗(On Diff #32814)

This will become unnecessary if you use EVT::widenIntegerVectorElementType above.

test/CodeGen/X86/absdiff_128.ll
11–12 ↗(On Diff #32814)

The a-d seems too restrictive (what about, say, %r9d?), and '+' seems unnecessary.

test/CodeGen/X86/absdiff_256.ll
5 ↗(On Diff #32814)

Why test v16i32? IMHO the output is huge without providing more coverage.

Should this be v16i16 or something?

ashahid added inline comments.Aug 27 2015, 2:03 AM
test/CodeGen/X86/absdiff_128.ll
11–12 ↗(On Diff #32814)

Thanks for the input, do you mean to use explicit register name in case of ABI specific register. For example, here if we are expecting EAX & ECX abi registers respectively, I should specify %eax & %ecx?

Hi Shahid,

Ahmed is actually suggesting the opposite - your regexes are *too*
restrictive. Consider relaxing them instead:

[[SRC:%.*]]

James

Hi Shahid,

Ahmed is actually suggesting the opposite - your regexes are *too*
restrictive. Consider relaxing them instead:

[[SRC:%.*]]

James

Ok

Updated the patch for

  1. result overflow documentation
  2. Test case fixes
  3. Refactoring of the expansion of llvm@absdiff intrinsic.

Hi Mikhail & others,

Please review, waiting for the responses for long time.

Regards,
Shahid

Hi Shahid,

The logic in LangRef looks fine to me, but I'd prefer someone else to review the SDAG part.

Michael

ab added a comment.Sep 8 2015, 5:32 PM

Thanks for the changes!

I'm still uncomfortable with the tests. The explicit CHECK-NEXT approach would apply much better to the smaller tests in _128.ll than the big v16i16 one, I think. Regexes would also help make the tests less brittle while keeping them as useful.

Also, should we merge the two test files together? IMO, the _128/_256 distinction isn't very useful.

test/CodeGen/X86/absdiff_128.ll
66–67 ↗(On Diff #33309)

These lines could be simplified into:

; CHECK: movzbl
; CHECK: movzbl

Ditto for others, e.g. pextrw below.

test/CodeGen/X86/absdiff_256.ll
25–27 ↗(On Diff #33309)

Why do these need CHECK-DAG?

Hi Ahmed,

Thanks for the comments. I will do the needful.

Regarding the test merging, IMO, even if it is not useful now, having a place holder as _256.ll is not a bad idea.

Regards,
Shahid

test/CodeGen/X86/absdiff_256.ll
25–27 ↗(On Diff #33309)

This is for the case where psubw and pxor comes in different order. Doesn't it qualify for CHECK-DAG?

Updated the tests to make it less brittle.

hfinkel added inline comments.Sep 18 2015, 9:26 AM
docs/LangRef.rst
10873 ↗(On Diff #34417)

I'd phrase this differently, and say that the intermediate calculations are computed using infinitely-precise unsigned arithmetic. That's similar to the language we used for inbounds GEPs.

10879 ↗(On Diff #34417)

We cannot have target-defined behavior for target-independent IR intrinsics. You can say only that the result is undefined. If an application wants to rely on target-specific overflow behavior, then it will need to directly use some target-specific intrinsics, inline asm, etc.

lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
747 ↗(On Diff #34417)

Should this be ISD::SETGE for both the signed and unsigned case?

Hi Hal,

Thanks for your comments, will update the doc accordingly.
See other response inlined.

Regards,
Shahid

lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
747 ↗(On Diff #34417)

Here ISD::SETGE is for signed case only, unsigned case is handled with early exit.

hfinkel added inline comments.Sep 20 2015, 5:50 AM
lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
747 ↗(On Diff #34417)

Okay, I see, you're using the TRUNCATE above.

docs/LangRef.rst updated to incorporate Hal's comments

hfinkel accepted this revision.Sep 23 2015, 6:04 AM
hfinkel edited edge metadata.

LGTM.

This revision was automatically updated to reflect the committed changes.