This is an archive of the discontinued LLVM Phabricator instance.

Allow binop C1, (select cc, CF, CT) -> select folding
ClosedPublic

Authored by rampitec on Jun 15 2018, 9:10 AM.

Details

Summary

Previously this folding was done only if select is a first operand.
However, for non-commutative operations constant may go before
select.

Diff Detail

Repository
rL LLVM

Event Timeline

rampitec created this revision.Jun 15 2018, 9:10 AM
This revision is now accepted and ready to land.Jun 15 2018, 12:37 PM

Would be good to have additional small test for some other target (e.g. x86), too.

rampitec added a comment.EditedJun 16 2018, 12:59 AM

Would be good to have additional small test for some other target (e.g. x86), too.

Sure. It turns out x86 needs it even more probably. I have created the test and it turns out both andl and orl were present before the patch.

rampitec updated this revision to Diff 151609.Jun 16 2018, 1:00 AM

Added x86 test.
Updated amdgcn test.

lebedev.ri added inline comments.
test/CodeGen/X86/dagcombine-select.ll
1 ↗(On Diff #151609)

Most tests (and practically all new x86 tests) use utils/update_llc_test_checks.py script to auto-generate these check lines.

3–17 ↗(On Diff #151609)

Hm, is there some omitted instruction, or is this actually better than what we currently normally do?
https://godbolt.org/g/7ULPfH

rampitec updated this revision to Diff 151610.Jun 16 2018, 1:31 AM

Updated x86 asm to contain full ISA.

rampitec marked an inline comment as done.Jun 16 2018, 1:37 AM
rampitec added inline comments.
test/CodeGen/X86/dagcombine-select.ll
3–17 ↗(On Diff #151609)

Yes. I have updated the test to contain the full ISA. First xor to zero out eax was omitted.
I am not sure what compiler explorer does, but that is what trunk llc has produced:

xorl    %eax, %eax
cmpl    $11, %edi
setl    %al
decl    %eax
andl    %esi, %eax
retq

I assume difference comes from running or not running opt.

rampitec added inline comments.Jun 16 2018, 1:42 AM
test/CodeGen/X86/dagcombine-select.ll
3–17 ↗(On Diff #151609)

E.g. compare w/o opt: https://godbolt.org/g/NMZ9he

lebedev.ri added inline comments.Jun 16 2018, 2:00 AM
test/CodeGen/X86/dagcombine-select.ll
1 ↗(On Diff #151609)

(not actually done, that does not look like the utility's output, and the first line does not say the script was used)

3–17 ↗(On Diff #151609)

Ok, so the only difference is that the strictness of the comparison is inverted (cmovg vs cmovge and the other way around).

rampitec marked an inline comment as done.Jun 16 2018, 2:24 AM
rampitec added inline comments.
test/CodeGen/X86/dagcombine-select.ll
1 ↗(On Diff #151609)

Oops, sorry. Clicked wrong checkbox. Now updated.

3–17 ↗(On Diff #151609)

Do you see "and eax, esi" instruction on the second from the right tab using my link? This one was eliminated.

In general if you run opt that is not an issue. If you run llc only (as in the test) you can see it. Normally that shall not happen, as I wrote in the description InstCombine can get rid of it. However, amdgcn can produce it during lowering and InstCombine does not play.

rampitec updated this revision to Diff 151611.Jun 16 2018, 2:25 AM
rampitec marked an inline comment as done.

Updated x86 test with utils/update_llc_test_checks.py

Sounds reasonable.

test/CodeGen/X86/dagcombine-select.ll
3–17 ↗(On Diff #151609)

Of course.

I was only talking about comparing the llc(D48223) test/CodeGen/X86/dagcombine-select.ll with the opt(trunk) test/CodeGen/X86/dagcombine-select.ll | llc(trunk) output.

What about
(X / Y) != 0 -> X >= Y ?

(X, Y are unsigned)

spatel requested changes to this revision.Jun 16 2018, 7:10 AM
spatel added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1883–1889 ↗(On Diff #151611)

This change is independent of the and/or enhancement. We're now allowing folding when the binop has a constant operand 0. That seems like a good enhancement for non-commutative binops, but it should have its own tests using opcodes besides 'and'/'or' and be split into its own patch.

Example based on one of the original tests from rL296699:

define <2 x double> @sel_constants_fmul_constant_vec(i1 %cond) {
  %sel = select i1 %cond, <2 x double> <double -4.0, double 12.0>, <2 x double> <double 23.3, double 11.0>
  %bo = fsub <2 x double> <double 5.1, double 3.14>, %sel
  ret <2 x double> %bo
}
This revision now requires changes to proceed.Jun 16 2018, 7:10 AM

What about
(X / Y) != 0 -> X >= Y ?

(X, Y are unsigned)

This might be a good idea, but seems a separate change.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1883–1889 ↗(On Diff #151611)

I am not sure why this would be beneficial. For example:

sub (select 0, -1), x -> select (sub 0, x), (sub -1, x)
add (select 0, -1), x -> select x, (add -1, x)

In this case select would remain and two subs instead of one, seems worse.
For add both select and add would remain, so the benefit is not obvious.

In turn and/or have clear benefit in this change because binop completely goes away, and
In fact the same can be done to xor, but I have decided not to since xor would be needed anyway with a non-zero operand (and one would never have a select with both sizes zero).

spatel added inline comments.Jun 16 2018, 12:51 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1883–1889 ↗(On Diff #151611)

It's beneficial for exactly the same reason as the existing fold - we eliminate the binop. The fact that the case with constant operand 0 isn't already folded was just an oversight in the original patch.

Currently, this patch miscompiles on these cases, so we need more tests and a code change whether we treat that difference as part of this patch or not.

Here's a scalar example for Aarch64 in case it makes it easier to see:

define i32 @sel_constants_sub_constant_op0(i1 %cond) {
  %sel = select i1 %cond, i32 12, i32 42
  %bo = sub i32 500, %sel
  ret i32 %bo
}

Current codegen:

	tst	w0, #0x1
	mov	w8, #42
	orr	w9, wzr, #0xc
	csel	w8, w9, w8, ne
	mov	w9, #500
	sub	w0, w9, w8
	ret

And with this patch applied (miscompile):

	tst	w0, #0x1
	mov	w8, #-458  ; 500 - 42 != -458
	mov	w9, #-488  ; 500 - 12 != -488
	csel	w0, w9, w8, ne
	ret
rampitec added inline comments.Jun 18 2018, 10:21 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1883–1889 ↗(On Diff #151611)

Yes, there is a bug. I will fix it.

rampitec added inline comments.Jun 18 2018, 12:16 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1883–1889 ↗(On Diff #151611)

Thanks for catching this! The confusion on my side was about "constant operand 0". I have misread it as "constant zero" instead of "operand zero". I will update diff shortly.

rampitec updated this revision to Diff 151763.Jun 18 2018, 12:26 PM
rampitec marked 5 inline comments as done.
rampitec retitled this revision from DAG combine "and|or (select c, -1, 0), x" -> "select c, x, 0|-1" to Allow binop C1, (select cc, CF, CT) -> select folding.
rampitec edited the summary of this revision. (Show Details)
  • Fixed handling of non-commutative operations if arguments are swapped.
  • Added tests for non-commutative operations with all-const value.
  • Retitled patch accordingly.
spatel added inline comments.Jun 18 2018, 2:01 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1883–1889 ↗(On Diff #151611)

Ah...I can see how that was confusing. :)

I think this patch is correct now, but let me make a couple of requests:

  1. Commit the new PPC and x86 tests with assertions based on trunk (ie, without this patch applied). It's easier to review if we can see the before/after changes directly in this review. I'd do the same for the AMDGPU tests too, but I'm not qualified to review those diffs anyway.
  2. Split the and/or enhancement into a follow-up patch. IIUC, that's independent of allowing the more flexible constant matching.
rampitec added inline comments.Jun 18 2018, 2:25 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1883–1889 ↗(On Diff #151611)

Will do. I will not do so with amdgpu tests as these are not autogenerated, but will do it with x86 and PPC.

rampitec added inline comments.Jun 18 2018, 2:58 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1883–1889 ↗(On Diff #151611)

Baseline tests committed https://reviews.llvm.org/rL334987

rampitec updated this revision to Diff 151799.Jun 18 2018, 3:08 PM
rampitec edited the summary of this revision. (Show Details)

Only commute part of the change.

spatel added inline comments.Jun 18 2018, 4:04 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1904 ↗(On Diff #151799)

Nit: I'd call this 'C' or 'CBO' or 'CBinOp' since it's not always operand 1 now.

1913–1914 ↗(On Diff #151799)

When does this happen (is there a test)? Why does this only happen when the select is operand 1 of the binop? Better to remove the 'SelOpNo' condition to be safer?

rampitec added inline comments.Jun 18 2018, 4:40 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1913–1914 ↗(On Diff #151799)

That is if shift value and amount have different types. On x86 shift amount is i8 regardless of LHS. The new test shl_constant_sel_constants does not fold on x86 but does on amdgpu. W/o the check this test asserts. At the same time old test does work on x86 and folds correctly, so I think it is only needed if I have swapped operands. In general that should be possible to write a piece of code to create correct VT for shifts, but I'd better leave that to x86 folks. I guess range checking will be also needed if such a code is about to be written.

rampitec updated this revision to Diff 151819.Jun 18 2018, 4:49 PM
rampitec marked an inline comment as done.
  • Added comment about shift VTs
  • Renamed C1 into CBO
rampitec updated this revision to Diff 151821.Jun 18 2018, 4:53 PM

Added x86 test for shifts with not reversed operands (as before the change).

arsenm added a subscriber: arsenm.Jun 19 2018, 4:17 AM
arsenm added inline comments.
test/CodeGen/AMDGPU/dagcombine-select.ll
1 ↗(On Diff #151821)

Use an i16 target for i16 tests

5 ↗(On Diff #151821)

Some tests with i16 would be useful (as well as vectors)

spatel added inline comments.Jun 19 2018, 8:50 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1913–1914 ↗(On Diff #151799)

Thanks - that was my guess, and the TODO + test makes it clear.

LGTM, but get final approval from @arsenm once the AMDGPU tests are updated.

rampitec updated this revision to Diff 151950.Jun 19 2018, 11:14 AM
rampitec marked 9 inline comments as done.
rampitec added a reviewer: arsenm.

Added some i16/f16/vector tests.

arsenm accepted this revision.Jun 20 2018, 9:43 AM

LGTM

rampitec added inline comments.Jun 20 2018, 11:29 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1913–1914 ↗(On Diff #151799)

So @arsenm has reviewed the tests. @spatel it looks like your previous vote technically holds the review now. Can this be submitted?

spatel accepted this revision.Jun 20 2018, 11:32 AM

LGTM.

This revision is now accepted and ready to land.Jun 20 2018, 11:32 AM
This revision was automatically updated to reflect the committed changes.