Previously this folding was done only if select is a first operand.
However, for non-commutative operations constant may go before
select.
Details
Diff Detail
Event Timeline
Sure. It turns out x86 needs it even more probably. I have created the test and it turns out both andl and orl were present before the patch.
test/CodeGen/X86/dagcombine-select.ll | ||
---|---|---|
1 | Most tests (and practically all new x86 tests) use utils/update_llc_test_checks.py script to auto-generate these check lines. | |
3–17 | Hm, is there some omitted instruction, or is this actually better than what we currently normally do? |
test/CodeGen/X86/dagcombine-select.ll | ||
---|---|---|
3–17 | Yes. I have updated the test to contain the full ISA. First xor to zero out eax was omitted. xorl %eax, %eax cmpl $11, %edi setl %al decl %eax andl %esi, %eax retq I assume difference comes from running or not running opt. |
test/CodeGen/X86/dagcombine-select.ll | ||
---|---|---|
3–17 | E.g. compare w/o opt: https://godbolt.org/g/NMZ9he |
test/CodeGen/X86/dagcombine-select.ll | ||
---|---|---|
1 | Oops, sorry. Clicked wrong checkbox. Now updated. | |
3–17 | Do you see "and eax, esi" instruction on the second from the right tab using my link? This one was eliminated. In general if you run opt that is not an issue. If you run llc only (as in the test) you can see it. Normally that shall not happen, as I wrote in the description InstCombine can get rid of it. However, amdgcn can produce it during lowering and InstCombine does not play. |
Sounds reasonable.
test/CodeGen/X86/dagcombine-select.ll | ||
---|---|---|
3–17 | Of course. I was only talking about comparing the llc(D48223) test/CodeGen/X86/dagcombine-select.ll with the opt(trunk) test/CodeGen/X86/dagcombine-select.ll | llc(trunk) output. |
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
1883–1889 | This change is independent of the and/or enhancement. We're now allowing folding when the binop has a constant operand 0. That seems like a good enhancement for non-commutative binops, but it should have its own tests using opcodes besides 'and'/'or' and be split into its own patch. Example based on one of the original tests from rL296699: define <2 x double> @sel_constants_fmul_constant_vec(i1 %cond) { %sel = select i1 %cond, <2 x double> <double -4.0, double 12.0>, <2 x double> <double 23.3, double 11.0> %bo = fsub <2 x double> <double 5.1, double 3.14>, %sel ret <2 x double> %bo } |
This might be a good idea, but seems a separate change.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
1883–1889 | I am not sure why this would be beneficial. For example: sub (select 0, -1), x -> select (sub 0, x), (sub -1, x) add (select 0, -1), x -> select x, (add -1, x) In this case select would remain and two subs instead of one, seems worse. In turn and/or have clear benefit in this change because binop completely goes away, and |
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
1883–1889 | It's beneficial for exactly the same reason as the existing fold - we eliminate the binop. The fact that the case with constant operand 0 isn't already folded was just an oversight in the original patch. Currently, this patch miscompiles on these cases, so we need more tests and a code change whether we treat that difference as part of this patch or not. Here's a scalar example for Aarch64 in case it makes it easier to see: define i32 @sel_constants_sub_constant_op0(i1 %cond) { %sel = select i1 %cond, i32 12, i32 42 %bo = sub i32 500, %sel ret i32 %bo } Current codegen: tst w0, #0x1 mov w8, #42 orr w9, wzr, #0xc csel w8, w9, w8, ne mov w9, #500 sub w0, w9, w8 ret And with this patch applied (miscompile): tst w0, #0x1 mov w8, #-458 ; 500 - 42 != -458 mov w9, #-488 ; 500 - 12 != -488 csel w0, w9, w8, ne ret |
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
1883–1889 | Yes, there is a bug. I will fix it. |
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
1883–1889 | Thanks for catching this! The confusion on my side was about "constant operand 0". I have misread it as "constant zero" instead of "operand zero". I will update diff shortly. |
- Fixed handling of non-commutative operations if arguments are swapped.
- Added tests for non-commutative operations with all-const value.
- Retitled patch accordingly.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
1883–1889 | Ah...I can see how that was confusing. :) I think this patch is correct now, but let me make a couple of requests:
|
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
1883–1889 | Will do. I will not do so with amdgpu tests as these are not autogenerated, but will do it with x86 and PPC. |
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
1883–1889 | Baseline tests committed https://reviews.llvm.org/rL334987 |
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
1924–1925 | That is if shift value and amount have different types. On x86 shift amount is i8 regardless of LHS. The new test shl_constant_sel_constants does not fold on x86 but does on amdgpu. W/o the check this test asserts. At the same time old test does work on x86 and folds correctly, so I think it is only needed if I have swapped operands. In general that should be possible to write a piece of code to create correct VT for shifts, but I'd better leave that to x86 folks. I guess range checking will be also needed if such a code is about to be written. |
This change is independent of the and/or enhancement. We're now allowing folding when the binop has a constant operand 0. That seems like a good enhancement for non-commutative binops, but it should have its own tests using opcodes besides 'and'/'or' and be split into its own patch.
Example based on one of the original tests from rL296699: