This is an archive of the discontinued LLVM Phabricator instance.

Disable DAGCombine for -O0 and optnone
Needs ReviewPublic

Authored by iid_iunknown on May 25 2015, 8:00 AM.

Details

Summary

Yet another attempt to disable DAGCombine for -O0 and optnone for better debugging experience.

Some additional info on the subject can be found in D7181, D8614 and in PR22346.

The previous attempt has been reverted due to the failures in instruction selection, which depends on the transforms done from DAGCombine. DAGCombine calls LegalizeOp that performs transformations such as ConstantPool -> <TargetConstantPool + Wrapper> and others. Their absence breaks instruction selection.

The patch disables combine leaving LegalizeOp enabled to satisfy instruction selection. The dependent tests were adjusted to reflect the changes in the code generated w/o combine.
It also has a fix in ARMISelDAGToDAG.cpp that prevents llc from crashing when compiling the CodeGen/ARM/2010-05-18-LocalAllocCrash.ll test with DAGCombine disabled, as described in PR22346.

I might be missing some important points so would like to have a sharp community's eye on this and hear any concerns / criticism about the patch.

*UPDATE*
The new version of the patch fixes broken FMA lowering on X86 (please find the comment from Michael Kuperstein).

Previously X86 FMA lowering was done at the combine step thereby requiring DAGCombine to get proper target specific FMA nodes. The patch moves this logic to the Legalize step to make FMA lowering independent on DAGCombine. It also adds a new CodeGen/X86/fma-no-dag-combine.ll test.

This broke fma_patterns.ll and other tests that expected operations like a * b - c to be combined into (fma a, b, (fneg c)). The patch was producing vfmadd instead of the expected vfmsub. The reason for this is that FNEG lowering is called before the added FMA lowering. FNEG gets transformed into other nodes making FMA lowering unable to match the pattern for vfmsub. To overcome this, the patch skips FNEG lowering if it is an FMA operand (LowerFABSorFNEG in X86ISelLowering.cpp).

Diff Detail

Repository
rL LLVM

Event Timeline

iid_iunknown retitled this revision from to Disable DAGCombine for -O0 and optnone.
iid_iunknown updated this object.
iid_iunknown edited the test plan for this revision. (Show Details)
iid_iunknown set the repository for this revision to rL LLVM.
iid_iunknown added a subscriber: Unknown Object (MLST).
asl added a subscriber: asl.May 25 2015, 8:14 AM
probinson edited edge metadata.May 26 2015, 12:28 PM

I had one inline comment about how to test for the correct conditions.

I think the updated 'dag-optnone.ll' already covers everything that's in 'fastmath-optnone.ll' so you don't need to add the latter as a new test.

I'm not qualified to review all the other test changes, someone else will need to do that.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1204

It should be sufficient to test OptLevel here. OptimizeNone should have reset OptLevel for the current function, so you should not need to check the attribute directly.

Thank you for your feedback Paul!
I will update the patch according to your remarks. New version will also contain a fix for the CodeGen/ARM/2010-05-18-LocalAllocCrash.ll test, which crashes without DAGCombine as you noticed in PR22346. Just missed it somehow when creating the patch file.

iid_iunknown edited edge metadata.

Corrections according to the remarks from Paul.
Fix for CodeGen/ARM/2010-05-18-LocalAllocCrash.ll crash (ARMISelDAGToDAG.cpp).

mkuper added a subscriber: mkuper.May 27 2015, 3:06 PM

So, there's something that I'm fairly sure will break on x86, and it doesn't seem to be covered by a test. :-(
The x86 PerformFMACombine actually performs an essential part of isel - it lowers a target-independent ISD node into a target-dependent one.

Now, for cases where the FMA comes from an x86 intrinsic, it's not an issue, since we never get the target-independent ISD. For cases where the FMA itself is constructed by a DAGCombine (which is what fma_patterns tests), it's not an issue either, because the FMA never gets formed. But it will be hit when the FMA comes from a target-independent intrinsic.

TL;DR:
Try compiling this with -mattr=+fma -O0 with your patch, I expect it to fail:

declare <4 x float> @llvm.fma.v4f32(<4 x float> %a, <4 x float>  %b, <4 x float>  %c)

define <4 x float> @test_fma1(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) {
  %res = call <4 x float> @llvm.fma.v4f32(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2)
  ret <4 x float> %res
}
asl added a comment.May 27 2015, 3:33 PM

So, there's something that I'm fairly sure will break on x86, and it doesn't seem to be covered by a test. :-(
The x86 PerformFMACombine actually performs an essential part of isel - it lowers a target-independent ISD node into a target-dependent one.

Now, for cases where the FMA comes from an x86 intrinsic, it's not an issue, since we never get the target-independent ISD. For cases where the FMA itself is constructed by a DAGCombine (which is what fma_patterns tests), it's not an issue either, because the FMA never gets formed. But it will be hit when the FMA comes from a target-independent intrinsic.

TL;DR:
Try compiling this with -mattr=+fma -O0 with your patch, I expect it to fail:

declare <4 x float> @llvm.fma.v4f32(<4 x float> %a, <4 x float>  %b, <4 x float>  %c)

define <4 x float> @test_fma1(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) {
  %res = call <4 x float> @llvm.fma.v4f32(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2)
  ret <4 x float> %res
}

Wow... then it should be a part of Legalize step.

So, there's something that I'm fairly sure will break on x86, and it doesn't seem to be covered by a test. :-(
The x86 PerformFMACombine actually performs an essential part of isel - it lowers a target-independent ISD node into a target-dependent one.

Now, for cases where the FMA comes from an x86 intrinsic, it's not an issue, since we never get the target-independent ISD. For cases where the FMA itself is constructed by a DAGCombine (which is what fma_patterns tests), it's not an issue either, because the FMA never gets formed. But it will be hit when the FMA comes from a target-independent intrinsic.

TL;DR:
Try compiling this with -mattr=+fma -O0 with your patch, I expect it to fail:

declare <4 x float> @llvm.fma.v4f32(<4 x float> %a, <4 x float>  %b, <4 x float>  %c)

define <4 x float> @test_fma1(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) {
  %res = call <4 x float> @llvm.fma.v4f32(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2)
  ret <4 x float> %res
}

Thank you Michael! You are right - your test fails with the patch.
The next version of the patch moves FMA lowering on X86 from combine to the Legalize step, thus it no longer depends on DAGCombine availability. It also adds a test you provided. I will update the review summary with the description of the changes made. Would you be able to take a look, please?

iid_iunknown updated this object.

Fix for broken FMA lowering on X86 according to the comment from Michael Kuperstein.

mkuper added a comment.Jun 3 2015, 1:03 AM

I'm not sure it's a good idea to move the entire combine into lowering.
The FNEG treatment is an actual combine. And I'm not 100% it will work correctly now (e.g. in cases where there are multiple users for the FNEG).

It would probably be better to leave the combine as is, but add a trivial "safety-net" lowering for ISD::FMA -> X86ISD::FMADD.

Adding Elena, who knows this code better than I do.

delena edited edge metadata.Jun 3 2015, 1:16 AM

All lit tests should pass in the -O0 mode. The code may be not optimal, but the compilation should not fail and the generated code should correct.

lib/Target/X86/X86ISelLowering.cpp
17616

You should not do the work of DAG-combiner in lowering. Lowering should deal with one SDNode.
Please put the FMA-combine back and just translate ISD::FMA to X86ISD::FMADD here.

I'm not sure it's a good idea to move the entire combine into lowering.
The FNEG treatment is an actual combine. And I'm not 100% it will work correctly now (e.g. in cases where there are multiple users for the FNEG).

It would probably be better to leave the combine as is, but add a trivial "safety-net" lowering for ISD::FMA -> X86ISD::FMADD.

Thanks Michael. I will look into this.

All lit tests should pass in the -O0 mode. The code may be not optimal, but the compilation should not fail and the generated code should correct.

Not sure I get this point correctly.
Many tests currently run without -O0 and their CHECK's expect some optimizations to happen. This doesn't mean the tests won't compile with -O0. This means the CHECK's will have to be changed if -O0 is used.

The patch has disabled DAGCombine for -O0. This broke some tests running with -O0 as their CHECK's were written with DAGCombine in mind. I have changed the CHECK's for those tests whose differences in the code generated with and without DAGCombine in -O0 were relatively small. Several tests however had quite significant differences in the generated code, and required full redesign of their CHECK's. Instead, DAGCombine was enabled for them by changing -O0 -> -O1 to get nearly the same code as the CHECK's expected. Please let me know if this is not the best way to handle such tests and we should rather stick to the 1st option (leave -O0 and redesign the CHECK's).

lib/Target/X86/X86ISelLowering.cpp
17616

Thank you for the clarification Elena. I will submit a new patch shortly.

delena added a comment.Jun 3 2015, 6:51 AM

I just afraid that some tests will fail to select instruction if you run them without DAGCombine.
So you should run ALL tests with O0 and be sure that they don't fail.

  • Elena
mkuper added a comment.Jun 3 2015, 7:25 AM

I think what Elena means is that it may be a good idea to check that the tests that currently run with -O3 don't *crash* (because of selection failures) when run with -O0.
They can, of course, fail due to unmatched CHECK lines.

probinson resigned from this revision.Jan 19 2016, 2:39 PM
probinson removed a reviewer: probinson.