Page MenuHomePhabricator

[SelDag][MIR] Add FREEZE
Needs ReviewPublic

Authored by aqjune on Jan 23 2017, 3:56 AM.

Details

Summary
  • Add FREEZE node to SelDag
  • Lower FreezeInst (in IR) to FREEZE node
  • Add Legalization for FREEZE node
  • Add FREEZE pseudoinstruction to MachineIR
  • Update description about IMPLICIT_DEF and FREEZE of MachineIR
  • Let ExpandPostRAPseudos expand FREEZE
  • Add a comment to TargetPassConfig stating that after ExpandPostRAPseudos IMPLICIT_DEF should not be optimized to different registers/constants

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Also, I didn't see a corresponding patch for FastISel. Did I miss one?

nlopes added a subscriber: nlopes.Jan 25 2017, 4:17 AM
nlopes added inline comments.
lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
2238 ↗(On Diff #85346)

If you have e.g.:
mul undef, undef

this eventually may get translated into:

%v1 = IMPLICIT_DEF
%v2 = IMPLICIT_DEF
%v3 = mul %v1, %v2

Even if it doesn't get translated to this explicitly, the semantics of UNDEF/IMPLICIT_DEF are that each read may see a different value. For example, IMPLICIT_DEF can be sank into a loop and each iteration may see a different value, and known-bits analyses will also take advantage of the fact that each bit can take any value it wants.
Therefore we use this "hack" of making a copy of the vreg where IMPLICIT_DEF is assigned to. This seems to guarantee that all uses see the same value, since there isn't a pass propagating IMPLICIT_DEF to copies.

aqjune updated this revision to Diff 85743.Jan 25 2017, 6:32 AM
  • FastISel for FREEZE added.
  • Updated description of FREEZE.
aqjune marked an inline comment as done.Jan 26 2017, 1:29 AM
mkuper added inline comments.Jan 26 2017, 2:36 PM
include/llvm/CodeGen/ISDOpcodes.h
180 ↗(On Diff #85743)

random -> arbitrary (sorry, it's a pet peeve :-) )

lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
2238 ↗(On Diff #85346)

If I understand correctly, the freeze.ll test shows you do end up with just an IMPLICIT_DEF eventually.
So I'm not sure where the benefit is? You rely on backend phase ordering to elide the copy only after the last time IMPLICIT_DEF may be duplicated? This seems a bit fishy.

Anyway, this is just a drive-by, I defer to Quentin and the other if they think this is reasonable.

aqjune updated this revision to Diff 86018.Jan 26 2017, 11:23 PM
aqjune marked an inline comment as done.
regehr added a subscriber: regehr.Jan 28 2017, 10:03 PM
aqjune updated this revision to Diff 106874.Jul 17 2017, 8:05 AM

Rebased to svn commit 308173

aqjune updated this revision to Diff 145379.May 5 2018, 12:11 PM

Rebased to svn commit 331585

All these new llc tests - have you considered using utils/update_llc_test_checks.py ?

aqjune updated this revision to Diff 217190.Aug 26 2019, 10:21 AM

Rebase to trunk@369887 (August 26th, 2019)

I guess this is the next patch in the queue?
This needs rebasing, and more tests to match the langref wording.

aqjune updated this revision to Diff 224861.Oct 14 2019, 8:04 AM
  • Support floating points, pointers, aggregate types, add tests for those types

I believe vector legalization tests are missing.

Cost model handling might be missing (i would guess they should be treated as free)

This looks reasonable-ish, but i don't have much knowledge on this part, sadly.
I'm wondering if @RKSimon / @spatel / @craig.topper might want to help review this?

include/llvm/CodeGen/ISDOpcodes.h
179 ↗(On Diff #224861)

s/integer/value/

craig.topper added inline comments.Oct 14 2019, 10:21 AM
lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
385 ↗(On Diff #224861)

Just use ISD::FREEZE instead of N->getOpcode()

4021 ↗(On Diff #224861)

I think this should be with the other PromoteIntOp functions? I think that's how this file is sorted.

4021 ↗(On Diff #224861)

Does this function even get called? The input type and the output type are the same right? So PromoteIntRes should have been called first.

4025 ↗(On Diff #224861)

Same with this. We should expand the result and then we'll never have to expand the input separately.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
3105 ↗(On Diff #224861)

Would we better off just handling freeze on its own instead of adding all this aggregate code to the generic visitUnary?

lebedev.ri requested changes to this revision.Oct 20 2019, 4:40 AM

(as per inline comments)

This revision now requires changes to proceed.Oct 20 2019, 4:40 AM
aqjune updated this revision to Diff 225827.Oct 21 2019, 1:04 AM
  • Remove DAGTypeLegalizer::PromoteIntOp_FREEZE, DAGTypeLegalizer::ExpandIntOp_FREEZE as they are never called
  • Add cost model of freeze
  • Add vector legalization test to freeze-legalize.ll
aqjune marked 6 inline comments as done.Oct 21 2019, 1:06 AM
aqjune added inline comments.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
3105 ↗(On Diff #224861)

I put it here as visitBinary() was also handling opcodes, but I have no special preference.

@qcolombet - ping as per inline comment?
I'm also not really fully sold that the SelectionDAGISel::Select_FREEZE()
approach is fully future-proof/correct.

Other than that, nothing major stands out to me here.

include/llvm/Analysis/TargetTransformInfoImpl.h
853 ↗(On Diff #225827)

Also here
And in TargetTransformInfo::getInstructionThroughput()

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4055 ↗(On Diff #225827)

Spurious newline change

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
3106 ↗(On Diff #225827)

assert(Opcode == ISD::FREEZE && "Can only get here for freeze` nodes");`

3110 ↗(On Diff #225827)
for (unsigned i = 0, NumOps = Op.getNumOperands(); i < NumOps; ++i)
lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
2238 ↗(On Diff #85346)

@qcolombet ping, please?

I was looking through MachineInstr's optimizations to recheck validity of using IMPLICIT_DEF, and found that it caused problems at two cases.

  1. When a function call is involved, its value is not preserved.
define i32 @foo()
  %y1 = freeze i32 undef
  %k = add i32 %y1, 10
  call i32 @g()
  %k2 = add i32 %y1, 20
  ..

->

_foo:
  addl  $10, %ebx
  callq _g
  addl  $20, %eax <- this is wrong because %eax and %ebx may differ
  ..
  1. When PHI node is involved, it is folded into an incorrect value.
  br i1 %cond, label %BB1, label %BB2
BB1:
  %y1 = freeze i32 undef
  %k1 = sub i32 %y1, 1
  br label %END
BB2:
  %y2 = freeze i32 undef
  br label %END
END:
  %p = phi i32 [%y1, %BB1], [%y2, %BB2]
  %p2 = phi i32 [%k1, %BB1], [0, %BB2]
  store i32 %p, i32* @x
  store i32 %p2, i32* @y

->

LBB0_3:                                 ## %END
  movl  %eax, _x(%rip)
  movl  %eax, _y(%rip) <- should be "%eax - 1" if %cond is true

But, other than these two (function call and phi), other operations seemed okay.
Arithmetic operations including add / mul didn't fold IMPLICIT_DEF in a undef-like way.

What would be a good way to address the two cases?
The latter seemed to be done by ProcessImplicitDefs.cpp / PHIElimination.cpp. The former one seems more complicated; it seems to require preservation of information about the original IMPLICIT_DEF before it is removed by later pipeline.

arsenm added a subscriber: arsenm.Oct 27 2019, 2:33 PM
arsenm added inline comments.
lib/CodeGen/SelectionDAG/FastISel.cpp
1573 ↗(On Diff #225827)

s/unsigned/Register

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
3109 ↗(On Diff #225827)

1 seems a bit small for the size

lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
2238 ↗(On Diff #85346)

I see one potential problem with using a COPY of IMPLICIT_DEF. Suppose an instruction has two registers operands with different register classes. A codegen pass may reasonably try to rematerialize one of the implicit_defs to the natural operand register class to avoid the copy to satisfy operand constraints

2302 ↗(On Diff #225827)

s/unsigned/Register/

test/CodeGen/X86/freeze.ll
89 ↗(On Diff #225827)

I'm not sure where Phabricator gets it syntax highlighting from, but it should add the freeze keyword

aqjune updated this revision to Diff 228705.Nov 11 2019, 8:56 AM
  • Add FREEZE pseudoinstruction for MachineIR
  • Let ExpandPostRAPseudos expand FREEZE
  • Address reviewers' comments
  • Add tests for function call / phi
  • Add a comment to TargetPassConfig stating that after ExpandPostRAPseudos IMPLICIT_DEF should not be optimized to different registers/constants
Herald added a project: Restricted Project. · View Herald TranscriptNov 11 2019, 8:56 AM

I added FREEZE pseudoinstruction to MachineIR, as it seemed to be the succinct way to make it correct.
I made ExpandPostRAPseudos expand the FREEZE to register-copy assembly instruction. ExpandPostRAPseudos pass is done after register allocation (which, or at least a pass relevant to it, seems to replace uses of IMPLICIT_DEF with different registers) as well as ProcessImplicitDef (which is the culprit of the incorrect PHI optimization example) and PHIElimination. I left a commit that explicitly states that IMPLICIT_DEF cannot be used in an undef-y way after the pass.

craig.topper added inline comments.Nov 11 2019, 7:50 PM
llvm/include/llvm/Support/TargetOpcodes.def
59

Comment says IMPLICIT_DEF

llvm/lib/CodeGen/SelectionDAG/FastISel.cpp
1571

Why doesn't this TargetOpcode::Freeze?

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
3106

Separate this into just visitFreeze.

aqjune updated this revision to Diff 228816.Nov 11 2019, 10:58 PM
aqjune edited the summary of this revision. (Show Details)
  • Let FastISel emit FREEZE
  • Let SelectionDAGBuilder::visitFreeze lower freeze IR instructions
  • Minor fixes
aqjune marked 3 inline comments as done.Nov 11 2019, 10:59 PM
aqjune retitled this revision from [SelDag] Implement FREEZE node to [SelDag][MIR] Add FREEZE .Nov 11 2019, 11:07 PM
craig.topper added inline comments.Nov 11 2019, 11:42 PM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
3104

Drop this change?

lkail added a subscriber: lkail.Nov 12 2019, 12:02 AM

Do you also want a MIR test that shows IR freeze -> MIR freeze lowering?

aqjune updated this revision to Diff 229230.Wed, Nov 13, 10:45 PM
aqjune edited the summary of this revision. (Show Details)
  • Add freeze IR -> MIR test
aqjune marked an inline comment as done.Wed, Nov 13, 10:46 PM
aqjune added inline comments.
llvm/test/CodeGen/X86/freeze-mir.ll
2

I made a separate freeze-mir.ll file for mir test because update_mir_test_checks.py did not work after freeze.ll is processed with update_llc_test_checks.py

Hmm, thanks, i think this now looks good to me.
@craig.topper / @qcolombet / @arsenm ?

llvm/test/CodeGen/X86/freeze-call.ll
8–13

You can avoid cfi noise via define i32 @foo() nounwind {

aqjune updated this revision to Diff 229443.Thu, Nov 14, 8:47 PM
  • Add nounwind to tests
arsenm added inline comments.Thu, Nov 14, 8:51 PM
llvm/include/llvm/Target/Target.td
1067

Why isNotDuplicable? I don't think this is the best maintained instruction property

aqjune marked an inline comment as done.Thu, Nov 14, 9:36 PM
aqjune added inline comments.
llvm/include/llvm/Target/Target.td
1067

It is because different freeze defs can yield different values. For example, consider following transformation:

%undef = IMPLICIT_DEF
%x = FREEZE %undef
use(%x)
use(%x) // these two uses should see the same freezed value
->
%undef = IMPLICIT_DEF
%x = FREEZE %undef
use(%x)
%x' = FREEZE %undef
use(%x') // It is possible that %x and %x' are assigned differently freezed values.

This transformation is incorrect, because use()s after the transformation can see different values.
To prevent this class of optimizations, isNotDuplicable is set to 1.

arsenm added inline comments.Thu, Nov 14, 9:47 PM
llvm/include/llvm/Target/Target.td
1067

But these both are using the same undef vreg? The second freeze is still using the original undef, so this should be fine?

arsenm added inline comments.Thu, Nov 14, 9:53 PM
llvm/include/llvm/Target/Target.td
1067

To clarify, I think the property you are really looking for is not the property given to you by isNotDuplicable. What you really care about is the input operand not using a new instance of undef. The "nonduplicability" is a function of the input value and not the instruction itself

aqjune marked an inline comment as done.Thu, Nov 14, 9:55 PM
aqjune added inline comments.
llvm/include/llvm/Target/Target.td
1067

The second use of the undef register can be assigned a different physical register at (perhaps) register allocation, after seeing that its definition is IMPLICIT_DEF.

%undef = IMPLICIT_DEF
%x = FREEZE %undef
use(%x)
%x' = FREEZE %undef
use(%x')
=>
%x = FREEZE $eax
use(%x)
%x' = FREEZE $ebx
use(%x')

This transformation can happen if there is a function call between them. test/CodeGen/X86/freeze-call.ll checks that it never happens.

arsenm added inline comments.Thu, Nov 14, 9:59 PM
llvm/include/llvm/Target/Target.td
1067

I think handling this correctly requires changing to how IMPLICIT_DEF is handled in register allocation. I don't think noduplicate is really going to model this constraint sufficiently

aqjune marked an inline comment as not done.Thu, Nov 14, 10:13 PM
aqjune added inline comments.
llvm/include/llvm/Target/Target.td
1067

It would make complexity of this patch (lowering of freeze IR instruction) easier, but would the change in register allocation need performance tests?
For example, if all uses of IMPLICIT_DEF should see a same caller-save register, the register should be spilled whenever its liveness crosses a function call.
Currently IMPLICIT_DEF is commented as MachineInstr-level equivalent of undef, and undef`may evaluate to different values as well, so I wonder whether there are passes other than regalloc that use that semantics too.

isNotDuplicable's comment says

class MCInstrDesc {
...
  /// Return true if this instruction cannot be safely
  /// duplicated.  For example, if the instruction has a unique labels attached
  /// to it, duplicating it would cause multiple definition errors.
  bool isNotDuplicable() const { return Flags & (1ULL << MCID::NotDuplicable); }

so I thought non-duplicability was a property of the freeze instruction itself.

arsenm added inline comments.Thu, Nov 14, 10:40 PM
llvm/include/llvm/Target/Target.td
1067

I mean the definition of IMPLICIT_DEF needs to change to support this. I think we may actually need two different flavors of IMPLICIT_DEF. noduplicate is a stronger barrier to useful transforms, and also doesn't model the real problem here.

If you started out with two separate freeze instructions reading the same IMPLICIT_DEF register, you would still see the same problem in your example.

aqjune added inline comments.Thu, Nov 14, 11:36 PM
llvm/include/llvm/Target/Target.td
1067

I mean the definition of IMPLICIT_DEF needs to change to support this. I think we may actually need two different flavors of IMPLICIT_DEF.

Understood. So we can have two kinds of IMPLICIT_DEFs; the first one is an instruction that can be optimized to different values whenever used. The second one has the same value whenever it is used.

I think the second IMPLICIT_DEF is equivalent to FREEZE (IMPLICIT_DEF of the first kind), because freeze is an instruction that guarantees all uses of the value see the same value even though the operand is an undef-like value.

If you started out with two separate freeze instructions reading the same IMPLICIT_DEF register, you would still see the same problem in your example.

If the program originally contained two separated freezes, then it is okay, because the program is originally written in that way. The programmer must have written down two freezes in LLVM IR bitcode, and they are lowered into MIR.
The problematic case is when a program with a single freeze is optimized to a program with several freezes. Different freezes can return different values.

%undef = IMPLICIT_DEF
%x = FREEZE %undef
use(%x)
use(%x) // these two uses see the same value with help of freeze.
->
%undef = IMPLICIT_DEF
%x = FREEZE %undef
use(%x)
%x' = FREEZE %undef
use(%x') // Now %x and %x' can be assigned different values, so this optimization is wrong.
arsenm added inline comments.Thu, Nov 14, 11:50 PM
llvm/include/llvm/Target/Target.td
1067

I think it's important to define the rules here clearly for MIR, and not rely on expectations from the original IR. Some legalization for example may end up inserting multiple freezes of the same input value, and that should work correctly. Multiple freezes of the same input register should produce the same value.

aqjune added inline comments.Fri, Nov 15, 12:49 AM
llvm/include/llvm/Target/Target.td
1067

I agree, I'll write the rule for freeze.
Regarding legalization, in which case can it be copied? The case that I was aware of was splitting a register of illegal size, which copies freeze but does not increase # of uses per definition:

%x = ... // invalid type, say i50
%y = freeze %x
  =>
%x1 = ... // i32
%x2 = ... // i18
%y1 = freeze %x1
%y2 = freeze %x2
liuz added a subscriber: liuz.Fri, Nov 15, 8:39 PM
lebedev.ri requested changes to this revision.Fri, Nov 22, 3:19 AM

Looks like this is waiting on an update.
Really happy to see this progressing.

This revision now requires changes to proceed.Fri, Nov 22, 3:19 AM
aqjune updated this revision to Diff 230899.Mon, Nov 25, 7:27 AM
aqjune edited the summary of this revision. (Show Details)

I added description about FREEZE and IMPLICIT_DEF to TargetOpcodes.def.
To address the legalization issue, updated the semantics so FREEZEs return the same value if placed consecutively. Would this properly address the issue?