- Add FREEZE node to SelDag
- Lower FreezeInst (in IR) to FREEZE node
- Add Legalization for FREEZE node
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
include/llvm/CodeGen/ISDOpcodes.h | ||
---|---|---|
180 ↗ | (On Diff #85346) | Please add more documentation here. |
lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp | ||
2238 ↗ | (On Diff #85346) | I'm a bit confused about this. The function above selects UNDEF to IMPLICIT_DEF. (Note that the documentation for IMPLICIT_DEF just says that "this is the MachineInstr-level equivalent of undef." If we're not allowed to duplicate IMPLICIT_DEFs and they have a single defined value, that should be corrected.) |
lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp | ||
---|---|---|
2238 ↗ | (On Diff #85346) | If you have e.g.: this eventually may get translated into: %v1 = IMPLICIT_DEF %v2 = IMPLICIT_DEF %v3 = mul %v1, %v2 Even if it doesn't get translated to this explicitly, the semantics of UNDEF/IMPLICIT_DEF are that each read may see a different value. For example, IMPLICIT_DEF can be sank into a loop and each iteration may see a different value, and known-bits analyses will also take advantage of the fact that each bit can take any value it wants. |
include/llvm/CodeGen/ISDOpcodes.h | ||
---|---|---|
180 ↗ | (On Diff #85743) | random -> arbitrary (sorry, it's a pet peeve :-) ) |
lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp | ||
2238 ↗ | (On Diff #85346) | If I understand correctly, the freeze.ll test shows you do end up with just an IMPLICIT_DEF eventually. Anyway, this is just a drive-by, I defer to Quentin and the other if they think this is reasonable. |
All these new llc tests - have you considered using utils/update_llc_test_checks.py ?
I guess this is the next patch in the queue?
This needs rebasing, and more tests to match the langref wording.
I believe vector legalization tests are missing.
Cost model handling might be missing (i would guess they should be treated as free)
This looks reasonable-ish, but i don't have much knowledge on this part, sadly.
I'm wondering if @RKSimon / @spatel / @craig.topper might want to help review this?
include/llvm/CodeGen/ISDOpcodes.h | ||
---|---|---|
179 ↗ | (On Diff #224861) | s/integer/value/ |
lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp | ||
---|---|---|
385 ↗ | (On Diff #224861) | Just use ISD::FREEZE instead of N->getOpcode() |
4021 ↗ | (On Diff #224861) | I think this should be with the other PromoteIntOp functions? I think that's how this file is sorted. |
4021 ↗ | (On Diff #224861) | Does this function even get called? The input type and the output type are the same right? So PromoteIntRes should have been called first. |
4025 ↗ | (On Diff #224861) | Same with this. We should expand the result and then we'll never have to expand the input separately. |
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
3105 ↗ | (On Diff #224861) | Would we better off just handling freeze on its own instead of adding all this aggregate code to the generic visitUnary? |
- Remove DAGTypeLegalizer::PromoteIntOp_FREEZE, DAGTypeLegalizer::ExpandIntOp_FREEZE as they are never called
- Add cost model of freeze
- Add vector legalization test to freeze-legalize.ll
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
---|---|---|
3105 ↗ | (On Diff #224861) | I put it here as visitBinary() was also handling opcodes, but I have no special preference. |
@qcolombet - ping as per inline comment?
I'm also not really fully sold that the SelectionDAGISel::Select_FREEZE()
approach is fully future-proof/correct.
Other than that, nothing major stands out to me here.
include/llvm/Analysis/TargetTransformInfoImpl.h | ||
---|---|---|
853 ↗ | (On Diff #225827) | Also here |
lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp | ||
4055 ↗ | (On Diff #225827) | Spurious newline change |
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
3106 ↗ | (On Diff #225827) | assert(Opcode == ISD::FREEZE && "Can only get here for freeze` nodes");` |
3110 ↗ | (On Diff #225827) | for (unsigned i = 0, NumOps = Op.getNumOperands(); i < NumOps; ++i) |
lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp | ||
2238 ↗ | (On Diff #85346) | @qcolombet ping, please? |
I was looking through MachineInstr's optimizations to recheck validity of using IMPLICIT_DEF, and found that it caused problems at two cases.
- When a function call is involved, its value is not preserved.
define i32 @foo() %y1 = freeze i32 undef %k = add i32 %y1, 10 call i32 @g() %k2 = add i32 %y1, 20 ..
->
_foo: addl $10, %ebx callq _g addl $20, %eax <- this is wrong because %eax and %ebx may differ ..
- When PHI node is involved, it is folded into an incorrect value.
br i1 %cond, label %BB1, label %BB2 BB1: %y1 = freeze i32 undef %k1 = sub i32 %y1, 1 br label %END BB2: %y2 = freeze i32 undef br label %END END: %p = phi i32 [%y1, %BB1], [%y2, %BB2] %p2 = phi i32 [%k1, %BB1], [0, %BB2] store i32 %p, i32* @x store i32 %p2, i32* @y
->
LBB0_3: ## %END movl %eax, _x(%rip) movl %eax, _y(%rip) <- should be "%eax - 1" if %cond is true
But, other than these two (function call and phi), other operations seemed okay.
Arithmetic operations including add / mul didn't fold IMPLICIT_DEF in a undef-like way.
What would be a good way to address the two cases?
The latter seemed to be done by ProcessImplicitDefs.cpp / PHIElimination.cpp. The former one seems more complicated; it seems to require preservation of information about the original IMPLICIT_DEF before it is removed by later pipeline.
lib/CodeGen/SelectionDAG/FastISel.cpp | ||
---|---|---|
1573 ↗ | (On Diff #225827) | s/unsigned/Register |
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
3109 ↗ | (On Diff #225827) | 1 seems a bit small for the size |
lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp | ||
2302 ↗ | (On Diff #225827) | s/unsigned/Register/ |
2238 ↗ | (On Diff #85346) | I see one potential problem with using a COPY of IMPLICIT_DEF. Suppose an instruction has two registers operands with different register classes. A codegen pass may reasonably try to rematerialize one of the implicit_defs to the natural operand register class to avoid the copy to satisfy operand constraints |
test/CodeGen/X86/freeze.ll | ||
89 ↗ | (On Diff #225827) | I'm not sure where Phabricator gets it syntax highlighting from, but it should add the freeze keyword |
- Add FREEZE pseudoinstruction for MachineIR
- Let ExpandPostRAPseudos expand FREEZE
- Address reviewers' comments
- Add tests for function call / phi
- Add a comment to TargetPassConfig stating that after ExpandPostRAPseudos IMPLICIT_DEF should not be optimized to different registers/constants
I added FREEZE pseudoinstruction to MachineIR, as it seemed to be the succinct way to make it correct.
I made ExpandPostRAPseudos expand the FREEZE to register-copy assembly instruction. ExpandPostRAPseudos pass is done after register allocation (which, or at least a pass relevant to it, seems to replace uses of IMPLICIT_DEF with different registers) as well as ProcessImplicitDef (which is the culprit of the incorrect PHI optimization example) and PHIElimination. I left a commit that explicitly states that IMPLICIT_DEF cannot be used in an undef-y way after the pass.
- Let FastISel emit FREEZE
- Let SelectionDAGBuilder::visitFreeze lower freeze IR instructions
- Minor fixes
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
---|---|---|
3104 | Drop this change? |
llvm/test/CodeGen/X86/freeze-mir.ll | ||
---|---|---|
1 ↗ | (On Diff #229230) | I made a separate freeze-mir.ll file for mir test because update_mir_test_checks.py did not work after freeze.ll is processed with update_llc_test_checks.py |
Hmm, thanks, i think this now looks good to me.
@craig.topper / @qcolombet / @arsenm ?
llvm/test/CodeGen/X86/freeze-call.ll | ||
---|---|---|
7–12 ↗ | (On Diff #229230) | You can avoid cfi noise via define i32 @foo() nounwind { |
llvm/include/llvm/Target/Target.td | ||
---|---|---|
1067 ↗ | (On Diff #229443) | Why isNotDuplicable? I don't think this is the best maintained instruction property |
llvm/include/llvm/Target/Target.td | ||
---|---|---|
1067 ↗ | (On Diff #229443) | It is because different freeze defs can yield different values. For example, consider following transformation: %undef = IMPLICIT_DEF %x = FREEZE %undef use(%x) use(%x) // these two uses should see the same freezed value -> %undef = IMPLICIT_DEF %x = FREEZE %undef use(%x) %x' = FREEZE %undef use(%x') // It is possible that %x and %x' are assigned differently freezed values. This transformation is incorrect, because use()s after the transformation can see different values. |
llvm/include/llvm/Target/Target.td | ||
---|---|---|
1067 ↗ | (On Diff #229443) | But these both are using the same undef vreg? The second freeze is still using the original undef, so this should be fine? |
llvm/include/llvm/Target/Target.td | ||
---|---|---|
1067 ↗ | (On Diff #229443) | To clarify, I think the property you are really looking for is not the property given to you by isNotDuplicable. What you really care about is the input operand not using a new instance of undef. The "nonduplicability" is a function of the input value and not the instruction itself |
llvm/include/llvm/Target/Target.td | ||
---|---|---|
1067 ↗ | (On Diff #229443) | The second use of the undef register can be assigned a different physical register at (perhaps) register allocation, after seeing that its definition is IMPLICIT_DEF. %undef = IMPLICIT_DEF %x = FREEZE %undef use(%x) %x' = FREEZE %undef use(%x') => %x = FREEZE $eax use(%x) %x' = FREEZE $ebx use(%x') This transformation can happen if there is a function call between them. test/CodeGen/X86/freeze-call.ll checks that it never happens. |
llvm/include/llvm/Target/Target.td | ||
---|---|---|
1067 ↗ | (On Diff #229443) | I think handling this correctly requires changing to how IMPLICIT_DEF is handled in register allocation. I don't think noduplicate is really going to model this constraint sufficiently |
llvm/include/llvm/Target/Target.td | ||
---|---|---|
1067 ↗ | (On Diff #229443) | It would make complexity of this patch (lowering of freeze IR instruction) easier, but would the change in register allocation need performance tests? isNotDuplicable's comment says class MCInstrDesc { ... /// Return true if this instruction cannot be safely /// duplicated. For example, if the instruction has a unique labels attached /// to it, duplicating it would cause multiple definition errors. bool isNotDuplicable() const { return Flags & (1ULL << MCID::NotDuplicable); } so I thought non-duplicability was a property of the freeze instruction itself. |
llvm/include/llvm/Target/Target.td | ||
---|---|---|
1067 ↗ | (On Diff #229443) | I mean the definition of IMPLICIT_DEF needs to change to support this. I think we may actually need two different flavors of IMPLICIT_DEF. noduplicate is a stronger barrier to useful transforms, and also doesn't model the real problem here. If you started out with two separate freeze instructions reading the same IMPLICIT_DEF register, you would still see the same problem in your example. |
llvm/include/llvm/Target/Target.td | ||
---|---|---|
1067 ↗ | (On Diff #229443) |
Understood. So we can have two kinds of IMPLICIT_DEFs; the first one is an instruction that can be optimized to different values whenever used. The second one has the same value whenever it is used. I think the second IMPLICIT_DEF is equivalent to FREEZE (IMPLICIT_DEF of the first kind), because freeze is an instruction that guarantees all uses of the value see the same value even though the operand is an undef-like value.
If the program originally contained two separated freezes, then it is okay, because the program is originally written in that way. The programmer must have written down two freezes in LLVM IR bitcode, and they are lowered into MIR. %undef = IMPLICIT_DEF %x = FREEZE %undef use(%x) use(%x) // these two uses see the same value with help of freeze. -> %undef = IMPLICIT_DEF %x = FREEZE %undef use(%x) %x' = FREEZE %undef use(%x') // Now %x and %x' can be assigned different values, so this optimization is wrong. |
llvm/include/llvm/Target/Target.td | ||
---|---|---|
1067 ↗ | (On Diff #229443) | I think it's important to define the rules here clearly for MIR, and not rely on expectations from the original IR. Some legalization for example may end up inserting multiple freezes of the same input value, and that should work correctly. Multiple freezes of the same input register should produce the same value. |
llvm/include/llvm/Target/Target.td | ||
---|---|---|
1067 ↗ | (On Diff #229443) | I agree, I'll write the rule for freeze. %x = ... // invalid type, say i50 %y = freeze %x => %x1 = ... // i32 %x2 = ... // i18 %y1 = freeze %x1 %y2 = freeze %x2 |
I added description about FREEZE and IMPLICIT_DEF to TargetOpcodes.def.
To address the legalization issue, updated the semantics so FREEZEs return the same value if placed consecutively. Would this properly address the issue?
I'm sorry, it seems the track of this patch was lost.
We really need to get this last bit of freeze support landed.
Can anyone spot any remaining issues with the patch?
llvm/include/llvm/Support/TargetOpcodes.def | ||
---|---|---|
61 ↗ | (On Diff #230899) | constraint -> constrain |
The patch & semantics look good to me, but I'm not a backend expert. I'll leave the final LGTM to someone else.
It would be awesome if we could get this in for 10.0 so that we have complete support for freeze.
To synchronize - the remaining issue was about duplicability of the freeze instruction in MIR.
In IR, a duplicated freeze may result in a different value due to the nature of undef. However, in MIR, this may be problematic in some cases.
My suggestion is to allow freeze in MIR to yield the same value when they are consecutively arranged.
This involved clarificaton of when the value of IMPLICIT_DEF could change during execution, and the updated comments of TargetOpcodes.def reflects that.
Currently, isNotDuplicable is still set to 1 to leave it as conservative & copiers deal with freeze specially in the future.
BTW, there was a discussion about IMPLICIT_DEF as well, about whether it should return consistent value for every use vs. it can return different values per use (as IR's undef).
Currently IMPLICIT_DEF can yieid different values, and there are a few cases where that happens (such as using IMPLICIT_DEF across function calls).
Making it return consistent value will allow FREEZE to be duplicable, but it would require fixing several transformations that deal with IMPLICIT_DEF.
Before discussion about IMPLICIT_DEF is further made and it is changed to return concrete value, I'd like to suggest leaving FREEZE as semi-duplicable (as suggested by this patch), but not fully duplicable.
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp | ||
---|---|---|
1135 ↗ | (On Diff #237090) | If we expand FREEZE in ExpandPostRAPseudos, how can we reach this code? |
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp | ||
---|---|---|
1135 ↗ | (On Diff #237090) | It was a legacy code to see the freeze comment from assembly before updating ExpandPostRAPseudos. I removed it. |
For the next step, what can I do?
If duplicability is the main concern, I can split this patch so (1) FREEZE is defined to be duplicable first, and (2) when there happens a practical case where duplication matters, the next one is created.
It will reduce diffs in TargetOpcodes.def as well because the comment update at IMPLICIT_DEF is not needed anymore.
ping
If adding FREEZE to MachineIR causes maintenance cost, another possible solution is to lower SelDag's FREEZE(UNDEF) to IMPLICIT_DEF directly.
In this option, FREEZE only exist in SelDag, not in MIR.
Instead, now IMPLICIT_DEF in MIR should always yield the same value. This requires a few transformations regarding IMPLICIT_DEF in MIR to be fixed.
This may cause suboptimal codegen in a few cases, but I believe we have a solution for the cases, because a single UNDEF from SelDag can be always lowered into multiple IMPLICIT_DEFs (per each use) in MIR.
This seems good to me, but you want to wait for someone else more knowledgeable..
ping @arsenm @qcolombet @craig.topper
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h | ||
---|---|---|
65–68 ↗ | (On Diff #241401) | This is unrelated, please feel free to commit this right away. |
Leave the SelDag patch only, MachineIR part will be made as a separate patch (as discussed at D76483)
llvm/test/CodeGen/X86/freeze-legalize.ll | ||
---|---|---|
9–13 | Do we need to add basic simplify / constant folding for SDAG ? |
llvm/test/CodeGen/X86/freeze-legalize.ll | ||
---|---|---|
9–13 | Yes, I agree it will be great. Do you want to make this patch contain the change as well? |
llvm/test/CodeGen/X86/freeze-legalize.ll | ||
---|---|---|
9–13 | Or I can land this first, and add the simplify / constant folding for SDAG. |
llvm/test/CodeGen/X86/freeze-legalize.ll | ||
---|---|---|
9–13 | I prefer smaller patches too. Let's make that a follow-up. Is it correct that there is very little chance that this patch will create a visible performance regression (because there should be almost no freeze instruction creation in IR yet)? |
llvm/test/CodeGen/X86/freeze-legalize.ll | ||
---|---|---|
9–13 | Yes, it is. |
llvm/test/CodeGen/X86/freeze-legalize.ll | ||
---|---|---|
9–13 | That sounds good then. But can we avoid those 3 regressions cases before or within this patch? Ideally, we don't want to knowingly regress anything. |
llvm/test/CodeGen/X86/freeze-legalize.ll | ||
---|---|---|
9–13 | Among 3 files, two were simple regressions that had a bit more verbose assembly: ... sete %al orb %bpl, %al jne .LBB9_1 => ... sete %al orb %bpl, %al testb $1, %al jne .LBB9_1 ... testb $1, %al je .LBB0_2 => ... movl %eax, %ecx andl $1, %ecx je .LBB0_2 Case 3's assembly diff was bigger, so needs inspection. I can visit it after the simpler two cases are resolved. |
@bkramer committed a fix for a crash in LegalizeFloatTypes where this operator appeared in SoftPromoteHalfResult
Our downstream ARM compiler is crashing in regressions in a similar manner in SoftenFloatResult in the same file. Is it not conceivable that ISD::FREEZE might make it to this function? If that's the case, we can look into it and start up a new review if necessary. The test causing this crash is select-cc.ll.
Hi, I saw @bkramer 's fix too (commit id 0019c2f194a5e1f4cd65c5284e204328cc40ab3d ), and I think it is a valid fix. Similar thing can be applied to the SoftenFloatResult function. I'll make a patch for this.
Why doesn't this TargetOpcode::Freeze?