This is an archive of the discontinued LLVM Phabricator instance.

Guard FMF context by excluding some FP operators from FPMathOperator
ClosedPublic

Authored by mcberg2017 on Aug 22 2018, 5:16 PM.

Details

Summary

Some FPMathOperators succeed and the retrieve FMF context when they never have it, we should omit these cases to keep from removing FMF context.

For instance when we visit some FPMathOperator mapped Instructions which never have FMF flags and a Node was associated which does have FMF flags, that Node today will have all its flags cleared via the intersect operation. With this change, we exclude associating Nodes that never have FPMathOperator status under FMF.

Diff Detail

Event Timeline

mcberg2017 created this revision.Aug 22 2018, 5:16 PM

The attached test is a x86 approximation of internal tests, altered for platform, which show this problem which occurs frequently on many FMF optimization contexts.

Comments, alternatives?

Adding Amara as reviewer. The AnyDefined flag bit was added with:
D32527

It's not clear to me how copying flags to a node does not define them. It might help if you explain what is happening in the test. Also, is that the minimal test? I don't understand why we need vector ops and type conversions. It would be better to show the complete asm output using utils/update_llc_test_checks.py, and this patch should show the diff from the output of trunk.

mcberg2017 added a comment.EditedAug 30 2018, 2:12 PM

Copying all available flags as false should not change the initial state of the object after its creation. It only does so because we are using AnyDefined as a sticky bit for assignment of any flag regardless of value. My notion is that any true value indicates that at least a value is set to something other than a default value (false). The test conditions occurs in partial vectorization and in our case, after scalarization on gpus where interior math operations occur. It happens frequently in our cases. The problem is when an instruction is associated to a Node that does have flags and where the Instruction it is associated to does not. This only ever occurs where the association is disjunct, meaning the Node and the Instruction originated differently. Generally its the other way around, where Node is empty and we are inheriting flags from an Instruction as the Instruction originated the Node in question. All other operations of intersect are done from Node to Node, ergo this change does not modify that behavior, that only occurs in this context.

I will have both with and without assembler example shortly, added here. As for the test case, it is pretty close to minimal with context needed for association, but I will see if I can shrink it a bit after the assembler is up.

mcberg2017 added a comment.EditedAug 30 2018, 3:12 PM

without the change:

pushl %eax
.cfi_def_cfa_offset 8
vmovss 12(%esp), %xmm0 # xmm0 = mem[0],zero,zero,zero
vmulss 8(%esp), %xmm0, %xmm1
vaddss %xmm0, %xmm1, %xmm0
vmovss %xmm0, (%esp)
flds (%esp)
popl %eax
.cfi_def_cfa_offset 4
retl

with the change:

pushl %eax
.cfi_def_cfa_offset 8
vmovss 12(%esp), %xmm0 # xmm0 = mem[0],zero,zero,zero
vfmadd132ss 8(%esp), %xmm0, %xmm0 # xmm0 = (xmm0 * mem) + xmm0
vmovss %xmm0, (%esp)
flds (%esp)
popl %eax
.cfi_def_cfa_offset 4
retl

I have a smaller test case now, its baseline analog should show up in the tree in a bit...

With the test file sync'd

So with this change, if you have:

StrictInst = an FP instruction with no flags.
FastInst = an FP instruction with a flag.
SDNode A;
A.copyFMF(StrictInst);
SDNode B;
B.copyFMF(FastInst);
B.intersectWith(A);

Hit submit too early:
So with the above code B will not have the flag cleared? If that's the case it doesn't seem right to me.

Correctly stated, but out of context. Some clarity first. The kinds of instructions in our case are neither strict nor fast, they never carry flags. Neither scalarization nor partial vectorization should cause FMF obliteration. Perhaps we need additional mechanisms?

Another data point, this change caused no drift in the lit tests, meaning none of our FMF opts managed in the lit tests are modified by the behavior.

I’m not seeing how FP instructions can not carry flags. The IR semantics are defined to be strict FP unless explicitly relaxed.

The point of the AnyDefined state bit is to avoid pessimizing when the actual flags object has not been explicitly written to, e.g default constructed. When instantiating from a math operator that must obey strict semantics by default, the flags must also reflect those same semantics. Now if FPMathOperator also has a 3rd “undefined state” then it’s different.

In the newly created test case above, as checked in for test/CodeGen/X86/intersect-fma-fmf.ll, we get precisely this scenario. From a debug session breakpointed in llvm::SelectionDAGBuilder::visit at the line containing Node->intersectFlagsWith(IncomingFlags);
The Instruction in question is: %tmp11 = extractelement <4 x float> %tmp10, i32 1 as seen by the debugger, the Node we are associating it to is: t7: f32 = fadd nnan ninf nsz arcp contract afn reassoc t6, t5.
The extractelement is an Instruction that never carries flags and qualifies under

if (auto *FPMO = dyn_cast<FPMathOperator>(&I))

as a FPMathOperator, so it looks like we have the 3rd type. SelectionDAGBuilder::visitExtractElement results in the association under setValue.

Ok, to me it seems like instructions for which fast math flags can't apply shouldn't be classed as FPMathOperators. insertelement/extractelement are just moving raw bits. If we disallowed those instruction types in the FPMathOperator classof() then this problem should go away, and we don't incorrectly relax FP semantics like in the case I pointed out.

Thanks, that is a good alternative. I am testing it internally to see I get similar coverage. More later...

mcberg2017 updated this revision to Diff 163890.Sep 4 2018, 1:28 PM

I added one more non fmf instruction and there is room to add others if needed.

I added one more non fmf instruction and there is room to add others if needed.

There's also discussion about the definition of FPMathOperator and the relation to FMF here:
https://bugs.llvm.org/show_bug.cgi?id=38086
D51646

I added one more non fmf instruction and there is room to add others if needed.

There's also discussion about the definition of FPMathOperator and the relation to FMF here:
https://bugs.llvm.org/show_bug.cgi?id=38086
D51646

Thanks.

I think this approach makes sense, given that the langref only defines a restricted few instructions to carry fast-math flags, not the entire group that the current FPMathOperator allows. But I'd like someone else to check it over. @hfinkel does this look reasonable?

Hal/Sanjay could one of you have a look at this code, it would be great to wrap it up...

spatel added inline comments.Sep 11 2018, 9:37 AM
include/llvm/IR/Operator.h
367–386

I don't see the benefit of treating Instruction/ConstantExpr separately when the logic is identical for both and likely to keep growing.

Ie, can we do this:

static bool classof(const Value *V) {
  unsigned Opcode;
  if (auto *I = dyn_cast<Instruction>(V))
    Opcode = I->getOpcode();
  else if (auto *CE = dyn_cast<ConstantExpr>(V))
    Opcode = CE->getOpcode();
  else
    return false;

  switch (Opcode) {
  case Instruction::FCmp:
    return true;
  // Vector operators may have FP type, but they don't do math.
  case Instruction::ExtractElement:
  case Instruction::ShuffleVector:
  case Instruction::InsertElement:
    return false;
  default:
    return V->getType()->isFPOrFPVectorTy();
  }
}

Collapsed for single version...

mcberg2017 retitled this revision from make copyFMF consistent with AnyDefined for detection of any FMF flag set to true to Guard FMF context by excluding some FP operators from FPMathOperator.Sep 11 2018, 2:28 PM
mcberg2017 edited the summary of this revision. (Show Details)
spatel accepted this revision.Sep 11 2018, 3:11 PM

LGTM - I do wonder if it would be better to white-list just the FP ops that we know can carry FMF though (ie, make this consistent with IR verification)...unless FPMathOperator is used for some other purposes.

This revision is now accepted and ready to land.Sep 11 2018, 3:11 PM
This revision was automatically updated to reflect the committed changes.