This is an archive of the discontinued LLVM Phabricator instance.

[legalize-types] Clean up softening machinery.
ClosedPublic

Authored by ayartsev on Apr 11 2017, 9:35 AM.

Details

Summary

Hi all,

The attached patch is a replacement for D29265. It gets rid of special treating of SoftenedFloats and normalizes softening (makes it's logic similar to promotion, expansion, e.t.c). The patch is an effort to simplify and clarify legalization and to make perhaps the first step towards using a single map for all replacements (the idea from D29265).

Please review.

Diff Detail

Repository
rL LLVM

Event Timeline

ayartsev created this revision.Apr 11 2017, 9:35 AM
chh edited edge metadata.Apr 11 2017, 12:21 PM

I assume that this change is a clean up without functionality impact.
We should test this with Android code for the x86_64 f128 types.
I could do that later.

chh added inline comments.Apr 11 2017, 12:45 PM
lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
115 ↗(On Diff #94832)

Was there a problem to fix with the new condition (R.getNode() != N)?
I vaguely remember that x86_64 f128 type implementation might depend on the ability to set a note as softened without changing the node.

813 ↗(On Diff #94832)

Why not skipping CopyToReg now?
If it missed some case, could you add a test case?

chh added a comment.Apr 13 2017, 11:30 AM


The attached c.zip contains a test program s_fdim-45daeb.c,
generated from Android bionic/libm/upstream-freebsd/lib/msun/src/s_fdim.c
The compilation command in c.sh failed with this patch.
Clang aborted with the following message:

fatal error: error in backend: Cannot select: t7: f128 = fsub t2, t4
  t2: f128,ch = CopyFromReg t0, Register:f128 %vreg2
    t1: f128 = Register %vreg2
  t4: f128,ch = CopyFromReg t0, Register:f128 %vreg3
    t3: f128 = Register %vreg3
In function: fdiml

I think the change to COPY_TO_REG is causing the error.

chh requested changes to this revision.Apr 14 2017, 11:39 AM
This revision now requires changes to proceed.Apr 14 2017, 11:39 AM
ayartsev updated this revision to Diff 95801.Apr 19 2017, 12:24 PM
ayartsev edited edge metadata.
ayartsev marked 2 inline comments as done.

Thanks for the review.
Addressed the issue, updated the patch.

lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
115 ↗(On Diff #94832)

No problems arose. In case when a softened node is not in the SoftenedFloats map GetSoftenedFloat() just returns the key node.

813 ↗(On Diff #94832)

CopyToReg is now handled in SoftenFloatOperand(). CanSkipSoftenFloatOperand() only handles opcodes that were not handled by SoftenFloatOperand().

chh added a comment.Apr 19 2017, 1:38 PM

Does the new patch work with my previous attached c.zip?
I don't see new changes related to CopyToReg.

Again I worry about the new changes to FABS, FCOPYSIGN, and SELECT.
Why do you need those changes, but not FNEG, Register, etc.?
Those op codes used to cause problems with Android x86_64 fp128 types.
That's why I have them and some other opcodes in the
CanSkipSoftenFloatOperand function.

Does the new test soft-fp-legal-in-HW-reg.ll catch any error?
It should have some CHECK tags to verify compiled output.
It should also be tested with the android x86_64 mode, like:

RUN: llc < %s -O2 -mtriple=x86_64-linux-android -mattr=+mmx \
RUN:     -enable-legalize-types-checking | FileCheck %s --check-prefix=MMX
RUN: llc < %s -O2 -mtriple=x86_64-linux-gnu -mattr=+mmx \
RUN:     -enable-legalize-types-checking | FileCheck %s --check-prefix=MMX

Please see other tests in fp128-*.ll.
Thanks.

Does the new patch work with my previous attached c.zip?
I don't see new changes related to CopyToReg.

I've managed to reproduce the error from your attached c.zip. Here is the minimal test derived from the fdiml() implementation (the origin of the error as seen from the error message):

long double fn(long double x, long double y) {
  return (x > y ? x - y : 0.0);
}

The problem was not related to CopyToReg but to handling of ISD::SELECT. Before this patch operands of ISD::CopyToReg, ISD::SELECT, etc. were implicitly replaced by ReplaceSoftenFloatResult() called from SoftenFloatResult(). Now operands are explicitly replaced by their softened versions (if any) in SoftenFloatOperand().

Again I worry about the new changes to FABS, FCOPYSIGN, and SELECT.
Why do you need those changes, but not FNEG, Register, etc.?
Those op codes used to cause problems with Android x86_64 fp128 types.
That's why I have them and some other opcodes in the
CanSkipSoftenFloatOperand function.

Operands of CopyToReg, FABS, FCOPYSIGN, and SELECT may have been softened and must be replaced by SoftenFloatOperand() as I mentioned above. As for other opcodes - FNEG, Register, etc. - I'm currently failing to write regression tests for this opcodes. It may be impossible to have softened operands for this opcodes.

Does the new test soft-fp-legal-in-HW-reg.ll catch any error?
It should have some CHECK tags to verify compiled output.

The new test checks that stderr is empty and we don't end up with "fatal error: Cannot select:...". Similar tests exist among LLVM tests, look at \test\CodeGen\X86\xmm-r64.ll for example.

It should also be tested with the android x86_64 mode

Tested with -mtriple=x86_64-linux-android. The result remains the same. The patch cause no regression in LLVM tests. Do you have a chance to test the new patch over Android codebase?

Thanks for the feedback!

chh added a comment.Apr 19 2017, 4:29 PM

Maybe some problem with CopyToReg is fixed with new changes to SELECT,
but there are other errors I found from Android libm.
Please try the attached x.zip, which has three preprocessed files that
this new patch will fail to compile:

/tmp/cpowl-3f7b0a.c

fatal error: error in backend: Cannot select: t5: f128 = fmul t2, t4

t2: f128,ch = CopyFromReg t0, Register:f128 %vreg1
  t1: f128 = Register %vreg1
t4: f128,ch = CopyFromReg t0, Register:f128 %vreg5
  t3: f128 = Register %vreg5

In function: cpowl

/tmp/ccosl-afde3f.c

fatal error: error in backend: Cannot select: t53: f128 = fmul t51, t80

t51: f128,ch,glue = CopyFromReg t50, Register:f128 %XMM0, t50:1
  t25: f128 = Register %XMM0
  t50: ch,glue = callseq_end t49, TargetConstant:i64<0>, TargetConstant:i64<0>, t49:1
    t23: i64 = TargetConstant<0>
    t23: i64 = TargetConstant<0>
    t49: ch,glue = X86ISD::CALL t47, TargetGlobalAddress:i64<fp128 (fp128)* @sinl> 0 [TF=6], Register:f128 %XMM0, RegisterMask:Untyped, t47:1
      t48: i64 = TargetGlobalAddress<fp128 (fp128)* @sinl> 0 [TF=6]
      t25: f128 = Register %XMM0
      t32: Untyped = RegisterMask
      t47: ch,glue = CopyToReg t46, Register:f128 %XMM0, t35
        t25: f128 = Register %XMM0
        t35: f128,ch = load<LD16[%9]> t34, FrameIndex:i64<-1>, undef:i64
          t3: i64 = FrameIndex<-1>
          t9: i64 = undef
t80: f128,ch = load<LD16[%6](dereferenceable)> t50, FrameIndex:i64<2>, undef:i64
  t16: i64 = FrameIndex<2>
  t9: i64 = undef

In function: ccosl

/tmp/e_hypotl-9d5113.c

fatal error: error in backend: Cannot select: t20: f128 = fmul t19, t19

t19: f128,ch = CopyFromReg t0, Register:f128 %vreg32
  t18: f128 = Register %vreg32
t19: f128,ch = CopyFromReg t0, Register:f128 %vreg32
  t18: f128 = Register %vreg32

In function: hypotl

I will be on vacation for about 1 week.
Probably won't be able to help you test new changes before I come back.

Please note that even if clang does not abort, like your test case show,
we should check if the output code actually put fp128 values in SSE registers
as before. It's not just an optimization but a required calling convention in
many cases.

chh requested changes to this revision.Apr 19 2017, 4:30 PM
This revision now requires changes to proceed.Apr 19 2017, 4:30 PM
ayartsev updated this revision to Diff 96782.Apr 26 2017, 10:44 AM
ayartsev edited edge metadata.

Great! Here ISD::FNEG caused the problem at least in two tests. Updated the patch. Hope this was the last regression.

Please note that even if clang does not abort, like your test case show,
we should check if the output code actually put fp128 values in SSE registers
as before.

Done.
Compared the output code generated from the soft-fp-legal-in-HW-reg.ll before and after the patch - the code is the same.

chh added a comment.Apr 26 2017, 2:57 PM

Before making more changes and running more tests,
could you update the subject/summary or rethink purpose of this change?
I still don't understand all the reasons and scope.

(1) What is the "single map"?
I don't see elimination or unification of any map.

(2) It is not easy to see that all these changes to SoftenedFloats
are required to replace D29265.

D29265 was a simple change to remove stale entries to pass expensive checks.
The following change in SoftenFloatResult seems to be the replacement:

do not call SetSoftenedFloat(SDValue(N, ResNo), R) when (R.getNode() == N).

The move of some opcode out of CanSkipSoftenFloatOperand seems unrelated.

If SetSoftenedFloat(SDValue(N, ResNo), R) is not called, and no "stale"
entry is created, why do we need the code pattern in new SoftenFloatOp_*?
It says in general:

if (GetSoftenedFloat(operand) == operand)
  return SDValue();

When can (GetSoftenedFloat(operand) == operand) happen?
I know that the pattern must be needed or some test case will fail,
but it is just difficult to understand without each test case.

IMHO, I think it is easier to understand if we allow

SetSoftenedFloat(SDValue(N, ResNo), R) when (R.getNode() == N),

and

(GetSoftenedFloat(operand) == operand).

Then, the problem we have is just some "redundant" or "stale" entries
that can be easily removed by D29265.

(3) For whatever reason to change CanSkipSoftenFloatOperand,
why only some but not all cases are moved out of CanSkipSoftenFloatOperand?
It's difficult to prove that ISD::Register, ISD::ConstantFP,
and ISD::CopyFromReg do not deserve the same treatment.

Well, I'm still thinking of not changing CanSkipSoftenFloatOperand at all.
The new change requires more computation and code, and the benefit is not
seen yet without the "use a single map".

If you look at other legalization functions (ExpandFloatResult/Operand, PromoteIntegerResult/Operand, e.t.c) you can see that they all share the same logic in node processing: functions, that handle node results (e.g. ExpandFloatResult) perform their actions and place the resulting node to the corresponding map (e.g. ExpandFloats) for further use, while functions that process operands (e.g. ExpandFloatOperand) perform their action and replace the old nodes with the new ones (that is update the ReplacedValues map).
It is also seen from the code that for a given node legalization of node results preceed processing of this node as operand of another node, so values from maps like SoftenFloats (placed there by result legalizing routines) will be used for replacement/subtitution, if needed, by operand processing routines in the fullness of time.
SoftenFloatResult() routine design breaks this scheme - it puts a result to the SoftenFloats map AND replaces this result in place by calling ReplaceSoftenFloatResult() for dubious benefit. It forces us to specially remember and skip all this replacement cases in SoftenFloatOperand() by calling CanSkipSoftenFloatOperand(). It also makes a key present both in SoftenFloats and ReplacedValues maps under some circumstances that expensive check detects. It took an effort to understand whats going on. D29265 doesn't fix the problem with the same value in multiple maps and just makes the code more unclear.
Actually, softening really slightly differs from other legalization routines, its trait is that nothing should be done with the node result (except ISD::LOAD) if the result can be kept in HW registers. This trait gave the idea for, IMHO, unsuccessful improvement with replacing a result in place by calling ReplaceSoftenFloatResult().
So my patch makes SoftenFloatResult/Operand logic just the same as all other legalization routines have: SoftenFloatResult() fills the SoftenFloats map and SoftenFloatOperand() perform all needed replacements. With the special case for values, that can be kept in HW registers - in this case SoftenFloatResult() just does nothing. I intended to get rid of calling SoftenFloatResult() for HW suitable values at all, but ISD::LOAD creates a precedent. I've also analyzed all usages of the SoftenFloats map and didn't find any need for holding values that map to itself.
Hope this answers most of your questions.

(1) What is the "single map"?
I don't see elimination or unification of any map.

I ment unification of the node processing logic as described above.

D29265 was a simple change to remove stale entries to pass expensive checks.
The following change in SoftenFloatResult seems to be the replacement:

do not call SetSoftenedFloat(SDValue(N, ResNo), R) when (R.getNode() == N).

At first I've made ReplaceSoftenFloatResult() return 'true' in case if it performes a replacement and to return 'false' otherwise and tried many different combinations of conditions here and each one caused the regression in some spacial cases and forced me to complicate the logic. This experiments led me to understanding that this is not the right place to fix the problem.

If SetSoftenedFloat(SDValue(N, ResNo), R) is not called, and no "stale"
entry is created, why do we need the code pattern in new SoftenFloatOp_*?
It says in general:

if (GetSoftenedFloat(operand) == operand)
return SDValue();

When can (GetSoftenedFloat(operand) == operand) happen?

(GetSoftenedFloat(operand) == operand) mean that SoftenFloatResult() left this 'operand' node results unchanged (because of legality in HW) and there is nothing to do with this node here in SoftenFloatOperand() so we return SDValue() to tell that the node no need to be processed.

(3) For whatever reason to change CanSkipSoftenFloatOperand,
why only some but not all cases are moved out of CanSkipSoftenFloatOperand?
It's difficult to prove that ISD::Register, ISD::ConstantFP,
and ISD::CopyFromReg do not deserve the same treatment.

Yes, you are right, all this cases should be also moved out of CanSkipSoftenFloatOperand(). I'll do this in the new patch in case you accept my solution.

chh added a comment.May 12 2017, 4:32 PM

I think now I understand your goal, which still seems not exactly
explained by "use a single map for replacements".

Please take a look of my patches
https://reviews.llvm.org/D15134
https://reviews.llvm.org/D11438
and their review history.

Current code is a compromise of various reviewer comments and
to avoid larger risky changes of the existing legalization process.
When I worked on those patches I also wished to have some clean up
of the legalization and type "soften" process. Other reviewers also
have suggested what you are trying to do now, but we never have
found a simpler and low risk change.

The difficulty was due to our limited knowledge of the
current complicated "type legalization" code,
and our limited resource to test on all targets.

I tested this patch against Android open source today and it failed to
compile 15 source files in bionic/libm/upstream-freebsd/lib/msun/src
for the x86_64 target. We need more consistent changes to all those
switch cases, to have a better chance to avoid errors. Still, it is
quite time consuming to go through one Android build test.

From those 15 failed cases,
I copy only 3 instruction selection fatal errors like:

bionic/libc/upstream-openbsd/lib/libc/gdtoa/hdtoa.c

fatal error: error in backend: Cannot select: 0x5d43570: i32 = truncate 0x5d43a50

0x5d43a50: f128 = srl 0x5d48468, Constant:i8<112>
  0x5d48468: f128,ch,glue = CopyFromReg 0x5d48c20, Register:f128 %XMM0, 0x5d48c20:1
    0x5d48878: f128 = Register %XMM0
    0x5d48c20: ch,glue = callseq_end 0x5d43368, TargetConstant:i64<0>, TargetConstant:i64<0>, 0x5d43368:1
      0x5d488e0: i64 = TargetConstant<0>
      0x5d488e0: i64 = TargetConstant<0>
      0x5d43368: ch,glue = X86ISD::CALL 0x5d43c58, TargetExternalSymbol:i64'__multf3' [TF=6], Register:f128 %XMM0, Register:f128 %XMM1, RegisterMask:Untyped, 0x5d43c58:1
        0x5d48a80: i64 = TargetExternalSymbol'__multf3' [TF=6]
        0x5d48878: f128 = Register %XMM0
        0x5d50758: f128 = Register %XMM1
        0x5d44270: Untyped = RegisterMask
        0x5d43c58: ch,glue = CopyToReg 0x5d48810, Register:f128 %XMM1, 0x5d49030, 0x5d48810:1
          0x5d50758: f128 = Register %XMM1
          0x5d49030: f128,ch = load<LD16[ConstantPool]> 0x5c5f028, 0x5d49238, undef:i64
            0x5d49238: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<fp128 0xL00000000000000004201000000000000> 0
              0x5d440d0: i64 = TargetConstantPool<fp128 0xL00000000000000004201000000000000> 0
            0x5d50620: i64 = undef
          0x5d48810: ch,glue = CopyToReg 0x5d43300, Register:f128 %XMM0, 0x5d43d90
            0x5d48878: f128 = Register %XMM0
            0x5d43d90: f128,ch = CopyFromReg 0x5c5f028, Register:f128 %vreg108
              0x5d48dc0: f128 = Register %vreg108
  0x5d48fc8: i8 = Constant<112>

In function: __hldtoa

bionic/libm/upstream-freebsd/lib/msun/src/s_cbrtl.c
fatal error: error in backend: Cannot select: 0x3ed5838: i64 = truncate 0x3ed5b10

0x3ed5b10: f128 = srl 0x3ed5de8, Constant:i8<64>
  0x3ed5de8: f128,ch,glue = CopyFromReg 0x3ed6058, Register:f128 %XMM0, 0x3ed6058:1
    0x3eda090: f128 = Register %XMM0
    0x3ed6058: ch,glue = callseq_end 0x3ed6128, TargetConstant:i64<0>, TargetConstant:i64<0>, 0x3ed6128:1
      0x3eda028: i64 = TargetConstant<0>
      0x3eda028: i64 = TargetConstant<0>
      0x3ed6128: ch,glue = X86ISD::CALL 0x3ed6398, TargetExternalSymbol:i64'__multf3' [TF=6], Register:f128 %XMM0, Register:f128 %XMM1, RegisterMask:Untyped, 0x3ed6398:1
        0x3ed5d80: i64 = TargetExternalSymbol'__multf3' [TF=6]
        0x3eda090: f128 = Register %XMM0
        0x3ed9fc0: f128 = Register %XMM1
        0x3ed54f8: Untyped = RegisterMask
        0x3ed6398: ch,glue = CopyToReg 0x3ed6400, Register:f128 %XMM1, 0x3ed5490, 0x3ed6400:1
          0x3ed9fc0: f128 = Register %XMM1
          0x3ed5490: f128,ch = load<LD16[ConstantPool]> 0x3e2ed08, 0x3ed5970, undef:i64
            0x3ed5970: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<fp128 0xL00000000000000004201000000000000> 0
              0x3ed5560: i64 = TargetConstantPool<fp128 0xL00000000000000004201000000000000> 0
            0x3ed5b78: i64 = undef
          0x3ed6400: ch,glue = CopyToReg 0x3ed57d0, Register:f128 %XMM0, 0x3ed5700
            0x3eda090: f128 = Register %XMM0
            0x3ed5700: f128,ch = CopyFromReg 0x3e2ed08, Register:f128 %vreg21
              0x3eda160: f128 = Register %vreg21
  0x3eda0f8: i8 = Constant<64>

In function: cbrtl

bionic/libm/upstream-freebsd/lib/msun/src/e_hypotl.c
fatal error: error in backend: Cannot select: 0x44159a0: ch = store<ST8[FixedStack11+8], trunc to i64> 0x4368b88, 0x44197d0, 0x4415868, undef:i64

0x44197d0: f128 = srl 0x4415590, Constant:i8<64>
  0x4415590: f128,ch,glue = CopyFromReg 0x44151e8, Register:f128 %XMM0, 0x44151e8:1
    0x44156c8: f128 = Register %XMM0
    0x44151e8: ch,glue = callseq_end 0x44198a0, TargetConstant:i64<0>, TargetConstant:i64<0>, 0x44198a0:1
      0x4419d18: i64 = TargetConstant<0>
      0x4419d18: i64 = TargetConstant<0>
      0x44198a0: ch,glue = X86ISD::CALL 0x44150b0, TargetExternalSymbol:i64'__addtf3' [TF=6], Register:f128 %XMM0, Register:f128 %XMM1, RegisterMask:Untyped, 0x44150b0:1
        0x4419ff0: i64 = TargetExternalSymbol'__addtf3' [TF=6]
        0x44156c8: f128 = Register %XMM0
        0x4415800: f128 = Register %XMM1
        0x4414f78: Untyped = RegisterMask
        0x44150b0: ch,glue = CopyToReg 0x4419b10, Register:f128 %XMM1, 0x441a400, 0x4419b10:1
          0x4415800: f128 = Register %XMM1
          0x441a400: f128,ch = CopyFromReg 0x4368b88, Register:f128 %vreg25
            0x4419b78: f128 = Register %vreg25
          0x4419b10: ch,glue = CopyToReg 0x441a398, Register:f128 %XMM0, 0x441a400
            0x44156c8: f128 = Register %XMM0
            0x441a400: f128,ch = CopyFromReg 0x4368b88, Register:f128 %vreg25
              0x4419b78: f128 = Register %vreg25
  0x4415798: i8 = Constant<64>
0x4415868: i64 = or FrameIndex:i64<11>, Constant:i64<8>
  0x441a4d0: i64 = FrameIndex<11>
  0x4414c38: i64 = Constant<8>
0x4414ca0: i64 = undef

In function: hypotl

RKSimon resigned from this revision.Jun 1 2017, 2:24 AM
ayartsev retitled this revision from [legalize-types] Make softening result use a single map for replacements. to [legalize-types] Clean up softening machinery..Jun 28 2017, 5:25 PM
ayartsev updated this revision to Diff 104555.Jun 28 2017, 6:40 PM

Updated the patch, changed the title.

Please take a look of my patches
https://reviews.llvm.org/D15134
https://reviews.llvm.org/D11438
and their review history.

I looked through your patches and review history, great job! I'm feeling on the right way.

I copy only 3 instruction selection fatal errors like:

Failing to reproduce either of them, in my case the compilation succeed and I'm getting DAGs different from shown in error dumps. I've used command line args from 'c.sh' from your attach 'c.zip' above and preprocessed 'e_hypotl-9d5113.c' from your attach 'x.zip' (other two sources 'hdtoa.c' and 's_cbrtl.c' I've prepared myself based on the 'e_hypotl-9d5113.c').

What confuses me is that all this sources fail on the "f128 = srl 0x......., Constant:i8<64>" instruction, that takes and returns 'f128' type, but I've failed to invent a test to make codegen emit 'srl' instruction with float argument/return value. Neither test in LLVM codebase has such an example. Neither my output DAGs for failing tests had this float 'srl'. Maybe you know, how to make codegen emit f128' srl? Or could you, please, test the new patch one more time and, if problems persist, send me the script with Clang command line for the failing source and the preprocessed source itself?
Sorry for the long delay and thank you for looking at this.

chh added a comment.Jun 30 2017, 9:52 AM

Still running some Android tests.
This new diff looks good.
I only found one error in soft-fp-legal-in-HW-reg.ll so far.

test/CodeGen/X86/soft-fp-legal-in-HW-reg.ll
2 ↗(On Diff #104555)

s/RUN/RUN:/

chh accepted this revision.Jun 30 2017, 5:00 PM

Except the unit test failure, I did not find error in building Android.
Please fix the unit test before submit.
I probably won't have time working on this before next Wednesday.

This revision is now accepted and ready to land.Jun 30 2017, 5:00 PM
This revision was automatically updated to reflect the committed changes.