This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Implement lowering of GET_ROUNDING
ClosedPublic

Authored by tuliom on Jan 4 2023, 6:53 AM.

Details

Summary

Add support for _FLT_ROUNDS_ in SystemZ.

Diff Detail

Event Timeline

tuliom created this revision.Jan 4 2023, 6:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 4 2023, 6:53 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
tuliom requested review of this revision.Jan 4 2023, 6:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 4 2023, 6:53 AM
Kai added a subscriber: Kai.Jan 4 2023, 10:07 AM
Kai added inline comments.
llvm/test/CodeGen/SystemZ/flt-rounds.ll
2–4

The target triple is usually part of the command line.
This helps when testing other variations, e.g. s390x-zos.

tuliom updated this revision to Diff 486531.Jan 5 2023, 5:05 AM

I'm fixing the issues pointed out by @Kai and I'm also fixing formatting issues.

tuliom marked an inline comment as done.Jan 5 2023, 5:06 AM
tuliom added inline comments.
llvm/test/CodeGen/SystemZ/flt-rounds.ll
2–4

Fixed in the new revision.
Thanks!

Thanks for working on this!

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
9076

This second AND seems redundant, can't this just be

DAG.getNode(ISD::XOR, dl, MVT::i32, CWD1,
                         DAG.getConstant(3, dl, MVT::i32)),

instead?

9084

Since we're doing the computation in i32, shouldn't this be a TRUNCATE for all sizes < 32 ? Also, if the size is exactly 32, we don't need either truncate or extend (not sure if the extend gets optimized away?).

llvm/test/CodeGen/SystemZ/flt-rounds.ll
29

This is actually a case where it probably would be a good idea to test for the full sequence, ideally by using an auto-generated test.

tuliom updated this revision to Diff 487928.Jan 10 2023, 11:57 AM
tuliom marked an inline comment as done.

Fix the issues pointed out by @uweigand.

tuliom marked 3 inline comments as done.Jan 10 2023, 11:59 AM
tuliom added inline comments.
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
9076

Yes, it can.

9084

Oops. That's correct.
No, it does not get optimized away.

llvm/test/CodeGen/SystemZ/flt-rounds.ll
29

@uweigand I modified the test. Is this what you had in mind?

tuliom marked 3 inline comments as done.Jan 10 2023, 12:40 PM
tuliom added inline comments.
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
9084

No, it does not get optimized away.

Let me correct myself: in a way, it does get optimized away.
But , AFAIU, the LLGFR is needed after RXSBG anyway.
You can see the generated code in the updated test.

nikic added a subscriber: nikic.Jan 10 2023, 1:03 PM
nikic added inline comments.
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
9082

nit: There is a DAG.getZExtOrTrunc() helper.

llvm/test/CodeGen/SystemZ/flt-rounds.ll
24

nit: You can add nounwind to avoid irrelevant cfi directives.

uweigand added a comment.EditedJan 11 2023, 1:21 AM

With the changes pointed out by @nikic this looks good to me.

Optionally, there seems to be one more minor optimization opportunity. Looking at the generated code:

efpc %r0
nilf %r0, 3
lr %r1, %r0
xilf %r1, 2
rxsbg %r0, %r1, 33, 63, 63

I see that it should be possible to save one more instruction (and one register) by doing instead:

efpc %r0
nilf %r0, 3
rxsbg %r0, %r0, 33, 63, 63
xilf %r0, 1

I.e. re-associating the XORs to compute

CWD1 = efpc() & 3
CWD2 = (CWD1 ^ (CWD1 >> 1)) ^ 1

But , AFAIU, the LLGFR is needed after RXSBG anyway.

It's needed in your test case because the result is returned from the function, and function return values are implicitly extended in our ABI. If it were not directly returned, but used in some other way, the extension might not always be necessary.

CWD1 = efpc() & 3
CWD2 = (CWD1 ^ (CWD1 >> 1)) ^ 1

@uweigand This is not generating the right output.
Actually, the current revision here is already broken.
Originally, this patch had:

CWD2 = (~FPC ^ 0x3) >> 1

This can be rewritten as:

CWD2 = ((FPC >> 1) ^ 1 & 0x1)

But this is not going to save any instructions.

llvm/test/CodeGen/SystemZ/flt-rounds.ll
24

@nikic I'm afraid there is an issue between update_llc_test_checks.py and SystemZ.
Whenever nounwind is used, the CHECKs are removed.

CWD1 = efpc() & 3
CWD2 = (CWD1 ^ (CWD1 >> 1)) ^ 1

@uweigand This is not generating the right output.

How so? Looks like this would produce:

CWD1  CWD1>>1  (CWD1 ^ (CWD1>>1))   (CWD1 ^ (CWD1>>1)) ^ 1
 00     00           00                    01
 01     00           01                    00
 10     01           11                    10
 11     01           10                    11

which seems exactly the transformation we want?

tuliom updated this revision to Diff 488342.Jan 11 2023, 12:38 PM

Changes since revision 3:

  1. Started to use getZExtOrTrunc.
  2. Reverted the code for CWD2 to the same in revision 2 in order to fix an issue.
  3. Added a comment describing how CWD2 is calculated.
tuliom marked an inline comment as done.Jan 11 2023, 12:38 PM
tuliom added inline comments.
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
9082

Done.

tuliom updated this revision to Diff 488358.Jan 11 2023, 1:15 PM
tuliom marked an inline comment as done.

@uweigand You're right.
I hadn't realized the changes to RetVal.
This revision includes that modification.

uweigand accepted this revision.Jan 11 2023, 1:22 PM
uweigand added a subscriber: jonpa.
; CHECK-NEXT:    efpc %r0
; CHECK-NEXT:    lr %r1, %r0
; CHECK-NEXT:    nilf %r1, 3
; CHECK-NEXT:    rxsbg %r1, %r0, 63, 63, 63
; CHECK-NEXT:    xilf %r1, 1

Huh. We still get the lr :-( That now seems an inefficiency in the RXSBG optimization pass, so it's not a problem with this patch, which now LGTM. Thanks again!

@jonpa maybe you can have a look into why it doesn't simply use the same register as source and destination of the RXSGB?

This revision is now accepted and ready to land.Jan 11 2023, 1:22 PM
tuliom updated this revision to Diff 488944.Jan 13 2023, 4:29 AM

In this revision, I fix the last missing parts from @nikic 's suggestions:

tuliom marked an inline comment as done.Jan 13 2023, 4:31 AM

@uweigand I don't have commit access yet. If this new revision is still good, could you land this patch for me, please?
Please use “Tulio Magno Quites Machado Filho tuliom@redhat.com” to commit the change.

This revision was landed with ongoing or failed builds.Jan 18 2023, 12:41 PM
This revision was automatically updated to reflect the committed changes.