Add support for _FLT_ROUNDS_ in SystemZ.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/SystemZ/flt-rounds.ll | ||
---|---|---|
2–4 | The target triple is usually part of the command line. |
llvm/test/CodeGen/SystemZ/flt-rounds.ll | ||
---|---|---|
2–4 | Fixed in the new revision. |
Thanks for working on this!
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp | ||
---|---|---|
9076 | This second AND seems redundant, can't this just be DAG.getNode(ISD::XOR, dl, MVT::i32, CWD1, DAG.getConstant(3, dl, MVT::i32)), instead? | |
9084 | Since we're doing the computation in i32, shouldn't this be a TRUNCATE for all sizes < 32 ? Also, if the size is exactly 32, we don't need either truncate or extend (not sure if the extend gets optimized away?). | |
llvm/test/CodeGen/SystemZ/flt-rounds.ll | ||
29 | This is actually a case where it probably would be a good idea to test for the full sequence, ideally by using an auto-generated test. |
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp | ||
---|---|---|
9084 |
Let me correct myself: in a way, it does get optimized away. |
With the changes pointed out by @nikic this looks good to me.
Optionally, there seems to be one more minor optimization opportunity. Looking at the generated code:
efpc %r0 nilf %r0, 3 lr %r1, %r0 xilf %r1, 2 rxsbg %r0, %r1, 33, 63, 63
I see that it should be possible to save one more instruction (and one register) by doing instead:
efpc %r0 nilf %r0, 3 rxsbg %r0, %r0, 33, 63, 63 xilf %r0, 1
I.e. re-associating the XORs to compute
CWD1 = efpc() & 3 CWD2 = (CWD1 ^ (CWD1 >> 1)) ^ 1
But , AFAIU, the LLGFR is needed after RXSBG anyway.
It's needed in your test case because the result is returned from the function, and function return values are implicitly extended in our ABI. If it were not directly returned, but used in some other way, the extension might not always be necessary.
@uweigand This is not generating the right output.
Actually, the current revision here is already broken.
Originally, this patch had:
CWD2 = (~FPC ^ 0x3) >> 1
This can be rewritten as:
CWD2 = ((FPC >> 1) ^ 1 & 0x1)
But this is not going to save any instructions.
llvm/test/CodeGen/SystemZ/flt-rounds.ll | ||
---|---|---|
24 | @nikic I'm afraid there is an issue between update_llc_test_checks.py and SystemZ. |
How so? Looks like this would produce:
CWD1 CWD1>>1 (CWD1 ^ (CWD1>>1)) (CWD1 ^ (CWD1>>1)) ^ 1 00 00 00 01 01 00 01 00 10 01 11 10 11 01 10 11
which seems exactly the transformation we want?
Changes since revision 3:
- Started to use getZExtOrTrunc.
- Reverted the code for CWD2 to the same in revision 2 in order to fix an issue.
- Added a comment describing how CWD2 is calculated.
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp | ||
---|---|---|
9082 | Done. |
@uweigand You're right.
I hadn't realized the changes to RetVal.
This revision includes that modification.
; CHECK-NEXT: efpc %r0 ; CHECK-NEXT: lr %r1, %r0 ; CHECK-NEXT: nilf %r1, 3 ; CHECK-NEXT: rxsbg %r1, %r0, 63, 63, 63 ; CHECK-NEXT: xilf %r1, 1
Huh. We still get the lr :-( That now seems an inefficiency in the RXSBG optimization pass, so it's not a problem with this patch, which now LGTM. Thanks again!
@jonpa maybe you can have a look into why it doesn't simply use the same register as source and destination of the RXSGB?
In this revision, I fix the last missing parts from @nikic 's suggestions:
- I rebased the patches on top of commit 0f2c071fad60d7606ee1a05c71ab5e0510d5becc, allowing to use nounwind in the tests.
- I simplified the test_order's code.
@uweigand I don't have commit access yet. If this new revision is still good, could you land this patch for me, please?
Please use “Tulio Magno Quites Machado Filho tuliom@redhat.com” to commit the change.
This second AND seems redundant, can't this just be
instead?