This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Utilize Test Data Class instructions.
ClosedPublic

Authored by koriakin on Jul 1 2016, 7:29 PM.

Details

Summary

This adds a new SystemZ-specific intrinsic, llvm.s390.tdc.f(32|64|128),
which maps straight to the test data class instructions. A new IR pass
is added to recognize instructions that can be converted to TDC and
perform the necessary replacements.

Diff Detail

Repository
rL LLVM

Event Timeline

koriakin retitled this revision from to [SystemZ] Utilize Test Data Class instructions..
koriakin updated this object.
koriakin added a reviewer: uweigand.
koriakin set the repository for this revision to rL LLVM.
koriakin added a subscriber: llvm-commits.

This is my attempt at utilizing the test data class instructions. I've tried
and failed to do this during the SelectionDAG phase, via DAG combiners.
The problems are:

  • and/or instructions that would map to TDC are expanded to a branch sequence by the time we get to SelectionDAG if the result is used for conditional branching. We'd thus have to undo the CFG changes, which is impossible during ISel.
  • the DAG combiners have to work before type legalization, thus we need a new i1-typed SystemZISD node (to match icmp, and, or, xor). However, this doesn't play well with the existing CC handling combiners - by the time this is promoted to i32, the brcond combiners have already run and we're left with code that IPMs the result, does a test on the IPM, and does a branch based on that.
  • not all convertable instructions are actually profittable to convert - thus we'd have to have some DAG combiners that would convert TDCs back.

The last two problems are probably solvable by adding enough logic to the
combiners, but it quickly becomes too much of a mess. Handling this
as a separate IR pass greatly simplifies the interactions with CC and
comparison handling in ISel (by reusing the existing INTRINSIC_WO_CHAIN
code), and makes the first problem a non-issue. It's also very easy
to only convert instructions that are actually worthy of conversion.

There's a minor problem left with this code: the s390.tdc intrinsic really
wants to return i1, not i32. However, this again causes problems with CC
combiners (item 2 in the list above). Since all existing intrinsics that
return CC represent the result as an i32, I suppose I'm not the first one
dealing with this problem, and this solution is OK.

There are a few more patterns that could be recognized, but aren't:

  • negation (ie. xor i1 (tdc ...), true) - since this is removed by earlier phases, it's not a problem in practice.
  • tdc of negation (ie. tdc (fsub -0.0, X), mask) - in case of fcmp, this is removed by earlier phases (by flipping the sign on the const and predicate), so not a problem. Could be useful for signbit, I suppose, but I'm not sure what the semantics should be for NaNs - besides, it could be better to handle this in some target-independent pass.
  • signbit of fabs - likewise.
  • some way to tell apart SNaNs from QNaNs - unfortunately, such checks require many instructions, and there appears to be no "canonical" sequence. If it's really necessary, we might want to expose the tdc intrinsic as a clang builtin - WDYT?
uweigand edited edge metadata.Jul 4 2016, 11:15 AM

I haven't looked into the detailed implementation yet. However, one important feature to verify would be that use of the TEST DATA CLASS instruction is not only triggered by those specific IR sequences you test for on the LLVM level, but more importantly, triggered by whatever code clang generates for the isinf/isnan/signbit/fpclassify etc. builtins, since those are what users (hopefully) typically use ... [ This might actually involve changing clang as well, not sure about that. ]

I haven't looked into the detailed implementation yet. However, one important feature to verify would be that use of the TEST DATA CLASS instruction is not only triggered by those specific IR sequences you test for on the LLVM level, but more importantly, triggered by whatever code clang generates for the isinf/isnan/signbit/fpclassify etc. builtins, since those are what users (hopefully) typically use ... [ This might actually involve changing clang as well, not sure about that. ]

I've covered signbit and isnormal in the tests, I'll update it to cover isinf and fpclassify as well. isnan on its own is not profittable to implement with TEST DATA CLASS (a normal compare is good enough for that).

I haven't looked into the detailed implementation yet. However, one important feature to verify would be that use of the TEST DATA CLASS instruction is not only triggered by those specific IR sequences you test for on the LLVM level, but more importantly, triggered by whatever code clang generates for the isinf/isnan/signbit/fpclassify etc. builtins, since those are what users (hopefully) typically use ... [ This might actually involve changing clang as well, not sure about that. ]

I've covered signbit and isnormal in the tests, I'll update it to cover isinf and fpclassify as well.

OK, thanks.

isnan on its own is not profittable to implement with TEST DATA CLASS (a normal compare is good enough for that).

Agreed, but if we get code like isinf(x) || isnan(x), it might become profitable in combination.

koriakin edited edge metadata.

More tests

uweigand accepted this revision.Jul 8 2016, 8:33 AM
uweigand edited edge metadata.

More tests

Ah, so your code already recognized those cases, excellent :-)

See the inline comments. Otherwise, this looks really good to me. Thanks!

lib/Target/SystemZ/SystemZTDC.cpp
141

Do we need to check for impossible comparisons here (like x > inf)?

165

These mask constants are a bit hard to read. It would be nicer to define symbolic constants for the various TDC mask bits ...

This revision is now accepted and ready to land.Jul 8 2016, 8:33 AM
koriakin added inline comments.Jul 8 2016, 8:44 AM
lib/Target/SystemZ/SystemZTDC.cpp
141

Not really - at worst we'll get an all-0 or all-1 TDC mask here, so there's no correctness problem. And such comparisons should be removed by earlier optimization passes, so there's no performance problem here either.

165

OK, will do.

uweigand added inline comments.Jul 8 2016, 8:53 AM
lib/Target/SystemZ/SystemZTDC.cpp
141

OK, makes sense.

165

Thanks!

joncmu added a subscriber: joncmu.Jul 8 2016, 1:32 PM

Would it also make sense to support the VECTOR FP TEST DATA CLASS IMMEDIATE instruction?

koriakin edited edge metadata.

Demagicked the masks; the only remaining magic constant is the shift right by 1 in fabs handling, which cannot be helped much.

Would it also make sense to support the VECTOR FP TEST DATA CLASS IMMEDIATE instruction?

Certainly, but IMO this patch is big enough already, and there's a bunch of extra logic invoved (mostly handling bool vectors instead of i1/i32).

Minor fix for old gcc - use make_tuple instead of initializer list.

This revision was automatically updated to reflect the committed changes.