This adds a new SystemZ-specific intrinsic, llvm.s390.tdc.f(32|64|128),
which maps straight to the test data class instructions. A new IR pass
is added to recognize instructions that can be converted to TDC and
perform the necessary replacements.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
This is my attempt at utilizing the test data class instructions. I've tried
and failed to do this during the SelectionDAG phase, via DAG combiners.
The problems are:
- and/or instructions that would map to TDC are expanded to a branch sequence by the time we get to SelectionDAG if the result is used for conditional branching. We'd thus have to undo the CFG changes, which is impossible during ISel.
- the DAG combiners have to work before type legalization, thus we need a new i1-typed SystemZISD node (to match icmp, and, or, xor). However, this doesn't play well with the existing CC handling combiners - by the time this is promoted to i32, the brcond combiners have already run and we're left with code that IPMs the result, does a test on the IPM, and does a branch based on that.
- not all convertable instructions are actually profittable to convert - thus we'd have to have some DAG combiners that would convert TDCs back.
The last two problems are probably solvable by adding enough logic to the
combiners, but it quickly becomes too much of a mess. Handling this
as a separate IR pass greatly simplifies the interactions with CC and
comparison handling in ISel (by reusing the existing INTRINSIC_WO_CHAIN
code), and makes the first problem a non-issue. It's also very easy
to only convert instructions that are actually worthy of conversion.
There's a minor problem left with this code: the s390.tdc intrinsic really
wants to return i1, not i32. However, this again causes problems with CC
combiners (item 2 in the list above). Since all existing intrinsics that
return CC represent the result as an i32, I suppose I'm not the first one
dealing with this problem, and this solution is OK.
There are a few more patterns that could be recognized, but aren't:
- negation (ie. xor i1 (tdc ...), true) - since this is removed by earlier phases, it's not a problem in practice.
- tdc of negation (ie. tdc (fsub -0.0, X), mask) - in case of fcmp, this is removed by earlier phases (by flipping the sign on the const and predicate), so not a problem. Could be useful for signbit, I suppose, but I'm not sure what the semantics should be for NaNs - besides, it could be better to handle this in some target-independent pass.
- signbit of fabs - likewise.
- some way to tell apart SNaNs from QNaNs - unfortunately, such checks require many instructions, and there appears to be no "canonical" sequence. If it's really necessary, we might want to expose the tdc intrinsic as a clang builtin - WDYT?
I haven't looked into the detailed implementation yet. However, one important feature to verify would be that use of the TEST DATA CLASS instruction is not only triggered by those specific IR sequences you test for on the LLVM level, but more importantly, triggered by whatever code clang generates for the isinf/isnan/signbit/fpclassify etc. builtins, since those are what users (hopefully) typically use ... [ This might actually involve changing clang as well, not sure about that. ]
I've covered signbit and isnormal in the tests, I'll update it to cover isinf and fpclassify as well. isnan on its own is not profittable to implement with TEST DATA CLASS (a normal compare is good enough for that).
OK, thanks.
isnan on its own is not profittable to implement with TEST DATA CLASS (a normal compare is good enough for that).
Agreed, but if we get code like isinf(x) || isnan(x), it might become profitable in combination.
Ah, so your code already recognized those cases, excellent :-)
See the inline comments. Otherwise, this looks really good to me. Thanks!
lib/Target/SystemZ/SystemZTDC.cpp | ||
---|---|---|
141 | Do we need to check for impossible comparisons here (like x > inf)? | |
165 | These mask constants are a bit hard to read. It would be nicer to define symbolic constants for the various TDC mask bits ... |
Would it also make sense to support the VECTOR FP TEST DATA CLASS IMMEDIATE instruction?
Demagicked the masks; the only remaining magic constant is the shift right by 1 in fabs handling, which cannot be helped much.
Certainly, but IMO this patch is big enough already, and there's a bunch of extra logic invoved (mostly handling bool vectors instead of i1/i32).
Do we need to check for impossible comparisons here (like x > inf)?