This introduces a generic opcode for floating-point floor, working towards selecting @llvm.floor.
@arsenm, adding this opcode breaks AMDGPU somehow. Do you have any idea why that might be?
Here's an example:
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/24100/steps/build%20stage%201/logs/stdio
Some tablegen seems unhappy?
It looks like the definition uncovered an unusual case the importer can't handle yet. These two rules are the cause:
// Convert (x - floor(x)) to fract(x)
def : GCNPat <
  (f32 (fsub (f32 (VOP3Mods f32:$x, i32:$mods)),
             (f32 (ffloor (f32 (VOP3Mods f32:$x, i32:$mods)))))),
  (V_FRACT_F32_e64 $mods, $x, DSTCLAMP.NONE, DSTOMOD.NONE)
>;

// Convert (x + (-floor(x))) to fract(x)
def : GCNPat <
  (f64 (fadd (f64 (VOP3Mods f64:$x, i32:$mods)),
             (f64 (fneg (f64 (ffloor (f64 (VOP3Mods f64:$x, i32:$mods)))))))),
  (V_FRACT_F64_e64 $mods, $x, DSTCLAMP.NONE, DSTOMOD.NONE)
>;
The importer doesn't yet have any code to handle same-operand constraints in combination with the (foo $x, $y) style of matching complex pattern foo. I had a quick look for workarounds, but no variant (e.g. naming the overall complex operand and matching that) seems to work at the moment. This:
(f32 (fsub (f32 VOP3Mods:$a), (f32 (ffloor (f32 VOP3Mods:$a)))))
would probably work, but then you wouldn't be able to reverse the sub-operands.
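To illustrate what a same-operand constraint means here, the following is a toy model in Python, not the importer's actual code. The tuple-based expression trees, the `match` function, and the `mods` node standing in for the VOP3Mods complex pattern are all hypothetical. The point it sketches is that a name such as $x appearing twice in a pattern must bind to the same operand both times, which is the check the importer cannot yet emit when the repeated names sit inside a complex pattern.

```python
# Toy expression trees: ("op", child, ...) tuples; leaves are strings.
# Pattern leaves that start with "$" are named captures.
# Hypothetical model only; this is not how the GlobalISel importer is written.

def match(pattern, expr, bindings=None):
    """Return a dict of captures if expr matches pattern, else None."""
    if bindings is None:
        bindings = {}
    if isinstance(pattern, str) and pattern.startswith("$"):
        if pattern in bindings:
            # Same-operand constraint: a repeated name must see the same value.
            return bindings if bindings[pattern] == expr else None
        new_bindings = dict(bindings)
        new_bindings[pattern] = expr
        return new_bindings
    if isinstance(pattern, str) or isinstance(expr, str):
        # Literal leaf: must be identical.
        return bindings if pattern == expr else None
    if len(pattern) != len(expr) or pattern[0] != expr[0]:
        return None
    for sub_pattern, sub_expr in zip(pattern[1:], expr[1:]):
        bindings = match(sub_pattern, sub_expr, bindings)
        if bindings is None:
            return None
    return bindings


# Shape of the f32 rule: (fsub (mods $x, $mods), (ffloor (mods $x, $mods)))
FRACT = ("fsub", ("mods", "$x", "$mods"), ("ffloor", ("mods", "$x", "$mods")))

same_operand = ("fsub", ("mods", "a", "0"), ("ffloor", ("mods", "a", "0")))
diff_operand = ("fsub", ("mods", "a", "0"), ("ffloor", ("mods", "b", "0")))

print(match(FRACT, same_operand))  # both occurrences of $x bind to "a"
print(match(FRACT, diff_operand))  # second $x sees "b", so the match fails
```

In this sketch the first call succeeds because both uses of $x and $mods agree, while the second returns None because the operand under ffloor differs from the one being subtracted from.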
I think that this is safe to recommit after the changes in https://reviews.llvm.org/D57980.