This patch is one of two reviews and is based on https://reviews.llvm.org/D31396.
This patch adds a new optimization (folding cmp(sub(a,b),0) into cmp(a,b))
to the instCombineCall pass and was written specifically for X86 CMP intrinsics.
Differential D31398
[X86][X86 intrinsics] Folding cmp(sub(a,b),0) into cmp(a,b) optimization
Authored by m_zuckerman on Mar 27 2017, 8:42 AM.
Please make sure you always include llvm-commits as a subscriber in future patches.
If this transform is valid (cc'ing @scanon), then should we also do this for general IR?

  define i1 @fcmpsub(double %x, double %y) {
    %sub = fsub nnan ninf nsz double %x, %y
    %cmp = fcmp nnan ninf nsz ugt double %sub, 0.0
    ret i1 %cmp
  }

  define i1 @fcmpsub(double %x, double %y) {
    %cmp = fcmp nnan ninf nsz ugt double %x, %y
    ret i1 %cmp
  }
(x-y) == 0 --> x == y does not require nsz (zeros of any sign compare equal), nor does it require nnan (if x or y is NaN, both comparisons are false). It *does* require ninf (because inf-inf = NaN), and it also requires that subnormals are not flushed by the CPU. There's been some churn around flushing recently; Hal may have thoughts (+Hal).

You are absolutely right; your transform is valid, and we will do it after this patch.
1. I agree with you; in the general case we need to be cautious, but this transformation is feasible.
3. I am with you on that.
LGTM - see inline comments for a couple of cleanups.
We still assume standard denormal handling. There's ongoing work to add intrinsics for operations that are sensitive to this behavior (and rounding modes, etc.), but that shouldn't affect this.

Hi, Thanks,
There are many ways to deal with commuted patterns, and a 2-loop is my least favorite. Would you consider using std::swap and matchers instead? Something like:
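The reviewer's snippet is not preserved in this thread. As a rough, hypothetical illustration of the std::swap idea only, the sketch below uses a toy expression type instead of LLVM's classes and matchers. `Node`, `Cmp`, and `foldCmpOfSub` are invented names, and the fast-math checks discussed earlier are left out.

```cpp
#include <utility>

// Hypothetical toy IR (not LLVM's): a value is a leaf, a constant zero,
// or a subtraction of two other values.
struct Node {
  enum Kind { Leaf, Zero, Sub } kind;
  const Node *lhs;
  const Node *rhs;
};

// A two-operand comparison, standing in for a cmp intrinsic call.
struct Cmp {
  const Node *op0;
  const Node *op1;
};

// Folds cmp(sub(a, b), 0) into cmp(a, b), and the commuted form
// cmp(0, sub(a, b)) into cmp(b, a). Returns true if a fold happened.
bool foldCmpOfSub(Cmp &cmp) {
  const Node *op0 = cmp.op0;
  const Node *op1 = cmp.op1;
  bool swapped = false;

  // Canonicalize once: if the subtraction is the right operand, swap so
  // the single pattern check below covers both operand orders.
  if (op1->kind == Node::Sub) {
    std::swap(op0, op1);
    swapped = true;
  }
  if (op0->kind != Node::Sub || op1->kind != Node::Zero)
    return false;

  const Node *a = op0->lhs;
  const Node *b = op0->rhs;
  // Keep the predicate unchanged: "0 P (a - b)" corresponds to "b P a".
  if (swapped)
    std::swap(a, b);
  cmp.op0 = a;
  cmp.op1 = b;
  return true;
}
```

The pattern is written once for the canonical operand order, and the swap flag records how to map the result back, so no loop over the two operand positions is needed.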