Please use GitHub pull requests for new patches. Avoid migrating existing patches. Phabricator shutdown timeline
Changeset View
Standalone View
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
- This file is larger than 256 KB, so syntax highlighting is disabled by default.
Show First 20 Lines • Show All 10,845 Lines • ▼ Show 20 Lines | SDValue DAGCombiner::visitFREM(SDNode *N) { | ||||
if (SDValue NewSel = foldBinOpIntoSelect(N)) | if (SDValue NewSel = foldBinOpIntoSelect(N)) | ||||
return NewSel; | return NewSel; | ||||
return SDValue(); | return SDValue(); | ||||
} | } | ||||
SDValue DAGCombiner::visitFSQRT(SDNode *N) { | SDValue DAGCombiner::visitFSQRT(SDNode *N) { | ||||
if (!DAG.getTarget().Options.UnsafeFPMath) | SDNodeFlags Flags = N->getFlags(); | ||||
if (!DAG.getTarget().Options.UnsafeFPMath && !Flags.hasAllowReciprocal()) | |||||
spatel: Here, we're entering a bit of uncharted FMF territory. cc'ing some more potential reviewers for… | |||||
mcberg2017AuthorUnsubmitted Not Done ReplyInline ActionsNot quite true, I reference AMDGPU which uses 1.0/sqrt to map to AMDGPUISD::RSQ, which is rsqrt, although a partial condition. See SITargetLowering::lowerFastUnsafeFDIV. Adding Matt as this is a AMD topic. mcberg2017: Not quite true, I reference AMDGPU which uses 1.0/sqrt to map to AMDGPUISD::RSQ, which is rsqrt… | |||||
wristowUnsubmitted Not Done ReplyInline ActionsIt does seem to me that 'afn' is what we want here, rather than 'arcp'. I think the point about SITargetLowering::lowerFastUnsafeFDIV means that the code there ought to be checking both 'afn' and 'arcp'. wristow: It does seem to me that 'afn' is what we want here, rather than 'arcp'. I think the point… | |||||
arsenmUnsubmitted Not Done ReplyInline ActionsThe point of the code there is it doesn't need either. In some situations the instruction does the right thing without any need for a flag arsenm: The point of the code there is it doesn't need either. In some situations the instruction does… | |||||
arsenmUnsubmitted Not Done ReplyInline ActionsFor the cases where it's not legal, I suppose checking both makes sense arsenm: For the cases where it's not legal, I suppose checking both makes sense | |||||
efriedmaUnsubmitted Not Done ReplyInline ActionsI can't see how arcp is relevant to this patch at all; visitFSQRT doesn't deal with division. The flags required for rsqrt were discussed here: https://reviews.llvm.org/D37686#877987 . efriedma: I can't see how arcp is relevant to this patch at all; visitFSQRT doesn't deal with division. | |||||
hfinkelUnsubmitted Not Done ReplyInline Actions
I agree. This is just a question of how we compute sqrt(x). For an approximation of 1/sqrt(x), then I can see also needing arcp. hfinkel: > I can't see how arcp is relevant to this patch at all; visitFSQRT doesn't deal with division. | |||||
return SDValue(); | return SDValue(); | ||||
SDValue N0 = N->getOperand(0); | SDValue N0 = N->getOperand(0); | ||||
if (TLI.isFsqrtCheap(N0, DAG)) | if (TLI.isFsqrtCheap(N0, DAG)) | ||||
return SDValue(); | return SDValue(); | ||||
// TODO: FSQRT nodes should have flags that propagate to the created nodes. | // FSQRT nodes have flags that propagate to the created nodes. | ||||
// For now, create a Flags object for use with reassociation math transforms. | |||||
SDNodeFlags Flags; | |||||
Flags.setAllowReassociation(true); | |||||
return buildSqrtEstimate(N0, Flags); | return buildSqrtEstimate(N0, Flags); | ||||
} | } | ||||
/// copysign(x, fp_extend(y)) -> copysign(x, y) | /// copysign(x, fp_extend(y)) -> copysign(x, y) | ||||
/// copysign(x, fp_round(y)) -> copysign(x, y) | /// copysign(x, fp_round(y)) -> copysign(x, y) | ||||
static inline bool CanCombineFCOPYSIGN_EXTEND_ROUND(SDNode *N) { | static inline bool CanCombineFCOPYSIGN_EXTEND_ROUND(SDNode *N) { | ||||
SDValue N1 = N->getOperand(1); | SDValue N1 = N->getOperand(1); | ||||
if ((N1.getOpcode() == ISD::FP_EXTEND || | if ((N1.getOpcode() == ISD::FP_EXTEND || | ||||
▲ Show 20 Lines • Show All 7,208 Lines • Show Last 20 Lines |
Here, we're entering a bit of uncharted FMF territory. cc'ing some more potential reviewers for opinions - @wristow @efriedma @andrew.w.kaylor
I think we want to allow this transform when the node allows "approximate functions" (afn). I don't think we should care about 'arcp' - the transform of 1.0/sqrt(x) is handled on a different path AFAICT.