# Changeset View

# Standalone View

# lib/Transforms/Utils/SimplifyCFG.cpp

Context not available. | |||||

SmallDenseMap<PHINode *, Type *> ResultTypes; | SmallDenseMap<PHINode *, Type *> ResultTypes; | ||||

SmallVector<PHINode *, 4> PHIs; | SmallVector<PHINode *, 4> PHIs; | ||||

for (SwitchInst::CaseIt E = SI->case_end(); CI != E; ++CI) { | for (SwitchInst::CaseIt E = SI->case_end(); CI != E; ++CI) { | ||||

ConstantInt *CaseVal = CI->getCaseValue(); | ConstantInt *CaseVal = CI->getCaseValue(); | ||||

jmolloy: I'm having trouble groking at a glance why this is correct. Could you comment? Previously we… | |||||

if (CaseVal->getValue().slt(MinCaseVal->getValue())) | if (CaseVal->getValue().ult(MinCaseVal->getValue())) | ||||

MinCaseVal = CaseVal; | MinCaseVal = CaseVal; | ||||

if (CaseVal->getValue().sgt(MaxCaseVal->getValue())) | if (CaseVal->getValue().ugt(MaxCaseVal->getValue())) | ||||

MaxCaseVal = CaseVal; | MaxCaseVal = CaseVal; | ||||

// Resulting value at phi nodes for this case value. | // Resulting value at phi nodes for this case value. | ||||

using ResultsTy = SmallVector<std::pair<PHINode *, Constant *>, 4>; | using ResultsTy = SmallVector<std::pair<PHINode *, Constant *>, 4>; | ||||

ResultsTy Results; | ResultsTy Results; | ||||

Not Done ReplyInline ActionsWhy is this being changed? Can this be a separate patch? lebedev.ri: Why is this being changed? Can this be a separate patch? | |||||

ReduceSwitchRange now re-orders values so that they start at zero, so is superior to this simple "sort by signed" trick. shawnl: ReduceSwitchRange now re-orders values so that they start at zero, so is superior to this… | |||||

Not Done ReplyInline ActionsConstants are always on RHS anyway lebedev.ri: Constants are always on RHS anyway | |||||

Not Done ReplyInline ActionsThis can't be committed as-is. If you can't modify the existing code to work in the interim between your two patches, the other patch should be merged into this one. Note this will make it more of a pain to review, so I'd suggest trying hard to modify the existing code :) jmolloy: This can't be committed as-is. If you can't modify the existing code to work in the interim… | |||||

Not Done ReplyInline ActionsI'm having trouble working out how this heuristic relates to the existing code. Have you changed the heuristic as well as the generated code? Note, a NFC change is much easier to review and get landed. jmolloy: I'm having trouble working out how this heuristic relates to the existing code. Have you… | |||||

Not Done ReplyInline ActionsThis whole calculation should be outlined into a helper function. Then for example you wouldn't need to nest the whole thing inside if (BitWidth > 8). jmolloy: This whole calculation should be outlined into a helper function. Then for example you wouldn't… | |||||

Not Done ReplyInline ActionsYou're describing a tricky gotcha here. Please be more specific and go into more detail as to why it's there in the first place. jmolloy: You're describing a tricky gotcha here. Please be more specific and go into more detail as to… | |||||

Context not available. | |||||

if (NeedMask) | if (NeedMask) | ||||

++NumLookupTablesHoles; | ++NumLookupTablesHoles; | ||||

return true; | return true; | ||||

} | } | ||||

static bool isSwitchDense(ArrayRef<int64_t> Values) { | |||||

// See also SelectionDAGBuilder::isDense(), which this function was based on. | |||||

uint64_t Diff = (uint64_t)Values.back() - (uint64_t)Values.front(); | |||||

uint64_t Range = Diff + 1; | |||||

uint64_t NumCases = Values.size(); | |||||

Not Done ReplyInline ActionsSo what about lebedev.ri: So what about `-Os`/`-Oz`? | |||||

This is not making the table, that calculation is elsewhere. This optimization is cheap and should always be applied. shawnl: This is not making the table, that calculation is elsewhere. This optimization is cheap and… | |||||

// 40% is the default density for building a jump table in optsize/minsize mode. | |||||

uint64_t MinDensity = 40; | |||||

return NumCases * 100 >= Range * MinDensity; | |||||

} | |||||

/// Try to transform a switch that has "holes" in it to a contiguous sequence | /// Try to transform a switch that has "holes" in it to a contiguous sequence | ||||

/// of cases. | /// of cases. | ||||

Not Done ReplyInline Actionsuse std::min/std::max here? jmolloy: use std::min/std::max here? | |||||

/// | /// | ||||

/// A switch such as: switch(i) {case 5: case 9: case 13: case 17:} can be | /// A switch such as: switch(i) {case 5: case 9: case 13: case 17:} can be | ||||

/// range-reduced to: switch ((i-5) / 4) {case 0: case 1: case 2: case 3:}. | /// range-reduced to: switch ((i-5) / 4) {case 0: case 1: case 2: case 3:}. | ||||

Context not available. | |||||

/// This converts a sparse switch into a dense switch which allows better | /// This converts a sparse switch into a dense switch which allows better | ||||

Not Done ReplyInline ActionsPlease change the phrasing. Instead of describing what we might do *without* the check, please describe what the goal of the check is (negative vs. positive sense makes it easier to grok what the condition is and whether it matches the descriptive comment). jmolloy: Please change the phrasing. Instead of describing what we might do *without* the check, please… | |||||

/// lowering and could also allow transforming into a lookup table. | /// lowering and could also allow transforming into a lookup table. | ||||

Not Done ReplyInline Actions
... but why? jmolloy: > might be worse
... but why? | |||||

static bool ReduceSwitchRange(SwitchInst *SI, IRBuilder<> &Builder, | static bool ReduceSwitchRange(SwitchInst *SI, IRBuilder<> &Builder, | ||||

Not Done ReplyInline ActionsWhy > 64? jmolloy: Why > 64? | |||||

const DataLayout &DL, | const DataLayout &DL, | ||||

const TargetTransformInfo &TTI) { | const TargetTransformInfo &TTI) { | ||||

Not Done ReplyInline ActionsThis whole comment has me worried. Target-specific guessing games don't belong in SimplifyCFG :) A heuristic that isn't obviously target-agnostic should get a TTI hook. They're cheap, go for it! :) jmolloy: This whole comment has me worried. Target-specific guessing games don't belong in SimplifyCFG… | |||||

// The number of cases that need to be removed by a subtraction operation | |||||

// to make it worth using. | |||||

const unsigned SubThreshold = (SI->getFunction()->hasOptSize() ? 2 : 8); | |||||

bool MadeChanges = false; | |||||

auto *CondTy = cast<IntegerType>(SI->getCondition()->getType()); | auto *CondTy = cast<IntegerType>(SI->getCondition()->getType()); | ||||

if (CondTy->getIntegerBitWidth() > 64 || | unsigned BitWidth = CondTy->getIntegerBitWidth(); | ||||

!DL.fitsInLegalInteger(CondTy->getIntegerBitWidth())) | if (BitWidth > 64 || | ||||

return false; | !DL.fitsInLegalInteger(BitWidth)) | ||||

// Only bother with this optimization if there are more than 3 switch cases; | |||||

// SDAG will only bother creating jump tables for 4 or more cases. | |||||

if (SI->getNumCases() < 4) | |||||

return false; | return false; | ||||

// This transform is agnostic to the signedness of the input or case values. We | SmallVector<uint64_t,4> Values; | ||||

// can treat the case values as signed or unsigned. We can optimize more common | |||||

// cases such as a sequence crossing zero {-4,0,4,8} if we interpret case values | |||||

// as signed. | |||||

SmallVector<int64_t,4> Values; | |||||

for (auto &C : SI->cases()) | for (auto &C : SI->cases()) | ||||

Values.push_back(C.getCaseValue()->getValue().getSExtValue()); | Values.push_back(C.getCaseValue()->getLimitedValue()); | ||||

llvm::sort(Values); | llvm::sort(Values); | ||||

Not Done ReplyInline ActionsYou're not only mutating but rearranging Values while iterating? this is an antipattern. Please remove. jmolloy: You're not only mutating but rearranging Values while iterating? this is an antipattern. Please… | |||||

// If the switch is already dense, there's nothing useful to do here. | // A cheap speculative transform: handle power-of-two flags using | ||||

if (isSwitchDense(Values)) | // @clz or @ctz as the key function. | ||||

return false; | // TODO: this transform would benifit from proper range analysis in SwitchToLookupTable | ||||

// BitWidth > 8 could be relaxed after this is fixed, but not eliminated, because | |||||

// this transforms 0 => BitWidth. | |||||

bool UseClz = false; | |||||

bool UseCtz = false; | |||||

if (BitWidth > 8) { | |||||

unsigned MaxPopCount = 0; | |||||

unsigned MinClz = 64, MaxClz = 0, MinCtz = 64, MaxCtz = 0; | |||||

for (auto &V : Values) { | |||||

MaxPopCount = countPopulation(V); | |||||

if (MaxPopCount > 1) | |||||

break; | |||||

unsigned Clz = APInt(BitWidth, V).countLeadingZeros(); | |||||

unsigned Ctz = APInt(BitWidth, V).countTrailingZeros(); | |||||

if (Clz < MinClz) MinClz = Clz; | |||||

if (Clz > MaxClz) MaxClz = Clz; | |||||

if (Ctz < MinCtz) MinCtz = Ctz; | |||||

if (Ctz > MaxCtz) MaxCtz = Ctz; | |||||

} | |||||

// Without this check we might do clz followed by ctz, and if Values contains 0, | |||||

// the result might be woorse. | |||||

if (MaxPopCount == 1 && Values.back() > 64) { | |||||

MadeChanges = true; | |||||

// Prefer clz because it is one instruction cheaper | |||||

// on ARM, but they cost the same on x86 so if we only need | |||||

// the subtraction on clz use ctz. | |||||

UseClz = (MinClz < SubThreshold) || (MinCtz >= SubThreshold); | |||||

if (UseClz) | |||||

for (auto &V : Values) | |||||

V = APInt(BitWidth, V).countLeadingZeros(); | |||||

else { | |||||

UseCtz = true; | |||||

for (auto &V : Values) { | |||||

V = APInt(BitWidth, V).countTrailingZeros(); | |||||

} | |||||

} | |||||

// 0 will suddenly become the largest (BitWidth), so we need to sort again. | |||||

llvm::sort(Values); | |||||

} | |||||

} | |||||

// First, transform the values such that they start at zero and ascend. | // Find the element that has the most distance from it's previous, wrapping around. | ||||

int64_t Base = Values[0]; | uint64_t BestDistance = APInt::getMaxValue(CondTy->getIntegerBitWidth()).getLimitedValue() - | ||||

for (auto &V : Values) | Values.back() + Values.front() + 1; | ||||

V -= (uint64_t)(Base); | unsigned BestIndex = 0; | ||||

for (unsigned i = 1;i != Values.size();i++) { | |||||

if (Values[i] - Values[i-1] > BestDistance) { | |||||

BestIndex = i; | |||||

BestDistance = Values[i] - Values[i-1]; | |||||

} | |||||

} | |||||

// Now we have signed numbers that have been shifted so that, given enough | uint64_t Base = 0; | ||||

// precision, there are no negative values. Since the rest of the transform | // Now transform the values such that they start at zero and ascend. | ||||

// is bitwise only, we switch now to an unsigned representation. | if ((BestDistance > SubThreshold) && | ||||

uint64_t GCD = 0; | (BestIndex != 0 || (Values[0] >= SubThreshold))) { | ||||

for (auto &V : Values) | Base = Values[BestIndex]; | ||||

GCD = GreatestCommonDivisor64(GCD, (uint64_t)V); | MadeChanges = true; | ||||

Not Done ReplyInline ActionsJust looking at this, shouldn't there be some changes to existing tests? nikic: Just looking at this, shouldn't there be some changes to existing tests? | |||||

There were no tests for the old behavior, surprisingly. shawnl: There were no tests for the old behavior, surprisingly. | |||||

for (auto &V : Values) | |||||

V = (APInt(BitWidth, V) - Base).getLimitedValue(); | |||||

} | |||||

// This transform can be done speculatively because it is so cheap - it results | // This transform can be done speculatively because it is so cheap - it results | ||||

Not Done ReplyInline ActionsWhy not use Builder.CreateCall here? jmolloy: Why not use Builder.CreateCall here? | |||||

// in a single rotate operation being inserted. This can only happen if the | // in a single rotate operation being inserted. | ||||

// factor extracted is a power of 2. | unsigned Shift = 64; | ||||

// FIXME: If the GCD is an odd number we can multiply by the multiplicative | for (auto &V : Values) { | ||||

// inverse of GCD and then perform this transform. | // There is no edge condition when the BitWidth is less than 64, because if | ||||

// FIXME: It's possible that optimizing a switch on powers of two might also | // 0 is the only value then a shift does nothing, and LLVM requires | ||||

// be beneficial - flag values are often powers of two and we could use a CLZ | // well-formed IR to not have duplicate cases. | ||||

// as the key function. | unsigned TZ = countTrailingZeros(V); | ||||

if (GCD <= 1 || !isPowerOf2_64(GCD)) | if (TZ < Shift) { | ||||

// No common divisor found or too expensive to compute key function. | Shift = TZ; | ||||

return false; | if (Shift == 0) | ||||

Not Done ReplyInline ActionsHow well supported is codegen for fshr? shift/shift/or for rotate will go through pretty much any backend; I've been out of the loop for a little while - how new is fshr? jmolloy: How well supported is codegen for fshr? shift/shift/or for rotate will go through pretty much… | |||||

break; | |||||

unsigned Shift = Log2_64(GCD); | } | ||||

for (auto &V : Values) | } | ||||

V = (int64_t)((uint64_t)V >> Shift); | if (Shift) { | ||||

MadeChanges = true; | |||||

for (auto &V : Values) | |||||

V = V >> Shift; | |||||

} | |||||

if (!isSwitchDense(Values)) | if (!MadeChanges) | ||||

// Transform didn't create a dense switch. | // Didn't do anything | ||||

return false; | return false; | ||||

// The obvious transform is to shift the switch condition right and emit a | // The obvious transform is to shift the switch condition right and emit a | ||||

// check that the condition actually cleanly divided by GCD, i.e. | // check that the condition actually cleanly divided by GCD, i.e. | ||||

// C & (1 << Shift - 1) == 0 | // C & (1 << Shift - 1) == 0 | ||||

Context not available. | |||||

// shift and puts the shifted-off bits in the uppermost bits. If any of these | // shift and puts the shifted-off bits in the uppermost bits. If any of these | ||||

// are nonzero then the switch condition will be very large and will hit the | // are nonzero then the switch condition will be very large and will hit the | ||||

// default case. | // default case. | ||||

auto *Ty = cast<IntegerType>(SI->getCondition()->getType()); | auto *Ty = cast<IntegerType>(SI->getCondition()->getType()); | ||||

Builder.SetInsertPoint(SI); | { | ||||

auto *ShiftC = ConstantInt::get(Ty, Shift); | auto Zero = ConstantInt::get(IntegerType::get(Ty->getContext(), 1), 0); | ||||

auto *Sub = Builder.CreateSub(SI->getCondition(), ConstantInt::get(Ty, Base)); | Builder.SetInsertPoint(SI); | ||||

auto *LShr = Builder.CreateLShr(Sub, ShiftC); | Value *ZerosTransform; | ||||

auto *Shl = Builder.CreateShl(Sub, Ty->getBitWidth() - Shift); | if (UseClz) { | ||||

auto *Rot = Builder.CreateOr(LShr, Shl); | Function *Ctlz = Intrinsic::getDeclaration(SI->getModule(), Intrinsic::ctlz, Ty); | ||||

SI->replaceUsesOfWith(SI->getCondition(), Rot); | ZerosTransform = Builder.Insert(CallInst::Create(Ctlz, {SI->getCondition(), Zero})); | ||||

} else if (UseCtz) { | |||||

Function *Cttz = Intrinsic::getDeclaration(SI->getModule(), Intrinsic::cttz, Ty); | |||||

ZerosTransform = Builder.Insert(CallInst::Create(Cttz, {SI->getCondition(), Zero})); | |||||

} else | |||||

ZerosTransform = SI->getCondition(); | |||||

auto *Sub = Builder.CreateSub(ZerosTransform, ConstantInt::get(Ty, Base)); | |||||

Value *Key; | |||||

Not Done ReplyInline ActionsDid you see this comment? lebedev.ri: Did you see this comment? | |||||

This change does not change whether a lookup table gets created (although I am working on such patches). It only simplifies the switch. shawnl: This change does not change whether a lookup table gets created (although I am working on such… | |||||

if (Shift) { | |||||

Function *Fshr = Intrinsic::getDeclaration(SI->getModule(), Intrinsic::fshr, Ty); | |||||

auto *ShiftC = ConstantInt::get(Ty, Shift); | |||||

Key = Builder.Insert(CallInst::Create(Fshr, {Sub, Sub, ShiftC})); | |||||

} else | |||||

Key = Sub; | |||||

SI->replaceUsesOfWith(SI->getCondition(), Key); | |||||

} | |||||

for (auto Case : SI->cases()) { | for (auto Case : SI->cases()) { | ||||

auto *Orig = Case.getCaseValue(); | auto *Orig = Case.getCaseValue(); | ||||

auto Sub = Orig->getValue() - APInt(Ty->getBitWidth(), Base); | uint64_t Zeros; | ||||

if (UseClz) | |||||

Zeros = Orig->getValue().countLeadingZeros(); | |||||

else if (UseCtz) | |||||

Zeros = Orig->getValue().countTrailingZeros(); | |||||

else | |||||

Zeros = Orig->getValue().getLimitedValue(); | |||||

auto Sub = (APInt(BitWidth, Zeros) - Base).getLimitedValue(); | |||||

Case.setValue( | Case.setValue( | ||||

cast<ConstantInt>(ConstantInt::get(Ty, Sub.lshr(ShiftC->getValue())))); | cast<ConstantInt>(ConstantInt::get(Ty, APInt(BitWidth, Sub >> Shift)))); | ||||

} | } | ||||

return true; | return true; | ||||

} | } | ||||

bool SimplifyCFGOpt::SimplifySwitch(SwitchInst *SI, IRBuilder<> &Builder) { | bool SimplifyCFGOpt::SimplifySwitch(SwitchInst *SI, IRBuilder<> &Builder) { | ||||

BasicBlock *BB = SI->getParent(); | BasicBlock *BB = SI->getParent(); | ||||

Context not available. | |||||

return requestResimplify(); | return requestResimplify(); | ||||

if (Options.ForwardSwitchCondToPhi && ForwardSwitchConditionToPHI(SI)) | if (Options.ForwardSwitchCondToPhi && ForwardSwitchConditionToPHI(SI)) | ||||

return requestResimplify(); | return requestResimplify(); | ||||

if (ReduceSwitchRange(SI, Builder, DL, TTI)) | |||||

return requestResimplify(); | |||||

// The conversion from switch to lookup tables results in difficult-to-analyze | // The conversion from switch to lookup tables results in difficult-to-analyze | ||||

// code and makes pruning branches much harder. This is a problem if the | // code and makes pruning branches much harder. This is a problem if the | ||||

// switch expression itself can still be restricted as a result of inlining or | // switch expression itself can still be restricted as a result of inlining or | ||||

// CVP. Therefore, only apply this transformation during late stages of the | // CVP. Therefore, only apply this transformation during late stages of the | ||||

// optimisation pipeline. | // optimisation pipeline. | ||||

if (Options.ConvertSwitchToLookupTable && | if (Options.ConvertSwitchToLookupTable && | ||||

SwitchToLookupTable(SI, Builder, DL, TTI)) | SwitchToLookupTable(SI, Builder, DL, TTI)) | ||||

return requestResimplify(); | return requestResimplify(); | ||||

if (ReduceSwitchRange(SI, Builder, DL, TTI)) | |||||

return requestResimplify(); | |||||

return false; | return false; | ||||

} | } | ||||

bool SimplifyCFGOpt::SimplifyIndirectBr(IndirectBrInst *IBI) { | bool SimplifyCFGOpt::SimplifyIndirectBr(IndirectBrInst *IBI) { | ||||

BasicBlock *BB = IBI->getParent(); | BasicBlock *BB = IBI->getParent(); | ||||

Context not available. |

I'm having trouble groking at a glance why this is correct. Could you comment? Previously we had a slt/sgt pair, now just a ugt?

Note that if you'd have uploaded with full context I could go look closer to understand more ;)