Page MenuHomePhabricator

[SimplifyCFG] Implement the suggested ctlz transform

Authored by shawnl on Apr 25 2019, 4:23 PM.



This is patch 5 is a series beginning with D61150

Diff Detail

Event Timeline

shawnl created this revision.Apr 25 2019, 4:23 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 25 2019, 4:23 PM
nikic added a subscriber: nikic.Apr 26 2019, 12:32 AM

Can you please point to patch 5? I think it's missing. (Best directly add it as a parent revision.)

jmolloy requested changes to this revision.Apr 26 2019, 1:42 AM

Thanks for implementing this! It's really cool.


s/Another/A. "Another" requires the user to know some context which has now gone.




No underscores in function names. ReduceSwitchRangeWithCtlz?


It looks to me like this isn't clang-formatted. Can you git-clang-format all your diffs?


You don't need DL or TTI here, please remove them.


I'd use a reference here. It's what most of the rest of LLVM uses.



"when given input 0, they return the bitwidth" would be a more natural phrasing.


I would avoid talking about the solution you didn't implement. Checking for zero input (which your popcount does) is a totally reasonable and understandable thing (and much more understandable than trying to grok the xor/rotate explanation to then find it isn't used :D )


getIntegerBitWidth() is defined on Type. You can avoid the cast.

unsigned BitWidth = SI->getCondition()->getType()->getIntegerBitWidth()

This would be a pattern for instcombine to pick up. We shouldn't have to worry about this here.


Isn't the order exactly reversed?


And if you do need to sort, please use


Please keep using SmallVector.


Please keep using llvm::sort.


Please remove the extra scope. There's no RAII objects here, so they aren't required.


IntegerType::get(Ty->getContext(), 1)

Change to


And please change "Zero" to a more descriptive name for the second parameter to ctlz.


See comments on diffbase about fshr. I don't think you need it, the rotate sequence is good enough. Is there a good reason why you think fshr is better?


Why are you converting from APInt to uint64_t to APInt again? Just remove .getLimitedValue() and keep Sub.lshr() in the next line?

85 ↗(On Diff #196754)

Uh, something's fishy here - why was this fshr here in the first place? This switch just needs a subtract.

121 ↗(On Diff #196754)

Your previous patch is clearly bogus; this is a shift by zero.

This revision now requires changes to proceed.Apr 26 2019, 1:42 AM
shawnl marked 6 inline comments as done.Apr 26 2019, 5:23 AM
shawnl added inline comments.

which your popcount does

It does not. It only checks if the MAXIMUM is 1.


Yes we do, because InstCombine does not know how to rearrange case statements.


Nope, the first becomes the last. I guess I could use swap() and move().


I was having problems when I tried to modify the reference to SmallVectorImpl that I did not know how to fix.

121 ↗(On Diff #196754)

shifts by zero are well defined, and get cleaned up by the optimizer, but I agree it didn't need to be inserted.

shawnl edited the summary of this revision. (Show Details)Apr 26 2019, 5:24 AM
lebedev.ri added inline comments.

Sounds like a missed optimization on instcombine side then?
Trying to handle every single unrelated micro-transform in every optimization is a path to failure.

jmolloy added inline comments.Apr 26 2019, 6:10 AM

I'm not sure I agree with that, or at least I'm missing something.

Given a sorted array A of integers with popcount 1 (single bit set), I believe:

order(A) == order(cttz(A))  //  cttz does not change the order.
order(A) == reverse(order(ctlz(A))) // Conversely ctlz reverses the order.

If you need to handle the zero value too, then I would have thought cttz would barrel-shift as you describe and ctlz would barrel-shift the reversed order.

My comment above was based on the assumption that you're not allowing zero to be passed to ctlz/cttz. Given you are, just keep on using llvm::sort.


We can fix that. Please keep using SmallVector and we can work through the difficulties.

shawnl marked 3 inline comments as done.Apr 28 2019, 12:24 AM
shawnl added inline comments.

Yeah I am working on some patches.


Oh yes, I was think about cttz, but now I am only using ctlz, so I can use llvm::reverse()

shawnl updated this revision to Diff 197008.Apr 28 2019, 1:24 AM
shawnl planned changes to this revision.Apr 28 2019, 7:56 PM

This transform is deeply flawed, and needs much work. If the default is not reachable, there is no need to limit to countPopulation == 1, only assure that there are no duplicates. If the default _is_ reachable, then countPopulation <= 1 continues to be important, but also ctlz and cttz must be run, except in very specific circumstances (such as utf-8 first character parsing)

shawnl updated this revision to Diff 197296.Apr 30 2019, 4:09 AM
jmolloy requested changes to this revision.Apr 30 2019, 5:32 AM
This revision now requires changes to proceed.Apr 30 2019, 5:32 AM
shawnl updated this revision to Diff 197334.Apr 30 2019, 8:03 AM
jmolloy requested changes to this revision.May 1 2019, 1:58 AM
jmolloy added inline comments.

This loop is scary and hard to read. It's really asking for a helper function. I would do something like this:

// Given a set of equivalence classes, return true if, for every pair A, B in
// Values:
//   Fn(A) == Fn(B) if and only if EC(A) == EC(B).
template <typename Function>
bool IsBijectiveOverEquivalenceClasses(const EquivalenceClasses<APInt> &Values,
                                       const Function &Fn) {
  DenseMap<unsigned, APInt> Leaders;
  for (auto &V : Values) {
    auto TransformedValue = Fn(V.getData());
    auto LeaderIt = Leaders.find(TransformedValue);
    if (LeadersIt == Leaders.end()) {
      Leaders[TransformedValue] = V.getLeader()->getData();
    if (LeaderIt->getData() != V.getData())
      // This transformed value has already been seen and did not belong to the
      // same equivalence class. This is not a bijective relation.
      return false;
  return true;

static bool CanReduceSwitchRangeWithCtlzOrCttz(SwitchInst &SI, bool &UseCtlz,
                                               bool &UseCttz, bool &UseInvert) {
  // Preconstruct equivalence classes of values based on successor block.
  DenseMap<BasicBlock *, APInt> Last;
  EquivalenceClasses<APInt> ECs;
  for (auto CI : SI.cases()) {
    auto I = Last.try_emplace(CI.getCaseSuccessor(), CI.getCaseValue()).first;
    ECs.unionSets(I.second, V);
  UseInvert = false;

  UseCtlz = true;
  if (IsBijectiveOverEquivalenceClasses(
          ECs, [&](const APInt &V) { return V.countLeadingZeros(); }))
    return true;

  UseCtlz = false;
  if (IsBijectiveOverEquivalenceClasses(
          ECs, [&](const APInt &V) { return V.countTrailingZeros(); }))
    return true;

  UseInvert = true;
  UseCtlz = true;
  if (IsBijectiveOverEquivalenceClasses(
          ECs, [&](const APInt &V) { return (~V).countLeadingZeros(); }))
    return true;

  UseCtlz = false;
  if (IsBijectiveOverEquivalenceClasses(
          ECs, [&](const APInt &V) { return (~V).countTrailingZeros(); }))
    return true;

  return false;

static bool ReduceSwitchRangeWithCtlzOrCttz(SwitchInst &SI,
                                            SmallVectorImpl<uint64_t> &Values,
                                            bool &UseCtlz, bool &UseCttz,
                                            bool &UseInvert) {
  // Note we don't pass Values into this function. This predicate uses the
  // switch's current case values *and* their associated successor block, so a
  // rearranged Values array cannot be analyzed (because we wouldn't know which
  // successsor belonged to which value).
  if (!CanReduceSwitchRangeWithCtlzOrCttz(SI, UseCtlz, UseCttz, UseInvert))
    return false;

  for (auto CI : SI.cases()) {
    APInt I = CI.getCaseValue();
    if (UseCtlz)
      Values.push_back((Invert ? ~V : V).countTrailingZeros());
      Values.push_back((Invert ? ~V : V).countLeadingZeros());
  return true;

This is a fallacy (will always be false) because:

  1. SwitchInst::CaseHandle is unique based on switched value. There is one CaseHandle for every value in Values.
  2. Prev is zero-initialized just above this line, in the loop.
This revision now requires changes to proceed.May 1 2019, 1:58 AM
shawnl abandoned this revision.Jun 14 2019, 8:31 AM

This optimization is not really worth it, for the amount of effort needed to implement it.