Page MenuHomePhabricator

[LoopIdiom] Introduce 'left-shift until bittest' idiom

Authored by lebedev.ri on Nov 8 2020, 11:49 AM.



The motivation here is the following inner loop in fp16/fp24 -> fp32 expander,
that runs as part of the floating-point DNG decompression in RawSpeed library:

while (!(fp32_fraction & (1 << 23))) {
  fp32_exponent -= 1;
  fp32_fraction <<= 1;

As one might notice, that loop is currently uncountable, and that whole code stays scalar.
Yet, it is rather trivial to make that loop countable:
and we can prove that via alive2: (ha nice, isn't it?)
... and that allow for the whole fp16->fp32 code to vectorize:

Now, while i'd love to get there, i feel like i should take it in steps.

For now, this introduces support for the most basic case,
where the bit position is known as a variable,
and the loop *will* go away (has no live-outs other than the recurrence,
no extra instructions in the loop).

I have added sufficient (i believe) test coverage,
and alive2 is happy with those transforms.


Diff Detail

Event Timeline

lebedev.ri created this revision.Nov 8 2020, 11:49 AM
lebedev.ri requested review of this revision.Nov 8 2020, 11:49 AM
lebedev.ri edited the summary of this revision. (Show Details)Nov 8 2020, 12:42 PM
liuz added a subscriber: liuz.Nov 8 2020, 5:52 PM
jdoerfert added inline comments.Nov 9 2020, 8:02 AM

Drive by: Should we move this "early exit" before step 2?

lebedev.ri marked an inline comment as done.Nov 9 2020, 8:14 AM
lebedev.ri added inline comments.

We can't do that, because we can't do this check before we've canonicalized the comparison predicate
(see previous if), which we can't do until after we've matched the bit-mask
(because in next patches said matching will result in other predicate changes)

lebedev.ri marked an inline comment as done.Nov 12 2020, 5:42 AM

Note that i'm mostly fine with doing this in post-commit-review mode,
i'm reasonably confident with the base change itself,
i'm mostly only looking for the high-level review.

Rebase, NFC.

Do recurrence analysis afterwards, not when matching backedge condition.

lebedev.ri edited the summary of this revision. (Show Details)Nov 19 2020, 10:21 AM
craig.topper added inline comments.Nov 19 2020, 1:51 PM

Why are we adding a whole new matcher infrastructure for phis? Why can't we use getIncomingValueForBlock?

(This is in a need of a bit of rebase once i'm ready to post the final patch to actually rewrite the loop into countable.)


There is no particular reason other than that
m_c_And(m_Value(), m_Constant())
is preferred to
I.getOpcode() == Instruction::And && (isa<Constant>I.getOperand(0) || isa<Constant>I.getOperand(1)),
because it's just that much more readable.
Of course, i can use old good methodology..

lebedev.ri updated this revision to Diff 309874.Dec 7 2020, 4:40 AM

Rebased. Small tidyups.

Originally, i allowed the wrong instruction to liveout - in my motivational example
i need to liveout, but i allowed only %x.curr.
So let's just allow both of them to liveout.

lebedev.ri marked an inline comment as done.

@craig.topper thank you for taking a look!
Dropping PHI abstraction - on a second look it's more decent that i remember.

craig.topper added inline comments.Dec 14 2020, 2:17 PM

Should this be add -1 to match the code?


This is called ".leadingonepos" in the comment block.

lebedev.ri marked 2 inline comments as done.

Fixup values names / comments, NFC otherwise.


On a second look, let's drop it for now.


Argh, i was sure i fixed that already :)

This revision is now accepted and ready to land.Dec 18 2020, 12:53 PM


Thank you for the review!
Note that there are 3 essential followups here, don't bother with D91725/D91726, but it would be good to get an agreement on D92754.

Rebased, NFC.

This revision was landed with ongoing or failed builds.Wed, Dec 23, 11:28 AM
This revision was automatically updated to reflect the committed changes.