This is an archive of the discontinued LLVM Phabricator instance.

[CodeGenPrepare] Create more extloads and fewer ands
ClosedPublic

Authored by gberry on Nov 11 2015, 12:58 PM.

Details

Summary

Add and instructions immediately after loads that only have their low
bits used, assuming that the (and (load x) c) will be matched as a
extload and the ands/truncs fed by the extload will be removed by isel.

Diff Detail

Repository
rL LLVM

Event Timeline

gberry updated this revision to Diff 39962.Nov 11 2015, 12:58 PM
gberry retitled this revision from to [CodeGenPrepare] Create more extloads and fewer ands.
gberry updated this object.
gberry added reviewers: mcrosier, qcolombet, ab.
gberry added a subscriber: llvm-commits.
qcolombet added inline comments.Nov 11 2015, 5:03 PM
lib/CodeGen/CodeGenPrepare.cpp
4197 ↗(On Diff #39962)

why would you duplicate the 'and' instead of just moving it?
Moreover, would it make sense to emit a narrower load instead followed by a zext?

gberry updated this revision to Diff 40083.Nov 12 2015, 1:29 PM

New version addressing qcolombet's feedback.

Quentin,

Thanks for the quick review. I've added a new revision that addresses your question about removing the old add. This can't be done in general since it might have a narrower mask or reach the load through a phi, but for the safe cases I now go ahead and remove it.

As for your question about transforming the load to be narrower with a zext, my reservation about doing this here is that I can't check the TLI.shouldReduceLoadWidth() hook here, since it wants to see the load SDNode, so I might end up narrowing some special case loads that the target doesn't want narrowed. Does this seem reasonable to you?

qcolombet added inline comments.Nov 17 2015, 1:42 PM
lib/CodeGen/CodeGenPrepare.cpp
4184 ↗(On Diff #40083)

Could you put some quotes around “and” or something?
I found it difficult to process as it is :).

4216 ↗(On Diff #40083)

If I read the code correctly, this will happen only after a call to this optimization for each load.
Although the example is interesting, you should call out that it needs both loads to be optimized to end up in this situation, otherwise the expectations while reading the code are misled.

4240 ↗(On Diff #40083)

The dyn_cast is useless.
A user of an instruction must be an instruction or we are doing something weird.

4242 ↗(On Diff #40083)

Just test getParent == getParent. Indeed, if you are in the same block, by construction the node cannot be a phi, since phis are the first instructions in the block.

4268 ↗(On Diff #40083)

You mean its users, right?

4274 ↗(On Diff #40083)

As far as I can tell, you do not need to access anything specific to BinaryOperator.
I then suggest to turn this if…else… if… sequence into a switch:
switch (V->getOpcode()) {
case And:

default: return false;
}

4284 ↗(On Diff #40083)

BinOp must be an instruction at this point, given it has been constructed with dyn_cast<BinaryOperator>.
I.e., you don’t need to test that BinOp isa Instruction and you don’t need to cast BinOp to Instruction.

4308 ↗(On Diff #40083)

Could you give more details here on the reason of the unlikeliness and what would happen if we do generate a i1 EXTLOAD?

4317 ↗(On Diff #40083)

TruncTy would be a better name, wouldn’t it?

5053 ↗(On Diff #40083)

The enableExtLdPromotion was aimed at a different optimization.
I think it does not make much sense to reuse it here.

Wouldn’t it make sense to just do it?
That should always be a win, right?

qcolombet edited edge metadata.Nov 17 2015, 1:43 PM

One additional comment, it would like to see an IR to IR test case to test the optimization in isolation.

Thanks,
-Quentin

gberry updated this revision to Diff 40555.Nov 18 2015, 1:48 PM
gberry edited edge metadata.

Updated to address Quentin's comments.

gberry marked 8 inline comments as done.Nov 18 2015, 1:54 PM
gberry added inline comments.
lib/CodeGen/CodeGenPrepare.cpp
4242 ↗(On Diff #40555)

I don't think your comment about the user not being a phi in the same block is correct. Consider a single block loop where a load inside the loop feeds a phi at the top of the loop. I've added a test case for this (see test/Transforms/CodeGenPrepare/AArch64/free-zext.ll test_free_zext3).

4308 ↗(On Diff #40555)

I tried to add more of an explanation here, let me know if it makes sense, or if you think this is a shortcoming in the AArch64 back-end that needs to be addressed.

5061 ↗(On Diff #40555)

I went ahead and removed this, though I'm not 100% happy with it. I would like to avoid doing this work for targets that don't support any extloads at all (since it is wasted computation), but I couldn't find a good way of checking for that, so I was using enableExtLdPromotion as a proxy.

qcolombet added inline comments.Nov 18 2015, 2:30 PM
lib/CodeGen/CodeGenPrepare.cpp
4242 ↗(On Diff #40555)

Of course, you’re right! Please disregard that comment and thanks for adding a test case to cover that code.

4285 ↗(On Diff #40555)

LLVM coding style is to go with early exits in those cases.
I.e.,
if (<inverted cond>)

return false;

// Else, do the work.

4294 ↗(On Diff #40555)

Ditto.

4308 ↗(On Diff #40555)

That feels like a shortcoming in the AArch64 backend to me.
I could live with it for now if you file a PR so that we remember to look into it.

5061 ↗(On Diff #40555)

I see. In that case, we may consider adding a new target hook.

What is the impact of this optimization on the compile time anyway?

test/CodeGen/AArch64/free-zext.ll
30 ↗(On Diff #40555)

Could you be more explicit on what case of optimizeLoadExt you are checking? E.g., test… when the phi is in the same block as the load.
This is usually useful when we have to update the tests.

This applies to all the added test cases.

55 ↗(On Diff #40555)

Make sure to file a PR when this land.

gberry updated this revision to Diff 40675.Nov 19 2015, 10:31 AM

Update to address Quentin's comments (round 2)

gberry marked 3 inline comments as done.Nov 19 2015, 10:36 AM

Hi Quentin,

I believe I have addressed all of your concerns. One other change to note in the lastest update is that I moved the CodeGenPrepare IR test up out of the AArch64 directory since the optimization is now enabled for all targets.

I'll be sure to file PR's for the two cases mentioned above as well.

Let me know if you have any other concerns.

Thanks,
-Geoff

lib/CodeGen/CodeGenPrepare.cpp
4392 ↗(On Diff #40675)

I will file a PR for the AArch64 i1 zextload case.

5150 ↗(On Diff #40675)

I'm less concerned about this now after checking the compile time (CodeGenPrepare time within the noise for spec2006.gcc where this code is hit many times), and the fact that this feature is more common across targets than I initially thought.

test/CodeGen/AArch64/free-zext.ll
55 ↗(On Diff #40675)

Will do.

qcolombet accepted this revision.Nov 19 2015, 1:36 PM
qcolombet edited edge metadata.

Hi Geoff,

Thanks for your patience!

LGTM.

Cheers,
-Quentin

This revision is now accepted and ready to land.Nov 19 2015, 1:36 PM
This revision was automatically updated to reflect the committed changes.