Add and instructions immediately after loads that only have their low
bits used, assuming that the (and (load x) c) will be matched as a
extload and the ands/truncs fed by the extload will be removed by isel.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
4197 ↗ | (On Diff #39962) | why would you duplicate the 'and' instead of just moving it? |
Quentin,
Thanks for the quick review. I've added a new revision that addresses your question about removing the old add. This can't be done in general since it might have a narrower mask or reach the load through a phi, but for the safe cases I now go ahead and remove it.
As for your question about transforming the load to be narrower with a zext, my reservation about doing this here is that I can't check the TLI.shouldReduceLoadWidth() hook here, since it wants to see the load SDNode, so I might end up narrowing some special case loads that the target doesn't want narrowed. Does this seem reasonable to you?
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
4184 ↗ | (On Diff #40083) | Could you put some quotes around “and” or something? |
4216 ↗ | (On Diff #40083) | If I read the code correctly, this will happen only after a call to this optimization for each load. |
4240 ↗ | (On Diff #40083) | The dyn_cast is useless. |
4242 ↗ | (On Diff #40083) | Just test getParent == getParent. Indeed, if you are in the same block, by construction the node cannot be a phi, since phis are the first instructions in the block. |
4268 ↗ | (On Diff #40083) | You mean its users, right? |
4274 ↗ | (On Diff #40083) | As far as I can tell, you do not need to access anything specific to BinaryOperator. |
4284 ↗ | (On Diff #40083) | BinOp must be an instruction at this point, given it has been constructed with dyn_cast<BinaryOperator>. |
4308 ↗ | (On Diff #40083) | Could you give more details here on the reason of the unlikeliness and what would happen if we do generate a i1 EXTLOAD? |
4317 ↗ | (On Diff #40083) | TruncTy would be a better name, wouldn’t it? |
5053 ↗ | (On Diff #40083) | The enableExtLdPromotion was aimed at a different optimization. Wouldn’t it make sense to just do it? |
One additional comment, it would like to see an IR to IR test case to test the optimization in isolation.
Thanks,
-Quentin
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
4242 ↗ | (On Diff #40555) | I don't think your comment about the user not being a phi in the same block is correct. Consider a single block loop where a load inside the loop feeds a phi at the top of the loop. I've added a test case for this (see test/Transforms/CodeGenPrepare/AArch64/free-zext.ll test_free_zext3). |
4308 ↗ | (On Diff #40555) | I tried to add more of an explanation here, let me know if it makes sense, or if you think this is a shortcoming in the AArch64 back-end that needs to be addressed. |
5061 ↗ | (On Diff #40555) | I went ahead and removed this, though I'm not 100% happy with it. I would like to avoid doing this work for targets that don't support any extloads at all (since it is wasted computation), but I couldn't find a good way of checking for that, so I was using enableExtLdPromotion as a proxy. |
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
4242 ↗ | (On Diff #40555) | Of course, you’re right! Please disregard that comment and thanks for adding a test case to cover that code. |
4285 ↗ | (On Diff #40555) | LLVM coding style is to go with early exits in those cases. return false; // Else, do the work. |
4294 ↗ | (On Diff #40555) | Ditto. |
4308 ↗ | (On Diff #40555) | That feels like a shortcoming in the AArch64 backend to me. |
5061 ↗ | (On Diff #40555) | I see. In that case, we may consider adding a new target hook. What is the impact of this optimization on the compile time anyway? |
test/CodeGen/AArch64/free-zext.ll | ||
30 ↗ | (On Diff #40555) | Could you be more explicit on what case of optimizeLoadExt you are checking? E.g., test… when the phi is in the same block as the load. This applies to all the added test cases. |
55 ↗ | (On Diff #40555) | Make sure to file a PR when this land. |
Hi Quentin,
I believe I have addressed all of your concerns. One other change to note in the lastest update is that I moved the CodeGenPrepare IR test up out of the AArch64 directory since the optimization is now enabled for all targets.
I'll be sure to file PR's for the two cases mentioned above as well.
Let me know if you have any other concerns.
Thanks,
-Geoff
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
4392 ↗ | (On Diff #40675) | I will file a PR for the AArch64 i1 zextload case. |
5150 ↗ | (On Diff #40675) | I'm less concerned about this now after checking the compile time (CodeGenPrepare time within the noise for spec2006.gcc where this code is hit many times), and the fact that this feature is more common across targets than I initially thought. |
test/CodeGen/AArch64/free-zext.ll | ||
55 ↗ | (On Diff #40675) | Will do. |