This is an archive of the discontinued LLVM Phabricator instance.

[CGP] Sink invariant load to its use
AbandonedPublic

Authored by skatkov on Nov 8 2018, 7:53 PM.

Details

Summary

If use of invariant load is located in colder basic block
it makes sense to sink the load to use.

Diff Detail

Event Timeline

skatkov created this revision.Nov 8 2018, 7:53 PM

Hi @arsenm, could you please take a look into this patch? If I turn the option on I see some AMDGPU test failures. I'm not an expert in AMDGPU asm/arch but it seems that 4 invariant loads are added in the beginning of pipeline in entry block and I hoist one of them to colder block. As a result as I understand instead of loading of two load at once this "double-load" is split into two loads. I'm not sure that it is a beneficial for this platform. So I wonder whether there is a easy way to prohibit optimization for such cases for this platform?
Even if there is no easy way to do that I'd like to land it with option off by default.

Thank you in advance,

arsenm added a comment.Nov 8 2018, 8:09 PM

Hi @arsenm, could you please take a look into this patch? If I turn the option on I see some AMDGPU test failures. I'm not an expert in AMDGPU asm/arch but it seems that 4 invariant loads are added in the beginning of pipeline in entry block and I hoist one of them to colder block. As a result as I understand instead of loading of two load at once this "double-load" is split into two loads. I'm not sure that it is a beneficial for this platform. So I wonder whether there is a easy way to prohibit optimization for such cases for this platform?
Even if there is no easy way to do that I'd like to land it with option off by default.

Thank you in advance,

This is possibly beneficial in some specific circumstances, but probably not in general (especially if it inhibits vectorization, which I thought happened after CGP already?).

lib/CodeGen/CodeGenPrepare.cpp
5567–5568

Maybe make this cheap check before isDeferenceablePointer?

5567–5579

Maybe make these cheaper check before isDeferenceablePointer?

arsenm added a comment.Nov 8 2018, 8:09 PM

There's also a sinking pass, why isn't this done there?

There's also a sinking pass, why isn't this done there?

Sink pass is written in the way it sink instructions only to nearest successors while for invariant load I want to move it anywhere in CFG...

I'll update a patch according to your comments.

skatkov updated this revision to Diff 173267.Nov 8 2018, 8:54 PM

Handled comments. Fow a while I'll double check Sink.cpp...

There's also a sinking pass, why isn't this done there?

Sink pass is written in the way it sink instructions only to nearest successors while for invariant load I want to move it anywhere in CFG...

Ok, I was wrong Sink pass iterates. If I add a support for invariant.load it can sink it but it definitely cannot sink it through diamond or loop. In parallel I'm going to take a look deeper whether I can handle my case in Sink pass...

There's also a sinking pass, why isn't this done there?

Sink pass is written in the way it sink instructions only to nearest successors while for invariant load I want to move it anywhere in CFG...

Ok, I was wrong Sink pass iterates. If I add a support for invariant.load it can sink it but it definitely cannot sink it through diamond or loop. In parallel I'm going to take a look deeper whether I can handle my case in Sink pass...

I took a look at Sinking pass and I worry about its impact on register allocator. So it seems that this solution only for invariant load is safer.

skatkov planned changes to this revision.Nov 19 2018, 8:40 PM
skatkov added a reviewer: sanjoy.

Basing on https://bugs.llvm.org/show_bug.cgi?id=23603 I think this patch is incorrect.

It seems if I'd like to move invariant load I must ensure that pointer is dereferenceable at point of use.

skatkov abandoned this revision.Nov 28 2018, 8:50 PM

This is invalid patch.