If the use of an invariant load is located in a colder basic block, it makes sense to sink the load to its use.
- Reviewers
wmi arsenm reames craig.topper sanjoy
Event Timeline
Hi @arsenm, could you please take a look at this patch? If I turn the option on, I see some AMDGPU test failures. I'm not an expert in AMDGPU asm/arch, but it seems that four invariant loads are added at the beginning of the pipeline in the entry block, and I sink one of them to a colder block. As a result, as I understand it, instead of loading two values at once, this "double-load" is split into two separate loads. I'm not sure that is beneficial for this platform, so I wonder whether there is an easy way to prohibit the optimization in such cases for this platform?
Even if there is no easy way to do that, I'd still like to land it with the option off by default.
Thank you in advance,
This is possibly beneficial in some specific circumstances, but probably not in general (especially if it inhibits vectorization, which I thought happened after CGP already?).
lib/CodeGen/CodeGenPrepare.cpp

| Lines | Comment |
|---|---|
| 5567–5568 | Maybe make this cheap check before `isDereferenceablePointer`? |
| 5567–5579 | Maybe make these cheaper checks before `isDereferenceablePointer`? |
The Sink pass is written in such a way that it sinks instructions only to the nearest successors, while for an invariant load I want to be able to move it anywhere in the CFG...
I'll update the patch according to your comments.
Ok, I was wrong: the Sink pass iterates. If I add support for invariant.load it can sink such a load, but it definitely cannot sink it through a diamond or a loop. In parallel, I'm going to take a deeper look at whether I can handle my case in the Sink pass...
I took a look at the Sinking pass and I worry about its impact on the register allocator. So it seems that this solution, restricted to invariant loads, is safer.
Based on https://bugs.llvm.org/show_bug.cgi?id=23603, I think this patch is incorrect.
It seems that if I want to move an invariant load, I must ensure that the pointer is dereferenceable at the point of use.