This is an archive of the discontinued LLVM Phabricator instance.

[CGP] Check for existing inttotpr before creating new one
ClosedPublic

Authored by rtereshin on Jan 17 2019, 2:57 AM.

Details

Summary

Make sure CodeGenPrepare doesn't emit multiple inttoptr instructions of
the same integer value while sinking address computations, but rather
CSEs them on the fly: excessive inttoptr's confuse SCEV into thinking
that related pointers have nothing to do with each other.

This problem blocks LoadStoreVectorizer from vectorizing some of the
loads / stores in a downstream target.

Diff Detail

Repository
rL LLVM

Event Timeline

rtereshin created this revision.Jan 17 2019, 2:57 AM

Can we run EarlyCSE after CGP (instead of trying to do these kinds of point fixes)?

Can we run EarlyCSE after CGP (instead of trying to do these kinds of point fixes)?

Hi Hal,

Thank you for looking into this!

That's a great suggestion! Originally I just eyeballed it as too expensive compile-time wise for the little effect that is achieved here.

This time around with you pointing it out, I performed the actual measurements for a very large suite of shaders (the downstream target in question is a GPU). I see about 1.0 (+/-0.3)% increase in overall compile time (the part of it that happens at an application run-time) on average. The generated code quality improvement that comes out of this may be important, but it doesn't trigger often enough to justify 1% compile time increase across the board.

Thanks,
Roman

hfinkel accepted this revision.Jan 17 2019, 7:27 PM

Can we run EarlyCSE after CGP (instead of trying to do these kinds of point fixes)?

Hi Hal,

Thank you for looking into this!

That's a great suggestion! Originally I just eyeballed it as too expensive compile-time wise for the little effect that is achieved here.

This time around with you pointing it out, I performed the actual measurements for a very large suite of shaders (the downstream target in question is a GPU). I see about 1.0 (+/-0.3)% increase in overall compile time (the part of it that happens at an application run-time) on average. The generated code quality improvement that comes out of this may be important, but it doesn't trigger often enough to justify 1% compile time increase across the board.

Thanks,
Roman

Fair enough. Thanks for checking.

I have a comment about the test case, but otherwise LGTM.

test/Transforms/CodeGenPrepare/X86/sink-addrmode-cse-inttoptrs.ll
19 ↗(On Diff #182236)

Please check for the expected result, not just the absence of the scalar load.

This revision is now accepted and ready to land.Jan 17 2019, 7:27 PM
rtereshin marked an inline comment as done.

Updated the test as requested

Can we run EarlyCSE after CGP (instead of trying to do these kinds of point fixes)?

Hi Hal,

Thank you for looking into this!

That's a great suggestion! Originally I just eyeballed it as too expensive compile-time wise for the little effect that is achieved here.

This time around with you pointing it out, I performed the actual measurements for a very large suite of shaders (the downstream target in question is a GPU). I see about 1.0 (+/-0.3)% increase in overall compile time (the part of it that happens at an application run-time) on average. The generated code quality improvement that comes out of this may be important, but it doesn't trigger often enough to justify 1% compile time increase across the board.

Thanks,
Roman

Fair enough. Thanks for checking.

I have a comment about the test case, but otherwise LGTM.

Fixed the test, thank you!

Roman

This revision was automatically updated to reflect the committed changes.