
[GlobalOpt] Drop SRA split limit for struct types.
Needs ReviewPublic

Authored by fhahn on Jul 11 2022, 5:26 PM.

Details

Reviewers
reames
nikic
Summary

4796b4ae7bccc7 limited SRA to 16 types for all globals, while the code
previously, AFAICT, had no similar limit for globals with struct-typed
initializers.

This is causing notable size regressions for some large workloads. This
patch skips the size check for struct types, similar to the code before
4796b4ae7bccc7, to recover the regression.

Diff Detail

Event Timeline

fhahn created this revision.Jul 11 2022, 5:26 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 11 2022, 5:26 PM
fhahn requested review of this revision.Jul 11 2022, 5:26 PM
nikic requested changes to this revision.Jul 12 2022, 12:29 AM

This will also bypass the check if you have an array that is nested inside a struct. Besides, I don't think we should be making decisions based on the global value type here, which is another vacuous concept that shouldn't exist (e.g. in Rust the global type will often be something like [N x i8], a "bag of bytes" representation).

Does raising the limit to the next power of two cover your test case?

This revision now requires changes to proceed.Jul 12 2022, 12:29 AM
fhahn updated this revision to Diff 444366.Jul 13 2022, 11:52 AM

> This will also bypass the check if you have an array that is nested inside a struct. Besides, I don't think we should be making decisions based on the global value type here, which is another vacuous concept that shouldn't exist (e.g. in Rust the global type will often be something like [N x i8], a "bag of bytes" representation).

Right, I think this behaves similarly to the previous cut-off for structs with arrays inside, e.g.: https://llvm.godbolt.org/z/crje3aYzW . I added additional tests with a struct containing an array and with a plain array.

I agree that the type shouldn't really matter. But I think changing the way the cut-off is handled should be done independently of the change to use an offset-based approach.

> Does raising the limit to the next power of two cover your test case?

Unfortunately, the struct has stores to 100+ fields in the workload.

@fhahn I think your last update is missing the code changes :)

I assume that you don't actually care about SRA as such, but rather about some follow-on transforms it enables. I've been thinking that we might want to make GlobalOpt in general work on the SRAd representation internally, which might remove the need to actually materialize it in some cases. So if some parts of the global are never written, it will certainly always be profitable to replace them with (part of) the initializer, regardless of how many parts there are. The limit is only relevant for cases where we actually do leave behind the globals. Do you know which optimization in particular is important for your workload?

> @fhahn I think your last update is missing the code changes :)

I didn't make any code changes to the original version and it looks like the original code change is still there. Am I missing something? :)

> I assume that you don't actually care about SRA as such, but rather about some follow-on transforms it enables. I've been thinking that we might want to make GlobalOpt in general work on the SRAd representation internally, which might remove the need to actually materialize it in some cases. So if some parts of the global are never written, it will certainly always be profitable to replace them with (part of) the initializer, regardless of how many parts there are. The limit is only relevant for cases where we actually do leave behind the globals. Do you know which optimization in particular is important for your workload?

Yeah, ideally SRA would not materialize each transform it applies straight away; e.g., for the test cases, the split-up globals get removed again after being materialized.

I think the main regression comes from SRA not removing stores of function pointers that are never read, which in turn prevents LLVM from removing those functions.

nikic added a comment.Jul 13 2022, 1:05 PM

>> @fhahn I think your last update is missing the code changes :)

> I didn't make any code changes to the original version and it looks like the original code change is still there. Am I missing something? :)

Oh sorry, I misread your comment. I thought you wanted to add a check for nested arrays, but you were only referring to new tests.

> I think the main regression comes from SRA not removing stores of function pointers that are never read, which in turn prevents LLVM from removing those functions.

Okay, that's a particularly simple case. It's probably worth trying to skip parts that are only stored, as we expect them to be dropped (it's not strictly guaranteed due to leak-checker roots, but close enough). I can take a look.

fhahn updated this revision to Diff 457563.Sep 2 2022, 5:02 AM

Just a rebase on top of a test added in 91e67c074922cc667fa1c43fc1f01acb96faa0f9 that can't be handled by D129857.

fhahn updated this revision to Diff 467054.Oct 12 2022, 1:20 AM

Ping, any additional thoughts on how best to resolve the regression soon?

nikic added a comment.Oct 12 2022, 2:44 AM

I've updated D129857 to handle the new case -- I'd still generally prefer not to special-case struct types if we can avoid it.