This is an archive of the discontinued LLVM Phabricator instance.

Manually create unique_ptr in various pass adaptors
ClosedPublic

Authored by aeubanks on Sep 29 2021, 4:07 PM.

Details

Summary

This avoids creating tons of make_unique template instantiations. And we
only create a unique_ptr of the actual pass concept type, rather than
creating a unique_ptr of the pass model subclass then casting it to the
pass concept type.

This reduces the work spent compiling PassBuilder.cpp from 83M -> 73M
instructions according to perf stat.

Diff Detail

Event Timeline

aeubanks created this revision.Sep 29 2021, 4:07 PM
aeubanks requested review of this revision.Sep 29 2021, 4:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 29 2021, 4:07 PM
This revision is now accepted and ready to land.Sep 30 2021, 9:13 AM
This revision was landed with ongoing or failed builds.Sep 30 2021, 9:55 AM
This revision was automatically updated to reflect the committed changes.

"83M -> 73M" is peak RAM usage during compilation, I guess?

Is that worth the code changes? I think there's value in make_unique making it easier to read code/check for memory leaks, etc. (makes explicit new stand out more/get extra scrutiny when reading code)

"83M -> 73M" is peak RAM usage during compilation, I guess?

Is that worth the code changes? I think there's value in make_unique making it easier to read code/check for memory leaks, etc. (makes explicit new stand out more/get extra scrutiny when reading code)

That's compile time, specifically the "instructions" stat in perf stat (I should have specified more clearly in the description).
PassBuilder.cpp is by far the longest file to compile (if you only build X86). With many cores/distributed build farms, it's the bottleneck. So yeah I think it's worth it.

"83M -> 73M" is peak RAM usage during compilation, I guess?

Is that worth the code changes? I think there's value in make_unique making it easier to read code/check for memory leaks, etc. (makes explicit new stand out more/get extra scrutiny when reading code)

That's compile time, specifically the "instructions" stat in perf stat (I should have specified more clearly in the description).
PassBuilder.cpp is by far the longest file to compile (if you only build X86). With many cores/distributed build farms, it's the bottleneck. So yeah I think it's worth it.

Ah, sorry, didn't catch the "instructions" on the line wrap there. I see.

What about splitting up the file in some way(s)?

"83M -> 73M" is peak RAM usage during compilation, I guess?

Is that worth the code changes? I think there's value in make_unique making it easier to read code/check for memory leaks, etc. (makes explicit new stand out more/get extra scrutiny when reading code)

That's compile time, specifically the "instructions" stat in perf stat (I should have specified more clearly in the description).
PassBuilder.cpp is by far the longest file to compile (if you only build X86). With many cores/distributed build farms, it's the bottleneck. So yeah I think it's worth it.

Ah, sorry, didn't catch the "instructions" on the line wrap there. I see.

What about splitting up the file in some way(s)?

We can't get around (at least with the current design) tons of instantiations of these because for each entry in PassRegistry.def we instantiate at least one of these adaptors, sometimes multiple ones. Splitting these up into multiple files doesn't make sense logically from a code organization point of view, since they're all about parsing a pass pipeline and a lot of the code is very similar between module/cgscc/function/loop passes.

I have tried splitting things out before with no real impact, although I can look again.

"83M -> 73M" is peak RAM usage during compilation, I guess?

Is that worth the code changes? I think there's value in make_unique making it easier to read code/check for memory leaks, etc. (makes explicit new stand out more/get extra scrutiny when reading code)

That's compile time, specifically the "instructions" stat in perf stat (I should have specified more clearly in the description).
PassBuilder.cpp is by far the longest file to compile (if you only build X86). With many cores/distributed build farms, it's the bottleneck. So yeah I think it's worth it.

Ah, sorry, didn't catch the "instructions" on the line wrap there. I see.

What about splitting up the file in some way(s)?

We can't get around (at least with the current design) tons of instantiations of these because for each entry in PassRegistry.def we instantiate at least one of these adaptors, sometimes multiple ones. Splitting these up into multiple files doesn't make sense logically from a code organization point of view, since they're all about parsing a pass pipeline and a lot of the code is very similar between module/cgscc/function/loop passes.

Yeah, I don't think it'd avoid any instantiations, but could improve build parallelism.

If these explicit new+unique_ptr's are going to stay, they might warrant comments so someone doesn't come along and clean them up again? (I think we have clang-tidy checks that would probably identify this code and undo the work of this commit, for instance - and sometimes I go looking for uses of explicit new to see if they can be moved to more automatic memory management).

I have tried splitting things out before with no real impact, although I can look again.

No real impact on the compile time of each file separately/added together? (I think splitting a single long-compile, even if it adds some total time (eg: the compile time of the split files adds up to more than the compile time of the single file) can still be worthwhile for build parallelism)

"83M -> 73M" is peak RAM usage during compilation, I guess?

Is that worth the code changes? I think there's value in make_unique making it easier to read code/check for memory leaks, etc. (makes explicit new stand out more/get extra scrutiny when reading code)

That's compile time, specifically the "instructions" stat in perf stat (I should have specified more clearly in the description).
PassBuilder.cpp is by far the longest file to compile (if you only build X86). With many cores/distributed build farms, it's the bottleneck. So yeah I think it's worth it.

Ah, sorry, didn't catch the "instructions" on the line wrap there. I see.

What about splitting up the file in some way(s)?

We can't get around (at least with the current design) tons of instantiations of these because for each entry in PassRegistry.def we instantiate at least one of these adaptors, sometimes multiple ones. Splitting these up into multiple files doesn't make sense logically from a code organization point of view, since they're all about parsing a pass pipeline and a lot of the code is very similar between module/cgscc/function/loop passes.

Yeah, I don't think it'd avoid any instantiations, but could improve build parallelism.

If these explicit new+unique_ptr's are going to stay, they might warrant comments so someone doesn't come along and clean them up again? (I think we have clang-tidy checks that would probably identify this code and undo the work of this commit, for instance - and sometimes I go looking for uses of explicit new to see if they can be moved to more automatic memory management).

Yeah they already have comments.

I have tried splitting things out before with no real impact, although I can look again.

No real impact on the compile time of each file separately/added together? (I think splitting a single long-compile, even if it adds some total time (eg: the compile time of the split files adds up to more than the compile time of the single file) can still be worthwhile for build parallelism)

I split out some of the largest functions into PassBuilderPipelines.cpp and there was no measurable change in the compile time of PassBuilder.cpp. (I still submitted that because it was nice to separate it out)

"83M -> 73M" is peak RAM usage during compilation, I guess?

Is that worth the code changes? I think there's value in make_unique making it easier to read code/check for memory leaks, etc. (makes explicit new stand out more/get extra scrutiny when reading code)

That's compile time, specifically the "instructions" stat in perf stat (I should have specified more clearly in the description).
PassBuilder.cpp is by far the longest file to compile (if you only build X86). With many cores/distributed build farms, it's the bottleneck. So yeah I think it's worth it.

Ah, sorry, didn't catch the "instructions" on the line wrap there. I see.

What about splitting up the file in some way(s)?

We can't get around (at least with the current design) tons of instantiations of these because for each entry in PassRegistry.def we instantiate at least one of these adaptors, sometimes multiple ones. Splitting these up into multiple files doesn't make sense logically from a code organization point of view, since they're all about parsing a pass pipeline and a lot of the code is very similar between module/cgscc/function/loop passes.

Yeah, I don't think it'd avoid any instantiations, but could improve build parallelism.

If these explicit new+unique_ptr's are going to stay, they might warrant comments so someone doesn't come along and clean them up again? (I think we have clang-tidy checks that would probably identify this code and undo the work of this commit, for instance - and sometimes I go looking for uses of explicit new to see if they can be moved to more automatic memory management).

Yeah they already have comments.

Ah, sorry, looked at the last ones - seems the comment was missed in the last couple of ones in LoopPassManager.h

I have tried splitting things out before with no real impact, although I can look again.

No real impact on the compile time of each file separately/added together? (I think splitting a single long-compile, even if it adds some total time (eg: the compile time of the split files adds up to more than the compile time of the single file) can still be worthwhile for build parallelism)

I split out some of the largest functions into PassBuilderPipelines.cpp and there was no measurable change in the compile time of PassBuilder.cpp. (I still submitted that because it was nice to separate it out)

Ah, fair enough :/ happy to throw around ideas. It might be that it'd be more improvable by splitting into chunks of passes, rather than splitting functions (eg: rather than one file with the X operation for all passes and one file with the Y operation for all passes - instead one file with X and Y operations for the first half of passes, and another for the second - not that simple, I get, but a thought)

"83M -> 73M" is peak RAM usage during compilation, I guess?

Is that worth the code changes? I think there's value in make_unique making it easier to read code/check for memory leaks, etc. (makes explicit new stand out more/get extra scrutiny when reading code)

That's compile time, specifically the "instructions" stat in perf stat (I should have specified more clearly in the description).
PassBuilder.cpp is by far the longest file to compile (if you only build X86). With many cores/distributed build farms, it's the bottleneck. So yeah I think it's worth it.

Ah, sorry, didn't catch the "instructions" on the line wrap there. I see.

What about splitting up the file in some way(s)?

We can't get around (at least with the current design) tons of instantiations of these because for each entry in PassRegistry.def we instantiate at least one of these adaptors, sometimes multiple ones. Splitting these up into multiple files doesn't make sense logically from a code organization point of view, since they're all about parsing a pass pipeline and a lot of the code is very similar between module/cgscc/function/loop passes.

Yeah, I don't think it'd avoid any instantiations, but could improve build parallelism.

If these explicit new+unique_ptr's are going to stay, they might warrant comments so someone doesn't come along and clean them up again? (I think we have clang-tidy checks that would probably identify this code and undo the work of this commit, for instance - and sometimes I go looking for uses of explicit new to see if they can be moved to more automatic memory management).

Yeah they already have comments.

Ah, sorry, looked at the last ones - seems the comment was missed in the last couple of ones in LoopPassManager.h

I added the comment to the remaining places.

I have tried splitting things out before with no real impact, although I can look again.

No real impact on the compile time of each file separately/added together? (I think splitting a single long-compile, even if it adds some total time (eg: the compile time of the split files adds up to more than the compile time of the single file) can still be worthwhile for build parallelism)

I split out some of the largest functions into PassBuilderPipelines.cpp and there was no measurable change in the compile time of PassBuilder.cpp. (I still submitted that because it was nice to separate it out)

Ah, fair enough :/ happy to throw around ideas. It might be that it'd be more improvable by splitting into chunks of passes, rather than splitting functions (eg: rather than one file with the X operation for all passes and one file with the Y operation for all passes - instead one file with X and Y operations for the first half of passes, and another for the second - not that simple, I get, but a thought)

Some numbers since I was curious:
Removing everything but the includes: 15B instructions.
Removing parseModule/CGSCC/Function/LoopPass: 37B instructions.
Current state: 80B instructions.

I don't think it makes sense to split the parse*Pass() methods, they have a lot of similar code and it'd be unintuitive to have to update them if they were separate. And almost all the remaining code is to support those methods.
And that's how you'd probably split the passes, by splitting them per IR unit type. I don't think any other way would make sense.