This is an archive of the discontinued LLVM Phabricator instance.

Reduce inlining that had larger binary size impact
Needs Revision · Public

Authored by jpienaar on Mar 9 2023, 8:37 PM.

Details

Summary

On a build with a large number of registered operations, this resulted in
a ~3.4 MB reduction in binary size in release mode.

These changes were found to reduce binary size, but this revision is an RFC
(it contains changes related to debugging and error paths, though not exclusively).
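
For readers skimming the archive, here is a minimal sketch of the kind of change involved, assuming a made-up helper and diagnostic text (nothing below is lifted from the actual diff): the error-reporting code moves behind a noinline helper so it is emitted once instead of being expanded into every caller.

  #include "mlir/IR/Operation.h"
  #include "mlir/Support/LogicalResult.h"
  #include "llvm/Support/Compiler.h" // LLVM_ATTRIBUTE_NOINLINE, LLVM_LIKELY

  // Hypothetical out-of-line error path: building the diagnostic is kept out
  // of the callers so the fast path below stays small when inlined.
  LLVM_ATTRIBUTE_NOINLINE
  static mlir::LogicalResult reportUnregisteredOp(mlir::Operation *op) {
    return op->emitOpError("operation is not registered");
  }

  static mlir::LogicalResult verifyIsRegistered(mlir::Operation *op) {
    if (LLVM_LIKELY(op->isRegistered()))
      return mlir::success();
    // Exceptional path: a plain call, trading a little speed for size.
    return reportUnregisteredOp(op);
  }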

Diff Detail

Event Timeline

jpienaar created this revision. · Mar 9 2023, 8:37 PM
Herald added a project: Restricted Project.
jpienaar requested review of this revision. · Mar 9 2023, 8:37 PM
Herald added projects: Restricted Project, Restricted Project. · Mar 9 2023, 8:37 PM

~3.4 MB reduction in binary size in release mode.

I don't know how to interpret this without the total binary size; what is the percentage?

mehdi_amini requested changes to this revision. · Mar 10 2023, 2:03 AM

In general I am quite concerned about messing with the optimizer this way; this should be extremely exceptional. This'll be ad hoc, forces a performance tradeoff specific to a given use case, and couples the "heuristic" to the exact compiler you're using (what does it do on Windows? On Mac?).
Have you tried building your project with -Os? -Oz? PGO? FullLTO?

mlir/include/mlir/Support/TypeID.h:192

Isn't this a potentially hot routine?
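
For context on this concern, a hedged sketch of the usual way to reconcile size and speed for a hot routine (the class and member names are made up, not the code at line 192): keep the trivial hot check inline and outline only the cold fallback.

  #include "llvm/Support/Compiler.h" // LLVM_ATTRIBUTE_NOINLINE, LLVM_LIKELY

  class Cache { // hypothetical
  public:
    // Hot path: a single pointer comparison, cheap enough to stay inline.
    void *lookup(const void *key) {
      if (LLVM_LIKELY(key == lastKey))
        return lastValue;
      return lookupSlow(key);
    }

  private:
    // Cold path: holds the heavyweight lookup/registration code; noinline
    // keeps it from being duplicated into every caller of lookup().
    // (Defined out of line in the .cpp file.)
    LLVM_ATTRIBUTE_NOINLINE void *lookupSlow(const void *key);

    const void *lastKey = nullptr;
    void *lastValue = nullptr;
  };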

This revision now requires changes to proceed. · Mar 10 2023, 2:03 AM

~3.4 MB reduction in binary size in release mode.

I don't know how to interpret this without the total binary size; what is the percentage?

The total binary size is relevant for a user deciding which part they want to optimize, and the biggest impact there comes from more application-specific removal. But here I'm more interested in the MLIR parts than in this specific binary: this reduction corresponds to ~30% of the total MLIR code size in the original binary (filtering on any file with "mlir" in its name).

In general I am quite concerned about messing with the optimizer this way; this should be extremely exceptional. This'll be ad hoc, forces a performance tradeoff specific to a given use case, and couples the "heuristic" to the exact compiler you're using (what does it do on Windows? On Mac?).
Have you tried building your project with -Os? -Oz? PGO? FullLTO?

The performance tradeoff w.r.t. the debugging and error cases is what I wanted to discuss (these sites were identified programmatically, but I don't think all of them make sense). I agree that if pure size were the goal it would be one thing, but these are intended to be performance-optimized binaries in general. The debugging and error cases don't seem use-case specific: inlining vs. a function call on these paths trades size for speed only on exceptional paths. To me this is similar to how we document expectations around verification in production runs, only here it's in code :-). PGO would be able to change the heuristic either way, wouldn't it? E.g., the default could be to mark the error/debugging cases as cold.
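
To illustrate that last point, a hedged sketch using the standard GCC/Clang attribute spellings and a made-up function (not code from this diff): marking the error path cold gives the inliner and block layout roughly the hint that PGO would otherwise derive from profile data.

  #include <cstdio>

  // Hypothetical error path. `cold` discourages inlining and tells the
  // compiler that calls to this function are unlikely, much as PGO-derived
  // branch weights would.
  [[gnu::cold, gnu::noinline]]
  static void reportNegativeValue(int v) {
    std::fprintf(stderr, "error: unexpected negative value %d\n", v);
  }

  static bool checkValue(int v) {
    if (__builtin_expect(v < 0, 0)) { // explicit hint, in lieu of profiles
      reportNegativeValue(v);
      return false;
    }
    return true;
  }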