This is an archive of the discontinued LLVM Phabricator instance.

[WIP][PGO] Add ability to mark cold functions as optsize/minsize/optnone
AbandonedPublic

Authored by aeubanks on May 3 2023, 4:43 PM.

Details

Summary

The performance of cold functions shouldn't matter too much, so if we care about binary sizes, add an option to mark cold functions as optsize/minsize for binary size, or optnone for compile times [1]. Clang patch will be in a future patch

Chrome size (ThinLTO + instrumented PGO):
base: 371004392
optsize: 366883296
minsize: 342128928

[1] https://discourse.llvm.org/t/rfc-new-feature-proposal-de-optimizing-cold-functions-using-pgo-info/56388

Diff Detail

Event Timeline

aeubanks created this revision.May 3 2023, 4:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 3 2023, 4:43 PM
Herald added subscribers: wlei, Enna1, ormris and 3 others. · View Herald Transcript
aeubanks requested review of this revision.May 3 2023, 4:43 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMay 3 2023, 4:43 PM

looking for feedback if this makes sense at all

high level question, why not have it as a pass that runs after profiles (of whatever kind - instrumented or sample-based) are ingested. The pass would attribute functions as described.

Previously @yamauchi did a bunch of work related to this called PGSO (Profile Guided Size Optimization). See the functions in https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Utils/SizeOpts.cpp, which are used to guide a number of optimizations that can affect code size. Do you know why those weren't sufficient?

@davidxl may remember the details, but iirc fully using minsize/optnone for cold funcs might have been too big of a hammer performance-wise.

high level question, why not have it as a pass that runs after profiles (of whatever kind - instrumented or sample-based) are ingested. The pass would attribute functions as described.

that was also my initial thought, but I found it weird that only the ifdo pass marked functions as cold, but not the samplepgo pass. I remembered something about "precise" profiles and thought maybe that had something to do with not marking functions as cold with samplepgo? But maybe PSI->isFunctionColdInCallGraph(F, *BFI) takes care of all of that

Previously @yamauchi did a bunch of work related to this called PGSO (Profile Guided Size Optimization). See the functions in https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Utils/SizeOpts.cpp, which are used to guide a number of optimizations that can affect code size. Do you know why those weren't sufficient?

I think there are still a lot of places that just look at optsize/minsize. I'll take a look at the inline cost model more to see how it interacts with hotness. Separately, it seems good for compile times to calculate once if a function is cold or not and save that result as optsize/minsize.

@davidxl may remember the details, but iirc fully using minsize/optnone for cold funcs might have been too big of a hammer performance-wise.

s/optnone/optsize? But I'm surprised that even minsize would make a perf difference just in functions that are supposedly cold. IIUC there aren't too many differences at least in the optimization pipeline for minsize functions aside from inlining thresholds (maybe there's more drastic changes in the backend?). It would be good to know if previous attempts at this sort of thing have shown perf regressions.

This patch uses a cleaner method than the previous effort. There is some differences:

  1. yamauchi's patch works for both iFDO and autoFDO, while this patch limits to iFDO
  2. yamauchi's patch also handles size optimization (IIRC) at granularity smaller than function

Longer term, the previous functionality should fold into the new framework.

This patch uses a cleaner method than the previous effort. There is some differences:

  1. yamauchi's patch works for both iFDO and autoFDO, while this patch limits to iFDO
  2. yamauchi's patch also handles size optimization (IIRC) at granularity smaller than function

Longer term, the previous functionality should fold into the new framework.

Could you clarify what "previous functionality" and "new framework" mean here?

This patch uses a cleaner method than the previous effort. There is some differences:

  1. yamauchi's patch works for both iFDO and autoFDO, while this patch limits to iFDO
  2. yamauchi's patch also handles size optimization (IIRC) at granularity smaller than function

Longer term, the previous functionality should fold into the new framework.

IIRC 1) and 2) are correct. I don't have a strong preference on which folds into which.

Sorry I missed the question. What I meant is that a lot of the profile guided size optimization (PGSO) done by Hiroshi can be adapted to use the attribute based method proposed in the patch as we don't need two different mechanisms.