This is an archive of the discontinued LLVM Phabricator instance.

[llvm][NFC] Inliner.cpp: ensure InlineHistory ID is always initialized;
ClosedPublic

Authored by mtrofin on Apr 10 2020, 8:26 AM.

Details

Summary

The inline history is associated with a call site. There are two locations
we fetch inline history. In one, we fetch it together with the call
site. In the other, we initialize it under certain conditions, use it
later under same conditions (different if check), and otherwise is
uninitialized. Although currently there is no uninitialized use, the
code is more challenging to maintain correctly, than if the value were
always initialized.

Changed to the upfront initialization pattern already present in this
file.

Diff Detail

Event Timeline

mtrofin created this revision.Apr 10 2020, 8:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 10 2020, 8:26 AM
davidxl accepted this revision.Apr 10 2020, 9:20 AM

Looks like you really don't like uninitialized locals -- but this change looks fine (wont block msan etc) :)

This revision is now accepted and ready to land.Apr 10 2020, 9:20 AM

This would still make MSan unable to diagnose the bugs it could diagnose in the original code (If a new use of InlineHistoryID was introduced outside the IsTriviallDead condition - in the current code that could be caught by MSan or the Static Analyzer - now it'll be indistinguishable from an intentional use)

I don't think these changes are consistent with an "upfront initialization pattern" - generally variables are declared as they're needed ("Caller" is declared a little early, but otherwise the variables here aren't declared at the top of the scope of this loop unless they're needed there), and I think it's particularly different declaring/initializing essentially const helper local variables ("here's a convenient name") versus mutating state like InlineHistoryID.

If the spooky-action-at-a-distance is sufficiently risky (that the two conditions might get out of sync) I'd be inclined to use Optional here, so it's checked on access and fails on a +Asserts build if a new un-paired use is introduced, and generally keep variables scopes as short as reasonable (especially for mutating state, less so for "here's a convenient name" sort of names, if they make more sense grouped together somewhere, etc.

I'm not sure why "Instr" is even a variable... ah, probably because of the old CallSite version of this code. With CallBase in use, should probably just pass "&CB" as the parameter to isInstructionTriviallyDead...

Looking more at the code, yeah, I guess I don't mind the change so much (you're right, both these things come straight out of the vector/current element anyway - and, yes, initializing those elements at the start of the loop is quite common/reasonable/readable (std::tie/pointer/reference dance notwithstanding)) - though the motivation to initialize things that the code, when correct, doesn't need to initialize still makes me uncomfortable & it's a habit/routine I'd like to discourage so it doesn't become common & defeat bug finding tools we use. I've said my peace on that & I'll leave this up to you.

llvm/lib/Transforms/IPO/Inliner.cpp
616–619

This seems a bit convoluted. Given CallSites is a SmallVector, so [] lookup is cheap, I'd either write this with two array lookups:

CallBase &CB = *CallSites[CSi].first;
int InlineHistoryID = CallSites[CSi].second;

Or as you've done, but without "CS" (& rename Instr to CB) - having both the pointer and reference version of the variable seems like overkill/confusing. (I think the "Instr" variable existed in the original code due to CallSite usage, and is a bit vestigial now with CallBase being used - if you end up with a CallBase& in this code, then "isInstructionTriviallDead" should be called with "&CB" rather than needing a separate named variable that's a pointer version of the CallBase&)

Also the name "CS" is a bit erroneous now that this is CallBase, not CallSite.

This change essentially makes a lazy initialization to become eagerly initialized. It may be better to keep the 'laziness' here, so why not just push the declaration of InlineHistoryID into the if(IsTrivialllyDead) {} block?

Re: "This would still make MSan unable to diagnose the bugs it could diagnose in the original code (If a new use of InlineHistoryID was introduced outside the IsTriviallDead condition - in the current code that could be caught by MSan or the Static Analyzer - now it'll be indistinguishable from an intentional use)"

I think the ultimate goal is to have logically correct code, and making *San happy is the secondary goal which choosing implementations which can not be proved to be correct statically. In the original change, the two choices are 1) one with uninitialized variable use, and 2) one with possible wrong init value. In this case, we should prefer choice 1 to help facilitate finding the bug at runtime or some other choice that can expose the bug in running the Assert build.

For the latest change, we can statically prove it is always correct -- as InlineHistoryID is attribute associated with the callsite. For the case you describe where there is new use in a different path, the right fix will be to eagerly initialize the ID too.

This change essentially makes a lazy initialization to become eagerly initialized. It may be better to keep the 'laziness' here, so why not just push the declaration of InlineHistoryID into the if(IsTrivialllyDead) {} block?

I believe the value of the CL is improved readability / maintainability: one doesn't need to trace through the code to understand where the value of InlineHistoryID comes from, and what value it is. In fact, looking at it more, we could even mark it const.

Not sure what the value of lazy initing would be here. If the initialization doesn't end up being optimized equivalently, I doubt there's a meaningful performance impact.

mtrofin marked 2 inline comments as done.Apr 10 2020, 10:38 AM
mtrofin added inline comments.
llvm/lib/Transforms/IPO/Inliner.cpp
616–619

There is an existing place (line 1043) doing the std::tie thing. I thought that was a preferred way, but if that's not the case, I could see doing in both places either what you're describing, or:

auto &P = CallSites[CSi];
CallBase &CG = *P.first;
int InlineHistoryID = P.second;

Happy to change it, and yank out vestigials.

CS: sure, but that's a larger change throughout, if we wanted to be consistent. On the other hand, the meaning of the variable is that it's a "call site" (the concept, not the type), so calling it 'CS' is, I'd say, reasonable.

Lazy declaration is commonly used in C++ code. In general it has the advantage of limiting the life ranges of locals and reduce register pressure. Here we have a lazy declaration and also a lazy initialization but scoped, making the declaration also scoped can avoid introducing uninitialized variable use accidentally in the future.

On the other hand, I don't have strong opinion on this so it is up to you.

mtrofin updated this revision to Diff 256609.Apr 10 2020, 10:45 AM
mtrofin marked an inline comment as done.

no need for std::tie

dblaikie accepted this revision.Apr 10 2020, 2:01 PM

Dealer's choice here, especially given the fact that it's just accessing/aliasing a member of the object being iterated over anyway - mostly wanted to avoid tool-based warnings maybe encouraging a style that would be systematically applied & go in not the best direction, imho.

(the fact that the value is then mutated makes it trickier for me - it's no longer the element you're iterating over, it's got state that was only established in one codepath & only usable on a matching codepath, etc... but yep, if the code had been written this new way from the start I doubt I'd have batted an eyelid/made any point that it should be written the old way just to make MSan able to find a certain set of possible bugs)

mtrofin updated this revision to Diff 256684.Apr 10 2020, 3:28 PM

const -ed a few values

This revision was automatically updated to reflect the committed changes.
dblaikie added inline comments.Apr 10 2020, 3:53 PM
llvm/lib/Transforms/IPO/Inliner.cpp
616–619

Yeah, I was going to go look at how the other std::tie evolved - and I think it's more legible without doing the pointer-to-reference dance too. (I probably wouldn't've objected to it written that way in the initial coding - just reviewing changes like this makes it a bit more visible, etc)

I think tie+letting it be a pointer, your solution, two accesses to Callsites as I suggested, all seem pretty fine.

1045

FWIW LLVM doesn't usually use top-level const like that - because it's so rare it can be mistaken for an accidental missing const /reference/

(but the change above does clarify a thing I was clearly misunderstanding in my head - I'd kept thinking for some reason that inlineHistoryIncludes might be able to modify InlineHistoryID somehow - so that InlineHistoryID's state was important in how it came out of that first " if (!IsTriviallyDead) {" - which I had completely wrong & removes the last slight hiccup I had around that variable - but I don't think marking it "const" is necessary to clarify that, I get that I was wrong now :) )

mtrofin marked 3 inline comments as done.Apr 10 2020, 4:34 PM
mtrofin added inline comments.
llvm/lib/Transforms/IPO/Inliner.cpp
1045

Ah, ok - I'll yank out the const on the next pass here.