This is an archive of the discontinued LLVM Phabricator instance.

[Analysis] fold load of untouched alloca to undef
AbandonedPublic

Authored by spatel on Feb 18 2019, 2:50 PM.

Details

Summary

This is not stated explicitly in the LangRef, but loading directly from an alloca should always fold to undef?
We already do this fold more generally in GVN::AnalyzeLoadAvailability() (but apparently not in NewGVN), so I'm assuming it's just an oversight that it was not included in FindAvailableLoadedValue().

The diffs here result from calling FindAvailableLoadedValue() from instcombine's visitLoadInst(). I tried to salvage existing regression tests to still provide coverage for their original bugs by removing the now undef alloca+load patterns.

Diff Detail

Event Timeline

spatel created this revision.Feb 18 2019, 2:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 18 2019, 2:50 PM

Yes, we fold loads from alloca to undef. (In GVN like you mention, but also in mem2reg.) LangRef should state memory allocated with alloca is uninitialized, and that loading from uninitialized memory produces undef; if either of those is missing, patch welcome to fix it.

That said, this seems like the wrong direction, in terms of where we want to perform this sort of optimization. Most functions have more than one basic block, and you won't catch any of those cases. EarlyCSE/GVN/NewGVN should be able to handle this case, and similar cases where there's more than one basic block. Also, FindAvailablePtrLoadStore is basically only used from two places: JumpThreading, and InstCombine. InstCombine really shouldn't be doing this sort of scan, and JumpThreading obviously only triggers in functions with more than one BB.

Yes, we fold loads from alloca to undef. (In GVN like you mention, but also in mem2reg.) LangRef should state memory allocated with alloca is uninitialized, and that loading from uninitialized memory produces undef; if either of those is missing, patch welcome to fix it.

That said, this seems like the wrong direction, in terms of where we want to perform this sort of optimization. Most functions have more than one basic block, and you won't catch any of those cases. EarlyCSE/GVN/NewGVN should be able to handle this case, and similar cases where there's more than one basic block. Also, FindAvailablePtrLoadStore is basically only used from two places: JumpThreading, and InstCombine. InstCombine really shouldn't be doing this sort of scan, and JumpThreading obviously only triggers in functions with more than one BB.

Thanks. I agree that the load optimizations from instcombine are iffy. I assumed that was there as a cheap first cut of optimization to save time for other passes, but it might be doing the opposite for overall compile-time at this point. I only noticed this because an unrelated instcombine change showed up as a possible regression on 1 of the fuzzer-reduced tests, but it sounds like I should ignore that.

I'll add some text to the LangRef.

spatel abandoned this revision.Feb 19 2019, 2:39 PM

Abandoning. Added a sentence to the LangRef here:
rL354394

JFYI, I disagree w/the sentiment expressed that we shouldn't do obvious memory optimizations in InstCombine. InstCombine does local peephole optimizations *including memory optimizations*. Even if I accept the stated goal that it *shouldn't*, today it *does* and there's no reason to reject this patch.

Now, we should *also* handle this transform in GVN/DSE/etc..., but that's a different point.