This is an archive of the discontinued LLVM Phabricator instance.

[MemDep] Fixed handling of invariant.group
ClosedPublic

Authored by Prazek on Apr 5 2018, 6:31 AM.

Details

Summary

Memdep had a funny bug related to invariant.groups - because it did not
invalidated cache, in some very rare cases it was possible to show memory
dependence of the instruction that was deleted, but because other
instruction took it's place it resulted in call to vtable!
Thanks @amharc for repro!.

Diff Detail

Event Timeline

Prazek created this revision.Apr 5 2018, 6:31 AM
amharc added inline comments.Apr 5 2018, 6:49 AM
include/llvm/IR/ValueMap.h
109 ↗(On Diff #141148)

Can this be marked noexcept, like the move operator= below?

Prazek updated this revision to Diff 141155.Apr 5 2018, 7:45 AM

noexcept

Prazek marked an inline comment as done.Apr 5 2018, 7:45 AM
amharc added a comment.Apr 5 2018, 9:50 AM

I am not sure that ValueMap can be moveable: if it was moved, the ValueMapT *Map pointer in ValueMapCallbackVH could become invalid.

Wouldn't std::unique_ptr<ValueMap<Instruction *, NonLocalDepResult>> NonLocalDefsCache be a better solution?

It's kind of hard for me to know what to do with this.
The cache invalidation is still very very broken with or without this patch.
So this patch is not wrong but everything it does probably needs to be rewritten anyway

Would handling NonLocalDefsCache in the same places as other DenseMaps from Instruction * be sufficient? In particular, RemoveCachedNonLocalPointerDependencies, called by both removeInstruction and invalidateCachedPointerInfo, looks promising ;)

Prazek added a comment.Apr 5 2018, 1:28 PM

It's kind of hard for me to know what to do with this.
The cache invalidation is still very very broken with or without this patch.
So this patch is not wrong but everything it does probably needs to be rewritten anyway

I know, I also wish we would get rid of MemDep asap, but as long as it's there, it would be good to avoid weird miscompilations that causes calls to vtable.

Would handling NonLocalDefsCache in the same places as other DenseMaps from Instruction * be sufficient? In particular, RemoveCachedNonLocalPointerDependencies, called by both removeInstruction and invalidateCachedPointerInfo, looks promising ;)

Yep, this is probably the right way to go

"I know, I also wish we would get rid of MemDep asap, but as long as it's there, it would be good to avoid weird miscompilations that causes calls to vtable."

But it still will, just not in this particular way.
But it will still fail in very common ways anyway :)

We may actually have to disable preservation entirely for the moment anyway.

Prazek updated this revision to Diff 141536.Apr 8 2018, 6:01 AM

Removed ValueMap, use removeCachedNonLocalPointerDependency

Prazek updated this revision to Diff 141550.Apr 8 2018, 9:45 AM
  • small fix
Prazek updated this revision to Diff 141553.Apr 8 2018, 9:47 AM
  • small fix
amharc accepted this revision.May 3 2018, 4:24 AM

I think it would be a little bit cleaner if the relevant checks were added to verifyRemoved too, where all other maps are traversed in debug builds. However, AssertingVH is probably a good enough guarantee ;)

You probably want to wait for dberlin to accept this.

This revision is now accepted and ready to land.May 3 2018, 4:24 AM

So, please don't let me stand in the way of reasonable fixes. I only get
~20 minutes a day to look at llvm stuff these days unless i'm on vacation.
Maybe some day it'll be better again, but i suspect it'll be a while ;)

dberlin added inline comments.May 3 2018, 7:55 AM
llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
1475

You may want to do compilation time measurements on this on some bigger cases, if you haven't.

This is likely to be painfully slow in some cases.

dberlin accepted this revision.May 3 2018, 7:55 AM
Prazek updated this revision to Diff 145371.May 5 2018, 7:13 AM

Added reverse map

Prazek updated this revision to Diff 145372.May 5 2018, 7:25 AM
  • fix typos

During MemorySSA development, i'm pretty positive we tried some similar types of reverse maps, and while it did relatively okay on "llvm + clang testsuite", it fell down pretty hard in the real world from memory usage.

Adding George to see if he remembers any cases, so you have some stuff to test this all with.
He's on vacation, but i believe gets back next week.

Prazek added a comment.May 5 2018, 7:59 AM

During MemorySSA development, i'm pretty positive we tried some similar types of reverse maps, and while it did relatively okay on "llvm + clang testsuite", it fell down pretty hard in the real world from memory usage.

Adding George to see if he remembers any cases, so you have some stuff to test this all with.
He's on vacation, but i believe gets back next week.

I don't think it would be a problem here, I don't expect this map to keep more than couple of entries (for most cases 0 and 1). The same thing goes to the previous approach.

Adding George to see if he remembers any cases, so you have some stuff to test this all with.

I don't remember anything in particular. I'd recommend just bootstrapping clang/llvm and seeing what happens, assuming clang will emit these intrinsics if you pass an added cflag to it or something.

It's my impression that halide is great at generating code that hits worst-case behaviors in LLVM. I don't know whether they're users of these invariant intrinsics, however.

Adding George to see if he remembers any cases, so you have some stuff to test this all with.

I don't remember anything in particular. I'd recommend just bootstrapping clang/llvm and seeing what happens, assuming clang will emit these intrinsics if you pass an added cflag to it or something.

It's my impression that halide is great at generating code that hits worst-case behaviors in LLVM. I don't know whether they're users of these invariant intrinsics, however.

Probably not. I will make some analysis and get back here

Prazek marked an inline comment as done.May 18 2018, 9:33 AM

I measured build time of llvm& clang with devirtualization and either the variance is too big, or the compiler without this patch is faster.
As I found out, after writing other bugfix for AA, clang does not bootstrap correctly without this patch so I would love to push it.
If you don't have any objections, I would like to push it. I will still have to do build time measurements with and without devirtualization, but I would like to do it when everything is ready.
(Note that this code will not be executed unless you use devirtualization)

dberlin accepted this revision.May 18 2018, 9:35 AM

WFM; thanks for checking!

Prazek updated this revision to Diff 147602.May 18 2018, 3:37 PM

Added very small repro for the bug :)

Prazek updated this revision to Diff 147603.May 18 2018, 3:39 PM

changed comment

Prazek edited the summary of this revision. (Show Details)May 18 2018, 3:41 PM
This revision was automatically updated to reflect the committed changes.