Page MenuHomePhabricator

[DTU] Refine the interface and logic of applyUpdates
ClosedPublic

Authored by NutshellySima on Feb 13 2019, 3:39 AM.

Details

Summary

This patch separates two semantics of applyUpdates:

  1. User provides an accurate CFG diff and the dominator tree is updated according to the difference of the number of edge insertions and the number of edge deletions to infer the status of an edge before and after the update.
  2. User provides a sequence of hints. Updates mentioned in this sequence might never happened and even duplicated.

Logic changes:

Previously, removing invalid updates is considered a side-effect of deduplication and is not guaranteed to be reliable. To handle the second semantic, applyUpdates does validity checking before deduplication, which can cause updates that have already been applied to be submitted again. Then, different calls to applyUpdates might cause unintended consequences, for example,

DTU(Lazy) and Edge A->B exists.
1. DTU.applyUpdates({{Delete, A, B}, {Insert, A, B}}) // User expects these 2 updates result in a no-op, but {Insert, A, B} is queued
2. Remove A->B
3. DTU.applyUpdates({{Delete, A, B}}) // DTU cancels this update with {Insert, A, B} mentioned above together (Unintended)

But by restricting the precondition that updates of an edge need to be strictly ordered as how CFG changes were made, we can infer the initial status of this edge to resolve this issue.

Interface changes:
The second semantic of applyUpdates is separated to applyUpdatesPermissive.
These changes enable DTU(Lazy) to use the first semantic if needed, which is quite useful in transforms/utils.

Diff Detail

Repository
rL LLVM

Event Timeline

NutshellySima created this revision.Feb 13 2019, 3:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 13 2019, 3:39 AM

Further explanation:
The motivation of this patch is that currently there can be bugs like that mentioned in summary.
But after inspecting that JT and other utils never submit updates that have been applied, we can get the information of whether a specific edge <From, To> exists in the previous CFG.
For example, if we observe a sequence of updates of an edge<From, To> to be

1. Insert --> 2. Delete --> 3. Delete --> 4. Insert --> 5. Delete

Because it is supposed that "never submit updates that have been applied", so we can infer from 1. Insert that <From, To> does not exist in the previous CFG. We can then inspect the current CFG to see if there is an edge<From, To>. If there is one, it is sure that an insertion happens. Otherwise, the 5 operations result in a no-op.

brzycki added inline comments.Feb 13 2019, 11:19 AM
llvm/lib/Analysis/DomTreeUpdater.cpp
284 ↗(On Diff #186620)

@NutshellySima it's probably me but I don't see how the comments match the code here. Let's take this example:

Strategy: Lazy
CFG: edge exists between A->B
Updates: {{Delete, A, B}, {Insert, A, B}}
U: {Delete, A, B}
Edge: {A,B}
isUpdateValid: true

In this case we will enqueue a deletion of edge A->B and not process the second update, the {Insert, A, B} because that Edge already exists in Seen. If a flush() is called immediately after the state of DTU is incorrect.

Am I missing something here that prevents this case?

llvm/unittests/Analysis/DomTreeUpdaterTest.cpp
793 ↗(On Diff #186620)

Nit: please add a newline.

NutshellySima added a comment.EditedFeb 13 2019, 8:16 PM
  1. I found I had assumed some extra limitations: Users need to apply updates in the same order they made to CFG (at least for the same pair of <From, To>). Because the only functionality the Update array applied by users is inferring the existence of a specific edge in the initial CFG.
  1. This patch solves a current drawback of the DTU interface that when it sees both Insert and Delete of a pair of <From, To> in a single call to applyUpdates(), it will always choose one no matter whether they actually result in a no-op in this round. So, when DTU does no-ops finding (whose functionality is now equivalent to legalizeUpdates in DT), its result is not reliable. Though I don't see people submit both Insert and Delete updates of a single edge in a single call to applyUpdates(), it makes the semantic of CFG changes different in Eager and Lazy mode, so I think it's essential to get rid of this.
llvm/lib/Analysis/DomTreeUpdater.cpp
284 ↗(On Diff #186620)

In the case you mentioned, isUpdateValid: true should be isUpdateValid: false.

isUpdateValid is a utility that inspects the current CFG and returns true when

  1. U = {Delete, A, B} and there isn't any A -> B edge now.
  2. U = {Insert, A, B} and there is at least an A -> B edge now.

So, when we see the first update {Delete, A, B}, first, we can infer that the initial CFG which DTU is in sync with has the edge <A, B>; Second, by inspecting the current CFG via isUpdateValid which returns false in this case, we can know that there is still <A, B> now.

So, we can now reach a conclusion that

  1. If there are further actions different from Delete on this edge, they all result in a no-op. Because when you submit the second update {Insert, A, B}, the precondition is that in the CFG produced by (the initial CFG + the first update {Delete, A, B}) [no matter whether the first one is a fake update], edge <A, B> doesn't exist. (Because you cannot Insert an existent edge.) In the case you mentioned, we can infer that the first update actually happened and the second update actually happened too.
  2. Otherwise (i.e., the whole Update array is made up of {Delete, A, B}s), they are all updates that never happened.
brzycki added inline comments.Feb 14 2019, 8:39 AM
llvm/lib/Analysis/DomTreeUpdater.cpp
284 ↗(On Diff #186620)

Ah, got it! I was incorrectly reading isUpdateValid() while looking at the code yesterday.

I'm cautiously optimistic this will work. The real test will be the forthcoming patch to change all the *relaxed() calls and see if we have asserts or breaks.

brzycki added inline comments.Feb 15 2019, 9:50 AM
llvm/lib/Analysis/DomTreeUpdater.cpp
284 ↗(On Diff #186620)

@NutshellySima it might be a good idea to add this reasoning to the comments of applyUpdates(). The interplay is subtle here and really needs to be documented.

NutshellySima retitled this revision from [DTU] Refine the logic of deduplication to [DTU] Refine the interface and logic of applyUpdates.
NutshellySima edited the summary of this revision. (Show Details)
  1. Add explanations and examples on why the invalid updates removing code works.
  2. Add the precondition that the order of an edge in Updates needs to be ordered as CFG changes were made.
  3. Separate applyUpdates into applyUpdatesAccurate and applyUpdatesHinted to decouple different semantics of applyUpdates from Eager/Lazy mode and enable DTU(Lazy) to accept unordered accurate updates.

Summary is also updated to explain why and how logic and interface of applyUpdates changes.

I have also checked all code involving DTU.applyUpdates and is confident that the Updates array provided conforms to "Users need to apply updates in the same order they made to CFG for the same pair of <From, To>".

I would like to test this patch thoroughly before landing, but I don't know if there is other testing utilities besides ninja check-*.

NutshellySima marked 5 inline comments as done.Feb 20 2019, 7:29 AM

@NutshellySima I like the idea of the new naming scheme that makes usage more explicit.

I noticed there are no changes to JumpThreading.cpp, do they need to be altered in a separate patch or are you fine with all of applyUpdates implicitly becoming applyUpdatesHinted in Lazy mode?

I also find the name applyUpdatesHinted a bit confusing and had to read your comments above to understand Hinted meant "we check the CFG for validity". Maybe use applyUpdatesChecked or applyUpdatesValidated as an alternative name that makes it a bit more clear that extra work must be done for this set of updates.

@NutshellySima I like the idea of the new naming scheme that makes usage more explicit.

I noticed there are no changes to JumpThreading.cpp, do they need to be altered in a separate patch or are you fine with all of applyUpdates implicitly becoming applyUpdatesHinted in Lazy mode?

I also find the name applyUpdatesHinted a bit confusing and had to read your comments above to understand Hinted meant "we check the CFG for validity". Maybe use applyUpdatesChecked or applyUpdatesValidated as an alternative name that makes it a bit more clear that extra work must be done for this set of updates.

From my perspective, it is nice to decouple two different updates submitting methods (accurate/inaccurate) from the way to flush these changes (Eager/Lazy).
But concerning most Lazy strategy users use applyUpdatesChecked and most Eager strategy users use applyUpdatesAccurate, I think it is OK to use applyUpdates as an alias of applyUpdatesChecked under Lazy strategy. :)

I'll rename applyUpdatesHinted as applyUpdatesChecked later.

kuhar added a comment.EditedFeb 20 2019, 10:25 AM

I think the default for applyUpdates should be the strict mode; any code that relies on imprecise information about updates should make it clear by using some longer name, like applyUpdatesRelaxed.

I think the default for applyUpdates should be the strict mode; any code that relies on imprecise information about updates should make it clear by using some longer name, like applyUpdatesRelaxed.

I'm fine with making applyUpdates default to strict mode to make it consistent to other parts of the codebase and encourage people to use the strict one which does not have extra overhead on scanning the CFG.

But if the user follows the rules on using applyUpdatesRelaxed/Checked, DTU is also guaranteed to have precise information on updates.

kuhar added a comment.EditedFeb 20 2019, 11:03 AM

I think the default for applyUpdates should be the strict mode; any code that relies on imprecise information about updates should make it clear by using some longer name, like applyUpdatesRelaxed.

I'm fine with making applyUpdates default to strict mode to make it consistent to other parts of the codebase and encourage people to use the strict one which does not have extra overhead on scanning the CFG.

But if the user follows the rules on using applyUpdatesRelaxed/Checked, DTU is also guaranteed to have precise information on updates.

Agreed. Since this turned into yak shaving, how about appplyUpdatesPermissive? To me applyUpdatesRelaxed sounds similar to relaxed memory models, where you can have more interleaving, and applyUpdatesChecked implies that it does more checks, but is not necessarily functionally different.

NutshellySima edited the summary of this revision. (Show Details)
  1. Rename applyUpdatesHinted to applyUpdatesPermissive; Remove applyUpdatesAccurate.
  2. Default to strict mode.
brzycki accepted this revision.Feb 20 2019, 12:43 PM

LGTM.

This revision is now accepted and ready to land.Feb 20 2019, 12:43 PM
This revision was automatically updated to reflect the committed changes.