This is an archive of the discontinued LLVM Phabricator instance.

[NewGVN][LoadCoercion][1/3] Add support for load coercion between store and load instructions
Needs ReviewPublic

Authored by kmitropoulou on Apr 19 2022, 11:46 PM.

Details

Summary

Load coercion is done in two phases:

  1. Collection of loads: First, we collect the loads that have a dependency with stores. The loads along with their dependencies are added in LoadCoercion map.
  1. Code generation: Iterate over the loads of LoadCoercion map and replace the loads with the value that we extract from the stores. This is done in implementLoadCoercion().

New instructions are generated during load coercion. In order to eliminate them, we run value numbering for them. For this reason, the load coercion should be completed before the elimination phase.

Diff Detail

Event Timeline

kmitropoulou created this revision.Apr 19 2022, 11:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 19 2022, 11:46 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
kmitropoulou requested review of this revision.Apr 19 2022, 11:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 19 2022, 11:46 PM
foad added a subscriber: foad.Apr 20 2022, 1:27 AM

FWIW I am not sure if we should be adding new features to NewGVN at the moment. There are multiple correctness issues of varying degrees. To have a credible path for NewGVN to become useful eventually, I think we should first prioritize fixing the known issues.

llvm/lib/Transforms/Scalar/NewGVN.cpp
3954

I don't think modifying the IR directly here and re-running numbering afterwards really fits with the NewGVN model. If the load is congruent to a known expression, couldn't the load be moved to the same congruence class as that expression and then the existing replace and erase logic will take care of the rest?

llvm/test/Transforms/NewGVN/load_coercion_between_store_and_load.ll
2–3

please keep the RUN line so that the diff shows the functional changes in newgvn. As it is now, it is very hard to see the changes with newgvn

kmitropoulou marked an inline comment as done.

Updating D124071: [1/3][NewGVN][LoadCoercion] Add support for load coercion between store and load instructions.

kmitropoulou marked an inline comment as done.Apr 20 2022, 10:12 PM
kmitropoulou added inline comments.
llvm/lib/Transforms/Scalar/NewGVN.cpp
3954

Good point! This was my initial approach too. But, I ended up having some problems. NewGVN adds instructions with similar expressions in the same congruence class. In the elimination phase, the algorithm traverses all the congruence classes and it replaces each member of the congruence with the leader of the congruence class.

Let's assume that we have to do load coercion between a vector store and a scalar load. The current algorithm will not add the store and the load in the same congruence class. We have to do a small hack to do this.

Now, the store and the load are in the same congruence class. In the elimination phase, we can check if any of the members of the congruence class are in LoadCoercion map. If yes, then we can extract the loaded value from the leader of the congruence class (stored value) by generating the right sequence of instructions. This is done at the beginning of the elimination phase. Next, we remove the load from the congruence class because: i. we have to prevent the algorithm from processing this load again and ii. we have to let the algorithm eliminate the other members of the congruence class without a problem. In other words, the load is treated as a special case in the elimination phase. IMO, this does not make much sense since we do not use the elimination phase at all. It is much better to have a separate step that deals with all the loads that need to be optimized with load coercion. After that, we can run the elimination phase without any problem. This is what the current patch does.

Another problem is that it is not expected to emit instructions during the elimination phase. As a result, it is hard to optimize them. Load coercion emits a new sequence of instructions. For example, in test13, the two loads are replaced by %1 = trunc i32 %V1 to i8. The code has already %V4 = trunc i32 %V1 to i8. The elimination phase will not optimize the two truncates. Because, it traverses the congruence classes in reverse order. So, it first processes the %V4 = trunc i32 %V1 to i8. Next, it processes the congruence class with the two loads that should be optimized with load coercion. It emits the truncate and the value numbering adds the newly generated truncate in the same class as %V4. But, this class has already been processed. As a result, we miss to eliminate %V4 by %1 (or vice versa). To deal with this problem, we have to bookkeep the classes that were updated and next, we should run the elimination phase for these classes again. The current patch does not have this problem, because the newly generated instructions are added in the corresponding congruence classes before the elimination phase.

Another problem is that the elimination phase works without problems if the instructions have correct DFS numbers. If we put the load that needs to be optimized with load coercion in a congruence class, then we have to update the DFS numbers because of the newly generated instructions. This means that we might have to update the DFS numbers several times. This patch updates the DFS numbers only once.

IMO, this patch is more explicit and more clean since it tries to do the load coercion without complicating the elimination phase and by doing minimal changes.

kmitropoulou marked an inline comment as done.Apr 20 2022, 10:14 PM
kmitropoulou marked an inline comment as not done.Apr 20 2022, 10:16 PM

FWIW I am not sure if we should be adding new features to NewGVN at the moment. There are multiple correctness issues of varying degrees. To have a credible path for NewGVN to become useful eventually, I think we should first prioritize fixing the known issues.

@fhahn : Can you please point which of these problems affect load coercion? I might try to address them.

kmitropoulou marked an inline comment as done.Apr 20 2022, 10:19 PM

Updating D124071: [1/3][NewGVN][LoadCoercion] Add support for load coercion between store and load instructions.

[NewGVN][LoadCoercion][1/3] Add support for load coercion between store and load instructions.

kmitropoulou edited the summary of this revision. (Show Details)Jun 13 2022, 2:48 AM
kmitropoulou edited the summary of this revision. (Show Details)
kmitropoulou retitled this revision from [1/3][NewGVN][LoadCoercion] Add support for load coercion between store and load instructions. to [NewGVN][LoadCoercion][1/3] Add support for load coercion between store and load instructions.Jun 13 2022, 2:56 AM
xbolva00 added inline comments.
llvm/test/Transforms/NewGVN/load_coercion_between_store_and_load.ll
355

GVN fails to remove dead instructions?

kmitropoulou added inline comments.Jun 13 2022, 3:25 AM
llvm/test/Transforms/NewGVN/load_coercion_between_store_and_load.ll
355

Yes, these instructions are removed by dce. This sometimes happens with NewGVN as well.

kmitropoulou edited the summary of this revision. (Show Details)Jun 13 2022, 4:03 AM

@fhahn Do you we have anyone leading the GVN/NewGVN work these days do you know? I can't see anything in CODE_OWNERS

Updating D124071: [NewGVN][LoadCoercion][1/3] Add support for load coercion between store and load instructions

Updating D124071: [NewGVN][LoadCoercion][1/3] Add support for load coercion between store and load instructions

Updating D124071: [NewGVN][LoadCoercion][1/3] Add support for load coercion between store and load instructions

kmitropoulou edited the summary of this revision. (Show Details)Sep 15 2022, 10:49 PM
kmitropoulou edited the summary of this revision. (Show Details)

Updating D124071: [NewGVN][LoadCoercion][1/3] Add support for load coercion between store and load instructions

Quick note, since updates on this until now were offline.
After the fix in D130910 that Konstantina did, I was able to do some testing for NewGVN and haven't run into any crashes, which was a first. This doesn't mean that the current NewGVN is fully correct, just that in the testing I did nothing was obviously broken. This does mean I can test the correctness of this stack of patches on the same workloads.
I tested the previous iteration, followed up with reproducers for issues found and I'm testing the current diff.

Regarding:

@fhahn Do you we have anyone leading the GVN/NewGVN work these days do you know? I can't see anything in CODE_OWNERS

I'm currently focusing more on testing the GVN&MSSA patches by @momchil.velikov, than getting NewGVN up to parity.
After the above fix, I did some performance testing with NewGVN turned on vs GVN and it's pretty terrible. To give a rough idea, I'm seeing 20-70% regression for running time, and 30% compile-time regression (due to the compiler being bootstrapped; newgvn compiling the compiler makes the new compiler 30% slower). The biggest factor are the PRE optimizations that GVN does, but even comparing a single stage build NewGVN to GVN with PRE disabled, there are still a few test-suites that show up to 60% regressions.
On the flip side, comparing NewGVN vs GVN with no PRE, there are a couple of big improvements on ARM, so it's quite possible there are gains to harness for specific use cases and/or architectures, but there's a lot of work left to make it a replacement for GVN in general.