This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Add separate UseList for each ResNo of SDNode
AbandonedPublic

Authored by kovdan01 on Apr 15 2022, 7:27 AM.

Details

Summary

It is now a common pattern to iterate over SDNode uses corresponsing to a
specific ResNo with code like this:

for (auto UI = N->use_begin(), UE = N->use_end(); UI != UE; ++UI) {
  if (UI.getUse().getResNo() != SomeValue)
    continue;
  // ...
}

It results in performance decrease when we have many ResNo's and have to
iterate over all their uses instead of only needed ones. A perfect example of
that is SDNode::hasNUsesOfValue.

This patch adds new fields to SDNode and SDUse responsible for use lists for
each ResNo separately. Keeping in mind that it increases those classes sizes,
performance testing with llvm-test-suite was done, and no noticeable issues
with compile and link timings were detected. On the contrary, when compiling
IR like this, we get a significant performance enhancement (78.4s -> 14.7s on
my machine).

%struct.large = type { [65535 x i8] }
define void @foo(%struct.large %s, %struct.large* %p) {
  store %struct.large %s, %struct.large* %p, align 1
  ret void
}

This patch will be followed by another one fixing failing assertion when
compiling code with large structs (which was the root cause of investigation
resulting in current change).

Diff Detail

Event Timeline

kovdan01 created this revision.Apr 15 2022, 7:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 15 2022, 7:27 AM
kovdan01 requested review of this revision.Apr 15 2022, 7:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 15 2022, 7:27 AM

IR like this is discouraged. I think it's almost considered a historical design mistake that you can store an entire aggregate in one IR instruction.

%struct.large = type { [65535 x i8] }
define void @foo(%struct.large %s, %struct.large* %p) {
  store %struct.large %s, %struct.large* %p, align 1
  ret void
}
llvm/include/llvm/CodeGen/SelectionDAGNodes.h
583

How many inline values does this SmallVector end up creating? 1 or 2 results is by far the most common so I wonder if we should explicitly use 2 instead of leaving it up to the SmallVector to decide.

How much has @craig.topper's hasOneUse reordering fixes had on your perf numbers?

kovdan01 abandoned this revision.Apr 20 2022, 4:00 AM

That fixed my issue for X86, thanks! Regarding other targets - it turns out that my patch gives no performance enhancement (tested that). So abandon the revision - as for now it seems not needed anymore.