This is an archive of the discontinued LLVM Phabricator instance.

[SeparateConstOffsetFromGEPPass] Added optional modification strategy
AbandonedPublic

Authored by eklepilkina on Jun 14 2022, 2:04 AM.

Details

Summary

This modification strategy tries to understand which GEP instrucions is profitable to modify for register pressure decreasing.

Diff Detail

Event Timeline

eklepilkina created this revision.Jun 14 2022, 2:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2022, 2:04 AM
eklepilkina requested review of this revision.Jun 14 2022, 2:04 AM

Please rebase against precommited tests.

anton-afanasyev edited the summary of this revision. (Show Details)Jun 14 2022, 5:07 AM

Rebased against precommited tests

reames added a subscriber: reames.Jun 14 2022, 7:56 AM

At least for me, I need some context to be able to review this. What is the case which this improves in terms of codegen? And how common are such patterns? Keep in mind, I'm not terribly familiar with the pass here, so this may be pretty basic explanation. Do you have a bug with examples or analyze that lead to this change?

Sorry, I had to provide the context at the beginning.

Now clang for RISC-V doesn't use offset addressing in generated assembly. Example from Dhrystone

 addiw   a0, s1, 5
 slli    a1, a0, 0x2
 add     a2, s4, a1
 sw      s2, 0(a2)
 addiw   a3, s1, 6
 slli    a3, a3, 0x2
 add     a3, a3, s4
 sw      s2, 0(a3)
 addiw   a3, s1, 35
 slli    a3, a3, 0x2
add     a3, a3, s4
sw      a0, 0(a3)

It's inefficient because we can use offsets.
Adding this pass allows to generate the next code

    addiw   a4, a2, 5
    slli    a5, a2, 2
    add a0, a0, a5
    sw  a3, 20(a0)
    sw  a3, 24(a0)
    sw  a4, 140(a0)
...

SeparateConstOffsetFromGEPPass is used to solve this problem in targets with limited addressing modes.
The changes inside pass was made because modification of all GEPs isn't profitable, seems that we need at least 2 GEPs and one value that was used for index that can be removed after modification. Otherwise we don't decrease register pressure.

This should be two patches, one changing the pass and one enabling for RISC-V.

llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
171 ↗(On Diff #436758)

Can we add a command line option to control this like AArch64 and PowerPC have?

llvm/test/Transforms/SeparateConstOffsetFromGEP/RISCV/split-gep.ll
2

This test doesn't exist in the repo. Where is the patch that adds it?

eklepilkina added inline comments.Jun 14 2022, 8:54 AM
llvm/test/Transforms/SeparateConstOffsetFromGEP/RISCV/split-gep.ll
2

I was told in the first comment to rebase on precommited tests. These tests are added as precommited in separate commit. Should I commit them?

craig.topper added inline comments.Jun 14 2022, 8:58 AM
llvm/test/Transforms/SeparateConstOffsetFromGEP/RISCV/split-gep.ll
2

Why is the script NOTE: Assertions have been autogenerated by utils/update_test_checks.py not in the pre-committed version?

This should be two patches, one changing the pass and one enabling for RISC-V.

As far as I want to turn pass with enabled strategy, should I wait approve and merge of the accepting strategy and only after this create the second review? Or create series of patches as mentionedin documentation https://llvm.org/docs/Phabricator.html#creating-a-patch-series?

craig.topper added inline comments.Jun 14 2022, 9:09 AM
llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
368

I think there you should be a std::move on PreviousIndices

371

rhs -> RHS

970–974

PossibleBase.size() == 0 -> PossibleBases.empty()

This should be two patches, one changing the pass and one enabling for RISC-V.

As far as I want to turn pass with enabled strategy, should I wait approve and merge of the accepting strategy and only after this create the second review? Or create series of patches as mentionedin documentation https://llvm.org/docs/Phabricator.html#creating-a-patch-series?

You should create a series of patches.

Separate part with pass modification

eklepilkina retitled this revision from [RISCV] Turn on SeparateConstOffsetFromGEPPass for RISC-V target and added optional modification strategy in it to [SeparateConstOffsetFromGEPPass] Added optional modification strategy.Jun 15 2022, 7:23 AM
eklepilkina edited the summary of this revision. (Show Details)
yakush added a subscriber: yakush.Jun 16 2022, 3:44 AM
  • [SeparateConstOffsetFromGEP] Fix comparator for map with GEP bases

Refactoring

  • [SeparateConstOffsetFromGEP] Fix ignoring condition

Some nits from me. If I may, some advice if you want to make progress here.

First, it seems that some pieces of this patch can be split out as seprate NFC refactorings. If so, please do. It should greatly reduce the code to look at, and smaller patches are generally easier to comprehend and review.

Second, the motivation of this patch is obscure. The structures and algorithms that you are using are not obvious. This either needs a detailed explanation in comments, or maybe incremental approach, when each step is understandable.

Third, the benefit isn't obvious either. In your tests, new IR is bigger than the old IR. This is not necessarily bad, but it requires explanation. If you have some particular scenario where the resulting assembly is better with this change, then it makes sense to provide an LLC test which shows it.

Fourth, this patch claims to improve register pressure. At the same time, it is done in a middle-end pass, which means its impact on register pressure on different platforms might be different. Was it actually tested on other platforms than RICSV? Or you think that the algorithm should be profitable regardless of the platform? Can I see any benefit from this patch in X86 for example?

llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
203

Can this go as a separate NFC?

271–277

Please commit this reformatting separately.

280

\p CheckProfitability ?..

360

This requires more explanation. I could not figure what are indices, which of them is being optimized, and what is precedence in this context. Maybe write a detailed comment on what's going on here and what does this structure represent?

449

Canonize -> Canonicalize

450

I guess it should be "Returns true if a change was made, false otherwise".

547

Maybe rename InstructionsToTransform -> GEPsToTransform?

1083

Pls commit separately if it is needed.

1127

Rename as separate NFC?

1373

To me, this code structure looks counter-intuitive. Why do we print "Try to split GEP "... only when we check profitability, and do it silently when we don't? If possible, please restructure it like

if (CheckProfitability) {
  // Do all required profitability checks
}
// Do common transform logic uniformly

I'm not sure if it's possible here because of this post-processing. If not, then the transform part should be unified somehow else.

1380

More natural way would be

if (!CurrentChanged)
  continue;
for ...
1381

The complexity of this is SortedInstructionsList.size() * SortedInstructionsList.size() * sum(SortedInstructionsList[J]) if I'm reading this correctly. Looks very expensive. Is there a cheaper way of doing this? Imagine you have 10k instructions on your list. It will just be stuck forever.

llvm/test/Transforms/SeparateConstOffsetFromGEP/RISCV/split-gep.ll
86

Why is it a better code tha the old one?

286

This code is bigger than it used to be. Can you explain why is it better?

eklepilkina marked 8 inline comments as done.Aug 10 2022, 8:02 AM

Can I see any benefit from this patch in X86 for example?

This pass was written for targets with limited addressing mode, so it isn't added to X86 pipeline. It's used under the flag on Aarch64 and on RISCV we also suggest to turn off it by default, but this patch helps to make these optimization be useful more often, and remove some regressions that was found if turn these pass on on all test-suite. I'll provide test-suite results on Aarch64 platform with turned this pass with and without this patch. But yes, the main measurements were made for RISCV.

And if you mean that these changes should be done later in pipeline, there is the problem with the current instruction selection that can work only with one BB, so CodeGenPrepare pass need to sunk such GEPs with const to generate adddressing by offset, so I believe this pass was created as middle-end part.

llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
1127

I don't really like the idea to rename in separate NFC patch, because renaming is connected with changes that were made and the old name wasn't suitable any more

1373

I understand your concerns, but I don't see a good solution here, because I don't want to make the unneeded actions for original version without checking profitability.

1381

Imagine you have 10k instructions on your list

I amn't sure we should optimize this case, because it's mostly impossible, because this list is always quite small. I'll think some more, but I amn't sure that the optimization here is more important than readability.

llvm/test/Transforms/SeparateConstOffsetFromGEP/RISCV/split-gep.ll
86

In assembly we use one more register to save the result of new generated GEP instruction, bt we have no profit because registers that are used by adds are also needed as far as these values are used in other instructions.

286

This code is bigger on IR, and it's so becuase of repeating sext opertaions, but sext isn't so critical in assembly, at the same time pass generates 2 new GEP instructions that are used as base and we need registers for them

  • [NFC][SeparateConstOffsetFromGEP] Small refactoring and reformatting
  • [SeparateConstOffsetFromGEPPass] Added optional modification strategy
  • Review fixes (part 1)
  • [SeparateConstOffsetFromGEPPass] Added optional modification strategy
  • Review fixes (part 1)
mkazantsev requested changes to this revision.Aug 11 2022, 2:58 AM

Found some bugs, some style proposals as well. The general point still holds. If the patch is purposed to reduce register pressure on some platform, please provide a test which shows that this actually happens. This can only be shown on a llc test.

llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
247

This comment is obsolete now, Extract does not have these new parameters.

272

And if V is not binop, should it change?

362

This is usually called BasePointer in other parts of optimizer.

363

Shouldn't %b also be a part of it? Or where does it go?
Maybe more elaborate example on how there can be more than one previous index?

388

APInt? Just to make sure this doesn't overflow.

389

Naturaly I'd expect this to be SmallVector<const ConstantInt *>, but the code below suggests there might not be constants. Misleading name?

557

Use DenseMapInfo<Value *>::getTombstoneKey() and same above

561

Why PreviousIndices size but not contents?

709

No { }

710

undef and poison are constants but not ConstantInt. Are you OK with them?

1381

"Mostly impossible" means "possible". We generally bail out on non-linear algorithms with some thresholds. This could also be the case here.

llvm/test/Transforms/SeparateConstOffsetFromGEP/RISCV/split-gep.ll
286

Then please provide a llc test that demonstrates a positive change.

The fact that "sext isn't so critical" is a way not obvious to me. Filling the upper part of the registry may sometimes be an extra operation.

This revision now requires changes to proceed.Aug 11 2022, 2:58 AM
eklepilkina abandoned this revision.Aug 29 2022, 5:17 AM

Sorry for delay. Looked more on different benchmarks from test-suite during searching a good test case. There are such cases. But a deep exploration shows that SeparateConstOffsetFromGEP pass isn't the main reason, it produces better IR, but in some cases later passes can make it worse and cause worse asssembly code. So hacks I have made in the current pass as workarounds for these particular cases don't seem to be the proper decision. As far as this pass isn't the main reason of regressions we got, I decided to abandon this review.

Thank you very much for review and sorry to bother you.