This modification strategy tries to understand which GEP instrucions is profitable to modify for register pressure decreasing.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
At least for me, I need some context to be able to review this. What is the case which this improves in terms of codegen? And how common are such patterns? Keep in mind, I'm not terribly familiar with the pass here, so this may be pretty basic explanation. Do you have a bug with examples or analyze that lead to this change?
Sorry, I had to provide the context at the beginning.
Now clang for RISC-V doesn't use offset addressing in generated assembly. Example from Dhrystone
addiw a0, s1, 5 slli a1, a0, 0x2 add a2, s4, a1 sw s2, 0(a2) addiw a3, s1, 6 slli a3, a3, 0x2 add a3, a3, s4 sw s2, 0(a3) addiw a3, s1, 35 slli a3, a3, 0x2 add a3, a3, s4 sw a0, 0(a3)
It's inefficient because we can use offsets.
Adding this pass allows to generate the next code
addiw a4, a2, 5 slli a5, a2, 2 add a0, a0, a5 sw a3, 20(a0) sw a3, 24(a0) sw a4, 140(a0) ...
SeparateConstOffsetFromGEPPass is used to solve this problem in targets with limited addressing modes.
The changes inside pass was made because modification of all GEPs isn't profitable, seems that we need at least 2 GEPs and one value that was used for index that can be removed after modification. Otherwise we don't decrease register pressure.
This should be two patches, one changing the pass and one enabling for RISC-V.
llvm/lib/Target/RISCV/RISCVTargetMachine.cpp | ||
---|---|---|
171 ↗ | (On Diff #436758) | Can we add a command line option to control this like AArch64 and PowerPC have? |
llvm/test/Transforms/SeparateConstOffsetFromGEP/RISCV/split-gep.ll | ||
1 ↗ | (On Diff #436758) | This test doesn't exist in the repo. Where is the patch that adds it? |
llvm/test/Transforms/SeparateConstOffsetFromGEP/RISCV/split-gep.ll | ||
---|---|---|
1 ↗ | (On Diff #436758) | I was told in the first comment to rebase on precommited tests. These tests are added as precommited in separate commit. Should I commit them? |
llvm/test/Transforms/SeparateConstOffsetFromGEP/RISCV/split-gep.ll | ||
---|---|---|
1 ↗ | (On Diff #436758) | Why is the script NOTE: Assertions have been autogenerated by utils/update_test_checks.py not in the pre-committed version? |
This should be two patches, one changing the pass and one enabling for RISC-V.
As far as I want to turn pass with enabled strategy, should I wait approve and merge of the accepting strategy and only after this create the second review? Or create series of patches as mentionedin documentation https://llvm.org/docs/Phabricator.html#creating-a-patch-series?
Some nits from me. If I may, some advice if you want to make progress here.
First, it seems that some pieces of this patch can be split out as seprate NFC refactorings. If so, please do. It should greatly reduce the code to look at, and smaller patches are generally easier to comprehend and review.
Second, the motivation of this patch is obscure. The structures and algorithms that you are using are not obvious. This either needs a detailed explanation in comments, or maybe incremental approach, when each step is understandable.
Third, the benefit isn't obvious either. In your tests, new IR is bigger than the old IR. This is not necessarily bad, but it requires explanation. If you have some particular scenario where the resulting assembly is better with this change, then it makes sense to provide an LLC test which shows it.
Fourth, this patch claims to improve register pressure. At the same time, it is done in a middle-end pass, which means its impact on register pressure on different platforms might be different. Was it actually tested on other platforms than RICSV? Or you think that the algorithm should be profitable regardless of the platform? Can I see any benefit from this patch in X86 for example?
llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp | ||
---|---|---|
203 | Can this go as a separate NFC? | |
262–274 | Please commit this reformatting separately. | |
271 | \p CheckProfitability ?.. | |
357 | This requires more explanation. I could not figure what are indices, which of them is being optimized, and what is precedence in this context. Maybe write a detailed comment on what's going on here and what does this structure represent? | |
434 | Canonize -> Canonicalize | |
435 | I guess it should be "Returns true if a change was made, false otherwise". | |
531 | Maybe rename InstructionsToTransform -> GEPsToTransform? | |
1066–1067 | Pls commit separately if it is needed. | |
1111 | Rename as separate NFC? | |
1357 | To me, this code structure looks counter-intuitive. Why do we print "Try to split GEP "... only when we check profitability, and do it silently when we don't? If possible, please restructure it like if (CheckProfitability) { // Do all required profitability checks } // Do common transform logic uniformly I'm not sure if it's possible here because of this post-processing. If not, then the transform part should be unified somehow else. | |
1364 | More natural way would be if (!CurrentChanged) continue; for ... | |
1365 | The complexity of this is SortedInstructionsList.size() * SortedInstructionsList.size() * sum(SortedInstructionsList[J]) if I'm reading this correctly. Looks very expensive. Is there a cheaper way of doing this? Imagine you have 10k instructions on your list. It will just be stuck forever. | |
llvm/test/Transforms/SeparateConstOffsetFromGEP/RISCV/split-gep.ll | ||
86 ↗ | (On Diff #445498) | Why is it a better code tha the old one? |
286 ↗ | (On Diff #445498) | This code is bigger than it used to be. Can you explain why is it better? |
Can I see any benefit from this patch in X86 for example?
This pass was written for targets with limited addressing mode, so it isn't added to X86 pipeline. It's used under the flag on Aarch64 and on RISCV we also suggest to turn off it by default, but this patch helps to make these optimization be useful more often, and remove some regressions that was found if turn these pass on on all test-suite. I'll provide test-suite results on Aarch64 platform with turned this pass with and without this patch. But yes, the main measurements were made for RISCV.
And if you mean that these changes should be done later in pipeline, there is the problem with the current instruction selection that can work only with one BB, so CodeGenPrepare pass need to sunk such GEPs with const to generate adddressing by offset, so I believe this pass was created as middle-end part.
llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp | ||
---|---|---|
1111 | I don't really like the idea to rename in separate NFC patch, because renaming is connected with changes that were made and the old name wasn't suitable any more | |
1357 | I understand your concerns, but I don't see a good solution here, because I don't want to make the unneeded actions for original version without checking profitability. | |
1365 |
I amn't sure we should optimize this case, because it's mostly impossible, because this list is always quite small. I'll think some more, but I amn't sure that the optimization here is more important than readability. | |
llvm/test/Transforms/SeparateConstOffsetFromGEP/RISCV/split-gep.ll | ||
86 ↗ | (On Diff #445498) | In assembly we use one more register to save the result of new generated GEP instruction, bt we have no profit because registers that are used by adds are also needed as far as these values are used in other instructions. |
286 ↗ | (On Diff #445498) | This code is bigger on IR, and it's so becuase of repeating sext opertaions, but sext isn't so critical in assembly, at the same time pass generates 2 new GEP instructions that are used as base and we need registers for them |
- [NFC][SeparateConstOffsetFromGEP] Small refactoring and reformatting
- [SeparateConstOffsetFromGEPPass] Added optional modification strategy
- Review fixes (part 1)
- [SeparateConstOffsetFromGEPPass] Added optional modification strategy
- Review fixes (part 1)
Found some bugs, some style proposals as well. The general point still holds. If the patch is purposed to reduce register pressure on some platform, please provide a test which shows that this actually happens. This can only be shown on a llc test.
llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp | ||
---|---|---|
247 | This comment is obsolete now, Extract does not have these new parameters. | |
272 | And if V is not binop, should it change? | |
359 | This is usually called BasePointer in other parts of optimizer. | |
360 | Shouldn't %b also be a part of it? Or where does it go? | |
385 | APInt? Just to make sure this doesn't overflow. | |
386 | Naturaly I'd expect this to be SmallVector<const ConstantInt *>, but the code below suggests there might not be constants. Misleading name? | |
541 | Use DenseMapInfo<Value *>::getTombstoneKey() and same above | |
545 | Why PreviousIndices size but not contents? | |
694 | No { } | |
695 | undef and poison are constants but not ConstantInt. Are you OK with them? | |
1365 | "Mostly impossible" means "possible". We generally bail out on non-linear algorithms with some thresholds. This could also be the case here. | |
llvm/test/Transforms/SeparateConstOffsetFromGEP/RISCV/split-gep.ll | ||
286 ↗ | (On Diff #445498) | Then please provide a llc test that demonstrates a positive change. The fact that "sext isn't so critical" is a way not obvious to me. Filling the upper part of the registry may sometimes be an extra operation. |
Sorry for delay. Looked more on different benchmarks from test-suite during searching a good test case. There are such cases. But a deep exploration shows that SeparateConstOffsetFromGEP pass isn't the main reason, it produces better IR, but in some cases later passes can make it worse and cause worse asssembly code. So hacks I have made in the current pass as workarounds for these particular cases don't seem to be the proper decision. As far as this pass isn't the main reason of regressions we got, I decided to abandon this review.
Thank you very much for review and sorry to bother you.
Can this go as a separate NFC?