Page MenuHomePhabricator

[DON'T MERGE] Jump-threading for finite state automata
Needs ReviewPublic

Authored by eastig on Sep 25 2020, 7:54 AM.

Details

Summary

This is an implementation of jump-threading to optimise switch-based finite state automata.

Diff Detail

Unit TestsFailed

TimeTest
220 mslinux > LLVM.CodeGen/AMDGPU::opt-pipeline.ll
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/bin/opt -O0 -mtriple=amdgcn--amdhsa -disable-output -disable-verify -debug-pass=Structure /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck -check-prefix=GCN-O0 /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll
120 mslinux > LLVM.Other::new-pm-defaults.ll
Script: -- : 'RUN: at line 10'; /mnt/disks/ssd0/agent/llvm-project/build/bin/opt -disable-verify -debug-pass-manager -passes='default<O1>' -S /mnt/disks/ssd0/agent/llvm-project/llvm/test/Other/new-pm-defaults.ll 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/llvm/test/Other/new-pm-defaults.ll --check-prefix=CHECK-O --check-prefix=CHECK-O1 --check-prefix=CHECK-NOEXT
260 mslinux > LLVM.Other::new-pm-thinlto-defaults.ll
Script: -- : 'RUN: at line 11'; /mnt/disks/ssd0/agent/llvm-project/build/bin/opt -disable-verify -debug-pass-manager -passes='thinlto-pre-link<O1>,name-anon-globals' -S /mnt/disks/ssd0/agent/llvm-project/llvm/test/Other/new-pm-thinlto-defaults.ll 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/llvm/test/Other/new-pm-thinlto-defaults.ll --check-prefixes=CHECK-O,CHECK-O1,CHECK-PRELINK-O,CHECK-PRELINK-O-NODIS,CHECK-PRELINK-O1
100 mslinux > LLVM.Other::new-pm-thinlto-postlink-pgo-defaults.ll
Script: -- : 'RUN: at line 4'; /mnt/disks/ssd0/agent/llvm-project/build/bin/opt -disable-verify -debug-pass-manager -passes='thinlto<O1>' -S /mnt/disks/ssd0/agent/llvm-project/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll --check-prefixes=CHECK-O,CHECK-O1,CHECK-NOEXT --dump-input=fail
120 mslinux > LLVM.Other::new-pm-thinlto-postlink-samplepgo-defaults.ll
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/bin/opt -disable-verify -debug-pass-manager -pgo-kind=pgo-sample-use-pipeline -profile-file='/mnt/disks/ssd0/agent/llvm-project/llvm/test/Other/Inputs/new-pm-thinlto-samplepgo-defaults.prof' -passes='thinlto<O1>' -S /mnt/disks/ssd0/agent/llvm-project/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll --check-prefixes=CHECK-O,CHECK-O1,CHECK-NOEXT --dump-input=fail
View Full Test Results (6 Failed)

Event Timeline

eastig created this revision.Sep 25 2020, 7:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 25 2020, 7:54 AM
eastig requested review of this revision.Sep 25 2020, 7:54 AM

This patch is not supposed to be used as is. It can be used for experiments and as a proof of concept. A version for reviewing will be submitted soon.

khchen added a subscriber: khchen.Wed, Sep 30, 8:14 AM
amehsan added a subscriber: amehsan.EditedSat, Oct 3, 7:14 PM

@alexey.zhikhar and I had a look at your patch. Some findings that might be interesting:

1- The IR from your patch is shorter than ours. We haven't checked carefully where the difference is, but there is a gap of about 10% (of the size of core_state_transition)
2- Our code from our current implementation is about 2%-3% faster on our platform.
3- While the main idea is exactly the same our implementation looks at all the control flow paths in the loop first and perform the analysis. Then we have a code gen process. Your approach is closer to the existing jump threading in that you make one change at a time.
4- As I said, we are making some significant changes in our implementation to improve the code quality (among other things), etc. So our new code that we will be able to share is not ready yet.

Thanks for sharing your code.

eastig added a comment.Mon, Oct 5, 1:55 AM

@alexey.zhikhar and I had a look at your patch. Some findings that might be interesting:

1- The IR from your patch is shorter than ours. We haven't checked carefully where the difference is, but there is a gap of about 10% (of the size of core_state_transition)
2- Our code from our current implementation is about 2%-3% faster on our platform.
3- While the main idea is exactly the same our implementation looks at all the control flow paths in the loop first and perform the analysis. Then we have a code gen process. Your approach is closer to the existing jump threading in that you make one change at a time.
4- As I said, we are making some significant changes in our implementation to improve the code quality (among other things), etc. So our new code that we will be able to share is not ready yet.

Thanks for sharing your code.

Hi Ehsan,
Thank you for your feedback.

The code was written in early 2015 before LLVM got improvements in DominatorTree and SSAUpdater. The biggest part of the code is to restore SSA form which is not perfect. Some sophisticated CFGs can cause issues.
The code has not been optimised for performance. I think it has opportunities for improvements.
The original idea I had was to support both SwitchInst and BranchInst. However, for BranchInst the implementation was very unstable. As the target was to optimize SwitchInst, support of BranchInst was dropped.

Did you consider to implement more Agressive Jump Threading instead of a new pass for specific use (FSMs)? Like following solutions:

eastig added a comment.Mon, Oct 5, 4:51 AM

Did you consider to implement more Agressive Jump Threading instead of a new pass for specific use (FSMs)? Like following solutions:

Do you mean to improve the existing LLVM Jump Threading pass?

Did you consider to implement more Agressive Jump Threading instead of a new pass for specific use (FSMs)? Like following solutions:

Do you mean to improve the existing LLVM Jump Threading pass?

Yes, either improve current one or write new more powerful JT pass.

Do you mean to improve the existing LLVM Jump Threading pass?

Yes, either improve current one or write new more powerful JT pass.

IMHO, the current LLVM JT pass is a lightweight JT. It tries to balance an amount of duplicated basic blocks and to preserve loops from becoming irreducible.
Making it powerful could break design principles and contracts the pass is based on. The Aggressive JT can easily duplicate a lot of basic blocks and make loops irreducible.
Also AJT benefits more from the analyze-then-transform approach rather than from iterative approach. I saw that no additional jumps were threaded after the first pass.
There are still unanswered questions regarding AJT: what applications benefit most from it (definitely FST), where to put it in the optimizations pipeline, what heuristics to use, etc.

All of this make questionable whether the current JT should be upgraded into AJT.

grandinj removed a subscriber: grandinj.Mon, Oct 5, 10:54 AM

Yeah, I agree that it is quite questionable how useful the AJT is. But JT for FSMs is definitely useful and should be added.

@eastig I tried this experimental patch in our system, and it definitely brings highly significant improvement to coremark. When do you expect to post a version for reviewing -- what will the differences be?

@amehsan If your jump-threading change can bring even more improvement, then I am definitely curious to see it and if it can be integrated with this in some way.

Thank you all for your contributions here!

@eastig I tried this experimental patch in our system, and it definitely brings highly significant improvement to coremark. When do you expect to post a version for reviewing -- what will the differences be?

Hi Alan,
Thank you for the feedback. I need to get an approval from my current employer to continue working on the patch. I'll keep you posted.

@eastig I tried this experimental patch in our system, and it definitely brings highly significant improvement to coremark. When do you expect to post a version for reviewing -- what will the differences be?

Hi Alan,
Thank you for the feedback. I need to get an approval from my current employer to continue working on the patch. I'll keep you posted.

Hi Alan,
Good news. I've got an approval. I cannot say when I post the updated version. I've just started a new job and are completely busy. I don't work anymore at Arm.
The differences will be:

  1. SSAUpdaterBulk will be used to remove custom code restoring SSA. This will reduce amount of code.
  2. The analyze-then-transform approach rather than the iterative approach. This allows to remove some crazy heuristics.
  3. SelectInstrs are unfolded only if needed.
  4. Some constants can be involved into cmp+br. As a result they cannot reach the switch. This causes copied basic blocks which are not needed.
  5. Basic blocks common for some threading opportunities will not be copied if possible.

@alanphipps

I am not sure whether our approach can be integrated, with this patch. The approaches are completely different. But I can write a summary of our algorithm and we can discuss the algorithm first. We are working on rewriting the code and we can post it when it is ready. Since @eastig has also switched to a similar approach (analyze-then-transform), it will be interesting to have a discussion about different algorithms.

ychen added a subscriber: ychen.Wed, Oct 21, 10:37 PM