This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Support target specific isSuitableForJumpTable
Changes PlannedPublic

Authored by jaykang10 on Aug 31 2021, 2:21 AM.

Details

Summary

On TargetLowering::isSuitableForJumpTable, the minimum size is not checked. This patch checks it on AArch64's isSuitableForJumpTable.

We could want to set setMinimumJumpTableEntries in order to adjust the minimum size of jump table. At the beginning of findJumpTables, it is compared to the number of clusters without considering each cluster which might be optimized for adjacent cases with same destination block. I think it would be better to check the minimum size of jump table on isSuitableForJumpTable to consider the each cluster.

As you can see, this patch affects existing tests which are related to jump table on AArch64. If there were certain assumptions of the jump table size on AArch64, please let me know.

I have checked the benchmark scores and the scores are as below.

spec2017 Ofast	score improvement(%)
500.perlbench_r	1.120718061
502.gcc_r	0.21566066
505.mcf_r	-0.377692164
520.omnetpp_r	-0.033963362
523.xalancbmk_r	-0.710515296
525.x264_r	-0.009670442
531.deepsjeng_r	-0.151793551
541.leela_r	0.000412303
548.exchange2_r	-0.005073052
557.xz_r	0.14042818
	
spec2017 O3 lto	score improvement(%)
500.perlbench_r	0.499563704
502.gcc_r	1.243011689
505.mcf_r	-0.448723782
520.omnetpp_r	-0.343839249
523.xalancbmk_r	-0.857326043
525.x264_r	0.1447399
531.deepsjeng_r	0.271570876
541.leela_r	-0.008724735
548.exchange2_r	-0.004679532
557.xz_r	-0.131507228
	
spec2006 O3 lto	score improvement(%)
400.perlbench	0.606229456
401.bzip2	-0.855386753
403.gcc	0.674752856
429.mcf	-0.007256516
445.gobmk	-0.175565964
456.hmmer	0.225698132
458.sjeng	0.016811133
462.libquantum	-0.366765648
464.h264ref	1.29015527
471.omnetpp	1.148017212
473.astar	-0.114839198
483.xalancbmk	0.1115539

Tests: 778
Same hash: 313 (filtered out) 
Remaining: 465
Metric: exec_time

Program                                        results.org results.mod diff 
 test-suite...128UniformDivisor<__uint128_t>    12.37       12.37       0.0%
 test-suite...emCmp<3, GreaterThanZero, Mid>   2737.85     2737.85      0.0% 
 test-suite...emCmp<4, GreaterThanZero, Mid>   1634.42     1634.42      0.0% 
 test-suite...Cmp<4, GreaterThanZero, First>   1627.17     1627.17      0.0% 
 test-suite...mCmp<4, GreaterThanZero, None>   1633.03     1633.03      0.0% 
 test-suite..._MemCmp<4, LessThanZero, Last>   1504.30     1504.30      0.0% 
 test-suite...M_MemCmp<4, LessThanZero, Mid>   1510.02     1510.02      0.0% 
 test-suite...MemCmp<4, LessThanZero, First>   1503.89     1503.89      0.0% 
 test-suite..._MemCmp<4, LessThanZero, None>   1503.94     1503.94      0.0% 
 test-suite...est:BM_MemCmp<4, EqZero, Last>   953.13      953.13       0.0%
 test-suite...test:BM_MemCmp<4, EqZero, Mid>   994.46      994.46       0.0%
 test-suite...st:BM_MemCmp<4, EqZero, First>   953.40      953.40       0.0%
 test-suite...est:BM_MemCmp<4, EqZero, None>   957.15      957.15       0.0%
 test-suite...mCmp<3, GreaterThanZero, Last>   2883.71     2883.71      0.0% 
 test-suite...Cmp<3, GreaterThanZero, First>   2766.46     2766.46      0.0% 
 Geomean difference                                                     0.0%
         results.org    results.mod   diff 
count  465.000000     465.000000     465.0
mean   3433.740927    3433.740927    0.0  
std    33459.989197   33459.989197   0.0  
min    0.052810       0.052810       0.0
25%    25.266975      25.266975      0.0  
50%    334.556472     334.556472     0.0  
75%    1036.344130    1036.344130    0.0  
max    507694.300000  507694.300000  0.0

Diff Detail

Event Timeline

jaykang10 created this revision.Aug 31 2021, 2:21 AM
jaykang10 requested review of this revision.Aug 31 2021, 2:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 31 2021, 2:21 AM
dmgreen added inline comments.Sep 1 2021, 3:44 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10753

Where did the number 16 come from? Just from benchmarks? Unless there are jump tables in hot areas of the code this is more likely to be measuring the knock-on effects of changing code alignment than anything actually to do with the jump tables. It just turns into noise that doesn't stand up over time as other changes get made by the compiler.

It would likely be best to at least have it as an option. Some of the tests below could then keep small testing jumptables.

llvm/test/CodeGen/AArch64/jump-table.ll
6

Any of these test checking for the structure of jump tables are probably best left testing jump tables, perhaps with an option to overwrite the default.

jaykang10 added inline comments.Sep 2 2021, 4:37 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10753

Thanks for comment @dmgreen

As you know, AArch64 target generates jump table as below.

// %bb.0:                               // %entry
	sub	w8, w0, #1
	cmp	w8, #3
	b.hi	.LBB0_6
// %bb.1:                               // %entry
	adrp	x9, .LJTI0_0
	add	x9, x9, :lo12:.LJTI0_0
	adr	x10, .LBB0_2
	ldrb	w11, [x9, x8]
	add	x10, x10, x11, lsl #2
	br	x10
...
.LJTI0_0:
	.byte	(.LBB0_2-.LBB0_2)>>2
	.byte	(.LBB0_3-.LBB0_2)>>2
	.byte	(.LBB0_4-.LBB0_2)>>2
	.byte	(.LBB0_5-.LBB0_2)>>2

Let's say we consider only the number of instructions rather than instruction cycle and latency information. There are usually 9 instructions for jump table. If we do not generate jump table, there are cmp and branch instructions per each case. It means that if there are switch instruction with 4 cases, jump table could be slower than the combination of cmp and branch instructions. For other case, if the cmp and branch instructions at the beginning are taken, the jump table could be slower. If we consider each architecture's cycle information, it will be more complex to decide the number...

At this moment, AArch64 target is using the default setMinimumJumpTableEntries(4) but it does not consider each cluster which might be optimized for adjacent cases with same destination block. It could be good to implement AArch64 specific isSuitableForJumpTable.

I agree with you. It will be better to have option to tune it.

Let me think about this patch more carefully.

llvm/test/CodeGen/AArch64/jump-table.ll
6

Yep, I agree with you.

jaykang10 planned changes to this revision.Sep 6 2021, 10:53 AM