Page MenuHomePhabricator

[Clang] Conversion of a switch table to a bitmap is not profitable for -Os and -Oz compilation
Needs ReviewPublic

Authored by ramred01 on Apr 11 2019, 2:21 PM.

Details

Reviewers
hans
lattner
Summary

When compiling for code size, conversion of a switch table into a bitmap is not profitable. The bitmap requires materializing a constant value of 32-bit or 64-bit width into a register, which usually requires 2 - 3 instructions followed by a either a left-shift followed by a right-shift _or_ a bit extract instruction to get the desired result. Whereas, indexing into a jump table would cost at most 2 instructions, plus the size of the jump table.

The bitmap is profitable when compiling for speed since it removes an indexed load, which is costly. But when compiling for size, it simply bloats the code without adding any real value for embedded targets.

The issue is pronounced when the switch result is a sub-word type (e.g., short or char). With word types, unless a double word type exists on the machine, this issue is not seen.

The following code demonstrates this behaviour:

short test(unsigned a) {

short t;
switch (a) {
case 0:

t = 500;
break;

case 1:

t = 200;
break;

case 2:

t = 17000;    
break;

default:

t = 0;
break;

}
return (t);
}

On AArch64, it uses 3 instructions to materialize 6 bytes of data and then performs a logical shift left followed by a logical shift right to get the result of the switch. Whereas, simply storing the data in a RO table and indexing into it would have generated two instructions to load the table address followed by one instruction to index into the table and load the result.

We fix the issue by adding a new Codegen option called -fno-switch-bitmap (and its converse -fswitch-bitmap) and make the generation of the Bitmap conditional. Then we modify the Clang FE to pass this switch when either the user explicitly passes this switch or the user is compiling for AArch64 with either -Os or -Oz. Currently we restrict this default option to AArch64 alone since we do not know which other architecture may benefit from this. Other architectures can also make this the default for -Os and -Oz once it is verified.

This is the clang part of the patch. The LLVM part of the patch is in the review ID D60584

Diff Detail

Event Timeline

ramred01 created this revision.Apr 11 2019, 2:21 PM
vsk added a subscriber: vsk.Apr 15 2019, 3:26 PM

Why shouldn't llvm simply pick the right behavior for -Os/-Oz automatically?

hans added a comment.Apr 16 2019, 4:56 AM

I've replied on the LLVM side of the patch, https://reviews.llvm.org/D60584