Page MenuHomePhabricator

Conversion of a switch table to a bitmap is not profitable for -Os and -Oz compilation
Needs ReviewPublic

Authored by ramred01 on Apr 11 2019, 2:15 PM.

Details

Reviewers
hans
lattner
Summary

When compiling for code size, conversion of a switch table into a bitmap is not profitable. The bitmap requires materializing a constant value of 32-bit or 64-bit width into a register, which usually requires 2 - 3 instructions followed by a either a left-shift followed by a right-shift _or_ a bit extract instruction to get the desired result. Whereas, indexing into a jump table would cost at most 2 instructions, plus the size of the jump table.

The bitmap is profitable when compiling for speed since it removes an indexed load, which is costly. But when compiling for size, it simply bloats the code without adding any real value for embedded targets.

The issue is pronounced when the switch result is a sub-word type (e.g., short or char). With word types, unless a double word type exists on the machine, this issue is not seen.

The following code demonstrates this behaviour:

short test(unsigned a) {

short t;
switch (a) {
case 0:
  t = 500;
  break;
case 1:
  t = 200;
  break;
case 2:
  t = 17000;    
  break;
default:
  t = 0;
  break;
}
return (t);

}

On AArch64, it uses 3 instructions to materialize 6 bytes of data and then performs a logical shift left followed by a logical shift right to get the result of the switch. Whereas, simply storing the data in a RO table and indexing into it would have generated two instructions to load the table address followed by one instruction to index into the table and load the result.

We fix the issue by adding a new Codegen option called -fno-switch-bitmap (and its converse -fswitch-bitmap) and make the generation of the Bitmap conditional. Then we modify the Clang FE to pass this switch when either the user explicitly passes this switch or the user is compiling for AArch64 with either -Os or -Oz. Currently we restrict this default option to AArch64 alone since we do not know which other architecture may benefit from this. Other architectures can also make this the default for -Os and -Oz once it is verified.

The clang patch is given in a separate revision.

Diff Detail

Event Timeline

ramred01 created this revision.Apr 11 2019, 2:15 PM
asl added a subscriber: asl.Apr 11 2019, 3:24 PM

How many such switches are seen in e.g. LLVM test suite? Could you please share the size reduction statistics?

vsk added a subscriber: vsk.Apr 15 2019, 3:24 PM
hans added a comment.Apr 16 2019, 4:55 AM

First I was confused because your message mentions jump tables, but this is only about the switch-to-lookup table transformation in SimplifyCFG. So there are no jump tables involved, this is just about whether packing the lookup table into a "bitmap" scalar is a good idea or not.

Adding a -fno-switch-bitmap option doesn't seem like a great idea to me. Would it be possible to tweak the heuristic for when to emit the "bitmap lookup table" in some way instead? Currently we always do it if the table fits in a register. Maybe it should be more conservative, at least when optimizing for size, maybe depending on target?