When compiling for code size, conversion of a switch table into a bitmap is not profitable. The bitmap requires materializing a constant value of 32-bit or 64-bit width into a register, which usually requires 2 - 3 instructions followed by a either a left-shift followed by a right-shift _or_ a bit extract instruction to get the desired result. Whereas, indexing into a jump table would cost at most 2 instructions, plus the size of the jump table.
The bitmap is profitable when compiling for speed since it removes an indexed load, which is costly. But when compiling for size, it simply bloats the code without adding any real value for embedded targets.
The issue is pronounced when the switch result is a sub-word type (e.g., short or char). With word types, unless a double word type exists on the machine, this issue is not seen.
The following code demonstrates this behaviour:
short test(unsigned a) {
short t; switch (a) { case 0: t = 500; break; case 1: t = 200; break; case 2: t = 17000; break; default: t = 0; break; } return (t);
}
On AArch64, it uses 3 instructions to materialize 6 bytes of data and then performs a logical shift left followed by a logical shift right to get the result of the switch. Whereas, simply storing the data in a RO table and indexing into it would have generated two instructions to load the table address followed by one instruction to index into the table and load the result.
We fix the issue by adding a new Codegen option called -fno-switch-bitmap (and its converse -fswitch-bitmap) and make the generation of the Bitmap conditional. Then we modify the Clang FE to pass this switch when either the user explicitly passes this switch or the user is compiling for AArch64 with either -Os or -Oz. Currently we restrict this default option to AArch64 alone since we do not know which other architecture may benefit from this. Other architectures can also make this the default for -Os and -Oz once it is verified.
The clang patch is given in a separate revision.