This is an archive of the discontinued LLVM Phabricator instance.

Add -fswitch-tables and -fno-switch-tables flags
Abandoned Public

Authored by sgundapa on Jul 18 2017, 1:15 PM.


Details

Reviewers

kparzysz
hans
joerg
echristo
ddunbar

Summary

These flags control the lowering of switch statements to either lookup tables
or jump tables.

Diff Detail

Event Timeline

This patch relies on https://reviews.llvm.org/D35577

sgundapa added a subscriber: cfe-commits. Jul 18 2017, 1:36 PM

What exactly is the difference between -fno-jump-tables, -fno-switch-tables, and -fno-lookup-tables?

Switch-case statements can generate two kinds of tables:

Jump tables
Lookup tables

While the general assumption is that switch-case statements generate jump tables, the case below is turned into a lookup table by late SimplifyCFG:

int foo(int x) {
  switch (x) {
  case 0: return 9;
  case 1: return 20;
  case 2: return 14;
  case 3: return 22;
  case 4: return 12;
  default: return 19;
  }
}
generates a
@switch.table.foo = private unnamed_addr constant [5 x i32] [i32 9, i32 20, i32 14, i32 22, i32 12]
The lookup table is an array of return values, as opposed to the array of pointers in a jump table.
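
To make the difference concrete, here is a minimal C++ sketch of what the lookup-table form of foo amounts to. The names switch_table_foo and foo_lookup are made up for illustration; the real table is the @switch.table.foo global shown above, and the exact range check the optimizer emits may differ. The functions are constexpr only so that the equivalence can be checked at compile time.

```cpp
// The switch from the comment above, unchanged apart from constexpr.
constexpr int foo(int x) {
  switch (x) {
  case 0: return 9;
  case 1: return 20;
  case 2: return 14;
  case 3: return 22;
  case 4: return 12;
  default: return 19;
  }
}

// Hand-written counterpart of @switch.table.foo: an array of result values.
constexpr int switch_table_foo[5] = {9, 20, 14, 22, 12};

// Range check, then one indexed load instead of a branch per case.
// Negative x wraps to a large unsigned value and falls through to the default.
constexpr int foo_lookup(int x) {
  return static_cast<unsigned>(x) < 5 ? switch_table_foo[x] : 19;
}
```

The load from switch_table_foo is the data access that the rest of this thread is concerned with.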

The "-fno-XXX" flags disable the generation of these tables.
-fno-switch-tables implies both -fno-jump-tables and -fno-lookup-tables.

In D35578#813548, @sgundapa wrote:
Switch-case statements can generate two kinds of tables:

Jump tables
Lookup tables

While the general assumption is that switch-case statements generate jump tables, the case below is turned into a lookup table by late SimplifyCFG:

int foo(int x) {
  switch (x) {
  case 0: return 9;
  case 1: return 20;
  case 2: return 14;
  case 3: return 22;
  case 4: return 12;
  default: return 19;
  }
}
generates a
@switch.table.foo = private unnamed_addr constant [5 x i32] [i32 9, i32 20, i32 14, i32 22, i32 12]
The lookup table is an array of return values as opposed to an array of pointers in jump table.

IIRC, we lower switches to a combination of binary trees, jump tables and some bit magic when there are 3 or fewer cases in the subtree. Somewhat beside the point, but just a bit of clarification: before the switch is lowered, SimplifyCFG may come along and introduce a lookup table if the cases are returning a constant value.
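
As a rough illustration of the "bit magic" mentioned above, the following C++ sketch shows a switch whose cases share one destination next to a hand-written bit-test equivalent. This is only one plausible reading of the remark; the function names are hypothetical, and the thread does not state exactly when LLVM picks this form.

```cpp
// Source form: three case values share a single destination.
constexpr bool is_space_switch(int c) {
  switch (c) {
  case '\t':
  case '\n':
  case ' ':
    return true;
  default:
    return false;
  }
}

// Bit-test form: one range check plus one mask test, with no table in memory.
constexpr bool is_space_bits(int c) {
  // One bit per case value; all three values are below 64.
  constexpr unsigned long long mask =
      (1ULL << '\t') | (1ULL << '\n') | (1ULL << ' ');
  // The range check runs first, so the shift amount is always in [0, 63].
  return static_cast<unsigned>(c) < 64 && ((mask >> c) & 1ULL) != 0;
}
```

Unlike a lookup table, the mask is an immediate in the instruction stream, so this form needs no load from a data section.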

The "-fno-XXX" flags disable the generation of these tables.
-fno-switch-tables implies both -fno-jump-tables and -fno-lookup-tables.

Okay, now I understand the differences between the flags.

What exactly is the motivation? I'm trying to narrow down the justification for adding yet more flags.

If the goal is fine-grained control over the heuristics for compiling switch statements, perhaps one should enumerate all the possible ways to lower switch statements --- jump-tables, lookup-tables, if-trees, if-chains, (more?) --- and add a separate flag for each of them.
...Although I'm not sure what purpose there'd really be in saying "This switch statement *must* be compiled into an if-else tree" or "this one *must* be a lookup table"; couldn't that end up being a pessimization one day, as the optimizer gets smarter?

Either way, it sounds like "-fno-switch-tables" is just a synonym for the (soon-to-be-)existing options "-fno-jump-tables -fno-lookup-tables" and therefore doesn't need to exist as a separate option.

Question: Is the intention really specifically about switch heuristics? Or are you trying to prevent the compiler from embedding data into the text section and/or taking computed jumps? Because Clang *can* generate a lookup table in .rodata even if the input code contains no "switch" constructs --- e.g. a long enough chain of "if"s will do the trick. https://godbolt.org/g/ZQqkaB
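
The godbolt link is not reproduced in this archive. As a hypothetical example of the kind of input meant here (not necessarily the code behind the link), the C++ function below contains no switch at all, yet computes the same mapping as the foo example earlier in the thread. Whether a particular Clang version turns it into a table is not something this sketch verifies.

```cpp
// A chain of ifs with constant results; semantically the same mapping as foo.
// foo_ifs is a made-up name for illustration.
constexpr int foo_ifs(int x) {
  if (x == 0) return 9;
  if (x == 1) return 20;
  if (x == 2) return 14;
  if (x == 3) return 22;
  if (x == 4) return 12;
  return 19;
}
```

If the optimizer recognizes the chain as a switch, any flag that only looks at source-level switch statements would not cover it.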

I'm also curious about the motivation for this.

LLVM backends can opt out of these kinds of tables if they're not suitable for the target, but why would a Clang user want to do it?

asl added a subscriber: asl. Jul 19 2017, 1:25 AM

In D35578#813629, @Quuxplusone wrote:

If the goal is fine-grained control over the heuristics for compiling switch statements, perhaps one should enumerate all the possible ways to lower switch statements --- jump-tables, lookup-tables, if-trees, if-chains, (more?) --- and add a separate flag for each of them.

In general, I would argue against such an approach without good justification. More is not always better, as exposing such fine-grained control is going to place a maintenance burden on the compiler developers with minimal improvement from the user's perspective.

I will try to address the concerns here:

What exactly is the motivation? I'm trying to narrow down the justification for adding yet more flags.

(I just typed this message in D35579)
For backends with "tightly coupled memory" (TCM), code whose data is far away from its text pays a significant latency penalty.
Hexagon is one such backend. The tables (both lookup and jump) that are generated are treated as globals with internal linkage and are placed in read-only data by default.

Interestingly, when programmers specify the command-line flag "-fno-jump-tables", they assume that no data goes into other sections.
In LLVM, the attribute "no-jump-tables" has no effect on SimplifyCFG, which generates the lookup table. This led me to introduce "no-lookup-tables".

Either way, it sounds like "-fno-switch-tables" is just a synonym for the (soon-to-be-)existing options "-fno-jump-tables -fno-lookup-tables" and therefore doesn't need to exist as a separate option

Ideally, I want to rename -fno-jump-tables to -fno-switch-tables.

LLVM backends can opt out of these kinds of tables if they're not suitable for the target, but why would a Clang user want to do it?

Often the TCM is small, so we need to support both cases (generate tables and do not generate tables).

In D35578#814590, @sgundapa wrote:

I will try to address the concerns here:

What exactly is the motivation? I'm trying to narrow down the justification for adding yet more flags.

(I just typed this message in D35579)
For backends with "tightly coupled memory" (TCM), code whose data is far away from its text pays a significant latency penalty.
Hexagon is one such backend. The tables (both lookup and jump) that are generated are treated as globals with internal linkage and are placed in read-only data by default.

Wouldn't the fix be to make the backend deal with this, then? Either by putting the table with the function text, or by opting out of lookup tables? It seems that might be a better experience for the user.

Wouldn't the fix be to make the backend deal with this, then? Either by putting the table with the function text, or by opting out of lookup tables? It seems that might be a better experience for the user.

That is perfectly reasonable, and in fact I have committed a Hexagon change recently to that effect. The LLVM flag hexagon-emit-lookup-tables controls the generation of lookup tables for Hexagon.
The problem is, I don't want the users of the compiler to use a combination of front-end and back-end flags to get the desired result,
for example "-fno-jump-tables -mllvm -hexagon-emit-lookup-tables=false". This could be much neater with "-fno-jump-tables -fno-lookup-tables" or, better, just "-fno-switch-tables".

In D35578#814817, @sgundapa wrote:

Wouldn't the fix be to make the backend deal with this, then? Either by putting the table with the function text, or by opting out of lookup tables? It seems that might be a better experience for the user.

That is perfectly reasonable, and in fact I have committed a Hexagon change recently to that effect. The LLVM flag hexagon-emit-lookup-tables controls the generation of lookup tables for Hexagon.
The problem is, I don't want the users of the compiler to use a combination of front-end and back-end flags to get the desired result.

First, LLVM command-line flags (i.e., those prefixed with -mllvm) were never designed to be customer-facing. They have no guarantee that they will persist and are generally undocumented. They're designed to be used by those working on LLVM to simplify tuning and testing. Any other use is likely a misuse.

"-fno-jump-tables -mllvm -hexagon-emit-lookup-tables=false". This could be much neater with a "-fno-jump-tables -fno-lookup-tables" or better just "-fno-switch-tables"

I'm still very far from convinced adding new flags and proposing that users switch from using the long-standing -fno-jump-tables flag to the -fno-switch-tables flag is the right approach. Are you going to convince the GCC community to do the same?

It sounds to me like you have some cases where you do want lookup tables and other cases where you don't. What exactly is the determining factor here? Moreover, why can't this determining factor be built into the compiler so the user doesn't even have to bother? That would be a better user experience.

As an alternative solution, why not just disable the transformation in SimplifyCFG when -fno-jump-tables is used? The underlying issue seems to be the same (i.e., you want to avoid generating more relocations) and AFAICT that's what -fno-jump-tables is all about. (Admittedly, I don't know the full history of -fno-jump-tables, so others might disagree with this suggestion.)

In D35578#814919, @mcrosier wrote:

It sounds to me like you have some cases where you do want lookup tables and other cases where you don't. What exactly is the determining factor here?

The determining factor is whether the customers want it or not. Each case has its own specifics, which we do not want to hardcode into the compiler. At one point, customers reported to us that memory loads coming from switch expansion are an undesirable outcome. We want to provide them with an option to prevent that from happening.

Moreover, why can't this determining factor be built into the compiler so the user doesn't even have to bother? That would be a better user experience.

Here is a use case: for code that stays in TCM, the customer doesn't want the data that the code refers to to be outside of TCM. As kparzysz mentioned, the loads cause a huge latency, which is not intended. Disabling table generation is the right thing to do here. For code that stays in regular memory, generating tables is far more efficient than a bunch of if-elses.

As an alternative solution, why not just disable the transformation in SimplifyCFG when -fno-jump-tables is used? The underlying issue seems to be the same (i.e., you want to avoid generating more relocations) and AFAICT that's what -fno-jump-tables is all about. (Admittedly, I don't know the full history of -fno-jump-tables, so others might disagree with this suggestion.)

Jump tables are not supported by all targets, but lookup tables are. Jump tables need an indirect addressing mode, whereas a lookup table is just an array of values.
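
For contrast with a plain array of values, a jump table can be approximated in standard C++ by an array of function pointers. This is only an analogy: a real jump table holds addresses of blocks inside one function, which standard C++ cannot express, and all names below are made up for illustration. The functions are constexpr only so the result can be checked at compile time.

```cpp
// One small function per case; each stands in for the code a case label jumps to.
constexpr int case0() { return 9; }
constexpr int case1() { return 20; }
constexpr int case2() { return 14; }
constexpr int case3() { return 22; }
constexpr int case4() { return 12; }

using Handler = int (*)();

// Jump-table analogue: an array of code addresses rather than result values.
constexpr Handler jump_table[5] = {case0, case1, case2, case3, case4};

// Range check, load a code address, then transfer control through it indirectly.
constexpr int foo_jump(int x) {
  return static_cast<unsigned>(x) < 5 ? jump_table[x]() : 19;
}
```

The indirect call through jump_table[x] is the step that needs target support; the lookup-table form only needs an ordinary indexed load.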

This is from "man gcc"
-fno-jump-tables

Do not use jump tables for switch statements even where it would be more efficient than other code generation strategies.  This option is of use in conjunction with -fpic or -fPIC for building code that forms part of a dynamic linker and cannot reference the address of a jump table.  On some targets, jump tables do not require a GOT and this option is not needed.

This gives some background on why this option was introduced.

Adding @echristo and @ddunbar who have been the primary owners of the driver for the past decade or so.

Refer to https://reviews.llvm.org/D35577, as we decided to disable lookup tables under -fno-jump-tables.

Revision Contents

Path                                   Size
include/clang/Driver/Options.td        3 lines
lib/Driver/ToolChains/Clang.cpp        21 lines
lib/Frontend/CompilerInvocation.cpp    6 lines
test/CodeGen/nouseswitchtable.c        8 lines

Diff 107164

include/clang/Driver/Options.td

 def fsignaling_math : Flag<["-"], "fsignaling-math">, Group<f_Group>;
 def fno_signaling_math : Flag<["-"], "fno-signaling-math">, Group<f_Group>;
 def fjump_tables : Flag<["-"], "fjump-tables">, Group<f_Group>;
 def fno_jump_tables : Flag<["-"], "fno-jump-tables">, Group<f_Group>, Flags<[CC1Option]>,
   HelpText<"Do not use jump tables for lowering switches">;
 def flookup_tables : Flag<["-"], "flookup-tables">, Group<f_Group>;
 def fno_lookup_tables : Flag<["-"], "fno-lookup-tables">, Group<f_Group>, Flags<[CC1Option]>,
   HelpText<"Do not use lookup tables for lowering switches">;
+def fswitch_tables : Flag<["-"], "fswitch-tables">, Group<f_Group>;
+def fno_switch_tables : Flag<["-"], "fno-switch-tables">, Group<f_Group>, Flags<[CC1Option]>,
+  HelpText<"Do not use jump tables and lookup tables for lowering switches">;
 
 // Begin sanitizer flags. These should all be core options exposed in all driver
 // modes.
 let Flags = [CC1Option, CoreOption] in {
 
 def fsanitize_EQ : CommaJoined<["-"], "fsanitize=">, Group<f_clang_Group>,
   MetaVarName<"<check>">,
   HelpText<"Turn on runtime checks for various forms of undefined "

lib/Driver/ToolChains/Clang.cpp

 #endif
 
   if (Arg *A = Args.getLastArg(options::OPT_Wframe_larger_than_EQ)) {
     StringRef v = A->getValue();
     CmdArgs.push_back("-mllvm");
     CmdArgs.push_back(Args.MakeArgString("-warn-stack-size=" + v));
     A->claim();
   }
 
-  if (!Args.hasFlag(options::OPT_fjump_tables, options::OPT_fno_jump_tables,
-                    true))
-    CmdArgs.push_back("-fno-jump-tables");
+  // fno-switch-tables disables the generation of both lookup and jump tables.
+  if (Arg *A = Args.getLastArg(
+          options::OPT_fjump_tables, options::OPT_fno_jump_tables,
+          options::OPT_fswitch_tables, options::OPT_fno_switch_tables)) {
+    if (A->getOption().matches(options::OPT_fno_jump_tables) ||
+        A->getOption().matches(options::OPT_fno_switch_tables))
+      CmdArgs.push_back("-fno-jump-tables");
+  }
 
-  if (!Args.hasFlag(options::OPT_flookup_tables, options::OPT_fno_lookup_tables,
-                    true))
-    CmdArgs.push_back("-fno-lookup-tables");
+  if (Arg *A = Args.getLastArg(
+          options::OPT_flookup_tables, options::OPT_fno_lookup_tables,
+          options::OPT_fswitch_tables, options::OPT_fno_switch_tables)) {
+    if (A->getOption().matches(options::OPT_fno_lookup_tables) ||
+        A->getOption().matches(options::OPT_fno_switch_tables))
+      CmdArgs.push_back("-fno-lookup-tables");
+  }
 
   if (!Args.hasFlag(options::OPT_fpreserve_as_comments,
                     options::OPT_fno_preserve_as_comments, true))
     CmdArgs.push_back("-fno-preserve-as-comments");
 
   if (Arg *A = Args.getLastArg(options::OPT_mregparm_EQ)) {
     CmdArgs.push_back("-mregparm");
     CmdArgs.push_back(A->getValue());
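
The driver change above resolves each table kind by taking the last relevant option on the command line. The following self-contained C++ model is a sketch of that "last option wins" behaviour only; the Opt enum, the Forwarded struct and resolve are hypothetical stand-ins and do not use the real clang::driver API.

```cpp
#include <initializer_list>

// Hypothetical stand-ins for the option IDs referenced in the patch.
enum class Opt {
  fjump_tables, fno_jump_tables,
  flookup_tables, fno_lookup_tables,
  fswitch_tables, fno_switch_tables,
};

// Which cc1 flags the driver would forward in this model.
struct Forwarded {
  bool NoJumpTables;   // "-fno-jump-tables" is pushed
  bool NoLookupTables; // "-fno-lookup-tables" is pushed
};

// Scans the options left to right. Each option overwrites the state of the
// table kinds it controls, so the final state reflects the last relevant
// option, mirroring getLastArg(...) followed by the matches(...) checks.
constexpr Forwarded resolve(std::initializer_list<Opt> Args) {
  bool NoJump = false, NoLookup = false;
  for (Opt O : Args) {
    switch (O) {
    case Opt::fjump_tables:      NoJump = false; break;
    case Opt::fno_jump_tables:   NoJump = true; break;
    case Opt::flookup_tables:    NoLookup = false; break;
    case Opt::fno_lookup_tables: NoLookup = true; break;
    case Opt::fswitch_tables:    NoJump = false; NoLookup = false; break;
    case Opt::fno_switch_tables: NoJump = true; NoLookup = true; break;
    }
  }
  return {NoJump, NoLookup};
}
```

In this model, "-fno-switch-tables -fjump-tables" forwards only -fno-lookup-tables, because the later -fjump-tables wins for jump tables while -fno-switch-tables remains the last option that affects lookup tables.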

lib/Frontend/CompilerInvocation.cpp

   Opts.FunctionSections = Args.hasFlag(OPT_ffunction_sections,
                                        OPT_fno_function_sections, false);
   Opts.DataSections = Args.hasFlag(OPT_fdata_sections,
                                    OPT_fno_data_sections, false);
   Opts.UniqueSectionNames = Args.hasFlag(OPT_funique_section_names,
                                          OPT_fno_unique_section_names, true);
 
   Opts.MergeFunctions = Args.hasArg(OPT_fmerge_functions);
 
-  Opts.NoUseJumpTables = Args.hasArg(OPT_fno_jump_tables);
+  Opts.NoUseJumpTables =
+      (Args.hasArg(OPT_fno_jump_tables) || Args.hasArg(OPT_fno_switch_tables));
 
-  Opts.NoUseLookupTables = Args.hasArg(OPT_fno_lookup_tables);
+  Opts.NoUseLookupTables = (Args.hasArg(OPT_fno_lookup_tables) ||
+                            Args.hasArg(OPT_fno_switch_tables));
 
   Opts.PrepareForLTO = Args.hasArg(OPT_flto, OPT_flto_EQ);
   Opts.EmitSummaryIndex = false;
   if (Arg *A = Args.getLastArg(OPT_flto_EQ)) {
     StringRef S = A->getValue();
     if (S == "thin")
       Opts.EmitSummaryIndex = true;
     else if (S != "full")

test/CodeGen/nouseswitchtable.c

This file was added.

// RUN: %clang_cc1 -S -fno-switch-tables %s -emit-llvm -o - | FileCheck %s

// CHECK-LABEL: main
// CHECK: attributes #0 = {{.*}}"no-jump-tables"="true"{{.*}}"no-lookup-tables"="true"

int main() {
  return 0;
}