This is an archive of the discontinued LLVM Phabricator instance.

New clang option -fno-plt to avoid PLT for external calls
ClosedPublic

Authored by tmsriram on Oct 18 2017, 9:47 PM.

Details

Reviewers

rnk
davidxl

Commits

rG5c65148565d9: New clang option -fno-plt which avoids the PLT and lazy binding while making…
rC317605: New clang option -fno-plt which avoids the PLT and lazy binding while making…
rL317605: New clang option -fno-plt which avoids the PLT and lazy binding while making…

Summary

New clang option -fno-plt which avoids the PLT and lazy binding while making external calls.

GCC supports -fno-plt, https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00001.html. This patch adds the option to clang, where it marks all externally defined functions with the "nonlazybind" attribute. A companion LLVM patch, https://reviews.llvm.org/D39065, skips the PLT for calls to these functions.

Diff Detail

Repository: rL LLVM

Event Timeline

tmsriram created this revision. Oct 18 2017, 9:47 PM

Ping.

No tests?

In D39079#904050, @lebedev.ri wrote:

No tests?

+1, there should be an -emit-llvm test in clang/test/CodeGen/.

lib/CodeGen/CGCall.cpp
1859 ↗	(On Diff #119543)	Remind me what happens when the definition is within the current DSO after linking. I seem to recall that the call through memory is 6 bytes and the direct pcrel call is 5 bytes, and the linker is required to know to rewrite the indirect call to a nop. Is that accurate?

lebedev.ri removed a subscriber: lebedev.ri. Oct 23 2017, 1:59 PM

Added test test/CodeGen/noplt.c

tmsriram added inline comments. Oct 23 2017, 3:38 PM

lib/CodeGen/CGCall.cpp
1859 ↗	(On Diff #119543)	That's accurate, the linker rewrites it with a nop equivalent.
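To make the byte counts concrete, here is a minimal C sketch of the rewrite being described. It assumes the x86-64 encodings `ff 15 disp32` for the 6-byte `call *foo@GOTPCREL(%rip)` and `e8 rel32` for the 5-byte direct call, padded with a `67` address-size prefix so the instruction keeps its 6-byte length. The function name relax_got_call and its parameters are hypothetical. A real linker drives this from relocation records rather than from a raw byte buffer, so treat this as an illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not taken from any linker): rewrite the 6-byte
 * indirect call `ff 15 disp32` at code[off] into `67 e8 rel32`, i.e. an
 * address-size prefix followed by a direct pc-relative call.
 *
 * insn_addr is the run-time address of the instruction and target is the
 * resolved address of the callee. Returns 1 if the bytes were rewritten,
 * 0 if the opcode does not match or the target is out of rel32 range. */
int relax_got_call(uint8_t *code, size_t len, size_t off,
                   uint64_t insn_addr, uint64_t target) {
    assert(code != NULL);

    /* The whole 6-byte instruction must lie inside the buffer. */
    if (off > len || len - off < 6)
        return 0;

    /* Only `call *disp32(%rip)` is handled here. */
    if (code[off] != 0xFF || code[off + 1] != 0x15)
        return 0;

    /* rel32 is measured from the end of the instruction, which is still
     * insn_addr + 6 because the prefix keeps the length unchanged. */
    int64_t rel = (int64_t)(target - (insn_addr + 6));
    if (rel < INT32_MIN || rel > INT32_MAX)
        return 0;

    uint32_t bits = (uint32_t)(int32_t)rel;
    code[off] = 0x67;     /* address-size prefix, acts as one byte of padding */
    code[off + 1] = 0xE8; /* call rel32 */
    for (int i = 0; i < 4; ++i)
        code[off + 2 + i] = (uint8_t)(bits >> (8 * i)); /* little-endian */
    return 1;
}
```

The sketch leaves non-matching bytes untouched, which mirrors the point raised later in the thread: without knowing the instruction encoding, the linker cannot tell a call through the GOT from an ordinary pointer load.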

lgtm

This revision is now accepted and ready to land. Oct 23 2017, 3:44 PM

hfinkel mentioned this in D38554: Fixed ppc32 function relocations in non-pic mode. Oct 23 2017, 5:03 PM

Noting that, as @vit9696 pointed out in D38554, this does not suppress uses of the PLT that occur from backend/optimizer-generated functions (e.g., calls into compiler-rt and similar).

In D39079#905353, @hfinkel wrote:

Noting that, as @vit9696 pointed out in D38554, this does not suppress uses of the PLT that occur from backend/optimizer-generated functions (e.g., calls into compiler-rt and similar).

Can I work on this as a follow-up?

Why again is this a good idea? This is an even worse hack than -Bsymbolic, the latter at least is visible in ELF header without code inspection. This is breaking core premises of ELF.

In D39079#905372, @joerg wrote:

Why again is this a good idea? This is an even worse hack than -Bsymbolic, the latter at least is visible in ELF header without code inspection. This is breaking core premises of ELF.

Could you elaborate a bit more on what ELF promises this is breaking? I haven't fully read through D38554 yet.

In D39079#905371, @tmsriram wrote:

In D39079#905353, @hfinkel wrote:

Noting that, as @vit9696 pointed out in D38554, this does not suppress uses of the PLT that occur from backend/optimizer-generated functions (e.g., calls into compiler-rt and similar).

Can I work on this as a follow-up?

I have no objection to that, although adding a mechanism to fix this (which I imagine would be an attribute tied to the caller, not the callee) would probably end up replacing this mechanism.

Let me phrase it differently. What is this patch (and the matching backend PR) supposed to achieve? There are effectively two ways to get rid of PLT entries:
(1) Bind references locally. This is effectively what -Bsymbolic does and what is breaking the ELF interposition rules.
(2) Do an indirect call via the GOT. Requires knowing what an external symbol is, making it non-attractive for anything but LTO, since it will create performance issues for all non-local accesses (i.e. anything private).

In D39079#905372, @joerg wrote:

Why again is this a good idea?

It saves the direct jump to the PLT, reducing icache pressure, which is a major cost in some workloads.

This is an even worse hack than -Bsymbolic,

Personally, I would like to build LLVM with -Bsymbolic so that we can build LLVM as a DSO and load it from clang without regressing startup time, so I don't see what's so terrible about -Bsymbolic, especially for C++ programs.

the latter at least is visible in ELF header without code inspection. This is breaking core premises of ELF.

What are you talking about?

Anyway, LLVM already has an attribute, nonlazybind, and this just provides a flag to apply it to all declarations. It gives the user access to the GOTPCREL relocations that we, and loaders, already support.

In D39079#905395, @joerg wrote:

Let me phrase it differently. What is this patch (and the matching backend PR) supposed to achieve? There are effectively two ways to get rid of PLT entries:
(1) Bind references locally. This is effectively what -Bsymbolic does and what is breaking the ELF interposition rules.
(2) Do an indirect call via the GOT. Requires knowing what an external symbol is, making it non-attractive for anything but LTO, since it will create performance issues for all non-local accesses (i.e. anything private).

This patch does 2. According to @tmsriram, clever linkers can turn the indirect call back into a nop+call_pcrel32. If this isn't universal, the user must know what their linker supports. I don't see how it causes performance issues for non-local calls, since the PLT will do a jump through the GOT anyway.

what is breaking the ELF interposition rules

Frankly, in retrospect, ELF's interposition rules (or at least the defaults for those rules), seem suboptimal on several fronts. While breaking them has some unfortunate consistency implications, I don't consider breaking them a bad thing.

In D39079#905396, @rnk wrote:

In D39079#905372, @joerg wrote:

Why again is this a good idea?

It saves the direct jump to the PLT, reducing icache pressure, which is a major cost in some workloads.

It also increases the pressure on the branch predictor, so it is not really black and white.

This is an even worse hack than -Bsymbolic,

Personally, I would like to build LLVM with -Bsymbolic so that we can build LLVM as a DSO and load it from clang without regressing startup time, so I don't see what's so terrible about -Bsymbolic, especially for C++ programs.

Qt5 tries that. It requires further hacks, as the main binary must be compiled as fully position-independent code to not run into fun later. Fun with copy relocations is only part of it.

Anyway, LLVM already has an attribute, nonlazybind, and this just provides a flag to apply it to all declarations. It gives the user access to the GOTPCREL relocations that we, and loaders, already support.

The loader doesn't see GOTPCREL anymore. It also requires a linker that disassembles instructions, because it can't distinguish between a normal pointer load and a call, to be able to optimize it.

In D39079#905423, @rnk wrote:

In D39079#905395, @joerg wrote:

Let me phrase it differently. What is this patch (and the matching backend PR) supposed to achieve? There are effectively two ways to get rid of PLT entries:
(1) Bind references locally. This is effectively what -Bsymbolic does and what is breaking the ELF interposition rules.
(2) Do an indirect call via the GOT. Requires knowing what an external symbol is, making it non-attractive for anything but LTO, since it will create performance issues for all non-local accesses (i.e. anything private).

This patch does 2. According to @tmsriram, clever linkers can turn the indirect call back into a nop+call_pcrel32. If this isn't universal, the user must know what their linker supports. I don't see how it causes performance issues for non-local calls, since the PLT will do a jump through the GOT anyway.

Yes, please see this for GOLD linkers: https://sourceware.org/ml/binutils/2016-05/msg00322.html

In D39079#905454, @joerg wrote:

In D39079#905396, @rnk wrote:

In D39079#905372, @joerg wrote:

Why again is this a good idea?

It saves the direct jump to the PLT, reducing icache pressure, which is a major cost in some workloads.

It also increases the pressure on the branch predictor, so it is not really black and white.

My experiments show that doing this improves performance of some of our large workloads by up to 1%, and it comes with a reduction in iTLB misses.

This is an even worse hack than -Bsymbolic,

Personally, I would like to build LLVM with -Bsymbolic so that we can build LLVM as a DSO and load it from clang without regressing startup time, so I don't see what's so terrible about -Bsymbolic, especially for C++ programs.

Qt5 tries that. It requires further hacks, as the main binary must be compiled as fully position-independent code to not run into fun later. Fun with copy relocations is only part of it.

Anyway, LLVM already has an attribute, nonlazybind, and this just provides a flag to apply it to all declarations. It gives the user access to the GOTPCREL relocations that we, and loaders, already support.

The loader doesn't see GOTPCREL anymore. It also requires a linker that disassembles instructions, because it can't distinguish between a normal pointer load and a call, to be able to optimize it.

The linker can replace indirect calls via GOTPCREL with direct calls; both the GOLD and BFD linkers support this today.

In D39079#905454, @joerg wrote:

It also increases the pressure on the branch predictor, so it is not really black and white.

I don't understand this objection. I'm assuming that the PLT stub is an indirect jump through the PLTGOT, not a hotpatched stub that jumps directly to the definition chosen by the loader. This is the ELF model that I'm familiar with, especially since calls to code more than 2GB away generally need to be indirect anyway.

Qt5 tries that. It requires further hacks, as the main binary must be compiled as fully position-independent code to not run into fun later. Fun with copy relocations is only part of it.

I'm not sure I understand, but this patch isn't introducing copy relocations, to be clear.

The loader doesn't see GOTPCREL anymore. It also requires a linker that disassembles instructions, because it can't distinguish between a normal pointer load and a call, to be able to optimize it.

Well, yes. The user needs to know that they have an x86-encoding-aware linker, or using this flag is probably going to slow their code down. From my perspective, this is a performance tuning flag, so that's reasonable.

In D39079#905468, @rnk wrote:

In D39079#905454, @joerg wrote:

It also increases the pressure on the branch predictor, so it is not really black and white.

I don't understand this objection. I'm assuming that the PLT stub is an indirect jump through the PLTGOT, not a hotpatched stub that jumps directly to the definition chosen by the loader. This is the ELF model that I'm familiar with, especially since calls to code more than 2GB away generally need to be indirect anyway.

Yes, this is correct. A PLT stub for x86_64 looks like this:

jmpq   *0x2ada(%rip)        # 403000 <_GLOBAL_OFFSET_TABLE_+0x18>
pushq  $0x0
jmpq   400510 <_init+0x30>

It has three instructions, and the last two are only useful if lazy binding is done. With early binding, the last two instructions are dead code. What this patch does is take that first instruction and put it at the point where the call to the PLT is made, that's it. Really, with early binding, the PLT stub is a completely redundant piece of code. I can't see how you argue with this.
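The relationship between the stub and the call site can be modeled in plain C, with a function pointer standing in for the GOT slot. This is only an analogy under eager binding: the names foo_impl, got_foo, foo_plt, call_via_plt, and call_no_plt are made up for illustration, and the real transformation happens in the compiler and linker, not in source code.

```c
#include <assert.h>

typedef int (*fn_ptr)(void);

/* Stands in for a definition that lives in another DSO. */
int foo_impl(void) { return 42; }

/* Stands in for the GOT slot. With eager binding the loader fills it in
 * before any call is made, so it is initialized up front here. */
fn_ptr got_foo = foo_impl;

/* Stands in for the PLT stub: its only useful work is an indirect jump
 * through the GOT slot (the first instruction of the stub above). */
int foo_plt(void) { return got_foo(); }

/* Default code generation: the call site goes to the stub first, which
 * then goes through the GOT slot, so there are two transfers of control. */
int call_via_plt(void) { return foo_plt(); }

/* With -fno-plt: the call site goes through the GOT slot itself, so the
 * stub is never touched. */
int call_no_plt(void) {
    assert(got_foo); /* the slot must already be bound */
    return got_foo();
}
```

Both call paths reach the same definition. The difference is only that the second one skips the hop through the stub, which is the effect on instruction fetch that the thread is debating.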

Qt5 tries that. It requires further hacks, as the main binary must be compiled as fully position-independent code to not run into fun later. Fun with copy relocations is only part of it.

I'm not sure I understand, but this patch isn't introducing copy relocations, to be clear.

The loader doesn't see GOTPCREL anymore. It also requires a linker that disassembles instructions, because it can't distinguish between a normal pointer load and a call, to be able to optimize it.

Well, yes. The user needs to know that they have an x86-encoding-aware linker, or using this flag is probably going to slow their code down. From my perspective, this is a performance tuning flag, so that's reasonable.

In D39079#905468, @rnk wrote:

In D39079#905454, @joerg wrote:

It also increases the pressure on the branch predictor, so it is not really black and white.

I don't understand this objection. I'm assuming that the PLT stub is an indirect jump through the PLTGOT,
not a hotpatched stub that jumps directly to the definition chosen by the loader. This is the ELF model
that I'm familiar with, especially since calls to code more than 2GB away generally need to be indirect anyway.

Correct, so all local calls to the same function go via the same location and share the prediction of the indirect jump.

Qt5 tries that. It requires further hacks, as the main binary must be compiled as fully position-independent code to not run into fun later. Fun with copy relocations is only part of it.

I'm not sure I understand, but this patch isn't introducing copy relocations, to be clear.

That was in reference to using it for clang.

In D39079#905519, @joerg wrote:

In D39079#905468, @rnk wrote:

In D39079#905454, @joerg wrote:

It also increases the pressure on the branch predictor, so it is not really black and white.

I don't understand this objection. I'm assuming that the PLT stub is an indirect jump through the PLTGOT,
not a hotpatched stub that jumps directly to the definition chosen by the loader. This is the ELF model
that I'm familiar with, especially since calls to code more than 2GB away generally need to be indirect anyway.

Correct, so all local calls to the same function go via the same location and share the prediction of the indirect jump.

Weigh this against a .plt that is far away from the actual calls and causes iTLB pressure; eliminating it clearly improves iTLB behavior on our larger workloads. This is an optimization feature, and if it does not improve performance in your case, do not enable it.

Qt5 tries that. It requires further hacks, as the main binary must be compiled as fully position-independent code to not run into fun later. Fun with copy relocations is only part of it.

I'm not sure I understand, but this patch isn't introducing copy relocations, to be clear.

That was in reference to using it for clang.

rnk added a reviewer: ruiu. Oct 25 2017, 12:38 PM

rnk removed a reviewer: ruiu.

rnk added a subscriber: ruiu.

Ping. How do we take this forward?

I'd also like to highlight that LLVM has many users with different needs. Sometimes we need to "disagree and commit". Flags are one of the primary ways that we have to do that.

Closed by commit rL317605: New clang option -fno-plt which avoids the PLT and lazy binding while making… (authored by tmsriram). Nov 7 2017, 11:38 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path: Size

cfe/trunk/include/clang/Driver/Options.td: 4 lines
cfe/trunk/include/clang/Frontend/CodeGenOptions.def: 2 lines
cfe/trunk/lib/CodeGen/CGCall.cpp: 10 lines
cfe/trunk/lib/Driver/ToolChains/Clang.cpp: 4 lines
cfe/trunk/lib/Frontend/CompilerInvocation.cpp: 1 line
cfe/trunk/test/CodeGen/noplt.c: 9 lines

Diff 121947

cfe/trunk/include/clang/Driver/Options.td

 def fpascal_strings : Flag<["-"], "fpascal-strings">, Group<f_Group>, Flags<[CC1Option]>,
   HelpText<"Recognize and construct Pascal-style string literals">;
 def fpcc_struct_return : Flag<["-"], "fpcc-struct-return">, Group<f_Group>, Flags<[CC1Option]>,
   HelpText<"Override the default ABI to return all structs on the stack">;
 def fpch_preprocess : Flag<["-"], "fpch-preprocess">, Group<f_Group>;
 def fpic : Flag<["-"], "fpic">, Group<f_Group>;
 def fno_pic : Flag<["-"], "fno-pic">, Group<f_Group>;
 def fpie : Flag<["-"], "fpie">, Group<f_Group>;
 def fno_pie : Flag<["-"], "fno-pie">, Group<f_Group>;
+def fplt : Flag<["-"], "fplt">, Group<f_Group>, Flags<[CC1Option]>,
+  HelpText<"Use the PLT to make function calls">;
+def fno_plt : Flag<["-"], "fno-plt">, Group<f_Group>, Flags<[CC1Option]>,
+  HelpText<"Do not use the PLT to make function calls">;
 def fropi : Flag<["-"], "fropi">, Group<f_Group>;
 def fno_ropi : Flag<["-"], "fno-ropi">, Group<f_Group>;
 def frwpi : Flag<["-"], "frwpi">, Group<f_Group>;
 def fno_rwpi : Flag<["-"], "fno-rwpi">, Group<f_Group>;
 def fplugin_EQ : Joined<["-"], "fplugin=">, Group<f_Group>, Flags<[DriverOption]>, MetaVarName<"<dsopath>">,
   HelpText<"Load the named plugin (dynamic shared object)">;
 def fpreserve_as_comments : Flag<["-"], "fpreserve-as-comments">, Group<f_Group>;
 def fno_preserve_as_comments : Flag<["-"], "fno-preserve-as-comments">, Group<f_Group>, Flags<[CC1Option]>,

cfe/trunk/include/clang/Frontend/CodeGenOptions.def

 CODEGENOPT(DebugInfoForProfiling, 1, 0)

 /// Whether 3-component vector type is preserved.
 CODEGENOPT(PreserveVec3Type, 1, 0)

 /// Whether to emit .debug_gnu_pubnames section instead of .debug_pubnames.
 CODEGENOPT(GnuPubnames, 1, 0)

+CODEGENOPT(NoPLT, 1, 0)
+
 #undef CODEGENOPT
 #undef ENUM_CODEGENOPT
 #undef VALUE_CODEGENOPT

cfe/trunk/lib/CodeGen/CGCall.cpp

Context: void CodeGenModule::ConstructAttributeList(

   }

   ConstructDefaultFnAttrList(Name, HasOptnone, AttrOnCallSite, FuncAttrs);

   if (CodeGenOpts.EnableSegmentedStacks &&
       !(TargetDecl && TargetDecl->hasAttr<NoSplitStackAttr>()))
     FuncAttrs.addAttribute("split-stack");

+  // Add NonLazyBind attribute to function declarations when -fno-plt
+  // is used.
+  if (TargetDecl && CodeGenOpts.NoPLT) {
+    if (auto *Fn = dyn_cast<FunctionDecl>(TargetDecl)) {
+      if (!Fn->isDefined() && !AttrOnCallSite) {
+        FuncAttrs.addAttribute(llvm::Attribute::NonLazyBind);
+      }
+    }
+  }
+
   if (!AttrOnCallSite) {
     bool DisableTailCalls =
         CodeGenOpts.DisableTailCalls ||
         (TargetDecl && (TargetDecl->hasAttr<DisableTailCallsAttr>() ||
                         TargetDecl->hasAttr<AnyX86InterruptAttr>()));
     FuncAttrs.addAttribute("disable-tail-calls",
                            llvm::toStringRef(DisableTailCalls));

cfe/trunk/lib/Driver/ToolChains/Clang.cpp

 #endif
   }

   if (Args.hasFlag(options::OPT_mpie_copy_relocations,
                    options::OPT_mno_pie_copy_relocations,
                    false)) {
     CmdArgs.push_back("-mpie-copy-relocations");
   }

+  if (Args.hasFlag(options::OPT_fno_plt, options::OPT_fplt, false)) {
+    CmdArgs.push_back("-fno-plt");
+  }
+
   // -fhosted is default.
   // TODO: Audit uses of KernelOrKext and see where it'd be more appropriate to
   // use Freestanding.
   bool Freestanding =
       Args.hasFlag(options::OPT_ffreestanding, options::OPT_fhosted, false) ||
       KernelOrKext;
   if (Freestanding)
     CmdArgs.push_back("-ffreestanding");

cfe/trunk/lib/Frontend/CompilerInvocation.cpp

Context: static bool ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args, InputKind IK,

   Opts.NoExecStack = Args.hasArg(OPT_mno_exec_stack);
   Opts.FatalWarnings = Args.hasArg(OPT_massembler_fatal_warnings);
   Opts.EnableSegmentedStacks = Args.hasArg(OPT_split_stacks);
   Opts.RelaxAll = Args.hasArg(OPT_mrelax_all);
   Opts.IncrementalLinkerCompatible =
       Args.hasArg(OPT_mincremental_linker_compatible);
   Opts.PIECopyRelocations =
       Args.hasArg(OPT_mpie_copy_relocations);
+  Opts.NoPLT = Args.hasArg(OPT_fno_plt);
   Opts.OmitLeafFramePointer = Args.hasArg(OPT_momit_leaf_frame_pointer);
   Opts.SaveTempLabels = Args.hasArg(OPT_msave_temp_labels);
   Opts.NoDwarfDirectoryAsm = Args.hasArg(OPT_fno_dwarf_directory_asm);
   Opts.SoftFloat = Args.hasArg(OPT_msoft_float);
   Opts.StrictEnums = Args.hasArg(OPT_fstrict_enums);
   Opts.StrictReturn = !Args.hasArg(OPT_fno_strict_return);
   Opts.StrictVTablePointers = Args.hasArg(OPT_fstrict_vtable_pointers);
   Opts.UnsafeFPMath = Args.hasArg(OPT_menable_unsafe_fp_math) ||

cfe/trunk/test/CodeGen/noplt.c

+// RUN: %clang_cc1 -emit-llvm -fno-plt %s -o - | FileCheck %s -check-prefix=CHECK-NOPLT
+
+// CHECK-NOPLT: Function Attrs: nonlazybind
+// CHECK-NOPLT-NEXT: declare i32 @foo
+int foo();
+
+int bar() {
+  return foo();
+}