This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/lib/Passes/
-
lib/
-
Passes/
1
PassBuilderPipelines.cpp

Differential D142725

[DFAJumpThreading] Enable DFAJT with LTO
AcceptedPublic

Authored by martong on Jan 27 2023, 7:43 AM.

Download Raw Diff

Details

Reviewers

alexey.zhikhar
SjoerdMeijer
efriedma
xbolva00
amehsan
fhahn

Summary

This patch adds the DFAJumpThreading pass into the LTO default pipeline
after the peephole optimization passes. This increases the Coremark
score when LTO is used. I've measured an increase of the score with
roughly 11% on Linux-x86_64 (and we've experienced a similar growth in
case of different RISCV architectures at Codasip).

Here is the primitive Makefile I used with the latest Coremark:

SOURCES=core_list_join.c core_main.c core_matrix.c core_state.c core_util.c posix/core_portme.c
DEFINITIONS=-DHAS_PRINTF=1 -DHAS_STDIO=1 -DTIMES=1 -DITERATIONS=1

CC=/home/gabor.marton/WORK/llvm2/build/release/bin/clang
OBJS=$(SOURCES:%.c=build_x86_64/%.o)

CFLAGS=-I./posix -I. -DCOMPILER_FLAGS=\"N/A\" \
-O3\
-flto\

LDFLAGS=-O3\
-flto\
-Wl,--plugin-opt=-enable-dfa-jump-thread\

build_x86_64/coremark.xexe: $(OBJS)
	$(CC) -o $@ $(OBJS) \
	 $(LDFLAGS)

build_x86_64/%.o: %.c
	mkdir -p build_x86_64
	mkdir -p build_x86_64/posix
	$(CC) -c -o $@ $< $(CFLAGS) $(DEFINITIONS)

Co-author: Martin Ministr (Codasip.com)
Co-author: Pavel Snobl (Codasip.com)

Diff Detail

Repository: rG LLVM Github Monorepo

Unit TestsFailed

	Time	Test
	60,060 ms	x64 debian > libFuzzer.libFuzzer::fuzzer-leak.test
	60,070 ms	x64 debian > libFuzzer.libFuzzer::minimize_crash.test
	60,040 ms	x64 debian > libFuzzer.libFuzzer::value-profile-load.test

Event Timeline

martong created this revision.Jan 27 2023, 7:43 AM

Herald added a project: Restricted Project. · View Herald TranscriptJan 27 2023, 7:43 AM

Herald added subscribers: steakhal, luismarques, pengfei and 10 others. · View Herald Transcript

martong requested review of this revision.Jan 27 2023, 7:43 AM

Herald added a project: Restricted Project. · View Herald TranscriptJan 27 2023, 7:43 AM

Herald added subscribers: llvm-commits, • pcwang-thead. · View Herald Transcript

Thanks for the patch.

Question: why LTO? I'm no longer with the company where I worked on this and my recollections are starting to get fuzzy but AFAIR this optimization doesn't need the whole program. If so, you'll want to place this opt into the regular pipeline, where compilation of different translation units is parallelized.

LTO is not needed, but this patch adds this pass to the list of LTO passes - reasonable (maybe you should update some tests?)

This revision is now accepted and ready to land.Jan 27 2023, 8:24 AM

In D142725#4085822, @alexey.zhikhar wrote:

Thanks for the patch.

Question: why LTO? I'm no longer with the company where I worked on this and my recollections are starting to get fuzzy but AFAIR this optimization doesn't need the whole program. If so, you'll want to place this opt into the regular pipeline, where compilation of different translation units is parallelized.

Good question. We experienced that some of our users turn on LTO as the first step and then they experiment (and measure) by adding new extra flags via the linker. I.e. they want whole program optimization from the beginning. For instance, first iteration:

CFLAGS=...
LDFLAGS=-O3 -flto

Second iteration:

CFLAGS=...
LDFLAGS=-O3 -flto -Wl,--plugin-opt=-inline-threshold=10000

Third iteration:

CFLAGS=...
LDFLAGS=-O3 -flto -Wl,--plugin-opt=-inline-threshold=10000 -Wl,--plugin-opt=-enable-dfa-jump-thread

@martong Would the optimization also work if -Xclang -enable-dfa-jump-thread were specified in CFLAGS instead? If "yes", then I'd say this is firmly in the realm of user error and we should not make this change.

In D142725#4085940, @nikic wrote:

@martong Would the optimization also work if -Xclang -enable-dfa-jump-thread were specified in CFLAGS instead? If "yes", then I'd say this is firmly in the realm of user error and we should not make this change.

Yes, that would work with -mllvm -enable-dfa-jump-thread. And the Coremark score is roughly the same (within measurement error). I agree that it would be better if the user added the DFAJT optimization to CFLAGS rather than to the LDFLAGS, but do we really expect the users to have such a subtle knowledge about the opt passes? In particular, it might be surprising that this opt does not have any effect when it is used with LTO at the moment.

Harbormaster completed remote builds in B210362: Diff 492756.Jan 27 2023, 9:32 AM

In D142725#4086159, @martong wrote:

In D142725#4085940, @nikic wrote:

@martong Would the optimization also work if -Xclang -enable-dfa-jump-thread were specified in CFLAGS instead? If "yes", then I'd say this is firmly in the realm of user error and we should not make this change.

Yes, that would work with -mllvm -enable-dfa-jump-thread. And the Coremark score is roughly the same (within measurement error). I agree that it would be better if the user added the DFAJT optimization to CFLAGS rather than to the LDFLAGS, but do we really expect the users to have such a subtle knowledge about the opt passes? In particular, it might be surprising that this opt does not have any effect when it is used with LTO at the moment.

This is an internal LLVM option, it is not intended for end-user consumption. If somebody does want to use it, they had better know what they are doing.

The DFA jump threading pass is intended to be enabled by default in the future, at which point no option will be necessary, and this will just leave behind an (to our knowledge) unnecessary pass run in the LTO pipeline.

In D142725#4086632, @nikic wrote:

In D142725#4086159, @martong wrote:

In D142725#4085940, @nikic wrote:

@martong Would the optimization also work if -Xclang -enable-dfa-jump-thread were specified in CFLAGS instead? If "yes", then I'd say this is firmly in the realm of user error and we should not make this change.

Yes, that would work with -mllvm -enable-dfa-jump-thread. And the Coremark score is roughly the same (within measurement error). I agree that it would be better if the user added the DFAJT optimization to CFLAGS rather than to the LDFLAGS, but do we really expect the users to have such a subtle knowledge about the opt passes? In particular, it might be surprising that this opt does not have any effect when it is used with LTO at the moment.

This is an internal LLVM option, it is not intended for end-user consumption. If somebody does want to use it, they had better know what they are doing.

The DFA jump threading pass is intended to be enabled by default in the future, at which point no option will be necessary, and this will just leave behind an (to our knowledge) unnecessary pass run in the LTO pipeline.

Okay, that makes sense to me. There is, however, something that still bothers me. There is the old JumpThreadingPass which is added both to the function simplification pipeline and to the LTO pipeline. So, adding DFAJumpThreading to the LTO pipeline would make it consistent with the way how the old JumpThreadingPass is added to these pipelines, isn't it?

xbolva00 added inline comments.Jan 27 2023, 11:27 AM

llvm/lib/Passes/PassBuilderPipelines.cpp
1736	And CE is here as well…

Hey guys, thank you for the review so far. But I am not sure if we have a full consensus yet.
@nikic Do you think that the arguments above are reasonable for this change?

In D142725#4086788, @martong wrote:

Okay, that makes sense to me. There is, however, something that still bothers me. There is the old JumpThreadingPass which is added both to the function simplification pipeline and to the LTO pipeline. So, adding DFAJumpThreading to the LTO pipeline would make it consistent with the way how the old JumpThreadingPass is added to these pipelines, isn't it?

These passes aren't really comparable, even if they both have jump threading in their name. JumpThreading is a basic control-flow optimization pass. It's going to be able to do some useful work pretty much anytime additional inlining opportunities arise. DFAJumpThreading is a pass targeting a very specific pattern -- for LTO to be useful, part of the DFA switch structure must only become visible as a result of LTO. Not impossible, but probably also not particularly common?

That said, I do find the argument compelling in abstract. However, it's not really clear to me how we decide which passes get rerun during LTO and which don't. Maybe @fhahn has some insight there.

If we run the pass during LTO, I'm not sure the proposed position is the right one (the same applies to the position of the ConstraintElimination pass next to it). The FPM this is being added to is commented as "Run a few AA driver optimizations here and now to cleanup the code". It's an FPM that gets run before FuncAttrs and GlobalsAA. There is a MainFPM after it, where most of the passes are scheduled.

Generally, after looking at it just now, I find the way the fat LTO pipeline is structured extremely confusing. Normally we structure the pipeline into a simplification pipeline that gets run as part of a CGSCC pass manager with the inliner, and an optimization pipeline that runs afterwards. However, fat LTO appears to schedule the inliner without any other passes.

Herald added a subscriber: StephenFan. · View Herald TranscriptFeb 1 2023, 1:36 AM

Ping.
@fhahn Could you please take a look, if you have some time? Would be great to hear your thoughts.

Revision Contents

Path

Size

llvm/

lib/

Passes/

PassBuilderPipelines.cpp

2 lines

Diff 492756

llvm/lib/Passes/PassBuilderPipelines.cpp

Show First 20 Lines • Show All 1,723 Lines • ▼ Show 20 Lines	PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,

// Remove unused arguments from functions.		// Remove unused arguments from functions.
MPM.addPass(DeadArgumentEliminationPass());		MPM.addPass(DeadArgumentEliminationPass());

FunctionPassManager FPM;		FunctionPassManager FPM;
// The IPO Passes may leave cruft around. Clean up after them.		// The IPO Passes may leave cruft around. Clean up after them.
FPM.addPass(InstCombinePass());		FPM.addPass(InstCombinePass());
invokePeepholeEPCallbacks(FPM, Level);		invokePeepholeEPCallbacks(FPM, Level);
		if (EnableDFAJumpThreading && Level.getSizeLevel() == 0)
		FPM.addPass(DFAJumpThreadingPass());

if (EnableConstraintElimination)		if (EnableConstraintElimination)
FPM.addPass(ConstraintEliminationPass());		FPM.addPass(ConstraintEliminationPass());
		xbolva00Unsubmitted Not Done Reply Inline Actions And CE is here as well… xbolva00: And CE is here as well…

FPM.addPass(JumpThreadingPass());		FPM.addPass(JumpThreadingPass());

// Do a post inline PGO instrumentation and use pass. This is a context		// Do a post inline PGO instrumentation and use pass. This is a context
// sensitive PGO pass.		// sensitive PGO pass.
if (PGOOpt) {		if (PGOOpt) {
if (PGOOpt->CSAction == PGOOptions::CSIRInstr)		if (PGOOpt->CSAction == PGOOptions::CSIRInstr)
addPGOInstrPasses(MPM, Level, /* RunProfileGen */ true,		addPGOInstrPasses(MPM, Level, /* RunProfileGen */ true,
▲ Show 20 Lines • Show All 256 Lines • Show Last 20 Lines