This is an archive of the discontinued LLVM Phabricator instance.

Generate extra .ll files before/after optimization when using -save-temps.
Abandoned · Public

Authored by jgorbe on May 11 2017, 3:49 PM.

Details

Reviewers
chandlerc
Summary

When -save-temps is specified, add two actions to the compilation to generate two new .ll outputs: one containing unoptimized IR and another one containing optimized IR.

Note that these additional outputs will not be generated in compilations that target multiple architectures, because in that case the driver emits an error if any of the outputs produced by the actions cannot be processed by lipo.
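
As an illustration, this is the -save-temps behavior being extended for a plain single-arch host compile (the names of the two new .ll files are not spelled out here):

  $ clang -O2 -save-temps -c foo.c
  # Without this patch, -save-temps keeps foo.i (preprocessed source),
  # foo.bc, foo.s and foo.o. With this patch, two additional .ll files are
  # written as well: one with the IR before optimization and one with the
  # IR after optimization.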

Diff Detail

Event Timeline

jgorbe created this revision. May 11 2017, 3:49 PM
jgorbe edited the summary of this revision. May 11 2017, 4:00 PM
jgorbe updated this revision to Diff 100943. May 31 2017, 5:09 PM
jgorbe edited the summary of this revision.

Only generate extra .ll files when not targeting multiple archs. Updated related tests.

jgorbe added a subscriber: cfe-commits.
tra added a subscriber: tra. May 31 2017, 5:28 PM
tra added inline comments.
lib/Driver/Driver.cpp
2588–2591

I'm not sure I understand why unoptimized IR can't be written out for multi-arch builds like CUDA. Could you elaborate?

jgorbe updated this revision to Diff 100954. May 31 2017, 6:15 PM

Clarify that we only skip saving IR for multi-arch Mach-O universal builds, not other multi-arch builds like CUDA.

lib/Driver/Driver.cpp
2588–2591

The error I was trying to work around here is this one.

It's not really specific to all multi-arch builds, but to Mach-O multi-arch builds. For those, BuildUniversalActions is called, which checks that all actions produce outputs with lipo-able types (see the types::canLipoType() function) and errors out otherwise. I'm not sure the check is still necessary, but I got no answers when I asked about it on cfe-dev, so I decided to just not generate the extra files for that kind of build.
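
To make it concrete, the kind of build that hits that check is a Mach-O universal build, e.g. (a sketch, run on a Darwin host; the exact diagnostic text isn't shown here):

  $ clang -arch i386 -arch x86_64 -S foo.c
  # BuildUniversalActions checks each top-level action's output type with
  # types::canLipoType() and errors out here, because an assembler output
  # (and likewise an .ll output) is not something lipo can merge.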

I have updated the patch to make it clear that we only skip generating these IR files in that particular case, not every multi-arch build. Thanks for pointing this out!

tra added inline comments. Jun 1 2017, 9:42 AM
lib/Driver/Driver.cpp
2592–2603

Can this be moved below addHostDependenceToDeviceActions() on line 2626?
See my comment in cuda-options.cu below for the reasons why it may be necessary.

test/Driver/cuda-options.cu
197–202 (On Diff #100954)

It appears that the new actions you've pushed cause at least part of host-side compilation to happen before device-side compilation is done. That, at the very least, will not work well for CUDA: compilation will probably succeed, but the result will be missing device-side code and will fail at runtime.

If it's only the host-side preprocessor that runs ahead of the device-side actions, that may be OK, but in general host actions must be done after the device ones.

jgorbe added inline comments. Jun 1 2017, 5:00 PM
lib/Driver/Driver.cpp
2592–2603

I tried that, but it didn't make the test go back to the previous ordering, where all device-side actions are executed before all host-side actions.

I have noticed that, before applying my patch, the action graph produced by clang with -ccc-print-phases doesn't seem to introduce any explicit dependency to guarantee that device-side actions are executed before host-side actions:

0: input, "cuda-options.cu", cuda, (host-cuda)
1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
2: compiler, {1}, ir, (host-cuda)
3: input, "cuda-options.cu", cuda, (device-cuda, sm_20)
4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_20)
5: compiler, {4}, ir, (device-cuda, sm_20)
6: backend, {5}, assembler, (device-cuda, sm_20)
7: assembler, {6}, object, (device-cuda, sm_20)
8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_20)" {7}, object
9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_20)" {6}, assembler
10: linker, {8, 9}, cuda-fatbin, (device-cuda)
11: offload, "host-cuda (x86_64--linux-gnu)" {2}, "device-cuda (nvptx64-nvidia-cuda)" {10}, ir
12: backend, {11}, assembler, (host-cuda)
13: assembler, {12}, object, (host-cuda)
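
For reference, a graph like the one above can be reproduced with an invocation along these lines (the flags are illustrative; the actual test adds more options):

  $ clang -ccc-print-phases --cuda-gpu-arch=sm_20 cuda-options.cu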

I'm now trying to figure out whether there's something else that enforces that restriction, or whether the current compilation order being the right one is a happy coincidence that my patch happened to disturb. I'm new to the project, so I'm working under the assumption that I'm missing something; any hints would be appreciated :)

jgorbe abandoned this revision. Feb 14 2019, 11:29 AM