This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Manually unroll the argument copy loop
Closed · Public

Authored by jhuber6 on Sep 2 2021, 9:16 AM.

Details

Reviewers

ggeorgakoudis
jdoerfert

Commits

rGa619072c6189: [OpenMP] Manually unroll the argument copy loop

Summary

The unroll pragma did not work properly because the loop bound was not
known when the runtime was optimized. The loop was then marked with
"unroll disable" metadata, which prevented unrolling later, once the
bound was known. For now we manually unroll the loop so that up to 16
elements are handled well. This helps optimizations look through the
argument passing.
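
The pattern described above can be illustrated with a minimal, self-contained sketch. It is not the runtime code: the helper name `copyArgs` is hypothetical, and only up to 4 arguments are handled explicitly here, whereas the patch handles up to 16. Each `case` falls through to the next lower one, so a call whose argument count is a known small constant reduces to a fixed sequence of stores, while larger counts take the generic loop under `default`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the patch): copies NArgs pointers from
// Src to Dst using the same switch/fallthrough layout as the diff.
inline void copyArgs(void **Dst, void **Src, int64_t NArgs) {
  assert(NArgs >= 0 && "argument count must be non-negative");
  switch (NArgs) {
  default:
    // Generic path for counts above the manually unrolled limit
    // (4 in this sketch, 16 in the patch).
    for (int64_t I = 0; I < NArgs; ++I)
      Dst[I] = Src[I];
    break;
  case 4:
    Dst[3] = Src[3];
    // FALLTHROUGH
  case 3:
    Dst[2] = Src[2];
    // FALLTHROUGH
  case 2:
    Dst[1] = Src[1];
    // FALLTHROUGH
  case 1:
    Dst[0] = Src[0];
    // FALLTHROUGH
  case 0:
    break;
  }
}
```

Calling `copyArgs(Dst, Src, 3)` enters at `case 3` and performs exactly three stores with no loop, which is the property the summary relies on: once the count is a constant at the call site, the compiler should be able to resolve the switch statically.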

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jdoerfert created this revision. Sep 2 2021, 9:16 AM

Herald added subscribers: guansong, bollu, yaxunl. Sep 2 2021, 9:16 AM

jdoerfert requested review of this revision. Sep 2 2021, 9:16 AM

Herald added a project: Restricted Project. Sep 2 2021, 9:16 AM

Herald added a subscriber: sstefan1.

LGTM

This revision is now accepted and ready to land. Sep 2 2021, 9:42 AM

Harbormaster completed remote builds in B122321: Diff 370290. Sep 2 2021, 10:25 AM

hafixo added a commit: rCRT373035: hwasan: Compatibility fixes for short granules. Sep 6 2021, 12:44 AM

hafixo added a commit: rGc336557f0238: hwasan: Compatibility fixes for short granules. Sep 6 2021, 12:47 AM

thopre removed a commit: rGc336557f0238: hwasan: Compatibility fixes for short granules. Sep 7 2021, 2:47 AM

thopre removed a commit: rCRT373035: hwasan: Compatibility fixes for short granules. Sep 7 2021, 2:51 AM

jhuber6 commandeered this revision. Mar 21 2022, 4:57 PM

jhuber6 edited reviewers, added: jdoerfert; removed: jhuber6.

This revision now requires review to proceed. Mar 21 2022, 4:57 PM

Herald added a project: Restricted Project. Mar 21 2022, 4:57 PM

Update for new runtime.

This revision is now accepted and ready to land. Mar 21 2022, 4:58 PM

Harbormaster completed remote builds in B155522: Diff 417135. Mar 21 2022, 5:04 PM

Closed by commit rGa619072c6189: [OpenMP] Manually unroll the argument copy loop (authored by jhuber6). Mar 21 2022, 5:54 PM

This revision was automatically updated to reflect the committed changes.

jhuber6 added a commit: rGa619072c6189: [OpenMP] Manually unroll the argument copy loop.

Revision Contents

Path                                                Size
openmp/libomptarget/DeviceRTL/src/Parallelism.cpp   59 lines

Diff 417144

openmp/libomptarget/DeviceRTL/src/Parallelism.cpp

(145 preceding lines not shown)

   if (!IsActiveParallelRegion) {
     state::ValueRAII LevelRAII(icv::Level, 1u, 0u, true, ident);
     invokeMicrotask(TId, 0, fn, args, nargs);
     return;
   }

   void **GlobalArgs = nullptr;
   if (nargs) {
     __kmpc_begin_sharing_variables(&GlobalArgs, nargs);
-#pragma unroll
+    switch (nargs) {
+    default:
       for (int I = 0; I < nargs; I++)
         GlobalArgs[I] = args[I];
+      break;
+    case 16:
+      GlobalArgs[15] = args[15];
+      // FALLTHROUGH
+    case 15:
+      GlobalArgs[14] = args[14];
+      // FALLTHROUGH
+    case 14:
+      GlobalArgs[13] = args[13];
+      // FALLTHROUGH
+    case 13:
+      GlobalArgs[12] = args[12];
+      // FALLTHROUGH
+    case 12:
+      GlobalArgs[11] = args[11];
+      // FALLTHROUGH
+    case 11:
+      GlobalArgs[10] = args[10];
+      // FALLTHROUGH
+    case 10:
+      GlobalArgs[9] = args[9];
+      // FALLTHROUGH
+    case 9:
+      GlobalArgs[8] = args[8];
+      // FALLTHROUGH
+    case 8:
+      GlobalArgs[7] = args[7];
+      // FALLTHROUGH
+    case 7:
+      GlobalArgs[6] = args[6];
+      // FALLTHROUGH
+    case 6:
+      GlobalArgs[5] = args[5];
+      // FALLTHROUGH
+    case 5:
+      GlobalArgs[4] = args[4];
+      // FALLTHROUGH
+    case 4:
+      GlobalArgs[3] = args[3];
+      // FALLTHROUGH
+    case 3:
+      GlobalArgs[2] = args[2];
+      // FALLTHROUGH
+    case 2:
+      GlobalArgs[1] = args[1];
+      // FALLTHROUGH
+    case 1:
+      GlobalArgs[0] = args[0];
+      // FALLTHROUGH
+    case 0:
+      break;
+    }
   }

   {
     // Note that the order here is important. `icv::Level` has to be updated
     // last or the other updates will cause a thread specific state to be
     // created.
     state::ValueRAII ParallelTeamSizeRAII(state::ParallelTeamSize, NumThreads,
                                           1u, true, ident);

(62 following lines not shown)