Page MenuHomePhabricator

Introduce a hybrid target to generate code for either the GPU or CPU
ClosedPublic

Authored by singam-sanjay on Jun 9 2017, 8:12 AM.

Details

Summary

Introduce a "hybrid" -polly-target option to optimise code for either the GPU or CPU.

When this target is selected, PPCGCodeGeneration will attempt first to optimise a Scop. If the Scop isn't modified, it is then sent to the passes that form the CPU pipeline, i.e. IslScheduleOptimizerPass, IslAstInfoWrapperPass and CodeGeneration.

In case the Scop is modified, it is marked to be skipped by the subsequent CPU optimisation passes.

Diff Detail

Repository
rL LLVM

Event Timeline

singam-sanjay created this revision.Jun 9 2017, 8:12 AM

I'll soon be adding a test-case. Do you have suggestions for kernels that would not be optimised by PPCGCodeGen but by IslScheduleOptimizer ?

bollu edited edge metadata.Jun 12 2017, 4:08 AM

@singam-sanjay: nitpick: polly-update-format changes the format of the file in a couple of places. For next time, could you please update the diff after running ninja/make polly-update-format? I've made a couple of stylistic comments that are upto personal choice :)

include/polly/ScopInfo.h
1624 ↗(On Diff #102032)

nit: can this be done in the constructor to maintain uniformity? Perhaps we could change the stuff that's constant to use the initialisation syntax in another patch?

2336 ↗(On Diff #102032)

nit: can this be called multiple times on a Scop? If not, can we check for this invariant with:

invariant-only-called-once.cpp
void markAsToBeSkipped() {
    assert(!SkipScop && "scop is marked as to be skipped multiple times.");
    SkipScop = true;
}
lib/CodeGen/CodeGeneration.cpp
271 ↗(On Diff #102032)

nit: . at the end of the comment.

lib/CodeGen/IslAst.cpp
628 ↗(On Diff #102032)

nit: . at the end of the comment.

lib/CodeGen/PPCGCodeGeneration.cpp
2690 ↗(On Diff #102032)

nit: can we continue to use S instead of CurrentScop as is done by S->hasInvariantAccesses()?

We could remove the take-a-reference in another patch.

lib/Support/RegisterPasses.cpp
94 ↗(On Diff #102032)

nit: I feel uncomfortable due to the mix of ALL_CAPS with SOME_Caps. This is better done in a separate patch, but can we consider Target_ rather than TARGET_ as the prefix?

@singam-sanjay: nitpick: polly-update-format changes the format of the file in a couple of places. For next time, could you please update the diff after running ninja/make polly-update-format? I've made a couple of stylistic comments that are upto personal choice :)

Thanks for pointing it out !

Do you find any reason that this might not work ?

include/polly/ScopInfo.h
1624 ↗(On Diff #102032)

Instead of using the SkipScop, can I set isOptimised to true and skip the Scop in the CPU optimization passes by checking`(S.isOptimized()==true)` ?

lib/CodeGen/PPCGCodeGeneration.cpp
2622 ↗(On Diff #102032)

TODO: prevent code-gen when (GPUNodeBuilder::BuildSuccessful==true) && (NodeBuilder.DeepestSequential <= NodeBuilder.DeepestParallel)

2690 ↗(On Diff #102032)

TODO : markAsToBeSkipped only when (PPCGGen->tree != NULL) && (GPUNodeBuilder::BuildSuccessful==true) && (NodeBuilder.DeepestSequential <= NodeBuilder.DeepestParallel)
ALONG WITH : "prevent code-gen when (GPUNodeBuilder::BuildSuccessful==true) && (NodeBuilder.DeepestSequential <= NodeBuilder.DeepestParallel)"

I have pointed out in this polly-dev post that code-generating instructions required to launch the kernel, even when the PTX kernel isn't available, will invalidate Analysis information regarding the CurrentScop making it impossible for subsequent CPU opt passes to do anything. So, marking the Scop to be skipped is a safe thing to do, but we would've lost optimization opportunities.

singam-sanjay added inline comments.Jun 12 2017, 5:17 AM
include/polly/ScopInfo.h
1624 ↗(On Diff #102032)

Sure.

2336 ↗(On Diff #102032)

I can think of justifications for both cases, calling markAsToBeSkipped once Vs. many times.

Call only once : if a Scop is marked to be skipped by a previous pass, the subsequent passes needn't process the Scop and have to check with isToBeSkipped before marking it themselves.

Call multiple times : A Scop might be marked multiple times within a pass, like GPUNodeBuilder::BuildSuccessful.

What do you think is more consistent ?

lib/CodeGen/PPCGCodeGeneration.cpp
2690 ↗(On Diff #102032)

nit: can we continue to use S instead of CurrentScop as is done by S->hasInvariantAccesses()?

Sure. I thought using CurrentScop might be more appropriate because the Scop was passed through CurrentScop and not S and this is communicating information relevant to not just this pass.

We could remove the take-a-reference in another patch.

I didn't understand this.

lib/Support/RegisterPasses.cpp
94 ↗(On Diff #102032)

nit: I feel uncomfortable due to the mix of ALL_CAPS with SOME_Caps. This is better done in a separate patch, but can we consider Target_ rather than TARGET_ as the prefix

I felt it was better to keep the "TARGET_" uniform. Using "Target_" could be inconsistent for enums because they're generally ALL_CAPS and camel case is used to declare llvm::cl variables and template types e.g. DumpAfter and TargetChoice rsply.

SOME_Caps is better since "Hybrid" isn't a short-form any word and also conveys its distinction visually from plain "CPU" and "GPU" targets. 😁

I'll change it if you still feel the need.

  1. Punctuation consistency in comments.
  2. Align code to LLVM's C/C++ format.
  3. Initialized SkipScop at constructor like all other members.
  4. Include code for GPU targeting only when GPU_CODEGEN is defined.
singam-sanjay marked 3 inline comments as done.Jun 12 2017, 8:32 AM
grosser accepted this revision.Jun 13 2017, 5:04 AM

Hi Sanjay,

the general direction of this patch looks good. I just have some minor comments!

Best,
Tobias

lib/Support/RegisterPasses.cpp
362 ↗(On Diff #102186)

I feel as if we have a some code duplication here. Would it make sense to write it as:

if (GPU || Hybrid)
  PPCG

if (CPU || Hybrid)
  ScheduleOptimizer
   IslAst
  Codegen

?

94 ↗(On Diff #102032)

Enums are written UPPERCASE here. See CODEGEN_FULL, CODEGEN_NONE above. Please change.

This revision is now accepted and ready to land.Jun 13 2017, 5:04 AM
  1. Rename TARGET_Hybrid to TARGET_HYBRID
  2. Reduce code redundancy.
singam-sanjay marked an inline comment as done.Jun 16 2017, 5:15 AM
singam-sanjay added inline comments.
lib/CodeGen/PPCGCodeGeneration.cpp
2690 ↗(On Diff #102032)

@bollu ping.

singam-sanjay added inline comments.Jun 16 2017, 8:39 AM
lib/Support/RegisterPasses.cpp
326 ↗(On Diff #102805)

F.Y.I.
The JSCOP exported here would either be the original Scop in case of TARGET_GPU and the IslScheduleOptimised Scop in case of TARGET_CPU.

330 ↗(On Diff #102805)

F.Y.I.
The previous behaviour is still preserved here for TARGET_CPU, TARGET_GPU and TARGET_HYBRID when the Scop is optimised by IslScheduleOptimizer.

The Schedule exported however seems to remain the same when PPCGCodeGen transforms the Scop when TARGET_HYBRID, with the exception of the boundary of the associated Region being "polly.merge_new_and_old".

94 ↗(On Diff #102032)

@grosser ping.

Hi Sanjay,

this looks already very good. Just a minor thing.

Best,
Tobias

lib/Support/RegisterPasses.cpp
362 ↗(On Diff #102186)

Instead of adding here ppcg again, can you just broaden the condition that calls the first createPPCGCodeGenerationPass

singam-sanjay added inline comments.Jun 28 2017, 9:32 PM
lib/Support/RegisterPasses.cpp
362 ↗(On Diff #102186)

I'm adding PPCGCodeGen here to preserve the order in which passes were sequenced when Target==TARGET_GPU. If you look at the the code on the left, createJSONExporterPass is sequenced to run before createPPCGCodeGenerationPass.

To sequence JSON Exporter after all transformation passes,

  1. A Scop's StmtSet must be updated by PPCGCodeGeneration as well, because JSONExporter uses ScopStsmts and not isl_schedule_tree to generate the file.
    • We should also annotate the ScopStmts to differentiate those that run on the GPU from those that run on the CPU.
  2. Scop::getNameStr must return the original name of the Scop, and not the one based on its Module's changing boundary basic blocks during IRBuilding. I have addressed this in D34565

If we're able to do this, we can finally export the JSCOP of even the Schedule produced by ppcg.

singam-sanjay added inline comments.Jun 30 2017, 10:21 AM
lib/Support/RegisterPasses.cpp
362 ↗(On Diff #102186)
grosser accepted this revision.Jun 30 2017, 10:25 AM

LGTM.

This revision was automatically updated to reflect the committed changes.