Page MenuHomePhabricator

Allow BB duplication threshold to be adjusted through JumpThreading's ctor
ClosedPublic

Authored by hliao on Sep 22 2014, 12:38 PM.

Details

Summary

Some targets may not want to duplicate BB when threading jumps or use different threshold. Adding threshold parameter in that pass's ctor and initializer allows customization on BB duplication.

Diff Detail

Event Timeline

hliao updated this revision to Diff 13947.Sep 22 2014, 12:38 PM
hliao retitled this revision from to Allow BB duplication threshold to be adjusted through JumpThreading's ctor.
hliao updated this object.
hliao edited the test plan for this revision. (Show Details)
hliao added a reviewer: hfinkel.
hliao added a subscriber: Unknown Object (MLST).
spatel added a subscriber: spatel.Sep 22 2014, 2:45 PM
spatel added inline comments.
lib/Transforms/Scalar/JumpThreading.cpp
50

I have no opinion about the functionality of this patch, but how about giving a name to that magic '6'...like:

static unsigned BBDuplicateThresholdDefault = 6;

Then you can use that constant as the default parameter assignment, and you don't have to introduce yet another magic number (-1) and cast signed to unsigned.

PING again

lib/Transforms/Scalar/JumpThreading.cpp
50

'-1' here is used to tell whether a value other that default threshold is specified or not. However, that default threshold would be changed through command line option. Adding a name for that magic '6' and using it as default parameter does work correctly when that default threshold is changed through command option but no value is specified through pass creator.

nadav edited edge metadata.Sep 23 2014, 5:37 PM

The functionality of the patch looks okay. Please add a comment that says that non-negative values override the internal default.

hfinkel edited edge metadata.Sep 27 2014, 7:02 PM

If this is supposed to allow for target customization, should the number come from TTI? What is the underlying hardware constraint(s) you're trying to model? Also, a test case would be nice.

nadav added a comment.Sep 27 2014, 7:42 PM

I think that he is trying to disable code duplication because GPU languages may contain barriers that must not be duplicated.

I think that he is trying to disable code duplication because GPU languages may contain barriers that must not be duplicated.

If that's the motivation, then I wonder why our "noduplicate" function attribute is not already sufficient.

hliao added a comment.Sep 29 2014, 4:34 PM

Hi Hal

Yeah, "noduplicate" could prevent duplicating of barrier calls but that
patch wants to address the potential issue on processors with divergent
control flow, commonly found in GPUs, e.g. AMD/NVIDIA ones. The
scenario is that, if BB is duplicated to exploit more jump threading,
targets with divergent CF may execute more instructions if the
condition is a divergent one.

For updating that threshold from TTI, yeah, if we are interested in
that case. I could come another patch considering both TTI and
user-specified threshold.

Yours

  • Michael
  • Original Message -----

From: "Michael Liao" <michael.liao@intel.com>
To: "michael liao" <michael.liao@intel.com>, nrotem@apple.com, hfinkel@anl.gov
Cc: spatel@rotateright.com, llvm-commits@cs.uiuc.edu
Sent: Monday, September 29, 2014 6:34:36 PM
Subject: Re: [PATCH] Allow BB duplication threshold to be adjusted through JumpThreading's ctor

Hi Hal

Yeah, "noduplicate" could prevent duplicating of barrier calls but
that
patch wants to address the potential issue on processors with
divergent
control flow, commonly found in GPUs, e.g. AMD/NVIDIA ones. The
scenario is that, if BB is duplicated to exploit more jump threading,
targets with divergent CF may execute more instructions if the
condition is a divergent one.

For updating that threshold from TTI, yeah, if we are interested in
that case. I could come another patch considering both TTI and
user-specified threshold.

I suppose that I don't understand what you mean by "if we are interested." Generally speaking, ctor parameters are useful only for clients who are not using the standard optimization pipeline, and we'd like the standard optimization pipeline to generally work well for a wide range of targets. Thus, a TTI interface is preferred.

From a cost modeling perspective, how can you tell whether the instruction duplication will be worthwhile. Can this be something like 2*(instruction costs) <= (branch cost)?

Thanks again,
Hal

Yours

sebpop accepted this revision.Oct 19 2016, 9:25 AM
sebpop added a reviewer: sebpop.
sebpop added a subscriber: sebpop.

This patch has been committed as https://reviews.llvm.org/rL218375

This revision is now accepted and ready to land.Oct 19 2016, 9:25 AM
sebpop closed this revision.Oct 19 2016, 9:25 AM