This is an archive of the discontinued LLVM Phabricator instance.

A bit-tracking DCE Pass
ClosedPublic

Authored by hfinkel on Feb 10 2015, 3:48 AM.

Details

Summary

This patch introduces a bit-tracking dead code elimination pass, it is based on ADCE (the "aggressive DCE" pass), with the added capability to track dead bits of integer valued instructions and remove those instructions when all of the bits are dead.

Currently, it does not actually do this all-bits-dead removal, but rather replaces the instruction's uses with a constant zero, and lets instcombine (and the later run of ADCE) do the rest. Because we essentially get a run of ADCE "for free" while tracking the dead bits, we also do what ADCE does and removes actually-dead instructions as well.

The motivation for this is a case like:

int __attribute__((const)) foo(int i);
int bar(int x) {
  x |= (4 & foo(5));
  x |= (8 & foo(3));
  x |= (16 & foo(2));
  x |= (32 & foo(1));
  x |= (64 & foo(0));
  x |= (128& foo(4));
  return x >> 4;
}

As it turns out, if you order the bit-field insertions so that all of the dead ones come last, then instcombine will remove them. However, if you pick some other order (such as the one above), the fact that some of the calls to foo() are useless is not locally obvious, and we don't remove them. I could not think of any current pass that "should" remove them, and so I created this one.

It is also possible to do other things with this pass, like "shrink" constant operands to or/and/xor when not all of the bits are required (this might not always be profitable, and we'd probably need to use TTI to make sure we were actually making the constant cheaper).

I did a quick compile-time overhead check using sqlite from the test suite (Release+Asserts). BDCE took ~0.4% of the compilation time (making it about twice as expensive as ADCE).

I've not looked at why yet, but we eliminate instructions due to having all-dead bits in:
External/SPEC/CFP2006/447.dealII/447.dealII
External/SPEC/CINT2006/400.perlbench/400.perlbench
External/SPEC/CINT2006/403.gcc/403.gcc
MultiSource/Applications/ClamAV/clamscan
MultiSource/Benchmarks/7zip/7zip-benchmark

P.S. I hacked this together in one day; criticism is expected. ;) [and more regression tests are certainly needed].

Diff Detail

Event Timeline

hfinkel updated this revision to Diff 19660.Feb 10 2015, 3:48 AM
hfinkel retitled this revision from to A bit-tracking DCE Pass.
hfinkel updated this object.
hfinkel edited the test plan for this revision. (Show Details)
hfinkel added a subscriber: Unknown Object (MLST).
sanjoy edited edge metadata.Feb 10 2015, 12:29 PM

Quick first pass comments inline. I'm okay with this landing in tree with the comments addressed in some way or another, but I think I'll wait for someone else to give an explicit LGTM.

lib/Transforms/Scalar/BDCE.cpp
97

But if you're not the one removing the side-effecting instructions, they should not be removed anyway, right, even if their result is unused? In fact, I think it will be desirable to rewrite the result of foo() to be 0 if possible, even if foo() is a side-effecting call that cannot be removed.

143

[Minor] This is just my opinion, and does not stem form the LLVM style guide; but it'd be easier to tell where the for loop ended if you put braces around the body.

150

I don't think this is correct for side-effecting instructions.

214

Shouldn't this be AOut & ...?

267

Again, I think this should be AOut & ...

304

Am I right in understanding that this will delete %z in the
following case?

;; all i8
%z = add %x, %y
%z0 = and 0x0f, %z
%z1 = and 0xf0, %z0
side_effect(%z0)

Since you'll never visit %z as AliveBits[%z] will be 0?

In any case, a one-liner on how this is better than trivial DCE will be helpful.

majnemer added inline comments.Feb 10 2015, 12:53 PM
lib/Transforms/Scalar/BDCE.cpp
145

Pointer on the right hand side.

hfinkel added inline comments.Feb 10 2015, 3:19 PM
lib/Transforms/Scalar/BDCE.cpp
97

I had been using the setting of all bits to be live as the way to communicate that they'd not be removed.

You're right, however, that even if the call can't be removed, having an unnecessary dependency on the output is also not optimal. I'll change this.

143

I'll refactor this, I think.

150

Good point. Currently, all such instructions have all of their alive bits set, but if I change that, I'll need to change this too.

214

Heh, yes.

267

Yes.

304

Since you'll never visit %z as AliveBits[%z] will be 0?

No, because we set all of its alive bits in the first loop (as with all of the side-effecting instructions).

hfinkel added inline comments.Feb 10 2015, 6:09 PM
lib/Transforms/Scalar/BDCE.cpp
145

I don't understand. The RHS has OI, which is an Instruction::op_iterator, which is a Use*.

hfinkel updated this revision to Diff 19725.Feb 10 2015, 6:15 PM
hfinkel added a reviewer: jmolloy.

Updated to reflect Sanjoy's suggestions. We can now remove the dependence on a dead function return value even if we can't remove the function itself. I've added a test case for that.

Also, I chatted earlier with James, who suggested looking at how this relates to the implementation of InstCombiner::SimplifyDemandedUseBits. I did not see any obvious way of refactoring the logic there so that it could be reused here, but looking at the implementation did point out that I needed a few more lines of code to correctly handle nsw/nuw/exact flags on the shift operators. That's now been added too.

hfinkel updated this revision to Diff 19741.Feb 11 2015, 3:22 AM

Added a few more regression tests.

chandlerc edited edge metadata.Feb 11 2015, 8:00 AM

FYI, my review is in progress.

majnemer added inline comments.Feb 12 2015, 10:05 PM
lib/Transforms/Scalar/BDCE.cpp
145

I was referring to your Instruction* I, I think it should be Instruction *I.

hfinkel added inline comments.Feb 12 2015, 10:09 PM
lib/Transforms/Scalar/BDCE.cpp
145

Agreed. (I think this was one of the copy-and-paste lines from ADCE; I'll need to remember to clean this up in both places).

reames edited edge metadata.Feb 13 2015, 10:31 AM

A few comments inline. As with Sanjoy, happy to see this evolve in tree.

lib/Transforms/Scalar/BDCE.cpp
95

You could you inst_range and a range for loop here

105

I don't get this line. Isn't the worklist of instructions, not of iterators?

111

Shouldn't you be adding I to Visited here?

143

This needs broken up into helper functions. It's hard to follow as is.

159

Having this switch pulled out into it's own function would help readability a lot.

326

Having this up near the top of the loop with an early might be more readable.

348

range-for?

368

Given we're not invalidating iterators here, can't these two loops be merged?

Also, why not do this step first and then delete the dead instruction in the previous loop?

Also, if the bits are dead, why not use undef?

FYI: A number of the review comments here apply to code copied from the ADCE pass (formatting, use of range-based fors, etc.). I've committed those changed to the ADCE pass (r229315 - r229319). I'll also incorporate those changes into the next version of this patch.

jmolloy edited edge metadata.Feb 15 2015, 8:44 AM

Hi Hal,

I'm looking at using very similar information in a pass I'm writing (live bits information). Would it be easy enough to move the AliveBits stuff into a utility class so it can be called from elsewhere?

Cheers,

James

Hi Hal,

I'm looking at using very similar information in a pass I'm writing (live bits information). Would it be easy enough to move the AliveBits stuff into a utility class so it can be called from elsewhere?

Yes, I think we could separate the analysis part from the part that removes the completely-dead parts. If you can tell me, what are you working on?

Cheers,

James

hfinkel added inline comments.Feb 15 2015, 3:39 PM
lib/Transforms/Scalar/BDCE.cpp
95

Good point, will do.

105

This is left over from ADCE, which I've now fixed there. The iterators are convertable to Instruction*s. It is, however, unnecessary (doubly so after converting to a range-based for loop).

111

No need, we need to check isAlwaysLive on the instructions again anyway to determine whether or not to delete them. I've added a comment.

159

Fair enough; I'll pull this out.

326

Okay.

348

Yes.

368

We could do that (although, maybe we should also remove the all-dead-bits instructions).

Regarding undef, I considered that, but I'm somewhat afraid of that given our discussions recently regarding poison and undef.

hfinkel updated this revision to Diff 19989.Feb 15 2015, 3:47 PM
hfinkel edited edge metadata.

Updated in response to review comments (using range-based fors, adjusting other formatting, refactoring, fusing ending loops, etc.).

[Still all one pass, analysis has not been split out into a separate pass yet].

Hi Hal,

I'm looking at using very similar information in a pass I'm writing (live bits information). Would it be easy enough to move the AliveBits stuff into a utility class so it can be called from elsewhere?

Yes, I think we could separate the analysis part from the part that removes the completely-dead parts. If you can tell me, what are you working on?

Upon further reflection, I'm not sure what the right refactoring for this would be. It could be an analysis pass, but it could also be a utility class exported from here. An actual analysis pass may not be particularly useful here because I don't think it would be likely to be preserved by any non-trivial transformation. I don't think there is any real way to "locally repair" the analysis after making a change (you could update it to get conservative answer, but because you never remove live bits under iteration, you'd not get the same answer as you might running the analysis from the beginning).

As a result, we might want to get this functionality in-tree, and then refactor it there.

Cheers,

James

chandlerc accepted this revision.Feb 15 2015, 10:04 PM
chandlerc edited edge metadata.

Lots of nit-picky comments. Generally, this looks really nice. Have you profiled it? Have you tried to synthesize extreme test cases with very large integers to make sure this doesn't become *too* expensive?

I'm generally fine just landing this and sorting out these issues after the fact. Please commit, and sorry for the delay reviewing it.

lib/Transforms/Scalar/BDCE.cpp
45

clang-format doesn't indent things in an anonymous namespace, and I prefer that.

46

clang-format won't indent this, and that's my preference as well.

74

Accept 'Instruction &'?

94

Why accept const inputs just to cast it way?

96

'*I'?

97–99

This behavior doesn't really make sense. Maybe add a comment to the lambda to clarify what V1 and V2 are and why they would work this way?

258–259

Just dig it out of the module, simpler than using the analysis layer.

278–279

continue after this?

341

std::move?

Probably a few other places you might want to move here so that large APInt values don't trigger too many copies of a heap allocated set of bits.

361–363

Invert the conditions, use continue, and un-indent this?

365–367

Why not replace with undef?

380–382

Why not just erase the instructions? You can use the two-level iterators and pre-increment the iterator to avoid invalidation...

test/Transforms/BDCE/dce-pure.ll
6

The indentation (and weird comments) make my OCD-tendencies twitch.

This revision is now accepted and ready to land.Feb 15 2015, 10:04 PM

Lots of nit-picky comments. Generally, this looks really nice. Have you profiled it? Have you tried to synthesize extreme test cases with very large integers to make sure this doesn't become *too* expensive?

I've not profiled it in detail; I've checked compile-time overhead on a few key test cases (like sqlite), and saw nothing significant.

I'm generally fine just landing this and sorting out these issues after the fact. Please commit, and sorry for the delay reviewing it.

lib/Transforms/Scalar/BDCE.cpp
45

Will do (corresponding change made to ADCE in r229412).

94

To keep the casts localized to one place.

97–99

Will do. The general idea is that, we don't want to repeat the known-bits calculation for both operands at each operand.

258–259

Will do.

278–279

Sure.

341

Good point.

365–367

That's probably the right thing to do. However, given some of the discussion on poison/undef semantics, I'm not sure I want to do that. I've tried to stay away from cases here where the instructions themselves are trivial (like icmp sgt x, INT_MAX), but we *could* handle them here, and I'd like to test that change separately.

380–382

For the same reason that ADCE doesn't. The instructions themselves might not be trivially dead (there might be cyclic dependencies on other instructions that we'll visit here later, and the entire cycle will be removed). That's why this is done in two stages.

hfinkel closed this revision.Feb 16 2015, 5:39 PM

r229462; thanks everyone!