This is an archive of the discontinued LLVM Phabricator instance.

Prototype fix for replacing the SETcc/MOVZX idiom with XOR/SETcc for a 32-bit conditional set of 0,1
AbandonedPublic

Authored by DavidKreitzer on Jun 17 2016, 1:38 PM.

Details

Reviewers
None
Summary

I am just posting this for your reference. This was my initial attempt to fix the SETcc problem back in February. As you can see, the changes weren't all that extensive. The basic approach was to define new SETCC pseudo-opcodes that return 32-bit/64-bit values along with patterns that cause a MOV32r0/SETCC sequence to be generated. The pseudo-ops then get lowered after RA to the "real" SETCC opcodes.

The approach seems sound to me, and it fixed the performance kernel that I wrote. But there are a few things that I don't like.

(1) The opcode explosion is kind of gross: we should really have a condition field rather than building the condition into the opcode.
(2) I couldn't figure out a good way to handle the different register constraints in 32-bit vs. 64-bit mode, so I created separate opcodes for them.

During testing, I got lots of failures in our test suites and never got around to debugging. It could be something obvious/trivial.

So, feel free to take this code and run with it or just give me feedback on the basic approach.

Diff Detail

Event Timeline

DavidKreitzer retitled this revision from to Prototype fix for replacing the SETcc/MOVZX idiom with XOR/SETcc for a 32-bit conditional set of 0,1.
DavidKreitzer updated this object.
DavidKreitzer added a reviewer: mkuper.
mkuper edited edge metadata.Jun 17 2016, 1:49 PM

Hi Dave,

I need to look at this more carefully (I'm not entirely sure it works as intended), but regardless, I'm not a huge fan of having two different opcodes, one for SETCC with a source, and one for SETCC without a source. I mean, imagine we were designing this from scratch. I don't think we'd define two opcodes.

What I've wanted to do - and started doing last year - was add a source dependency to the regular SETCC opcode, and feeding it an IMPLICIT_DEF value when it doesn't matter. The problem is that would require touching every piece of code that creates a SETCC, and probably a lot of places that match on it, hence the extensive changes I was referring to.

Thanks for the quick response, Michael!

I don't think it's unreasonable to have two opcodes - the existing opcode is an 8-bit definition. The new opcode is a 32-bit definition. It is not much different than having two opcodes for addb and addl. The fact that the 32-bit form takes a source operands and the 8-bit form doesn't is just an artifact of the architecture.

FWIW, this is how we implemented it in the Intel Compiler (i.e. two opcodes). Of course, in the Intel Compiler, we don't have the same opcode explosion problem, because the condition is separate from the opcode. So it's really just 2 opcodes, not 2xN opcodes.

I would not be surprised at all if the changes do not work as intended, especially the pattern specifications. I understand that code better than I did 6 months ago, but I'm still pretty new to it.

I don't think the add analogy really holds. addb and addl really are two different opcodes in the ISA, with different semantics. And that also means that they are different all the way down to the MC layer.

What really worries me functionally speaking isn't the patterns - I'm not an expert on those either :-). It's whether after ExpandPostRA you're guaranteed the xor and the (now regular) setcc will actually stay where they need to be w.r.t each other, without the setcc depending on the output of the xor.
Are you sure this is true? And I don't mean in the sense that it works now, but as a general invariant? I'm not saying this isn't true, we may in fact already be relying on this, I just don't know.

This is really main reason why I would really prefer having a single opcode - that guarantees that we'll have an explicit dependency all the way through. (Or we concievably could have two different opcodes all the way through, but, yikes).

Your question is a good one, and I don't know the answer. Who knows? That might be the cause of all the failures I was getting.

So what you are basically proposing is to have a single opcode that produces a 32-bit result, right? I can buy into that proposal. But yes, that will require much more extensive changes, I expect.

After thinking about this a bit more, what I wrote before is probably nonsense, and you're likely right.

I mean, on the DAG level, you really have to have the explicit dependence. But once you're on the MI level, any instruction that clobbers the low GR8 has to have an implicit dependency on the GR32, otherwise nothing would work right. (E.g. MOV8ri is also modeled as only having a GR8 out operand, but no sources). I'll need to check how that really works.

Anyway, I'm still not a fan of having separate opcodes, but the objections I have right now are much weaker than "it won't work". :-)

Ok, I'm an idiot.

The dependence we have on the MI level is between the xor and the eventual user of the GR32 (with the setcc clobbering the low GR8).
The fact the setcc doesn't depend on the xor doesn't matter at the MI stage.

mkuper resigned from this revision.Sep 12 2016, 2:39 PM
mkuper removed a reviewer: mkuper.

Just clearing this out of my review queue. :-)

DavidKreitzer abandoned this revision.Jan 19 2017, 5:38 AM