This is an archive of the discontinued LLVM Phabricator instance.

[X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables.
ClosedPublic

Authored by aymanmus on Apr 30 2017, 7:33 AM.

Details

Summary

X86 backend holds huge tables in order to map between the register and memory forms of each instruction.
This TableGen Backend automatically generated all these tables with the appropriate flags for each entry.

Diff Detail

Repository
rL LLVM

Event Timeline

aymanmus created this revision.Apr 30 2017, 7:33 AM
RKSimon added a subscriber: RKSimon.
RKSimon added inline comments.
test/CodeGen/X86/stack-folding-fp-avx1.ll
1654 ↗(On Diff #97233)

Please don't delete tests, especially if they've regressed - show the new codegen and add a todo comment

RKSimon edited edge metadata.May 1 2017, 6:15 AM

A few style comments

utils/TableGen/X86FoldTablesEmitter.cpp
69 ↗(On Diff #97233)

Is there a better way to avoid having to strip the trailing " | "? If not, at least comment what this is doing.

265 ↗(On Diff #97233)

Use llvm::any_of ?

275 ↗(On Diff #97233)

Use llvm::any_of ?

329 ↗(On Diff #97233)
if (IntrinsicSensitive &&
    (MemRec->getName() == "sdmem" || MemRec->getName() == "ssmem")) 
  return 128;
360 ↗(On Diff #97233)

llvm::any_of

397 ↗(On Diff #97233)
return (Op->getName().find("_NOREX") != std::string::npos);
craig.topper added inline comments.May 1 2017, 7:55 PM
utils/TableGen/X86FoldTablesEmitter.cpp
69 ↗(On Diff #97233)

Couldn't we just always emit a 0 at the end? We do this in other generated tables already.

110 ↗(On Diff #97233)

Can these be StringRef instead of std::string?

118 ↗(On Diff #97233)

Why does this need to be vector? Why not a regular array or std::array?

215 ↗(On Diff #97233)

This is a string copy. Did you omit an '&'?

craig.topper added inline comments.May 1 2017, 7:55 PM
utils/TableGen/X86FoldTablesEmitter.cpp
223 ↗(On Diff #97233)

string copy

234 ↗(On Diff #97233)

Don't use 'else' after 'return' per coding standards.

372 ↗(On Diff #97233)

Can we extract the X86Local namespace from X86RecognizableInstr.cpp and do this by encodings rather than by string match names?

397 ↗(On Diff #97233)

Isn't Op->getName() returning a StringRef? Shouldn't that be String::npos?

654 ↗(On Diff #97233)

Again I'd like to see if we can use enum encodings from X86Local here.

691 ↗(On Diff #97233)

Why is this uint16_t? Isn't opcode only 8-bits?

720 ↗(On Diff #97233)

You use uint64_t for Opc here, but uint16_t on line 729. I think really it should be uint8_t for both right?

aymanmus marked 15 inline comments as done.May 3 2017, 5:12 AM
aymanmus added inline comments.
test/CodeGen/X86/stack-folding-fp-avx1.ll
1654 ↗(On Diff #97233)

The test checks that specific instructions are memory folded.
In the removed test cases, the instructions were removed from the memory folding tables so testing that these instructions are NOT folded in this test, is not the best idea.

utils/TableGen/X86FoldTablesEmitter.cpp
69 ↗(On Diff #97233)

Thanks for the suggestion.

118 ↗(On Diff #97233)

Defining an array as a class member requires stating it's size (ManualMapSet[18]), while std::vector doesn't. I thought this table might be dynamic and changed from time to time, so this way can be more friendly for future tuning.
What do you think?

372 ↗(On Diff #97233)

This is very helpful, thanks for the suggestion.

aymanmus updated this revision to Diff 97610.May 3 2017, 5:13 AM
aymanmus marked an inline comment as done.
craig.topper added inline comments.May 3 2017, 8:37 AM
utils/TableGen/X86FoldTablesEmitter.cpp
109 ↗(On Diff #97610)

Fix indentation.

118 ↗(On Diff #97233)

The number is required even if you put the array contents with it?

215 ↗(On Diff #97233)

Should those be String::npos too?

aymanmus marked 2 inline comments as done.May 4 2017, 12:34 AM
aymanmus added inline comments.
utils/TableGen/X86FoldTablesEmitter.cpp
118 ↗(On Diff #97233)

Yes, ManualMapSet[] =... would result in defining 0 sized array, and the compiler would give the following messages:

  • warning: ISO C++ forbids zero-size array ManualMapSet
  • error: too many initializers for const {anonymous}::X86FoldTablesEmitter::ManualMapEntry [0]
aymanmus updated this revision to Diff 97784.May 4 2017, 12:35 AM
RKSimon added inline comments.May 4 2017, 3:37 AM
test/CodeGen/X86/stack-folding-fp-avx1.ll
1654 ↗(On Diff #97233)

OK - so just leave TODO comments for the 4 cases:

; TODO stack_fold_sqrtsd
; TODO stack_fold_sqrtsd_int
; TODO stack_fold_sqrtss
; TODO stack_fold_sqrtss_int
aymanmus added inline comments.May 4 2017, 5:31 AM
test/CodeGen/X86/stack-folding-fp-avx1.ll
1654 ↗(On Diff #97233)

Sorry if i'm being too picky about this, but I cannot understand why to add TODOs for these test cases. what should be done with them later on? what would it indicate?

RKSimon added inline comments.May 4 2017, 8:03 AM
test/CodeGen/X86/stack-folding-fp-avx1.ll
1654 ↗(On Diff #97233)

The SSE scalar math memory folding has always been tricky - I had wondered if these cases could be handled by X86InstrInfo::foldMemoryOperandCustom? The _Int variants will probably have to be treated quite differently but it'd be great to get pass-through of the upper bits from the other operand being properly supported some day.

I notice that you didn't have to alter the SSE versions, nor the rsqrt/rcp equivalents, so in the medium term I think we can get this fixed, which is why I say we keep the TODOs there to help keep track. You can strip the actual test cases, just leave the 4 TODO comments.

filcab added a subscriber: filcab.May 4 2017, 9:29 AM

I'd prefer to have the tables, string tables, IsMatch, etc, outside of the X86FoldTablesEmitter.
Most (all?) of them don't really anything from there, and it would make the class smaller and easier to understand its workings (the tables and the small utility functions are all self-contained).

test/CodeGen/X86/stack-folding-fp-avx1.ll
1654 ↗(On Diff #97233)

Adding TODO tests and comments allows anyone reading the source code/tests to notice that there's an optimization that should be done but isn't being done.
It's especially helpful in cases like this. From your comments (and other tests in this diff), it looks like we're not folding some sqrts? instructions after this patch.
We should keep these tests and add a TODO mentioning that folding this instructions is not easy to do with tables and they should be folded elsewhere.

utils/TableGen/X86FoldTablesEmitter.cpp
48 ↗(On Diff #97784)

Please specify the default initialization value in the declaration of the member variables.

88 ↗(On Diff #97784)

Lose the typedef, just enum UnfoldStrategy { ... };.

105 ↗(On Diff #97784)

This isn't needed if you pull the IsMatch class outside of this one.

108 ↗(On Diff #97784)

No need to specify the size of the array.

I know Craig mentioned StringRef (which is strictly better than string), but why not a simple const char *? For ExplicitUnalign too. These are only used as parameters to StringRef::find().

113 ↗(On Diff #97784)

Array size.

160 ↗(On Diff #97784)

D28744 was abandoned, and this comment made me think that we were doing something special when optimizing for size, which doesn't seem to be the case, after reading the patch.
Is this a "this should be done" comment?

204 ↗(On Diff #97784)

Nit: I'd go with either just S or a full Strategy

248 ↗(On Diff #97784)
268 ↗(On Diff #97784)

Same comment as above?
(Remove inline from below, too)

277 ↗(On Diff #97784)

Why end the anonymous namespace here if you end up making a bunch of the following functions static inline?

281 ↗(On Diff #97784)

assert(B->getNumBits() <= sizeof(uint64_t)*8);

297 ↗(On Diff #97784)

You should probably use cast<BitInit>.... It doesn't seem like we should expect getBit to return something that's not a BitInit.

310 ↗(On Diff #97784)

You should probably comment that something ensures size == alignment.

436 ↗(On Diff #97784)

You already know the index before calling addEntryWithFlags. Maybe just pass the index along?

634 ↗(On Diff #97784)

Please run clang-format on the file before committing.

714 ↗(On Diff #97784)

Maybe this?

auto FoundRegInst = RegInsts.find(Opc);
if (FoundRegInst == RegInsts.end())
  continue;
std::vector<const CodeGenInstruction *> &OpcRegInsts = FoundRegInst->second;
728 ↗(On Diff #97784)

Should we also do a OpcRegInsts.erase(Match); here?

735 ↗(On Diff #97784)

Nit: No need for this empty line

118 ↗(On Diff #97233)

Why? Not specifying the first (outermost) dimension is acceptable.
Example: https://godbolt.org/g/HTnCIh
Did I miss something?

aymanmus updated this revision to Diff 98155.May 8 2017, 5:17 AM
aymanmus marked 14 inline comments as done.
aymanmus added inline comments.
test/CodeGen/X86/stack-folding-fp-avx1.ll
1654 ↗(On Diff #97233)

Fair enough. Thanks for the detailed answer.

utils/TableGen/X86FoldTablesEmitter.cpp
160 ↗(On Diff #97784)

Yes, some of the listed instructions have DAG patterns for memory folding which are marked with OptForSize predicate. A more "sophisticated" dealing should be added to VEX and EVEX instructions (which is detailed in the abandoned review comments).

310 ↗(On Diff #97784)

Thanks for noticing, I planned to change that and somehow forgot. Updated with better approach.

714 ↗(On Diff #97784)

The previous check:

if (RegInsts.count(Opc) == 0)
      continue;

makes sure RegInsts has an element with the required key.

118 ↗(On Diff #97233)

In your example you defined the array to be global, in this case it works.
It doesn't work when the array is defined as a class member.
Thanks to your previous comment I moved this table outside the class and updated it accordingly.

Where did the rest of the diff go?

aymanmus updated this revision to Diff 98171.May 8 2017, 8:23 AM

Correcting last update (uploaded wrong patch).
@RKSimon - Thanks

craig.topper edited edge metadata.May 9 2017, 9:44 AM

Can we try not to define so many of the methods inside the class itself? It makes for a very large class and an extra indentation level. I'm thinking if you do that most of the static methods in the class can just be static methods at file scope.

utils/TableGen/X86FoldTablesEmitter.cpp
213 ↗(On Diff #98171)

This method should be static now right?

235 ↗(On Diff #98171)

shouldn't this be static?

243 ↗(On Diff #98171)

Shouldn't this be static?

aymanmus updated this revision to Diff 98594.May 11 2017, 1:28 AM
aymanmus marked 3 inline comments as done.
RKSimon added inline comments.May 11 2017, 2:47 PM
utils/TableGen/X86FoldTablesEmitter.cpp
134 ↗(On Diff #98594)

Convert to any_of ?

142 ↗(On Diff #98594)

Convert to any_of ?

455 ↗(On Diff #98594)
for (unsigned i = 0, e = MemInst->Operands.size(); i < e; i++) {
598 ↗(On Diff #98594)
for (unsigned i = RegOutSize, e = RegInstr->Operands.size(); i < e; i++) {
craig.topper added inline comments.May 12 2017, 11:15 AM
utils/TableGen/X86FoldTablesEmitter.cpp
76 ↗(On Diff #98594)

Can this be an array now instead of a vector?

172 ↗(On Diff #98594)

Do we need the temporary string? Can we just stream as we go?

228 ↗(On Diff #98594)

This is a copy, can we just do

for (const X86FoldTableEntry &E : Table)

350 ↗(On Diff #98594)

Should this be X86RecognizableInstr.h?

562 ↗(On Diff #98594)

Can we use the OpEncBits field and compare against the encoding from X86Local?

588 ↗(On Diff #98594)

Is the best way we have to detect these instructions?

RKSimon added inline comments.May 13 2017, 8:11 AM
utils/TableGen/X86FoldTablesEmitter.cpp
257 ↗(On Diff #98594)

Shouldn't this be dyn_cast if you want it to fail gracefully? Otherwise possibly replace that PrintFatalError with an assertion at the start of the loop?

268 ↗(On Diff #98594)

Are we better off with an assertion?

271 ↗(On Diff #98594)

Again, use dyn_cast and assertions?

aymanmus updated this revision to Diff 98911.May 14 2017, 1:03 AM
aymanmus marked 6 inline comments as done.
aymanmus added inline comments.
utils/TableGen/X86FoldTablesEmitter.cpp
76 ↗(On Diff #98594)

We only want to search this list for a match, so I think defining a vector and using the llvm::find function is more suitable.

257 ↗(On Diff #98594)

It was a dynamic cast, but then @filcab noted that we can't expect getBit() to return something that is not a BitInit.
I personally agree with him.

268 ↗(On Diff #98594)

In this context I think this behavior is more suitable. Here we always try to compare the same fields of two different records, where the types of the fields must be the same.
Trying to compare two BitsInit with different sizes (thus different types) should alert that something wrong is happening.

588 ↗(On Diff #98594)

I think it is.

RKSimon added inline comments.May 19 2017, 6:54 AM
utils/TableGen/X86FoldTablesEmitter.cpp
257 ↗(On Diff #98594)

OK - but those cast<BitInit> can't fail so the if()'s should be removed

aymanmus updated this revision to Diff 99690.May 21 2017, 4:25 AM
aymanmus marked an inline comment as done.

Ping. (For dependent reviews also D32683, D32685)

This revision is now accepted and ready to land.May 26 2017, 9:31 AM
This revision was automatically updated to reflect the committed changes.

Folks, I know this has gone back and forth a few times, but I'm afraid I have to revert it again.

Ultimately, this patch does two things at once, and I think that caught a lot of people by surprise:

  1. It changes *how* the fold tables are built to use tablegen
  2. It changes *what* is in the fold tables

#1 should be a no-op change in terms of behavior. But #2 is a really big behavior change that has already caused us to track down three really painful miscompiles due to inappropriate folding. Essentially, the new tables aren't correct yet, and the fact that this new functionality came all at once and coupled with a more benign refactoring makes it hard to cope with.

I also have some design concerns about the mechanisms used here. The only way to fix these is to grow an ever larger table of instructions in the C++ source code for tablegen, which is (IMO) even worse than a big table of instructions in the backend. No one will even think to look here to find out that they should add instructions to this. There is no checking that you have named an instruction correctly. And it isn't even clear what the correct name is in some cases.

All of this is also being done without properly addressing the glaring TODOs in the memory folding tests in the X86 backend. For example, in test/CodeGen/X86/stack-folding-int-sse42.ll we have ; TODO: stack_fold_pextrw. This means we don't currently have any coverage of this instruction when doing folding. I think we need to have this kind of coverage before attempting to do #2, otherwise, we will run into bugs.

As it happens, I just debugged exactly this instruction. If I copy and paste the pextrd test like so:

define i16 @stack_fold_pextrw(<8 x i16> %a0) {
  ;CHECK-LABEL: stack_fold_pextrw
  ;CHECK:       pextrw $1, {{%xmm[0-9][0-9]*}}, {{-?[0-9]*}}(%rsp) {{.*#+}} 4-byte Folded Spill
  ;CHECK:       movl    {{-?[0-9]*}}(%rsp), %eax {{.*#+}} 4-byte Reload
  ; add forces execution domain
  %1 = add <8 x i16> %a0, <i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 7, i16 8>
  %2 = extractelement <8 x i16> %1, i32 1
  %3 = tail call <2 x i64> asm sideeffect "nop", "=x,~{rax},~{rbx},~{rcx},~{rdx},~{rsi},~{rdi},~{rbp},~{r8},~{r9},~{r10},~{r11},~{r12},~{r13},~{r14},~{r15}"()
  ret i16 %2
}

It passes. Note that we do pextrw to a memory location and then reload it with movl. This will load garbage from the high 16 bits! We shouldn't be folding a 32-bit memory access into the memory operand of pextrw, but I can't even find where the pattern is that defines this lowering...

I could just blindly guess based on knowing how instructions are encoded in the X86 backend and probably disable folding for it, but it would be the third such disabling that I'm aware of. We can't just play whack-a-mole here.

So my concrete suggestions are:

  1. Redesign how the folding works so that it is enabled or disabled by the instruction pattern itself rather than by a table in this file with string literals in it. Or at least by something else in the X86 tablegen files rather than in this C++ code.
  2. Add the new tablegen emitter then, with just the flags in instruction patterns that generate the exact same table as was hard coded before.
  3. Audit the outstanding TODOs (I'll submit the above test case though as soon as I finish reverting) and fill in the obvious ones so we have better coverage of folding.
  4. Then start incrementally enabling more things with commensurate test updates as you go.

Does that make sense?

It'll probably take me an hour just to construct the revert, so shout if folks are really vehemently opposed to this. But it seems better for me to revert back to green than try to guess at how to stamp out yet another miscompile derived from this patch. =[

Reverted in r304762 and added another test case in r304763.

See the revert commit log, but I've carefully tried to preserve the added testing which shouldn't change when this comes back in, and I've also preserved as much of the cleanups to other code that also happened to touch this file.

Let me know if I can help in any way with getting this patch re-configured and back in place!

lsaba added a subscriber: lsaba.Jun 6 2017, 2:19 AM

Hi @chandlerc,
Sorry for the mess caused by this patch. It indeed included many changes and a big behavior changing.
Thanks for the gentle handling in reverting and adding test for future use.
We will reconsider our approach of solving this issue taking into account the currently known and other possible (yet not known) exceptions.