This is an archive of the discontinued LLVM Phabricator instance.

[X86] Correct the placement of ReadAfterLd in BEXTR and BZHI
ClosedPublic

Authored by craig.topper on Mar 23 2018, 11:28 AM.

Details

Summary

These instructions have the memory operand before the register operand. So we need to put ReadDefault for all the load ops first. Then the ReadAfterLd

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision.Mar 23 2018, 11:28 AM

Hi Craig.

The change looks good. The ReadAdvance is now applied to the right operand.

However, Haswell model should be fixed too.
At the moment, an InstrRW definition is overriding the SchedRW list for BZHI(32|64)rm. So, we don't get the ReadAdvance value at all.
You can see this if you look at scheduling class id "BZHI32rm_BZHI64rm" (which should be index #1246) in X86GenSubtargetInfo.inc. That scheduling class declares 0 read-advance entries.

Same for BEXTR (see the InstrRW at line 1609 in X86SchedHaswell.td). I didn't check other Intel models. However I expect that the same issue affects all intel processors for which the model extends haswell model.

If you test the following code in llvm-mca (-mcpu=haswell -iterations=1 -timeline):

add    %edi, %esi
bzhil   %esi, (%rdi), %eax

You get the following output:

Timeline view:

Index   0123456789

[0,0]   DeER .   .      addl    %edi, %esi
[0,1]   D=eeeeeeER      bzhil   %esi, (%rdi), %eax
         ^

The bzhil should be able to start immediately at cycle 1. However, it waits until the addl reaches the write-back stage.
The right timeline should have been:

Timeline view:

Index   0123456789

[0,0]   DeER .   .      addl    %edi, %esi
[0,1]   DeeeeeeER.      bzhil   %esi, (%rdi), %eax

You can see the same problem if you replace bzhil with bextrl.

andreadb added inline comments.Mar 29 2018, 8:54 AM
lib/Target/X86/X86ScheduleZnver1.td
573 ↗(On Diff #139622)

You can just enumerate opcodes using (instrs BEXTR32rm, BEXTR64rm).

580 ↗(On Diff #139622)

Same. You can use (instrs BZHI32rm, BZHI64rm).

So basically the Intel per instruction overrides have broken every instruction with a folded load because we lost the ReadAfterLd on all of them?

For example
[0,0] DeER . . addl %edi, %esi
[0,1] D=eeeeeeER addl (%rdi), %esi

Add dedicated SchedRW for BZHI and BEXTR and remove all of the regexes.

So basically the Intel per instruction overrides have broken every instruction with a folded load because we lost the ReadAfterLd on all of them?

For example
[0,0] DeER . . addl %edi, %esi
[0,1] D=eeeeeeER addl (%rdi), %esi

Looks like it... Sounds scary.

I just tried that same code snippet.
For btver2 I get the expected result:

Index   0123456

[0,0]   DeER .. addl    %edi, %esi
[0,1]   DeeeeER addl    (%rdi), %esi
`

Unfortunately I don't get the correct ReadAdvance on Haswell (I get the same output that you posted).

On Haswell, the SchedClassID for the add with the folded memory operand is #912.
According to X86GenSubtargetInfo.inc

{DBGFIELD("ADD16rm_ADD32rm_ADD64rm_ADD8rm_AND16rm_AND32rm_AND64rm_AND8rm_CMP16mi_CMP16mi8_CMP32mi_CMP32mi8_CMP64mi32_CMP64mi8_CMP8mi_CMP8mi8_CMP16mr_CMP32mr_CMP64mr_CMP8mr_CMP16rm_CMP32rm_CMP64rm_CMP8rm_OR16rm_OR32rm_OR64rm_OR8rm_SUB16rm_SUB32rm_SUB64rm_SUB8rm_TEST16mr_TEST32mr_TEST64mr_TEST8mr_TEST16mi_TEST32mi_TEST64mi32_TEST8mi_XOR16rm_XOR32rm_XOR64rm_XOR8rm") 2, false, false, 12, 4,  3, 1,  0, 0}, // #912

The last two numbers are respectively the "ReadAdvanceIdx", and the number of read-advance entries. Both are zero for Haswell, which means we lost that information.
Basically, all the instructions that use HWWriteResGroup18 had lost the ReadAdvance info.

andreadb accepted this revision.Mar 29 2018, 1:27 PM

About the bzhi/bextr change:
Your new patch looks good to me.

I didn't check the new timeline with llvm-mca for the bzhi/bextr. However I trust you that the read-advance is now there.

Thanks,
-Andrea

This revision is now accepted and ready to land.Mar 29 2018, 1:27 PM
This revision was automatically updated to reflect the committed changes.