These instructions have the memory operand before the register operand. So we need to put ReadDefault for all the load ops first. Then the ReadAfterLd
Details
Diff Detail
Event Timeline
Hi Craig.
The change looks good. The ReadAdvance is now applied to the right operand.
However, Haswell model should be fixed too.
At the moment, an InstrRW definition is overriding the SchedRW list for BZHI(32|64)rm. So, we don't get the ReadAdvance value at all.
You can see this if you look at scheduling class id "BZHI32rm_BZHI64rm" (which should be index #1246) in X86GenSubtargetInfo.inc. That scheduling class declares 0 read-advance entries.
Same for BEXTR (see the InstrRW at line 1609 in X86SchedHaswell.td). I didn't check other Intel models. However I expect that the same issue affects all intel processors for which the model extends haswell model.
If you test the following code in llvm-mca (-mcpu=haswell -iterations=1 -timeline):
add %edi, %esi bzhil %esi, (%rdi), %eax
You get the following output:
Timeline view: Index 0123456789 [0,0] DeER . . addl %edi, %esi [0,1] D=eeeeeeER bzhil %esi, (%rdi), %eax ^
The bzhil should be able to start immediately at cycle 1. However, it waits until the addl reaches the write-back stage.
The right timeline should have been:
Timeline view: Index 0123456789 [0,0] DeER . . addl %edi, %esi [0,1] DeeeeeeER. bzhil %esi, (%rdi), %eax
You can see the same problem if you replace bzhil with bextrl.
So basically the Intel per instruction overrides have broken every instruction with a folded load because we lost the ReadAfterLd on all of them?
For example
[0,0] DeER . . addl %edi, %esi
[0,1] D=eeeeeeER addl (%rdi), %esi
Looks like it... Sounds scary.
I just tried that same code snippet.
For btver2 I get the expected result:
Index 0123456 [0,0] DeER .. addl %edi, %esi [0,1] DeeeeER addl (%rdi), %esi `
Unfortunately I don't get the correct ReadAdvance on Haswell (I get the same output that you posted).
On Haswell, the SchedClassID for the add with the folded memory operand is #912.
According to X86GenSubtargetInfo.inc
{DBGFIELD("ADD16rm_ADD32rm_ADD64rm_ADD8rm_AND16rm_AND32rm_AND64rm_AND8rm_CMP16mi_CMP16mi8_CMP32mi_CMP32mi8_CMP64mi32_CMP64mi8_CMP8mi_CMP8mi8_CMP16mr_CMP32mr_CMP64mr_CMP8mr_CMP16rm_CMP32rm_CMP64rm_CMP8rm_OR16rm_OR32rm_OR64rm_OR8rm_SUB16rm_SUB32rm_SUB64rm_SUB8rm_TEST16mr_TEST32mr_TEST64mr_TEST8mr_TEST16mi_TEST32mi_TEST64mi32_TEST8mi_XOR16rm_XOR32rm_XOR64rm_XOR8rm") 2, false, false, 12, 4, 3, 1, 0, 0}, // #912
The last two numbers are respectively the "ReadAdvanceIdx", and the number of read-advance entries. Both are zero for Haswell, which means we lost that information.
Basically, all the instructions that use HWWriteResGroup18 had lost the ReadAdvance info.
About the bzhi/bextr change:
Your new patch looks good to me.
I didn't check the new timeline with llvm-mca for the bzhi/bextr. However I trust you that the read-advance is now there.
Thanks,
-Andrea
You can just enumerate opcodes using (instrs BEXTR32rm, BEXTR64rm).