This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add IMUL scheduling info on sandybridge, fix it on >=haswell.
ClosedPublic

Authored by courbet on Feb 19 2018, 6:16 AM.

Details

Summary

Only IMUL16rri uses an extra P0156. IMUL32* and IMUL16rr only use P1.
This was computed using https://github.com/google/EXEgesis/blob/master/exegesis/tools/compute_itineraries.cc

This can easily be validated by running perf on the following code:

int main(int argc, char**argv) {
  int a = argc;
  int b = argc;
  int c = argc;
  int d = argc;

  for (int i = 0; i < LOOP_ITERATIONS; ++i) {
    asm volatile(
      R"(
        .rept 10000
        imull $0x2, %%edx, %%eax
        imull $0x2, %%ecx, %%ebx
        imull $0x2, %%eax, %%edx
        imull $0x2, %%ebx, %%ecx
        .endr
      )"
      : "+a"(a), "+b"(b), "+c"(c), "+d"(d)
      :
      :);
  }
  return a+b+c+d;
}

-> test.cc

perf stat -x, -e cycles --pfm-events=uops_executed_port:port_0:u,uops_executed_port:port_1:u,uops_executed_port:port_2:u,uops_executed_port:port_3:u,uops_executed_port:port_4:u,uops_executed_port:port_5:u,uops_executed_port:port_6:u,uops_executed_port:port_7:u test

Fixes #36084

Diff Detail

Event Timeline

courbet created this revision.Feb 19 2018, 6:16 AM
courbet edited the summary of this revision. (Show Details)Feb 19 2018, 6:18 AM

All the checks still pass?
I though there were some tests to dump the current scheduling info.

All the checks still pass?
I though there were some tests to dump the current scheduling info.

They do (check-llvm).

All the checks still pass?
I though there were some tests to dump the current scheduling info.

They do (check-llvm).

The schedule tests only report latency and throughput, this patch doesn't appear to affect those values.

@craig.topper mentioned that IMUL support is missing from SandyBridge as well (PR36084)

courbet added a comment.EditedFeb 19 2018, 7:21 AM

@craig.topper mentioned that IMUL support is missing from SandyBridge as well (PR36084)

I'll try to put my hands on a sandybridge and run the tool there.

@craig.topper mentioned that IMUL support is missing from SandyBridge as well (PR36084)

I'll try to put my hands on a sandybridge and run the tool there.

I've run the tool on snb:

  • IMUL32*, IMUL16rr and IMUL8r use 1xP1
  • IMUL16rri8 uses 1xP1 + 1xP015
  • We don't handle IMUL16rri because the 16-bit value results in an LCP stall, but I will assume it's similar to IMUL16rri8

I'll udpate the diff with the results.

courbet updated this revision to Diff 134919.Feb 19 2018, 8:02 AM

Add scheduling info for IMUL* on sandybridge.

courbet updated this revision to Diff 134923.Feb 19 2018, 8:03 AM

Fix typo.

courbet edited the summary of this revision. (Show Details)Feb 19 2018, 8:04 AM
courbet retitled this revision from [X86] Fix scheduling info for IMUL on haswell onwards. to [X86] Add IMUL scheduling info on sandybridge, fix it on >=haswell..
craig.topper accepted this revision.Mar 6 2018, 9:54 AM

LGTM. This makes sense to me. The extra uop on IMUL16rri/IMUL16rri8 is probably there to preserve the upper bits of the destination register since that operand is not part of the multiply operation. It makes sense that IMUL16rr wouldn't need that.

This revision is now accepted and ready to land.Mar 6 2018, 9:54 AM
This revision was automatically updated to reflect the committed changes.