This is an archive of the discontinued LLVM Phabricator instance.

[X86] Change the tuning settings for pentium4 to be more modern since it's the default 32-bit cpu in clang
ClosedPublic

Authored by craig.topper on Jul 15 2020, 3:00 PM.

Details

Summary

Alternative to D83897. I believe the big change here is that I removed slow unaligned memory 16

The downside is that it may adversely affect tuning if someone explicitly targets -march=pentium4 and expects pentium4-tuned code. Of course, pentium4 is so old that our default behavior with the previous settings may not have been the best either.

Event Timeline

craig.topper created this revision. Jul 15 2020, 3:00 PM
Herald added a project: Restricted Project. Jul 15 2020, 3:00 PM
Herald added subscribers: jfb, hiraditya.
echristo accepted this revision. Jul 15 2020, 3:15 PM

LGTM.

This revision is now accepted and ready to land. Jul 15 2020, 3:15 PM
MaskRay added inline comments.
llvm/lib/Target/X86/X86.td
1080

Typo?

craig.topper marked an inline comment as done. Jul 15 2020, 3:21 PM

I'll probably wait a day to see if @RKSimon or @spatel have any opinion.

llvm/lib/Target/X86/X86.td
1080

thanks. I'll fix it before I commit.

RKSimon accepted this revision. Jul 16 2020, 1:29 AM

LGTM - no objections. I don't think we have much that specifically targets optimal codegen on p4: a few of the TTI costs, and that's about it.

llvm/lib/Target/X86/X86.td
1078

Remove this?

I don't object, but this is less sustainable than adding a generic 32-bit model to match the generic 64-bit model (the alternate D83897 patch IIUC).

Maybe out-of-scope for this patch, but there's a bigger problem in that (AFAIK) we have no plan for updating the generic tuning. I.e., the 64-bit generic model was created 6 years ago and grows less relevant daily:

// We currently use the Sandy Bridge model as the default scheduling model as
// we use it across Nehalem, Westmere, Sandy Bridge, and Ivy Bridge which
// covers a huge swath of x86 processors.

Should we create some metrics based on time or popularity that guide us on updating that? For example, are we stuck assuming SSE2-only for the generic model forever, or can we decide that 12+ years past the introduction of SSE4.1 lets us assume that is available/preferred?

I don't object, but this is less sustainable than adding a generic 32-bit model to match the generic 64-bit model (the alternate D83897 patch IIUC).

The "x86-64" CPU has existed for longer than 6 years. It was originally created as the common subset of "nocona" and "k8", to be a good default CPU for any 64-bit system. It's missing cmpxchg16b, for example. It picked up the generic tuning 6 years ago, but it should really have tuning for something closer to nocona/k8. If we had true mtune support, -mtune=x86-64 would tune for k8/nocona. At least I think that's how it works in gcc.
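
To illustrate the "common subset" point, here is a minimal sketch that treats a baseline CPU as the intersection of the feature sets of the CPUs it must run on. The feature names and the two lists are simplified assumptions made for illustration, not copied from X86.td, and `common_baseline` is a hypothetical helper.

```python
# Simplified, assumed feature sets for illustration only (not taken from X86.td).
NOCONA = frozenset(
    {"x87", "cmov", "mmx", "sse", "sse2", "sse3", "fxsr", "64bit", "cx16"}
)
K8 = frozenset(
    {"x87", "cmov", "mmx", "3dnow", "3dnowa", "sse", "sse2", "fxsr", "64bit"}
)


def common_baseline(*cpus):
    """Return the features that are present on every CPU passed in."""
    return frozenset.intersection(*cpus)


# Hypothetical baseline: only what both CPUs are assumed to support.
X86_64 = common_baseline(NOCONA, K8)

if __name__ == "__main__":
    print(sorted(X86_64))
    print("cx16 in baseline:", "cx16" in X86_64)
```

Under these assumed lists, cmpxchg16b ("cx16") falls out of the baseline because only one of the two CPUs has it. Note that this only models the feature set; which scheduling model and tuning flags the baseline should use is the separate question discussed in this thread.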

This revision was automatically updated to reflect the committed changes.