This is an archive of the discontinued LLVM Phabricator instance.

[X86][BOLT] Use getOperandType to determine memory access size
ClosedPublic

Authored by Amir on May 20 2022, 11:33 PM.

Details

Summary

Generate INSTRINFO_OPERAND_TYPE table in X86GenInstrInfo.inc.

This diff adds support for instructions that were previously reported as having
memory access size 0. It replaces the heuristic of looking at instruction
register width to determine memory access width by instead checking the memory
operand type using tablegen-provided tables.

Diff Detail

Event Timeline

Amir created this revision.May 20 2022, 11:33 PM
Herald added a reviewer: maksfb. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
Amir updated this revision to Diff 433862.Jun 2 2022, 1:25 PM

clang-format

Amir published this revision for review.Jun 2 2022, 1:26 PM
Amir added a reviewer: skan.
Herald added a project: Restricted Project. · View Herald TranscriptJun 2 2022, 1:26 PM
Amir added inline comments.Jun 2 2022, 1:27 PM
bolt/lib/Target/X86/X86MCPlusBuilder.cpp
1011

getMemDataSize may be moved to Target/X86 files, but I'm not sure of any uses for it there.

maksfb added a comment.Jun 2 2022, 5:25 PM

shrinkwrapping-lock.s is failing. Is this a true NFC?

Amir added a comment.Jun 2 2022, 5:54 PM

shrinkwrapping-lock.s is failing. Is this a true NFC?

It should be, but apparently it's not. I'll investigate.

No, this diff is not NFC. It's actually pretty impactful (and thanks for working on this Amir).

If you change that, shrink wrapping will recognize way more instructions and will analyze more functions, causing the optimization to activate more often.

Because of that, we need to update tests. We may want to coordinate landing this together with my patchset of 5 diffs, so I can update all tests only once, instead of multiple times, one per diff.

Let me try to elaborate on why this diff is important and has functional changes.

Previously, we were looking at register size to try to gauge how wide is the memory access: either source or destination register for a memory access. If we couldn't find one, we would say it is a "size zero" access. Returning "size 0" essentially disables all frame analyses because you have an unrecognized access to the stack. This diff looks at memory operands, which makes it cover a lot more cases.

We should probably add to this to the comments of this function so it becomes clear:

Classifying a stack access as *not* "SIMPLE" here means we don't know how to change this instruction memory access. It will disable any changes to the stack layout, so we can't do the most aggressive form of shrink wrapping. We must do so in a way that keeps the original stack layout. Otherwise you need to adjust the offset of all instructions accessing the stack: we can't do that anymore because there is one instruction that is not simple. There are other implications as well. We have heuristics to detect when a register is callee-saved and thus eligible for shrink wrapping. If you are restoring a register using a non-simple stack access, then it is classified as NOT callee-saved, and it disables shrink wrapping for *that* register (but not for others).

Classifying a stack access as "size 0" or detecting an indexed memory access (to address a vector, for example) here means we know there is a stack access, but we can't quite understand how wide is the access in bytes. This is very serious because we can't understand how memory accesses alias with each other for this function. This will essentially disable not only shrink wrapping but all frame analysis, it will fail it as "we don't understand this function and we give up on it". Because this diff changes the instructions classified as "size 0", it pretty much boosts how general are all frame optimizations.

Turns out that we were failing to classify the size of quite a few instructions, that's why I have the diff on https://reviews.llvm.org/D126112 to try to cover a few of the cases of non-simple stack accesses, but whose size should be different than zero. As you can see, those are instructions that lack a register operand. I also have one extra record for an instruction that does have a register, but it was wrongly classified as non-simple.

Btw from BOLT perspective this diff LGTM. Regarding the changes to X86 target, those look minimal and rather benign to me. It is apparently using tablegen info (GET_INSTRINFO_OPERAND_TYPE) that is not used by any other backend that is in-tree, but TableGen supports generating this info and it is even in their testsuite. Even if it has a bug because it hasn't been used, I think it makes sense to try for us as it obviously won't break any existing code as we are the only users of this information. We should wait for someone responsible for the X86 target to approve this change, though.

Amir added a comment.Jun 3 2022, 1:28 PM
This comment was removed by Amir.
Amir updated this revision to Diff 434131.Jun 3 2022, 1:39 PM

Added Rafael's comment for isStackAccess

@rafaelauler:
Thank you for clarifications. I've added your comment verbatim to isStackAccess.
How do you want to proceed with this diff - should we let D126112 in first, or this one first?

Amir retitled this revision from [X86][BOLT][NFC] Use getOperandType to determine memory access size to [X86][BOLT] Use getOperandType to determine memory access size.Jun 3 2022, 1:45 PM
Amir edited the summary of this revision. (Show Details)

We should time it to land in the same day and then land changes to tests that compensates for the aggregate of all changes. Regarding the order, I don't have any preference, as long as both patchsets hit the repo in the same day.

(I'm talking about a separate test repo, btw. The in-tree test should still be fixed individually for each diff to avoid breaking bisecting tools)

skan added inline comments.Jun 5 2022, 11:08 PM
bolt/lib/Target/X86/X86MCPlusBuilder.cpp
1013–1058

Could we avoid writing this table manually here? We already have similar things e.g X86Disassembler::getMemOperandSize? Maybe we can add one bit for class X86MemAsmOperand, and auto-generate this table.

llvm/lib/Target/X86/X86InstrInfo.cpp
9651 ↗(On Diff #434131)

I am worried about this will increase the build time and memory usage a lot..
Should we put this into a bolt-only file?

Amir added inline comments.Jun 13 2022, 9:21 PM
bolt/lib/Target/X86/X86MCPlusBuilder.cpp
1013–1058

Will do

Amir updated this revision to Diff 436921.Jun 14 2022, 1:56 PM

Address feedback

Amir marked 2 inline comments as done.
skan accepted this revision.Jun 15 2022, 7:02 PM

LGTM

This revision is now accepted and ready to land.Jun 15 2022, 7:02 PM
Amir updated this revision to Diff 437436.Jun 15 2022, 9:11 PM

Use non-expanded memory operands

This revision was landed with ongoing or failed builds.Jun 30 2022, 12:25 AM
This revision was automatically updated to reflect the committed changes.