This is an archive of the discontinued LLVM Phabricator instance.

[GlobalISel][Legalizer] More concise and faster widenScalar, NFC
ClosedPublic

Authored by rtereshin on May 3 2018, 5:35 PM.

Details

Summary

Refactoring LegalizerHelper::widenScalar member function reducing its
size by approximately a factor of 2 and (hopefuly) making it more
straightforward and regular by introducing widenScalarSrc and
widenScalarDst helper methods.

The new widenScalar* methods mutate the instructions in place instead
of recreating them from scratch and removing the originals. The
compile time implications of this were measured on sqlite3
amalgamation, targeting AArch64 in -O0:

LegalizerHelper::widenScalar: > 25% faster
Legalizer::runOnMachineFunction: ~ 4.0 - 4.5% faster

Also adding MachineOperand::setCImm and refactoring out
MachineIRBuilder::recordInsertion methods to make the change possible.

Diff Detail

Repository
rL LLVM

Event Timeline

rtereshin created this revision.May 3 2018, 5:35 PM
rtereshin updated this revision to Diff 145301.May 4 2018, 2:50 PM

I have simplified widenScalar{Src,Dst} methods a bit by using a tad more high-level MachineIRBuilder methods.

@rtereshin . Looks good and much cleaner. Thanks for working on this.

lib/CodeGen/GlobalISel/LegalizerHelper.cpp
730 ↗(On Diff #145301)

This bit looks like it's not NFC (but seems more correct). If it's indeed not NFC, then please push this in a separate commit.

rtereshin marked an inline comment as done.May 8 2018, 2:31 PM
rtereshin added inline comments.
lib/CodeGen/GlobalISel/LegalizerHelper.cpp
730 ↗(On Diff #145301)

I believe it is an NFC, take a look at the MachineIRBuilder::buildConstant that was here before: https://github.com/llvm-mirror/llvm/blob/0cb4ac08f0b0388854df5ff419fc56c5e63361dc/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp#L243

sextOrTrunc will always resolve to sext there as the new type is always larger, it's a widenScalar after all.

Also, I have recently committed a MachineVerifier patch that checks that we don't have extends down or truncates up: https://reviews.llvm.org/rL331718 Hopefully that helps a bit.

aditya_nandakumar accepted this revision.May 8 2018, 3:34 PM
aditya_nandakumar added inline comments.
lib/CodeGen/GlobalISel/LegalizerHelper.cpp
730 ↗(On Diff #145301)

Thanks. I hadn't seen the implementation of buildConstant.

Should be good.

This revision is now accepted and ready to land.May 8 2018, 3:34 PM
rtereshin marked 3 inline comments as done.May 8 2018, 3:46 PM

@aditya_nandakumar Yes, I do have my doubts that the original implementation with sign extension is always correct, but it's NFC here anyway. I'm thinking to take a look at it at some point and see if I can come up with a breaking test case for that sign extend.

I'm also worried about how we legalize some "generic-generic" opcodes, like G_PHI - the ones that are supposedly universal and can handle floating point values and integer values both - if we always G_TRUNCate them, for instance, how is it going to inter-operate with floating point values and floating point opcodes?

Thank you for looking into this,
Roman

This revision was automatically updated to reflect the committed changes.

Yes, I do have my doubts that the original implementation with sign extension is always correct, but it's NFC here anyway. I'm thinking to take a look at it at some point and see if I can come up with a breaking test case for that sign extend.

This should be ok. It's really an any-extend until another MI forces it one way or the other. Suppose we have:

%0:_(s32) = G_CONSTANT i32 ...
%1:_(s32) = G_OPCODE %0

after widening G_CONSTANT we get:

%0:_(s64) = G_CONSTANT i64 anyext_from_i32(...)
%1:_(s32) = G_TRUNC %0
%2:_(s32) = G_OPCODE %1

When we legalize G_OPCODE we get one of the following cases:
It consumes the original type, leaving the truncate and removing the undefined bits:

%0:_(s64) = G_CONSTANT i64 anyext_from_i32(...)
%1:_(s32) = G_TRUNC %0
%2:_(s32) = G_OPCODE %1

It widens and doesn't care about the excess bits:

%0:_(s64) = G_CONSTANT i64 anyext_from_i32(...)
%1:_(s32) = G_TRUNC %0
%2:_(s64) = G_ANYEXT %1
%2:_(s64) = G_OPCODE %2

which simplifies to:

%0:_(s64) = G_CONSTANT i64 anyext_from_i32(...)
%2:_(s64) = G_OPCODE %0

It widens and does care about the excess bits:

%0:_(s64) = G_CONSTANT i64 anyext_from_i32(...)
%1:_(s32) = G_TRUNC %0
%2:_(s64) = G_SEXT %1
%2:_(s64) = G_OPCODE %2

which simplifies to:

%0:_(s64) = G_CONSTANT i64 signext_from_i32(...)
%2:_(s64) = G_OPCODE %0

The narrowScalar case is a mixture of the two. The portions that correspond solely to extension bits end up dead and removed while the portions that partially correspond to both extension bits and value bits act like widenScalar