This is an archive of the discontinued LLVM Phabricator instance.

Cost calculation for interleave load/store patterns {v8i8,v16i8,v32i8,v64i8}
ClosedPublic

Authored by m_zuckerman on Oct 10 2017, 1:56 PM.

Details

Summary

This patch adds accurate instructions cost. The formula presents two cases(stride 3 and stride 4) and calculates the cost according to the VF and stride.

Diff Detail

Repository
rL LLVM

Event Timeline

m_zuckerman created this revision.Oct 10 2017, 1:56 PM
delena added inline comments.Oct 10 2017, 10:36 PM
lib/Target/X86/X86TargetTransformInfo.cpp
2552 ↗(On Diff #118468)

returns true

2557 ↗(On Diff #118468)

Why "optimize"? It is just calculation.

2574 ↗(On Diff #118468)

I don't like hard-coded numbers inside functions. I suggest to use lookup tables with types.

dorit edited edge metadata.Oct 10 2017, 11:01 PM

My only main concerns are with respect to interleave-group with gaps (see below), and the fact that we don't distinguish the AVX2 from the AVX512 case (also see below). Just minor comments beyond that.

lib/Target/X86/X86TargetTransformInfo.cpp
2556 ↗(On Diff #118468)

Can use \p when you refer to the function arguments.

Some phrasing suggestions:
"the Interleaved pass" --> the X86InterleavedAccess pass
"supports the interleaved" --> supports the specific interleaved access group at hand.
"If the interleaved is supported by the pass..." --> If it does, the function computes the cost of the optimized load/store+shuffle sequence that the X86InterleavedAccess pass will generate for this interleaved-access group.

2562 ↗(On Diff #118468)

v64i8 is not a legal type for AVX2... will you be getting the right cost here for AVX2?

2563 ↗(On Diff #118468)

Maybe add a comment here "Currently the X86InterleavedAccess pass supports only char accesses,"

2570 ↗(On Diff #118468)

Would be nice if you could add some intuition to the numbers below...

2612 ↗(On Diff #118468)

What if the interleave-group has gaps? does the X86InterleavedAccess pass still support it? (I think not, right?). In that case you want to move the call to the cost function to after this check.

2729 ↗(On Diff #118468)

Why did you place the call to optimizeInterleaveCost specifically here, if it's not using the MemOpCost calculation?

2766 ↗(On Diff #118468)

is this the right indentation? (same for the occurrence below)

m_zuckerman marked 3 inline comments as done.
m_zuckerman marked 3 inline comments as done.
m_zuckerman added inline comments.
lib/Target/X86/X86TargetTransformInfo.cpp
2562 ↗(On Diff #118468)

according to what I saw in the code, only legal types pass into this section and therefore you don't need to care for that.

m_zuckerman marked 2 inline comments as done.
dorit added a comment.Oct 16 2017, 5:57 AM

The AVX2 changes look ok to me now.
A couple comments about the AVX512 changes below.

lib/Target/X86/X86TargetTransformInfo.cpp
2660 ↗(On Diff #119132)

I think the common practice here is to place such cost tables inside the function, exactly where it is used.

Also, does the cost in these two tables include the load/store instructions, or does it account only for the shuffles?
It looks like you don't add the MemOpCost to the cost that the table returns, so probably the load/store cost is already accounted for; is that intentional?
(If so, the semantics of this table is a bit different than that of the AVX2 counterpart above (see the description of the AVX2 tables above), so this should be documented. For example, the comments in the table should not have the "load" and "store" in parenthesis... ).

2708 ↗(On Diff #119132)

Please add a comment here saying that if the X86InterleaveAccessPass supports the specific interleaved access group at hand, we return the hard coded cost of the shuffle sequence that the X86InterleaveAccessPass will generate. Otherwise we compute the cost of the (unoptimized) shuffle sequence (that would be a result of lowering each of the IR shuffles of the interleaved group independently).

m_zuckerman added inline comments.
lib/Target/X86/X86TargetTransformInfo.cpp
2660 ↗(On Diff #119132)

In the mine computation, I check all the instructions include load and store. For the value of the table to be consistent with either avx2 code or avx512 I had to adapt the computation to the user code.

dorit added a comment.Oct 17 2017, 3:32 AM

AVX512 side of things now also looks good to me (with the tiny comments below).

lib/Target/X86/X86TargetTransformInfo.cpp
2692 ↗(On Diff #119277)

you can fit in more text in this line; and while you're touching this, maybe clarify this comment saying that the cost in the table accounts only for the shuffle sequence (the loads/stores are accounted for separately);

2765 ↗(On Diff #119277)

"If entry" --> "[ ]If [an] entry"
(in both occurrences of this comment).

dorit accepted this revision.Oct 17 2017, 3:47 AM

LGTM with the last couple of comments.

This revision is now accepted and ready to land.Oct 17 2017, 3:47 AM
This revision was automatically updated to reflect the committed changes.