This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Implement canCombineStoreAndExtract and provide the missing codegen patterns

Authored by nemanjai on Mar 15 2018, 10:34 AM.



We have VSX store instructions that will store a single element from a vector without modifying it in any way. Previous generation cores can do this for word-sized elements and Power9 can also do it for half-word and byte-sized elements.
The TableGen patterns for the word-sized versions were missing - this patch adds them.

Furthermore, it provides the information about the cost of such a combine - zero cost when the index is the element number that the instruction stores, cost of 3 for other elements. The cost for the other elements is because a vector permute is needed to shift the element and it's only worthwhile if keeping the value as a vector reduces the cost enough to offset the cost of the permute.

Diff Detail


Event Timeline

nemanjai created this revision.Mar 15 2018, 10:34 AM
nemanjai updated this revision to Diff 138591.Mar 15 2018, 11:35 AM

Add the missing predicates on the patterns.

lei added a subscriber: lei.Apr 26 2018, 12:43 PM
lei added inline comments.

nit: line past col 80


Should these 2 lines be CHECK-DAG since they don't necessarily have to happen in this order?



I think we generally allow code in test cases to have lines however long clang produces them.


They certainly don't need to happen in this order, but I find that changing the CHECK directive produced by the tool makes the test case harder to maintain sometimes.

nemanjai updated this revision to Diff 170392.Oct 22 2018, 5:14 AM

Implemented all the patterns for codegen of extract->store.

nemanjai retitled this revision from [PowerPC] Implement canCombineStoreAndExtract and provide the missing pattern for the combination to [PowerPC] Implement canCombineStoreAndExtract and provide the missing codegen patterns.Oct 22 2018, 5:15 AM
nemanjai added reviewers: jsji, steven.zhang.
steven.zhang added inline comments.Oct 25 2018, 12:27 AM

I didn't take deep look at the implementation for this patch. The condition here seems not quite align with the comments. If the bitwidth is 32bit, we will combine the store and extract no matter if it is Power9 or not. I am not sure if this is by intention.

jsji added a comment.Nov 6 2018, 3:33 PM

It is a good idea if we can CombineStoreAndExtract to save one instruction.
But looks like to me that it is not always safe and beneficial to do so, especially for floating points.


This looks like not related to "extract+store" exploitation? Maybe we should do it in another patch?


I would be better if we can split this new testcase into new patch, then just show diff due to this change here.


How about pwr9? Any change due to this patch?

69 ↗(On Diff #170392)

This looks not better than using stfd ? Can we avoid combining this case?

122 ↗(On Diff #170392)

Are we sure that the semantic are equivalent here for stfs-> stfiwx?

With stfs: The contents of register FRS are converted to single format (see page 160) and stored into the word in storage addressed by EA.,

With stfiwx: (FRS)32:63 are stored, without conversion, into the word in storage addressed by EA.

So, is it safe to assume that there is not difference due to conversion?

157 ↗(On Diff #170392)

Similar to above, are we sure stxsiwx will store the save value as stfs for all single precision values?

21 ↗(On Diff #170392)

Looks like not related to "extract+store" exploitation? Maybe we should do it in another patch?

nemanjai abandoned this revision.Dec 31 2018, 9:24 AM
nemanjai marked an inline comment as done.

Upon closer inspection, this actually almost never fires on PPC so spending any more time on it does not seem useful. Abandoning this patch.


That is correct. And I feel that the comment does not suggest otherwise. Perhaps if I change the comment to something like:

// Prior to Power9, we only have an instruction that combines a store and extract
// for i32 (STXSIWX). ISA 3.0 (Power9) introduced instructions that do this for
// subword types (i8, i16). There is also no advantage to doing this for i64 on
// any subtarget.