This is an archive of the discontinued LLVM Phabricator instance.

Remove redundant direct moves when extracting integers and converting to FP
ClosedPublic

Authored by nemanjai on Jun 14 2016, 2:55 PM.

Details

Summary

Since we implemented extraction of integer vector elements using direct moves and we use direct moves for int to fp conversions, we end up with strange code when doing both operations. Namely, extracting an element and then converting that to fp results in a direct move out of a VSR followed immediately by a direct move back to a VSRL. This patch just eliminates both of those direct moves.
Power9 instructions won't help this situation so we would still have this issue with Power9 if we don't fix it with a patch like this.

A similar situation exists for extracting/converting integers of other widths, but the fix is non-trivial and it is not clear that there are alternatives that are much better. This can be addressed in the future.

This patch fixes PR 28117.

Diff Detail

Repository
rL LLVM

Event Timeline

nemanjai updated this revision to Diff 60762.Jun 14 2016, 2:55 PM
nemanjai retitled this revision from to Remove redundant direct moves when extracting integers and converting to FP.
nemanjai updated this object.
nemanjai set the repository for this revision to rL LLVM.
nemanjai added a subscriber: llvm-commits.
amehsan edited edge metadata.Jun 14 2016, 8:54 PM

This is how I think of this problem. Currently, when we encounter extract_element of a vector of integers, we generate shift and direct moves. That causes some inefficient patterns like the example for which I opened this bug.

I prefer to generalize the problem and, if possible, come up with a solution for the most general case, instead of focusing on this particular pattern. The more general problem that I can see is this: When we extract an element of an integer vector there are a number of possibilities.

  1. all uses of the extracted element are in scalar integer operations.
  2. The extracted element is going to be converted to a floating point and used in scalar or vector FP calculations.
  3. The extracted element is going to be used in some subsequent vector integer calculations (for example inserted into a vector of int).
  4. some mix of the above three cases.

Now I am not familiar with all ISD opcodes, so the rest of this comment, is missing some concrete details, but I hope my point is clear.

I think the right solution, will look at all uses of the extract_element and then based on those uses it will make sure the DAG has proper ISD opcodes so the subsequent pattern matching generates the right code.

For example if we only have (1) then we want the current pattern to be generated. If we have (1) and (2) then we need the direct move, but we don't want the users of type (2) to convert it back from integer to fp. I am not sure if (2) and (3) has to be treated as two different cases or not. The only difference here is register classes but probably at that stage we are still not aware of register classes.

I believe it should be possible to have DAG combine to massage the DAG depending on what kind of users extract_element has and then some simple and generic pattern matching will generate efficient code. Hopefully with this approach we do not need to hardcode many different patterns in the target description files.

If I am missing something about DAGCombine, or otherwise, that makes this approach non-practical, please let me know.

It is also possible that we can fix this as a more general problem in DAGCombine (similar to what I said in the previous comment), but instead of having a extract-element-centric solution, we have some other kind of solution. For example something that looks at all converts in the DAG and removes unnecessary ones. So again, consider my previous comment as an example of a more general solution which might be practical. There might be better alternatives.

I am not sure I fully follow what you are suggesting. I understand that you are suggesting that we should look more generally into why we are extracting a vector element to begin with (i.e. what the uses are). However, I do not follow if you feel this patch should be abandoned in the interest of a more general solution (for which we haven't even fully formulated the problem).
In my view, extracting an integer element and converting it to floating point was an outlier in terms of code quality and it is what this patch fixes.

Now, I understand that there are some general opportunities for improvement with operations that convert back and forth between integer vector and scalar/vector floating point values due to the fact that some of the floating point and vector registers overlap and this is unique to PPC. This kind of solution requires considering the best choices for not only each of the uses you mention here, but also for both the source and result types. However, I don't think that a complex, fully generalized solution is likely to be implemented very soon and we should tackle these outliers in code quality in small chunks like this when we spot them. Also, we need to weigh the complexity of the solution against how common we feel the code pattern might be in real code.

So to summarize, while I definitely agree that extractelement, insertelement, scalar_to_vector and build_vector ISD nodes would benefit from a DAG combine to handle various possible uses of the results, I feel that such a solution is rather involved and needs a careful examination of what we want the resulting DAG's to look like. Until we have formulated such a solution, I think we should fix the obviously poor code gen choices with quick fixes such as this patch.

this is fine with me if the alternative is complicated.

kbarton accepted this revision.Jul 13 2016, 12:10 PM
kbarton edited edge metadata.

LGTM

This revision is now accepted and ready to land.Jul 13 2016, 12:10 PM
nemanjai closed this revision.Jul 18 2016, 8:37 AM

Committed revision 275796.