This is an archive of the discontinued LLVM Phabricator instance.

[TableGen] Add support for instruction clusters
Needs ReviewPublic

Authored by evandro on Jul 10 2017, 4:29 PM.

Details

Summary

This change allows the description of instruction clusters by using the new ReadCluster and SchedReadCluster classes, analogously to the ReadAdvance and SchedReadAdvance classes, respectively.

In a subsequent patch, the instruction scheduler is modified to use this information.

The motivation is to allow a target maintainer to specify which instructions pairs should be clustered together in the machine model. Thus the clustering is done as part of instruction scheduling instead of relying in a scheduling mutation and adding code to the backend to match the instructions.

Diff Detail

Event Timeline

evandro created this revision.Jul 10 2017, 4:29 PM
MatzeB added a subscriber: MatzeB.Jul 10 2017, 7:39 PM

Can you give an example of a typical use? Is this just a different way to express macrofusion opportunities or is this more or less powerfull?

javed.absar added inline comments.Jul 11 2017, 1:52 AM
llvm/include/llvm/MC/MCSchedule.h
93

Shouldn't this function be amended as well to equate Cluster?

llvm/include/llvm/Target/TargetSchedule.td
312

Perhaps the order of the parameters should be - cycles, cluster, writes = []

fhahn edited edge metadata.Jul 11 2017, 5:19 AM

Can you give an example of a typical use? Is this just a different way to express macrofusion opportunities or is this more or less powerfull?

I agree it would be helpful to see an example, e.g. how this could be used to cluster AArch64 AES instructions.

If this is intended as a replacement for the MacroFusion pass, could the more complicated constraints for fusing MOVK/MOVZ on AArch64 be expressed (lib/target/AArch64/AArch64MacroFusion.cpp from line 130)?

llvm/include/llvm/MC/MCSchedule.h
91

I assume you added the cluster bit here, because the whole machinery for handling "ReadAdvance" can be easily extended to do add cluster dependencies? It feels that after adding the cluster bit, most related function/class names are slightly misleading, e.g. getReadAdvanceCycles now does not only get the number of cycles but also the cluster bit.

I posted the preliminary patch D35260 to illustrate how this change, along with D35229, can be used to simplify instruction fusion.

fhahn added a comment.Jul 12 2017, 2:16 PM

Thanks @evandro , I think the example is really helpful to judge the impact of the set of changes.

I am not sure I see the clear advantages of the new approach over the macro fusion DAG mutation though. It seems to me that the new approach spreads out the implementation and definitions out over multiple files, whereas the DAG mutation is basically 2 files (the generic pass and the target specific implementation of shouldScheduleAdjecent) and it's relatively easy to see exactly what's going on, even though overall it probably requires slightly more code than the new approach.

As it can be seen from the example, the fusion does not spread definitions over multiple files. In this proposed approach, fusion is specified in the machine model. Moreover, it folds fusion into the machine scheduling and decreases its run-time cost. All the while, yielding the same result as before.

As it can be seen from the example, the fusion does not spread definitions over multiple files. In this proposed approach, fusion is specified in the machine model. Moreover, it folds fusion into the machine scheduling and decreases its run-time cost. All the while, yielding the same result as before.

I agree, it does not spread the information out over multiple files. However, the fact that the clusters are just specified by numbers makes the implementation less self-explanatory (i.e. it is really not clear what the numbers mean in the model files). I suggest that you add separate tablegen classes that represent the clusters so that you can refer to the clusters by name in the scheduling definitions.

evandro updated this revision to Diff 106527.Jul 13 2017, 2:00 PM
evandro edited the summary of this revision. (Show Details)

Address feedback by @hfinkel.

I agree, it does not spread the information out over multiple files. However, the fact that the clusters are just specified by numbers makes the implementation less self-explanatory (i.e. it is really not clear what the numbers mean in the model files). I suggest that you add separate tablegen classes that represent the clusters so that you can refer to the clusters by name in the scheduling definitions.

Agreed, putting the fusion information in the scheduling model makes sense conceptually, as it's a scheduling problem.

With spreading out I meant the changes to the machine scheduler related files (but that's mostly just passing through the cluster info) and that we need to add fusion entries to all relevant machine models , whereas before there was a single file describing which instructions should be fused.

For AArch64, there might be 2 potential problems:

  • Currently machine models are shared by different CPUs. What if one CPU supports fusing more instructions than the other? I'm not sure if that would be a practical issue at the moment, but it may be worth considering.
  • For fusing some instructions, more complicated constraints are used, e.g. CBZ and some arithmetic instructions are only fused if no shifted register is used [1] or MOVZWi should only be fused with MOVKWi, if the immediate used with MOVKWi == 16 [2]. Could such constraints be expressed in the machine model?

[1] https://github.com/llvm-mirror/llvm/blob/master/lib/Target/AArch64/AArch64MacroFusion.cpp#L70
[2] https://github.com/llvm-mirror/llvm/blob/master/lib/Target/AArch64/AArch64MacroFusion.cpp#L140

I agree, it does not spread the information out over multiple files. However, the fact that the clusters are just specified by numbers makes the implementation less self-explanatory (i.e. it is really not clear what the numbers mean in the model files). I suggest that you add separate tablegen classes that represent the clusters so that you can refer to the clusters by name in the scheduling definitions.

Agreed, putting the fusion information in the scheduling model makes sense conceptually, as it's a scheduling problem.

With spreading out I meant the changes to the machine scheduler related files (but that's mostly just passing through the cluster info) and that we need to add fusion entries to all relevant machine models , whereas before there was a single file describing which instructions should be fused.

For AArch64, there might be 2 potential problems:

  • Currently machine models are shared by different CPUs. What if one CPU supports fusing more instructions than the other? I'm not sure if that would be a practical issue at the moment, but it may be worth considering.
  • For fusing some instructions, more complicated constraints are used, e.g. CBZ and some arithmetic instructions are only fused if no shifted register is used [1] or MOVZWi should only be fused with MOVKWi, if the immediate used with MOVKWi == 16 [2]. Could such constraints be expressed in the machine model?

Maybe the TableGen model could support optional C++ predicates (much like we do for isel). We could also have a target callback that can override the td-file clustering for special cases.

[1] https://github.com/llvm-mirror/llvm/blob/master/lib/Target/AArch64/AArch64MacroFusion.cpp#L70
[2] https://github.com/llvm-mirror/llvm/blob/master/lib/Target/AArch64/AArch64MacroFusion.cpp#L140

fhahn resigned from this revision.Dec 15 2017, 12:35 PM
evandro added a subscriber: evandro.Mar 7 2018, 9:02 AM