For a new backend that is in progress, we need more than 16 functional units. To support that, DFAInput needs to be 128 bits wide. I'm using a pair of uint64_t so that most of the logic stays unchanged. It could easily be extended to a larger, general tuple if needed.
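To illustrate the idea, here is a minimal sketch of a 128-bit input held in a pair of uint64_t. It is not the code from the patch: the names `WideInput`, `TermBits` and `appendTerm` are hypothetical, and the 32-bit term width is an assumption chosen only to show how the shift has to carry across the two halves.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical 128-bit input: first holds the high 64 bits, second the low 64.
using WideInput = std::pair<uint64_t, uint64_t>;

// Assumed width of one functional-unit term (not taken from the patch).
constexpr unsigned TermBits = 32;

// Shift the 128-bit value left by one term and OR the new unit mask into
// the low bits. Bits shifted out of the low half carry into the high half;
// bits shifted out of the high half are dropped.
inline WideInput appendTerm(WideInput In, uint32_t Units) {
  uint64_t High = (In.first << TermBits) | (In.second >> (64 - TermBits));
  uint64_t Low = (In.second << TermBits) | Units;
  return {High, Low};
}
```

With this layout the existing "shift and OR" logic keeps its shape; only the carry between the two halves is new.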
LGTM. I had a look at changing this to use APInt instead, because that seems a little nicer, but this turns out to be nontrivial.
I think really this code needs to be able to distinguish between functional units that need packetizing (VLIW slots) and functional units that are just ... functional units (for the purposes of scheduling).
I noticed that this change makes the static table twice as large and significantly slows down compilation of the files that include it. I'm looking at an alternative solution. I might just work around the problem on my side, as a different solution would be much more invasive.
I'll update the review once I get more data.