This is an archive of the discontinued LLVM Phabricator instance.

[globalisel][tablegen] Compute available feature bits correctly.
ClosedPublic

Authored by dsanders on Apr 25 2017, 8:26 AM.

Details

Summary

Predicate<> now has a field to indicate how often it must be recomputed.
Currently, there are two frequencies, per-module (RecomputePerFunction==0)
and per-function (RecomputePerFunction==1). Per-function predicates are
currently recomputed more frequently than necessary since the only predicate
in this category is cheap to test. Per-module predicates are now computed in
getSubtargetImpl() while per-function predicates are computed in selectImpl().

Tablegen now manages the PredicateBitset internally. It should only be
necessary to add the required includes.

Also fixed a problem revealed by the test case where
constrainSelectedInstRegOperands() would attempt to tie operands that
BuildMI had already tied.

Event Timeline

dsanders created this revision.Apr 25 2017, 8:26 AM
rovka edited edge metadata.Apr 26 2017, 2:40 AM

So, we have a new GISelAccessor with a new InstructionSelector for each unique subtarget (module + function attributes, as hashed in getSubtargetImpl). Why do we need to have module features and function features in each InstructionSelector? Isn't there a different InstructionSelector for each kind of function, and don't we magically get the right one when processing each function? What am I missing? It seems to me that we could get away with a single set of availableFeatures per InstructionSelector.

In any case, I think you need a more complex test with several different functions with different attributes to make sure you're computing the correct features for each function and you don't have any stale info lingering between them (something like what I do with function attributes in arm-instruction-select.mir, although that's just for convenience and not for this exact purpose).

include/llvm/Target/Target.td
538

This looks very easy to forget to set when adding a new predicate. Would it make sense to have 2 subclasses of Predicate (ModulePredicate and FunctionPredicate) and define all the predicates based on them? Naturally, it would be a pretty big mechanical change to update all the targets, so it should be a separate patch, but I think it would make things easier to maintain in the long run. What do you think?

lib/Target/AArch64/AArch64InstructionSelector.cpp
53

This doesn't look like it would scale very well if we needed to add more function-level predicates. Is there any significant disadvantage to threading the MachineFunction all the way down here? Then we'd only have to update the constructor, without createXInstructionSelector etc.

lib/Target/AArch64/AArch64TargetMachine.cpp
266

Nit: I guess it doesn't matter much, since this is only used for hashing, but it would be nice to keep the convention used for the target features etc (",+forcodesize").

So, we have a new GISelAccessor with a new InstructionSelector for each unique subtarget (module + function attributes, as hashed in getSubtargetImpl). Why do we need to have module features and function features in each InstructionSelector? Isn't there a different InstructionSelector for each kind of function, and don't we magically get the right one when processing each function? What am I missing? It seems to me that we could get away with a single set of availableFeatures per InstructionSelector.

getSubtargetImp() only has access to the Function* but NotWin64WithoutFP needs the MachineFunction* to evaluate Subtarget->getFrameLowering()->hasFP(*MF). It's possible to go from MachineFunction* to Function* but not the other way. Also, the value of hasFP() changes from pass to pass as the stack layout becomes more concrete.

In any case, I think you need a more complex test with several different functions with different attributes to make sure you're computing the correct features for each function and you don't have any stale info lingering between them (something like what I do with function attributes in arm-instruction-select.mir, although that's just for convenience and not for this exact purpose).

Ok.

include/llvm/Target/Target.td
538

If your predicate references MF you'll get a compile error since MF isn't declared in computeAvailableModuleFeatures. Things implemented via class members (e.g. ForCodeSize but that one is ok because of the cache) will silently do the wrong thing. Even without that, I think renaming to ModulePredicate/FunctionPredicate would be a good change to make since it's clearer.

If we're going to rename Predicate, we should probably take the opportunity to add ISel/CG/CodeGen or similar to the name too.

One thing that's a bit weird here is that predicates using ForCodeSize and similar would be ModulePredicate's because of the cache in getSubtargetImpl() even though they're logically per-function predicates.

lib/Target/AArch64/AArch64InstructionSelector.cpp
53

The caller doesn't have access to MachineFunction so I assume you meant Function. I'm not sure about threading that down since it moves the predicate testing away from the cache key generator. It would be easy to add a predicate in the instruction selector and forget to update the key generator. I agree we need to do something here though. I'll have a think about it.

lib/Target/AArch64/AArch64TargetMachine.cpp
266

Ok. And presumably we'd have "-forcodesize" instead of "" if we're matching that convention.

rovka added a comment.Apr 26 2017, 7:58 AM

Thanks for all the explanations!
LGTM with that extra test, we can keep discussing the remaining points but I think it's ok to iterate in-tree. I'd really, really like to see the default change from 0 to 1 on RecomputePerFunction, it seems safer that way.

include/llvm/Target/Target.td
538

Hmm, actually, in light of your more detailed explanations, I'm not so sure Module vs Function is the right abstraction here. The matter of whether or not a predicate is safe to cache doesn't seem to correlate as well as I was hoping with whether or not it's a function predicate. Maybe we should have something like CacheablePredicate (or StablePredicate? or ConstantPredicate?), and use that everywhere except where it's not safe to cache? That would force people to either decide that something is safe to cache or fall back to the slow-but-correct path of recomputing it every time.

Also, +1 for finding a better name for these predicates in general, I hate having to explain to people that this isn't about flag registers and condition codes every time :)

lib/Target/AArch64/AArch64InstructionSelector.cpp
53

That's a good point.

lib/Target/AArch64/AArch64TargetMachine.cpp
266

Yup.

rovka accepted this revision.Apr 26 2017, 7:58 AM
This revision is now accepted and ready to land.Apr 26 2017, 7:58 AM
dsanders marked 6 inline comments as done.Apr 27 2017, 5:11 AM

I'm having trouble improving the test case since the function-level predicates don't show up in any importable rules yet. I'll see if I can uncover one

include/llvm/Target/Target.td
538

SubtargetPredicate and FunctionPredicate might be a better fit. SubtargetPredicate would cover both module-level predicates and function-level predicates that are part of the subtarget key.

One thing that's a bit weird here is that predicates using ForCodeSize and similar would be
ModulePredicate's because of the cache in getSubtargetImpl() even though they're logically
per-function predicates.

Following on from the same line of thought, I think I've fixed this weirdness. It turns out that all of these can be moved to subtarget accessors without breaking SelectionDAG. This solves the scalability issue you mention on the AArch64InstructionSelector constructor. I've updated this patch with this change.

For naming RecomputePerFunction: How about RequiresMachineFunction? It dodges the issue of caching function-level information in the subtarget and it's a bit more obvious that it's how you get access to MF.

lib/Target/AArch64/AArch64InstructionSelector.cpp
53

I've found it's possible to avoid passing the flags down (see above). I've updated the patch.

dsanders updated this revision to Diff 96902.Apr 27 2017, 5:12 AM
dsanders marked an inline comment as done.

Moved ForCodeSize and similar to the Subtarget.

rovka added a comment.Apr 27 2017, 8:05 AM

I'm having trouble improving the test case since the function-level predicates don't show up in any importable rules yet. I'll see if I can uncover one

Ok, if that's impossible to do now, you should commit as-is and open a PR in bugzilla to add a test for this when we import more rules (otherwise we'll almost surely forget).

include/llvm/Target/Target.td
538

RequiresMachineFunction is not bad, but is the way we access the MF really the issue here? I'm not convinced that the distinction between Function and MachineFunction is that important. Suppose we had the same caching system for MachineFunction as we do for Function - those predicates would still need to be recomputed since, as you explained, they could change by the time we reach our pass.

For pretty much the same reason, I don't think SubtargetPredicate is explicit enough. Predicates that change are very rare, so people will be tempted to just use SubtargetPredicate everywhere without giving it a second thought. We need something that will draw attention to the fact that the predicate may be cached, or at least something that sounds strange enough that it would make people check the comments to see what the big deal is.

Anyway, sorry about bikeshedding this so much :)

lib/Target/AArch64/AArch64InstructionSelector.cpp
53

Cool.

dsanders updated this revision to Diff 97095.Apr 28 2017, 7:24 AM

Add support for patterns consisting solely of an IntInit* and use this to add a
test case to the rule predicate support... almost.

The new test case is currently XFAIL'ed since X86's GlobalISel gives precedence
to the C++ over the tablegenerated code. If you comment out the 'if (selectConstant(...))'
from X86InstructionSelect::select(), it passes the optsize functions but fails the
non-optsize functions (because the support is commented out). Conversely,
leaving it in causes the opposite behaviour. The optsize functions pick the
wrong insn (because C++ beats tablegen) but the non-optsize functions pass.

@rovka: If this updated patch looks good to you then I can commit it and file a
PR about the XFAIL. Otherwise I can commit the previous version and file a PR
about the missing test. Which do you think is best?

dsanders added inline comments.Apr 28 2017, 7:53 AM
include/llvm/Target/Target.td
538

RequiresMachineFunction is not bad, but is the way we access the MF really
the issue here? I'm not convinced that the distinction between Function and
MachineFunction is that important. Suppose we had the same caching system
for MachineFunction as we do for Function - those predicates would still need
to be recomputed since, as you explained, they could change by the time we
reach our pass.

I was thinking it could be about whether MF is declared or not but this doesn't make sense now that I've realized my mistake below.

For pretty much the same reason, I don't think SubtargetPredicate is explicit
enough. Predicates that change are very rare, so people will be tempted to
just use SubtargetPredicate everywhere without giving it a second thought.
We need something that will draw attention to the fact that the predicate may
be cached, or at least something that sounds strange enough that it would
make people check the comments to see what the big deal is.

I see your point. For some reason I was thinking "as long as a predicate is accessing values from the Subtarget it's not important whether it's a cached value from the function" but that's obviously not true if we don't call computeAvailableModuleFeatures() again after swapping out a Subtarget.

Anyway, sorry about bikeshedding this so much :)

No worries. If we don't pick a good enough name then we'll be doomed to explain it every time it confuses someone so I'd prefer to get it right.

rovka added a comment.EditedApr 28 2017, 7:55 AM

I think this patch is already doing a lot of things, I'd prefer committing the support for immediates separately and opening a PR for the test (and you can attach the X86 test to it so we can discuss with Igor). I need to double check this, but I think if I enable this for ARM (which I intend to do after this goes in) the existing ARM tests will already exercise this (for G_SDIV).

In that case I'll commit the previous version of this patch tomorrow and post the immediate support for a separate review. The PR can then come out of that review. Thanks

rovka added a comment.Apr 28 2017, 8:24 AM

I just checked and the existing ARM tests fail without this patch because the available features aren't reset when entering the functions. They work with this patch + enabling the TableGen selector for ARM, so we'll have tests then :)

I just checked and the existing ARM tests fail without this patch because the available features aren't reset when entering the functions. They work with this patch + enabling the TableGen selector for ARM, so we'll have tests then :)

Thanks.

Before this patch the available features weren't computed at all (the code was never called) so just to double check. Are you checking tests that only need module-level predicates or do they include predicates that vary between functions? The former is covered by select-inc.mir, but the latter doesn't have a working test yet.

dsanders closed this revision.Apr 29 2017, 10:43 AM