- User Since
- Aug 10 2016, 1:07 PM (191 w, 8 h)
I'm not sure I understand the problem; can you give a C testcase?
Looks right, generally.
LGTM. I'm surprised this hasn't caused any issues, though.
I'm happy with this at a high level; it's not really any more complicated than BasicBlock sections itself, and the added flexibility makes sense.
Are you intentionally ignoring clobbers?
Being consistent with SIGN_EXTEND_INREG is a good argument. LGTM with clang-format issues fixed.
Mon, Apr 6
Rebased. I landed a bunch of the trivial non-functional changes.
New diagnostics code looks fine. (I didn't try to review the new section logic closely).
Would it make sense to generalize getSVEType() to getSVETypeList()? It seems like the current approach won't generalize well once you're dealing with more than one overloaded type (for example, llvm.aarch64.sve.scvtf.nxv8f16.nxv8i16).
Maybe better to emit llvm.aarch64.sve.sel for now, if you're trying to avoid IR operations.
If I'm following correctly, this should apply on its own. Then you're expecting the following API changes:
That seems fine, yes. LGTM
You can stick a recursive helper function on RecordType/RecordDecl, if that's the concern.
My biggest concern with this patch is that you're blindly rewriting based on the type of the alloca. That type isn't always reliable, particularly when you're dealing with unions. And even in cases where it is reliable, it might not reflect the actual operations the code ends up using in practice. I'm concerned you'll end up preemptively splitting operations that didn't need to be split, and hurt performance by doing that.
In general, it's possible for a pass to preserve GlobalsAA, and not BasicAA. BasicAA depends on a number of different analysis passes, like LoopInfo, which might not be updated. GlobalsAA doesn't depend on any local analysis passes, so it's much simpler to preserve.
This version lost the "constant integer or undef values" bit?
A depth-first search is enough to ensure some predecessor of every block is visited before that block. So the benefit of RPO is to change that to all (non-loop) predecessors, which I guess helps optimizations involving PHI nodes?
Not sure if you have commit access. If you want me to commit this for you, please let me know how to credit you in the "Author" line of the git commit (name and email).
If you want to take this over, sure, go for it. :)
Fri, Apr 3
Commited the non-header changes separately
Please add test coverage in llvm/test/Analysis/ConstantFolding/
This will fail to compile if someone includes Type.h without including DerivedTypes.h.
Fix review comment and lint
LGTM with one minor suggestion.
Diagnosing call/return statements seems nicer.
Mentioning "UndefMaskElem" probably doesn't make sense. Maybe should be "undef"?
Thu, Apr 2
Added support/tests for dynamically loaded NewPM LTO plugins. (Statically linked plugins can't be tested in regression tests because there are none by default.)
The current version only supports static plugins. Supporting dynamic plugins is a more difficult puzzle because I'd have to figure out how to thread the option through... I'll give it a shot, I guess.
If you're going to add code to check for it, we might as well add testcases for ridiculous sizes, like (__int128_t)1 << 100.
The extra assertion doesn't seem helpful; the one in getFixedSize() should be enough.
VT.getScalarSizeInBits(), I think.
You should be able to refactor the patterns into the definitions of the multiclasses sve_int_bin_cons_arit_0 and sve_int_arith_imm0, to avoid repeating them four times. (You might want to look at other places using null_frag in SVEInstrFormats.td for inspiration.)
Can you split whatever change affects the PPC tests into a separate patch, if it's not too tricky?
I think I would rather just pay the extra 8 bytes per VectorType, and expand this to support all vector types supported by LLVM. It's not like we allocate enough VectorTypes for it to matter, anyway.
That's the right idea. getNumValues() seems wrong, though.
Wed, Apr 1
My thoughts, in no particular order:
Yes, code in other passes, particularly SimplifyCFG.
We have to call getBooleanContents on the vector type, not the vector element type. On ARM, a boolean in a vector is ZeroOrNegativeOneBooleanContent. A scalar boolean on its own is ZeroOrOneBooleanContent. If you choose incorrectly, that's a potential miscompile. (Not sure why it isn't showing up as a regression test change for other targets; maybe the specific VSELECT pattern in question is rare after legalization.)