This change-set support re-factorization of X86InterleaveAccess pass into a class without any functional changes in order to allow better code sharing.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Target/X86/X86InterleavedAccess.cpp | ||
---|---|---|
69 ↗ | (On Diff #75878) | Not related to this patch, but is setting the original vector load's alignment for all 'decomposed' loads correct? |
lib/Target/X86/X86InterleavedAccess.h | ||
18 ↗ | (On Diff #75878) | No need to include in the header. forward declaration is sufficient: |
19 ↗ | (On Diff #75878) | Probably same as above, LLVM convention is to use forward declarations if reasonable. |
53 ↗ | (On Diff #75878) | Consider documenting this c-tor's arguments. |
66 ↗ | (On Diff #75878) | Does this method need to be public? |
77 ↗ | (On Diff #75878) | Does this method need to be public? |
lib/Target/X86/X86InterleavedAccess.cpp | ||
---|---|---|
152 ↗ | (On Diff #75878) | Isn't this a leak? Where is this deleted? |
lib/Target/X86/X86InterleavedAccess.cpp | ||
---|---|---|
224 ↗ | (On Diff #76113) | It would better to instantiate as a function-local variable. If you do want to allocate on the heap, please use std::unique_ptr or something similar to manage the object. |
The update addresses Zvi's comment about declaring Grp as a functional local variable.
lib/Target/X86/X86InterleavedAccess.cpp | ||
---|---|---|
26 ↗ | (On Diff #76449) | elements --> element |
53 ↗ | (On Diff #76449) | This reads funny. Did you mean "into \p NumSubVectors sub vectors of type \p T."? |
55 ↗ | (On Diff #76449) | Please describe the return value in the comment. I assume it is supposed to indicate success? It seems odd to me that that would be necessary. I would expect isSupported() == true to imply that the transform is expected to succeed. Should this just return void instead? |
64 ↗ | (On Diff #76449) | It would be slightly clearer if you used the variable names here, i.e. Input-Vectors --> InputVectors |
68 ↗ | (On Diff #76449) | Is this method really supposed to be restricted to 4x4 transpose? If so, your example should be a 4x4 transpose, not a 2x2 one. (Did you mean for the comment at the method definition to be here?) Trasposed --> Transposed |
95 ↗ | (On Diff #76449) | Don't duplicate the comments from the class definition unless they contain additional value (e.g. notes about the low level implementation). |
116 ↗ | (On Diff #76449) | Why are you passing VecInst, NumSubVectors, & SubVecTy as arguments to decompose? These are all easily accessible from the class members. |
119 ↗ | (On Diff #76449) | VecSize --> SubVecSize would be clearer |
Thanks Dave, the update addresses your comment.
lib/Target/X86/X86InterleavedAccess.cpp | ||
---|---|---|
55 ↗ | (On Diff #76449) | So the intent is to use it for breaking down any kind of instruction such as load, shuffle. Currently, load is only supported. Also there might be some other challenges where we were not able to create the dummy vectors and break-down the instruction evenly. |
68 ↗ | (On Diff #76449) | Yes. |
116 ↗ | (On Diff #76449) | So the intent is to use this function in a general way and use it to break down any kind of instruction. Currently, it is used for load instruction, but it will also be used to break down the long shuffle instruction in the strided-store pattern. |
Hi Simon,
I would think uint32_t would be preferred over unsigned because unsigned could lead to an incorrect result. Though I agree it will be safe to use unsigned here.
But in general is there a reason for wanting the size to change with the underlying architecture?
lib/Target/X86/X86InterleavedAccess.cpp | ||
---|---|---|
34 ↗ | (On Diff #77381) | const LoadInst* Inst; |
50 ↗ | (On Diff #77381) | Do you need to keep Builder inside? I assume it may be a local var inside function. |
81 ↗ | (On Diff #77381) | const Instruction *I |
216 ↗ | (On Diff #77381) | Why do you need to pass an empty builder here? |
220 ↗ | (On Diff #77381) | return Grp.isSupported() && Grp.lowerIntoOptimizedSequence(); |
This change-set addresses Elena's comments.
lib/Target/X86/X86InterleavedAccess.cpp | ||
---|---|---|
34 ↗ | (On Diff #77381) | Inst cannot be a constant pointer because we are bit-casting its pointer operand. Inst can be a both load/store, that's why I declared it as an Instruction instead of LoadInst. |
50 ↗ | (On Diff #77381) | Yes, because Builder is used in different member functions and the different member functions might not have any idea about the insertion point. Having it as a member variable helps updating the insertion point automatically after each new instruction creation. |
81 ↗ | (On Diff #77381) | I think it's covered by my previous comment. |
216 ↗ | (On Diff #77381) | We are using this builder with insertionpoint set to load instruction in multiple member functions where the member functions have no idea about the central insertion point. Therefore, we need to have it as a member variable set to the central insertion point for the other member functions. |
lib/Target/X86/X86InterleavedAccess.cpp | ||
---|---|---|
127 ↗ | (On Diff #77569) | It is always "load" or you are thinking about any general case? You can call the function decomposeLoad. |
141 ↗ | (On Diff #77569) | Zvi is right. And fix indentation, please. |
34 ↗ | (On Diff #77381) | so "Instruction* Inst" is enough. The "const" is redundant here. |
lib/Target/X86/X86InterleavedAccess.cpp | ||
---|---|---|
127 ↗ | (On Diff #77569) | Yes, that's the plan to use it in a general way, for decomposing any kind of instructions such as load, store, shuffle. |
141 ↗ | (On Diff #77569) | "And fix indentation, please" Instruction *NewLoad = Builder.CreateAlignedLoad(NewBasePtr, LI->getAlignment()); Are you talking about this? Indentation is not off here. I reran clang-format, everything remained as it is. |
69 ↗ | (On Diff #75878) | Sorry Zvi, I missed this comment earlier. Yes, that's correct. |
In general we should match whatever the original return values was - Type::getVectorNumElements() returns unsigned so the NumSubVectors should probably match that.
I'm sorry but I messed up on the types of a couple of the other instances - CreateShuffleVector should take an ArrayRef<uint32_t> and DataLayout::getTypeSizeInBits returns a uint64_t
lib/Target/X86/X86InterleavedAccess.cpp | ||
---|---|---|
196 ↗ | (On Diff #77569) | for (unsigned i = 0, e = Shuffles.size(); i != e ; ++i) |
LGTM. Thanks, Farhana!
X86InterleavedAccess.cpp | ||
---|---|---|
71 ↗ | (On Diff #77650) | Minor comment: AFAIU you can pass an ArrayRef object by value efficiently, but i did see several occurrences in LLVM's source code with ArrayRef's passed by reference. Whatever you choose, maybe make this entire file consistent (line 82, for example is inconsistent with line 71). |