For memcpy/memset/memmove, replace ExternalSymbolSDNode with a MCSymbolSDNode, which have a prefix dot before function name as entry point symbol.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
5322 | Are we assuming when entering here, Callee could only be "isFunctionGlobalAddress" and "ExternalSymbolSDNode"? | |
5345 | The two if statement inside of "Subtarget.isAIXABI()" have very similar content. It would be great if we could do something to common things up. | |
5347 | I still kinda prefer this to be in an assert to begin with. | |
5351 | nit: formatting off here. | |
5356 | nit: formatting off here. |
Minor changes
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
5322 | Good point. The Callee entering here won't only be isFunctionGlobalAddress and ExternalSymbolSDNode. | |
5345 | I would prefer to make it cleaner as well, but when I take a closer look, though each part goes through similar process, each step is kinda different. Especially, for external symbol sdnode, we only cover three cases so far. All those make it a little bit tricky to common things up. I would appreciate that if you give some more detailed thoughts about how to common things up here. | |
5347 | I can leave this open for a while to get more inputs and I will add a TODO as you suggested. |
As Jason suggested, the code duplication can be reduced:
const auto ProduceDescriptorAndReplaceCallee = [&](StringRef FuncName, bool EnsureContainingCsectIsSet, const auto &GetStorageClass) { auto &Context = DAG.getMachineFunction().getMMI().getContext(); MCSymbolXCOFF *S = cast<MCSymbolXCOFF>( Context.getOrCreateSymbol(Twine(".") + Twine(FuncName))); if (EnsureContainingCsectIsSet && !S->hasContainingCsect()) { // On AIX, an undefined symbol needs to be associated with a // MCSectionXCOFF to get the correct storage mapping class. In this // case, XCOFF::XMC_PR. MCSectionXCOFF *Sec = Context.getXCOFFSection( S->getName(), XCOFF::XMC_PR, XCOFF::XTY_ER, GetStorageClass(), SectionKind::getMetadata()); S->setContainingCsect(Sec); } // Replace the callee SDNode with the MCSymbolSDNode. Callee = DAG.getMCSymbol(S, PtrVT); Ops[1] = Callee; }; if (isFunctionGlobalAddress(Callee)) { GlobalAddressSDNode *G = cast<GlobalAddressSDNode>(Callee); const GlobalObject *GO = cast<GlobalObject>(G->getGlobal()); ProduceDescriptorAndReplaceCallee( GO->getName(), GO->isDeclaration(), [&] { return TargetLoweringObjectFileXCOFF::getStorageClassForGlobal(GO); }); } else if (ExternalSymbolSDNode *const E = dyn_cast<ExternalSymbolSDNode>(Callee)) { const char *SymName = E->getSymbol(); // TODO: Remove this when the support for ExternalSymbolSDNode is // complete. if (strcmp(SymName, "memcpy") == 0 || strcmp(SymName, "memmove") == 0 || strcmp(SymName, "memset") == 0) { ProduceDescriptorAndReplaceCallee(SymName, true, [] { return XCOFF::C_EXT; }); } }
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
5374 | GO cannot be null or getName was called with a null pointer just above. | |
5376 | The formatting is still off here. If the comment is merged back into one line, clang-format will produce: // On AIX, an undefined symbol needs to be associated with a // MCSectionXCOFF to get the correct storage mapping class. In this // case, XCOFF::XMC_PR. | |
5377 | The indentation for this line is affected as well. | |
5390 | If the two conditions are mutually exclusive, then this should be an else if. If this check is meant to operate on the new Callee assigned in the block above, then a comment really should be added. | |
5392 | Add a space after the colon and add "for" after "support": // TODO: Remove this when the support for ExternalSymbolSDNode is complete. | |
5401 | If this is not gone, please copy the corrected comment text from the upstream-reviewed version above. |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
5403 | What happens if the user is defining memset in this file? |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
5393 | I think there will be a handful of other symbols we will need to white list for now as well (not needed in this patch, but sooner rather then later). I suggest we add a function that determines if the name an ExternalSDNode refers to is one we support by using a StringSwitch of the accepted names. When a name is not a supported/expected symbol then we can dump the SDNode as part of the debug tracing, and fatal error. |
llvm/test/CodeGen/PowerPC/aix-external-sym-sdnode-lowering.ll | ||
---|---|---|
12 | XL calls .___memmove (with three underscores) with no NOP to implement both memcpy and memmove. This is implemented in libc as an extended operation (with no TOC dependency). This is also the case with .___memset for memset. |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
5390 | Thanks for mentioning that, but in LLVM coding standard: Don’t use else after a return, so I didn't use else if here. | |
5393 | I am not sure if I understand what you said correctly, but I think for those SDNode which doesn't come through as an ExternalSDNode will fall into GlobalAddressSDNode, for which we already add "." as a prefix for function entry point in previous commit? If I am wrong, can you state your concern more clearly? | |
5403 | I did some test, and I think it's worth bringing up the findings in the srum, but I will document what I found here. [test_user_define_memcpy.c] int memcpy ( void * destination, int num ) { return 3; } typedef struct cpy_array { int array[20]; } cpy; int main() { cpy A = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,16}; cpy B = A; return B.array[0]; }
XLC doesn't synthesize load & store to memcpy, the result is correct =1.
GCC in this case didn't pick up library version memcpy, and the result was some random garbage value.
the same as above GCC case. So I think the following questions are:
typedef struct cpy_array { int array[20]; } cpy; int main() { cpy A = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,16}; cpy B = A; return B.array[0]; } int memcpy ( void * destination, int num ) { return 3; } | |
llvm/test/CodeGen/PowerPC/aix-external-sym-sdnode-lowering.ll | ||
12 | Thank you for bringing this up, we discussed it in the scrum today, and it's preferred to defer this to a performance related change when we identify more similar cases. |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
5390 | Exactly how does that part of the coding standard apply here? This if is reachable after the previous if block has been entered. Again, the previous if block modifies Callee. Is this check meant to operate upon the modified Callee? | |
5393 | This has to do with mapping a call to memcmp to the entry point symbol named .___memcmp. |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
5403 | Regardless of whether the user-defined or the system-defined version gets called (I believe either is okay), we should make sure that the lookup and possible creation of the symbol here interacts reasonably with the definition or a user-declaration that changes the symbol properties (such as marking the function weak). |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
5390 | I see what you are talking about here. I refactored the code and will update the patch soon. | |
5393 | It seems also belong to the follow-up performance patch. But I am also curious that any suggestion of it? @sfertile | |
5404 | @hubert.reinterpretcast @cebowleratibm @sfertile @jasonliu Since we didn't get the chance to talk about this in the scrum, I am thinking I can put some of my thoughts here for discussion.
Difficulties: We need a way to tell if there is a user-declared/user-defined memcpy in the translation unit when the compiler make the decision that if we want to replace a bunch of load and store with memcpy ExternalSymbolSDNode.
In summary, it seems a way to tell if there is a user-declared/user-defined memcpy in the translation unit is a must, i.e. if there is a way to tell if there exists a memcpy GlobalObject. But so far I got no luck to figure out how to do that. Anyone has any thoughts about that? |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
5404 | The LLVM and GCC behaviour is acceptable. Under C90 subclause 7.1.3, a user definition of memcpy leads to undefined behaviour. At the assembly and object generation level, we can handle cases where the user leaves memcpy as a function in a somewhat reasonable manner. If the user defines memcpy as an object, then we will preferably have a way to keep both memcpy symbols around even if the resulting assembly or object file is not exactly valid. As for finding out whether the user has defined memcpy, I think we can look into llvm::Module::getNamedValue. |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
5229–5230 | The assert is unnecessary. cast does not return null pointers. | |
5230 | The name of this reads like an operation. It is not clear what the return value is (but it does return a value). Is this more getAIXFuncEntryPointSymbol? | |
5235 | Is there a rationale to the ordering here? I don't see a semantic or lexical reason for the current ordering. Ordering helps when comparing lists. | |
5244 | Do you mean that there is no name? | |
5249 | I think this should be an MVT to match the return type. | |
llvm/test/CodeGen/PowerPC/aix-external-sym-sdnode-lowering.ll | ||
41 | The test does not cover the larger set of strings that you whitelisted. I'm not sure if that is intentional. |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
5235 | Thanks, I will order them alphabetically. | |
5404 | Do you mean we don't want to either follow LLVM and GCC's behavior as Under C90 subclause 7.1.3, a user definition of memcpy leads to undefined behavior or not synthesize memcpy but introduce a third way to handle this by generating invalid assembly or object files? Can you state a little bit clearly why we want to do this rather than following LLVM and GCC? And how does invalid assmbly or object file will possibly look like if take memcpy for example? | |
llvm/test/CodeGen/PowerPC/aix-external-sym-sdnode-lowering.ll | ||
41 | Thanks for mentioning that, I was planning to update the testcase/add more testcases when the above ExternalSymbolSDNode undefined behavior discussion is closed. |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
5404 | It is okay to synthesize calls to memcpy expecting that the user did not define it themselves; however, to encode such calls when the symbol is defined in the current translation unit as a variable could be problematic (although less so if the symbols are clearly distinguished, and with the final XCOFF, the object and the function entry point would be distinctly named). As for your question regarding why we are not following LLVM and GCC, I have not said suggested anything that contradicts the LLVM and GCC behaviour as you have described it. The implication in what I said is that the goal is to not trigger an internal compiler error. It seems the scope for which we need to be concerned is already limited by Clang: error: redefinition of 'memcpy' as different kind of symbol I guess a problematic case could still be presented through LLVM IR. |
Update the pr to address user-defined/user-declared lib function issue;
Add more testcases;
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
5232 | This is returning a GlobalValue, which means it could also be a global variable. If so, I don't think we want to enter this branch. | |
llvm/test/CodeGen/PowerPC/aix-external-sym-sdnode-lowering.ll | ||
2 | do we care to put -mcpu=pwr4 in this test case? | |
llvm/test/CodeGen/PowerPC/aix-user-defined-memcpy.ll | ||
7 ↗ | (On Diff #235228) | Maybe it's worth to explain what this test is for? For example, to make sure under this circumstances we would still set up the right memcpy symbol to call the user defined memcpy instead of external reference memcpy. |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
5229–5230 | Add "a" before "user-defined". | |
5229–5230 | Both instances of "user-defined" should be "user-declared". | |
5230 | Add "the" before "user-defined". | |
5230 | This code does not need to be inside an else block. | |
5233 | dyn_cast can return a null pointer. | |
5235 | This does not need to be inside an else block. | |
llvm/test/CodeGen/PowerPC/aix-user-defined-memcpy.ll | ||
20 ↗ | (On Diff #235228) | This test should preferably check the symbol table for a .o file and the relocation associated with the call. |
llvm/test/CodeGen/PowerPC/aix-external-sym-sdnode-lowering.ll | ||
---|---|---|
2 | I didn't add -mcpu=pwr4 because not all ExternalSDNode we want to test can be generated under a specific cpu level. And I think this testcase is more focused on, if xxx` ExternalSymbolSDNode synthesized by compiler, we are able to map it to .xxx. So it doesn't really matter to add cpu level as well. | |
llvm/test/CodeGen/PowerPC/aix-user-defined-memcpy.ll | ||
7 ↗ | (On Diff #235228) | Good point, I will add some comments in the testcase. |
20 ↗ | (On Diff #235228) | Good point, I will add it in. |
Because the best way to test which memcpy version we pick when there exists a user-declared one, we need to test the relocation of the call. So this patch would be moved to needs revision section, waiting for relocation patch to land first.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
5122 | There is no demonstration that we should expect to encounter __floatdisf. Should we retain it in this list? Similarly for other cases not present in the test. | |
5155 | Sorry for missing this earlier. Would it be better to name this getAIXFuncEntryPointSymbolSDNode? | |
5173 | Remove this comment (it no longer fits here). | |
5206 | I find it very jarring when the next statement after a closing brace is not at the same indentation level (and similarly when the last statement before a closing brace is not indented just one level in). | |
5209 | This if does not need braces. I would much rather spend the braces on the if statement above. | |
llvm/test/CodeGen/PowerPC/aix-user-defined-memcpy.ll | ||
55 ↗ | (On Diff #236688) | Use --expand-relocs to get past this message. |
llvm/test/CodeGen/PowerPC/aix-user-defined-memcpy.ll | ||
---|---|---|
55 ↗ | (On Diff #236824) | Testing for this error does not make sense. I'm not sure if we plan to support unexpanded relocation anytime soon. |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
5122 | Same situation with __floatdidf on the new diff. |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
5233 | The two ifs can be folded using dyn_cast_or_null. | |
llvm/test/CodeGen/PowerPC/aix-user-defined-memcpy.ll | ||
2 ↗ | (On Diff #236824) | There should be no need to pipe into llvm-readobj. Place the split this into separate RUN lines between the .o generation and the inspection of the .o. |
5 ↗ | (On Diff #236824) | You only need to produce the .o file once (in this case, on the first RUN line of the file). |
31 ↗ | (On Diff #236824) | Replace "a.o file" with "a .o file". |
34 ↗ | (On Diff #236824) | Use the [[:space:]] formulation (@DiggerLin can help you) so we can allow the symbol table index to change. |
53 ↗ | (On Diff #236824) | Add a 32-SYM-NOT against further symbols named .memcpy. |
The test needs some minor changes that can be fixed on the commit. LGTM otherwise.
llvm/test/CodeGen/PowerPC/aix-user-defined-memcpy.ll | ||
---|---|---|
6 ↗ | (On Diff #237096) | You need --relocs in addition to --expand-relocs. |
54 ↗ | (On Diff #237096) | This needs a comment explaining that we are expecting to have the test fail when the support for relocations land. This would need to be Relocation{{[[:space:]]}} to avoid matching on Relocations. |
A bug is caught that IFunc is not supported on AIX, so we shouldn't have cast a GlobalValue to a GlobalObject for all PPC target.
I added an assertion about this, and prefer reviewers to take one more look and approve it if no concern raised. I should have caught this earlier, sorry for the inconvenience.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
5229–5230 | Do you mean that GV must be a GlobalIFunc or must not be a GlobalIFunc? |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
5229–5230 | Thanks for pointing this out, I messed it up when I moved code around. Fixed. |
There is no demonstration that we should expect to encounter __floatdisf. Should we retain it in this list? Similarly for other cases not present in the test.