This is an archive of the discontinued LLVM Phabricator instance.

[AIX] Replace ExternalSymbolSDNode with a MCSymbolSDNode for memcpy/memset/memmove
ClosedPublic

Authored by Xiangling_L on Nov 26 2019, 7:21 AM.

Details

Summary

For memcpy/memset/memmove, replace ExternalSymbolSDNode with a MCSymbolSDNode, which have a prefix dot before function name as entry point symbol.

Diff Detail

Event Timeline

Xiangling_L created this revision.Nov 26 2019, 7:21 AM
jasonliu added inline comments.Dec 3 2019, 12:45 PM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
5322

Are we assuming when entering here, Callee could only be "isFunctionGlobalAddress" and "ExternalSymbolSDNode"?
If so, an assertion here would be necessary. If not, we would end up replace the callee with a (MCSymbolXCOFF)nullptr, which would be wrong.

5345

The two if statement inside of "Subtarget.isAIXABI()" have very similar content. It would be great if we could do something to common things up.

5347

I still kinda prefer this to be in an assert to begin with.
But if we decide that we want this expensive pattern matching to run in non-assert build as well, then we could keep this way.
Is this some code that we are going to remove later when we think we covered all the cases? If so, do we want to leave a TODO here?

5351

nit: formatting off here.

5356

nit: formatting off here.

Xiangling_L marked 7 inline comments as done.

Rebase on latest master;
Address 1st round reviews;

Minor changes

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
5322

Good point. The Callee entering here won't only be isFunctionGlobalAddress and ExternalSymbolSDNode.

5345

I would prefer to make it cleaner as well, but when I take a closer look, though each part goes through similar process, each step is kinda different. Especially, for external symbol sdnode, we only cover three cases so far. All those make it a little bit tricky to common things up. I would appreciate that if you give some more detailed thoughts about how to common things up here.

5347

I can leave this open for a while to get more inputs and I will add a TODO as you suggested.

As Jason suggested, the code duplication can be reduced:

const auto ProduceDescriptorAndReplaceCallee =
    [&](StringRef FuncName, bool EnsureContainingCsectIsSet,
        const auto &GetStorageClass) {
      auto &Context = DAG.getMachineFunction().getMMI().getContext();

      MCSymbolXCOFF *S = cast<MCSymbolXCOFF>(
          Context.getOrCreateSymbol(Twine(".") + Twine(FuncName)));

      if (EnsureContainingCsectIsSet && !S->hasContainingCsect()) {
        // On AIX, an undefined symbol needs to be associated with a
        // MCSectionXCOFF to get the correct storage mapping class. In this
        // case, XCOFF::XMC_PR.
        MCSectionXCOFF *Sec = Context.getXCOFFSection(
            S->getName(), XCOFF::XMC_PR, XCOFF::XTY_ER, GetStorageClass(),
            SectionKind::getMetadata());
        S->setContainingCsect(Sec);
      }

      // Replace the callee SDNode with the MCSymbolSDNode.
      Callee = DAG.getMCSymbol(S, PtrVT);
      Ops[1] = Callee;
    };

if (isFunctionGlobalAddress(Callee)) {
  GlobalAddressSDNode *G = cast<GlobalAddressSDNode>(Callee);
  const GlobalObject *GO = cast<GlobalObject>(G->getGlobal());
  ProduceDescriptorAndReplaceCallee(
      GO->getName(), GO->isDeclaration(), [&] {
        return TargetLoweringObjectFileXCOFF::getStorageClassForGlobal(GO);
      });
} else if (ExternalSymbolSDNode *const E =
               dyn_cast<ExternalSymbolSDNode>(Callee)) {
  const char *SymName = E->getSymbol();

  // TODO: Remove this when the support for ExternalSymbolSDNode is
  // complete.
  if (strcmp(SymName, "memcpy") == 0 || strcmp(SymName, "memmove") == 0 ||
      strcmp(SymName, "memset") == 0) {
    ProduceDescriptorAndReplaceCallee(SymName, true,
                                      [] { return XCOFF::C_EXT; });
  }
}
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
5374

GO cannot be null or getName was called with a null pointer just above.

5376

The formatting is still off here. If the comment is merged back into one line, clang-format will produce:

// On AIX, an undefined symbol needs to be associated with a
// MCSectionXCOFF to get the correct storage mapping class. In this
// case, XCOFF::XMC_PR.
5377

The indentation for this line is affected as well.

5390

If the two conditions are mutually exclusive, then this should be an else if. If this check is meant to operate on the new Callee assigned in the block above, then a comment really should be added.

5392

Add a space after the colon and add "for" after "support":

// TODO: Remove this when the support for ExternalSymbolSDNode is complete.
5401

If this is not gone, please copy the corrected comment text from the upstream-reviewed version above.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
5403

What happens if the user is defining memset in this file?

sfertile added inline comments.Dec 10 2019, 7:53 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
5393

I think there will be a handful of other symbols we will need to white list for now as well (not needed in this patch, but sooner rather then later). I suggest we add a function that determines if the name an ExternalSDNode refers to is one we support by using a StringSwitch of the accepted names. When a name is not a supported/expected symbol then we can dump the SDNode as part of the debug tracing, and fatal error.

llvm/test/CodeGen/PowerPC/aix-external-sym-sdnode-lowering.ll
12

XL calls .___memmove (with three underscores) with no NOP to implement both memcpy and memmove. This is implemented in libc as an extended operation (with no TOC dependency). This is also the case with .___memset for memset.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
5393

There are other libc extended operations that don't seem to come through as an ExternalSDNode by default, such as for memcmp. @sfertile, do you have an suggested approach for this?

Xiangling_L marked 14 inline comments as done.Dec 16 2019, 7:10 PM
Xiangling_L added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
5390

Thanks for mentioning that, but in LLVM coding standard: Don’t use else after a return, so I didn't use else if here.

5393

I am not sure if I understand what you said correctly, but I think for those SDNode which doesn't come through as an ExternalSDNode will fall into GlobalAddressSDNode, for which we already add "." as a prefix for function entry point in previous commit?

If I am wrong, can you state your concern more clearly?

5403

I did some test, and I think it's worth bringing up the findings in the srum, but I will document what I found here.

[test_user_define_memcpy.c]
int memcpy ( void * destination, int num ) { return 3; }

typedef struct cpy_array {
  int array[20];
} cpy;

int main() {
  cpy A = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,16};
  cpy B = A;
  return B.array[0];
}
  1. XLC on AIX with no-opt:

XLC doesn't synthesize load & store to memcpy, the result is correct =1.

  1. GCC on AIX with no-opt:

GCC in this case didn't pick up library version memcpy, and the result was some random garbage value.

  1. LLVM with powerpc64le-unknown-linux-gnu target with no-opt:

the same as above GCC case.

So I think the following questions are:

  1. Is GCC and LLVM's behavior in this case a bug?
  2. Do we want to follow GCC's behavior?
  3. If the answer of 2) is yes, how do we want to achieve that, especially when the definition of memcpy is put after callInst of ExternalSymbolSDNode, like the following:
typedef struct cpy_array {
  int array[20];
} cpy;

int main() {
  cpy A = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,16};
  cpy B = A;
  return B.array[0];
}

int memcpy ( void * destination, int num ) { return 3; }
llvm/test/CodeGen/PowerPC/aix-external-sym-sdnode-lowering.ll
12

Thank you for bringing this up, we discussed it in the scrum today, and it's preferred to defer this to a performance related change when we identify more similar cases.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
5390

Exactly how does that part of the coding standard apply here? This if is reachable after the previous if block has been entered. Again, the previous if block modifies Callee. Is this check meant to operate upon the modified Callee?

5393

This has to do with mapping a call to memcmp to the entry point symbol named .___memcmp.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
5403

Regardless of whether the user-defined or the system-defined version gets called (I believe either is okay), we should make sure that the lookup and possible creation of the symbol here interacts reasonably with the definition or a user-declaration that changes the symbol properties (such as marking the function weak).

Xiangling_L planned changes to this revision.Dec 18 2019, 11:10 AM
Xiangling_L marked 4 inline comments as done.
Xiangling_L marked 5 inline comments as done.Dec 18 2019, 4:40 PM
Xiangling_L added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
5390

I see what you are talking about here. I refactored the code and will update the patch soon.

5393

It seems also belong to the follow-up performance patch. But I am also curious that any suggestion of it? @sfertile

5404

@hubert.reinterpretcast @cebowleratibm @sfertile @jasonliu Since we didn't get the chance to talk about this in the scrum, I am thinking I can put some of my thoughts here for discussion.

  1. I would treat GCC and LLVM 's behavior as a bug in this case. The compiler shouldn't synthesize to a memcpy which conflicts with user-defined memcpy and lead to some undefined behavior. The correct behavior seems should be when the compiler detect a user-defined/user-declared memcpy, it should give up the synthesis and keep original load and store.

Difficulties: We need a way to tell if there is a user-declared/user-defined memcpy in the translation unit when the compiler make the decision that if we want to replace a bunch of load and store with memcpy ExternalSymbolSDNode.

  1. On the other hand, if we want to follow GCC's behavior and try to pick up user-defined/user-declared memcpy to replace original bunch of load and store, we also need to way to tell if there is a user-declared/user-defined memcpy in the translation unit, based on which we set Csect accordingly.

In summary, it seems a way to tell if there is a user-declared/user-defined memcpy in the translation unit is a must, i.e. if there is a way to tell if there exists a memcpy GlobalObject. But so far I got no luck to figure out how to do that. Anyone has any thoughts about that?

Xiangling_L marked 2 inline comments as done.

Refactor the code

Clean the formatting issue

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
5404

The LLVM and GCC behaviour is acceptable. Under C90 subclause 7.1.3, a user definition of memcpy leads to undefined behaviour. At the assembly and object generation level, we can handle cases where the user leaves memcpy as a function in a somewhat reasonable manner. If the user defines memcpy as an object, then we will preferably have a way to keep both memcpy symbols around even if the resulting assembly or object file is not exactly valid.

As for finding out whether the user has defined memcpy, I think we can look into llvm::Module::getNamedValue.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
5229–5230

The assert is unnecessary. cast does not return null pointers.

5230

The name of this reads like an operation. It is not clear what the return value is (but it does return a value). Is this more getAIXFuncEntryPointSymbol?

5235

Is there a rationale to the ordering here? I don't see a semantic or lexical reason for the current ordering. Ordering helps when comparing lists.

5244

Do you mean that there is no name?

5249

I think this should be an MVT to match the return type.

llvm/test/CodeGen/PowerPC/aix-external-sym-sdnode-lowering.ll
41

The test does not cover the larger set of strings that you whitelisted. I'm not sure if that is intentional.

Xiangling_L marked 11 inline comments as done.Dec 23 2019, 2:05 PM
Xiangling_L added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
5235

Thanks, I will order them alphabetically.

5404

Do you mean we don't want to either follow LLVM and GCC's behavior as Under C90 subclause 7.1.3, a user definition of memcpy leads to undefined behavior or not synthesize memcpy but introduce a third way to handle this by generating invalid assembly or object files? Can you state a little bit clearly why we want to do this rather than following LLVM and GCC? And how does invalid assmbly or object file will possibly look like if take memcpy for example?

llvm/test/CodeGen/PowerPC/aix-external-sym-sdnode-lowering.ll
41

Thanks for mentioning that, I was planning to update the testcase/add more testcases when the above ExternalSymbolSDNode undefined behavior discussion is closed.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
5404

It is okay to synthesize calls to memcpy expecting that the user did not define it themselves; however, to encode such calls when the symbol is defined in the current translation unit as a variable could be problematic (although less so if the symbols are clearly distinguished, and with the final XCOFF, the object and the function entry point would be distinctly named). As for your question regarding why we are not following LLVM and GCC, I have not said suggested anything that contradicts the LLVM and GCC behaviour as you have described it. The implication in what I said is that the goal is to not trigger an internal compiler error. It seems the scope for which we need to be concerned is already limited by Clang:

error: redefinition of 'memcpy' as different kind of symbol

I guess a problematic case could still be presented through LLVM IR.

Xiangling_L marked 4 inline comments as done.

Update the pr to address user-defined/user-declared lib function issue;
Add more testcases;

jasonliu added inline comments.Jan 6 2020, 8:03 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
5232

This is returning a GlobalValue, which means it could also be a global variable. If so, I don't think we want to enter this branch.

llvm/test/CodeGen/PowerPC/aix-external-sym-sdnode-lowering.ll
2

do we care to put -mcpu=pwr4 in this test case?

llvm/test/CodeGen/PowerPC/aix-user-defined-memcpy.ll
7 ↗(On Diff #235228)

Maybe it's worth to explain what this test is for? For example, to make sure under this circumstances we would still set up the right memcpy symbol to call the user defined memcpy instead of external reference memcpy.
The information are all in the review comments which is good, but it would be great if people don't need to dig up the original review to find them.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
5229–5230

Add "a" before "user-defined".

5229–5230

Both instances of "user-defined" should be "user-declared".

5230

Add "the" before "user-defined".

5230

This code does not need to be inside an else block.

5233

dyn_cast can return a null pointer.

5235

This does not need to be inside an else block.

llvm/test/CodeGen/PowerPC/aix-user-defined-memcpy.ll
20 ↗(On Diff #235228)

This test should preferably check the symbol table for a .o file and the relocation associated with the call.

Xiangling_L marked 13 inline comments as done.Jan 7 2020, 7:37 AM
Xiangling_L added inline comments.
llvm/test/CodeGen/PowerPC/aix-external-sym-sdnode-lowering.ll
2

I didn't add -mcpu=pwr4 because not all ExternalSDNode we want to test can be generated under a specific cpu level. And I think this testcase is more focused on, if xxx` ExternalSymbolSDNode synthesized by compiler, we are able to map it to .xxx. So it doesn't really matter to add cpu level as well.

llvm/test/CodeGen/PowerPC/aix-user-defined-memcpy.ll
7 ↗(On Diff #235228)

Good point, I will add some comments in the testcase.

20 ↗(On Diff #235228)

Good point, I will add it in.

Xiangling_L marked 3 inline comments as done.

Address the comments;

Xiangling_L planned changes to this revision.Jan 7 2020, 7:58 AM

Because the best way to test which memcpy version we pick when there exists a user-declared one, we need to test the relocation of the call. So this patch would be moved to needs revision section, waiting for relocation patch to land first.

Update two testcases;

provide full context for diff;

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
5122

There is no demonstration that we should expect to encounter __floatdisf. Should we retain it in this list? Similarly for other cases not present in the test.

5155

Sorry for missing this earlier. Would it be better to name this getAIXFuncEntryPointSymbolSDNode?

5173

Remove this comment (it no longer fits here).

5206

I find it very jarring when the next statement after a closing brace is not at the same indentation level (and similarly when the last statement before a closing brace is not indented just one level in).

5209

This if does not need braces. I would much rather spend the braces on the if statement above.

llvm/test/CodeGen/PowerPC/aix-user-defined-memcpy.ll
55 ↗(On Diff #236688)

Use --expand-relocs to get past this message.

jasonliu added inline comments.Jan 8 2020, 7:37 AM
llvm/test/CodeGen/PowerPC/aix-user-defined-memcpy.ll
55 ↗(On Diff #236824)

Testing for this error does not make sense. I'm not sure if we plan to support unexpanded relocation anytime soon.
If you want to test relocation, you should add in --expand-relocs as one of your llvm-readobj option.
But currently the relocation patch is under review/revision, you would not get any relocation info, and it does not make sense for you to test something that's "invalid".
I'm not sure if you could do any better than a TODO here.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
5122

Same situation with __floatdidf on the new diff.

hubert.reinterpretcast added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
5233

The two ifs can be folded using dyn_cast_or_null.

llvm/test/CodeGen/PowerPC/aix-user-defined-memcpy.ll
2 ↗(On Diff #236824)

There should be no need to pipe into llvm-readobj. Place the split this into separate RUN lines between the .o generation and the inspection of the .o.

5 ↗(On Diff #236824)

You only need to produce the .o file once (in this case, on the first RUN line of the file).

31 ↗(On Diff #236824)

Replace "a.o file" with "a .o file".

34 ↗(On Diff #236824)

Use the [[:space:]] formulation (@DiggerLin can help you) so we can allow the symbol table index to change.

53 ↗(On Diff #236824)

Add a 32-SYM-NOT against further symbols named .memcpy.

Xiangling_L marked 12 inline comments as done.

delete extra ExternalSymSDNode in code;
modify func name;
update the testcase;

jasonliu accepted this revision.Jan 9 2020, 10:33 AM

LGTM.

This revision is now accepted and ready to land.Jan 9 2020, 10:33 AM

The test needs some minor changes that can be fixed on the commit. LGTM otherwise.

llvm/test/CodeGen/PowerPC/aix-user-defined-memcpy.ll
6 ↗(On Diff #237096)

You need --relocs in addition to --expand-relocs.

54 ↗(On Diff #237096)

This needs a comment explaining that we are expecting to have the test fail when the support for relocations land. This would need to be Relocation{{[[:space:]]}} to avoid matching on Relocations.

Xiangling_L marked 2 inline comments as done.

A bug is caught that IFunc is not supported on AIX, so we shouldn't have cast a GlobalValue to a GlobalObject for all PPC target.

I added an assertion about this, and prefer reviewers to take one more look and approve it if no concern raised. I should have caught this earlier, sorry for the inconvenience.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
5229–5230

Do you mean that GV must be a GlobalIFunc or must not be a GlobalIFunc?

Xiangling_L marked an inline comment as done.

minor change

Xiangling_L marked an inline comment as done.Jan 13 2020, 12:36 PM
Xiangling_L added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
5229–5230

Thanks for pointing this out, I messed it up when I moved code around. Fixed.

This revision was automatically updated to reflect the committed changes.