Page MenuHomePhabricator

Lowering CPI/JTI/BA to assembly
ClosedPublic

Authored by Xiangling_L on Thu, Nov 14, 7:15 AM.

Details

Summary
  • On AIX, jump table is always relative;
  • Put CPI and JTI into ReadOnlySection;
  • Create the temp symbol for block address symbol;
  • Update MIR testcases and add related assembly part;

Diff Detail

Event Timeline

Xiangling_L created this revision.Thu, Nov 14, 7:15 AM
sfertile added inline comments.Thu, Nov 14, 9:48 AM
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1757

Nit: Formatting.

llvm/lib/CodeGen/MachineModuleInfo.cpp
124

IIUC we expect that the functions entry point symbol already exists, and has its containing csect properly set. If so it be more appropriate to use Context.lookupSymbol and have appropriate error handling if the symbol doesn't exist or if its containing csect is not set.

jasonliu added a comment.EditedFri, Nov 15, 7:28 AM

I'm not able to apply this patch cleanly please rebase. Thank you.

jasonliu added inline comments.Fri, Nov 15, 7:38 AM
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1757

We could call member function of triple instead of raw comparison:
TM.getTargetTriple().isOSAIX or TM.getTargetTriple().isOSBinFormatXCOFF
Same for all the raw comparison below.

Xiangling_L marked 6 inline comments as done.

Update testcases;
Address 1st round reviews.

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1757

Sorry, but can you point out the formatting issue more explicitly? And the formatting here is got from clang-format.

1757

Thanks, I will update it.

llvm/lib/CodeGen/MachineModuleInfo.cpp
124

Thanks, I will add an assertion after we do lookupSymbol for FnEntryPointSym. And since in getContainingCsect(), we already add assertion to test if the csect for the sym exists or not, I will omit the assertion here.

jasonliu added a comment.EditedFri, Nov 15, 10:55 AM

When I apply your patch to master, it didn't apply cleanly. So please rebase.
And when I did some work to manually apply it, and run through clang-format, clang-format changed several places.
I suspect it's due to the tabs vs space issue.
I suggest to update this patch with after running the clang-format on it.

llvm/include/llvm/CodeGen/TargetLoweringObjectFileImpl.h
237

Format looks off here. You might used tabs instead of space.

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1758

Format issue:
This line doesn't look like aligned.

Xiangling_L marked 2 inline comments as done.

Rebase the patch on the latest master;
Fix some formatting issues;

When I apply your patch to master, it didn't apply cleanly. So please rebase.
And when I did some work to manually apply it, and run through clang-format, clang-format changed several places.
I suspect it's due to the tabs vs space issue.
I suggest to update this patch with after running the clang-format on it.

Thanks for your advice, I rebased this PR and fixed the format issue. Please try again to see if you can apply it on latest master.

sfertile marked an inline comment as done.Mon, Nov 18, 10:28 AM
sfertile added inline comments.
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1757

Its fixed now, but phabricator was showing the code inside the if block was indented less then the if statement itself.

llvm/lib/CodeGen/MachineModuleInfo.cpp
125

minor nit: has --> have.

llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
1875

Minor nit: IIUC then Comdats aren't supported by the XCOFF specification. The yet at the ned makes it sound to me like its something supported but not implemented in LLVM yet.

llvm/test/CodeGen/PowerPC/aix-lower-constant-pool-index.ll
87

I don't understand why we have a check-not in each of the tests for .tc. Is it because we haven't implemented the emission of the toc entries yet and its a reminder to fixup the test when we do implement that?

The "make check-all" did not return clean for the test case you just added.
There is mismatch for the test result and assertions. You might be missing something in the code that you modified.

jasonliu requested changes to this revision.Mon, Nov 18, 2:15 PM
This revision now requires changes to proceed.Mon, Nov 18, 2:15 PM

Sorry, please ignore the test did not come clean comments. I realized my branch did not have D70182 in it. After pulling it in, the test works as expected.
I will continue my review.

sfertile accepted this revision.Mon, Nov 18, 3:52 PM

Other then the few nits I mentioned LGTM.

Sorry, please ignore the test did not come clean comments. I realized my branch did not have D70182 in it. After pulling it in, the test works as expected.
I will continue my review.

I did the same thing :)

llvm/test/CodeGen/PowerPC/aix-lower-constant-pool-index.ll
87

Spoke offline with @Xiangling_L and the CHECK-NOTS are there so we can update the tests when we start writing out the toc entries.

Some observation for constant pool lowering:
We put all the constant pool data into rodata section, and create one TC entry per constant pool reference.
It hinders the ability for linker to garbage collect unused data. (Generates them into a separate RO section helps a bit, but won't fully solve this issue)
And we would also create a lot of TC entries for constant pool that might not get cleanup.

For comparison, gcc generates the value of constant pool data directly into the TC entry. It's faster and garbage collectable if not used.
xlC generates all the constant pool data in a single csect, but reference different data with one TC entry. Less efficient, but still less TC entry generated, and it does not mess with the other rodata.

So with this implementation, it would work functionally. But performance and space wise, could not compare to the other two compilers.
I'm Okay with moving forward with some TODOs in our mind.

jasonliu added inline comments.Mon, Nov 18, 4:30 PM
llvm/lib/CodeGen/MachineModuleInfo.cpp
127

nit: CSect -> Csect

jasonliu accepted this revision.Tue, Nov 19, 6:51 AM
This revision is now accepted and ready to land.Tue, Nov 19, 6:51 AM
Xiangling_L marked 4 inline comments as done.Tue, Nov 19, 8:31 AM

Some observation for constant pool lowering:
We put all the constant pool data into rodata section, and create one TC entry per constant pool reference.
It hinders the ability for linker to garbage collect unused data. (Generates them into a separate RO section helps a bit, but won't fully solve this issue)
And we would also create a lot of TC entries for constant pool that might not get cleanup.

For comparison, gcc generates the value of constant pool data directly into the TC entry. It's faster and garbage collectable if not used.

I did some investigation about LLVM constant pool, as I remember, except for int value which will be put directly into TC entry, others like double, float etc. will be put into constant pool in LLVM. If we want to mimic GCC behavior, it probably would be a large piece of refactoring.

xlC generates all the constant pool data in a single csect, but reference different data with one TC entry. Less efficient, but still less TC entry generated, and it does not mess with the other rodata.

Sorry, but can you list your testcase and some explanatory emitted results here, I am not able to reproduce what you said. And based on my observation, xlC generates one RO csect and one TC entry for each global const value [eg. float a = 5.56].

So with this implementation, it would work functionally. But performance and space wise, could not compare to the other two compilers.
I'm Okay with moving forward with some TODOs in our mind.

Thank you for your advice. Some plan I have in mind for emitting jump table index is that once we start supporting unique sections something like -fdata-sections, then we can emit each JTI to unique RO section. So I put an assertion in getSectionForJumpTable. And based on your suggestion I think I can do something similar in getSectionForConstant. Please let me know if this will resolve your concerns.

jasonliu added a comment.EditedTue, Nov 19, 9:06 AM

For comparison, gcc generates the value of constant pool data directly into the TC entry. It's faster and garbage collectable if not used.

I did some investigation about LLVM constant pool, as I remember, except for int value which will be put directly into TC entry, others like double, float etc. will be put into constant pool in LLVM. If we want to mimic GCC behavior, it probably would be a large piece of refactoring.

I don't think llvm actually try to put int directly into TC entry, it always put a label in TC entry, or we just use that integer value directly in the assembly. But please correct me if I'm wrong.
We don't need to switch to that behavior right now, but later when we care about performance then we might want to think about it.
Here's my test case:

float foo(){
  return 1.2;
}
double bar(){
  return 3.4;
}

GCC generates:

LC..0:
  .tc FS_3f99999a[TC],0x3f99999a
lfs 0,LC..0(2)

LC..1:
  .tc FD_400b3333_33333333[TC],0x400b3333,0x33333333
lfd 0,LC..1(2)
  • Notice that the value is store directly into the TC entry.

xlc:

  .csect  H.20.NO_SYMBOL{RO}, 3
  .long 0x3ff33333              # "?\36333"
  .long 0x33333333              # "3333"
  .long 0x400b3333              # "@\v33"
  .long 0x33333333              # "3333"

T.18.NO_SYMBOL:
  .tc H.18.NO_SYMBOL{TC},H.20.NO_SYMBOL{RO}

  l          r31,T.18.NO_SYMBOL(RTOC)
  lfd        fp1,0(r31)

  l          r31,T.18.NO_SYMBOL(RTOC)
  lfd        fp1,8(r31)
  • Notice that xlc only used 1 tc entry to access this RO data. It use a dedicate csect for constant pool instead of mixing with the other read only data. (you could add a "const int a= 10;" into the test case, and it won't use the same csect as constant pool)

xlC generates all the constant pool data in a single csect, but reference different data with one TC entry. Less efficient, but still less TC entry generated, and it does not mess with the other rodata.

Sorry, but can you list your testcase and some explanatory emitted results here, I am not able to reproduce what you said. And based on my observation, xlC generates one RO csect and one TC entry for each global const value [eg. float a = 5.56].

See my test case and explanations above.

So with this implementation, it would work functionally. But performance and space wise, could not compare to the other two compilers.
I'm Okay with moving forward with some TODOs in our mind.

Thank you for your advice. Some plan I have in mind for emitting jump table index is that once we start supporting unique sections something like -fdata-sections, then we can emit each JTI to unique RO section. So I put an assertion in getSectionForJumpTable. And based on your suggestion I think I can do something similar in getSectionForConstant. Please let me know if this will resolve your concerns.

-fdata-sections would help for the garbage collection case. But for performance wise, we might still want to investigate if we could do something similar like what GCC does (at least for constant pool).

I don't think llvm actually try to put int directly into TC entry, it always put a label in there. But please correct me if I'm wrong.

Sorry, actually I meant that LLVM put int directly into instruction, there even won't be any TC entry for it. Others, like float, double etc. will be put into constant pool and each of them has a TC entry.

-fdata-sections would help for the garbage collection case. But for performance wise, we might still want to investigate if we could do something similar like what GCC does (at least for constant pool).

Yes, I agree that GCC behavior is faster because there is no one extra step to load real value from the place TC entry points to. I will do more investigation about this and see if we want to address this in this PR.

This revision was automatically updated to reflect the committed changes.