- On AIX, jump table is always relative;
- Put CPI and JTI into ReadOnlySection;
- Create the temp symbol for block address symbol;
- Update MIR testcases and add related assembly part;
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp | ||
---|---|---|
1757 | Nit: Formatting. | |
llvm/lib/CodeGen/MachineModuleInfo.cpp | ||
123 | IIUC we expect that the functions entry point symbol already exists, and has its containing csect properly set. If so it be more appropriate to use Context.lookupSymbol and have appropriate error handling if the symbol doesn't exist or if its containing csect is not set. |
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp | ||
---|---|---|
1757 | We could call member function of triple instead of raw comparison: |
Update testcases;
Address 1st round reviews.
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp | ||
---|---|---|
1757 | Sorry, but can you point out the formatting issue more explicitly? And the formatting here is got from clang-format. | |
1757 | Thanks, I will update it. | |
llvm/lib/CodeGen/MachineModuleInfo.cpp | ||
123 | Thanks, I will add an assertion after we do lookupSymbol for FnEntryPointSym. And since in getContainingCsect(), we already add assertion to test if the csect for the sym exists or not, I will omit the assertion here. |
When I apply your patch to master, it didn't apply cleanly. So please rebase.
And when I did some work to manually apply it, and run through clang-format, clang-format changed several places.
I suspect it's due to the tabs vs space issue.
I suggest to update this patch with after running the clang-format on it.
llvm/include/llvm/CodeGen/TargetLoweringObjectFileImpl.h | ||
---|---|---|
237 | Format looks off here. You might used tabs instead of space. | |
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp | ||
1758 | Format issue: |
Thanks for your advice, I rebased this PR and fixed the format issue. Please try again to see if you can apply it on latest master.
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp | ||
---|---|---|
1757 | Its fixed now, but phabricator was showing the code inside the if block was indented less then the if statement itself. | |
llvm/lib/CodeGen/MachineModuleInfo.cpp | ||
124 | minor nit: has --> have. | |
llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp | ||
1875 | Minor nit: IIUC then Comdats aren't supported by the XCOFF specification. The yet at the ned makes it sound to me like its something supported but not implemented in LLVM yet. | |
llvm/test/CodeGen/PowerPC/aix-lower-constant-pool-index.ll | ||
87 | I don't understand why we have a check-not in each of the tests for .tc. Is it because we haven't implemented the emission of the toc entries yet and its a reminder to fixup the test when we do implement that? |
The "make check-all" did not return clean for the test case you just added.
There is mismatch for the test result and assertions. You might be missing something in the code that you modified.
Sorry, please ignore the test did not come clean comments. I realized my branch did not have D70182 in it. After pulling it in, the test works as expected.
I will continue my review.
Other then the few nits I mentioned LGTM.
I did the same thing :)
llvm/test/CodeGen/PowerPC/aix-lower-constant-pool-index.ll | ||
---|---|---|
87 | Spoke offline with @Xiangling_L and the CHECK-NOTS are there so we can update the tests when we start writing out the toc entries. |
Some observation for constant pool lowering:
We put all the constant pool data into rodata section, and create one TC entry per constant pool reference.
It hinders the ability for linker to garbage collect unused data. (Generates them into a separate RO section helps a bit, but won't fully solve this issue)
And we would also create a lot of TC entries for constant pool that might not get cleanup.
For comparison, gcc generates the value of constant pool data directly into the TC entry. It's faster and garbage collectable if not used.
xlC generates all the constant pool data in a single csect, but reference different data with one TC entry. Less efficient, but still less TC entry generated, and it does not mess with the other rodata.
So with this implementation, it would work functionally. But performance and space wise, could not compare to the other two compilers.
I'm Okay with moving forward with some TODOs in our mind.
llvm/lib/CodeGen/MachineModuleInfo.cpp | ||
---|---|---|
126 | nit: CSect -> Csect |
I did some investigation about LLVM constant pool, as I remember, except for int value which will be put directly into TC entry, others like double, float etc. will be put into constant pool in LLVM. If we want to mimic GCC behavior, it probably would be a large piece of refactoring.
xlC generates all the constant pool data in a single csect, but reference different data with one TC entry. Less efficient, but still less TC entry generated, and it does not mess with the other rodata.
Sorry, but can you list your testcase and some explanatory emitted results here, I am not able to reproduce what you said. And based on my observation, xlC generates one RO csect and one TC entry for each global const value [eg. float a = 5.56].
So with this implementation, it would work functionally. But performance and space wise, could not compare to the other two compilers.
I'm Okay with moving forward with some TODOs in our mind.
Thank you for your advice. Some plan I have in mind for emitting jump table index is that once we start supporting unique sections something like -fdata-sections, then we can emit each JTI to unique RO section. So I put an assertion in getSectionForJumpTable. And based on your suggestion I think I can do something similar in getSectionForConstant. Please let me know if this will resolve your concerns.
For comparison, gcc generates the value of constant pool data directly into the TC entry. It's faster and garbage collectable if not used.
I did some investigation about LLVM constant pool, as I remember, except for int value which will be put directly into TC entry, others like double, float etc. will be put into constant pool in LLVM. If we want to mimic GCC behavior, it probably would be a large piece of refactoring.
I don't think llvm actually try to put int directly into TC entry, it always put a label in TC entry, or we just use that integer value directly in the assembly. But please correct me if I'm wrong.
We don't need to switch to that behavior right now, but later when we care about performance then we might want to think about it.
Here's my test case:
float foo(){ return 1.2; } double bar(){ return 3.4; }
GCC generates:
LC..0: .tc FS_3f99999a[TC],0x3f99999a lfs 0,LC..0(2) LC..1: .tc FD_400b3333_33333333[TC],0x400b3333,0x33333333 lfd 0,LC..1(2)
- Notice that the value is store directly into the TC entry.
xlc:
.csect H.20.NO_SYMBOL{RO}, 3 .long 0x3ff33333 # "?\36333" .long 0x33333333 # "3333" .long 0x400b3333 # "@\v33" .long 0x33333333 # "3333" T.18.NO_SYMBOL: .tc H.18.NO_SYMBOL{TC},H.20.NO_SYMBOL{RO} l r31,T.18.NO_SYMBOL(RTOC) lfd fp1,0(r31) l r31,T.18.NO_SYMBOL(RTOC) lfd fp1,8(r31)
- Notice that xlc only used 1 tc entry to access this RO data. It use a dedicate csect for constant pool instead of mixing with the other read only data. (you could add a "const int a= 10;" into the test case, and it won't use the same csect as constant pool)
xlC generates all the constant pool data in a single csect, but reference different data with one TC entry. Less efficient, but still less TC entry generated, and it does not mess with the other rodata.
Sorry, but can you list your testcase and some explanatory emitted results here, I am not able to reproduce what you said. And based on my observation, xlC generates one RO csect and one TC entry for each global const value [eg. float a = 5.56].
See my test case and explanations above.
So with this implementation, it would work functionally. But performance and space wise, could not compare to the other two compilers.
I'm Okay with moving forward with some TODOs in our mind.Thank you for your advice. Some plan I have in mind for emitting jump table index is that once we start supporting unique sections something like -fdata-sections, then we can emit each JTI to unique RO section. So I put an assertion in getSectionForJumpTable. And based on your suggestion I think I can do something similar in getSectionForConstant. Please let me know if this will resolve your concerns.
-fdata-sections would help for the garbage collection case. But for performance wise, we might still want to investigate if we could do something similar like what GCC does (at least for constant pool).
I don't think llvm actually try to put int directly into TC entry, it always put a label in there. But please correct me if I'm wrong.
Sorry, actually I meant that LLVM put int directly into instruction, there even won't be any TC entry for it. Others, like float, double etc. will be put into constant pool and each of them has a TC entry.
-fdata-sections would help for the garbage collection case. But for performance wise, we might still want to investigate if we could do something similar like what GCC does (at least for constant pool).
Yes, I agree that GCC behavior is faster because there is no one extra step to load real value from the place TC entry points to. I will do more investigation about this and see if we want to address this in this PR.
Format looks off here. You might used tabs instead of space.