Setting alignment of .text to PageAlign results in a wasted space of 2mb(or 4kb with -no-huge-pages) after program header table.
I think it doesn't make sense to align text to such a huge values, and aligning the segment itself is enough.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
The virtual address % page alignment must be equal to file offset % page alignment, does this ratio remains true without this code?
Of course it remains the same, the only difference is that .text comes right after program header.
The new text segment has newly allocated virtual address, offset and probably changed page alignment, I don't see how it remains the same in case the VA become aligned and the offset is don't.
Before patch
PHDR 0x200000 0x0000000000200000 0x0000000000200000 0x0002a0 0x0002a0 R 0x8 LOAD 0x200000 0x0000000000200000 0x0000000000200000 0x200404 0x200404 R E 0x200000 ... [26] .text PROGBITS 0000000000400000 400000 0001b1 00 AX 0 0 2097152
After
PHDR 0x200000 0x0000000000200000 0x0000000000200000 0x0002a0 0x0002a0 R 0x8 LOAD 0x200000 0x0000000000200000 0x0000000000200000 0x000744 0x000744 R E 0x200000 ... [26] .text PROGBITS 0000000000200340 200340 0001b1 00 AX 0 0 64
Offset is aligned with address, we just remove empty space both in file and in memory
In both cases we are loading from the same file offset the same amount of memory. Could you please give full sections headers before and after so I could compare them?
Full readelf before:
There are 34 section headers, starting at offset 0x400f40: Section Headers: [Nr] Name Type Address Off Size ES Flg Lk Inf Al [ 0] NULL 0000000000000000 000000 000000 00 0 0 0 [ 1] .interp PROGBITS 00000000000002a8 0002a8 00001c 00 A 0 0 1 [ 2] .note.ABI-tag NOTE 00000000000002c4 0002c4 000020 00 A 0 0 4 [ 3] .gnu.hash GNU_HASH 00000000000002e8 0002e8 000024 00 A 4 0 8 [ 4] .dynsym DYNSYM 0000000000000310 000310 0000a8 18 A 5 1 8 [ 5] .dynstr STRTAB 00000000000003b8 0003b8 000084 00 A 0 0 1 [ 6] .gnu.version VERSYM 000000000000043c 00043c 00000e 02 A 4 0 2 [ 7] .gnu.version_r VERNEED 0000000000000450 000450 000020 00 A 5 1 8 [ 8] .rela.dyn RELA 0000000000000470 000470 0000c0 18 A 4 0 8 [ 9] .rela.plt RELA 0000000000000530 000530 000018 18 AI 4 22 8 [10] .init PROGBITS 0000000000001000 001000 00001b 00 AX 0 0 4 [11] .plt PROGBITS 0000000000001020 001020 000020 10 AX 0 0 16 [12] .plt.got PROGBITS 0000000000001040 001040 000008 08 AX 0 0 8 [13] .bolt.org.text PROGBITS 0000000000001050 001050 0001a5 00 AX 0 0 16 [14] .fini PROGBITS 00000000000011f8 0011f8 00000d 00 AX 0 0 4 [15] .rodata PROGBITS 0000000000002000 002000 000012 00 A 0 0 4 [16] .bolt.org.eh_frame_hdr PROGBITS 0000000000002014 002014 000044 00 A 0 0 4 [17] .bolt.org.eh_frame PROGBITS 0000000000002058 002058 000110 00 A 0 0 8 [18] .init_array INIT_ARRAY 0000000000003de8 002de8 000008 08 WA 0 0 8 [19] .fini_array FINI_ARRAY 0000000000003df0 002df0 000008 08 WA 0 0 8 [20] .dynamic DYNAMIC 0000000000003df8 002df8 0001e0 10 WA 5 0 8 [21] .got PROGBITS 0000000000003fd8 002fd8 000028 08 WA 0 0 8 [22] .got.plt PROGBITS 0000000000004000 003000 000020 08 WA 0 0 8 [23] .data PROGBITS 0000000000004020 003020 000010 00 WA 0 0 8 [24] .tm_clone_table PROGBITS 0000000000004030 003030 000000 00 WA 0 0 8 [25] .bss NOBITS 0000000000004030 003030 000008 00 WA 0 0 1 [26] .text PROGBITS 0000000000400000 400000 0001b1 00 AX 0 0 2097152 [27] .eh_frame PROGBITS 00000000004001b8 4001b8 0001e0 00 A 0 0 8 [28] .eh_frame_hdr PROGBITS 0000000000400398 400398 00006c 00 A 0 0 1 [29] .comment PROGBITS 0000000000000000 400404 0000a2 01 MS 0 0 1 [30] .symtab SYMTAB 0000000000000000 4004a8 000648 18 31 47 8 [31] .strtab STRTAB 0000000000000000 400af0 000220 00 0 0 1 [32] .shstrtab STRTAB 0000000000000000 400d10 00015d 00 0 0 1 [33] .note.bolt_info NOTE 0000000000000000 400e6d 0000a8 00 0 0 1 Key to Flags: W (write), A (alloc), X (execute), M (merge), S (strings), I (info), L (link order), O (extra OS processing required), G (group), T (TLS), C (compressed), x (unknown), o (OS specific), E (exclude), l (large), p (processor specific) Elf file type is DYN (Shared object file) Entry point 0x40001c There are 12 program headers, starting at offset 2097152 Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align PHDR 0x200000 0x0000000000200000 0x0000000000200000 0x0002a0 0x0002a0 R 0x8 INTERP 0x0002a8 0x00000000000002a8 0x00000000000002a8 0x00001c 0x00001c R 0x1 [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2] LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x000548 0x000548 R 0x1000 LOAD 0x001000 0x0000000000001000 0x0000000000001000 0x000205 0x000205 R E 0x1000 LOAD 0x002000 0x0000000000002000 0x0000000000002000 0x000168 0x000168 R 0x1000 LOAD 0x002de8 0x0000000000003de8 0x0000000000003de8 0x000248 0x000250 RW 0x1000 LOAD 0x200000 0x0000000000200000 0x0000000000200000 0x200404 0x200404 R E 0x200000 DYNAMIC 0x002df8 0x0000000000003df8 0x0000000000003df8 0x0001e0 0x0001e0 RW 0x8 NOTE 0x0002c4 0x00000000000002c4 0x00000000000002c4 0x000020 0x000020 R 0x4 GNU_EH_FRAME 0x400398 0x0000000000400398 0x0000000000400398 0x00006c 0x00006c R 0x4 GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10 GNU_RELRO 0x002de8 0x0000000000003de8 0x0000000000003de8 0x000218 0x000218 R 0x1 Section to Segment mapping: Segment Sections... 00 01 .interp 02 .interp .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt 03 .init .plt .plt.got .bolt.org.text .fini 04 .rodata .bolt.org.eh_frame_hdr .bolt.org.eh_frame 05 .init_array .fini_array .dynamic .got .got.plt .data .bss 06 .text .eh_frame .eh_frame_hdr 07 .dynamic 08 .note.ABI-tag 09 .eh_frame_hdr 10 11 .init_array .fini_array .dynamic .got
Full readelf after:
There are 34 section headers, starting at offset 0x201280: Section Headers: [Nr] Name Type Address Off Size ES Flg Lk Inf Al [ 0] NULL 0000000000000000 000000 000000 00 0 0 0 [ 1] .interp PROGBITS 00000000000002a8 0002a8 00001c 00 A 0 0 1 [ 2] .note.ABI-tag NOTE 00000000000002c4 0002c4 000020 00 A 0 0 4 [ 3] .gnu.hash GNU_HASH 00000000000002e8 0002e8 000024 00 A 4 0 8 [ 4] .dynsym DYNSYM 0000000000000310 000310 0000a8 18 A 5 1 8 [ 5] .dynstr STRTAB 00000000000003b8 0003b8 000084 00 A 0 0 1 [ 6] .gnu.version VERSYM 000000000000043c 00043c 00000e 02 A 4 0 2 [ 7] .gnu.version_r VERNEED 0000000000000450 000450 000020 00 A 5 1 8 [ 8] .rela.dyn RELA 0000000000000470 000470 0000c0 18 A 4 0 8 [ 9] .rela.plt RELA 0000000000000530 000530 000018 18 AI 4 22 8 [10] .init PROGBITS 0000000000001000 001000 00001b 00 AX 0 0 4 [11] .plt PROGBITS 0000000000001020 001020 000020 10 AX 0 0 16 [12] .plt.got PROGBITS 0000000000001040 001040 000008 08 AX 0 0 8 [13] .bolt.org.text PROGBITS 0000000000001050 001050 0001a5 00 AX 0 0 16 [14] .fini PROGBITS 00000000000011f8 0011f8 00000d 00 AX 0 0 4 [15] .rodata PROGBITS 0000000000002000 002000 000012 00 A 0 0 4 [16] .bolt.org.eh_frame_hdr PROGBITS 0000000000002014 002014 000044 00 A 0 0 4 [17] .bolt.org.eh_frame PROGBITS 0000000000002058 002058 000110 00 A 0 0 8 [18] .init_array INIT_ARRAY 0000000000003de8 002de8 000008 08 WA 0 0 8 [19] .fini_array FINI_ARRAY 0000000000003df0 002df0 000008 08 WA 0 0 8 [20] .dynamic DYNAMIC 0000000000003df8 002df8 0001e0 10 WA 5 0 8 [21] .got PROGBITS 0000000000003fd8 002fd8 000028 08 WA 0 0 8 [22] .got.plt PROGBITS 0000000000004000 003000 000020 08 WA 0 0 8 [23] .data PROGBITS 0000000000004020 003020 000010 00 WA 0 0 8 [24] .tm_clone_table PROGBITS 0000000000004030 003030 000000 00 WA 0 0 8 [25] .bss NOBITS 0000000000004030 003030 000008 00 WA 0 0 1 [26] .text PROGBITS 0000000000200340 200340 0001b1 00 AX 0 0 64 [27] .eh_frame PROGBITS 00000000002004f8 2004f8 0001e0 00 A 0 0 8 [28] .eh_frame_hdr PROGBITS 00000000002006d8 2006d8 00006c 00 A 0 0 1 [29] .comment PROGBITS 0000000000000000 200744 0000a2 01 MS 0 0 1 [30] .symtab SYMTAB 0000000000000000 2007e8 000648 18 31 47 8 [31] .strtab STRTAB 0000000000000000 200e30 000220 00 0 0 1 [32] .shstrtab STRTAB 0000000000000000 201050 00015d 00 0 0 1 [33] .note.bolt_info NOTE 0000000000000000 2011ad 0000a8 00 0 0 1 Key to Flags: W (write), A (alloc), X (execute), M (merge), S (strings), I (info), L (link order), O (extra OS processing required), G (group), T (TLS), C (compressed), x (unknown), o (OS specific), E (exclude), l (large), p (processor specific) Elf file type is DYN (Shared object file) Entry point 0x20035c There are 12 program headers, starting at offset 2097152 Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align PHDR 0x200000 0x0000000000200000 0x0000000000200000 0x0002a0 0x0002a0 R 0x8 INTERP 0x0002a8 0x00000000000002a8 0x00000000000002a8 0x00001c 0x00001c R 0x1 [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2] LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x000548 0x000548 R 0x1000 LOAD 0x001000 0x0000000000001000 0x0000000000001000 0x000205 0x000205 R E 0x1000 LOAD 0x002000 0x0000000000002000 0x0000000000002000 0x000168 0x000168 R 0x1000 LOAD 0x002de8 0x0000000000003de8 0x0000000000003de8 0x000248 0x000250 RW 0x1000 LOAD 0x200000 0x0000000000200000 0x0000000000200000 0x000744 0x000744 R E 0x200000 DYNAMIC 0x002df8 0x0000000000003df8 0x0000000000003df8 0x0001e0 0x0001e0 RW 0x8 NOTE 0x0002c4 0x00000000000002c4 0x00000000000002c4 0x000020 0x000020 R 0x4 GNU_EH_FRAME 0x2006d8 0x00000000002006d8 0x00000000002006d8 0x00006c 0x00006c R 0x4 GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10 GNU_RELRO 0x002de8 0x0000000000003de8 0x0000000000003de8 0x000218 0x000218 R 0x1 Section to Segment mapping: Segment Sections... 00 01 .interp 02 .interp .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt 03 .init .plt .plt.got .bolt.org.text .fini 04 .rodata .bolt.org.eh_frame_hdr .bolt.org.eh_frame 05 .init_array .fini_array .dynamic .got .got.plt .data .bss 06 .text .eh_frame .eh_frame_hdr 07 .dynamic 08 .note.ABI-tag 09 .eh_frame_hdr 10 11 .init_array .fini_array .dynamic .got
As you see we make LOAD segment smaller by eliminating unnecessary padding, and we don't touch neither segment address nor offset/alignment
After editing the message it makes sense with different segments sizes before/after.
The thing is that the text is still padded here to 0x200000 which is good, the question is why it was double padded to 400000 before. It seems that it is done automatically by the llvm emitter, that is why the manually padding here is redundant.
If you mean text segment, yes it is properly aligned. As to why .text section was aligned additionally - well, it was indeed BinaryEmitter, but not automatically, it just used alignment which was passed to it. Now we stop passing huge alignment and all the padding is gone.
By automatically I meant that it is job for emitter. As I see the padding is still exists - between 003030 and 200340 file offset there is no data stored. But it is OK, the LOAD segment is still properly aligned for the huge pages.
Turns out we cannot remove padding with -hugify, so i preserve it in this case and hope all tests will pass.
Could you tell more what is the problem with hugify in this case? Huge pages might be used without hugify, would it affect the usage?
Hugify requires empty space before and after .text to properly remap it's contents. In abscence of -hugify it doesn't make a difference because dynamic loader operates on segments and i didn't change segment address or alignment.
The issue can also be solved if we force -hugify to remap whole segment in the same fashion as dynamic loader, but it's not trivial and we don't care that much
I see, thanks. Overall as I said this (double) alignment indeed seems to be excess. Let's wait what meta team members would tell us about that.
To me, it appears there are two outstanding issues that we need to solve. One is that we use large pages by default. Without hugify/hottext it doesn't make sense unless dynamic loader supports large pages. The second issue is that we align .text regardless of how many bytes it takes, leading sometimes to ~2MB "waste". Just using 64-byte alignment and relying on the containing segment being aligned to the page size will not always work. One example of the latter is "-use-old-text". I think using the MaxAlignBytes might be better suited to address this issue. What do you think?
To me, it appears there are two outstanding issues that we need to solve. One is that we use large pages by default. Without hugify/hottext it doesn't make sense unless dynamic loader supports large pages.
My understanding is that newer kernels/loaders support large pages and can properly map them without the use of -hugify, so it makes sense to use large pages by default. I wouldn't call it an issue, if you don't like it you can always use -no-huge-pages.
Why it will not work with -use-old-text? The whole point of this option is to put new text in place of old, and by trying to align it to 2 MB you make it unachievable unless the binary was prelinked with huge padding. If you still want 2MB aligned text, you can pass align-text=0x200000. I think this patch just makes defaults more reasonable, otherwise you can always align however you want. And i also don't understand why MaxAlignBytes might be better
Could you point to the kernel patch/version that enables it? It will be good to test the functionality.
Why it will not work with -use-old-text? The whole point of this option is to put new text in place of old, and by trying to align it to 2 MB you make it unachievable unless the binary was prelinked with huge padding.
What exactly is "unachievable"? For one of our apps, we use ICF to save ~20MB of code and the new code fits into the original space with 2MB alignment.
I think this patch just makes defaults more reasonable, otherwise you can always align however you want.
From the performance perspective, the default alignment of 64 bytes for .text does not sound reasonable to me . It should be page-aligned.
And i also don't understand why MaxAlignBytes might be better
It has an advantage to explicitly express the intention of not using more than MaxAlignBytes for alignment. The current patch has the assumption of having "small" program header before .text in a page-aligned segment which makes it fragile.
I researched a bit and can say the following things:
- I assumed the problem with hugify is just that older ld-linux or kernels ignored 0x200000 alignment of the segment, but turned out it is just part of the problem. It seems like it's caused by the loader, and with glibc 2.28(4.18.0 EulerOS) aligment is ignored, while on newer Ubuntu 5.10.0-1013-oem with glibc 2.31 addresses are assigned properly.
- I was wrong about out-of-box HugePage support, because aside from big alignment HugePages must be anonymous. Which means we need to use -hugify to insert code which will remap code section as anonymous and call madvise to ensure huge page. It also means that current implementation of -hugify is useless on newer kernels because if addresses are properly aligned, it does not create anonymous mapping and calls madvise right away, which does not return huge page(it can be checked by cat /proc/PID/smaps | grep HugePage, there will be all zeroes).
Why it will not work with -use-old-text? The whole point of this option is to put new text in place of old, and by trying to align it to 2 MB you make it unachievable unless the binary was prelinked with huge padding.
What exactly is "unachievable"? For one of our apps, we use ICF to save ~20MB of code and the new code fits into the original space with 2MB alignment.
Unachievable was wrong word, but on binaries i've seen hot + cold is usually a bit bigger than original .text even without alignment, so adding padding reduces chances to fit even more. 20MB saved by ICF must be a huge binary and relatively rare case. Anyway, it's not the matter here.
I think this patch just makes defaults more reasonable, otherwise you can always align however you want.
From the performance perspective, the default alignment of 64 bytes for .text does not sound reasonable to me . It should be page-aligned.
Aligning to regular page align seems reasonable in case of use-old-text(if we accept that for most mid-sized binaries it's not possible), even though i'm not sure it significantly affects performance without huge pages, but if we create new segment, 64 bytes seems fine, isn't it?
And i also don't understand why MaxAlignBytes might be better
It has an advantage to explicitly express the intention of not using more than MaxAlignBytes for alignment. The current patch has the assumption of having "small" program header before .text in a page-aligned segment which makes it fragile.
But we care more about alignment value than padding size. I think we're better off removing the assumption by explicitly handling use-old-text, instead of introducing new option.
It also means that current implementation of -hugify is useless on newer kernels because if addresses are properly aligned, it does not create anonymous mapping and calls madvise right away, which does not return huge page(it can be checked by cat /proc/PID/smaps | grep HugePage, there will be all zeroes).
Are you sure? They might be marked differently in /proc/pid/smaps, e.g. FilePmdMapped.
Aligning to regular page align seems reasonable in case of use-old-text(if we accept that for most mid-sized binaries it's not possible), even though i'm not sure it significantly affects performance without huge pages,
I don't have any performance data for regular-size pages. It will be good to collect some, e.g. for clang. If there's indeed no difference, we can do page-size alignment only for huge pages.
but if we create new segment, 64 bytes seems fine, isn't it?
Is it b/c we assume there's nothing but a PHDR table before the code in the new segment?
But we care more about alignment value than padding size.
Should we always align and disregarding the padding then (i.e. keep things as they are)? Sorry, if I missed your point.
Grepping for FilePmdMapped also doesn't show any non-zero values, so i'm pretty sure. And the doc for HugePages and Madvise says clearly that HugePages only work for anonymous mappings:
From here:
Currently, Transparent Huge Pages work only with private anonymous pages (see mmap(2)).
From here:
Currently THP only works for anonymous memory mappings and tmpfs/shmem. But in the future it can expand to other filesystems.
And here is what happens in the kernel:
call to khugepaged_enter_vma
call to hugepage_vma_check
abort if not anonymous
I don't have any performance data for regular-size pages. It will be good to collect some, e.g. for clang. If there's indeed no difference, we can do page-size alignment only for huge pages.
I'm not sure how to properly collect such data without polluting it with noise from OS and build system. Simply timing builds will produce a very diverse set of values without obvious correlation with the binary used.
Is it b/c we assume there's nothing but a PHDR table before the code in the new segment?
Well, if we create new segment it's guaranteed there's nothing but a PHDR table before the code because we call mapCodeSections before anything else.
Should we always align and disregarding the padding then (i.e. keep things as they are)? Sorry, if I missed your point.
My point is to align only when it makes sense, and aligning after PHDR does not. With use-old-text it's not so clear, so we can align in that case if you feel like it, but at least we shouldn't align after PHDR. And also it doesn't make sense to align to 2 MB in absence of hugify(because we won't get huge pages anyway), so i think we should act as if -no-huge-pages is true by default. What do you think?
And if we don't get Huge Pages without -hugify, why would we keep -no-huge-pages at all?
Thank you for the investigation on THPs and hugify. Indeed, it didn't work for me either. Putting munlock(2) before madvise(2) helps though. Could you please check D140425 on your end?
If you don't have a benchmark system setup, it's fine. I'll see if I can run something quickly.
Is it b/c we assume there's nothing but a PHDR table before the code in the new segment?
Well, if we create new segment it's guaranteed there's nothing but a PHDR table before the code because we call mapCodeSections before anything else.
At the moment - yes. So this is the assumption there.
Should we always align and disregarding the padding then (i.e. keep things as they are)? Sorry, if I missed your point.
My point is to align only when it makes sense, and aligning after PHDR does not.
How do you define what makes sense?
With use-old-text it's not so clear, so we can align in that case if you feel like it, but at least we shouldn't align after PHDR.
Why? Because PHDR is "small"?
And also it doesn't make sense to align to 2 MB in absence of hugify(because we won't get huge pages anyway), so i think we should act as if -no-huge-pages is true by default. What do you think?
I agree that it doesn't make sense to align at 2MB in the absence of hugify *and* --hot-text.
Internally, we use huge pages all the time, and at the time it was a sensible default choice. I agree that there's no need for 2MB alignment when huge pages are not used.
To be clear, when BOLT sees huge page symbols __hot_{start|end} on the input, it needs to use 2MB for code alignment. That's essentially the --hot-text option.
I applied the patch, and BOLTing helloworld, launching it from GDB and running cat /proc/PID/smaps | grep Huge or grep FilePmdMapped still doesn't show any non-zero values. I tried with /sys/kernel/mm/transparent_hugepage/enabled set to always and to madvise. I also tried processing redis-server, launching it and looking at it's mappings, and in that case i was able to find 6MB of HugePages, but it was mapped somewhere at 7ffff7400000-7ffff7c00000 which is definetely not .text section and looks like some internal anonymous mapping, and i am able to find it even if processing without the patch. If, however, i modify __bolt_hugify_self_impl to always call hugifyForOldKernel, i'm able to see HugePages for .text mapping both in helloworld and redis. All of the above was done on 5.10.0-1013-oem. But i also found that in kernel sources i linked above i missed a call to file_thp_enabled before check for anonymity. For this call to succeed, READ_ONLY_THP_FOR_FS config option must be enabled, and in our setup it is not set, so it's likely the cause of the issue. Can you check if it's enabled on your system?
How do you define what makes sense?
Well, something makes sense when benefits outweigh costs. The benefits of aligning after PHDR are not clear because PHDR occupies a small fraction of a page, especially in the context of huge pages, so it's unlikely to significantly increase TLB misses or otherwise affect performance. However, a benchmark would be interesting. While the costs are 2MB in padding space + in case of AArch64, more stubs inserted during longJmp. Processing clang without any flags renders 347691 stubs, with -no-huge-pages - 327125, and difference in .text size is 176 kb. Not that much, but it think it's reasonable to keep sections together unless there is a performance penalty. And i guess those 20k additional stubs also have some performance implications.
With use-old-text it's not so clear, so we can align in that case if you feel like it, but at least we shouldn't align after PHDR.
Why? Because PHDR is "small"?
Yes + the above
And also it doesn't make sense to align to 2 MB in absence of hugify(because we won't get huge pages anyway), so i think we should act as if -no-huge-pages is true by default. What do you think?
I agree that it doesn't make sense to align at 2MB in the absence of hugify *and* --hot-text.
I'll update the patch to set pageAlign to regular page size unless there's hugify or hot-text.
Now we just have to find more or less a reasonable strict definition for the alignment threshold. Some refactoring is required, but once we can pass MaxBytesToEmit to MCStreamer::emitCodeAlignment() for .text, it will solve the problem while also being future-proof. I believe it will also be helpful for the work you were planning to do on the complete binary rewrite.
Now we just have to find more or less a reasonable strict definition for the alignment threshold. Some refactoring is required, but once we can pass MaxBytesToEmit to MCStreamer::emitCodeAlignment() for .text, it will solve the problem while also being future-proof. I believe it will also be helpful for the work you were planning to do on the complete binary rewrite.
I don't fully understand why do you want to use emitCodeAlignment in the first place. From what i can see it's intended to align functions, and for sections we have Section->setAlignment() which is used currently with AlignText. Do you mean to declare some small "official" alignment with setAlignment(which will end up in the section header) and then align additionaly with emitCodeAlignment and MaxBytesToEmit before the first function? Why bother that much and what is wrong with current version of the patch?