Page MenuHomePhabricator

[COFF, ARM64] Remove definitions for _byteswap library functions
ClosedPublic

Authored by TomTan on Feb 7 2019, 11:01 AM.

Details

Summary

_byteswap_* functions are are implemented in below file as normal function
from libucrt.lib and declared in stdlib.h. Define them in intrin.h triggers
lld error "conflicting comdat type" and "duplicate symbols" which was just
added to LLD (https://reviews.llvm.org/D57324).

C:\Program Files (x86)\Windows Kits\10\Source\10.0.17763.0\ucrt\stdlib\byteswap.cpp

Diff Detail

Repository
rL LLVM

Event Timeline

TomTan created this revision.Feb 7 2019, 11:01 AM

Should clang IR generation be lowering these to bswap calls anyway? Even if the function technically exists in the ucrt, it's going to be pretty slow to call it.

Please leave the tests; fix the CHECK lines, if necessary, but we should still check it compiles.

TomTan updated this revision to Diff 185849.Feb 7 2019, 1:04 PM

Added the tests back. Clang IR should not lower these to bswap calls because they are global library functions. It might be slower to make the call to library function than bswap, but this is the same for other architectures supported by Windows. And just redefine global library function triggers link error in some scenarios.

Added the tests back. Clang IR should not lower these to bswap calls because they are global library functions. It might be slower to make the call to library function than bswap, but this is the same for other architectures supported by Windows. And just redefine global library function triggers link error in some scenarios.

I think there is some deeper reasoning is being omitted here.
Why does the fact that those are "global library functions" bans clang from lowering them into IR?
(and thus, "prevents" LLVM form doing optimizations)

TomTan added a comment.Feb 7 2019, 2:36 PM

Added the tests back. Clang IR should not lower these to bswap calls because they are global library functions. It might be slower to make the call to library function than bswap, but this is the same for other architectures supported by Windows. And just redefine global library function triggers link error in some scenarios.

I think there is some deeper reasoning is being omitted here.
Why does the fact that those are "global library functions" bans clang from lowering them into IR?
(and thus, "prevents" LLVM form doing optimizations)

The current issue could be caused by the implementation of comdat selection in LLD as below which provides more strict conflict check than MSVC link does.

These _bytewap_* in intrin.h were not intended as optimization, instead, it is expected to be consistent with declarations in stdlib.h. For optimization, it makes sense to enable it for all architectures supported by Windows, and make sure it works with LLD.

https://github.com/llvm-mirror/lld/blob/b6584c3fab115aa46dad27681af7eb3d4e5d2b35/COFF/InputFiles.cpp#L494

Added the tests back. Clang IR should not lower these to bswap calls because they are global library functions. It might be slower to make the call to library function than bswap, but this is the same for other architectures supported by Windows. And just redefine global library function triggers link error in some scenarios.

I think there is some deeper reasoning is being omitted here.
Why does the fact that those are "global library functions" bans clang from lowering them into IR?
(and thus, "prevents" LLVM form doing optimizations)

The current issue could be caused by the implementation of comdat selection in LLD as below which provides more strict conflict check than MSVC link does.

These _bytewap_* in intrin.h were not intended as optimization, instead, it is expected to be consistent with declarations in stdlib.h. For optimization, it makes sense to enable it for all architectures supported by Windows, and make sure it works with LLD.

https://github.com/llvm-mirror/lld/blob/b6584c3fab115aa46dad27681af7eb3d4e5d2b35/COFF/InputFiles.cpp#L494

These functions are also listed in https://github.com/llvm-mirror/clang/blob/c18e7e9007970a3105617f03bc9d1de89fa1a3e1/include/clang/Basic/Builtins.def#L773-L775
Are you sure they shouldn't be simply dropped from lib/Headers/intrin.h?

I notice they were just added in D56685, but that review has no commit message, so the problem it addressed is not documented..
So as-is this all looks a bit like hand-waving to me.

I did some quick testing with MSVC; apparently it inlines the implementations of these functions when optimizations are on. We definitely want to support inlining these. Since these are commonly used in performance-sensitive code, I'd prefer to implement the required changes to CodeGenFunction::EmitMSVCBuiltinExpr now, rather than chase after weird performance regressions in the future.


I'm not sure how you could end up with a "duplicate symbols" error from the current implementation, though; these functions are marked "static", so they shouldn't conflict with the real _byteswap_* functions.

TomTan added a comment.Feb 7 2019, 3:22 PM

Added the tests back. Clang IR should not lower these to bswap calls because they are global library functions. It might be slower to make the call to library function than bswap, but this is the same for other architectures supported by Windows. And just redefine global library function triggers link error in some scenarios.

I think there is some deeper reasoning is being omitted here.
Why does the fact that those are "global library functions" bans clang from lowering them into IR?
(and thus, "prevents" LLVM form doing optimizations)

The current issue could be caused by the implementation of comdat selection in LLD as below which provides more strict conflict check than MSVC link does.

These _bytewap_* in intrin.h were not intended as optimization, instead, it is expected to be consistent with declarations in stdlib.h. For optimization, it makes sense to enable it for all architectures supported by Windows, and make sure it works with LLD.

https://github.com/llvm-mirror/lld/blob/b6584c3fab115aa46dad27681af7eb3d4e5d2b35/COFF/InputFiles.cpp#L494

These functions are also listed in https://github.com/llvm-mirror/clang/blob/c18e7e9007970a3105617f03bc9d1de89fa1a3e1/include/clang/Basic/Builtins.def#L773-L775
Are you sure they shouldn't be simply dropped from lib/Headers/intrin.h?

I notice they were just added in D56685, but that review has no commit message, so the problem it addressed is not documented..
So as-is this all looks a bit like hand-waving to me.

The list in https://github.com/llvm-mirror/clang/blob/c18e7e9007970a3105617f03bc9d1de89fa1a3e1/include/clang/Basic/Builtins.def#L773-L775 doesn't require any implementation on Clang side,it provides below warning if they are not declared (include intrin.h or stdlib.h) before using. So it would not be affected.

test.cpp(3,12): warning: implicitly declaring library function '_byteswap_ushort' with type 'unsigned short (unsigned short)' [-Wimplicit-function-declaration]

return _byteswap_ushort(42);
       ^

test.cpp(3,12): note: include the header <stdlib.h> or explicitly provide a declaration for '_byteswap_ushort'

TomTan added a comment.Feb 7 2019, 3:30 PM

I did some quick testing with MSVC; apparently it inlines the implementations of these functions when optimizations are on. We definitely want to support inlining these. Since these are commonly used in performance-sensitive code, I'd prefer to implement the required changes to CodeGenFunction::EmitMSVCBuiltinExpr now, rather than chase after weird performance regressions in the future.


I'm not sure how you could end up with a "duplicate symbols" error from the current implementation, though; these functions are marked "static", so they shouldn't conflict with the real _byteswap_* functions.

It is a great idea to inline these in Clang, could this be tracked for a separate bug? The issue blocks Chromium build for Windows ARM64.

Yes, they are marked as static inline in intrin.h and are expected to link fine, but there are cases stdlib.h is included before including intrin.h, the former provides global declaration which seems inherited by the definition in intrin.h.

efriedma accepted this revision.Feb 7 2019, 3:47 PM

I guess we can track inlining separately, if you want to merge this quickly to unblock the Chrome build. LGTM

the former provides global declaration which seems inherited by the definition in intrin.h.

This is a bug, actually; we support a non-standard Microsoft extension for mismatched "static" markings, but the "static" is supposed to win.

Please make sure to file bugs for both the wrong linkage and inlining bswap.

This revision is now accepted and ready to land.Feb 7 2019, 3:47 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 11 2019, 12:03 PM

Added the tests back. Clang IR should not lower these to bswap calls because they are global library functions. It might be slower to make the call to library function than bswap, but this is the same for other architectures supported by Windows. And just redefine global library function triggers link error in some scenarios.

I think there is some deeper reasoning is being omitted here.
Why does the fact that those are "global library functions" bans clang from lowering them into IR?
(and thus, "prevents" LLVM form doing optimizations)

The current issue could be caused by the implementation of comdat selection in LLD as below which provides more strict conflict check than MSVC link does.

lld isn't supposed to be more strict than link.exe. lld used to not implement the comdat selection field until recently, so lld got more strict – but it should've gotten only as strict as link.exe, not moreso. Do you have an example where lld is more strict than link.exe?

Added the tests back. Clang IR should not lower these to bswap calls because they are global library functions. It might be slower to make the call to library function than bswap, but this is the same for other architectures supported by Windows. And just redefine global library function triggers link error in some scenarios.

I think there is some deeper reasoning is being omitted here.
Why does the fact that those are "global library functions" bans clang from lowering them into IR?
(and thus, "prevents" LLVM form doing optimizations)

The current issue could be caused by the implementation of comdat selection in LLD as below which provides more strict conflict check than MSVC link does.

lld isn't supposed to be more strict than link.exe. lld used to not implement the comdat selection field until recently, so lld got more strict – but it should've gotten only as strict as link.exe, not moreso. Do you have an example where lld is more strict than link.exe?

My previous reply is LLD comdat linking issue was misleading. The current issue is that when function declaration (extern) and definition (static inline) shown up in sequence, the function declaration property extern wins over static in the resulting COMDAT, which causes linking issue for both LLD and link.exe. But this issue was just exposed by the recent addition of duplicated symbols check in LLD.