Hi all,
This is my initial attempt at implementing a __builtin_memcmp_inline function in LLVM. The motivation is to use it in an optimized LLVM Libc implementation of memcmp whilst still using -ffreestanding and -fno-builtin-memcmp.
I haven't added any tests yet, and I know for a fact this goes completely wrong if your memcmp size is big enough to hit the maximum number of loads, or if you compile for size. I just wanted to bring up this RFC before I start going down the wrong path.
My current thinking is the following:
- Inline even when compiling for size. The use of this builtin should force inlining where possible, regardless of global compilation options. My reasoning here is that people who really don't want to inline due to size considerations shouldn't use the inline variant of this builtin.
- I'm split on what to do when the memcmp size is large enough to hit the maximum number of loads. I think there are three options:
  a) We ignore the maximum number of loads and just generate a lot of code (kind of the same strategy as with memcpy).
  b) We use a call to memcmp (and issue a diagnostic saying we did so?).
  c) Error out.
  I'm not keen on (c). I suspect (a) would probably make the most sense, and we can hope that users of the builtin are sensible.
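To make the trade-off between options (a) and (b) concrete, here is a rough sketch in plain C of what a compare expanded into loads could look like. It is only an illustration, not the code in this patch: the names memcmp_inline_sketch, load_be64 and sign_of_diff, the max_loads parameter and the way loads are counted are all assumptions made up for this example, and the real expansion happens in the backend rather than in source.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 8 bytes as a big-endian integer so that
   unsigned integer order matches memcmp's byte-wise order. */
static inline uint64_t load_be64(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper: -1, 0 or 1 depending on how x compares to y. */
static inline int sign_of_diff(uint64_t x, uint64_t y) {
    return (x > y) - (x < y);
}

/* Hypothetical stand-in for the builtin (not the patch's implementation).
   max_loads == 0 models option (a): no cap, always expand.
   Otherwise, sizes needing more loads fall back to a memcmp call,
   which models option (b). */
static inline int memcmp_inline_sketch(const void *a, const void *b,
                                       size_t n, size_t max_loads) {
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    /* Assumed cost model: one load per 8-byte chunk plus one per tail byte. */
    size_t loads = n / 8 + n % 8;
    if (max_loads != 0 && loads > max_loads)
        return memcmp(a, b, n);

    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t x = load_be64(pa + i);
        uint64_t y = load_be64(pb + i);
        if (x != y)
            return sign_of_diff(x, y);
    }
    for (; i < n; ++i) {
        if (pa[i] != pb[i])
            return sign_of_diff(pa[i], pb[i]);
    }
    return 0;
}
```

With a constant n, an optimizing compiler would typically be able to unroll these loops into straight-line loads and compares, so the amount of code grows with n when no cap is applied. Only the sign of the result is meaningful here, as with memcmp.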
Let me know if there are any strong opinions on this. Please add other reviewers if you know they would have an interest in this.
Kind regards,
Andre