This is an archive of the discontinued LLVM Phabricator instance.

Implement __aeabi_c{d,f}{cmpeq,cmple,rcmple}.
ClosedPublic

Authored by jmgao on Aug 17 2015, 2:04 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

jmgao updated this revision to Diff 32342.Aug 17 2015, 2:04 PM
jmgao retitled this revision from to Implement __aeabi_c{d,f}{cmpeq,cmple,rcmple}..
jmgao updated this object.
jmgao added reviewers: compnerd, rengolin.
jmgao set the repository for this revision to rL LLVM.
jmgao added a subscriber: llvm-commits.
compnerd added inline comments.Aug 17 2015, 7:43 PM
lib/builtins/arm/aeabi_cdcmp.S
12

Is this actually portable? Can you rewrite this as:

__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__

or even:

__ARMEB__
13

Please replace FIXME with a slightly better message. How about something like:

#error big endian support not implemented
21

Did you mean the _helper here?

28

Please add a comment that as per the RTABI, this function *must* save/restore the core registers except ip, lr, and CPSR. And a note about lr since its not immediately obvious that it is for compactness.

31

Not a big fan of this helper name. It simply checks for NaNs.

33

Cant you hoist the pop {r0-r3} here?

39

Indicate that this is safe now since we have ruled out NaNs, so cmple can't trap.

83

Why not just swap and tail call?

lib/builtins/arm/aeabi_cdcmpeq_helper.c
13

Don't you need to check the mantissa to determine if it is a NaN?

lib/builtins/arm/aeabi_cfcmpeq_helper.c
13

Why the mask? Don't you need to check if the mantissa to determine if it is a NaN?

srhines added inline comments.Aug 17 2015, 7:51 PM
lib/builtins/arm/aeabi_cdcmp.S
20

Double comment on this line, and the rest of the lines are formatted strangely.

46

These comments don't match the implementation below, so something seems wrong.

lib/builtins/arm/aeabi_cdcmpeq_helper.c
19

Couldn't this be using __builtin_isnan instead of redefining it?

lib/builtins/arm/aeabi_cfcmpeq_helper.c
19

I have the same comment about using __builtin_isnan() here too.

rengolin edited edge metadata.Aug 18 2015, 4:19 AM

I agree with Stephen and Saleem on their comments, and I have some additional ones.

Also, avoid using push lr / pop pc, since this behaviour doesn't work across all ARM cores. There's a macro for returning that should be used.

cheers,
--renato

lib/builtins/arm/aeabi_cdcmp.S
20

you can move the comment up one line. Though, you may want to encode the logic in the example and avoid a comment.

31

me neither. If it's small enough, you can either write in asm or move to a separate static function in this file with a more descriptive name.

lib/builtins/arm/aeabi_cdcmpeq_helper.c
19

or even calling it directly from the asm function?

Thanks for the comments, I'll fix them up and have an updated patch shortly

lib/builtins/arm/aeabi_cdcmp.S
12

Fixed

13

Fixed

33

Yep, done

39

Fixed

46

Oops, the comments are incorrect. Fixed

83

I misinterpreted the RTABI doc, I was applying the "3 way status returning functions preserve all registers except ip, lr, cpsr" to the reversed 3 way functions as well.

Fixed

test/builtins/Unit/arm/call_apsr.S
12

indentation broken here

jmgao marked 15 inline comments as done.Aug 18 2015, 10:10 AM

Oops, I somehow managed to overlook the Done checkbox in Phabricator.

jmgao added inline comments.Aug 18 2015, 10:14 AM
lib/builtins/arm/aeabi_cfcmp.S
13

This comment is wrong

jmgao added a comment.Aug 18 2015, 1:52 PM

Also, avoid using push lr / pop pc, since this behaviour doesn't work across all ARM cores. There's a macro for returning that should be used.

Ugh, I didn't know that ARMv4's pop {pc} doesn't work with switching Thumb mode. Are there any other problems with that?

There are several other functions that use push {lr}, pop {pc}. What do you think about something along the lines of the following?

#if __ARM_ARCH > 4 || !defined(__ARM_ARCH_ISA_THUMB)
#define POP_FRAME(registers) \
    pop {registers, pc}
#else
#define POP_FRAME(registers) \
    pop {registers}          \
    pop {lr}                 \
    JMP(lr)
#endif
jmgao updated this revision to Diff 32474.Aug 18 2015, 4:28 PM
jmgao edited edge metadata.
jmgao removed rL LLVM as the repository for this revision.
jmgao marked 7 inline comments as done.

Address most of the comments

lib/builtins/arm/aeabi_cdcmp.S
28

If I'm reading it correctly, this function is not actually required to preserve the argument registers, only the c?cmple ones are. We just happen to need them after checking for NaNs.

I added the comment to c?cmple.

31

Renamed it to __aeabi_cdcmpeq_check_nan

lib/builtins/arm/aeabi_cdcmpeq_helper.c
19

Doesn't __builtin_isnan do overloading via magic and/or generic macros? Is there an actual symbol I can jump to?

lib/builtins/arm/aeabi_cfcmpeq_helper.c
13

Yeah, this is wrong, I replaced it with a __builtin_nan instead

srhines added inline comments.Aug 18 2015, 5:20 PM
lib/builtins/arm/aeabi_cdcmpeq_helper.c
19

It looks like you resolved the issue here (by calling through with the appropriate C type). Was this just a stale reply?

jmgao marked 2 inline comments as done.Aug 18 2015, 5:42 PM
jmgao added inline comments.
lib/builtins/arm/aeabi_cdcmpeq_helper.c
19

It was a response to the "calling it directly from the asm function" post from @rengolin in case that there actually is a way to call __builtin_isnan directly from asm, so I could do that instead.

compnerd added inline comments.Aug 18 2015, 7:43 PM
lib/builtins/arm/aeabi_cdcmp.S
30

Yeah, after a second (or third) reading, I agree. This function is exempt from that restriction. Im not sure what the benefit of saving lr here is, as you can just as well as bx lr in one case, and just tail call as you do in the other case.

34

I think that this plus the highlighting in my copy of the RTABI document caused the confusion. Saving behaviour of cdcmple is irrelevant.

35

You hoisted it one instruction too early. You just clobbered the result (r0).

45

Wouldn't this be shorter as:

__aeabi_cdcmpeq:
    push {r0-r3}
    blx __aeabi_cdcmpeq_check_nan
    cmp r0, #1
    pop {r0-r3}
    beq __aeabi_cdcmple
    msr CPSR_f, #APSR_C
    bx lr
74

It seems you can factor out the setting of the CPSR, and do a single return. Something along the lines of:

__aeabi_cdcmple:
  push {r0-r3}
  blx __aeabi_dcmplt
  cmp r0, #1
  moveq ip, #0
  beq 1f
  ldm sp, {r0-r3}
  blx __aeabi_dcmpeq
  cmp r0, #1
  moveq ip, #(APSR_Z | APSR_C)
  beq 1f
  mov ip, #APSR_C
1:msr CPSR_f, ip
  pop {r0-r3}
  bx lr
jmgao marked 2 inline comments as done.Aug 18 2015, 9:38 PM
jmgao added inline comments.
lib/builtins/arm/aeabi_cdcmp.S
30

The call to the check_nan helper trashes lr

35

Whoops. I'm not sure how I missed that. I guess the tests passed because it's not changing the behavior if the implementation doesn't trap on NaN and you don't pass in a float with a value of 1 in the first word.

45

Yes it would, except {r0-r3, lr} for the push/pop since the call to __aeabi_cdcmpeq_check_nan will trash lr.

76

How about this?

__aeabi_cdcmple:
    push {r0-r4, lr}

    bl __aeabi_dcmplt
    cmp r0, #1
    moveq r4, #0
    beq 1f

    ldm sp, {r0-r3}
    bl __aeabi_dcmpeq
    cmp r0, #1
    moveq r4, #(APSR_C | APSR_Z)
    movne r4, #(APSR_C)
1:
    msr CPSR_f, r4
    pop {r0-r4, pc}
compnerd added inline comments.Aug 19 2015, 10:41 AM
lib/builtins/arm/aeabi_cdcmp.S
45

Ah, right.

76

I like the idea of the moveq/movne ... though, Im not sure I understand the purpose of using r4 rather than ip which is already a scratch register. The pop {pc} has the issue that Renato pointed out wrt older chips, (the stack savings are probably not important).

jmgao updated this revision to Diff 32586.Aug 19 2015, 11:26 AM
jmgao marked an inline comment as done.
jmgao added inline comments.
lib/builtins/arm/aeabi_cdcmp.S
76

Oh yeah, using ip is fine here. I originally had orr r4, r4, #APSR_C; orreq r4, r4, #APSR_Z instead of the direct mov.

I added a macro for doing either pop {pc} or pop {ip}; bx ip

compnerd accepted this revision.Aug 19 2015, 8:56 PM
compnerd edited edge metadata.

Missing file headers on call_apsr.S and call_apr.h. Please use the DEFINE_COMPILERRT_PRIVATE_FUNCTION macros for call_apr_{d,f}. LGTM otherwise.

This revision is now accepted and ready to land.Aug 19 2015, 8:56 PM
jmgao added a comment.Aug 20 2015, 3:26 PM

Please use the DEFINE_COMPILERRT_PRIVATE_FUNCTION macros for call_apr_{d,f}.

Is #include "../../../../lib/builtins/assembly.h okay?

jmgao updated this revision to Diff 32765.Aug 20 2015, 4:04 PM
jmgao edited edge metadata.

Use DEFINE_COMPILERRT_PRIVATE_FUNCTION in call_apsr.S, file headers.

This revision was automatically updated to reflect the committed changes.