This is an archive of the discontinued LLVM Phabricator instance.

[AVR] Fix miscompilation of zext + add
ClosedPublic

Authored by aykevl on Apr 18 2020, 5:37 PM.

Details

Summary

Code like the following:

define i32 @foo(i32 %a, i1 zeroext %b) addrspace(1) {
entry:
  %conv = zext i1 %b to i32
  %add = add nsw i32 %conv, %a
  ret i32 %add
}

Would compile to the following (incorrect) code:

foo:
    mov     r18, r20
    clr     r19
    add     r22, r18
    adc     r23, r19
    sbci    r24, 0
    sbci    r25, 0
    ret

Those sbci instructions are clearly wrong, they should have been adc instructions.

This commit improves codegen to use adc instead:

foo:
    mov     r18, r20
    clr     r19
    ldi     r20, 0
    ldi     r21, 0
    add     r22, r18
    adc     r23, r19
    adc     r24, r20
    adc     r25, r21
    ret

This code is not optimal (it could be just 5 instructions instead of the current 9) but at least it doesn't miscompile.


WARNING: I don't know what I'm doing here. While removing that pattern works, I don't know why it's there or whether I just introduced a different bug.

Diff Detail

Event Timeline

aykevl created this revision.Apr 18 2020, 5:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 18 2020, 5:37 PM
Herald added subscribers: Jim, hiraditya. · View Herald Transcript

I'm not sure that this is a miscompile. Note that AVR has no "add with immediate" instruction, instead subsumed by the "subtract with immediate" instruction, which of course with immediate values being immediately known to the compiler, can be negated directly at the callsite to effectively form a "add with immediate" instruction.

In this case, with the two snippets

(trunk)

foo:
    mov     r18, r20
    clr     r19
    add     r22, r18
    adc     r23, r19
    sbci    r24, 0
    sbci    r25, 0
    ret

(with patch)

foo:
    mov     r18, r20
    clr     r19
    ldi     r20, 0
    ldi     r21, 0
    add     r22, r18
    adc     r23, r19
    adc     r24, r20
    adc     r25, r21
    ret

The first two instructions and the final ret are identical in both, let's remove.

(trunk)

add     r22, r18
adc     r23, r19
sbci    r24, 0
sbci    r25, 0

(with patch)

ldi     r20, 0
ldi     r21, 0
add     r22, r18
adc     r23, r19
adc     r24, r20
adc     r25, r21

With the patch, the LDIs and r20 and r21 aren't used until the last two ADCs, let's reorder

(with patch)

add     r22, r18
adc     r23, r19
ldi     r20, 0
ldi     r21, 0
adc     r24, r20
adc     r25, r21

This is equivalent assembly as before, and now the two snippets both have the exact same initial two add+adc instructions. Let's remove from both snippets, leaving

(trunk)

sbci    r24, 0
sbci    r25, 0

(with patch)

ldi     r20, 0
ldi     r21, 0
adc     r24, r20
adc     r25, r21

So the only possible difference between the two variants, is in the addition of last two bytes from the 32-bit zero extended i1 value.

The remaining (trunk) asm just subtracts zero from the upper half of the non-extended 32-bit integer %a in registers r24:r25. This is the compiler inserting an addition of the immediate 0 (as -0 = 0). This makes sense because the corresponding bits from the i1 value have to be zero because it was zero-extended to 32-bits. A subtraction of zero is always redundant, therefore the two SBCIs from the first patch are completely redundant, and could be removed by the compiler in a future optimization.

The remaining (with patch) asm just adds zero to the upper half of the non-zero extended 32-bit integer %a in registers r24:25 but with extra steps to load the zero value into a register.

The only two remaining patches are completely identical in terms of execution as far as I can tell.

In summary: I think this is not a miscompile, let me know if you still think otherwise.

dylanmckay requested changes to this revision.Apr 18 2020, 11:16 PM
This revision now requires changes to proceed.Apr 18 2020, 11:16 PM
aykevl added a comment.EditedApr 19 2020, 7:01 AM

It took me much longer than it should to find a solid reproducer, but here is one:

// in one file
uint32_t add(uint32_t a, uint16_t b) {
    return a + b;
}

// in another file
uint32_t add(uint32_t a, uint16_t b);


// somewhere in the code of that file
uart_printint("add", add(0x7ffff, 1));

Adding 0x7ffff and 1 should result in 0x80000. However, without this patch it results in 0x60000.

Assembly without this patch:

d20:       64 0f           add     r22, r20
d22:       75 1f           adc     r23, r21
d24:       80 40           sbci    r24, 0x00       ; 0
d26:       90 40           sbci    r25, 0x00       ; 0
d28:       08 95           ret

With this patch:

d1c:       20 e0           ldi     r18, 0x00       ; 0
d1e:       30 e0           ldi     r19, 0x00       ; 0
d20:       64 0f           add     r22, r20
d22:       75 1f           adc     r23, r21
d24:       82 1f           adc     r24, r18
d26:       93 1f           adc     r25, r19
d28:       08 95           ret

A subtraction of zero is always redundant, therefore the two SBCIs from the first patch are completely redundant, and could be removed by the compiler in a future optimization.

The sbci not only subtracts an immediate (in this case zero), but it also subtracts the carry bit. Therefore it definitely does have an effect. This also is true for the last two adc instructions: they are needed to add the carry bit in case the previous add instruction caused a wraparound. However sbci and adc use the carry in the opposite direction (subtracting or adding it to their result).

Looking at AVRInstrInfo.td, I suspect there are a few more bugs around sbci.

I found this bug while trying to add AVR support to compiler-rt. One instance of the bug is here (the +=). I originally found it somewhere else, but had trouble finding the exact location again.

dylanmckay accepted this revision.Jun 18 2020, 4:38 AM

Good reproduction, works exactly as expected and indeed this value is being miscalculated.

https://github.com/dylanmckay/avr-compiler-integration-tests/commit/481d69f92b25d267cdc9b74e61e37a31bb78f8c9

Thanks for the patch Ayke.

This revision is now accepted and ready to land.Jun 18 2020, 4:38 AM
This revision was automatically updated to reflect the committed changes.
aykevl added a comment.EditedJun 18 2020, 8:08 AM

Patch committed!

I looked for other instances of this bug, see below.

def : Pat<(add i16:$src1, imm0_63_neg:$src2),
          (SBIWRdK i16:$src1, (imm0_63_neg:$src2))>;
def : Pat<(add i16:$src1, imm:$src2),
          (SUBIWRdK i16:$src1, (imm16_neg_XFORM imm:$src2))>;

Both don't involve a carry bit, so should be fine.

def : Pat<(addc i16:$src1, imm:$src2),
          (SUBIWRdK i16:$src1, (imm16_neg_XFORM imm:$src2))>;

addc means "add with carry" while subiw is expanded to subi/sbci which discards the incoming carry bit. This may be a bug.

def : Pat<(add i8:$src1, imm:$src2),
          (SUBIRdK i8:$src1, (imm8_neg_XFORM imm:$src2))>;

No carry bits, so should be fine.

def : Pat<(addc i8:$src1, imm:$src2),
          (SUBIRdK i8:$src1, (imm8_neg_XFORM imm:$src2))>;

Like addc i16, this discards the carry bit so may be a bug.

def : Pat<(adde i8:$src1, imm:$src2),
          (SBCIRdK i8:$src1, (imm8_neg_XFORM imm:$src2))>;

adde means an add with a carry input and carry output. The sbci instruction does the same thing, except that it uses the carry in the wrong direction (subtracting instead of adding). Basically the 8-bit version of the bug this patch is about.

I don't have any reproducers/proofs, but will leave this here for future reference.

I suspect the patterns are technically a bug, but may not show up in practice if they're all buggy in the exact same way. If all additions in a chain are replaced with subtractions, then it should work.