This is an archive of the discontinued LLVM Phabricator instance.

Generate -1/0/1 memcmp/strcmp result for z13
AbandonedPublic

Authored by uweigand on Aug 12 2016, 2:51 PM.

Details

Reviewers
RolandF
Summary

The current IPM sequence for strcmp/memcmp result value generates -MAXINT/0/1 rather than -1/0/1, which is a valid but slightly unusual result. Some programs may rely on -1. There was a postgres test failure that was attributed to this. For z13 we can use lochi to compute the result just as fast but get the expected answer. Update: Apparently there are php test failures related to this as well.

Diff Detail

Event Timeline

RolandF updated this revision to Diff 67918.Aug 12 2016, 2:51 PM
RolandF retitled this revision from to Generate -1/0/1 memcmp/strcmp result for z13.
RolandF updated this object.
RolandF added a subscriber: zhanjunl.
RolandF updated this object.Aug 16 2016, 3:11 PM
uweigand edited edge metadata.Aug 17 2016, 9:09 AM

Hmm, so we actually have two (or three) separate issues here.

First of all, I agree that memcmp etc. should return -1/0/1, to avoid confusion. However, this should be the same on z13 and earlier machines; the behaviour should not depend on the architecture level. In GCC, we actually achieve that same result using IPM, via an sll / sra sequence (instead of srl / rotl):

ipm     %r2
sll     %r2,2
sra     %r2,30

It would probably make sense to do the same in LLVM as well. (Note that this means optimzeCompareInstr would have to recognize this sequence too; or maybe it does make sense to make an intermediate SELECT_CMP node just to simplify the opimization step.)

The second issue is that indeed on z13, it would be preferable to use LOCHI instead of the IPM sequence. But this is a pure optimization issue, which should not affect observable behaviour (and can probably be implemented via a second patch). Note that the same LOCHI optimization would *also* be beneficial on z13 for the other cases where we currently use a(nother) IPM sequence to implement SELECT_CC.

Finally, looking at the GCC code, there is another optimization there that LLVM does not yet implement: a somewhat frequent use case is that that result of memcmp is cast to a long (either explicitly or implicitly due to ABI conventions). GCC is able to omit a separate sign-extension step by using -in this case- a sequence like:

ipm     %r2
sllg    %r2,%r2,34
srag    %r2,%r2,62

On z13, of course we'd want to implement this via LOCGHI.

Some more questions regarding details of your patch:

  • Why expand SELECT_CMP sometimes before and sometimes after RA? Is there any particular benefit to doing it early? (Usually, early expansion is only necessary when we want to create new basic blocks.)
  • What's the purpose of RotByReg in emitSelectCmp? It appears to be unused ...

The ipm/sll/sra sequence gives -2/0/1 rather than -1/0/1.

Since the ipm sequence was originally done even earlier, I hesitated delaying it all the way to post-RA. But if post-RA is preferred, I am happy to do the expansion there.

RotByReg was just left over from trying several ways of writing rotate by a literal amount before finding the one that is expected. It should have been removed.

The ipm/sll/sra sequence gives -2/0/1 rather than -1/0/1.

Good point, I missed that. Still, the fact remains that this is what GCC has been generating since at least 2001, and we haven't ever seen any report of application bugs due to that ... Given that, I think it still would be preferable to generate that sequence instead of the one LLVM currently generates (which apparently does cause application problems).

Just for my curiosity, do you know any further details about what is causing the misbehaviour with postgres or php? It would be interesting to understand why the particular value used by LLVM causes problems, but other values don't (as I understand, some implementations of strcmp in libc also have historically returns values other than -1/0/1).

Since the ipm sequence was originally done even earlier, I hesitated delaying it all the way to post-RA. But if post-RA is preferred, I am happy to do the expansion there.

In general, it is preferable to expand things earlier rather than sooner. However, "early" usually means expanding during the DAG isel phase. On the other hand, if you have to expand late, you do it after RA. The primary reason for having another expansion step at the MI level (after DAG isel), but before RA, is if you need to create new basic blocks during expansion, which you cannot do at the DAG level.

So I guess my question is rather, if you can / have to expand to the IPM sequence early, why don't you simply leave that expansion at the DAG level (as it is today, just gated by a non-z13 check)? Otherwise, if there's no difference codegen-wise, it would be in general be preferable to have the expansion for z13 and pre-z13 in closer proximity, just for easier reading of the code. But in the end, that's just a preference, not a hard requirement ...

I took a look at the php problem. It was failing test /ext/standard/tests/strings/substr_compare.phpt, due to a -2 from a substr_compare operation where -1 was expected. The implementation in Zend/zend_operators.c is a call to the library memcmp function, so it is not a compiler issue.

I took a look at the php problem. It was failing test /ext/standard/tests/strings/substr_compare.phpt, due to a -2 from a substr_compare operation where -1 was expected. The implementation in Zend/zend_operators.c is a call to the library memcmp function, so it is not a compiler issue.

Interesting. While I'm not a PHP expert, the documentation for substr_compare I can find states:

Return Values

Returns < 0 if main_str from position offset is less than str, > 0 if it is greater than str, and 0 if they are equal. If offset is equal to or greater than the length of main_str, or the length is set and is less than 1 (prior to PHP 5.6), substr_compare() prints a warning and returns FALSE.

Given that, an implementation that calls out to a standard C memcmp and passes through the return value seems to be correct, and the test case would appear to be simply buggy. As long as this shows up only in a (possibly incorrect) test case and not in real-world PHP application, it doesn't seem like something we need to fix in the compiler.

RolandF updated this revision to Diff 69149.Aug 24 2016, 12:16 PM
RolandF edited edge metadata.

Unlike the php test failure, which is dependent on the library memcmp behaviour and fails for both clang and gcc, the postgres test failure only happens with clang. The uuid regression test fails for clang, and the failure goes away if src/backend/utils/adt/uuid.c is compiled with gcc. The issue is the result for the uuid_internal_cmp function, which is just a 16 byte memcmp. The address of the function is stored in a table of builtins and only called by address, and the complexity of the application and test environment make it difficult to trace back to where this function is called, which may be many places. It might be desirable to just be compatible with gcc. This diff updates the approach to use the gcc-type IPM/SLL/SRA sequence. The sequence is first translated into a SELECT_CMP operation. This makes it easier to perform the memcmp compare to zero optimization (SRA kills the CC). It also should make it easier to add support for LOCHI, since the compare to zero code can be shared, and to get the promotion to 64-bit case with shared code.

Unlike the php test failure, which is dependent on the library memcmp behaviour and fails for both clang and gcc, the postgres test failure only happens with clang. The uuid regression test fails for clang, and the failure goes away if src/backend/utils/adt/uuid.c is compiled with gcc. The issue is the result for the uuid_internal_cmp function, which is just a 16 byte memcmp. The address of the function is stored in a table of builtins and only called by address, and the complexity of the application and test environment make it difficult to trace back to where this function is called, which may be many places.

In src/include/utils/sortsupport.h, I see this code in ApplySortComparator:

compare = (*ssup->comparator) (datum1, datum2, ssup);
if (ssup->ssup_reverse)
        compare = -compare;

If the comparator routine returns INT_MIN, the result of the unary minus is undefined at this point. This might well explain the problem you're seeing.

It might be desirable to just be compatible with gcc. This diff updates the approach to use the gcc-type IPM/SLL/SRA sequence. The sequence is first translated into a SELECT_CMP operation. This makes it easier to perform the memcmp compare to zero optimization (SRA kills the CC). It also should make it easier to add support for LOCHI, since the compare to zero code can be shared, and to get the promotion to 64-bit case with shared code.

Makes sense to me. The patch looks mostly good, see inline comments for more details.

lib/Target/SystemZ/SystemZInstrFormats.td
2762

Don't we have add a Def for CC as well? This will expand to a sequence including a SRA, so it will clobber CC ...

lib/Target/SystemZ/SystemZInstrInfo.cpp
225

Would be nice to have a 32-bit and a 64-bit variant, where a LGFR of a SELECT_CMP32 would combine into a SELECT_CMP64 that is then implemented via SLLG / SRAG.

240

I think this need an implicit def of CC so that it matches what the instruction does. Did you try running the test cases with -verify-machineinstrs? That should have detected such mismatches ...

498

If we had the SELECT_CMP64, we wouldn't have to handle LGFR here.

501

I think if SELCMP is NULL, we also need to return false here, otherwise we'll crash below.

576

So given that we no longer generate the IPM / SRL / RLL sequence, and removeIPMBasedCompare checks for that very sequence, do we even need that routine any more?

uweigand commandeered this revision.Feb 6 2019, 7:13 AM
uweigand edited reviewers, added: RolandF; removed: uweigand.

Now fixed in a slightly different manner in as r353304.

uweigand abandoned this revision.Feb 6 2019, 7:13 AM