This is an archive of the discontinued LLVM Phabricator instance.

Limit bswap and bitreverse matching to legal integer width
Needs ReviewPublic

Authored by vmustya on Aug 22 2023, 1:57 PM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

The InstCombine pass matched a sequence of and-shl-add instructions into
bitreverse even when the demanded width is illegal for a target machine.
That produces the IR sequences like the following one:

%trunc = trunc i32 %x to i9
%rev = call i9 @llvm.bitreverse.i9(i9 %trunc)
%mask = and i9 %rev, -128
%reverse = zext i9 %mask to i32

The illegal bitreverse intrinsics are being emulated, so this combining
produces inefficient code.

Diff Detail

Event Timeline

vmustya created this revision.Aug 22 2023, 1:57 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2023, 1:57 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
vmustya requested review of this revision.Aug 22 2023, 1:57 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2023, 1:57 PM
craig.topper added inline comments.
llvm/test/Transforms/InstCombine/bitreverse.ll
265

The CHECK lines in this file are generated by a script, update_llc_test_checks.py. We shouldn't use CHECK-NOT in such tests as the script will just overwrite it the next time someone runs.

vmustya updated this revision to Diff 552756.Aug 23 2023, 9:30 AM

I've updated LIT tests using the utils/update_test_checks.py tool.

nikic added a subscriber: nikic.Aug 23 2023, 11:56 AM

Can the expansion produce better code than it does now, or does some important information get lost that prevents this?

Is the issue related to weird types like i9, or would this also be a problem for (say) i8?

Can the expansion produce better code than it does now, or does some important information get lost that prevents this?

Is the issue related to weird types like i9, or would this also be a problem for (say) i8?

I've only seen the issue with the weird types like i9. I've added the minimal reproducer as a lit test case:

define i32 @illegal_width(i32 %x) {
  %b0 = and i32 %x, 1
  %b1 = and i32 %x, 2
  %shift1 = mul nuw nsw i32 %b1, 64
  %shift0 = shl nuw nsw i32 %b0, 8
  %reverse = add i32 %shift0, %shift1
  ret i32 %reverse
}

For x86_64 LLVM generates the following assembly:

	rol	di, 8
	mov	eax, edi
	shl	eax, 6
	and	eax, 16384
	shl	edi, 5
	and	edi, 16384
	lea	eax, [rdi + 2*rax]
	shr	eax, 7
	ret

With the patch applied, the assembly looks much better:

	mov	eax, edi
	and	eax, 2
	shl	eax, 6
	and	edi, 1
	shl	edi, 8
	or	eax, edi
	ret
craig.topper added a comment.EditedAug 23 2023, 1:12 PM

Can the expansion produce better code than it does now, or does some important information get lost that prevents this?

Is the issue related to weird types like i9, or would this also be a problem for (say) i8?

I've only seen the issue with the weird types like i9. I've added the minimal reproducer as a lit test case:

define i32 @illegal_width(i32 %x) {
  %b0 = and i32 %x, 1
  %b1 = and i32 %x, 2
  %shift1 = mul nuw nsw i32 %b1, 64
  %shift0 = shl nuw nsw i32 %b0, 8
  %reverse = add i32 %shift0, %shift1
  ret i32 %reverse
}

For x86_64 LLVM generates the following assembly:

	rol	di, 8
	mov	eax, edi
	shl	eax, 6
	and	eax, 16384
	shl	edi, 5
	and	edi, 16384
	lea	eax, [rdi + 2*rax]
	shr	eax, 7
	ret

With the patch applied, the assembly looks much better:

	mov	eax, edi
	and	eax, 2
	shl	eax, 6
	and	edi, 1
	shl	edi, 8
	or	eax, edi
	ret

The backend is expanding to i16 reverse sequence followed by a shift. Only 2 bits of that are demanded, but I suspect multiple uses prevents SimplifyDemandedBit from optimizing it.

There only 2 bits involved in the reverse here. Perhaps we shouldn't form bit reverse from something so sparse?

vmustya marked an inline comment as done.Sep 1 2023, 1:29 PM