This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix for branch offset hardware workaround
ClosedPublic

Authored by rtaylor on Jun 18 2019, 7:19 AM.

Details

Reviewers

dstuttard
tpr
nhaehnle
rampitec
arsenm

Commits

rG9ab812d4752b: [AMDGPU] Fix for branch offset hardware workaround
rL364451: [AMDGPU] Fix for branch offset hardware workaround

Summary

This works around a hardware bug that makes a branch offset of 0x3f unsafe.
A 32-bit branch with offset 0x3f is replaced with a 64-bit
instruction that contains the same 32-bit branch followed by the encoding
of an s_nop 0. The relaxer then adjusts the offsets
accordingly.

Change-Id: I10b7aed99d651f8159401b01bb421f105fa6288e
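
As a rough illustration of the condition described in the summary, the check used in the first diff, `((int64_t(Value)/4)-1) == 0x3f`, can be isolated in a small standalone sketch. The helper names `dwordOffset` and `needsNopPadding` are hypothetical and do not appear in the patch; the sketch also assumes that `Value` is the byte distance from the branch to its target, which is what the `/4 - 1` adjustment suggests.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers; only the arithmetic is taken from the patch.
// Assumption: Value is the byte distance from the branch instruction to its
// target, so dividing by 4 and subtracting 1 yields the dword offset that
// ends up in the branch's simm16 field.
inline int64_t dwordOffset(int64_t Value) {
  assert(Value % 4 == 0 && "branch distances are multiples of 4 bytes");
  return Value / 4 - 1;
}

// True when the encoded offset would be the unsafe value 0x3f.
inline bool needsNopPadding(int64_t Value) {
  return dwordOffset(Value) == 0x3f;
}
```

Under that assumption, a distance of 256 bytes gives an offset of 0x3f and is flagged, while 252 and 260 bytes give 0x3e and 0x40 and are left alone.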

Diff Detail

Repository

rL LLVM

Build Status

Buildable 33556
Build 33555: arc lint + arc unit

Event Timeline

rtaylor created this revision. Jun 18 2019, 7:19 AM

Herald added a project: Restricted Project. Jun 18 2019, 7:19 AM

Herald added subscribers: llvm-commits, t-tye, tpr and 7 others.

Harbormaster completed remote builds in B33556: Diff 205334. Jun 18 2019, 7:19 AM

rtaylor added reviewers: dstuttard, tpr, nhaehnle, rampitec, arsenm. Jun 18 2019, 7:23 AM

arsenm added inline comments. Jun 18 2019, 7:26 AM

lib/Target/AMDGPU/MCTargetDesc/AMDGPUAsmBackend.cpp
56–57	I would much rather avoid the proliferation of junk opcodes for this. Can you use a bundle or another way to add an independent nop instruction?
test/MC/AMDGPU/offsetbug.s
18–20	Can you use trivial instructions for sizes? I don't trust the size of most instructions to stay the same
82–83	Formatting

arsenm added inline comments. Jun 18 2019, 7:28 AM

lib/Target/AMDGPU/SOPInstructions.td
1011–1018	If the opcodes end up unavoidable, the class should be fixed to generate the branch and the dummy at the same time, rather than requiring repeating each definition with the opcode value

rtaylor marked 4 inline comments as done. Jun 18 2019, 8:16 AM

rtaylor added inline comments.

lib/Target/AMDGPU/MCTargetDesc/AMDGPUAsmBackend.cpp
56–57	I'm not sure that's possible: relaxInstruction returns the MCInst Res, but I can't say for sure, as I've never worked with bundles and I'm not sure how the compiler treats them. I agree, though, that this would be better than having new 64-bit instructions.
lib/Target/AMDGPU/SOPInstructions.td
1011–1018	I can have a look at doing this.
test/MC/AMDGPU/offsetbug.s
18–20	Yes, I can. I can make them nops.
82–83	Ah, missed this, thanks.

rtaylor marked an inline comment as done. Jun 18 2019, 8:40 AM

rtaylor added inline comments.

lib/Target/AMDGPU/MCTargetDesc/AMDGPUAsmBackend.cpp
56–57	This would require a bundle to have a specific opcode, which from what I can see they do not.

arsenm added inline comments. Jun 18 2019, 8:47 AM

lib/Target/AMDGPU/MCTargetDesc/AMDGPUAsmBackend.cpp
56–57	I'm not sure what you mean. There's one BUNDLE opcode, and then the instructions following it are marked as in the bundle.

Can we have a disasm test? I want to see that we do not mess with decoding.

In D63494#1548719, @rampitec wrote:

Can we have a disasm test? I want to see that we do not mess with decoding.

Something other than the -disassemble checks?

In D63494#1548974, @rtaylor wrote:

Something other than the -disassemble checks?

Ah, missed it. Thanks, this is sufficient.

In D63494#1548986, @rampitec wrote:

Ah, missed it. Thanks, this is sufficient.

Np. Ok, thanks.

arsenm added inline comments. Jun 18 2019, 4:31 PM

lib/Target/AMDGPU/MCTargetDesc/AMDGPUAsmBackend.cpp
64	These names aren't good. I would prefer a suffix like _pad_s_nop or something. 64 could mean many different things.

rampitec added inline comments. Jun 18 2019, 4:39 PM

lib/Target/AMDGPU/MCTargetDesc/AMDGPUAsmBackend.cpp
113	I'd suggest using a multiclass to define these and an InstrMapping instead of switches.

Changes per suggestions

Harbormaster completed remote builds in B33637: Diff 205638. Jun 19 2019, 10:29 AM

rtaylor marked an inline comment as done. Jun 19 2019, 11:16 AM

rtaylor added inline comments.

lib/Target/AMDGPU/SOPInstructions.td
1011–1018	I don't think _Real is the best naming since there isn't a corresponding Pseudo but I had no clue what to call this multiclass so if anyone has a suggestion, I'm all ears.

rampitec added inline comments. Jun 19 2019, 11:21 AM

lib/Target/AMDGPU/SOPInstructions.td
960	SOPP_With_Relaxation maybe?
962	Can you use InstrMapping here to avoid switches?

arsenm added inline comments. Jun 19 2019, 11:49 AM

lib/Target/AMDGPU/MCTargetDesc/AMDGPUAsmBackend.cpp
61	I would lowercase the suffix. The general naming convention is HARDWARE_NAME_compiler_defined_variant.

arsenm added inline comments. Jun 19 2019, 11:51 AM

test/MC/AMDGPU/offsetbug.s
2–8	Can you split the test into separate files? I think there should be one function that needs 1 relaxation, another where the relaxation of one triggers the relaxation of the other, and another that triggers this twice
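
The three scenarios requested here (one relaxation, one relaxation triggering another, and two independent relaxations) can be explored with a small layout simulation. This is only a sketch under stated assumptions: the `Inst` struct and the functions `layout`, `branchOffset`, and `relaxAll` are hypothetical and are not part of the MC layer; every instruction is assumed to be 4 bytes, or 8 bytes once a branch has been padded with an s_nop, and a branch offset is assumed to be counted in dwords from the instruction after the branch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of an instruction stream; not an LLVM data structure.
struct Inst {
  bool IsBranch = false;
  int Target = -1;       // index of the instruction the branch jumps to
  bool Relaxed = false;  // true once an s_nop has been appended (8 bytes)
};

// Byte address of every instruction (plus one past the end).
inline std::vector<int64_t> layout(const std::vector<Inst> &Prog) {
  std::vector<int64_t> Addr(Prog.size() + 1, 0);
  for (std::size_t I = 0; I < Prog.size(); ++I)
    Addr[I + 1] = Addr[I] + (Prog[I].Relaxed ? 8 : 4);
  return Addr;
}

// Dword offset of branch I, measured from the dword after the branch word.
inline int64_t branchOffset(const std::vector<Inst> &Prog,
                            const std::vector<int64_t> &Addr, std::size_t I) {
  assert(Prog[I].Target >= 0 &&
         static_cast<std::size_t>(Prog[I].Target) < Prog.size());
  return (Addr[Prog[I].Target] - (Addr[I] + 4)) / 4;
}

// Re-run layout until no unpadded branch has offset 0x3f.
// Returns the number of branches that were padded.
inline int relaxAll(std::vector<Inst> &Prog) {
  int NumRelaxed = 0;
  bool Changed = true;
  while (Changed) {
    Changed = false;
    const std::vector<int64_t> Addr = layout(Prog);
    for (std::size_t I = 0; I < Prog.size(); ++I) {
      if (!Prog[I].IsBranch || Prog[I].Relaxed)
        continue;
      if (branchOffset(Prog, Addr, I) == 0x3f) {
        Prog[I].Relaxed = true;  // grows by 4 bytes, shifting later code
        ++NumRelaxed;
        Changed = true;
      }
    }
  }
  return NumRelaxed;
}
```

In this model, a branch at index 0 targeting index 63 starts at offset 0x3e. If a second branch at index 1 targets index 65 (offset 0x3f), padding the second branch pushes the first one to 0x3f, so it is padded on the next pass. That is the cascading case.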

rtaylor marked an inline comment as done. Jun 20 2019, 5:48 AM

rtaylor added inline comments.

lib/Target/AMDGPU/SOPInstructions.td
962	Do you feel that this is worth it in this case? It seems overly complicated for a very specific use case, but if it's the standard to go in this direction, then OK.

rampitec added inline comments. Jun 20 2019, 7:29 AM

lib/Target/AMDGPU/SOPInstructions.td
962	You just get rid of one switch, so it is not that big a deal now.
1011–1018	_With_Relaxation?

rtaylor marked an inline comment as done. Jun 20 2019, 11:17 AM

rtaylor added inline comments.

lib/Target/AMDGPU/SOPInstructions.td
962	That's ok, I've pretty much done it.

Suggested changes: Added InstrMapping. Three test files. Lower-cased suffixes for MI defs. Changed multiclass suffix.

Harbormaster completed remote builds in B33702: Diff 205887. Jun 20 2019, 1:10 PM

rampitec added inline comments. Jun 20 2019, 1:35 PM

lib/Target/AMDGPU/SIInstrInfo.td
2316 ↗	(On Diff #205887)	I must be missing something, but how does it work? You are passing the same asm string into the both versions. What do you map to what then?
test/MC/AMDGPU/offsetbug_once.s
9 ↗	(On Diff #205887)	Indent.
test/MC/AMDGPU/offsetbug_one_and_one.s
78 ↗	(On Diff #205887)	Indent.
test/MC/AMDGPU/offsetbug_twice.s
112 ↗	(On Diff #205887)	And indent ;)

Fixed indents.

Harbormaster completed remote builds in B33710: Diff 205911. Jun 20 2019, 3:03 PM

rtaylor added inline comments. Jun 20 2019, 3:03 PM

lib/Target/AMDGPU/SIInstrInfo.td
2316 ↗	(On Diff #205887)	The asm string is the same for both; the s_nop is added at encoding. I'm using the Size field: 4 maps to 8. This is quite similar to what VOP does in getVOPe32 and getVOPe64; the difference is that they also use a VOP3 flag, which is not needed here. If you notice, I am just adding the s_nop in class SOPP64e.
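
To make the idea of replacing the opcode switch with a lookup more concrete, here is a hand-written sketch of a sorted table with a binary search that returns -1 when no padded variant exists. It is not the code TableGen generates for an InstrMapping. The `Opcode` enum, the `_pad_s_nop` names (borrowed from the suffix suggested earlier in this review), `RelaxTable`, and `getPaddedOpcode` are all hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical opcode numbering; real values come from TableGen.
enum Opcode : uint16_t {
  S_NOP,
  S_BRANCH,
  S_BRANCH_pad_s_nop,
  S_CBRANCH_SCC0,
  S_CBRANCH_SCC0_pad_s_nop,
  S_CBRANCH_SCC1,
  S_CBRANCH_SCC1_pad_s_nop,
};

// One row per 4-byte branch and its 8-byte (branch + s_nop) counterpart.
struct RelaxEntry {
  Opcode Base;
  Opcode Padded;
};

// Must stay sorted by Base so that binary search is valid.
static const std::array<RelaxEntry, 3> RelaxTable = {{
    {S_BRANCH, S_BRANCH_pad_s_nop},
    {S_CBRANCH_SCC0, S_CBRANCH_SCC0_pad_s_nop},
    {S_CBRANCH_SCC1, S_CBRANCH_SCC1_pad_s_nop},
}};

// Returns the padded opcode for Op, or -1 if Op has no padded variant.
inline int getPaddedOpcode(Opcode Op) {
  assert(std::is_sorted(RelaxTable.begin(), RelaxTable.end(),
                        [](const RelaxEntry &A, const RelaxEntry &B) {
                          return A.Base < B.Base;
                        }) &&
         "table must be sorted by Base");
  auto It = std::lower_bound(
      RelaxTable.begin(), RelaxTable.end(), Op,
      [](const RelaxEntry &E, Opcode Key) { return E.Base < Key; });
  if (It == RelaxTable.end() || It->Base != Op)
    return -1;
  return It->Padded;
}
```

With a table like this, one lookup can serve both purposes: a result other than -1 means the instruction may need relaxation, and the result itself is the opcode to relax to.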

LGTM, except one indent.

lib/Target/AMDGPU/SIInstrInfo.td
2316 ↗	(On Diff #205887)	Got it, thanks!
test/MC/AMDGPU/offsetbug_once.s
9 ↗	(On Diff #205911)	Another indent.

arsenm requested changes to this revision. Jun 20 2019, 3:10 PM

arsenm added inline comments.

lib/Target/AMDGPU/MCTargetDesc/AMDGPUAsmBackend.cpp
10	This is a layering violation
72–75	return test();

This revision now requires changes to proceed. Jun 20 2019, 3:10 PM

arsenm added inline comments. Jun 20 2019, 3:15 PM

lib/Target/AMDGPU/SOPInstructions.td
927–929	I don't understand how this gets you a nop?
932	SOPP_w_nop or something? The current naming sounds like an actually different encoding (same with SOPPe64)

rampitec added inline comments. Jun 20 2019, 3:20 PM

lib/Target/AMDGPU/SOPInstructions.td
927–929	I guess s_nop opcode is 0, simm is 0 and SOPP encoding is 0x17f. Maybe it would be more readable to write something like:
    let Inst{63-55} = S_NOP.Inst{31-23}; // encoding
    let Inst{54-48} = S_NOP.Inst{22-16}; // opcode
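
The bit layout discussed in this comment can be checked with a short standalone sketch. The field positions (simm16 in bits 15-0, opcode in bits 22-16, the 0x17f encoding in bits 31-23) and the opcodes (0 for s_nop, 2 for s_branch) are taken from the SOPPe and SOPPe64 classes in the diff. The function names `encodeSOPP` and `encodeSOPPWithNop` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// SOPP encoding marker placed in Inst{31-23}, as in class SOPPe.
constexpr uint32_t SOPPEncoding = 0x17f;

// Pack a 32-bit SOPP instruction word from a 7-bit opcode and simm16.
inline uint32_t encodeSOPP(uint32_t Op, uint16_t Simm16) {
  assert(Op < 0x80 && "SOPP opcodes are 7 bits wide");
  return (SOPPEncoding << 23) | (Op << 16) | Simm16;
}

// Pack the 64-bit form: the branch in the low dword, s_nop 0 in the high
// dword, mirroring Inst{31-0} and Inst{63-32} of class SOPPe64.
inline uint64_t encodeSOPPWithNop(uint32_t Op, uint16_t Simm16) {
  const uint64_t Nop = encodeSOPP(/*Op=*/0, /*Simm16=*/0);
  return (Nop << 32) | encodeSOPP(Op, Simm16);
}
```

Because the s_nop opcode and immediate are both zero, the upper dword reduces to the encoding marker alone (0xBF800000), which is why the literal zeros in SOPPe64 produce a nop.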

rtaylor marked an inline comment as done. Jun 20 2019, 4:50 PM

rtaylor added inline comments.

lib/Target/AMDGPU/MCTargetDesc/AMDGPUAsmBackend.cpp
10	Where would you like the InstrMapping to go? The others are here.

rtaylor marked 5 inline comments as done. Jun 20 2019, 5:16 PM

rtaylor added inline comments.

lib/Target/AMDGPU/MCTargetDesc/AMDGPUAsmBackend.cpp
10	I suppose I could put it into Utils/AMDGPUBaseInfo.h which is already included in AMDGPUAsmBackend.cpp.... though that seems at first glance to also be a layering violation. Maybe it would be better if this was a mapping table, it would fit into AMDGPUBaseInfo.h but I think InstrMapping is more fitting.
72–75	Not sure what you mean here though the else is useless, so I'll remove that and just return false.
lib/Target/AMDGPU/SOPInstructions.td
927–929	Yes, the opcode for s_nop is 0. The value I'm using is 0 and the encoding I had is the same as everywhere else but I can see how this makes it more understandable... maybe a simple comment here would help? I fixed this but it just makes tablegen less readable honestly, since now the S_NOP def has to go after the SOPP class but before the SOPP_w_nop encoding, so it's all alone among classes. Am I missing something? Prototyping would work but again, make everything less readable.
932	Sure, that's fine.
test/MC/AMDGPU/offsetbug_once.s
9 ↗	(On Diff #205911)	Thanks.

Fixed suggestions

Harbormaster completed remote builds in B33753: Diff 206117. Jun 22 2019, 6:31 AM

LGTM

This revision is now accepted and ready to land. Jun 26 2019, 9:53 AM

Closed by commit rL364451: [AMDGPU] Fix for branch offset hardware workaround (authored by rtayl). Jun 26 2019, 10:37 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Target/AMDGPU/AMDGPU.td	10 lines
lib/Target/AMDGPU/AMDGPUSubtarget.h	5 lines
lib/Target/AMDGPU/AMDGPUSubtarget.cpp	1 line
lib/Target/AMDGPU/MCTargetDesc/AMDGPUAsmBackend.cpp	89 lines
lib/Target/AMDGPU/SOPInstructions.td	71 lines
test/MC/AMDGPU/offsetbug.s	122 lines

Diff 205334

lib/Target/AMDGPU/AMDGPU.td

(187 lines not shown)
>;		>;

def FeatureFlatSegmentOffsetBug : SubtargetFeature<"flat-segment-offset-bug",		def FeatureFlatSegmentOffsetBug : SubtargetFeature<"flat-segment-offset-bug",
"HasFlatSegmentOffsetBug",		"HasFlatSegmentOffsetBug",
"true",		"true",
"GFX10 bug, inst_offset ignored in flat segment"		"GFX10 bug, inst_offset ignored in flat segment"
>;		>;

		def FeatureOffset3fBug : SubtargetFeature<"offset-3f-bug",
		"HasOffset3fBug",
		"true",
		"Branch offset of 3f hardware bug"
		>;

class SubtargetFeatureLDSBankCount <int Value> : SubtargetFeature <		class SubtargetFeatureLDSBankCount <int Value> : SubtargetFeature <
"ldsbankcount"#Value,		"ldsbankcount"#Value,
"LDSBankCount",		"LDSBankCount",
!cast<string>(Value),		!cast<string>(Value),
"The number of LDS banks per compute unit."		"The number of LDS banks per compute unit."
>;		>;

def FeatureLDSBankCount16 : SubtargetFeatureLDSBankCount<16>;		def FeatureLDSBankCount16 : SubtargetFeatureLDSBankCount<16>;
(558 lines not shown)	def FeatureGroup {
list<SubtargetFeature> GFX10_1_Bugs = [		list<SubtargetFeature> GFX10_1_Bugs = [
FeatureVcmpxPermlaneHazard,		FeatureVcmpxPermlaneHazard,
FeatureVMEMtoScalarWriteHazard,		FeatureVMEMtoScalarWriteHazard,
FeatureSMEMtoVectorWriteHazard,		FeatureSMEMtoVectorWriteHazard,
FeatureInstFwdPrefetchBug,		FeatureInstFwdPrefetchBug,
FeatureVcmpxExecWARHazard,		FeatureVcmpxExecWARHazard,
FeatureLdsBranchVmemWARHazard,		FeatureLdsBranchVmemWARHazard,
FeatureNSAtoVMEMBug,		FeatureNSAtoVMEMBug,
		FeatureOffset3fBug,
FeatureFlatSegmentOffsetBug		FeatureFlatSegmentOffsetBug
];		];
}		}

def FeatureISAVersion10_1_0 : FeatureSet<		def FeatureISAVersion10_1_0 : FeatureSet<
!listconcat(FeatureGroup.GFX10_1_Bugs,		!listconcat(FeatureGroup.GFX10_1_Bugs,
[FeatureGFX10,		[FeatureGFX10,
FeatureLDSBankCount32,		FeatureLDSBankCount32,
(285 lines not shown)	def HasDot2Insts : Predicate<"Subtarget->hasDot2Insts()">,
AssemblerPredicate<"FeatureDot2Insts">;		AssemblerPredicate<"FeatureDot2Insts">;

def HasDot5Insts : Predicate<"Subtarget->hasDot5Insts()">,		def HasDot5Insts : Predicate<"Subtarget->hasDot5Insts()">,
AssemblerPredicate<"FeatureDot5Insts">;		AssemblerPredicate<"FeatureDot5Insts">;

def HasDot6Insts : Predicate<"Subtarget->hasDot6Insts()">,		def HasDot6Insts : Predicate<"Subtarget->hasDot6Insts()">,
AssemblerPredicate<"FeatureDot6Insts">;		AssemblerPredicate<"FeatureDot6Insts">;

		def HasOffset3fBug : Predicate<"!Subtarget->hasOffset3fBug()">,
		AssemblerPredicate<"FeatureOffset3fBug">;

def EnableLateCFGStructurize : Predicate<		def EnableLateCFGStructurize : Predicate<
"EnableLateStructurizeCFG">;		"EnableLateStructurizeCFG">;

// Include AMDGPU TD files		// Include AMDGPU TD files
include "SISchedule.td"		include "SISchedule.td"
include "GCNProcessors.td"		include "GCNProcessors.td"
include "AMDGPUInstrInfo.td"		include "AMDGPUInstrInfo.td"
include "AMDGPURegisterInfo.td"		include "AMDGPURegisterInfo.td"
include "AMDGPURegisterBanks.td"		include "AMDGPURegisterBanks.td"
include "AMDGPUInstructions.td"		include "AMDGPUInstructions.td"
include "SIInstrInfo.td"		include "SIInstrInfo.td"
include "AMDGPUCallingConv.td"		include "AMDGPUCallingConv.td"
include "AMDGPUSearchableTables.td"		include "AMDGPUSearchableTables.td"

lib/Target/AMDGPU/AMDGPUSubtarget.h

(362 lines not shown)	protected:

bool HasVcmpxPermlaneHazard;		bool HasVcmpxPermlaneHazard;
bool HasVMEMtoScalarWriteHazard;		bool HasVMEMtoScalarWriteHazard;
bool HasSMEMtoVectorWriteHazard;		bool HasSMEMtoVectorWriteHazard;
bool HasInstFwdPrefetchBug;		bool HasInstFwdPrefetchBug;
bool HasVcmpxExecWARHazard;		bool HasVcmpxExecWARHazard;
bool HasLdsBranchVmemWARHazard;		bool HasLdsBranchVmemWARHazard;
bool HasNSAtoVMEMBug;		bool HasNSAtoVMEMBug;
		bool HasOffset3fBug;
bool HasFlatSegmentOffsetBug;		bool HasFlatSegmentOffsetBug;

// Dummy feature to use for assembler in tablegen.		// Dummy feature to use for assembler in tablegen.
bool FeatureDisable;		bool FeatureDisable;

SelectionDAGTargetInfo TSInfo;		SelectionDAGTargetInfo TSInfo;
private:		private:
SIInstrInfo InstrInfo;		SIInstrInfo InstrInfo;
(471 lines not shown)	public:
bool hasDPP8() const {		bool hasDPP8() const {
return HasDPP8;		return HasDPP8;
}		}

bool hasR128A16() const {		bool hasR128A16() const {
return HasR128A16;		return HasR128A16;
}		}

		bool hasOffset3fBug() const {
		return HasOffset3fBug;
		}

bool hasNSAEncoding() const {		bool hasNSAEncoding() const {
return HasNSAEncoding;		return HasNSAEncoding;
}		}

bool hasMadF16() const;		bool hasMadF16() const;

bool enableSIScheduler() const {		bool enableSIScheduler() const {
return EnableSIScheduler;		return EnableSIScheduler;
(352 lines not shown)

lib/Target/AMDGPU/AMDGPUSubtarget.cpp

(255 lines not shown)	GCNSubtarget::GCNSubtarget(const Triple &TT, StringRef GPU, StringRef FS,

HasVcmpxPermlaneHazard(false),		HasVcmpxPermlaneHazard(false),
HasVMEMtoScalarWriteHazard(false),		HasVMEMtoScalarWriteHazard(false),
HasSMEMtoVectorWriteHazard(false),		HasSMEMtoVectorWriteHazard(false),
HasInstFwdPrefetchBug(false),		HasInstFwdPrefetchBug(false),
HasVcmpxExecWARHazard(false),		HasVcmpxExecWARHazard(false),
HasLdsBranchVmemWARHazard(false),		HasLdsBranchVmemWARHazard(false),
HasNSAtoVMEMBug(false),		HasNSAtoVMEMBug(false),
		HasOffset3fBug(false),
HasFlatSegmentOffsetBug(false),		HasFlatSegmentOffsetBug(false),

FeatureDisable(false),		FeatureDisable(false),
InstrInfo(initializeSubtargetDependencies(TT, GPU, FS)),		InstrInfo(initializeSubtargetDependencies(TT, GPU, FS)),
TLInfo(TM, *this),		TLInfo(TM, *this),
FrameLowering(TargetFrameLowering::StackGrowsUp, getStackAlignment(), 0) {		FrameLowering(TargetFrameLowering::StackGrowsUp, getStackAlignment(), 0) {
CallLoweringInfo.reset(new AMDGPUCallLowering(*getTargetLowering()));		CallLoweringInfo.reset(new AMDGPUCallLowering(*getTargetLowering()));
Legalizer.reset(new AMDGPULegalizerInfo(*this, TM));		Legalizer.reset(new AMDGPULegalizerInfo(*this, TM));
(496 lines not shown)

lib/Target/AMDGPU/MCTargetDesc/AMDGPUAsmBackend.cpp

//===-- AMDGPUAsmBackend.cpp - AMDGPU Assembler Backend -------------------===//		//===-- AMDGPUAsmBackend.cpp - AMDGPU Assembler Backend -------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
/// \file		/// \file
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "MCTargetDesc/AMDGPUFixupKinds.h"		#include "MCTargetDesc/AMDGPUFixupKinds.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"		#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/BinaryFormat/ELF.h"		#include "llvm/BinaryFormat/ELF.h"
#include "llvm/MC/MCAsmBackend.h"		#include "llvm/MC/MCAsmBackend.h"
#include "llvm/MC/MCAssembler.h"		#include "llvm/MC/MCAssembler.h"
#include "llvm/MC/MCContext.h"		#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCFixupKindInfo.h"		#include "llvm/MC/MCFixupKindInfo.h"
#include "llvm/MC/MCObjectWriter.h"		#include "llvm/MC/MCObjectWriter.h"
(13 lines not shown)	public:
unsigned getNumFixupKinds() const override { return AMDGPU::NumTargetFixupKinds; };		unsigned getNumFixupKinds() const override { return AMDGPU::NumTargetFixupKinds; };

void applyFixup(const MCAssembler &Asm, const MCFixup &Fixup,		void applyFixup(const MCAssembler &Asm, const MCFixup &Fixup,
const MCValue &Target, MutableArrayRef<char> Data,		const MCValue &Target, MutableArrayRef<char> Data,
uint64_t Value, bool IsResolved,		uint64_t Value, bool IsResolved,
const MCSubtargetInfo *STI) const override;		const MCSubtargetInfo *STI) const override;
bool fixupNeedsRelaxation(const MCFixup &Fixup, uint64_t Value,		bool fixupNeedsRelaxation(const MCFixup &Fixup, uint64_t Value,
const MCRelaxableFragment *DF,		const MCRelaxableFragment *DF,
const MCAsmLayout &Layout) const override {		const MCAsmLayout &Layout) const override;
return false;
}
void relaxInstruction(const MCInst &Inst, const MCSubtargetInfo &STI,		void relaxInstruction(const MCInst &Inst, const MCSubtargetInfo &STI,
MCInst &Res) const override {		MCInst &Res) const override;
llvm_unreachable("Not implemented");
}
bool mayNeedRelaxation(const MCInst &Inst,		bool mayNeedRelaxation(const MCInst &Inst,
const MCSubtargetInfo &STI) const override {		const MCSubtargetInfo &STI) const override;
return false;
}

unsigned getMinimumNopSize() const override;		unsigned getMinimumNopSize() const override;
bool writeNopData(raw_ostream &OS, uint64_t Count) const override;		bool writeNopData(raw_ostream &OS, uint64_t Count) const override;

const MCFixupKindInfo &getFixupKindInfo(MCFixupKind Kind) const override;		const MCFixupKindInfo &getFixupKindInfo(MCFixupKind Kind) const override;
};		};

} //End anonymous namespace		} //End anonymous namespace

		static unsigned getRelaxedOpcode(const MCInst &Inst) {
		unsigned Op = Inst.getOpcode();
		switch (Op) {
		default:
		return Op;
		case AMDGPU::S_BRANCH:
		return AMDGPU::S_BRANCH_64;
		case AMDGPU::S_CBRANCH_SCC0:
		return AMDGPU::S_CBRANCH_SCC0_64;
		case AMDGPU::S_CBRANCH_SCC1:
		return AMDGPU::S_CBRANCH_SCC1_64;
		case AMDGPU::S_CBRANCH_VCCZ:
		return AMDGPU::S_CBRANCH_VCCZ_64;
		case AMDGPU::S_CBRANCH_VCCNZ:
		return AMDGPU::S_CBRANCH_VCCNZ_64;
		case AMDGPU::S_CBRANCH_EXECZ:
		return AMDGPU::S_CBRANCH_EXECZ_64;
		case AMDGPU::S_CBRANCH_EXECNZ:
		return AMDGPU::S_CBRANCH_EXECNZ_64;
		case AMDGPU::S_CBRANCH_CDBGSYS:
		return AMDGPU::S_CBRANCH_CDBGSYS_64;
		case AMDGPU::S_CBRANCH_CDBGSYS_AND_USER:
		return AMDGPU::S_CBRANCH_CDBGSYS_AND_USER_64;
		case AMDGPU::S_CBRANCH_CDBGSYS_OR_USER:
		return AMDGPU::S_CBRANCH_CDBGSYS_OR_USER_64;
		case AMDGPU::S_CBRANCH_CDBGUSER:
		return AMDGPU::S_CBRANCH_CDBGUSER_64;
		} // end of switch
		}

		void AMDGPUAsmBackend::relaxInstruction(const MCInst &Inst,
		const MCSubtargetInfo &STI,
		MCInst &Res) const {
		unsigned RelaxedOpcode = getRelaxedOpcode(Inst);
		Res.setOpcode(RelaxedOpcode);
		Res.addOperand(Inst.getOperand(0));
		return;
		}

		bool AMDGPUAsmBackend::fixupNeedsRelaxation(const MCFixup &Fixup,
		uint64_t Value,
		const MCRelaxableFragment *DF,
		const MCAsmLayout &Layout) const {
		// if the branch target has an offset of x3f this needs to be relaxed to
		// add a s_nop 0 immediately after branch to effectively increment offset
		// for hardware workaround in gfx1010
		if (((int64_t(Value)/4)-1) == 0x3f)
		return true;
		else
		return false;
		}

		bool AMDGPUAsmBackend::mayNeedRelaxation(const MCInst &Inst,
		const MCSubtargetInfo &STI) const {
		if (!STI.getFeatureBits()[AMDGPU::FeatureOffset3fBug])
		return false;

		switch (Inst.getOpcode()) {
		case AMDGPU::S_BRANCH:
		case AMDGPU::S_CBRANCH_SCC0:
		case AMDGPU::S_CBRANCH_SCC1:
		case AMDGPU::S_CBRANCH_VCCZ:
		case AMDGPU::S_CBRANCH_VCCNZ:
		case AMDGPU::S_CBRANCH_EXECZ:
		case AMDGPU::S_CBRANCH_EXECNZ:
		case AMDGPU::S_CBRANCH_CDBGSYS:
		case AMDGPU::S_CBRANCH_CDBGSYS_AND_USER:
		case AMDGPU::S_CBRANCH_CDBGSYS_OR_USER:
		case AMDGPU::S_CBRANCH_CDBGUSER:
		return true;
		} // end of switch

		return false;
		}

static unsigned getFixupKindNumBytes(unsigned Kind) {		static unsigned getFixupKindNumBytes(unsigned Kind) {
switch (Kind) {		switch (Kind) {
case AMDGPU::fixup_si_sopp_br:		case AMDGPU::fixup_si_sopp_br:
return 2;		return 2;
case FK_SecRel_1:		case FK_SecRel_1:
case FK_Data_1:		case FK_Data_1:
return 1;		return 1;
case FK_SecRel_2:		case FK_SecRel_2:
(148 lines not shown)

lib/Target/AMDGPU/SOPInstructions.td

	(912 lines not shown)
	class SOPPe <bits<7> op> : Enc32 {			class SOPPe <bits<7> op> : Enc32 {
	bits <16> simm16;			bits <16> simm16;

	let Inst{15-0} = simm16;			let Inst{15-0} = simm16;
	let Inst{22-16} = op;			let Inst{22-16} = op;
	let Inst{31-23} = 0x17f; // encoding			let Inst{31-23} = 0x17f; // encoding
	}			}

				class SOPPe64 <bits<7> op> : Enc64 {
				bits <16> simm16;

				let Inst{15-0} = simm16;
				let Inst{22-16} = op;
				let Inst{31-23} = 0x17f; // encoding
				let Inst{47-32} = 0x0000;
				let Inst{54-48} = 0x00;
				let Inst{63-55} = 0x17f; //encoding
				}

				class SOPP64 <bits<7> op, dag ins, string asm, list<dag> pattern = []> :
				InstSI <(outs), ins, asm, pattern >, SOPPe64 <op> {

				let mayLoad = 0;
				let mayStore = 0;
				let hasSideEffects = 0;
				let SALU = 1;
				let SOPP = 1;
				let Size = 8;
				let SchedRW = [WriteSALU];

				let UseNamedOperandTable = 1;
				}

	class SOPP <bits<7> op, dag ins, string asm, list<dag> pattern = []> :			class SOPP <bits<7> op, dag ins, string asm, list<dag> pattern = []> :
	InstSI <(outs), ins, asm, pattern >, SOPPe <op> {			InstSI <(outs), ins, asm, pattern >, SOPPe <op> {

	let mayLoad = 0;			let mayLoad = 0;
	let mayStore = 0;			let mayStore = 0;
	let hasSideEffects = 0;			let hasSideEffects = 0;
	let SALU = 1;			let SALU = 1;
	let SOPP = 1;			let SOPP = 1;
	let Size = 4;			let Size = 4;
	let SchedRW = [WriteSALU];			let SchedRW = [WriteSALU];

	let UseNamedOperandTable = 1;			let UseNamedOperandTable = 1;
	}			}


	def S_NOP : SOPP <0x00000000, (ins i16imm:$simm16), "s_nop $simm16">;			def S_NOP : SOPP <0x00000000, (ins i16imm:$simm16), "s_nop $simm16">;

	let isTerminator = 1 in {			let isTerminator = 1 in {

	def S_ENDPGM : SOPP <0x00000001, (ins EndpgmImm:$simm16), "s_endpgm$simm16"> {			def S_ENDPGM : SOPP <0x00000001, (ins EndpgmImm:$simm16), "s_endpgm$simm16"> {
	let isBarrier = 1;			let isBarrier = 1;
	let isReturn = 1;			let isReturn = 1;
	}			}

	def S_ENDPGM_SAVED : SOPP <0x0000001B, (ins), "s_endpgm_saved"> {			def S_ENDPGM_SAVED : SOPP <0x0000001B, (ins), "s_endpgm_saved"> {
	let SubtargetPredicate = isGFX8Plus;			let SubtargetPredicate = isGFX8Plus;
	(17 lines not shown)
	} // End SubtargetPredicate = isGFX10Plus			} // End SubtargetPredicate = isGFX10Plus

	let isBranch = 1, SchedRW = [WriteBranch] in {			let isBranch = 1, SchedRW = [WriteBranch] in {
	def S_BRANCH : SOPP <			def S_BRANCH : SOPP <
	0x00000002, (ins sopp_brtarget:$simm16), "s_branch $simm16",			0x00000002, (ins sopp_brtarget:$simm16), "s_branch $simm16",
	[(br bb:$simm16)]> {			[(br bb:$simm16)]> {
	let isBarrier = 1;			let isBarrier = 1;
	}			}
				def S_BRANCH_64 :SOPP64 <
				0x0000002, (ins sopp_brtarget:$simm16), "s_branch $simm16",
				[(br bb:$simm16)]> {
				let isBarrier = 1;
				}

	let Uses = [SCC] in {			let Uses = [SCC] in {
	def S_CBRANCH_SCC0 : SOPP <			def S_CBRANCH_SCC0 : SOPP <
	0x00000004, (ins sopp_brtarget:$simm16),			0x00000004, (ins sopp_brtarget:$simm16),
	"s_cbranch_scc0 $simm16"			"s_cbranch_scc0 $simm16"
	>;			>;
	def S_CBRANCH_SCC1 : SOPP <			def S_CBRANCH_SCC1 : SOPP <
	0x00000005, (ins sopp_brtarget:$simm16),			0x00000005, (ins sopp_brtarget:$simm16),
	"s_cbranch_scc1 $simm16"			"s_cbranch_scc1 $simm16"
	>;			>;
				def S_CBRANCH_SCC0_64 : SOPP64 <
				0x00000004, (ins sopp_brtarget:$simm16),
				"s_cbranch_scc0 $simm16"
				>;
				def S_CBRANCH_SCC1_64 : SOPP64 <
				0x00000005, (ins sopp_brtarget:$simm16),
				"s_cbranch_scc1 $simm16"
				>;
	} // End Uses = [SCC]			} // End Uses = [SCC]

	let Uses = [VCC] in {			let Uses = [VCC] in {
	def S_CBRANCH_VCCZ : SOPP <			def S_CBRANCH_VCCZ : SOPP <
	0x00000006, (ins sopp_brtarget:$simm16),			0x00000006, (ins sopp_brtarget:$simm16),
	"s_cbranch_vccz $simm16"			"s_cbranch_vccz $simm16"
	>;			>;
	def S_CBRANCH_VCCNZ : SOPP <			def S_CBRANCH_VCCNZ : SOPP <
	0x00000007, (ins sopp_brtarget:$simm16),			0x00000007, (ins sopp_brtarget:$simm16),
	"s_cbranch_vccnz $simm16"			"s_cbranch_vccnz $simm16"
	>;			>;
				def S_CBRANCH_VCCZ_64 : SOPP64 <
				0x00000006, (ins sopp_brtarget:$simm16),
				"s_cbranch_vccz $simm16"
				>;
				def S_CBRANCH_VCCNZ_64 : SOPP64 <
				0x00000007, (ins sopp_brtarget:$simm16),
				"s_cbranch_vccnz $simm16"
				>;
	} // End Uses = [VCC]			} // End Uses = [VCC]

	let Uses = [EXEC] in {			let Uses = [EXEC] in {
	def S_CBRANCH_EXECZ : SOPP <			def S_CBRANCH_EXECZ : SOPP <
	0x00000008, (ins sopp_brtarget:$simm16),			0x00000008, (ins sopp_brtarget:$simm16),
	"s_cbranch_execz $simm16"			"s_cbranch_execz $simm16"
	>;			>;
	def S_CBRANCH_EXECNZ : SOPP <			def S_CBRANCH_EXECNZ : SOPP <
	0x00000009, (ins sopp_brtarget:$simm16),			0x00000009, (ins sopp_brtarget:$simm16),
	"s_cbranch_execnz $simm16"			"s_cbranch_execnz $simm16"
	>;			>;
				def S_CBRANCH_EXECZ_64 : SOPP64 <
				0x00000008, (ins sopp_brtarget:$simm16),
				"s_cbranch_execz $simm16"
				>;
				def S_CBRANCH_EXECNZ_64 : SOPP64 <
				0x00000009, (ins sopp_brtarget:$simm16),
				"s_cbranch_execnz $simm16"
				>;
	} // End Uses = [EXEC]			} // End Uses = [EXEC]

	def S_CBRANCH_CDBGSYS : SOPP <			def S_CBRANCH_CDBGSYS : SOPP <
	0x00000017, (ins sopp_brtarget:$simm16),			0x00000017, (ins sopp_brtarget:$simm16),
	"s_cbranch_cdbgsys $simm16"			"s_cbranch_cdbgsys $simm16"
	>;			>;
				def S_CBRANCH_CDBGSYS_64 : SOPP64 <
				0x00000017, (ins sopp_brtarget:$simm16),
				"s_cbranch_cdbgsys $simm16"
				>;

	def S_CBRANCH_CDBGSYS_AND_USER : SOPP <			def S_CBRANCH_CDBGSYS_AND_USER : SOPP <
	0x0000001A, (ins sopp_brtarget:$simm16),			0x0000001A, (ins sopp_brtarget:$simm16),
	"s_cbranch_cdbgsys_and_user $simm16"			"s_cbranch_cdbgsys_and_user $simm16"
	>;			>;
				def S_CBRANCH_CDBGSYS_AND_USER_64 : SOPP64 <
				0x0000001A, (ins sopp_brtarget:$simm16),
				"s_cbranch_cdbgsys_and_user $simm16"
				>;

	def S_CBRANCH_CDBGSYS_OR_USER : SOPP <			def S_CBRANCH_CDBGSYS_OR_USER : SOPP <
	0x00000019, (ins sopp_brtarget:$simm16),			0x00000019, (ins sopp_brtarget:$simm16),
	"s_cbranch_cdbgsys_or_user $simm16"			"s_cbranch_cdbgsys_or_user $simm16"
	>;			>;
				def S_CBRANCH_CDBGSYS_OR_USER_64 : SOPP64 <
				0x00000019, (ins sopp_brtarget:$simm16),
				"s_cbranch_cdbgsys_or_user $simm16"
				>;

	def S_CBRANCH_CDBGUSER : SOPP <			def S_CBRANCH_CDBGUSER : SOPP <
	0x00000018, (ins sopp_brtarget:$simm16),			0x00000018, (ins sopp_brtarget:$simm16),
	"s_cbranch_cdbguser $simm16"			"s_cbranch_cdbguser $simm16"
	>;			>;
				def S_CBRANCH_CDBGUSER_64 : SOPP64 <
				0x00000018, (ins sopp_brtarget:$simm16),
				"s_cbranch_cdbguser $simm16"
				>;

	} // End isBranch = 1			} // End isBranch = 1
	} // End isTerminator = 1			} // End isTerminator = 1

	let hasSideEffects = 1 in {			let hasSideEffects = 1 in {
	def S_BARRIER : SOPP <0x0000000a, (ins), "s_barrier",			def S_BARRIER : SOPP <0x0000000a, (ins), "s_barrier",
	[(int_amdgcn_s_barrier)]> {			[(int_amdgcn_s_barrier)]> {
	let SchedRW = [WriteBarrier];			let SchedRW = [WriteBarrier];
	(remaining lines of SOPInstructions.td not shown)
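Each `_64` definition above pairs a branch with a trailing `s_nop 0`. As a rough illustration of what that 64-bit encoding looks like, the sketch below packs a SOPP dword from an opcode and a 16-bit immediate, using the bit layout implied by the BIN checks in test/MC/AMDGPU/offsetbug.s (for example `BF870060` for opcode 0x07). The names `encodeSopp` and `encodeSopp64` are hypothetical; this is not the encoder that TableGen generates for the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// SOPP layout as implied by the encodings in offsetbug.s:
// bits 31..23 are fixed, bits 22..16 hold the opcode, bits 15..0 hold simm16.
constexpr uint32_t kSoppPrefix = 0xBF800000u;

// Hypothetical helper: encode one 32-bit SOPP instruction.
constexpr uint32_t encodeSopp(uint32_t opcode, uint16_t simm16) {
  return kSoppPrefix | ((opcode & 0x7Fu) << 16) | simm16;
}

// Hypothetical helper: the 64-bit form is the same branch dword followed by
// the encoding of "s_nop 0" (SOPP opcode 0, immediate 0).
inline std::pair<uint32_t, uint32_t> encodeSopp64(uint32_t opcode,
                                                  uint16_t simm16) {
  return {encodeSopp(opcode, simm16), encodeSopp(0x00, 0)};
}
```

With these helpers, `encodeSopp(0x07, 0x60)` gives `0xBF870060`, the dword the test expects for the first `s_cbranch_vccnz`, and the second element of `encodeSopp64` is always `0xBF800000`.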

test/MC/AMDGPU/offsetbug.s

This file was added.

				// RUN: llvm-mc -arch=amdgcn -mcpu=gfx1010 -show-encoding %s | FileCheck %s --check-prefix=GFX10
				// RUN: llvm-mc -arch=amdgcn -mcpu=gfx1010 -filetype=obj %s | llvm-objdump -disassemble -mcpu=gfx1010 - | FileCheck %s --check-prefix=BIN
				s_getpc_b64 s[0:1]
				v_add_nc_u32_e32 v4, s6, v0
				s_mov_b64 s[16:17], s[0:1]
				s_mov_b64 s[18:19], s[0:1]
				s_mov_b64 s[24:25], s[0:1]
				s_mov_b32 s0, s5
				arsenm [Not Done]: Can you split the test into separate files? I think there should be one function that needs 1 relaxation, another where the relaxation of one triggers the relaxation of the other, and another that triggers this twice.
				s_mov_b32 s18, s4
				s_mov_b32 s16, s3
				s_mov_b32 s24, s2
				s_load_dwordx4 s[8:11], s[0:1], 0x10
				s_load_dwordx4 s[12:15], s[0:1], 0x0
				s_load_dwordx4 s[4:7], s[18:19], 0x0
				s_load_dwordx4 s[20:23], s[16:17], 0x0
				s_load_dwordx4 s[0:3], s[24:25], 0x0
				s_waitcnt lgkmcnt(0)
				tbuffer_load_format_x v0, v4, s[8:11], format:22, 0 idxen offset:4
				tbuffer_load_format_xyzw v[9:12], v4, s[8:11], format:56, 0 idxen offset:8
				tbuffer_load_format_xyzw v[13:16], v4, s[8:11], format:56, 0 idxen offset:12
				arsenm [Not Done]: Can you use trivial instructions for sizes? I don't trust the size of most instructions to stay the same.
				rtaylor (author) [Done]: Yes, I can. I can make them nops.
				s_waitcnt vmcnt(1)
				s_cbranch_vccnz BB0_2
				// GFX10: s_cbranch_vccnz BB0_2 ; encoding: [A,A,0x87,0xbf]
				// GFX10-NEXT: ; fixup A - offset: 0, value: BB0_2, kind: fixup_si_sopp_br
				// BIN: s_cbranch_vccnz BB0_2 // 00000000006C: BF870060
				tbuffer_load_format_xyzw v[8:11], v4, s[8:11], format:56, 0 idxen offset:16
				tbuffer_load_format_x v1, v4, s[8:11], format:22, 0 idxen offset:20
				tbuffer_load_format_x v2, v4, s[8:11], format:22, 0 idxen offset:24
				tbuffer_load_format_x v3, v4, s[8:11], format:22, 0 idxen
				tbuffer_load_format_xyzw v[4:7], v4, s[12:15], format:74, 0 idxen
				s_buffer_load_dword s62, s[4:7], 0x0
				v_nop
				s_buffer_load_dwordx8 s[12:19], s[20:23], 0x0
				s_buffer_load_dwordx4 s[8:11], s[20:23], 0x20
				s_waitcnt lgkmcnt(0)
				s_and_b64 vcc, exec, s[28:29]
				s_cbranch_vccnz BB0_1
				// GFX10: s_cbranch_vccnz BB0_1 ; encoding: [A,A,0x87,0xbf]
				// GFX10-NEXT: ; fixup A - offset: 0, value: BB0_1, kind: fixup_si_sopp_br
				// BIN: s_cbranch_vccnz BB0_1 // 0000000000BC: BF870041
				s_nop 0
				s_cbranch_execz BB0_3
				// GFX10: s_cbranch_execz BB0_3 ; encoding: [A,A,0x88,0xbf]
				// GFX10-NEXT: ; fixup A - offset: 0, value: BB0_3, kind: fixup_si_sopp_br
				// BIN: s_cbranch_execz BB0_3 // 0000000000C8: BF880040
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				arsenm [Not Done]: Formatting
				rtaylor (author) [Done]: Ah, missed this, thanks.
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_nop 0
				s_buffer_load_dword s26, s[0:3], 0x48
				s_waitcnt lgkmcnt(0)
				s_and_b64 vcc, exec, s[28:29]
				BB0_1:
				s_buffer_load_dword s28, s[4:7], 0x10
				BB0_3:
				s_waitcnt lgkmcnt(0)
				exp param0 v3, v0, v1, v2
				exp param1 v4, v4, v4, off
				s_cbranch_vccnz BB0_2
				// GFX10: s_cbranch_vccnz BB0_2 ; encoding: [A,A,0x87,0xbf]
				// GFX10-NEXT: ; fixup A - offset: 0, value: BB0_2, kind: fixup_si_sopp_br
				// BIN: s_cbranch_vccnz BB0_2 // 0000000001E0: BF870003
				s_nop 0
				s_nop 0
				s_nop 0
				BB0_2:
				s_nop 0
				s_nop 0
				s_endpgm
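The BIN offsets in this test can be reproduced with a small simulation of the workaround. The sketch below is a toy model, not the code in AMDGPUAsmBackend.cpp: `Inst`, `layout`, `branchOffset` and `relaxUnsafeBranches` are hypothetical names, and the only facts taken from the revision are that a branch offset of 0x3f is unsafe, that the fix grows such a branch to 8 bytes (the branch plus `s_nop 0`), and that offsets are counted in dwords from the dword after the branch (as the BIN lines imply).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of one instruction; not an LLVM type.
struct Inst {
  uint32_t size;  // encoded size in bytes
  int target;     // index of the instruction a branch jumps to, -1 otherwise
  bool relaxed;   // true once the branch has been widened to branch + s_nop 0
};

constexpr int64_t kUnsafeOffset = 0x3f;

// Byte address of every instruction, given the address of the first one.
inline std::vector<uint64_t> layout(const std::vector<Inst> &prog,
                                    uint64_t base) {
  std::vector<uint64_t> addr(prog.size());
  uint64_t pc = base;
  for (std::size_t i = 0; i < prog.size(); ++i) {
    addr[i] = pc;
    pc += prog[i].size;
  }
  return addr;
}

// Branch offset in dwords, relative to the dword that follows the branch.
inline int64_t branchOffset(const std::vector<Inst> &prog,
                            const std::vector<uint64_t> &addr, std::size_t i) {
  assert(prog[i].target >= 0 && "not a branch");
  return (static_cast<int64_t>(addr[prog[i].target]) -
          static_cast<int64_t>(addr[i] + 4)) / 4;
}

// Widen every unrelaxed branch whose offset is 0x3f to 8 bytes, then lay the
// program out again and repeat until no branch changes size.
inline void relaxUnsafeBranches(std::vector<Inst> &prog, uint64_t base) {
  bool changed = true;
  while (changed) {
    changed = false;
    const std::vector<uint64_t> addr = layout(prog, base);
    for (std::size_t i = 0; i < prog.size(); ++i) {
      if (prog[i].target < 0 || prog[i].relaxed)
        continue;
      if (branchOffset(prog, addr, i) == kUnsafeOffset) {
        prog[i].size = 8;
        prog[i].relaxed = true;
        changed = true;
      }
    }
  }
}
```

Fed a layout that mirrors the middle of the test (starting at 0xBC: `s_cbranch_vccnz BB0_1`, one `s_nop`, `s_cbranch_execz BB0_3`, 57 `s_nop`s, an 8-byte load, two 4-byte instructions, then `BB0_1` on an 8-byte load and `BB0_3` after it), both branches start at offset 0x3f, both are widened, and the final offsets are 0x41 and 0x40 with the second branch at 0xC8, which agrees with the `BF870041` and `BF880040` BIN lines.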

Revision Contents

Diff 205334

lib/Target/AMDGPU/AMDGPU.td

lib/Target/AMDGPU/AMDGPUSubtarget.h

lib/Target/AMDGPU/AMDGPUSubtarget.cpp

lib/Target/AMDGPU/MCTargetDesc/AMDGPUAsmBackend.cpp

lib/Target/AMDGPU/SOPInstructions.td

test/MC/AMDGPU/offsetbug.s
