This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Increase scalar integer divide latency for SiFive7.
Closed · Public

Authored by craig.topper on May 22 2023, 12:30 PM.

Details

Reviewers

michaelmaitland
reames
arcbbb
monkchiang

Commits

rG5734a81a5527: [RISCV] Increase scalar integer divide latency for SiFive7.

Summary

The scalar divider produces 1 bit per cycle so the worst case
latency is the input width plus a couple cycles.
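
The arithmetic behind the new values can be sketched as follows. This is a minimal illustration, assuming a fixed overhead of 2 cycles for latency and 1 cycle for divider occupancy, which is what the numbers in this patch imply. The function names and the overhead parameters are hypothetical; they do not come from LLVM or from SiFive documentation.

```python
def worst_case_div_latency(width_bits: int, overhead_cycles: int = 2) -> int:
    """Worst-case latency of a divider that produces 1 result bit per cycle.

    overhead_cycles is an assumption inferred from the patch (66 - 64 = 2).
    """
    return width_bits + overhead_cycles


def div_resource_cycles(width_bits: int, overhead_cycles: int = 1) -> int:
    """Cycles the divider is modeled as busy (assumed width + 1, per the patch)."""
    return width_bits + overhead_cycles


for width in (64, 32):
    print(width, worst_case_div_latency(width), div_resource_cycles(width))
```

Under these assumptions the 64-bit and 32-bit cases give latencies of 66 and 34 and divider occupancies of 65 and 33, the values used for WriteIDiv and WriteIDiv32 in this revision.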

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

craig.topper created this revision. May 22 2023, 12:30 PM

Herald added a project: Restricted Project. May 22 2023, 12:30 PM

Herald added subscribers: jobnoorman, luke, VincentWu and 28 others.

craig.topper requested review of this revision. May 22 2023, 12:30 PM

Herald added a project: Restricted Project. May 22 2023, 12:30 PM

Herald added subscribers: pcwang-thead, eopXD, MaskRay.

michaelmaitland added inline comments. May 22 2023, 12:35 PM

llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
201–202	"1 bit per cycle so the worst case latency is the input width plus a couple cycles." Why does the latency get 2 extra cycles while the resource cycles get only 1 extra cycle?

craig.topper added inline comments. May 22 2023, 12:55 PM

llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
201–202	I might still have the numbers off by a couple of cycles, but it's probably better than being off by a factor of 2 or 4. They were already 1 apart so I left them 1 apart. This is what's currently in our downstream. The basic idea is that there is one pipeline stage in the divider that is used repeatedly in a loop. There are 1-2 stages in front of that and 1-2 stages after that. Two divides can't overlap in the repeated part, but we might be able to start a new divide while the previous divide is still in the stages after the repeated part. The ResourceCycles for SiFive7IDiv is intended to track the number of cycles that can't overlap. Unfortunately, I don't have the exact number for how much they can overlap.
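
The relationship between Latency and ResourceCycles described in this comment can be sketched with a small timing model. This is a simplified illustration under stated assumptions, not LLVM's scheduler: it assumes independent divides, that a new divide starts as soon as the SiFive7IDiv resource is free, and that nothing else stalls the pipeline. The function name is hypothetical.

```python
def back_to_back_finish_cycle(n_divides: int, latency: int, resource_cycles: int) -> int:
    """Cycle at which the last of n independent divides completes.

    Assumed model: divide k starts at cycle k * resource_cycles (when the
    divider resource becomes free) and its result is available `latency`
    cycles after it starts.
    """
    if n_divides < 1:
        raise ValueError("need at least one divide")
    last_start = (n_divides - 1) * resource_cycles
    return last_start + latency


# 64-bit divide (WriteIDiv) with the values from this revision.
print(back_to_back_finish_cycle(2, latency=66, resource_cycles=65))  # 131
# The same pair of divides with the previous values.
print(back_to_back_finish_cycle(2, latency=16, resource_cycles=15))  # 31
```

In this model the 1-cycle gap between Latency and ResourceCycles is the only overlap allowed between consecutive divides, which reflects the comment's point that the exact overlap is not known.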

LGTM.

This revision is now accepted and ready to land. May 22 2023, 1:02 PM

This revision was landed with ongoing or failed builds. May 22 2023, 1:37 PM

Closed by commit rG5734a81a5527: [RISCV] Increase scalar integer divide latency for SiFive7. (authored by craig.topper).

This revision was automatically updated to reflect the committed changes.

craig.topper added a commit: rG5734a81a5527: [RISCV] Increase scalar integer divide latency for SiFive7.

Harbormaster completed remote builds in B233664: Diff 524431. May 22 2023, 2:55 PM

Revision Contents

Path: llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
Size: 8 lines

Diff 524466

llvm/lib/Target/RISCV/RISCVSchedSiFive7.td

 // Integer multiplication
 let Latency = 3 in {
 def : WriteRes<WriteIMul, [SiFive7PipeB]>;
 def : WriteRes<WriteIMul32, [SiFive7PipeB]>;
 }

 // Integer division
 def : WriteRes<WriteIDiv, [SiFive7PipeB, SiFive7IDiv]> {
-  let Latency = 16;
-  let ResourceCycles = [1, 15];
+  let Latency = 66;
+  let ResourceCycles = [1, 65];
 }
 def : WriteRes<WriteIDiv32, [SiFive7PipeB, SiFive7IDiv]> {
-  let Latency = 16;
-  let ResourceCycles = [1, 15];
+  let Latency = 34;
+  let ResourceCycles = [1, 33];
 }

 // Bitmanip
 let Latency = 3 in {
 // Rotates are in the late-B ALU.
 def : WriteRes<WriteRotateImm, [SiFive7PipeB]>;
 def : WriteRes<WriteRotateImm32, [SiFive7PipeB]>;
 def : WriteRes<WriteRotateReg, [SiFive7PipeB]>;