AXI4-Lite Slave From Scratch: Connect Your RTL to a Processor

Every custom FPGA block eventually needs a control interface. The processor has to configure it, read its status, and pass data in and out. On Zynq and MicroBlaze systems, that interface is AXI4-Lite, and if you have ever opened Vivado’s “Create and Package New IP” wizard, you already know it generates a template slave for you. But the generated code is about 300 lines of deeply defensive boilerplate, and if you do not understand it, you cannot debug it when something goes wrong.

AXI4-Lite looks intimidating at first: five channels, seventeen signals in the port list used here, and a handshake protocol. But the entire slave reduces to two independent paths: a write path that lets the processor set register values, and a read path that lets the processor read them back. This post builds a complete, working CSR (Control and Status Register) slave from those two paths, from scratch.

1. Five Channels, One Rule

AXI4-Lite has five unidirectional channels. The write transaction uses three: AW (write address), W (write data), and B (write response). The read transaction uses two: AR (read address) and R (read data). Each channel operates independently — AW and W can arrive in any order, and the slave must handle both.

Every channel uses the same handshake mechanism: a VALID signal driven by the sender, and a READY signal driven by the receiver. A transfer completes on any clock edge where both VALID and READY are high simultaneously. The sender holds VALID until the transfer completes; the receiver drives READY whenever it can accept.

axi_channels.sv
// AXI4-Lite slave port list: all five channels
// Vivado auto-prefixes these with "s_axi_" in generated IP.

// Write address channel (master → slave)
input  logic [31:0] awaddr,   // address of register to write
input  logic        awvalid,  // master has valid address
output logic        awready,  // slave can accept address

// Write data channel (master → slave)
input  logic [31:0] wdata,    // data to write
input  logic [3:0]  wstrb,    // byte enables: bit N covers wdata[8N+7:8N]
input  logic        wvalid,   // master has valid data
output logic        wready,   // slave can accept data

// Write response channel (slave → master)
output logic [1:0]  bresp,    // 2'b00 = OKAY, 2'b10 = SLVERR
output logic        bvalid,   // slave has valid response
input  logic        bready,   // master can accept response

// Read address channel (master → slave)
input  logic [31:0] araddr,   // address of register to read
input  logic        arvalid,  // master has valid address
output logic        arready,  // slave can accept address

// Read data channel (slave → master)
output logic [31:0] rdata,    // data returned to master
output logic [1:0]  rresp,    // 2'b00 = OKAY
output logic        rvalid,   // slave has valid read data
input  logic        rready    // master can accept data

The one rule you cannot break: A sender must never make VALID conditional on READY. If the master waits for AWREADY before asserting AWVALID, and the slave waits for AWVALID before asserting AWREADY, the channel deadlocks permanently. AXI requires that VALID is asserted independently — the sender commits to the transfer first, and the receiver decides when to accept.
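
The sender-side obligations of the handshake can be made checkable in simulation with a small assertion module attached to any one channel. The following is a minimal sketch, not part of the slave built in this post; the module name vr_rules, its port names, and the W parameter are made up for illustration, and it assumes a simulator with concurrent-assertion support.

```systemverilog
// Hypothetical per-channel handshake monitor (all names are illustrative).
module vr_rules #(
  parameter int W = 32              // payload width of the watched channel
)(
  input logic         clk,
  input logic         rst,          // active-high, as in the rest of this post
  input logic         valid,
  input logic         ready,
  input logic [W-1:0] payload
);
  // A transfer happens on a rising edge where both signals are high.
  logic xfer;
  assign xfer = valid & ready;

  // Once VALID is high it must stay high until the transfer completes.
  a_valid_held: assert property (@(posedge clk) disable iff (rst)
    valid && !ready |=> valid)
    else $error("VALID dropped before READY");

  // The payload must not change while the sender is still waiting.
  a_payload_stable: assert property (@(posedge clk) disable iff (rst)
    valid && !ready |=> $stable(payload))
    else $error("payload changed while waiting for READY");

  // Coverage point: at least one transfer was observed.
  c_xfer: cover property (@(posedge clk) disable iff (rst) xfer);
endmodule
```

For the write address channel, an instance might look like vr_rules #(.W(32)) aw_chk (.clk, .rst, .valid(awvalid), .ready(awready), .payload(awaddr));. These assertions catch a sender that withdraws VALID or changes its payload early. They do not prove that VALID is generated independently of READY, which remains a design-review item.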

2. The Write Path

A write transaction requires the slave to accept both an address (AW channel) and data (W channel) before it can write a register and send the response (B channel). The AW and W channels are independent — the master may send them in the same cycle or in either order. The slave must handle all three arrival orderings without deadlock.

The cleanest implementation for a CSR slave: accept AW and W only when both are valid simultaneously and no response is pending. Because the master must hold each VALID until its handshake completes, this works for every arrival order. It avoids the need to buffer the address or data separately and keeps the logic compact. After capturing the write, assert BVALID and hold it until the master accepts the response with BREADY; only then is a new transaction accepted.

axi_write_path.sv
// Accept AW and W simultaneously: simplest correct write path
// Constraint: master must present both channels before slave accepts.
// This matches how Xilinx AXI masters (PS, DMA) actually behave.
// Note: wstrb is ignored here; every write is treated as a full 32-bit word.
logic aw_en;
assign aw_en   = awvalid & wvalid & ~bvalid;  // both valid, no pending response
assign awready = aw_en;
assign wready  = aw_en;
assign bresp   = 2'b00;                       // always OKAY

always_ff @(posedge clk or posedge rst) begin
  if (rst) begin
    bvalid   <= 1'b0;
    ctrl_reg <= '0;
    din_reg  <= '0;
  end else begin
    if (aw_en) begin
      case (awaddr[3:2])                      // word-addressed: bits [1:0] ignored
        2'd0: ctrl_reg <= wdata;              // 0x00: CTRL    (R/W)
        2'd2: din_reg  <= wdata;              // 0x08: DATA_IN (R/W)
        // 0x04 (STATUS) and 0x0C (DATA_OUT) are read-only: writes ignored
        default: ;
      endcase
      bvalid <= 1'b1;
    end else if (bready & bvalid)
      bvalid <= 1'b0;  // master accepted response, ready for next transaction
  end
end

3. The Read Path

A read transaction is simpler than a write: the master presents an address on the AR channel, and the slave returns data on the R channel. There is no separate response channel — the RRESP field travels alongside the data. The slave accepts the read address, decodes it, and drives RDATA and RVALID on the following cycle.

The implementation pulses ARREADY for one cycle when the address arrives, latches the address internally, and then asserts RVALID with the decoded register data. RVALID stays high until the master accepts with RREADY.

axi_read_path.sv
assign rresp = 2'b00;       // always OKAY

logic [31:0] araddr_lat;    // latch address on acceptance

always_ff @(posedge clk or posedge rst) begin
  if (rst) begin
    arready    <= 1'b0;
    rvalid     <= 1'b0;
    rdata      <= '0;
    araddr_lat <= '0;
  end else begin
    // AR channel: accept address, hold ARREADY for one cycle.
    // Gated on ~rvalid so a new address is never accepted while a
    // response is still pending (otherwise that read would be dropped).
    if (arvalid & ~arready & ~rvalid) begin
      arready    <= 1'b1;
      araddr_lat <= araddr;
    end else
      arready <= 1'b0;

    // R channel: drive data on cycle after ARREADY pulse
    if (arready & arvalid & ~rvalid) begin
      rvalid <= 1'b1;
      case (araddr_lat[3:2])
        2'd0: rdata <= ctrl_reg;   // 0x00: CTRL
        2'd1: rdata <= status_i;   // 0x04: STATUS   (from user logic)
        2'd2: rdata <= din_reg;    // 0x08: DATA_IN
        2'd3: rdata <= dout_i;     // 0x0C: DATA_OUT (from user logic)
      endcase
    end else if (rvalid & rready)
      rvalid <= 1'b0;              // master accepted data
  end
end

WSTRB and byte enables: The wstrb field has one bit per byte of wdata. A fully asserted wstrb = 4'b1111 means write all four bytes. The write path in this post stores the whole word and ignores wstrb, which is only correct when every write asserts all four bits. The more robust approach is to apply WSTRB per byte: if (wstrb[0]) reg[7:0] <= wdata[7:0]; and so on for each byte lane. Most processor writes will assert all four bits, but applying WSTRB keeps the slave correct when software issues 8-bit or 16-bit stores to a register, which arrive as writes with only some strobe bits set.
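
The per-byte rule can be written once as a function and reused for every writable register. This is a minimal sketch assuming 32-bit registers; the package name csr_util_pkg and the function name apply_wstrb are illustrative and do not appear in the slave in this post.

```systemverilog
package csr_util_pkg;
  // Merge wdata into the current register value one byte lane at a time.
  // Lanes whose strobe bit is 0 keep their old contents.
  function automatic logic [31:0] apply_wstrb(
    input logic [31:0] current,
    input logic [31:0] wdata,
    input logic [3:0]  wstrb
  );
    logic [31:0] merged;
    merged = current;
    for (int i = 0; i < 4; i++)
      if (wstrb[i]) merged[8*i +: 8] = wdata[8*i +: 8];
    return merged;
  endfunction
endpackage
```

In the write path, a register update would then read ctrl_reg <= csr_util_pkg::apply_wstrb(ctrl_reg, wdata, wstrb);. With wstrb = 4'b0001, only ctrl_reg[7:0] changes.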

4. The Complete CSR Slave

The full module combines both paths and exposes a clean user-logic interface: ctrl_o and din_o carry values the processor has written; status_i and dout_i carry values from user logic that the processor can read back. The register map is fixed at four 32-bit registers starting at offset 0x00. Dropping this into Vivado IP Integrator and connecting it to the Zynq PS AXI GP port (normally through an AXI Interconnect or SmartConnect) gives the processor read access to all four registers and write access to CTRL and DATA_IN.

axi_csr_slave.sv
module axi_csr_slave #(
  // Informational only: the decode below uses address bits [3:2],
  // so the base address is set in the Vivado address editor.
  parameter int BASE_ADDR = 32'h4000_0000   // Zynq GP0 default
)(
  input  logic        clk, rst,

  // AXI4-Lite slave interface
  input  logic [31:0] awaddr,
  input  logic        awvalid,
  output logic        awready,
  input  logic [31:0] wdata,
  input  logic [3:0]  wstrb,
  input  logic        wvalid,
  output logic        wready,
  output logic [1:0]  bresp,
  output logic        bvalid,
  input  logic        bready,
  input  logic [31:0] araddr,
  input  logic        arvalid,
  output logic        arready,
  output logic [31:0] rdata,
  output logic [1:0]  rresp,
  output logic        rvalid,
  input  logic        rready,

  // User-logic interface
  // Register map: 0x00 CTRL (R/W)  0x04 STATUS (R)  0x08 DATA_IN (R/W)  0x0C DATA_OUT (R)
  output logic [31:0] ctrl_o,     // processor → user logic
  input  logic [31:0] status_i,   // user logic → processor (read-only)
  output logic [31:0] din_o,      // processor → user logic
  input  logic [31:0] dout_i      // user logic → processor (read-only)
);

  logic [31:0] ctrl_reg, din_reg;
  logic [31:0] araddr_lat;

  assign ctrl_o = ctrl_reg;
  assign din_o  = din_reg;
  assign bresp  = 2'b00;
  assign rresp  = 2'b00;

  // ---- Write path (wstrb ignored: full-word writes only) ----
  logic aw_en;
  assign aw_en   = awvalid & wvalid & ~bvalid;
  assign awready = aw_en;
  assign wready  = aw_en;

  always_ff @(posedge clk or posedge rst) begin
    if (rst) begin
      bvalid   <= 1'b0;
      ctrl_reg <= '0;
      din_reg  <= '0;
    end else begin
      if (aw_en) begin
        case (awaddr[3:2])
          2'd0: ctrl_reg <= wdata;
          2'd2: din_reg  <= wdata;
          default: ;               // read-only registers: writes ignored
        endcase
        bvalid <= 1'b1;
      end else if (bready & bvalid)
        bvalid <= 1'b0;
    end
  end

  // ---- Read path ----
  always_ff @(posedge clk or posedge rst) begin
    if (rst) begin
      arready    <= 1'b0;
      rvalid     <= 1'b0;
      rdata      <= '0;
      araddr_lat <= '0;
    end else begin
      // Gated on ~rvalid so no address is accepted while a response is pending
      if (arvalid & ~arready & ~rvalid) begin
        arready    <= 1'b1;
        araddr_lat <= araddr;
      end else
        arready <= 1'b0;

      if (arready & arvalid & ~rvalid) begin
        rvalid <= 1'b1;
        case (araddr_lat[3:2])
          2'd0: rdata <= ctrl_reg;
          2'd1: rdata <= status_i;
          2'd2: rdata <= din_reg;
          2'd3: rdata <= dout_i;
        endcase
      end else if (rvalid & rready)
        rvalid <= 1'b0;
    end
  end

endmodule

5. Simulating the Slave: A Self-Checking Testbench

Before connecting the slave to real hardware, verify both paths in simulation. The testbench wraps the AXI handshake into two reusable tasks — axi_write and axi_read — so each test case reads as a plain register operation rather than a protocol exercise. Five tests cover the complete register map: write/readback on both writable registers, reads from both read-only user-logic ports, and back-to-back writes that stress the BVALID handshake clearing correctly.

Both tasks drive signals on the negedge and sample on the posedge, giving the DUT a full half-period of setup time. The @(posedge clk iff condition) construct blocks until the next rising edge where the condition is already true — cleaner than a polling loop.
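
To make the comparison concrete, here is the same wait written both ways. This is a standalone sketch; the module name iff_demo and the signal done are placeholders, not part of the testbench in this post.

```systemverilog
module iff_demo;
  logic clk = 0, done = 0;
  always #5 clk = ~clk;

  // Raise the condition on a falling edge so it is stable at the next rising edge.
  initial begin
    repeat (3) @(posedge clk);
    @(negedge clk) done = 1;
  end

  // Polling form: step one edge at a time and test the condition after each.
  initial begin
    do @(posedge clk); while (!done);
    $display("polling loop woke at %0t", $time);
  end

  // iff form: the event control itself skips edges where done is low.
  initial begin
    @(posedge clk iff done);
    $display("iff wait woke at %0t", $time);
    #1 $finish;   // small delay so both messages print before the run ends
  end
endmodule
```

Both waits wake on the same rising edge, the first one at which done is already high. The iff form states the intent in a single line, which is why the tasks in the testbench use it.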

axi_csr_tb.sv
`timescale 1ns/1ps

module axi_csr_tb;

  localparam CLK_PERIOD = 10;   // 100 MHz

  logic clk = 0, rst = 1;
  always #(CLK_PERIOD/2) clk = ~clk;

  // AXI4-Lite signals
  logic [31:0] awaddr, wdata, araddr;
  logic [3:0]  wstrb;
  logic        awvalid, awready, wvalid, wready;
  logic [1:0]  bresp, rresp;
  logic        bvalid, bready;
  logic        arvalid, arready;
  logic [31:0] rdata;
  logic        rvalid, rready;

  // User-logic inputs (stand-ins for real hardware)
  logic [31:0] ctrl_o, din_o;
  logic [31:0] status_i = 32'hDEAD_BEEF;
  logic [31:0] dout_i   = 32'hCAFE_0001;

  axi_csr_slave dut (.*);   // implicit port connections: all names match

  initial begin
    awvalid = 0; awaddr = '0;
    wvalid  = 0; wdata  = '0; wstrb = '0;
    bready  = 0;
    arvalid = 0; araddr = '0;
    rready  = 0;
  end

  // ---- Write task ----
  // Drives AW and W simultaneously; completes after B response.
  task automatic axi_write(input logic [31:0] addr, data);
    @(negedge clk);
    awaddr = addr; awvalid = 1;
    wdata  = data; wstrb   = 4'hF; wvalid = 1;
    @(posedge clk iff (awready && wready));   // slave accepted both channels
    @(negedge clk);
    awvalid = 0; wvalid = 0;
    @(posedge clk iff bvalid);                // wait for write response
    @(negedge clk);
    bready = 1;
    @(posedge clk);
    @(negedge clk);
    bready = 0;
  endtask

  // ---- Read task ----
  task automatic axi_read(input logic [31:0] addr, output logic [31:0] rd_data);
    @(negedge clk);
    araddr = addr; arvalid = 1;
    @(posedge clk iff arready);               // slave accepted address
    @(negedge clk);
    arvalid = 0;
    @(posedge clk iff rvalid);                // rdata is stable
    rd_data = rdata;
    @(negedge clk);
    rready = 1;
    @(posedge clk);
    @(negedge clk);
    rready = 0;
  endtask

  // ---- Test sequence ----
  logic [31:0] rd;

  initial begin
    repeat (4) @(posedge clk);
    @(negedge clk);
    rst = 0;

    // Test 1: Write and read back CTRL register
    axi_write(32'h0000_0000, 32'hA5A5_A5A5);
    assert (ctrl_o === 32'hA5A5_A5A5)
      else $error("CTRL write failed: got %0h", ctrl_o);
    axi_read(32'h0000_0000, rd);
    assert (rd === 32'hA5A5_A5A5)
      else $error("CTRL read failed: got %0h", rd);

    // Test 2: Write and read back DATA_IN
    axi_write(32'h0000_0008, 32'h1234_5678);
    assert (din_o === 32'h1234_5678)
      else $error("DATA_IN write failed: got %0h", din_o);
    axi_read(32'h0000_0008, rd);
    assert (rd === 32'h1234_5678)
      else $error("DATA_IN readback failed: got %0h", rd);

    // Test 3: Read STATUS (value comes from user-logic port)
    axi_read(32'h0000_0004, rd);
    assert (rd === 32'hDEAD_BEEF)
      else $error("STATUS read failed: got %0h", rd);

    // Test 4: Read DATA_OUT (value comes from user-logic port)
    axi_read(32'h0000_000C, rd);
    assert (rd === 32'hCAFE_0001)
      else $error("DATA_OUT read failed: got %0h", rd);

    // Test 5: Back-to-back writes (BVALID must clear between transactions)
    axi_write(32'h0000_0000, 32'hDEAD_0001);
    axi_write(32'h0000_0000, 32'hDEAD_0002);
    axi_read(32'h0000_0000, rd);
    assert (rd === 32'hDEAD_0002)
      else $error("Back-to-back write failed: got %0h", rd);

    // Failures are reported by the assertions above; this line always prints.
    $display("Test sequence finished.");
    $finish;
  end

endmodule

Final Thoughts: The Protocol Is Not the Hard Part

The AXI4-Lite handshake is not complicated once you understand the valid/ready rule and see the write and read paths as separate state machines. What takes time is connecting the module correctly in Vivado IP Integrator, setting the base address in the address editor, and reading back registers from Linux userspace or bare-metal C. The RTL above is the smallest part of that workflow — but it is the part you have to own completely, because the generated template will not tell you where it breaks.

Extend this slave by adding more registers: widen the address slice used in the case statements (awaddr[3:2] and araddr_lat[3:2] select among only four words), add the new case selectors, expose new ports on the user-logic interface, and update the register map comment. The handshake logic does not change regardless of how many registers you add. For designs requiring more than 16 registers, parameterize the address decode with localparam offsets rather than hardcoded literals, which keeps the register map readable as it grows.
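
A minimal sketch of that parameterization, using the four-register map from this post. The package name csr_map_pkg, the ADDR_W constant, and the reg_offset and is_writable helpers are illustrative names, not part of the module above.

```systemverilog
package csr_map_pkg;
  // Number of byte-address bits decoded: 4 bits = 16 bytes = four registers.
  // Increase this when the map grows.
  localparam int unsigned ADDR_W = 4;

  // Byte offsets, matching the register map used in this post.
  localparam logic [31:0] CTRL_OFS     = 32'h0000_0000;
  localparam logic [31:0] STATUS_OFS   = 32'h0000_0004;
  localparam logic [31:0] DATA_IN_OFS  = 32'h0000_0008;
  localparam logic [31:0] DATA_OUT_OFS = 32'h0000_000C;

  // Keep only the decoded word-address bits: the base address and the two
  // byte-lane bits are dropped, so any address inside a word selects it.
  function automatic logic [31:0] reg_offset(input logic [31:0] addr);
    logic [31:0] ofs;
    ofs = '0;
    ofs[ADDR_W-1:2] = addr[ADDR_W-1:2];
    return ofs;
  endfunction

  // Only CTRL and DATA_IN accept writes in this map.
  function automatic logic is_writable(input logic [31:0] addr);
    return reg_offset(addr) inside {CTRL_OFS, DATA_IN_OFS};
  endfunction
endpackage
```

The write decode would then become case (csr_map_pkg::reg_offset(awaddr)) with labels such as csr_map_pkg::CTRL_OFS, so adding a register means adding one localparam and one case item.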


Happy coding.
fpgawizard.com
