Every custom FPGA block eventually needs a control interface. The processor has to configure it, read its status, and pass data in and out. On Zynq and MicroBlaze systems, that interface is AXI4-Lite, and if you have ever opened Vivado’s “Create and Package New IP” wizard, you already know it generates a template slave for you. But the generated code is 300 lines of deeply defensive boilerplate, and if you do not understand it, you cannot debug it when something goes wrong.
AXI4-Lite looks intimidating at first, with five channels, nearly twenty signals, and a handshake protocol. But the entire slave reduces to two independent paths: a write path that lets the processor set register values, and a read path that lets the processor read them back. This post builds a complete, production-ready CSR (Control and Status Register) slave from those two paths, from scratch.
1. Five Channels, One Rule
AXI4-Lite has five unidirectional channels. The write transaction uses three: AW (write address), W (write data), and B (write response). The read transaction uses two: AR (read address) and R (read data). Each channel operates independently — AW and W can arrive in any order, and the slave must handle both.
Every channel uses the same handshake mechanism: a VALID signal driven by the sender, and a READY signal driven by the receiver. A transfer completes on any clock edge where both VALID and READY are high simultaneously. The sender holds VALID until the transfer completes; the receiver drives READY whenever it can accept.
input logic [31:0] awaddr,
input logic awvalid,
output logic awready,
input logic [31:0] wdata,
input logic [3:0] wstrb,
input logic wvalid,
output logic wready,
output logic [1:0] bresp,
output logic bvalid,
input logic bready,
input logic [31:0] araddr,
input logic arvalid,
output logic arready,
output logic [31:0] rdata,
output logic [1:0] rresp,
output logic rvalid,
input logic rready
The one rule you cannot break: A sender must never make VALID conditional on READY. If the master waits for AWREADY before asserting AWVALID, and the slave waits for AWVALID before asserting AWREADY, the channel deadlocks permanently. AXI requires that VALID is asserted independently — the sender commits to the transfer first, and the receiver decides when to accept.
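One observable consequence of the handshake rules is that VALID, once raised, stays high until READY completes the transfer. That part can be checked automatically in simulation. The block below is a minimal sketch of such a check, not part of the original design: the module name axi_valid_stable_chk and its port names are hypothetical, and the active-high reset is an assumption chosen to match the slave built later in this post.

```systemverilog
// Hypothetical simulation-only checker (assumed name and ports).
// Flags a sender that drops VALID before the receiver has accepted.
module axi_valid_stable_chk (
    input logic clk,
    input logic rst,    // active-high reset (assumption)
    input logic valid,  // driven by the sender
    input logic ready   // driven by the receiver
);
    // If VALID is high and READY is low on one edge, the transfer has
    // not completed, so VALID must still be high on the next edge.
    property p_valid_held;
        @(posedge clk) disable iff (rst)
        (valid && !ready) |=> valid;
    endproperty

    a_valid_held: assert property (p_valid_held)
        else $error("VALID dropped before READY completed the handshake");
endmodule
```

One instance per channel is enough, for example axi_valid_stable_chk b_chk (.clk(clk), .rst(rst), .valid(bvalid), .ready(bready)); in a testbench. It does not prove that a sender never waits for READY, but it does catch the related mistake of withdrawing VALID early.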
2. The Write Path
A write transaction requires the slave to accept both an address (AW channel) and data (W channel) before it can write a register and send the response (B channel). The AW and W channels are independent — the master may send them in the same cycle or in either order. The slave must handle all three arrival orderings without deadlock.
The cleanest implementation for a CSR slave: accept AW and W only when both are valid simultaneously and no response is pending. This avoids the need to buffer the address or data separately and keeps the logic compact. After capturing the write, assert BVALID and hold it until the master accepts the response with BREADY; only then can a new transaction be accepted.
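logic aw_en;

// Accept address and data together, and only when no response is pending.
assign aw_en   = awvalid & wvalid & ~bvalid;
assign awready = aw_en;
assign wready  = aw_en;
assign bresp   = 2'b00;   // always OKAY

always_ff @(posedge clk or posedge rst) begin
    if (rst) begin
        bvalid   <= 1'b0;
        ctrl_reg <= '0;
        din_reg  <= '0;
    end else begin
        if (aw_en) begin
            // Full-word write; wstrb is not applied in this version.
            case (awaddr[3:2])
                2'd0: ctrl_reg <= wdata;   // offset 0x00
                2'd2: din_reg  <= wdata;   // offset 0x08
                default: ;                 // 0x04 and 0x0C are read-only
            endcase
            bvalid <= 1'b1;
        end else if (bready & bvalid)
            bvalid <= 1'b0;                // response accepted
    end
end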
3. The Read Path
A read transaction is simpler than a write: the master presents an address on the AR channel, and the slave returns data on the R channel. There is no separate response channel — the RRESP field travels alongside the data. The slave accepts the read address, decodes it, and drives RDATA and RVALID on the following cycle.
The implementation pulses ARREADY for one cycle when the address arrives, latches the address internally, and then asserts RVALID with the decoded register data. RVALID stays high until the master accepts with RREADY.
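assign rresp = 2'b00;   // always OKAY
logic [31:0] araddr_lat;

always_ff @(posedge clk or posedge rst) begin
    if (rst) begin
        arready    <= 1'b0;
        rvalid     <= 1'b0;
        rdata      <= '0;
        araddr_lat <= '0;
    end else begin
        // Pulse ARREADY for one cycle, but only while no read response is
        // pending; an address accepted while RVALID is high would be dropped.
        if (arvalid & ~arready & ~rvalid) begin
            arready    <= 1'b1;
            araddr_lat <= araddr;
        end else
            arready <= 1'b0;

        // The address handshake completes in this cycle: return the register.
        if (arready & arvalid & ~rvalid) begin
            rvalid <= 1'b1;
            case (araddr_lat[3:2])
                2'd0: rdata <= ctrl_reg;   // offset 0x00
                2'd1: rdata <= status_i;   // offset 0x04
                2'd2: rdata <= din_reg;    // offset 0x08
                2'd3: rdata <= dout_i;     // offset 0x0C
            endcase
        end else if (rvalid & rready)
            rvalid <= 1'b0;                // data accepted
    end
end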
WSTRB and byte enables: The wstrb field has one bit per byte of wdata, so wstrb = 4'b1111 means write all four bytes. The listings in this post ignore wstrb and always write the full word, which is fine as long as software only issues 32-bit accesses. To support narrower accesses, apply WSTRB per byte lane: if (wstrb[0]) ctrl_reg[7:0] <= wdata[7:0]; and so on for each lane. Most processor writes will assert all four bits, but honoring WSTRB keeps the slave correct when software issues byte or halfword stores, for example through writeb() or writew() in a Linux driver.
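The per-lane rule can be wrapped in a small function so that each register write stays a single line. This is a sketch under stated assumptions rather than code from the original slave: the package name csr_util_pkg and the function name apply_wstrb are hypothetical.

```systemverilog
// Hypothetical helper package (assumed names, not from the original listing).
package csr_util_pkg;
    // Merge new_val into old_val one byte lane at a time.
    // A lane is replaced only where the matching strb bit is set.
    function automatic logic [31:0] apply_wstrb(
        input logic [31:0] old_val,
        input logic [31:0] new_val,
        input logic [3:0]  strb
    );
        logic [31:0] result;
        result = old_val;
        for (int i = 0; i < 4; i++)
            if (strb[i]) result[8*i +: 8] = new_val[8*i +: 8];
        return result;
    endfunction
endpackage
```

With the package imported, a register write in the write path would become 2'd0: ctrl_reg <= apply_wstrb(ctrl_reg, wdata, wstrb); and the same pattern applies to din_reg.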
4. The Complete CSR Slave
The full module combines both paths and exposes a clean user-logic interface: ctrl_o and din_o carry values the processor has written; status_i and dout_i carry values from user logic that the processor can read back. The register map is fixed at four 32-bit registers starting at offset 0x00: CTRL at 0x00 and DATA_IN at 0x08 are read/write, while STATUS at 0x04 and DATA_OUT at 0x0C are read-only, so writes to them are acknowledged but ignored. Dropping this into Vivado IP Integrator and connecting it to the Zynq PS AXI GP port gives the processor access to all four registers.
module axi_csr_slave #(
    // Not used by the decode below, which looks only at address bits [3:2].
    parameter int BASE_ADDR = 32'h4000_0000
)(
    input  logic        clk, rst,
    // Write address channel
    input  logic [31:0] awaddr, input logic awvalid, output logic awready,
    // Write data channel (wstrb is accepted but not applied)
    input  logic [31:0] wdata,  input logic [3:0] wstrb,
    input  logic        wvalid, output logic wready,
    // Write response channel
    output logic [1:0]  bresp,  output logic bvalid, input logic bready,
    // Read address channel
    input  logic [31:0] araddr, input logic arvalid, output logic arready,
    // Read data channel
    output logic [31:0] rdata,  output logic [1:0] rresp,
    output logic        rvalid, input logic rready,
    // User-logic interface
    output logic [31:0] ctrl_o,
    input  logic [31:0] status_i,
    output logic [31:0] din_o,
    input  logic [31:0] dout_i
);
    logic [31:0] ctrl_reg, din_reg;
    logic [31:0] araddr_lat;

    assign ctrl_o = ctrl_reg;
    assign din_o  = din_reg;
    assign bresp  = 2'b00;   // always OKAY
    assign rresp  = 2'b00;   // always OKAY

    // Write path: accept AW and W together when no response is pending.
    logic aw_en;
    assign aw_en   = awvalid & wvalid & ~bvalid;
    assign awready = aw_en;
    assign wready  = aw_en;

    always_ff @(posedge clk or posedge rst) begin
        if (rst) begin
            bvalid   <= 1'b0;
            ctrl_reg <= '0;
            din_reg  <= '0;
        end else begin
            if (aw_en) begin
                case (awaddr[3:2])
                    2'd0: ctrl_reg <= wdata;   // offset 0x00
                    2'd2: din_reg  <= wdata;   // offset 0x08
                    default: ;                 // 0x04 and 0x0C are read-only
                endcase
                bvalid <= 1'b1;
            end else if (bready & bvalid)
                bvalid <= 1'b0;
        end
    end

    // Read path: pulse ARREADY, latch the address, then hold RVALID.
    always_ff @(posedge clk or posedge rst) begin
        if (rst) begin
            arready    <= 1'b0;
            rvalid     <= 1'b0;
            rdata      <= '0;
            araddr_lat <= '0;
        end else begin
            // Only accept a new address while no read response is pending;
            // an address accepted while RVALID is high would be dropped.
            if (arvalid & ~arready & ~rvalid) begin
                arready    <= 1'b1;
                araddr_lat <= araddr;
            end else
                arready <= 1'b0;

            if (arready & arvalid & ~rvalid) begin
                rvalid <= 1'b1;
                case (araddr_lat[3:2])
                    2'd0: rdata <= ctrl_reg;   // offset 0x00
                    2'd1: rdata <= status_i;   // offset 0x04
                    2'd2: rdata <= din_reg;    // offset 0x08
                    2'd3: rdata <= dout_i;     // offset 0x0C
                endcase
            end else if (rvalid & rready)
                rvalid <= 1'b0;
        end
    end
endmodule
5. Simulating the Slave: A Self-Checking Testbench
Before connecting the slave to real hardware, verify both paths in simulation. The testbench wraps the AXI handshake into two reusable tasks — axi_write and axi_read — so each test case reads as a plain register operation rather than a protocol exercise. Five tests cover the complete register map: write/readback on both writable registers, reads from both read-only user-logic ports, and back-to-back writes that stress the BVALID handshake clearing correctly.
Both tasks drive signals on the negedge and sample on the posedge, giving the DUT a full half-period of setup time. The @(posedge clk iff condition) construct blocks until the next rising edge where the condition is already true — cleaner than a polling loop.
`timescale 1ns/1ps
module axi_csr_tb;
    localparam CLK_PERIOD = 10;

    logic clk = 0, rst = 1;
    always #(CLK_PERIOD/2) clk = ~clk;

    logic [31:0] awaddr, wdata, araddr;
    logic [3:0]  wstrb;
    logic        awvalid, awready, wvalid, wready;
    logic [1:0]  bresp, rresp;
    logic        bvalid, bready;
    logic        arvalid, arready;
    logic [31:0] rdata;
    logic        rvalid, rready;
    logic [31:0] ctrl_o, din_o;
    logic [31:0] status_i = 32'hDEAD_BEEF;
    logic [31:0] dout_i   = 32'hCAFE_0001;

    axi_csr_slave dut (.*);

    // Idle values for every master-driven signal
    initial begin
        awvalid = 0; awaddr = '0;
        wvalid  = 0; wdata  = '0; wstrb = '0;
        bready  = 0; arvalid = 0; araddr = '0; rready = 0;
    end

    // Single write: present AW and W together, then accept the B response.
    task automatic axi_write(input logic [31:0] addr, data);
        @(negedge clk);
        awaddr = addr; awvalid = 1;
        wdata  = data; wstrb = 4'hF; wvalid = 1;
        @(posedge clk iff (awready && wready));
        @(negedge clk); awvalid = 0; wvalid = 0;
        @(posedge clk iff bvalid);
        @(negedge clk); bready = 1;
        @(posedge clk);
        @(negedge clk); bready = 0;
    endtask

    // Single read: present AR, capture RDATA once RVALID is seen, then accept.
    task automatic axi_read(input logic [31:0] addr, output logic [31:0] rd_data);
        @(negedge clk);
        araddr = addr; arvalid = 1;
        @(posedge clk iff arready);
        @(negedge clk); arvalid = 0;
        @(posedge clk iff rvalid);
        rd_data = rdata;
        @(negedge clk); rready = 1;
        @(posedge clk);
        @(negedge clk); rready = 0;
    endtask

    logic [31:0] rd;

    initial begin
        repeat(4) @(posedge clk);
        @(negedge clk); rst = 0;

        // Test 1: CTRL write and readback
        axi_write(32'h0000_0000, 32'hA5A5_A5A5);
        assert (ctrl_o === 32'hA5A5_A5A5)
            else $error("CTRL write failed: got %0h", ctrl_o);
        axi_read(32'h0000_0000, rd);
        assert (rd === 32'hA5A5_A5A5)
            else $error("CTRL read failed: got %0h", rd);

        // Test 2: DATA_IN write and readback
        axi_write(32'h0000_0008, 32'h1234_5678);
        assert (din_o === 32'h1234_5678)
            else $error("DATA_IN write failed: got %0h", din_o);
        axi_read(32'h0000_0008, rd);
        assert (rd === 32'h1234_5678)
            else $error("DATA_IN readback failed: got %0h", rd);

        // Test 3: STATUS read (read-only, driven by user logic)
        axi_read(32'h0000_0004, rd);
        assert (rd === 32'hDEAD_BEEF)
            else $error("STATUS read failed: got %0h", rd);

        // Test 4: DATA_OUT read (read-only, driven by user logic)
        axi_read(32'h0000_000C, rd);
        assert (rd === 32'hCAFE_0001)
            else $error("DATA_OUT read failed: got %0h", rd);

        // Test 5: back-to-back writes; the second value must win
        axi_write(32'h0000_0000, 32'hDEAD_0001);
        axi_write(32'h0000_0000, 32'hDEAD_0002);
        axi_read(32'h0000_0000, rd);
        assert (rd === 32'hDEAD_0002)
            else $error("Back-to-back write failed: got %0h", rd);

        // Printed unconditionally, so it cannot claim that every test passed.
        $display("Test sequence complete. Any failures appear above as errors.");
        $finish;
    end
endmodule
Final Thoughts: The Protocol Is Not the Hard Part
The AXI4-Lite handshake is not complicated once you understand the valid/ready rule and see the write and read paths as separate state machines. What takes time is connecting the module correctly in Vivado IP Integrator, setting the base address in the address editor, and reading back registers from Linux userspace or bare-metal C. The RTL above is the smallest part of that workflow — but it is the part you have to own completely, because the generated template will not tell you where it breaks.
Extend this slave by adding more registers: widen the address slice used in the two case statements, add new case items, expose new ports on the user-logic interface, and keep the register map documentation in sync. The handshake logic does not change regardless of how many registers you add. For designs requiring more than 16 registers, parameterize the address decode with localparam offsets rather than hardcoded literals, which keeps the register map readable as it grows.
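As a rough sketch of what a localparam-based decode could look like, the block below names each register offset once and reuses those names in the read mux. The package name csr_map_pkg, the module name csr_read_mux, and the constants ADDR_W and ADDR_LSB are all hypothetical; the register names and offsets follow the four-register map used in this post.

```systemverilog
// Hypothetical register map package (assumed names).
package csr_map_pkg;
    localparam int unsigned ADDR_LSB = 2;  // 32-bit registers: skip 2 byte bits
    localparam int unsigned ADDR_W   = 4;  // decoded offset width (16 bytes)

    // Byte offsets, written the way they appear in a register map table
    localparam logic [ADDR_W-1:0] REG_CTRL     = 4'h0;
    localparam logic [ADDR_W-1:0] REG_STATUS   = 4'h4;
    localparam logic [ADDR_W-1:0] REG_DATA_IN  = 4'h8;
    localparam logic [ADDR_W-1:0] REG_DATA_OUT = 4'hC;
endpackage

// Combinational read decode that uses the named offsets.
module csr_read_mux import csr_map_pkg::*; (
    input  logic [31:0] araddr,
    input  logic [31:0] ctrl_reg, status_i, din_reg, dout_i,
    output logic [31:0] rdata_next
);
    always_comb begin
        // Force the byte-select bits to zero so the selector is word-aligned.
        unique case ({araddr[ADDR_W-1:ADDR_LSB], {ADDR_LSB{1'b0}}})
            REG_CTRL:     rdata_next = ctrl_reg;
            REG_STATUS:   rdata_next = status_i;
            REG_DATA_IN:  rdata_next = din_reg;
            REG_DATA_OUT: rdata_next = dout_i;
            default:      rdata_next = '0;  // unmapped offsets read as zero
        endcase
    end
endmodule
```

Adding a register then means adding one localparam and one case item, and growing the map past 16 bytes only requires changing ADDR_W. The same constants can drive the write-path case statement.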
Happy coding.
fpgawizard.com