accelerated-matrix-processor

HW-SW Co-Designed Stack for a custom Tensor-Core (targeting TSMC 70nm)

I’m responsible for parts of both the software and the hardware of the memory subsystem integrated into AMP0/AMP1. The goal is efficient scratchpad usage and QoS through multi-layered control that starts at the PyTorch dispatcher, passes through a software-controlled scratchpad, and ends at the systolic-array functional units (FUs).

Top-level view of AMP1

dcache

Top-Level of the 4-Way, Non-Blocking, Parameterizable D-Cache using PLRU replacement.
Addressing scheme inside each Bank
Inside view of each Bank
Inside view of the MSHR Buffer
Showcasing Parameters
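
For readers unfamiliar with PLRU, here is a minimal software sketch of tree-based pseudo-LRU for a single 4-way set. It is an illustrative model only, not the RTL in this repo: the class name `TreePLRU4`, the bit names, and the bit-polarity convention (0 points left, 1 points right) are all assumptions made for the example.

```python
class TreePLRU4:
    """Tree-PLRU state for one 4-way set (hypothetical model, not the RTL).

    Three bits form a binary tree over ways 0..3:
      root  -> selects the left pair (ways 0,1) or the right pair (ways 2,3)
      left  -> selects way 0 or way 1
      right -> selects way 2 or way 3
    Assumed convention: each bit points TOWARD the next victim
    (0 = left child, 1 = right child).
    """

    def __init__(self):
        self.root = 0
        self.left = 0
        self.right = 0

    def touch(self, way):
        """Record an access: flip the bits on the path to point away from `way`."""
        if way not in (0, 1, 2, 3):
            raise ValueError("way must be in 0..3")
        if way < 2:
            self.root = 1              # next victim should come from the right pair
            self.left = 1 - way        # point at the other way in the left pair
        else:
            self.root = 0              # next victim should come from the left pair
            self.right = 1 - (way - 2) # point at the other way in the right pair

    def victim(self):
        """Follow the bits from the root to pick the way to replace."""
        if self.root == 0:
            return 0 if self.left == 0 else 1
        return 2 if self.right == 0 else 3


def victim_sequence():
    """Fill an empty set by always replacing, then touching, the chosen victim."""
    plru = TreePLRU4()
    order = []
    for _ in range(5):
        way = plru.victim()
        order.append(way)
        plru.touch(way)
    return order
```

With this convention, repeatedly filling the victim way visits all four ways before returning to the first one, which is the behavior you would expect from an LRU approximation that costs only 3 bits per set.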

crossbar

Top-Level of the Full-Mesh and Butterfly Crossbars
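
To give a feel for the trade-off between the two topologies, the sketch below models destination-tag routing through a butterfly network and compares switch counts against a full crossbar. It assumes a radix-2 butterfly with a power-of-two port count and a full mesh with one crosspoint per input/output pair; the function names `butterfly_path` and `switch_counts` are hypothetical and do not correspond to modules in this repo.

```python
def butterfly_path(src, dst, stages):
    """Return the port index a packet occupies after each butterfly stage.

    Destination-tag routing: stage s overwrites address bit (stages-1-s)
    of the current position with the same bit of the destination, so
    after `stages` hops the packet sits at `dst`.
    """
    ports = 1 << stages
    if not (0 <= src < ports and 0 <= dst < ports):
        raise ValueError("src and dst must be valid port indices")
    pos = src
    path = [pos]
    for s in range(stages):
        mask = 1 << (stages - 1 - s)
        pos = (pos & ~mask) | (dst & mask)
        path.append(pos)
    return path


def switch_counts(ports):
    """Compare hardware cost for `ports` inputs and outputs (assumed cost model)."""
    stages = ports.bit_length() - 1
    if ports < 2 or (1 << stages) != ports:
        raise ValueError("ports must be a power of two >= 2")
    return {
        "full_crosspoints": ports * ports,                 # one per (input, output) pair
        "butterfly_2x2_switches": (ports // 2) * stages,   # N/2 switches per stage
    }


if __name__ == "__main__":
    print(butterfly_path(0, 5, 3))
    print(switch_counts(8))
```

The butterfly needs far fewer switches, but it offers exactly one path per source/destination pair, so two packets can conflict on an internal link; a full crossbar only conflicts when two inputs target the same output.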

scratchpad + tca

Top-Level of AMP1's Scratchpad and Compute Accelerator
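
As a rough illustration of how a systolic array computes a matrix product, here is a small cycle-level model of an output-stationary array. The dataflow is an assumption for the sake of the example (this README does not state which dataflow AMP1 uses), and `systolic_matmul` is a hypothetical helper, not part of the stack.

```python
def systolic_matmul(a, b):
    """Cycle-level model of an output-stationary systolic array.

    Assumed dataflow: row i of A enters from the left delayed by i cycles,
    column j of B enters from the top delayed by j cycles, and PE(i, j)
    keeps its own accumulator. Under that skew, PE(i, j) sees the operand
    pair (A[i][k], B[k][j]) at cycle t = i + j + k.
    Returns (result matrix, number of cycles until the last MAC).
    """
    m, kdim, n = len(a), len(b), len(b[0])
    if any(len(row) != kdim for row in a):
        raise ValueError("inner dimensions of A and B must match")
    c = [[0] * n for _ in range(m)]
    cycles = m + n + kdim - 2          # last MAC happens at t = (m-1)+(n-1)+(kdim-1)
    for t in range(cycles):
        for i in range(m):
            for j in range(n):
                k = t - i - j          # which operand pair reaches PE(i, j) this cycle
                if 0 <= k < kdim:
                    c[i][j] += a[i][k] * b[k][j]
    return c, cycles


if __name__ == "__main__":
    result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(result, cycles)
```

The model ignores the scratchpad entirely; in a real design the scratchpad's job is to deliver the skewed operand streams fast enough that the PEs never stall.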