accelerated-matrix-processor
HW-SW Co-Designed Stack for a custom Tensor-Core (targeting TSMC 70nm)
- Leading memory-subsystem development in the AI Hardware team. Responsible for micro-architecture and ISA design.
- Designed and functionally verified the lockup-free D-Cache and Systolic Array Controllers integrated into the AMP1 Tensor Core. Synthesized to 700 MHz using Cadence Flowtool, and verified toggle coverage using Siemens QuestaSim.
- Enhanced the AFTx07 RISC-V core with the Zicond extension for macro-fusion of conditional logic/arithmetic sequences.
- Helped develop semiconductor-design-specific curriculum under the CASCADE Apprenticeship Program with Synopsys. Responsible for ensuring team documentation adhered to standards and assisting with logistics for 200+ students.
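
For readers unfamiliar with Zicond: the extension defines two instructions, `czero.eqz` and `czero.nez`. The sketch below is a behavioural model of their ISA semantics and of the branchless conditional-select idiom they enable. It is not the AFTx07 implementation; the 32-bit register width (`MASK`) and the helper name `select` are assumptions made for illustration.

```python
MASK = (1 << 32) - 1  # assumption: 32-bit registers (XLEN is not stated above)


def czero_eqz(rs1: int, rs2: int) -> int:
    """czero.eqz rd, rs1, rs2: rd = 0 if rs2 == 0, otherwise rs1."""
    return 0 if (rs2 & MASK) == 0 else rs1 & MASK


def czero_nez(rs1: int, rs2: int) -> int:
    """czero.nez rd, rs1, rs2: rd = 0 if rs2 != 0, otherwise rs1."""
    return 0 if (rs2 & MASK) != 0 else rs1 & MASK


def select(cond: int, a: int, b: int) -> int:
    """Branchless `cond ? a : b` (hypothetical helper).

    Exactly one of the two czero results is non-zeroed, so OR-ing them
    yields the selected operand without a branch.
    """
    return czero_eqz(a, cond) | czero_nez(b, cond)
```

Sequences of this shape (two `czero` ops followed by an `or`) are the kind of conditional logic/arithmetic pattern a fusion-capable front end could recognise.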

Top-level view of AMP1
dcache

Top-Level of the 4-Way Non-Blocking Parameterizable D-Cache using PLRU replacement.
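
As a rough illustration of PLRU replacement in a 4-way set, here is a minimal behavioural sketch of the common tree-based variant, which keeps 3 bits per set. Whether the D-Cache uses this exact bit encoding is an assumption; the class name `TreePLRU4` and the bit layout are hypothetical.

```python
class TreePLRU4:
    """Tree pseudo-LRU state for one 4-way set (3 bits). Illustrative only."""

    def __init__(self) -> None:
        # bits[0]: root.       0 -> victim is in ways 0-1, 1 -> victim is in ways 2-3
        # bits[1]: left pair.  0 -> way 0 is the victim,   1 -> way 1
        # bits[2]: right pair. 0 -> way 2 is the victim,   1 -> way 3
        self.bits = [0, 0, 0]

    def touch(self, way: int) -> None:
        """On an access, point every node on the path away from `way`."""
        if way < 2:
            self.bits[0] = 1            # next victim comes from the right pair
            self.bits[1] = 1 - way      # and, within the left pair, from the other way
        else:
            self.bits[0] = 0            # next victim comes from the left pair
            self.bits[2] = 1 - (way - 2)

    def victim(self) -> int:
        """Follow the bits from the root to the way that should be replaced."""
        if self.bits[0] == 0:
            return self.bits[1]
        return 2 + self.bits[2]
```

Compared with true LRU, the tree only approximates recency, but it needs 3 bits per set instead of a full ordering of the four ways.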

Addressing scheme inside each Bank
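
The exact field widths are parameters of the design and are not reproduced here. As a generic sketch of how a banked cache can split an address, the example below assumes block-interleaved banking with the layout `tag | set index | bank | block offset`. Every constant and the layout itself are hypothetical placeholders, not AMP1's actual configuration.

```python
from typing import NamedTuple

# Hypothetical parameters, chosen only to make the example concrete.
BLOCK_BYTES = 16
NUM_BANKS = 4
SETS_PER_BANK = 64


class Fields(NamedTuple):
    tag: int
    index: int
    bank: int
    offset: int


def split_address(addr: int) -> Fields:
    """Split a byte address assuming consecutive blocks map to consecutive banks."""
    offset = addr % BLOCK_BYTES                      # byte within the block
    block = addr // BLOCK_BYTES                      # global block number
    bank = block % NUM_BANKS                         # which bank serves the block
    index = (block // NUM_BANKS) % SETS_PER_BANK     # set within that bank
    tag = block // (NUM_BANKS * SETS_PER_BANK)       # remaining upper bits
    return Fields(tag, index, bank, offset)
```

With these placeholder values, `split_address(0x1234)` yields tag 1, index 8, bank 3, offset 4.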

Inside view of each Bank

Inside view of the MSHR Buffer
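
To illustrate what an MSHR buffer does in a non-blocking cache, the sketch below models the usual bookkeeping: a primary miss allocates an entry, later misses to the same block merge into it, a full buffer forces a stall, and a fill releases all waiting requests. The names (`MSHRBuffer`, `on_miss`, `on_fill`) and the default depth are assumptions for illustration and do not describe the RTL.

```python
from dataclasses import dataclass, field


@dataclass
class MSHREntry:
    block_addr: int
    waiters: list = field(default_factory=list)  # request ids waiting on this block


class MSHRBuffer:
    """Behavioural model of a miss status holding register buffer."""

    def __init__(self, num_entries: int = 4) -> None:  # depth is hypothetical
        self.num_entries = num_entries
        self.entries = {}  # block address -> MSHREntry

    def on_miss(self, block_addr: int, req_id: int) -> str:
        entry = self.entries.get(block_addr)
        if entry is not None:
            # Secondary miss: the block is already in flight, so just queue up.
            entry.waiters.append(req_id)
            return "merged"
        if len(self.entries) >= self.num_entries:
            # No free entry: the cache must stall this request.
            return "stall"
        # Primary miss: allocate an entry (a memory request would be issued here).
        self.entries[block_addr] = MSHREntry(block_addr, [req_id])
        return "allocated"

    def on_fill(self, block_addr: int) -> list:
        """Memory returned the block: free the entry and wake its waiters."""
        entry = self.entries.pop(block_addr, None)
        return entry.waiters if entry else []
```

Merging secondary misses is what keeps the cache from sending duplicate requests for a block that is already being fetched.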

Showcasing Parameters
crossbar

Top-Level of the Full-Mesh and Butterfly Crossbars
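
A full-mesh crossbar connects every input directly to every output, while a butterfly reaches the destination through log2(N) stages of 2x2 switches. The sketch below shows textbook destination-tag routing through a butterfly, resolving one destination bit per stage. The function name `butterfly_path` and the MSB-first bit order are assumptions; the actual crossbar may order its stages differently.

```python
def butterfly_path(src: int, dst: int, num_ports: int) -> list:
    """Return the port position after each stage when routing src -> dst.

    At each stage the 2x2 switch replaces one bit of the current position
    with the matching bit of the destination, so after log2(num_ports)
    stages the position equals the destination.
    """
    stages = num_ports.bit_length() - 1
    assert 1 << stages == num_ports, "num_ports must be a power of two"
    pos = src
    path = [pos]
    for s in range(stages):
        bit = stages - 1 - s               # assumption: resolve the MSB first
        mask = 1 << bit
        pos = (pos & ~mask) | (dst & mask) # take this bit from the destination
        path.append(pos)
    return path
```

For an 8-port network, `butterfly_path(0b110, 0b011, 8)` visits positions 6, 2, 2, 3: the second stage leaves the position unchanged because that bit already matches the destination.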
scratchpad + tca

Top-Level of AMP1's Scratchpad and Compute Accelerator