accelerated-matrix-processor

HW-SW Co-Designed Stack for a custom Tensor-Core (targeting TSMC 70nm)

  • Leading memory-subsystem development in the AI Hardware team. Responsible for micro-architecture and ISA design.
  • Designed and functionally verified the lockup-free D-Cache and Systolic Array Controllers integrated into the AMP1 Tensor Core. Synthesized to 700 MHz using Cadence Flowtool, and ensured toggle coverage using Siemens QuestaSim.
  • Enhanced the AFTx07 RISC-V core with the Zicond extension for macro-fusion of conditional logic/arithmetic sequences.
  • Helped develop a semiconductor-design-specific curriculum under the CASCADE Apprenticeship Program with Synopsys.
Top-level view of AMP1
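
As a rough illustration of what the Zicond extension provides, the sketch below models its two instructions, `czero.eqz` and `czero.nez`, and composes them into a branchless select. This is a behavioral model only, not the AFTx07 implementation; the register width `XLEN = 32` and the helper name `select` are assumptions made for the example.

```python
# Behavioral sketch of the RISC-V Zicond instructions (not the AFTx07 RTL).
XLEN = 32                      # assumed register width for this example
MASK = (1 << XLEN) - 1


def czero_eqz(rs1: int, rs2: int) -> int:
    """czero.eqz rd, rs1, rs2: rd = 0 if rs2 == 0, otherwise rs1."""
    return 0 if (rs2 & MASK) == 0 else rs1 & MASK


def czero_nez(rs1: int, rs2: int) -> int:
    """czero.nez rd, rs1, rs2: rd = 0 if rs2 != 0, otherwise rs1."""
    return 0 if (rs2 & MASK) != 0 else rs1 & MASK


def select(cond: int, a: int, b: int) -> int:
    """Hypothetical helper: branchless `cond ? a : b` as a three-instruction sequence.

    czero.eqz t0, a, cond   # t0 = a when cond != 0, else 0
    czero.nez t1, b, cond   # t1 = b when cond == 0, else 0
    or        rd, t0, t1
    """
    return czero_eqz(a, cond) | czero_nez(b, cond)
```

Short sequences like the one in `select` are the kind of conditional logic/arithmetic pattern a front end could recognize and fuse.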

dcache

Top-Level of the 4-Way Non-Blocking Parameterizable D-Cache using PLRU replacement.
Addressing scheme inside each Bank
Inside view of each Bank
Inside view of the MSHR Buffer
Showcasing Parameters
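
For readers unfamiliar with PLRU, here is a minimal sketch of one common variant, tree-PLRU, for a single 4-way set. It assumes the tree form with 3 state bits per set; the D-Cache's actual PLRU variant and bit encoding are not stated here, and the class name `TreePLRU4` is hypothetical.

```python
class TreePLRU4:
    """Tree pseudo-LRU for one 4-way set (assumed variant, 3 state bits).

    bits[0]: root, 0 = next victim is in ways 0-1, 1 = in ways 2-3
    bits[1]: left pair,  0 = victim is way 0, 1 = victim is way 1
    bits[2]: right pair, 0 = victim is way 2, 1 = victim is way 3
    """

    def __init__(self) -> None:
        self.bits = [0, 0, 0]

    def touch(self, way: int) -> None:
        """Record an access: point every bit on the path away from `way`."""
        if not 0 <= way < 4:
            raise ValueError("way must be in 0..3")
        if way < 2:
            self.bits[0] = 1            # victim side becomes the right pair
            self.bits[1] = 1 - way      # point at the other way in the left pair
        else:
            self.bits[0] = 0            # victim side becomes the left pair
            self.bits[2] = 1 - (way - 2)

    def victim(self) -> int:
        """Follow the bits from the root to pick the way to replace."""
        if self.bits[0] == 0:
            return self.bits[1]
        return 2 + self.bits[2]
```

After touching ways 0, 1, 2, 3 in order, the victim is way 0, matching true LRU. Touching way 0 again then yields way 2 rather than the true-LRU way 1, which shows the approximation that makes PLRU cheaper than full LRU.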

crossbar

Top-Level of the Full-Mesh and Butterfly Crossbars
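
To illustrate how a butterfly topology differs from a full mesh, the sketch below shows textbook destination-tag routing through a radix-2 butterfly: each stage fixes one bit of the destination port, so a request crosses log2(N) stages instead of using a dedicated point-to-point link. It is a generic model under the assumption of a power-of-two port count, not a description of AMP1's crossbar; the function name `butterfly_route` is hypothetical.

```python
def butterfly_route(src: int, dst: int, num_ports: int) -> list[int]:
    """Return the port index seen after each stage of a radix-2 butterfly.

    Assumes num_ports is a power of two. Stage 0 resolves the most
    significant bit of the destination, the last stage the least significant.
    """
    stages = num_ports.bit_length() - 1
    if num_ports < 2 or (1 << stages) != num_ports:
        raise ValueError("num_ports must be a power of two >= 2")
    if not (0 <= src < num_ports and 0 <= dst < num_ports):
        raise ValueError("src and dst must be valid port indices")

    pos = src
    path = [pos]
    for stage in range(stages):
        mask = 1 << (stages - 1 - stage)
        # Replace one bit of the current position with the destination's bit.
        pos = (pos & ~mask) | (dst & mask)
        path.append(pos)
    return path
```

For example, `butterfly_route(5, 2, 8)` passes through three stages and ends at port 2. Two requests that need the same intermediate position in the same cycle would conflict, which is the usual trade-off against a full mesh's higher wiring cost.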

scratchpad + tca

Top-Level of AMP1's Scratchpad and Compute Accelerator