atalla-tensor-core

HW-SW Co-Designed Stack for a custom Tensor-Core (targeting TSMC 65nm)

  • Leading the on-chip Memory Subsystem for the Atalla Tensor Core, with a focus on architecture diagramming & ISA design.
  • Built a cycle-accurate simulator of the datapath for performance modelling using implicit-convolution and GEMM kernels.
  • Architected a parameterizable 2MB Scratchpad with on-the-fly swizzling and a pipelined N × N crossbar – optimized for PPA.
  • Designed INT8/FP16 datapaths between the Systolic Array & Vector Core; integrating a DDR4 controller for asynchronous DRAM transfers.
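
For readers unfamiliar with implicit convolution, the sketch below shows the standard way a convolution is viewed as a GEMM without materializing an im2col buffer. It is only an illustration of the general technique, not code from this repo: the function names and the NCHW/KCRS layout are assumptions.

```python
def implicit_gemm_shape(n, c, h, w, k, r, s, stride=1, pad=0):
    """Return the (M, N, K) GEMM dimensions equivalent to a convolution.

    Assumed layout: input is N x C x H x W, filters are K x C x R x S.
    """
    p = (h + 2 * pad - r) // stride + 1  # output height
    q = (w + 2 * pad - s) // stride + 1  # output width
    # One GEMM row per output pixel, one column per filter,
    # and a reduction over every (channel, filter-row, filter-col) triple.
    return (n * p * q, k, c * r * s)


def gemm_k_to_crs(kk, r, s):
    """Map a GEMM reduction index back to (channel, filter-row, filter-col)."""
    ch, rem = divmod(kk, r * s)
    fr, fs = divmod(rem, s)
    return (ch, fr, fs)


# A 3x3 convolution over a 32x32 RGB image with 16 filters and padding 1.
print(implicit_gemm_shape(1, 3, 32, 32, 16, 3, 3, stride=1, pad=1))
print(gemm_k_to_crs(26, 3, 3))
```

The index mapping is computed on the fly for each operand fetch, which is what makes the convolution "implicit".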

Main Repo.

scratchpad

Code here.

Top-Level of Atalla's 2MB Scratchpad Architecture
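
As a rough illustration of why swizzling helps a banked scratchpad, the sketch below uses a common XOR-based scheme. The scheme actually used in Atalla's scratchpad is not described here, so the XOR mapping, the function names, and the bank count are all assumptions.

```python
def bank_of(row, col, num_banks, swizzle=True):
    """Return the bank holding element (row, col) of a tile.

    Assumes each tile row is exactly num_banks words wide and that
    num_banks is a power of two (hypothetical parameters).
    """
    assert num_banks > 0 and num_banks & (num_banks - 1) == 0
    assert 0 <= col < num_banks
    if not swizzle:
        return col  # naive mapping: a column always lands in one bank
    return col ^ (row % num_banks)  # XOR swizzle spreads a column across banks


def banks_touched_by_column(col, num_rows, num_banks, swizzle):
    """List the banks hit when reading one column across num_rows rows."""
    return [bank_of(r, col, num_banks, swizzle) for r in range(num_rows)]


# Reading column 3 of an 8x8 tile: conflict-free with swizzling,
# eight-way conflict on bank 3 without it.
print(banks_touched_by_column(3, 8, 8, swizzle=True))
print(banks_touched_by_column(3, 8, 8, swizzle=False))
```

With this mapping both row and column accesses touch every bank exactly once, so a transposed read needs no extra cycles for bank conflicts.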

dcache

Code here.

Top-Level of the 4-Way Non-Blocking Parameterizable D-Cache using PLRU replacement.
Addressing scheme inside each Bank
Inside view of the MSHR Buffer
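
The PLRU policy mentioned above can be sketched for a single 4-way set as a tree of three bits. This is a behavioural model of the textbook tree-PLRU scheme, not the RTL in this repo; the class name and the bit encoding are assumptions.

```python
class TreePLRU4:
    """Tree pseudo-LRU state for one 4-way set (three bits).

    Assumed encoding: each bit points toward the victim side.
    bits[0]: 0 -> victim is in ways 0/1, 1 -> victim is in ways 2/3
    bits[1]: 0 -> way 0, 1 -> way 1
    bits[2]: 0 -> way 2, 1 -> way 3
    """

    def __init__(self):
        self.bits = [0, 0, 0]

    def touch(self, way):
        """Record a hit or fill on `way` by pointing the tree away from it."""
        assert 0 <= way < 4
        if way < 2:
            self.bits[0] = 1              # victim now on the right half
            self.bits[1] = 1 if way == 0 else 0
        else:
            self.bits[0] = 0              # victim now on the left half
            self.bits[2] = 1 if way == 2 else 0

    def victim(self):
        """Return the way to evict on the next miss."""
        if self.bits[0] == 0:
            return 0 if self.bits[1] == 0 else 1
        return 2 if self.bits[2] == 0 else 3


plru = TreePLRU4()
for way in (0, 2, 1):
    plru.touch(way)
print(plru.victim())  # way 3 is the only one not yet touched
```

Tree-PLRU needs only N-1 bits per set (three here) instead of the full ordering that true LRU tracks, which is the usual reason for choosing it in hardware.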

crossbar

Code here.

Top-Level of the Full-Mesh and Butterfly Crossbars
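
To make the trade-off between the two topologies concrete, the sketch below models destination-tag routing through a butterfly network and compares switch counts with a full-mesh crossbar. It is a generic model under the assumption of 2x2 switches and a power-of-two port count; the function names are hypothetical and do not come from this repo.

```python
from typing import List


def butterfly_route(src: int, dst: int, n_ports: int) -> List[int]:
    """Return the port position after each stage of a butterfly network.

    Destination-tag routing: stage s replaces one address bit of the
    current position with the matching bit of dst, starting at the MSB.
    """
    assert n_ports > 1 and n_ports & (n_ports - 1) == 0
    assert 0 <= src < n_ports and 0 <= dst < n_ports
    stages = n_ports.bit_length() - 1  # log2(n_ports)
    pos = src
    path = [pos]
    for s in range(stages):
        mask = 1 << (stages - 1 - s)
        pos = (pos & ~mask) | (dst & mask)
        path.append(pos)
    return path


def full_mesh_crosspoints(n_ports: int) -> int:
    """A full-mesh crossbar needs one crosspoint per (input, output) pair."""
    return n_ports * n_ports


def butterfly_switches(n_ports: int) -> int:
    """A butterfly uses n/2 two-by-two switches in each of log2(n) stages."""
    return (n_ports // 2) * (n_ports.bit_length() - 1)


print(butterfly_route(5, 2, 8))
print(full_mesh_crosspoints(8), butterfly_switches(8))
```

The butterfly is far cheaper in area, but it offers a single path per source/destination pair, so some permutations that a full mesh serves in one pass will block.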