atalla-tensor-core
HW-SW Co-Designed Stack for a Custom Tensor Core (targeting TSMC 65nm)
- Leading the on-chip Memory Subsystem for the Atalla Tensor Core, focusing on architecture diagramming & ISA design.
- Built a cycle-accurate simulator of the datapath for performance modelling, driven by implicit-convolution and GEMM kernels.
- Architected a parameterizable 2MB Scratchpad with on-the-fly swizzling and a pipelined N × N crossbar, optimized for PPA (power, performance, area).
- Designed INT8/FP16 datapaths between the Systolic Array & Vector Core, and integrated a DDR4 controller for asynchronous DRAM transfers.
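The simulator's source isn't reproduced here. As a hedged illustration of what the "implicit-convolution" kernels above compute, the Python sketch below evaluates a convolution as a GEMM whose im2col operand is gathered on the fly instead of being stored. The CHW/KCRS layouts, batch size of 1, lack of padding, and the function names are assumptions for illustration, not Atalla's actual kernel.

```python
from itertools import product


def conv2d_direct(x, w, stride=1):
    """Reference convolution: x is [C][H][W], w is [K][C][R][S], no padding."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    OH, OW = (H - R) // stride + 1, (W - S) // stride + 1
    y = [[[0] * OW for _ in range(OH)] for _ in range(K)]
    for k, oh, ow in product(range(K), range(OH), range(OW)):
        y[k][oh][ow] = sum(
            x[c][oh * stride + r][ow * stride + s] * w[k][c][r][s]
            for c, r, s in product(range(C), range(R), range(S))
        )
    return y


def conv2d_implicit_gemm(x, w, stride=1):
    """Same result as an (OH*OW) x K GEMM with reduction depth C*R*S.

    The im2col matrix A is never materialized: each A element is gathered
    from x on demand, which is what an address generator would do per cycle.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    OH, OW = (H - R) // stride + 1, (W - S) // stride + 1
    M, N, KG = OH * OW, K, C * R * S

    def unpack_k(kk):  # reduction index -> (channel, filter row, filter col)
        c, rs = divmod(kk, R * S)
        r, s = divmod(rs, S)
        return c, r, s

    def a(m, kk):  # implicit im2col: row = output pixel, column = (c, r, s)
        oh, ow = divmod(m, OW)
        c, r, s = unpack_k(kk)
        return x[c][oh * stride + r][ow * stride + s]

    def b(kk, n):  # weights viewed as a KG x N matrix
        c, r, s = unpack_k(kk)
        return w[n][c][r][s]

    y = [[[0] * OW for _ in range(OH)] for _ in range(K)]
    for m, n in product(range(M), range(N)):
        oh, ow = divmod(m, OW)
        y[n][oh][ow] = sum(a(m, kk) * b(kk, n) for kk in range(KG))
    return y
```

Both functions return identical outputs; the GEMM view is what lets a systolic array consume convolution operands as ordinary matrix tiles.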
Poster presented to Semiconductor Leadership Board @ Purdue!
scratchpad
Code here.
Top-Level of Atalla's 2MB Scratchpad Architecture
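The scratchpad's exact swizzle function isn't described here, so the sketch below assumes a common XOR-based scheme with a hypothetical 8-bank layout. It shows why on-the-fly swizzling matters: with plain interleaving, walking down a column of a row-major tile hits the same bank every time, while XORing the row index into the bank bits spreads that walk across all banks.

```python
NUM_BANKS = 8  # hypothetical bank count; XOR swizzling assumes a power of two


def bank_linear(row, col, num_banks=NUM_BANKS):
    """Plain interleaving: the column index alone picks the bank."""
    return col % num_banks


def bank_xor_swizzled(row, col, num_banks=NUM_BANKS):
    """XOR the row into the bank bits so a column walk rotates across banks."""
    return (col ^ row) % num_banks


def bank_conflicts(bank_fn, accesses):
    """Extra serialized cycles if each bank serves one access per cycle."""
    banks = [bank_fn(r, c) for r, c in accesses]
    return max(banks.count(b) for b in set(banks)) - 1
```

For an 8×8 tile, reading column 0 costs 7 extra cycles under `bank_linear` and none under `bank_xor_swizzled`, while row reads stay conflict-free under both.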
dcache
Code here.
Top-Level of the 4-Way Non-Blocking Parameterizable D-Cache using PLRU replacement.
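As a sketch of how tree-based pseudo-LRU typically works for a 4-way set (assuming the common 3-bit tree variant; the D-cache's actual bit encoding may differ), each internal bit points toward the half of the tree to victimize next, and every access flips the bits on its path to point away from the touched way.

```python
class TreePLRU4:
    """Tree pseudo-LRU state for one 4-way set (3 bits).

    bits[0] (root): 0 -> victim in ways {0,1}, 1 -> victim in ways {2,3}
    bits[1]:        0 -> way 0,            1 -> way 1
    bits[2]:        0 -> way 2,            1 -> way 3
    """

    def __init__(self):
        self.bits = [0, 0, 0]

    def touch(self, way):
        """Update on a hit or fill: point every bit on the path away from `way`."""
        if not 0 <= way < 4:
            raise ValueError("way must be in 0..3")
        if way < 2:
            self.bits[0] = 1        # next victim comes from the right pair
            self.bits[1] = 1 - way  # within the left pair, point at the sibling
        else:
            self.bits[0] = 0        # next victim comes from the left pair
            self.bits[2] = 1 - (way - 2)

    def victim(self):
        """Follow the bits from the root down to the way to replace."""
        if self.bits[0] == 0:
            return self.bits[1]
        return 2 + self.bits[2]
```

For example, after accesses to ways 0, 1, 2, 3, 0, this tree picks way 2 as the victim, whereas true LRU would pick way 1; that approximation is the price of storing 3 bits per set instead of a full recency order.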
Addressing scheme inside each Bank
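The field widths in the bank addressing figure aren't stated in the text, so the sketch below uses hypothetical parameters (64-byte lines, 4 banks, 64 sets per bank) and assumes the bank-select bits sit directly above the line offset, so that consecutive cache lines interleave across banks.

```python
from dataclasses import dataclass


def _log2(value, name):
    """Exact log2 of a power of two; rejects anything else."""
    if value <= 0 or value & (value - 1):
        raise ValueError(f"{name} must be a power of two")
    return value.bit_length() - 1


@dataclass(frozen=True)
class BankedCacheGeometry:
    line_bytes: int = 64     # hypothetical
    num_banks: int = 4       # hypothetical
    sets_per_bank: int = 64  # hypothetical

    def _widths(self):
        return (_log2(self.line_bytes, "line_bytes"),
                _log2(self.num_banks, "num_banks"),
                _log2(self.sets_per_bank, "sets_per_bank"))

    def split(self, addr):
        """Byte address -> (tag, set_index, bank, offset), low bits first."""
        off_bits, bank_bits, set_bits = self._widths()
        offset = addr & (self.line_bytes - 1)
        addr >>= off_bits
        bank = addr & (self.num_banks - 1)
        addr >>= bank_bits
        set_index = addr & (self.sets_per_bank - 1)
        tag = addr >> set_bits
        return tag, set_index, bank, offset

    def join(self, tag, set_index, bank, offset):
        """Inverse of split()."""
        off_bits, bank_bits, set_bits = self._widths()
        line = (((tag << set_bits) | set_index) << bank_bits) | bank
        return (line << off_bits) | offset
```

With these parameters, address `0x12345678` splits into tag `0x48D1`, set 22, bank 1, and byte offset 56.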
Inside view of the MSHR Buffer
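A non-blocking cache tracks outstanding misses in miss status holding registers (MSHRs). The sketch below is a hypothetical, simplified model rather than the D-cache's RTL: a primary miss allocates an entry (and would issue one DRAM request), a secondary miss to the same line merges into that entry, and the cache stalls only when entries or per-entry target slots run out. The entry and target counts are assumed values.

```python
from dataclasses import dataclass, field


@dataclass
class MSHREntry:
    line_addr: int
    targets: list = field(default_factory=list)  # (req_id, offset) waiting on the line


class MSHRFile:
    def __init__(self, num_entries=4, max_targets=4):  # hypothetical sizes
        self.num_entries = num_entries
        self.max_targets = max_targets
        self.entries = {}  # line_addr -> MSHREntry

    def on_miss(self, line_addr, req_id, offset):
        """Return 'primary' (issue a fill), 'secondary' (merged), or 'stall'."""
        entry = self.entries.get(line_addr)
        if entry is not None:
            if len(entry.targets) >= self.max_targets:
                return "stall"  # line already in flight, but no target slot free
            entry.targets.append((req_id, offset))
            return "secondary"
        if len(self.entries) >= self.num_entries:
            return "stall"  # every MSHR is busy with another line
        self.entries[line_addr] = MSHREntry(line_addr, [(req_id, offset)])
        return "primary"

    def on_fill(self, line_addr):
        """DRAM data returned: free the entry and hand back every waiting request."""
        return self.entries.pop(line_addr).targets
```

Merging secondary misses is what keeps DRAM traffic to one request per line, even when several loads to the same line arrive before the fill returns.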
crossbar
Code here.
Top-Level of the Full-Mesh and Butterfly Crossbars
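To illustrate the trade-off between the two crossbars, the sketch below assumes a standard radix-2 butterfly with destination-tag routing (most significant bit first) and compares it with a full mesh, modelled as one N:1 mux per output. The full mesh routes any permutation in one pass; the butterfly needs only log2 N stages of 2×2 switches but can block on internal links. Function names are hypothetical.

```python
def butterfly_conflict_stage(dests):
    """Route input i to output dests[i] through a radix-2 butterfly.

    Stage s sets address bit (n-1-s) to the destination's bit, i.e. each 2x2
    switch pairs positions that differ only in that bit. Returns None if every
    packet gets its own link at every stage, else the first blocking stage.
    """
    n_ports = len(dests)
    n_stages = n_ports.bit_length() - 1
    if n_ports < 2 or 1 << n_stages != n_ports:
        raise ValueError("port count must be a power of two >= 2")
    positions = list(range(n_ports))
    for stage in range(n_stages):
        mask = 1 << (n_stages - 1 - stage)
        positions = [(p & ~mask) | (d & mask) for p, d in zip(positions, dests)]
        if len(set(positions)) != n_ports:
            return stage  # two packets need the same output link of one switch
    return None


def full_mesh_conflicts(dests):
    """Every output has its own N:1 mux, so only same-output requests collide."""
    return len(dests) - len(set(dests))
```

For N = 4, the permutation `[0, 2, 1, 3]` collides in the first butterfly stage (inputs 0 and 2 both need the same internal link) yet is conflict-free on the full mesh.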