
Austin Silveria

Trying to saturate the networked memory and compute hierarchy

Email X Profile GitHub

Research

Memory Efficient Attention over Structured Trees
Austin Silveria
Feb 2024

Group queries into blocks that maximize shared ancestry, minimizing total key-value loads.

Sparsity Aware Inference with CPU-DRAM Offloaded Language Models
Austin Silveria
Dec 2023

10x faster approximate inference at batch size 1 for offloaded models by exploiting structured MLP sparsity.

Timeline

2023 -
Independent AI research/engineering focused on efficient edge inference.
2019 - 2023
Jr. Software Developer during school and summers, then full-time after graduation. Built and launched a jurisdiction planning automation system from scratch (it determines which delivery stations cover which areas) and mentored junior developers.
2017 - 2021

B.S. in Computer Science at California Polytechnic State University, San Luis Obispo. On the side, read research papers and worked on small projects, e.g. a CLI tool for exploring personal search history with sentence embeddings and a web app extending it to a Mapbox application.