Publications

Julian Bellavita*, Lorenzo Pichetti*, Thomas Pasquali, Flavio Vella, and Giulia Guidi. 2026. "Communication-Avoiding SpGEMM via Trident Partitioning on Hierarchical GPU Interconnect". In The 40th ACM International Conference on Supercomputing (ICS 2026). *Equal contribution

Julian Bellavita, Matthew Rubino, Nakul Iyer, Andrew Chang, Aditya Devarakonda, Flavio Vella, and Giulia Guidi. 2026. "Communication-Avoiding Linear Algebraic Kernel K-Means on GPUs". In The 40th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2026).

Julian Bellavita, Thomas Pasquali, Laura Del Rio Martin, Flavio Vella, and Giulia Guidi. 2025. "Popcorn: Accelerating Kernel K-means on GPUs through Sparse Linear Algebra". In The 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP 2025).

Thomas McFarland, Julian Bellavita, and Giulia Guidi. 2025. "Parallel GPU-Enabled Algorithms for SpGEMM on Arbitrary Semirings with Hybrid Communication". Short paper. In Proceedings of the 16th ACM/SPEC International Conference on Performance Engineering (ICPE 2025).

Adrián Castelló, Julian Bellavita, Grace Dinh, Yuka Ikarashi, and Hector Martínez. "Tackling the Matrix Multiplication Micro-Kernel Generation with Exo". In IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2024).

Julian Bellavita, Mathias Jacquelin, Esmond G. Ng, Dan Bonachea, Johnny Corbino, and Paul H. Hargrove. "symPACK: A GPU-Capable Fan-Out Sparse Cholesky Solver". In Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC-W 2023).

Julian Bellavita, Caitlin Sim, Kesheng Wu, Alex Sim, Shinjae Yoo, Hiro Ito, Vincent Garonne, and Eric Lancon. "Understanding Data Access Patterns for dCache System". In the 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP 2023).

Julian Bellavita, Alex Sim, Kesheng Wu, Inder Monga, Chin Guok, Frank Würthwein, and Diego Davila. 2022. "Studying Scientific Data Lifecycle in On-demand Distributed Storage Caches". In the Fifth International Workshop on Systems and Network Telemetry and Analytics (SNTA 2022).

Talks

"Block Leverage Scores for Cache-Friendly Randomized CP Decomposition", RNLA Workshop at IPAM, 2025

"Algorithms for Computing a Tensor Times Matrix Chain in Mixed Precision", Oak Ridge National Laboratory Discrete Algorithms Group Seminar, 2025

"Multi-GPU Communication Schemes for Large-Scale Supercomputers", Cornell Systems Seminar, 2024

"Accelerating High-Dimensional K-Means Clustering on GPUs with Sparse Matrix Multiplication", University of Trento, 2024

"RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs", Cornell HPC Group, 2024

"Portable code generation and semi-automatic scheduling for BLIS microkernels with Exo", BLIS Retreat, 2022

Posters

"Forward Error Bounds and Efficient Algorithms for Computing a Tensor Times Matrix Chain in Low Precision on GPUs", ACM Student Research Competition at SC, 2025

"Mixed Precision Algorithms for Computing the Tucker Decomposition", CSGF Program Review, 2025

"Efficient Large-Scale Multi-GPU Clustering using Sparse Linear Algebra", IPDPS PhD Forum, 2025