About Me
Hi, I am Zhiyu Ding from Shandong, China. I am currently majoring in Data Science and Big Data Technology at Southwest Petroleum University.
In the world of code, I’m a self-proclaimed "performance geek." For me, simply making a program run is never enough. I thrive on pushing the limits of high-performance computing through parallel design and CUDA optimization. Whether I’m profiling precisely, pinpointing bottlenecks, or meticulously tuning memory access patterns, I genuinely enjoy squeezing every last drop of power out of the hardware.
Beyond the lab, I am equally passionate about life. I’m fascinated by F1 and endurance racing, where engineering limits meet human resolve. That passion extends to the open road: I love the thrill of driving and exploring new places on my own. In quieter moments, you’ll find me cooking up a storm in the kitchen or behind a camera lens, capturing the landscapes and stories that make life beautiful.
My current dream is to visit the Nürburgring and experience the purest freedom of speed in the winding "Green Hell."
Education

Southwest Petroleum University
School of Computer Science and Software Engineering · Data Science and Big Data Technology
GPA 4.13 / 5.0, ranked 1 / 66 in major
2023 - Present
Experience

AlphaFold3 Inference Performance Optimization
ASC25 Student Supercomputer Challenge
Deployed and optimized AlphaFold3 on an Intel Xeon plus NVIDIA A100 heterogeneous platform, focusing on JAX compilation latency, GPU execution strategy, and CPU-side numerical stability. Achieved speedups of 1.2x to 5.3x while preserving result quality.
2025.01 - 2025.02

Parallel Optimization for an Oil Spill Prediction Model
2024 Marine Computing Challenge Finals
Implemented hybrid MPI plus OpenMP parallelization for a two-dimensional oil spill prediction model, then used VTune to improve load balance, communication, and memory behavior. Reached a 2482.14x speedup on 2 nodes / 128 CPU cores and won the national third prize.
2024.07 - 2024.08

Tecorigin Deep Learning Operator Optimization
OpenAtom Competition · Operator Development Challenge
Identified I/O bottlenecks in a convolution forward operator and optimized it with SPM memory usage, double-buffered asynchronous pipelining, SIMD data reordering, and cost-model-driven tuning, delivering a 3.7x overall speedup.
2024.06 - 2024.09

PCG Optimization on the New Generation Sunway Platform
Domestic CPU Parallel Application Challenge
Reworked the SpMV, dot product, and preconditioning stages with DMA / LDM dataflow optimization, kernel fusion, and reduction tuning, reducing total runtime from 1287 seconds to 32.5 seconds (about a 39.6x speedup).
2024.02 - 2024.04
