Posts by Collection

portfolio

LMDeploy

LMDeploy is a high-performance inference framework for LLMs.

PyTorch 2

PyTorch 2 is an ML compiler framework for dynamic deep learning workloads. It uses Dynamo as the compiler frontend and Inductor as the backend to optimize deep learning code. I contributed to both components during my PhD, which deepened my understanding of ML compilers.

Triton

Triton is a GPU compiler for writing custom high-performance kernels. In my free time, I have contributed to the interpreter runtime and to an argsort kernel for the Top-K operation.

publications

Paper Title Number 1

Published in Journal 1, 2009

This paper is about the number 1. The number 2 is left for future work.

Recommended citation: Your Name, You. (2009). "Paper Title Number 1." Journal 1. 1(1).
Download Paper | Download Slides | Download Bibtex

Paper Title Number 2

Published in Journal 1, 2010

This paper is about the number 2. The number 3 is left for future work.

Recommended citation: Your Name, You. (2010). "Paper Title Number 2." Journal 1. 1(2).
Download Paper | Download Slides

Paper Title Number 3

Published in Journal 1, 2015

This paper is about the number 3. The number 4 is left for future work.

Recommended citation: Your Name, You. (2015). "Paper Title Number 3." Journal 1. 1(3).
Download Paper | Download Slides

Paper Title Number 4

Published in GitHub Journal of Bugs, 2024

This paper is about fixing template issue #693.

Recommended citation: Your Name, You. (2024). "Paper Title Number 4." GitHub Journal of Bugs. 1(3).
Download Paper

teaching

R244: Large-Scale Data Processing and Optimisation

Graduate course, University of Cambridge, 2021

This module provides an introduction to large-scale data processing and optimisation, and their impact on computer system architecture. Large-scale distributed applications with high-volume data processing, such as the training of machine learning models, will continue to grow in importance. Supporting the design and implementation of robust, secure, and heterogeneous large-scale distributed systems is essential. For distributed systems with a large and complex parameter space, tuning and optimising computer systems is becoming an important and complex task, one that also depends on the characteristics of the input data and the algorithms used in the applications. Algorithm designers are often unaware of the constraints imposed by systems and of the best way to account for them when designing algorithms for massive volumes of data. On the other hand, computer systems often miss advances in algorithm design that could cut processing time and scale systems up in terms of the size of the problem they can address. The course explores integrating machine learning approaches (e.g. Bayesian Optimisation, Reinforcement Learning) into system optimisation.

Guoliang He
