The University of Trento and the Hicrest Laboratory invite applications for a PhD student position on efficient and reliable inference in modern AI systems.
The project will investigate techniques such as quantization, sparsification, and pruning to reduce the computational cost, memory footprint, and energy consumption of deep learning models, while carefully assessing their impact on accuracy, runtime performance, and reliability.
The successful candidate will develop tools, benchmarks, and optimization methods for characterising and improving inference workloads across models, datasets, hardware platforms, and deployment conditions.
The work will contribute to the Archytas framework and will target reproducible, experimentally grounded research in AI systems.
Research context
Modern deep learning models are increasingly expensive to deploy. Inference workloads require substantial computational resources, memory bandwidth, and energy, especially when models are executed at scale or on heterogeneous hardware platforms. Techniques such as quantization, sparsification, and pruning are widely used to improve efficiency, but their effects are not always straightforward.
Reducing precision, removing parameters, or exploiting sparsity can improve throughput and memory efficiency, but may also introduce accuracy degradation, performance variability, or reliability issues under different execution conditions. A systematic methodology is therefore needed to understand when these techniques are beneficial, where they fail, and how they can be improved.
This PhD project will address these challenges by combining AI systems benchmarking, performance engineering, and reliability-aware optimization.
Research activities
During the first year, the PhD student will focus on the development of tools and benchmarks to systematically evaluate the impact of quantization, sparsification, and pruning on inference workloads. The benchmark infrastructure will support reproducible experiments and comparative evaluation across:
- neural network models and architectures;
- datasets and inference tasks;
- hardware platforms and accelerators;
- optimization techniques and configuration choices;
- metrics such as accuracy, latency, throughput, memory footprint, energy efficiency, and reliability.
During the second and third years, the PhD thesis will build on the initial experimental analysis to investigate and improve existing inference optimization techniques. The objective will be to identify limitations of current approaches and design enhanced methods that better balance efficiency, accuracy, and reliability.
Possible research directions include:
- adaptive quantization strategies;
- reliability-aware pruning and sparsification methods;
- runtime mechanisms for selecting or tuning inference configurations;
- model-, workload-, and hardware-aware optimization policies;
- experimental analysis of performance and reliability trade-offs in realistic inference scenarios.
Expected outcomes
The expected result of the PhD project is a methodological and experimental framework for evaluating and improving efficient inference techniques. The project will produce both software artefacts and scientific contributions, including:
- a reproducible benchmark infrastructure for efficient AI inference;
- a systematic evaluation of quantization, sparsification, and pruning techniques;
- new methods for balancing efficiency, accuracy, and reliability;
- experimental insights into inference behaviour across hardware platforms;
- publications in leading conferences and journals in AI systems, high-performance computing, computer architecture, and machine learning systems.
Candidate profile
We are looking for a motivated candidate with a strong background in computer science, computer engineering, machine learning systems, or a related field.
Required qualifications
- Master’s degree, or equivalent, in Computer Science, Computer Engineering, Electrical Engineering, Data Science, Artificial Intelligence, or a related discipline.
- Strong programming skills, preferably in Python and/or C/C++.
- Familiarity with deep learning frameworks such as PyTorch, TensorFlow, ONNX Runtime, or similar.
- Interest in systems-oriented AI research, benchmarking, performance analysis, and reproducible experimentation.
- Good written and spoken English.
Desirable qualifications
Experience in one or more of the following areas will be considered an advantage:
- deep learning inference optimization;
- quantization, pruning, sparsity, or model compression;
- GPU programming, CUDA, or accelerator-based computing;
- performance profiling and benchmarking;
- reliability analysis, fault tolerance, or approximate computing;
- energy-efficient computing;
- high-performance computing or heterogeneous systems.
What we offer
The selected candidate will join an active research environment at the University of Trento and will work on a timely topic at the intersection of AI, systems, and efficient computing.
The position offers:
- a fully funded three-year PhD position (EUR 1,700 per month after tax);
- opportunities to publish in international conferences and journals;
- access to experimental infrastructure for AI and systems research;
- collaboration opportunities within the Archytas framework and related research initiatives;
- support for attending conferences, workshops, summer schools, and research visits, subject to project rules and available funding;
- an internship at Covision Lab.
