PhD Student Position in Efficient and Reliable AI Inference

The University of Trento and the Hicrest Laboratory invite applications for a PhD student position on efficient and reliable inference in modern AI systems.

The project will investigate techniques such as quantization, sparsification, and pruning to reduce the computational cost, memory footprint, and energy consumption of deep learning models, while carefully assessing their impact on accuracy, runtime performance, and reliability.

The successful candidate will develop tools, benchmarks, and optimization methods for characterising and improving inference workloads across models, datasets, hardware platforms, and deployment conditions.

The work will contribute to the Archytas framework and will target reproducible, experimentally grounded research in AI systems.

Research context

Modern deep learning models are increasingly expensive to deploy. Inference workloads require substantial computational resources, memory bandwidth, and energy, especially when models are executed at scale or on heterogeneous hardware platforms. Techniques such as quantization, sparsification, and pruning are widely used to improve efficiency, but their effects are not always predictable.

Reducing precision, removing parameters, or exploiting sparsity can improve throughput and memory efficiency, but may also introduce accuracy degradation, performance variability, or reliability issues under different execution conditions. A systematic methodology is therefore needed to understand when these techniques are beneficial, where they fail, and how they can be improved.

This PhD project will address these challenges by combining AI systems benchmarking, performance engineering, and reliability-aware optimization.

Research activities

During the first year, the PhD student will focus on the development of tools and benchmarks to systematically evaluate the impact of quantization, sparsification, and pruning on inference workloads. The benchmark infrastructure will support reproducible experiments and comparative evaluation across:

neural network models and architectures;
datasets and inference tasks;
hardware platforms and accelerators;
optimization techniques and configuration choices;
metrics such as accuracy, latency, throughput, memory footprint, energy efficiency, and reliability.

During the second and third years, the PhD thesis will build on the initial experimental analysis to investigate and improve existing inference optimization techniques. The objective will be to identify limitations of current approaches and design enhanced methods that better balance efficiency, accuracy, and reliability.

Possible research directions include:

adaptive quantization strategies;
reliability-aware pruning and sparsification methods;
runtime mechanisms for selecting or tuning inference configurations;
model-, workload-, and hardware-aware optimization policies;
experimental analysis of performance and reliability trade-offs in realistic inference scenarios.

Expected outcomes

The expected result of the PhD project is a methodological and experimental framework for evaluating and improving efficient inference techniques. The project will produce both software artefacts and scientific contributions, including:

a reproducible benchmark infrastructure for efficient AI inference;
a systematic evaluation of quantization, sparsification, and pruning techniques;
new methods for balancing efficiency, accuracy, and reliability;
experimental insights into inference behaviour across hardware platforms;
publications in leading conferences and journals in AI systems, high-performance computing, computer architecture, and machine learning systems.

Candidate profile

We are looking for a motivated candidate with a strong background in computer science, computer engineering, machine learning systems, or a related field.

Required qualifications

Master’s degree, or equivalent, in Computer Science, Computer Engineering, Electrical Engineering, Data Science, Artificial Intelligence, or a related discipline.
Strong programming skills, preferably in Python and/or C/C++.
Familiarity with deep learning frameworks such as PyTorch, TensorFlow, ONNX Runtime, or similar.
Interest in systems-oriented AI research, benchmarking, performance analysis, and reproducible experimentation.
Good written and spoken English.

Desirable qualifications

Experience in one or more of the following areas will be considered an advantage:

deep learning inference optimization;
quantization, pruning, sparsity, or model compression;
GPU programming, CUDA, or accelerator-based computing;
performance profiling and benchmarking;
reliability analysis, fault tolerance, or approximate computing;
energy-efficient computing;
high-performance computing or heterogeneous systems.

What we offer

The selected candidate will join an active research environment at the University of Trento and will work on a timely topic at the intersection of AI, systems, and efficient computing.

The position offers:

a fully funded 3-year PhD position (1,700 euros per month after tax);
opportunities to publish in international conferences and journals;
access to experimental infrastructure for AI and systems research;
collaboration opportunities within the Archytas framework and related research initiatives;
support for attending conferences, workshops, summer schools, and research visits, subject to project rules and available funding.

About

University of Trento

The University of Trento, founded in 1962, transitioned to a public institution in 1982. It serves over 16,000 students and offers a robust environment for study and research, supported by …

Summary

The University of Trento seeks a PhD student to explore efficient AI inference techniques, focusing on quantization, sparsification, pruning, and benchmarking to improve efficiency while preserving accuracy and reliability across diverse hardware platforms.