This short course introduces the role of large language models (LLMs) and agentic workflows in chip design, with an emphasis on practical applications and evaluation. It begins by examining how LLM-driven workflows from the software domain translate to hardware design, then focuses on the evolving landscape of AI capabilities across a range of chip design tasks. These include code generation, testbench creation, debugging and root cause analysis, design space exploration, and early forms of end-to-end agentic workflows that operate across toolchains. The course highlights how these capabilities are being applied today and what they suggest about near-term impact.
The course then covers benchmark and evaluation design for AI systems in chip design, including the distinction between agentic and non-agentic tasks, the challenges of assessing open-ended workflows, and the need for reproducibility and standardized reporting. It concludes with a forward-looking discussion of next-generation agentic workflows, including deeper integration with EDA tools and multi-agent orchestration, and of emerging research directions such as benchmark standardization, interoperability, and more robust evaluation frameworks.