This year has seen a lot of news, achievements, failures, and hysterics on the topic of AI. It passes exams, throws people out of jobs, and wins a Nobel Prize (not really). I think the Christmas holidays, with their inalienable alcoholic haze, are the perfect time to deliver the ultimate ass kick to us meat bags and see how AI fares in a technical interview for an ASIC verification position.
![[bender.png]]
## Input Data
I used GPT-4o and Gemini 2.0 Flash, selected for no particular reason. The LLMs received no special instructions, just the questions: the same ones I ask candidates. There are three main chunks of questions:
1. **Basic UVM:** Usually, I ask these to junior candidates only. The questions are factual.
2. **Advanced SV/UVM:** To answer these correctly, the candidate has to understand the areas targeted by the questions. This is the type of knowledge that doesn’t typically stick when you just read a book and never use it.
3. **Problem-Solving:** How to verify such-and-such DUT, what’s the proper architecture for a UVM agent with certain requirements, and so on. The most interesting part, obviously.
Furthermore, I tried to apply the same level of scrutiny I use in interviews to assess understanding, which is not excessively high: an interview is stressful, and a candidate eventually grows tired. LLMs, on the other hand, tend to be accurate only up to a point, and the more you ask, the more mistakes accumulate.
## Results
**TL;DR** before going into details: both LLMs performed more than adequately. In fact, **I would place them in the upper 25%** of my candidates. A summary of performance per section is in the table below.
| Section         | GPT-4o | Gemini 2.0 Flash |
| --------------- | ------ | ---------------- |
| Basic UVM       | 95%    | 90%              |
| Advanced SV/UVM | 80%    | 75%              |
| Problem-Solving | 75%    | 75%              |
Now, specifics.
LLMs received almost full marks on the basic questions. I would’ve been surprised otherwise: usually, the first link from Google contains a good enough answer. Both models, however, struggled with UVM phases. For some reason, that’s too much for them to comprehend.
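For context, here is roughly the machinery the models stumbled over. This is a minimal sketch of my own for illustration, not one of my interview questions: a component overriding the most common UVM phases, listed in execution order.
```systemverilog
// Minimal illustration of common UVM phases, shown in execution order.
// build/connect/report are functions (zero simulation time);
// run_phase is a task and is the only phase here that consumes time.
import uvm_pkg::*;
`include "uvm_macros.svh"

class phase_demo extends uvm_component;
  `uvm_component_utils(phase_demo)

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  // Executes top-down: create children and fetch configuration here.
  function void build_phase(uvm_phase phase);
    super.build_phase(phase);
  endfunction

  // Executes bottom-up: connect TLM ports and exports here.
  function void connect_phase(uvm_phase phase);
    super.connect_phase(phase);
  endfunction

  // The time-consuming phase: raise an objection to keep the test alive.
  task run_phase(uvm_phase phase);
    phase.raise_objection(this);
    #100ns; // placeholder for real stimulus
    phase.drop_objection(this);
  endtask

  // Executes after run_phase completes: summarize results here.
  function void report_phase(uvm_phase phase);
    super.report_phase(phase);
  endfunction
endclass
```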
They also performed well on the advanced SV/UVM questions. Not too surprising either: these are still questions one can google, and LLMs don’t require understanding to memorize a fact. Here, however, the shortcomings of the models became apparent. By shortcomings, I mean an utter, incomprehensible, Hollywood-level stupidity. A model would give me a correct answer and then illustrate it with a code snippet that was syntactically correct but completely missed the point of the answer. After I pointed this out, the model would gladly amend the answer… only to make the exact same mistake again. Talk about understanding.
Now, the most interesting part: problem-solving. Again, both models were generally quite good. One of the problems involved reading an Excel sheet, understanding the interface described there, and explaining the verification strategy for such a DUT. Judging by the manner of the answers, the models didn’t really understand the DUT and just threw me a bunch of generic considerations for that class of design. I wouldn’t say it’s bad, though: the answer wasn’t wrong, just incomplete. LLMs also managed to write some simple code and propose a working architecture for a specific UVM agent. GPT even arrived at what I consider a golden solution after one additional question.
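For reference, the baseline that any proposed agent architecture builds on looks roughly like this. It is a generic sketch with placeholder names (`my_item`, `my_driver`, and `my_monitor` are assumed to be defined elsewhere), not the golden solution from the interview.
```systemverilog
// Generic UVM agent skeleton: monitor always present, driver and
// sequencer created only when the agent is active. Placeholder classes
// my_item, my_driver, my_monitor are assumed to exist elsewhere.
import uvm_pkg::*;
`include "uvm_macros.svh"

class my_agent extends uvm_agent;
  `uvm_component_utils(my_agent)

  my_driver                    m_driver;
  my_monitor                   m_monitor;
  uvm_sequencer #(my_item)     m_sequencer;

  // Re-exports the monitor's observed transactions to scoreboards/coverage.
  uvm_analysis_port #(my_item) ap;

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  function void build_phase(uvm_phase phase);
    super.build_phase(phase);
    ap        = new("ap", this);
    m_monitor = my_monitor::type_id::create("m_monitor", this);
    if (get_is_active() == UVM_ACTIVE) begin
      m_driver    = my_driver::type_id::create("m_driver", this);
      m_sequencer = uvm_sequencer#(my_item)::type_id::create("m_sequencer", this);
    end
  endfunction

  function void connect_phase(uvm_phase phase);
    super.connect_phase(phase);
    // Assumes the monitor exposes an analysis port named "ap".
    m_monitor.ap.connect(ap);
    if (get_is_active() == UVM_ACTIVE)
      m_driver.seq_item_port.connect(m_sequencer.seq_item_export);
  endfunction
endclass
```
Whatever extra requirements a specific problem adds, a sound answer keeps the driver/sequencer pair behind the active/passive switch and exposes everything observable through analysis ports.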
In general, I liked GPT’s answers better: they were more concise and precise. Gemini generated much more text, and those unprompted explanations sometimes contained mistakes.
## Conclusion
From the interactions with the models, it’s apparent that genuine understanding is out of the question. And that’s okay. The most important thing is that many of my technical questions have roots in practice: problems I’ve encountered in the wild, questions I’ve been asked by colleagues, and issues I’ve helped to resolve. The models’ very good performance on such questions suggests they can be of real help with everyday verification issues. They are quick, get the gist of the question on the first try, and provide answers that are good enough, not necessarily to solve the problem, but to point in the right direction.
Caution should be exercised, of course: the models tend to make a lot of mistakes in the details. That’s not a problem in itself. On the contrary, we’d all be in deep trouble otherwise. As of now, ChatGPT is not ready to take your job, whatever its performance in the interview. At the same time, it’s not to be dismissed: the question is not whether to use it, but how to use it effectively.