## Solving Domain-Specific Problems Using LLMs

> [!Cite]-
> Semturs, Christopher, Shekoofeh Azizi, Scott Coull, Umesh Shankar, and Wieland Holfelder. “Solving Domain-Specific Problems Using LLMs,” 2024. [https://www.kaggle.com/whitepaper-solving-domains-specific-problems-using-llms](https://www.kaggle.com/whitepaper-solving-domains-specific-problems-using-llms).
>
> [link](https://www.kaggle.com/whitepaper-solving-domains-specific-problems-using-llms) [online](http://zotero.org/users/local/kycSZ2wR/items/YR9X53VN) [local](zotero://select/library/items/YR9X53VN) [pdf](file://C:\Users\erikt\Zotero\storage\XJGXVFYK\Semturs%20et%20al.%20-%202024%20-%20Solving%20Domain-Specific%20Problems%20Using%20LLMs.pdf)

## Notes

%% begin notes %%
%% end notes %%

%% begin annotations %%

### Imported: 2024-11-16 10:40 am

In reality, the people who practice cybersecurity - the developers, system administrators, SREs, and many junior analysts to whom our work here is dedicated - have the Sisyphean task of keeping up with the latest threats and trying to protect complex systems against them. Many practitioners’ days are largely filled with repetitive or manual tasks, such as individually triaging hundreds of alerts, that take valuable time away from developing more strategic defenses.

Based on our experience working with users and partners, we see three major challenges in the security industry today: threats, toil, and talent.

New and evolving threats: The threat landscape is constantly changing, with new and increasingly sophisticated attacks emerging all the time. This makes it difficult for defenders to keep up with the latest information, and conversely for practitioners to sift through that flood of data to identify what’s relevant to them and take action.

Operational toil: People working in security operations or DevOps roles often spend a significant amount of time on repetitive manual tasks that could be automated or assisted.
This leads to overload and takes away time from more strategic activities. Excessive focus on minutiae also prevents analysts and engineers from seeing the bigger picture that is key to securing their organizations.

Talent shortage: There is a shortage of skilled security professionals, making it difficult for organizations to find the people they need to protect their data and systems. Often, people enter security-focused roles without much training and with little spare time to expand their skills on the job.

We envision a world where novices and security experts alike are paired with AI expertise to free themselves from repetition and toil, accomplish tasks that seem impossible to us today, and provide new opportunities to share knowledge.

Our vision of the SecLM API is to provide a ‘one-stop shop’ for getting answers to security questions, regardless of their level of complexity.

Freshness: The model should be able to access the latest threat and vulnerability data, which changes on a daily basis. Due to its cost and duration (often days), retraining the model on a daily or hourly basis to incorporate the latest data is not a feasible approach.

User-specific data: The model should be able to operate on the user’s own security data within the user’s environment without the risk of exposing that sensitive data to others or the infrastructure provider. This rules out any centralized training on user data.

Security expertise: The model should be able to understand high-level security concepts and terminology, and break them into manageable pieces that are useful when solving the problem. For instance, decomposing a high-level attack strategy (e.g., lateral movement) into its constituent components for search or detection.

Multi-step reasoning: The model should be able to reason about the provided security data in a multi-step fashion by combining different data sources, techniques, and specialized models to solve security problems.
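The freshness and user-specific-data requirements above both point toward supplying context at query time rather than retraining the model. A minimal sketch of that retrieval-plus-prompt pattern, assuming a keyword-overlap retriever as a stand-in for a real vector index (all names here are illustrative, not the actual SecLM API):

```python
from dataclasses import dataclass

@dataclass
class ThreatDoc:
    """A fresh or user-local document (threat intel entry, local log summary, etc.)."""
    doc_id: str
    text: str

def retrieve(query: str, docs: list[ThreatDoc], k: int = 2) -> list[ThreatDoc]:
    """Rank documents by naive keyword overlap with the query; a real system
    would use an embedding index, but the interface is the same."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.text.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[ThreatDoc]) -> str:
    """Assemble retrieved context into the prompt, so fresh and private data
    reach the model without any retraining or centralized training."""
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    ThreatDoc("cve-1", "new vulnerability in ssh daemon allows remote code execution"),
    ThreatDoc("log-7", "failed ssh logins from one host to many internal hosts"),
    ThreatDoc("kb-3", "phishing campaign targets finance teams"),
]
query = "suspicious ssh lateral movement"
prompt = build_prompt(query, retrieve(query, corpus))
```

Because retrieval happens per query, daily threat-feed updates only require updating the corpus, and user data never leaves the user's environment.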
Given the diversity of downstream tasks that are expected of the model, evaluating its performance can be a challenging exercise, particularly when some categories of tasks may experience inherent trade-offs. For this reason, the fine-tuned model is evaluated using a number of complementary methods. Several of our downstream tasks, such as malware classification and certain types of simple security-focused question answering, can be framed as classification problems, and a standard battery of classification metrics can be used to concretely quantify the performance on those tasks. For other, less quantifiable tasks, we can leverage a set of golden responses that we can use to calculate similarity-based metrics (e.g., ROUGE, BLEU, BERTScore), but we can also compare across models using automated side-by-side preference evaluations using a separate (oftentimes larger) LLM. Finally, given the highly technical nature of security problems and the importance of accuracy in our tasks, we rely on expert human evaluators to score outputs using a Likert scale and side-by-side preference evaluation. Taken together, these metrics provide us with the guidance needed to ensure our fine-tuning has improved overall model quality, and help us direct future changes in model training.

Medical question-answering (QA) has always been a grand challenge in artificial intelligence (AI). The vast and ever-evolving nature of medical knowledge, combined with the need for accurate and nuanced reasoning, has made it difficult for AI systems to achieve human-level performance on medical QA tasks.

Our first version of Med-PaLM, described in a preprint in late 2022 and published in Nature in July 2023, was the first AI system to exceed the passing mark on US Medical License Exam (USMLE)-style questions. The study also evaluated long-form answers and described a comprehensive evaluation framework.
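The golden-response scoring described in the evaluation discussion above compares a model's output to a reference answer with overlap metrics such as ROUGE. A minimal sketch of ROUGE-1 F1 (unigram overlap only; the full metric family also covers ROUGE-2, ROUGE-L, stemming, etc.):

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of clipped unigram precision and recall
    between a model answer and a golden reference response."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # matches, clipped per token
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Score a hypothetical model answer against a golden security response.
golden = "Lateral movement uses stolen credentials to pivot between hosts"
answer = "Attackers pivot between hosts using stolen credentials"
score = rouge1_f1(answer, golden)
print(round(score, 3))  # prints 0.625
```

In practice one would use a maintained implementation (e.g. the `rouge-score` package) and aggregate scores over the whole golden-response set, alongside BLEU and BERTScore.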
In March 2023, Med-PaLM 2 was announced and described in a preprint. It demonstrated rapid advancements, both on USMLE-style questions and on long-form answers. Med-PaLM 2 achieves an accuracy of 86.5% on USMLE-style questions, a 19% leap over our own results from Med-PaLM. As evaluated by physicians, the model's long-form answers to consumer medical questions improved substantially compared to earlier versions of Med-PaLM or the underlying non-medically tuned base models. It also demonstrated how fine-tuning and related techniques can truly harness the power of LLMs in a domain-specific way.

Cybersecurity: The ever-evolving landscape of cyber threats demands innovative solutions. SecLM, an LLM designed for cybersecurity, acts as a force multiplier for security professionals by intelligently processing vast amounts of data. This empowers them to analyze and respond to threats more effectively. The vision for SecLM is to create a comprehensive platform that caters to the diverse needs of security practitioners, regardless of their expertise. The combination of LLMs and human expertise has the potential to revolutionize the field of cybersecurity, achieving superior results with less effort.

Healthcare: Healthcare data is increasing in quantity and complexity, leading to a need for innovative solutions to render medical information more helpful, useful, and accessible. MedLM, a family of models fine-tuned for the healthcare industry, can help unlock knowledge and make medicine more effective. MedLM is built on Med-PaLM, an LLM developed for medical applications. Med-PaLM has demonstrated expert-level performance in medical question-answering tasks. This achievement is just the first step in a journey towards improving health outcomes through the utilization of GenAI.

The key takeaway from this research is that technology alone is not enough.
Collaboration with the clinical community and careful multi-step evaluations are crucial for successful application of LLMs in healthcare. Going forward, vertical-specific models like the MedLM foundation models are expected to yield even better results for specific applications of interest, furthering the potential of AI in healthcare.

%% end annotations %%

%% Import Date: 2024-11-16T11:58:56.363-07:00 %%