Prompt-Injection Detection and Defense in LLM-Integrated Knowledge Management Systems

Stanley Beck; Zhi Gao; Milos Russell; Anders Miles

Authors

Stanley Beck, Department of Computer Science, Colorado State University, Fort Collins, CO, USA.
Zhi Gao, Department of Computer Science, Binghamton University, Binghamton, NY, USA.
Milos Russell, Department of Computer Science, University of New Hampshire, Durham, NH, USA.
Anders Miles, Department of Computer Science, George Mason University, Fairfax, VA, USA.

Keywords

prompt injection, large language models, knowledge management systems, adversarial attacks, system security, defense mechanisms, socio-technical infrastructure

Abstract

The integration of large language models into organizational knowledge management systems has introduced significant productivity gains alongside novel security vulnerabilities, most notably prompt-injection attacks. These attacks exploit the instruction-following capacity of LLMs to subvert system behavior, potentially leading to data exfiltration, unauthorized access, or propagation of misinformation across corporate knowledge bases. This paper presents a comprehensive system-level analysis of detection and defense strategies for prompt-injection threats within LLM-integrated knowledge management architectures. We examine the structural trade-offs between the flexibility required for open-ended query processing and the constraints necessary for secure operation. We propose a layered defense framework that encompasses input sanitization, runtime monitoring, output verification, and governance policies, while critically assessing the limitations of each layer. The paper also explores how architectural choices, such as retrieval-augmented generation pipelines, context window management, and role-based access controls, influence attack surfaces and defensive efficacy. Through cross-domain comparisons with traditional injection attacks in database and web systems, we highlight the unique challenges posed by the semantic nature of prompt manipulations. We further discuss infrastructural sustainability, fairness implications in access control, and the policy landscape for deploying LLMs in sensitive knowledge environments. The analysis concludes that no single defense mechanism is sufficient; instead, a resilient system must combine technical controls with organizational governance, continuous monitoring, and adaptive threat models. This work provides a roadmap for researchers and practitioners seeking to build robust knowledge management systems that can leverage LLM capabilities without compromising security.
