Memory-Efficient Fine-Tuning of Large Language Models for Enterprise Knowledge Automation

Ananya Natarajan; Zhen Mao; Jesse Eriksson; Arthur Martin

Authors

Ananya Natarajan, Department of Computer Science and Engineering, University of Nevada, Reno, Reno, NV, USA.
Zhen Mao, Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA.
Jesse Eriksson, Department of Computer Science, University of Alabama at Birmingham, Birmingham, AL, USA.
Arthur Martin, School of Information Technology, University of Cincinnati, Cincinnati, OH, USA.

Keywords:

Large language models, memory-efficient fine-tuning, enterprise knowledge automation, parameter-efficient adaptation, model governance, sustainable AI, retrieval-augmented generation

Abstract

Large language models have demonstrated remarkable capabilities in natural language understanding and generation, yet their prodigious memory and computational demands pose substantial barriers to cost-effective deployment in enterprise knowledge automation systems. This paper presents a comprehensive examination of memory-efficient fine-tuning strategies that enable organizations to adapt pre-trained language models to domain-specific knowledge bases while maintaining acceptable performance and operational feasibility. We systematically analyze parameter-efficient techniques, including low-rank adaptation, adapter modules, and prefix tuning, and evaluate their trade-offs in terms of memory footprint, training throughput, inference latency, and model fidelity. Beyond algorithmic considerations, we address the socio-technical dimensions of enterprise adoption, including governance frameworks for model versioning and compliance, robustness under distribution shift, fairness across diverse knowledge corpora, and the sustainability implications of reduced computational resource consumption. Architectural decisions for integrating fine-tuned models with existing enterprise data pipelines, retrieval-augmented generation, and knowledge graph infrastructures are discussed in depth. Through a synthesis of recent advances in sparse fine-tuning, quantization-aware training, and memory-optimized hardware utilization, we propose a layered deployment architecture that balances accuracy, cost, and regulatory constraints. The paper concludes with forward-looking recommendations for policy development and infrastructure design that align memory-efficient fine-tuning with the long-term goals of responsible and scalable enterprise automation.

