Carleton University, Ottawa ON, Canada.
International Journal of Science and Research Archive, 2025, 17(02), 532-538
Article DOI: 10.30574/ijsra.2025.17.2.3052
Received on 24 September 2025; revised on 10 November 2025; accepted on 13 November 2025
The proliferation of large language model (LLM) applications has created a need for more efficient, scalable, and cost-effective deployment approaches. Dependence on centralized APIs constrains customization, privacy, and cost, motivating the adoption of self-hosted solutions. In this paper, the author explains how adapter-based fine-tuning can be integrated with state-of-the-art deployment systems such as vLLM, an open-source high-performance LLM inference engine, to self-host LLMs efficiently. The paper explores emerging orchestration tools, emission-aware customization, LLMOps best practices, resource multiplexing, quantization and on-premises deployment, and middleware abstraction. Through comparative analysis, architecture diagrams, and empirical cost calculations, the paper shows that modular, energy-efficient, and performance-optimized deployments are practicable. The review highlights the potential of self-hosting to democratize access to LLM capabilities, with significant implications for operational control, sustainability, and efficiency.
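As a concrete illustration of the deployment pattern the abstract describes, the following is a minimal sketch of serving a LoRA adapter on top of a shared base model using vLLM's offline multi-LoRA interface (the enable_lora flag and LoRARequest class). The base model name and adapter path are illustrative placeholders, not details taken from the paper.

# Minimal sketch: adapter-based serving with vLLM's multi-LoRA support.
# The model name and adapter path below are hypothetical placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load the shared base model once, with LoRA support enabled so that
# lightweight adapters can be multiplexed over the same base weights.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

sampling_params = SamplingParams(temperature=0.0, max_tokens=128)

# Route this request through a fine-tuned adapter; other requests can
# use different adapters (or none) against the same base model instance.
outputs = llm.generate(
    ["Summarize the benefits of self-hosting LLMs."],
    sampling_params,
    lora_request=LoRARequest("domain_adapter", 1, "/path/to/lora_adapter"),
)
print(outputs[0].outputs[0].text)

In a server deployment, the equivalent pattern is exposed through the CLI, e.g. vllm serve <model> --enable-lora --lora-modules domain_adapter=/path/to/lora_adapter, again with placeholder names.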
Keywords: LLM Self-Hosting; Adapter-Based Fine-Tuning; vLLM Deployment; Efficient Inference
Shanmugaraja Krishnasamy Venugopal. Efficient LLM Self-Hosting using Adapters and VLLM Deployment. International Journal of Science and Research Archive, 2025, 17(02), 532-538. Article DOI: https://doi.org/10.30574/ijsra.2025.17.2.3052.
Copyright © 2025 Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.