Carleton University, Ottawa ON, Canada.
International Journal of Science and Research Archive, 2025, 17(02), 532-538
Article DOI: 10.30574/ijsra.2025.17.2.3052
Received on 24 September 2025; revised on 10 November 2025; accepted on 13 November 2025
The proliferation of large language model (LLM) applications has created a need for more efficient, scalable, and cost-effective deployment approaches. Dependence on centralized APIs constrains customization, privacy, and cost, motivating the adoption of self-hosted solutions. In this paper, the author explains how adapter-based fine-tuning can be integrated with state-of-the-art deployment systems such as vLLM, an open-source high-performance LLM inference engine, to self-host LLMs efficiently. The paper explores emerging orchestration tools, emission-aware customization, LLMOps best practices, resource multiplexing, quantization and on-premises deployment, and middleware abstraction. Through comparative analysis, architecture diagrams, and empirical cost calculations, the paper shows that modular, energy-efficient, and performance-optimized deployments are practicable. The review highlights the potential of self-hosting to democratize access to LLM capabilities, with significant implications for operational control, sustainability, and efficiency.
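As a concrete illustration of the deployment pattern the abstract describes, the following is a minimal sketch of serving a LoRA adapter on top of a shared base model using vLLM's offline multi-LoRA interface (the enable_lora flag and LoRARequest class). The base model name and adapter path are illustrative placeholders, not details taken from the paper.

# Minimal sketch: adapter-based serving with vLLM's multi-LoRA support.
# The model name and adapter path below are hypothetical placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load the shared base model once, with LoRA support enabled so that
# lightweight adapters can be multiplexed over the same base weights.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

sampling_params = SamplingParams(temperature=0.0, max_tokens=128)

# Route this request through a fine-tuned adapter; other requests can
# use different adapters (or none) against the same base model instance.
outputs = llm.generate(
    ["Summarize the benefits of self-hosting LLMs."],
    sampling_params,
    lora_request=LoRARequest("domain_adapter", 1, "/path/to/lora_adapter"),
)
print(outputs[0].outputs[0].text)

In a server deployment, the equivalent pattern is exposed through the CLI, e.g. vllm serve <model> --enable-lora --lora-modules domain_adapter=/path/to/lora_adapter, again with placeholder names.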
Keywords: LLM Self-Hosting; Adapter-Based Fine-Tuning; vLLM Deployment; Efficient Inference
Shanmugaraja Krishnasamy Venugopal. Efficient LLM Self-Hosting using Adapters and VLLM Deployment. International Journal of Science and Research Archive, 2025, 17(02), 532-538. Article DOI: https://doi.org/10.30574/ijsra.2025.17.2.3052.
Copyright © 2025 Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.