International Journal of Science and Research Archive
International, peer-reviewed, open-access journal. ISSN Approved Journal No. 2582-8185



Efficient LLM Self-Hosting using Adapters and VLLM Deployment


Shanmugaraja Krishnasamy Venugopal *

Carleton University, Ottawa ON, Canada.

Review Article

International Journal of Science and Research Archive, 2025, 17(02), 532-538

Article DOI: 10.30574/ijsra.2025.17.2.3052

DOI url: https://doi.org/10.30574/ijsra.2025.17.2.3052

Received on 24 September 2025; revised on 10 November 2025; accepted on 13 November 2025

The proliferation of large language model (LLM) applications has created a need for more efficient, scalable, and cost-effective deployment methods. Dependence on centralized APIs constrains customization, privacy, and cost, which motivates self-hosted solutions. In this paper, the author explains how adapter-based fine-tuning can be integrated with a state-of-the-art deployment system such as vLLM, an open-source high-performance LLM inference engine, to self-host an LLM efficiently. The paper examines recent orchestration tools, energy-aware customization, LLMOps best practices, resource multiplexing, quantization and on-premises deployment, and middleware abstraction. Through comparative analysis, architecture diagrams, and empirical cost estimation, the paper shows that modular, energy-efficient, and performance-optimised deployments are practicable. The review serves as a reference on how self-hosting can democratize access to LLM capabilities, with significant implications for operational control, sustainability, and efficiency.

Keywords: LLM Self-Hosting; Adapter-Based Fine-Tuning; vLLM Deployment; Efficient Inference
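As a concrete illustration of the adapter-plus-vLLM pattern the abstract describes, the sketch below shows how vLLM's OpenAI-compatible server can load a LoRA adapter alongside a base model and multiplex requests across adapters. The base model name, adapter name, and paths are hypothetical placeholders, not values taken from the paper; the setup requires a GPU host with the `vllm` package installed.

```shell
# Serve a base model with a LoRA adapter attached (hypothetical names/paths).
# --enable-lora activates vLLM's multi-adapter serving; --lora-modules maps
# an adapter name to its on-disk weights so clients can select it per request.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --enable-lora \
  --lora-modules support-adapter=/models/adapters/support-lora \
  --max-lora-rank 16

# A client selects the adapter by passing its name as the "model" field of the
# OpenAI-compatible API; the shared base weights serve all adapters at once.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "support-adapter", "prompt": "Hello", "max_tokens": 32}'
```

Serving many lightweight adapters over one set of base weights is what makes the resource multiplexing mentioned in the abstract economical: only the small adapter matrices differ per tenant or task.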

https://journalijsra.com/sites/default/files/fulltext_pdf/IJSRA-2025-3052.pdf

Preview Article PDF

Shanmugaraja Krishnasamy Venugopal. Efficient LLM Self-Hosting using Adapters and VLLM Deployment. International Journal of Science and Research Archive, 2025, 17(02), 532-538. Article DOI: https://doi.org/10.30574/ijsra.2025.17.2.3052.

Copyright © 2025 Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.

