International Journal of Science and Research Archive
International, peer-reviewed, open-access journal (ISSN approved, journal no. 2582-8185)

ISSN Approved Journal || eISSN: 2582-8185 || CODEN: IJSRO2 || Impact Factor 8.2 || Google Scholar and CrossRef Indexed

Zero-shot aerial scene classification using CLIP and prompt engineering

Chukwudi Anthony Udemba *, Adekunle Adeoye Eludire and Ayorinde Peters Oduroye

Department of Computer Science, Caleb University, Lagos, Nigeria.

Research Article

International Journal of Science and Research Archive, 2025, 17(02), 005–013

Article DOI: 10.30574/ijsra.2025.17.2.2948

DOI url: https://doi.org/10.30574/ijsra.2025.17.2.2948

Received on 23 September 2025; revised on 28 October 2025; accepted on 31 October 2025

Traditional aerial scene classification models rely heavily on large labeled datasets and supervised training, which limits their ability to generalize to new or rare scene types. In this work, we explore a zero-shot approach to aerial scene understanding using Contrastive Language-Image Pre-training (CLIP), a vision-language model trained on a large collection of image-text pairs. Instead of retraining or fine-tuning the model, we describe each scene category of interest with carefully designed natural-language prompts and assign each aerial image to the category whose prompt embedding is most similar to the image embedding, measured by cosine similarity in CLIP's shared embedding space. Because a new category is added simply by writing new prompts, the method scales to new scene types without additional annotation or retraining. Through prompt engineering, we use both generic and domain-specific textual descriptions to improve classification accuracy. Experiments on benchmark aerial datasets show that the approach distinguishes between complex and visually similar scenes, even when few or no labeled examples of a class are available. This work highlights the potential of vision-language models for rapid, adaptable, annotation-free classification in aerial surveillance applications.

Keywords: Zero-shot learning; Vision-language models; Remote sensing; Semantic embedding; Large language models
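The classification rule summarized in the abstract (embed one or more text prompts per class, embed the image, and pick the class with the highest cosine similarity) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt templates, the class names, the `toy_encode_text` bag-of-words encoder, and the image embedding vector are all hypothetical stand-ins. In a real pipeline the two encoders would be CLIP's text and image towers (for example, `get_text_features` and `get_image_features` on Hugging Face's `CLIPModel`). Averaging several prompts per class ("prompt ensembling") is shown as one common way to combine generic and domain-specific descriptions; the abstract does not state whether the authors average prompts or select a single one.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]

# Hypothetical prompt templates: one generic CLIP-style prompt and two
# aerial-specific phrasings (illustrative only, not the paper's prompts).
TEMPLATES = [
    "a photo of a {}.",
    "an aerial photo of a {} scene.",
    "a satellite image of a {} area.",
]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def build_class_embeddings(
    class_names: Sequence[str],
    encode_text: Callable[[str], Sequence[float]],
    templates: Sequence[str] = TEMPLATES,
) -> Dict[str, Vector]:
    """Prompt ensembling: embed each filled-in template, average the
    unit-norm text embeddings per class, then re-normalize the mean."""
    class_embs: Dict[str, Vector] = {}
    for name in class_names:
        embs = [l2_normalize(encode_text(t.format(name))) for t in templates]
        mean = [sum(col) / len(embs) for col in zip(*embs)]
        class_embs[name] = l2_normalize(mean)
    return class_embs


def zero_shot_classify(
    image_emb: Sequence[float], class_embs: Dict[str, Vector]
) -> Tuple[str, Dict[str, float]]:
    """Return the class whose text embedding is most cosine-similar to the image."""
    scores = {name: cosine(image_emb, emb) for name, emb in class_embs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# --- Hypothetical toy stand-in for CLIP's text encoder ---
VOCAB = ["harbor", "beach", "forest", "aerial", "satellite", "photo"]


def toy_encode_text(text: str) -> Vector:
    """Word-indicator vector over VOCAB; a real system would call
    CLIP's text encoder here instead."""
    tokens = text.lower().replace(".", " ").split()
    return [1.0 if word in tokens else 0.0 for word in VOCAB]


if __name__ == "__main__":
    classes = ["harbor", "beach", "forest"]
    class_embs = build_class_embeddings(classes, toy_encode_text)
    # Made-up vector standing in for CLIP's image-encoder output.
    image_emb = [0.9, 0.2, 0.1, 0.3, 0.3, 0.2]
    label, scores = zero_shot_classify(image_emb, class_embs)
    print(label)  # harbor
    for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {s:.3f}")
```

Because the class set is just a list of strings, adding a new scene type only means adding its name and rebuilding the class embeddings; neither encoder is retrained. CLIP itself scales these similarities by a learned temperature and applies a softmax to obtain probabilities, which does not change which class has the highest score.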

https://journalijsra.com/sites/default/files/fulltext_pdf/IJSRA-2025-2948.pdf

Chukwudi Anthony Udemba, Adekunle Adeoye Eludire and Ayorinde Peters Oduroye. Zero-shot aerial scene classification using clip and prompt engineering. International Journal of Science and Research Archive, 2025, 17(02), 005–013. Article DOI: https://doi.org/10.30574/ijsra.2025.17.2.2948.

Copyright © 2025 The Author(s). The authors retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0).
