Building a Framework for Visual Question Answering Systems

Maya Abu Hamoud; Wasim Safi

Building a Framework for Visual Question Answering Systems [Arabic]

2025-01-23 | Volume 3, Issue 1 | Research Articles | Maya Abu Hamoud | Wasim Safi

Abstract

Visual Question Answering (VQA) systems are among the latest advancements in artificial intelligence and deep learning. They integrate image processing with natural language understanding so that an intelligent system can answer questions about image content. Their significance lies in their ability to interpret and analyze images in a manner similar to human comprehension, which makes them applicable to a wide range of critical fields. VQA systems are a crucial step towards advanced AI systems that bridge the gap between computer vision and human language understanding, fostering deeper and more integrated interaction with the real world. This study explores and analyzes the methods and techniques used in visual question answering, with a focus on developing a model capable of analyzing and understanding images and responding to related queries. We developed a VQA system using deep learning techniques: the VGG19 model extracts image features, while questions and answers are encoded using GloVe embeddings and label encoding. The model was trained on the MSCOCO dataset, which contains a variety of images and related questions, and its performance was improved through multiple experiments that tuned the training parameters. The model achieved an F1 score of 44.23% in training and 42.97% in validation, a slight improvement over other models that also used VGG19 on the same dataset. Additionally, a web platform was developed to test the system, enabling users to evaluate answer accuracy and to run the model on new images or on images from the dataset.

Keywords: VQA, VGG19, GloVe, MSCOCO Dataset
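
The encoding and scoring steps named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. It assumes that label encoding is applied to answers and that word vectors are applied to questions; the abstract does not spell out this pairing. The tiny embedding table stands in for real GloVe vectors, averaging word vectors is only one possible way to pool a question, and macro averaging of F1 is an assumption because the abstract does not state the averaging mode. All function names are hypothetical.

```python
def fit_label_encoder(answers):
    """Map each distinct answer string to an integer class id (sorted order)."""
    classes = sorted(set(answers))
    return {answer: idx for idx, answer in enumerate(classes)}


def encode_question(question, embeddings, dim):
    """Average the word vectors of a question (assumed pooling; unknown words are skipped)."""
    tokens = question.lower().rstrip("?").split()
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging mode)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: hypothetical answers and a two-dimensional stand-in for GloVe vectors.
answers = ["yes", "no", "2", "yes"]
encoder = fit_label_encoder(answers)
encoded_answers = [encoder[a] for a in answers]

toy_embeddings = {"what": [1.0, 0.0], "color": [0.0, 1.0]}
question_vector = encode_question("What color is it?", toy_embeddings, dim=2)

# Pretend the model mislabels the third example as "yes".
predictions = [encoder["yes"], encoder["no"], encoder["yes"], encoder["yes"]]
score = macro_f1(encoded_answers, predictions)
```

In a full system the question vector would be combined with image features from VGG19 before classification over the encoded answer classes; that fusion step is omitted here because the abstract does not describe it.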

ISSN (Online): 2959-8591