International Journal of Image, Graphics and Signal Processing (IJIGSP)

ISSN: 2074-9074 (Print), ISSN: 2074-9082 (Online)

Published By: MECS Press

IJIGSP Vol.14, No.4, Aug. 2022

Methodology for Translation of Video Content Activates into Text Description: Three Object Activities Action

PP. 58-69



Ramesh M. Kagalkar

Index Terms

SVM classification, Computer vision, Gaussian filtering technique, SIFT features.


This paper presents a method for generating natural-language text descriptions of the activities in video content. The method analyzes a video to identify how many objects it contains and which actions and activities are taking place, tracks them and matches them against an action model, and then generates a grammatically correct description in English. It works in two phases: training and testing. In the training phase, a database is maintained in which subject, verb, and object labels are assigned to the features extracted from images. In the testing phase, text descriptions are generated automatically from video content. The implemented system translates complex video content into text descriptions for one-minute videos in which three different objects are considered. For evaluation, a standard YouTube dataset is used, comprising 250 samples from 50 different domains. The overall system achieves an accuracy of 93%.
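The final step described above, turning recognized subject, verb, and object labels into a grammatical English sentence, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`article`, `present_participle`, `describe`, `describe_scene`), the spelling heuristics, and the example labels are all assumptions introduced here for illustration, and the recognition stage that would supply the labels is not shown.

```python
# Hypothetical sketch: build English sentences from (subject, verb, object)
# labels. All names and rules here are illustrative assumptions, not the
# paper's method.

VOWELS = "aeiou"


def article(noun):
    """Choose 'an' before a vowel letter, else 'a' (a spelling heuristic)."""
    return "an" if noun[0].lower() in VOWELS else "a"


def present_participle(verb):
    """Form the -ing form using simple rules; irregular verbs are not handled."""
    if verb.endswith("ie"):                              # lie -> lying
        return verb[:-2] + "ying"
    if verb.endswith("e") and not verb.endswith("ee"):   # ride -> riding
        return verb[:-1] + "ing"
    # Double the last letter of three-letter consonant-vowel-consonant verbs
    # (run -> running), except when the last letter is w, x, or y.
    if (len(verb) == 3
            and verb[-1] not in VOWELS + "wxy"
            and verb[-2] in VOWELS
            and verb[-3] not in VOWELS):
        return verb + verb[-1] + "ing"
    return verb + "ing"                                  # eat -> eating


def describe(subject, verb, obj=None):
    """Return one sentence for a single subject-verb-object triple."""
    words = [article(subject).capitalize(), subject, "is",
             present_participle(verb)]
    if obj:
        words += [article(obj), obj]
    return " ".join(words) + "."


def describe_scene(triples):
    """Join one sentence per detected object into a scene description."""
    return " ".join(describe(*t) for t in triples)


if __name__ == "__main__":
    # Three assumed detections, standing in for a three-object video.
    scene = [("man", "ride", "bicycle"),
             ("dog", "run", None),
             ("elephant", "eat", "apple")]
    print(describe_scene(scene))
```

A template of this kind keeps the grammar fixed (article, subject, "is", participle, optional object), so the correctness of the output depends only on the labels the classifier provides.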

Cite This Paper

Ramesh M. Kagalkar, "Methodology for Translation of Video Content Activates into Text Description: Three Object Activities Action", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol.14, No.4, pp. 58-69, 2022. DOI: 10.5815/ijigsp.2022.04.05

