When Handcrafted Features Meet Deep Features: An Empirical Study on Component-Level Image Classification

Full Text (PDF, 1476 KB), pp. 61-80


Author(s)

Tauseef Khan 1,*, Ayatullah Faruk Mollah 2,3

1. School of Computer Science & Engineering, VIT-AP University, Amaravati, 522237, Andhra Pradesh, India

2. Department of Computer Science and Engineering, Aliah University, Kolkata, 700160, West Bengal, India

3. Center of New Technologies, University of Warsaw, Poland

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2024.01.05

Received: 9 Jul. 2023 / Revised: 12 Aug. 2023 / Accepted: 8 Sep. 2023 / Published: 8 Feb. 2024

Index Terms

Object recognition, competent handcrafted features, deep features, text non-text classification, text detection

Abstract

Scene text detection in natural images has been a major research focus for the last few decades. Classifying foreground object components is an essential step in many scene text detection approaches that operate in uncontrolled environments. Because this step relies heavily on robust and discriminating features, several features have been engineered for component-level text non-text classification. The competency of such feature descriptors, particularly relative to deep features, needs to be examined. In this paper, we present prospective feature descriptors applicable to component-level text non-text classification and examine their performance alongside deep features based on convolutional neural networks (CNNs). A series of experiments has been carried out on publicly available benchmark datasets of multi-script document-type, scene-type, and combined text vs. non-text components. Interestingly, the feature combination is found to compete closely with well-established deep features on most of the datasets considered. For instance, on the combined text non-text classification problem, CNN-based deep features yield an accuracy of 97.6%, whereas the aggregated features produce an accuracy of 98.4%. Similar findings are obtained in the other experiments as well. Along with the quantitative figures, the results are analyzed and discussed to substantiate the conjectures drawn herein. This study may serve the need to leverage potentially strong handcrafted feature descriptors.
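
The abstract does not state how the individual handcrafted descriptors are combined. As an illustration only, the sketch below assumes the simplest form of aggregation: each descriptor vector is L2-normalized and the results are concatenated into one feature vector per component. The descriptor names and values are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return [float(v) for v in vec]
    return [v / norm for v in vec]


def aggregate_features(descriptors: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate per-descriptor vectors after normalizing each one.

    Assumption: plain concatenation; the paper may combine features differently.
    Keys are sorted so the layout of the combined vector is deterministic.
    """
    combined: List[float] = []
    for name in sorted(descriptors):
        combined.extend(l2_normalize(descriptors[name]))
    return combined


# Hypothetical descriptor values for one image component (illustrative only).
component = {
    "texture_histogram": [3.0, 4.0],
    "stroke_statistics": [0.0, 5.0, 0.0],
}
combined = aggregate_features(component)
```

Normalizing each descriptor before concatenation keeps a descriptor with large raw values from dominating the combined vector, which then feeds whatever classifier is being compared against the CNN-based features.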

Cite This Paper

Tauseef Khan, Ayatullah Faruk Mollah, "When Handcrafted Features Meet Deep Features: An Empirical Study on Component-Level Image Classification", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol. 16, No. 1, pp. 61-80, 2024. DOI: 10.5815/ijigsp.2024.01.05
