IJEM Vol. 13, No. 6, 8 Dec. 2023
Keywords: Semantic segmentation, Autonomous driving, UNet, Road scenes, VGG19, ResNet101, EfficientNetb7
Semantic segmentation is an essential tool for autonomous vehicles to comprehend their surroundings. Because it must be both effective and efficient, semantic segmentation for autonomous driving is a difficult task: the appealing performance of present-day models typically comes at the cost of extensive computation, which is unacceptable for self-driving vehicles. Deep learning has recently demonstrated significant improvements in accuracy. Hence, this work compares U-Net architectures, namely UNet-VGG19, UNet-ResNet101, and UNet-EfficientNetb7, using VGG19, ResNet101, and the compound-scaled EfficientNetb7 as encoders for feature extraction, while the U-Net decoder regenerates the fine-grained segmentation map. Combining low-level spatial information with high-level feature information allows for precise segmentation. Our research involves extensive experimentation on diverse datasets, including CamVid (the Cambridge-driving Labeled Video Database) and Cityscapes (a comprehensive road-scene understanding dataset). With the UNet-EfficientNetb7 architecture, we achieved notable mean Intersection over Union (mIoU) values of 0.8128 and 0.8659 on the CamVid and Cityscapes datasets, respectively. These results outperform alternative contemporary techniques, underscoring the superior precision and effectiveness of the UNet-EfficientNetb7 model. This study contributes to the field by addressing the crucial challenge of efficient yet accurate semantic segmentation for autonomous driving, offering insights into a model that effectively balances performance and computational demands.
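As a rough illustration of the encoder-decoder pairing described in the abstract, the sketch below builds a U-Net with an EfficientNetB7 encoder and evaluates mean IoU. It is a minimal sketch, not the authors' implementation: it assumes the third-party segmentation_models_pytorch library, and the class count and input size are illustrative placeholders.

```python
# Minimal sketch (not the authors' code): a U-Net decoder paired with an
# EfficientNetB7 encoder via the third-party segmentation_models_pytorch
# library. NUM_CLASSES and the input resolution are placeholders.
import torch
import segmentation_models_pytorch as smp

NUM_CLASSES = 32  # e.g. the CamVid label set; adjust per dataset

model = smp.Unet(
    encoder_name="efficientnet-b7",   # compound-scaled encoder for feature extraction
    encoder_weights="imagenet",       # ImageNet-pretrained encoder weights
    in_channels=3,
    classes=NUM_CLASSES,
)

def mean_iou(pred_logits: torch.Tensor, target: torch.Tensor,
             num_classes: int = NUM_CLASSES) -> float:
    """Mean Intersection over Union over the classes present in the batch."""
    pred = pred_logits.argmax(dim=1)
    ious = []
    for c in range(num_classes):
        inter = ((pred == c) & (target == c)).sum().item()
        union = ((pred == c) | (target == c)).sum().item()
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Forward pass on a dummy batch; spatial dims are kept divisible by 32
# to match the encoder's downsampling strides.
x = torch.randn(1, 3, 384, 480)
y = torch.randint(0, NUM_CLASSES, (1, 384, 480))
with torch.no_grad():
    print("mIoU on random data:", mean_iou(model(x), y))
```

In this pairing, the decoder's skip connections merge low-level spatial detail from the early encoder stages with high-level semantic features from the deeper stages, which is what enables the fine-grained segmentation map described above.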
Anagha K J, Sabeena Beevi K, "Advancing Road Scene Semantic Segmentation with UNet-EfficientNetb7", International Journal of Engineering and Manufacturing (IJEM), Vol.13, No.6, pp. 53-61, 2023. DOI:10.5815/ijem.2023.06.05