2024
Journal article  Open Access

Perceptual quality assessment of NeRF and neural view synthesis methods for front-facing views

Liang H., Wu T., Hanji P., Banterle F., Gao H., Mantiuk R., Oztireli C.

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Keywords: image-based rendering; perception; image and video acquisition

Neural view synthesis (NVS) is one of the most successful techniques for synthesizing free-viewpoint videos, capable of achieving high fidelity from only a sparse set of captured images. This success has led to many variants of the technique, each typically evaluated on a set of test views using image quality metrics such as PSNR, SSIM, or LPIPS. However, there has been little research on how NVS methods perform with respect to perceived video quality. We present the first study on perceptual evaluation of NVS and NeRF variants. For this study, we collected two datasets of scenes captured both in a controlled lab environment and in-the-wild. In contrast to existing datasets, these scenes come with reference video sequences, allowing us to test for temporal artifacts and subtle distortions that are easily overlooked when viewing only static images. We measured the quality of videos synthesized by several NVS methods in a well-controlled perceptual quality assessment experiment as well as with many existing state-of-the-art image/video quality metrics. We present a detailed analysis of the results and recommendations for dataset and metric selection for NVS evaluation.
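Among the metrics named in the abstract, PSNR is the simplest full-reference measure. As an illustration only (this sketch is not from the paper; the function name, float normalization, and peak value are our own assumptions), PSNR between a synthesized frame and its reference can be computed as:

```python
import numpy as np

def psnr(reference, synthesized, peak=1.0):
    """Peak signal-to-noise ratio (dB) between two images with values in [0, peak].

    Higher is better; identical images yield infinity.
    """
    ref = np.asarray(reference, dtype=np.float64)
    syn = np.asarray(synthesized, dtype=np.float64)
    mse = np.mean((ref - syn) ** 2)  # mean squared error over all pixels/channels
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```

Note that per-frame metrics such as this one ignore temporal artifacts, which is precisely the limitation the study's reference video sequences are designed to expose.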

Source: Computer Graphics Forum, vol. 43, issue 2



BibTeX entry
@article{oai:iris.cnr.it:20.500.14243/499665,
	title = {Perceptual quality assessment of NeRF and neural view synthesis methods for front-facing views},
	author = {Liang H. and Wu T. and Hanji P. and Banterle F. and Gao H. and Mantiuk R. and Oztireli C.},
	doi = {10.1111/cgf.15036 and 10.17863/cam.106658 and 10.48550/arxiv.2303.15206},
	year = {2024}
}
