2010
Journal article
Open Access
A comparison of perceptually-based metrics for objective evaluation of geometry processing
Lavoué G, Corsini M
Recent advances in 3D graphics technologies have led to an increasing use of processing techniques on 3D meshes, such as filtering, compression, watermarking, simplification, deformation and so forth. Since these processes may modify the visual appearance of the 3D objects, several metrics have been introduced to properly drive or evaluate them, from classic geometric ones such as the Hausdorff distance, to more complex perceptually-based measures. This paper presents a survey of existing perceptually-based metrics for visual impairment of 3D objects and provides an extensive comparison between them. In particular, different scenarios which correspond to different perceptual and cognitive mechanisms are analyzed. The objective is twofold: (1) capturing the behavior of existing measures to help Perception researchers design new 3D metrics, and (2) providing a comparison between them to inform and help Computer Graphics researchers choose the most accurate tool for the design and evaluation of their mesh processing algorithms.
Source: IEEE TRANSACTIONS ON MULTIMEDIA, vol. 12 (issue 7), pp. 636-649
DOI: 10.1109/tmm.2010.2060475
See at:
IEEE Transactions on Multimedia
| CNR IRIS
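The classic geometric baseline named in the abstract above, the Hausdorff distance, is simple to state: the maximum, over the points of one surface, of the distance to the closest point of the other, symmetrized. A minimal sketch over point samplings of two meshes, assuming NumPy and brute-force nearest-neighbor search (production tools use spatial indexing instead):

```python
import numpy as np

def one_sided_hausdorff(a, b):
    """Max over points of a of the distance to the closest point of b."""
    # Pairwise distances; fine for small samples, O(|a|*|b|) memory.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).max()

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets."""
    return max(one_sided_hausdorff(a, b), one_sided_hausdorff(b, a))

# Toy example: two noisy samplings of the unit sphere.
rng = np.random.default_rng(0)
p = rng.normal(size=(500, 3)); p /= np.linalg.norm(p, axis=1, keepdims=True)
q = p + rng.normal(scale=0.01, size=p.shape)
print(hausdorff(p, q))  # small value, on the order of the noise magnitude
```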
2006
Software
Metadata Only Access
HPTMBrowser
Corsini M
The software package allows viewing high-resolution PTMs (Polynomial Texture Maps) over the network.
See at:
CNR IRIS
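For context on what such a viewer renders: a Polynomial Texture Map stores six coefficients per texel and reconstructs luminance as a biquadratic polynomial of the projected light direction (lu, lv). A minimal sketch of that relighting step, assuming a NumPy coefficient array; this illustrates the standard PTM model, not the HPTMBrowser source:

```python
import numpy as np

def relight_ptm(coeffs, lu, lv):
    """Evaluate the standard biquadratic PTM model per texel.

    coeffs: (H, W, 6) array of per-texel coefficients a0..a5.
    lu, lv: projection of the unit light direction onto the texture plane.
    """
    a0, a1, a2, a3, a4, a5 = np.moveaxis(coeffs, -1, 0)
    lum = a0 * lu * lu + a1 * lv * lv + a2 * lu * lv + a3 * lu + a4 * lv + a5
    return np.clip(lum, 0.0, 255.0)

# Example: relight a 512x512 PTM with the light tilted along both axes.
coeffs = np.zeros((512, 512, 6)); coeffs[..., 5] = 128.0  # flat test data
image = relight_ptm(coeffs, lu=0.5, lv=0.5)
```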
2006
Software
Metadata Only Access
PhotoTuner
Corsini M
The software product allows calibrating the color of a set of digital images using the MacBeth color chart.
See at:
CNR IRIS
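The technique behind such a tool is standard: sample the 24 MacBeth chart patches in the photograph, fit a color-correction matrix to the chart's known reference values by least squares, and apply it to every pixel. A minimal sketch under those assumptions (an illustration of the general method, not the PhotoTuner code):

```python
import numpy as np

def fit_color_matrix(measured, reference):
    """Fit a 4x3 affine color-correction matrix by least squares.

    measured:  (24, 3) mean RGB of each MacBeth patch in the photo.
    reference: (24, 3) known RGB values of the chart patches.
    """
    ones = np.ones((measured.shape[0], 1))
    A = np.hstack([measured, ones])              # (24, 4), affine term added
    M, *_ = np.linalg.lstsq(A, reference, rcond=None)
    return M                                     # (4, 3)

def apply_color_matrix(image, M):
    """Apply the fitted correction to an (H, W, 3) float image in [0, 1]."""
    h, w, _ = image.shape
    flat = np.hstack([image.reshape(-1, 3), np.ones((h * w, 1))])
    return np.clip(flat @ M, 0.0, 1.0).reshape(h, w, 3)
```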
2013
Journal article
Open Access
Perceptual metrics for static and dynamic triangle meshes
Corsini M, Larabi MC, Lavoué G, Petrík O, Vása L, Wang K
Almost all mesh processing procedures cause more or less visible changes in the appearance of objects represented by polygonal meshes. In many cases, such as mesh watermarking, simplification or lossy compression, the objective is to make the change in appearance negligible, or as small as possible, given some other constraints. Measuring the amount of distortion requires taking into account the final purpose of the data. In many applications, the final consumer of the data is a human observer, and therefore the perceptibility of the introduced appearance change by a human observer should be the criterion taken into account when designing and configuring the processing algorithms. In this review, we discuss the existing comparison metrics for static and dynamic (animated) triangle meshes. We describe the concepts used in perception-oriented metrics for 2D image comparison, and we show how these concepts are employed in existing 3D mesh metrics. We describe the character of subjective data used for evaluation of mesh metrics and provide comparison results identifying the advantages and drawbacks of each method. Finally, we also discuss employing the perception-correlated metrics in perception-oriented mesh processing algorithms.
Source: COMPUTER GRAPHICS FORUM (ONLINE), vol. 32 (issue 1), pp. 101-125
DOI: 10.1111/cgf.12001
See at:
Computer Graphics Forum
| Hyper Article en Ligne
| CNR IRIS
| onlinelibrary.wiley.com
2014
Conference article
Restricted
The common implementation framework as service - Towards novel applications for streamlined presentation of 3D content on the Web
Aderhold A, Wilkosinska K, Corsini M, Jung Y, Graf H, Kuijper A
We solve a standing issue of the recently published Common Implementation Framework (CIF) for Online Virtual Museums: programmatic access to the transcoding, optimization and template rendering infrastructure of the CIF. We propose a method that enables researchers and developers to build novel systems on top of the CIF infrastructure beyond its current Cultural Heritage workflow. To this end, we introduce a way to programmatically access the powerful backend of the CIF through a universal access layer, addressable by standards like HTTP and the JSON Data Interchange Format. In order to demonstrate our approach, we present two different use cases in which the CIF pipeline is utilized as a service through the proposed resource-based access layer: a native mobile iOS application for browsing 3D model repositories realizing just-in-time optimization of large models, and a MeshLab plugin to asynchronously convert and prepare a model for the Web.
DOI: 10.1007/978-3-319-07626-3_1
Project(s): V-MUST.NET
See at:
doi.org
| CNR IRIS
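To illustrate the kind of resource-based, HTTP/JSON access layer the abstract describes, here is a hypothetical client call; the endpoint URL, payload fields, and response shape are invented for this example and do not reflect the actual CIF API:

```python
import json
import urllib.request

# Hypothetical endpoint and payload: the real CIF service may differ.
SERVICE_URL = "https://cif.example.org/api/transcode"

payload = {
    "source": "https://repository.example.org/models/statue.obj",
    "target_format": "glTF",
    "optimization": {"max_triangles": 100_000},
}

req = urllib.request.Request(
    SERVICE_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    job = json.load(resp)  # e.g. {"job_id": ..., "status": "queued"}
print(job)
```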
2021
Journal article
Open Access
Multimodal attention networks for low-level vision-and-language navigation
Landi F, Baraldi L, Cornia M, Corsini M, Cucchiara R
Vision-and-Language Navigation (VLN) is a challenging task in which an agent needs to follow a language-specified path to reach a target destination. The goal gets even harder as the actions available to the agent get simpler and move towards low-level, atomic interactions with the environment. This setting takes the name of low-level VLN. In this paper, we strive for the creation of an agent able to tackle three key issues: multi-modality, long-term dependencies, and adaptability towards different locomotive settings. To that end, we devise "Perceive, Transform, and Act" (PTA): a fully-attentive VLN architecture that leaves the recurrent approach behind and is the first Transformer-like architecture incorporating three different modalities (natural language, images, and low-level actions) for agent control. In particular, we adopt an early fusion strategy to merge lingual and visual information efficiently in our encoder. We then propose to refine the decoding phase with a late fusion extension between the agent's history of actions and the perceptual modalities. We experimentally validate our model on two datasets: PTA achieves promising results in low-level VLN on R2R and good performance on the recently proposed R4R benchmark.
Source: COMPUTER VISION AND IMAGE UNDERSTANDING, vol. 210
DOI: 10.1016/j.cviu.2021.103255
See at:
CNR IRIS
| ISTI Repository
| www.sciencedirect.com
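The early-fusion strategy mentioned in the abstract, merging language and visual tokens before a shared attention stack, can be sketched compactly. Layer sizes, feature dimensions, and module names below are illustrative assumptions, not the PTA implementation:

```python
import torch
import torch.nn as nn

class EarlyFusionEncoder(nn.Module):
    """Concatenate text and image tokens, then run one shared encoder."""
    def __init__(self, d_model=256, nhead=8, num_layers=2):
        super().__init__()
        self.text_proj = nn.Linear(300, d_model)   # e.g. word embeddings
        self.img_proj = nn.Linear(2048, d_model)   # e.g. CNN region features
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, text_tokens, img_tokens):
        # Fuse before encoding so attention spans both modalities.
        x = torch.cat([self.text_proj(text_tokens),
                       self.img_proj(img_tokens)], dim=1)
        return self.encoder(x)

enc = EarlyFusionEncoder()
out = enc(torch.randn(1, 20, 300), torch.randn(1, 36, 2048))
print(out.shape)  # torch.Size([1, 56, 256])
```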
2021
Conference article
Open Access
Watch your strokes: improving handwritten text recognition with deformable convolutions
Cojocaru I., Cascianelli S., Baraldi L., Corsini M., Cucchiara R.
Handwritten Text Recognition (HTR) in free-layout pages is a valuable yet challenging task which aims to automatically understand handwritten texts. State-of-the-art approaches in this field usually encode input images with Convolutional Neural Networks, whose kernels are typically defined on a fixed grid and focus on all input pixels independently. However, this is in contrast with the sparse nature of handwritten pages, in which only pixels representing the ink of the writing are useful for the recognition task. Furthermore, the standard convolution operator is not explicitly designed to take into account the great variability in shape, scale, and orientation of handwritten characters. To overcome these limitations, we investigate the use of deformable convolutions for handwriting recognition. This type of convolution deforms the convolution kernel according to the content of the neighborhood, and can therefore adapt better to geometric variations and other deformations of the text. Experiments conducted on the IAM and RIMES datasets demonstrate that the use of deformable convolutions is a promising direction for the design of novel architectures for handwritten text recognition.
DOI: 10.1109/icpr48806.2021.9412392
See at:
IRIS UNIMORE - Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia
| iris.unimore.it
| doi.org
| CNR IRIS
| ieeexplore.ieee.org
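Deformable convolutions of the kind investigated here are available off the shelf in torchvision. A minimal sketch in which a small standard convolution predicts the sampling offsets, as is common practice; this mirrors the general technique, not necessarily the paper's exact architecture:

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    """3x3 deformable convolution with content-dependent sampling offsets."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # One (dy, dx) offset pair per kernel position: 2 * k * k channels.
        self.offset = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)

    def forward(self, x):
        return self.deform(x, self.offset(x))

block = DeformBlock(64, 128)
y = block(torch.randn(1, 64, 32, 128))  # e.g. a text-line feature map
print(y.shape)  # torch.Size([1, 128, 32, 128])
```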
2020
Journal article
Open Access
Explaining digital humanities by aligning images and textual descriptions
Cornia M., Stefanini M., Baraldi L., Corsini M., Cucchiara R.
Replicating the human ability to connect Vision and Language has recently been gaining a lot of attention in the Computer Vision and the Natural Language Processing communities. This research effort has resulted in algorithms that can retrieve images from textual descriptions and vice versa, when realistic images and sentences with simple semantics are employed and when paired training data is provided. In this paper, we go beyond these limitations and tackle the design of visual-semantic algorithms in the domain of the Digital Humanities. This setting not only features more complex visual and semantic structures but also a significant lack of training data, which makes the use of fully-supervised approaches infeasible. With this aim, we propose a joint visual-semantic embedding that can automatically align illustrations and textual elements without paired supervision. This is achieved by transferring the knowledge learned on ordinary visual-semantic datasets to the artistic domain. Experiments, performed on two datasets specifically designed for this domain, validate the proposed strategies and quantify the domain shift between natural images and artworks.
Source: PATTERN RECOGNITION LETTERS, vol. 129, pp. 166-172
DOI: 10.1016/j.patrec.2019.11.018
See at:
IRIS UNIMORE - Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia
| CNR IRIS
| www.sciencedirect.com
| Pattern Recognition Letters
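Joint visual-semantic embeddings like the one described above are commonly trained with a hinge-based triplet ranking loss that pulls matched image-sentence pairs together and pushes mismatched ones apart. A minimal sketch of that building block (the standard paired-supervision loss; the paper's contribution is transferring such an embedding to the artistic domain without paired data):

```python
import torch
import torch.nn.functional as F

def triplet_ranking_loss(img_emb, txt_emb, margin=0.2):
    """Hinge-based ranking loss over all in-batch negatives.

    img_emb, txt_emb: (B, D) L2-normalized embeddings of matched pairs.
    """
    scores = img_emb @ txt_emb.t()        # (B, B) cosine similarities
    pos = scores.diag().view(-1, 1)       # matched-pair scores
    # Hinge on both retrieval directions, ignoring the diagonal.
    cost_txt = (margin + scores - pos).clamp(min=0)
    cost_img = (margin + scores - pos.t()).clamp(min=0)
    mask = torch.eye(scores.size(0), dtype=torch.bool)
    return (cost_txt.masked_fill(mask, 0).sum()
            + cost_img.masked_fill(mask, 0).sum())

img = F.normalize(torch.randn(8, 256), dim=1)
txt = F.normalize(torch.randn(8, 256), dim=1)
print(triplet_ranking_loss(img, txt))
```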
2019
Conference article
Open Access
Artpedia: a new visual-semantic dataset with visual and contextual sentences in the artistic domain
Stefanini M., Cornia M., Baraldi L., Corsini M., Cucchiara R.
As vision and language techniques are widely applied to realistic images, there is a growing interest in designing visual-semantic models suitable for more complex and challenging scenarios. In this paper, we address the problem of cross-modal retrieval of images and sentences coming from the artistic domain. To this aim, we collect and manually annotate the Artpedia dataset, which contains paintings and textual sentences describing both the visual content of the paintings and other contextual information. Thus, the problem is not only to match images and sentences, but also to identify which sentences actually describe the visual content of a given image. We devise a visual-semantic model that jointly addresses these two challenges by exploiting the latent alignment between visual and textual chunks. Experimental evaluations, obtained by comparing our model to different baselines, demonstrate the effectiveness of our solution and highlight the challenges of the proposed dataset. The Artpedia dataset is publicly available at: http://aimagelab.ing.unimore.it/artpedia.
Source: LECTURE NOTES IN COMPUTER SCIENCE, vol. 11752, pp. 729-740. Trento, Italy, 9-13 September, 2019
DOI: 10.1007/978-3-030-30645-8_66
See at:
IRIS UNIMORE - Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia
| iris.unimore.it
| doi.org
| CNR IRIS
| link.springer.com
2019
Conference article
Metadata Only Access
Embodied vision-and-language navigation with dynamic convolutional filters
Landi F., Baraldi L., Corsini M., Cucchiara R.
In Vision-and-Language Navigation (VLN), an embodied agent needs to reach a target destination with only the guidance of a natural language instruction. To explore the environment and progress towards the target location, the agent must perform a series of low-level actions, such as rotate, before stepping ahead. In this paper, we propose to exploit dynamic convolutional filters to encode the visual information and the lingual description in an efficient way. Differently from some previous works that abstract from the agent perspective and use high-level navigation spaces, we design a policy which decodes the information provided by dynamic convolution into a series of low-level, agent-friendly actions. Results show that our model exploiting dynamic filters performs better than other architectures with traditional convolution, setting the new state of the art for embodied VLN in the low-level action space. Additionally, we attempt to categorize recent work on VLN depending on their architectural choices and distinguish two main groups: we call them low-level actions and high-level actions models. To the best of our knowledge, we are the first to propose this analysis and categorization for VLN.
See at:
CNR IRIS
| researchr.org
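Dynamic convolutional filters, as used in this work, are kernels generated at run time from the instruction representation and convolved with the visual features, so the same image is filtered differently for different commands. A minimal sketch of the mechanism; dimensions and names are illustrative assumptions, not the paper's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv(nn.Module):
    """Generate 1x1 conv filters from a language vector, apply to images."""
    def __init__(self, lang_dim=256, vis_ch=512, n_filters=8):
        super().__init__()
        self.n_filters, self.vis_ch = n_filters, vis_ch
        self.filter_gen = nn.Linear(lang_dim, n_filters * vis_ch)

    def forward(self, vis_feat, lang_vec):
        # vis_feat: (1, C, H, W); lang_vec: (lang_dim,)
        w = self.filter_gen(lang_vec).view(self.n_filters, self.vis_ch, 1, 1)
        w = F.normalize(w.view(self.n_filters, -1), dim=1).view_as(w)
        return F.conv2d(vis_feat, w)  # (1, n_filters, H, W) response maps

dc = DynamicConv()
maps = dc(torch.randn(1, 512, 7, 7), torch.randn(256))
print(maps.shape)  # torch.Size([1, 8, 7, 7])
```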
2009
Journal article
Restricted
eNVyMyCar: a multiplayer car racing game for teaching computer graphics
Ganovelli F, Corsini M
The development of a computer game is widely used as a way of conveying concepts regarding Computer Science. There are several reasons for this: it stimulates creativity, it provides an immediate sense of achievement (when the code works), it typically covers all the aspects of an introductory course, and it is easy to find ideas just by looking around and finding stimulation from one's environment and from fellow students. In this paper we present eNVyMyCar, a framework for the collaborative/competitive development of a computer game, and report the experience of its use in two Computer Graphics courses held in 2007. We developed a multiplayer car racing game where the student's task is just to implement the rendering of the scene, while all the other aspects, communication and synchronization, are implemented in the framework and are transparent to the developer. The innovative feature of our framework is that all on-line users can see the views produced by their fellow students. This motivates students to improve their work by comparing it with that of other students and picking up ideas from them. It also gives students an opportunity to show off to their classmates.
Source: COMPUTER GRAPHICS FORUM (PRINT), vol. 28 (issue 8), pp. 2025-2032
DOI: 10.1111/j.1467-8659.2009.01425.x
See at:
Computer Graphics Forum
| CNR IRIS
| onlinelibrary.wiley.com
2007
Conference article
Restricted
CENOBIUM Cultural Electronic Network Online: Binding Up Interoperably Usable Multimedia
Baracchini C, Callieri M, Corsini M, Dellepiane M, Dercks U, Keultjes D, Montani C, Scognamiglio M, Scopigno R, Sigismondi R, Wolf G
The CENOBIUM project is a multimedia presentation of Romanesque cloister capitals from the Mediterranean region. High-resolution digital photographs, 3-D models, and panoramas will virtually link the capitals to their original surroundings, thus representing them within their original architectural and conceptual contexts. The cloister of Monreale is the starting point of this project, which combines classical and innovative methods of Art History with the latest in multimedia data technology. The paper describes the different acquisition and documentation activities; it also outlines the main components of the system which will allow the user to virtually explore the cloister.
See at:
CNR IRIS
2008
Conference article
Restricted
EnVyMyCar: a multi-player car racing game for teaching computer graphics
Ganovelli F, Corsini M
The development of a computer game is widely used as a means to convey Computer Science concepts. There are several reasons for that: it stimulates creativity, it provides an immediate sense of achievement when the code works, it typically covers all the aspects of an introductory course, and it is easy to find ideas just by looking around. In this paper we present NVMC (EnVy My Car), a framework for the collaborative/competitive development of a computer game, and report on our experience using it in two Computer Graphics courses held by the authors in 2007. We developed a multiplayer car racing game where the student is only asked to implement the rendering of the scene, while all the other aspects, communication and synchronization, are implemented in the framework and transparent to the developer. The novelty of our framework is that all on-line clients are able to see the views provided by the other clients, which motivates students to improve their work by comparing it with that of others, to pick up ideas from them, and to show off to their classmates.
DOI: 10.2312/eged.20081000
See at:
diglib.eg.org
| CNR IRIS
2013
Other
Restricted
Completing sparse reconstruction in few strokes
Baldacci A, Bernabei D, Ganovelli F, Corsini M
We present a novel interactive framework for creating complete and accurate 3D models starting from the low-quality results of multi-view stereo matching 3D reconstruction techniques. Our framework is motivated by the fact that even state-of-the-art solutions may provide very poor results, for example noise, missing parts and holes, in certain conditions. Little overlap between images, bad lighting conditions, moving occluders, and homogeneous appearance are some of the conditions that prevent those algorithms from obtaining an accurate and reliable reconstruction. In this paper we propose a framework that allows the user to take one such reconstruction and turn it into a complete model with limited interaction. The framework is based on two novel techniques that are the main contribution of this work. The first is a multi-view segmentation algorithm that enables the user to select a region on a single image and propagate the selection jointly to the other images and the corresponding 3D points. The second is a GPU-based algorithm for the reconstruction of smooth surfaces from multiple views that may incorporate user-given hints on the type of surface. We show how the proposed framework can be effective in several situations where state-of-the-art methods perform poorly or fail altogether.
See at:
CNR IRIS
2014
Conference article
Open Access
Painting with Bob: assisted creativity for novices
Benedetti L, Winnemoeller H, Corsini M, Scopigno R
Current digital painting tools are primarily targeted at professionals and are often overwhelmingly complex for use by novices. At the same time, simpler tools may not engage the user creatively, or are limited to plain styles that lack visual sophistication. There are many people who are not art professionals, yet would like to partake in digital creative expression. Challenges and rewards for novices differ greatly from those for professionals. In this paper, we leverage existing work on Creativity and Creativity Support Tools (CST) to formulate design goals specifically for digital art creation tools for novices. We implemented these goals within a digital painting system, called Painting with Bob. We evaluate the efficacy of the design and our prototype with a user study, and we find that users are highly satisfied with the user experience, as well as with the paintings created with our system.
DOI: 10.1145/2642918.2647415
See at:
University of Bath's research portal
| dl.acm.org
| doi.org
| CNR IRIS