Upper limb segmentation in egocentric vision is a challenging and nearly unexplored task that extends the well-known hand localization problem and can be crucial for a realistic representation of users’ limbs in immersive and interactive environments, such as VR/MR applications designed for web browsers, which offer a general-purpose solution suitable for any device. Existing hand and arm segmentation approaches require large amounts of well-annotated data, so various annotation techniques have been designed and several datasets created. Such datasets are often limited to synthetic and semi-synthetic data that do not include the whole limb and differ significantly from real data, leading to poor performance in many realistic cases. To overcome the limitations of previous methods and the challenges inherent in both egocentric vision and segmentation, we collected a large-scale, comprehensive dataset and trained several segmentation networks based on the state-of-the-art DeepLabv3+ model. The dataset consists of 46,000 real-life, well-labeled RGB images with a great variety of skin colors, clothes, occlusions, and lighting conditions. In particular, we carefully selected the best data from existing datasets and added our EgoCam dataset, which includes new images with accurate labels. Finally, we extensively evaluated the trained networks in unconstrained real-world environments to find the best model configuration for this task, achieving promising results in diverse scenarios. The code, the collected egocentric upper limb segmentation dataset, and a video demo of our work will be available on the project page.
Volumetric segmentation of medical images is an essential tool in treatment planning and many longitudinal studies. While machine learning approaches promise to fully automate it, they most often still depend on manually labeled training data. We thus present a GPU-based volumetric region growing approach for semi-automatic brain tumor segmentation that can be interactively tuned. Additionally, we propose multidimensional transfer functions for ray tracing that allow users to judge the quality of the grown region. Our implementation produces a full brain tumor segmentation within a few milliseconds on consumer hardware. The visualization uses adaptive resolution scaling and progressive, asynchronous shading computation to maintain a stable 60 Hz refresh rate.
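To illustrate the core idea behind region growing (though not the authors' GPU implementation or their interactive tuning), a minimal CPU-side sketch on a voxel volume might look as follows; the flat array layout and the single intensity-tolerance criterion are assumptions made for this sketch.

```typescript
// Minimal seeded region growing on a 3D intensity volume (CPU sketch).
// A voxel joins the region if its intensity is within `tolerance` of the seed.
function growRegion(
  volume: Float32Array,            // intensities, flattened x-fastest
  dims: [number, number, number],
  seed: [number, number, number],
  tolerance: number
): Uint8Array {
  const [nx, ny, nz] = dims;
  const idx = (x: number, y: number, z: number) => x + nx * (y + ny * z);
  const mask = new Uint8Array(nx * ny * nz);     // 1 = inside the grown region
  const seedValue = volume[idx(...seed)];
  const queue: [number, number, number][] = [seed];
  mask[idx(...seed)] = 1;

  while (queue.length > 0) {
    const [x, y, z] = queue.pop()!;
    // Visit the 6-connected neighbors.
    for (const [dx, dy, dz] of [[1,0,0],[-1,0,0],[0,1,0],[0,-1,0],[0,0,1],[0,0,-1]] as const) {
      const [qx, qy, qz] = [x + dx, y + dy, z + dz];
      if (qx < 0 || qy < 0 || qz < 0 || qx >= nx || qy >= ny || qz >= nz) continue;
      const q = idx(qx, qy, qz);
      if (mask[q] === 0 && Math.abs(volume[q] - seedValue) <= tolerance) {
        mask[q] = 1;
        queue.push([qx, qy, qz]);
      }
    }
  }
  return mask;
}
```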
Navigating through a virtual environment is one of the major user tasks on the 3D web. Although hundreds of interaction techniques have been proposed to navigate through 3D scenes on desktop, mobile, and VR headset systems, 3D navigation still poses a high entry barrier for many potential users. In this paper, we discuss the design and implementation of a test platform that facilitates the creation and fine-tuning of interaction techniques for 3D navigation. We support the most common navigation metaphors (walking, flying, teleportation). The key idea is to let developers specify, at runtime, the exact mapping between user actions and virtual camera changes for any of the supported metaphors. We demonstrate through many examples how this method can be used to adapt navigation techniques to a variety of users, including people with no previous 3D navigation skills, elderly people, and people with disabilities.
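As an illustration of the kind of runtime-specified mapping described (not the platform's actual specification interface), a TypeScript sketch could associate named user actions with small camera updates; the type names and the walking/turning mappings below are hypothetical.

```typescript
// Hypothetical sketch of a runtime-configurable action-to-camera mapping.
interface CameraState { position: [number, number, number]; yaw: number; pitch: number; }

// Each mapping converts a normalized user action value (e.g. a joystick axis
// or swipe delta) into an increment of one camera parameter.
type ActionMapping = (value: number, camera: CameraState, dt: number) => void;

const mappings: Record<string, ActionMapping> = {
  // Walking metaphor: the forward axis translates along the view direction.
  moveForward: (v, cam, dt) => {
    const speed = 1.5;                       // metres per second, tunable at runtime
    cam.position[0] += Math.sin(cam.yaw) * v * speed * dt;
    cam.position[2] -= Math.cos(cam.yaw) * v * speed * dt;
  },
  // Turning: the horizontal axis rotates the camera around the up axis.
  turn: (v, cam, dt) => { cam.yaw += v * 1.2 * dt; },
};

// Called once per frame with the currently active user actions.
function applyActions(actions: Record<string, number>, camera: CameraState, dt: number): void {
  for (const [name, value] of Object.entries(actions)) {
    mappings[name]?.(value, camera, dt);
  }
}
```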
Extended reality (XR) collaboration enables collaboration between physical and virtual spaces. Recent XR collaboration studies have focused on sharing and understanding the overall situation of the objects of interest (OOIs) and their surrounding ambient objects (AOs), rather than simply recognizing the existence of the OOIs. The sharing of the overall situation is achieved using three-dimensional (3D) models that replicate objects existing in the physical workspace. There are two approaches for creating the models: pre-reconstruction and real-time reconstruction. The pre-reconstruction approach takes considerable time to create polygon meshes precisely, and the real-time reconstruction approach requires considerable time to install the numerous sensors needed for accurate 3D scanning. In addition, these approaches are difficult to use for collaboration in locations beyond the reconstructed space, making them impractical for actual XR collaboration. The approach proposed in this study separates the objects that form the physical workspace into OOIs and AOs, models only the OOIs as polygon meshes in advance, and reconstructs the AOs into a point cloud using light detection and ranging technology for collaboration. The reconstructed point cloud is shared with remote collaborators through WebRTC, a web-based peer-to-peer networking technology with low latency. Each remote collaborator collects the delivered point cloud to form a virtual space, so that they can intuitively understand the situation at the local site. Because our approach does not create polygon meshes for all objects existing at the local site, we save time in preparing for collaboration. In addition, we improve the practicality of XR collaboration by eliminating the need to install numerous sensors at the local site. We introduce a prototype and an example scenario to demonstrate the practicality of our approach.
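For readers unfamiliar with WebRTC data channels, the following browser-side sketch shows how point-cloud chunks could be streamed to a remote peer; the channel label, chunk size, and unreliable-delivery settings are illustrative assumptions, not necessarily the configuration used by the prototype, and signaling is omitted.

```typescript
// Minimal sketch: streaming an XYZ point cloud (3 floats per point) to a
// remote peer over a WebRTC data channel. Signaling (offer/answer/ICE
// exchange) is omitted for brevity.
const peer = new RTCPeerConnection();
const channel = peer.createDataChannel("pointcloud", { ordered: false, maxRetransmits: 0 });
channel.binaryType = "arraybuffer";

// Split the cloud into fixed-size chunks so each message stays well below
// typical data-channel message-size limits.
function sendPointCloud(points: Float32Array, chunkPoints = 4096): void {
  for (let i = 0; i < points.length; i += chunkPoints * 3) {
    channel.send(points.subarray(i, Math.min(i + chunkPoints * 3, points.length)));
  }
}

// On the receiving peer the channel arrives via `peer.ondatachannel`; each
// message then carries one chunk of XYZ floats to append to the rendered cloud.
function handleChunk(event: MessageEvent<ArrayBuffer>): void {
  const chunk = new Float32Array(event.data);
  // e.g. append `chunk` to the point buffer backing the virtual-space view
  void chunk;
}
```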
Combinations of immersive media and graphic portrayals can enable the human subjective sense of Presence. This paper collects our experiences and evaluations from six projects that use Extensible 3D (X3D) interactive graphics to deliver spatial experiences across the WWW. X3D enables the combination of spherical panoramas with 3D models and maps to visually transport users to a specific real location at a specific time. Remote users have access to these worlds through a Web browser or other immersive device; local users in a CAVE can collaborate with natural physical gestures. We reflect on the graphical and interactive requirements of these projects and provide guidance for future applications. In the face of physical lockdowns and distancing due to the COVID-19 pandemic, such platforms illustrate the opportunities and challenges in the design and delivery of spatial visualizations, especially for remote learning.
The Extensible 3D (X3D) Graphics Architecture ISO/IEC 19775-1 International Standard version 4.0 is a major release that includes extensive support for version 2.0 of glTF (glTF2), a standard file format for 3D scenes and models supporting geometry, appearance, a hierarchical scene-graph structure, and model animation. X3D version 4 (X3D4) authors have the option to Inline (i.e., load) glTF models directly or else utilize native X3D nodes to create corresponding 3D models. Physically-based rendering (PBR) and non-photorealistic rendering (NPR), with corresponding lighting techniques, are both important additions to the X3D4 rendering model. These lighting models are compatible and can coexist in real time with legacy X3D3 and VRML97 models (which utilize classic Phong shading and texturing), either independently or composed together into a single scene. The X3D4 approach to representing glTF2 characteristics is defined in complete detail in order to be functionally identical, thus having the potential to achieve the greatest possible interoperability of 3D models across the World Wide Web. Nevertheless, a persistent problem remains: glTF renderers do not always appear to produce visually identical results. Best practices for mapping glTF structures to X3D4 nodes are emerging to facilitate consistent conversion. This paper describes a variety of techniques to help confirm the rendering consistency of X3D4 players, both when loading glTF assets and when representing native X3D4 conversions. Development of an X3D4 conformance archive corresponding to the publicly available glTF examples archive is expected to reinforce the development of visually correct software renderers capable of identical X3D4 and glTF presentation.
New standards such as WebXR enable cross-platform VR experiences, relying on the ubiquity of the modern web browser. However, upon measuring the performance of WebXR scenes, we found that users can suffer from high latency while waiting for all 3D objects to appear in their field of view. This is because the storage and fetching of 3D objects in WebXR (and its underlying WebGL libraries) are agnostic to the user’s orientation and location, leading to latency issues. Specifically, fetching texture files in arbitrary order results in 3D objects waiting on their texture dependencies, and storing all objects’ geometry data in one large file blocks individual objects from rendering even if their texture dependencies are satisfied. To address these issues, we propose a systematic prioritization of which 3D objects and their dependencies should be fetched first, based on the user’s position and orientation in the VR scene. To improve efficiency, the geometry data belonging to each 3D object are optimally grouped together to minimize the average latency. Our experiments with various WebXR scenes under different network conditions show that our scheme can significantly reduce the time until all 3D objects appear in the user’s field of view, by up to 50%, compared to the default WebXR behavior.
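A simplified sketch of such view-aware prioritization is shown below; the scoring function (favoring nearby objects in front of the user) is illustrative only and is not the exact formula proposed in the paper.

```typescript
// Simplified sketch of view-aware fetch prioritization: objects closer to the
// user and nearer the centre of the view direction are fetched first.
interface SceneObject { id: string; center: [number, number, number]; url: string; }

function fetchPriority(obj: SceneObject, eye: [number, number, number], viewDir: [number, number, number]): number {
  const to = [obj.center[0] - eye[0], obj.center[1] - eye[1], obj.center[2] - eye[2]];
  const dist = Math.hypot(to[0], to[1], to[2]) || 1e-6;
  // Cosine of the angle between the view direction and the direction to the object.
  const cos = (to[0] * viewDir[0] + to[1] * viewDir[1] + to[2] * viewDir[2]) / dist;
  // Higher score = fetch sooner: favour objects in front of the user and nearby.
  return (cos + 1) / dist;
}

async function fetchInPriorityOrder(objects: SceneObject[], eye: [number, number, number], viewDir: [number, number, number]): Promise<void> {
  const ordered = [...objects].sort(
    (a, b) => fetchPriority(b, eye, viewDir) - fetchPriority(a, eye, viewDir)
  );
  for (const obj of ordered) {
    const response = await fetch(obj.url);   // per-object asset (geometry grouped with its textures)
    // hand response.arrayBuffer() to the WebGL loader as soon as it arrives
    void response;
  }
}
```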
Modeling sound propagation is very important for virtual acoustics and virtual reality applications. For that reason, highly advanced computer models for the prediction of sound fields in rooms have already been proposed. However, most of them are complex and require a skilled acoustician to use them effectively. There is thus a need for simpler models that are accurate and quick to use and, most importantly, provide a simpler syntax with a hierarchical structure that makes it easy to map user actions to audio parameters. This paper aims to integrate acoustic properties associated with geometric shapes, together with 3D spatial sound, into the X3D Graphics standard by exploiting the structure and functionality of the Web Audio API, using its audio graph model to simplify the development of audio applications. Evaluation examples lead to useful conclusions and areas for future model development.
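As a reminder of the audio graph model the paper builds on, a minimal Web Audio API graph for a single spatialized source might look like this; the parameter values are illustrative only.

```typescript
// Minimal Web Audio API graph for one spatialized source. Parameter values
// are illustrative; decoded audio data would normally be assigned to
// source.buffer before playback.
const ctx = new AudioContext();

const source = ctx.createBufferSource();
const panner = ctx.createPanner();             // positions the sound in 3D space
panner.panningModel = "HRTF";
panner.positionX.value = 2;                    // e.g. source 2 m to the listener's right
panner.positionY.value = 0;
panner.positionZ.value = -3;

const gain = ctx.createGain();                 // overall level
gain.gain.value = 0.8;

// Audio graph: source -> panner -> gain -> destination (speakers/headphones)
source.connect(panner).connect(gain).connect(ctx.destination);
source.start();
```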
Deviations from the original blueprints can often be found on building sites. To easily identify such constructional modifications, this paper proposes a smart digital twin for construction site monitoring that supports web-based collaboration with external experts to discuss possibly required solutions to this problem. To this end, a system pipeline was designed that allows automatic detection and visualization of deviations between real and planning data. In addition, the pipeline generates an interactive difference visualization that uses the original planning data as well as the 3D-scanned real data. The latter is recorded using an AR-capable mobile device or headset such as the Microsoft HoloLens, and the differences are obtained via a point-cloud-based algorithm. The differences between real and planning data are then emphasized on the respective client, which uses X3DOM for visualization. Besides the server part of the pipeline, the application consists of several clients that enable collaborative discussion between the various stakeholders, such as an AR-based mobile viewer for the craftsman on site and a web-based viewer for the building owner at home or the architect.
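As a simplified illustration of a point-cloud-based deviation check (not the project's actual algorithm), planned points could be flagged when no scanned point lies within a tolerance radius; a production pipeline would replace the brute-force search below with a spatial index such as a k-d tree or voxel grid.

```typescript
// Simplified deviation check: a planned point is flagged as a deviation if no
// scanned point lies within `tolerance` metres of it.
type Point = [number, number, number];

function findDeviations(planned: Point[], scanned: Point[], tolerance: number): Point[] {
  const tol2 = tolerance * tolerance;
  return planned.filter((p) => {
    // deviation if no scanned point falls inside the tolerance radius
    return !scanned.some((s) => {
      const dx = p[0] - s[0], dy = p[1] - s[1], dz = p[2] - s[2];
      return dx * dx + dy * dy + dz * dz <= tol2;
    });
  });
}
```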
Art and culture, at their best, lie in the act of discovery and exploration. This paper describes Resurrect3D, an open visualization platform for both casual users and domain experts to explore cultural artifacts. To that end, Resurrect3D takes two steps. First, it provides an interactive cultural heritage toolbox, offering not only commonly used tools in cultural heritage such as relighting and material editing, but also the ability for users to create an interactive “story”: a saved session with annotations and visualizations that others can later replay. Second, Resurrect3D exposes a set of programming interfaces to extend the toolbox: domain experts can develop custom tools that perform artifact-specific visualization and analysis.
The WebXR Device API allows the creation of web browser-based eXtended Reality (XR) applications, i.e., Virtual Reality (VR) and Augmented Reality (AR) applications, by providing access to the input and output capabilities of AR and VR devices. WebXR applications can be experienced through the WebXR-supported browsers of standalone and PC-connected VR headsets, AR headsets, mobile devices with or without headsets, and personal computers. WebXR has been growing in popularity since its introduction due to the several benefits it promises, such as allowing creators to utilise WebGL's rich development ecosystem, create cross-platform and future-proof XR applications, and create applications that can be experienced in VR or AR with minimal code changes. However, research has not been conducted to understand the experiences of WebXR creators. To address this gap, we conducted a qualitative study involving interviews with 11 WebXR creators from diverse backgrounds, aimed at understanding their experiences, and in this paper we present 8 key challenges reported by these creators.
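For context, the entry point such creators work with is compact; a minimal sketch of requesting an immersive VR session via the WebXR Device API follows, with rendering details omitted (WebXR type definitions, e.g. @types/webxr, are assumed to be available).

```typescript
// Minimal WebXR Device API entry point: check support, request an immersive
// VR session, and run the frame loop. WebGL layer setup and draw calls are
// omitted for brevity.
async function startVR(): Promise<void> {
  if (!navigator.xr || !(await navigator.xr.isSessionSupported("immersive-vr"))) {
    console.warn("Immersive VR is not supported on this device/browser.");
    return;
  }
  const session = await navigator.xr.requestSession("immersive-vr");
  const refSpace = await session.requestReferenceSpace("local");

  session.requestAnimationFrame(function onFrame(time, frame) {
    const pose = frame.getViewerPose(refSpace);
    // draw one view per entry in pose.views using a WebGL layer here
    void pose;
    session.requestAnimationFrame(onFrame);
  });
}
```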
A UNESCO World Heritage Site, the 4th-century CE Villa Romana del Casale near Piazza Armerina, Sicily, contains the largest collection of mosaics in the Roman world. However, due to accessibility issues (e.g., remote location, weak online presence), the Villa remains nearly unknown in comparison to popular sites such as Pompeii, despite its cultural importance. VILLAE, a collaboration between archaeologists, classicists, and game designers at the University of South Florida and the University of Arkansas, aims to build academic and public engagement with the Villa through a serious game played directly online using WebGL. Addressing the issues of accuracy in 3D reconstruction versus digital embodiment and meaningful game play, this paper outlines the project's pipeline for synthesizing the extensive 3D documentation of the site to create the digital prototype of an immersive narrative that unfolds the Villa's history against the development of modern archaeology in Italy and focuses on the human story and professional life of a pioneering female archaeologist, Ersilia Caetani-Lovatelli.
The recent diffusion of new formats of archaeological publication systems results from ad hoc solutions conceived and realized to serve specific communication needs. These new formats are customized to facilitate access to specific archaeological information and can be tailored and structured to host datasets and results. Unlike standard publications, they do not represent the final result of the investigation but rather a dynamic public space in which different interpretations can be revised and formulated. A review of the available online publication systems for Cultural Heritage, with a focus on the publication systems the author has worked with, can help define their potential advantages and flaws and steer the design and development of further advancements.
The Invisible Heritage – Analysis and Technology (IH-AT) project aimed to design and develop a portal comprising reliable, efficient, technology-ready tools for the visualization, documentation, and analysis of the UNESCO-listed churches in the Troodos area of Cyprus, applying geophysics, 3D modeling techniques, and visualization methods, supported by art-historical and archaeological research.
We present our short series of interactive animations on directional derivatives and level curves. The visualisation was developed for students of first-year mathematics courses at the Delft University of Technology. Each interactive animation is an animated video that can be interacted with while it is paused or playing: the user can, for example, change the function that is plotted, drag points of interest, or change the angle of the camera to explore the scene.
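For reference, the animations centre on the standard relationship between the gradient and the directional derivative of a differentiable function $f$ along a unit vector $\mathbf{u} = (u_1, u_2)$:

\[
D_{\mathbf{u}} f(x, y) = \nabla f(x, y) \cdot \mathbf{u} = \frac{\partial f}{\partial x}\, u_1 + \frac{\partial f}{\partial y}\, u_2,
\]

with level curves given by $f(x, y) = c$; along a level curve the directional derivative in the tangent direction is zero, since $\nabla f$ is orthogonal to the curve.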