This paper presents a neural network-based learning approach that enables seamless generation of 3D human motion in-between photos to accelerate the process of 3D character motion authoring. This new approach allows users to freely edit (replace, insert, or delete) input photos and specify the transition length to generate a kinematically coherent sequence of 3D human poses and shapes in-between the given photos. We demonstrate through qualitative and subjective evaluations that our approach is capable of generating high-fidelity, natural 3D pose and shape transitions.
Children’s National Hospital is a leading pediatric teaching hospital with medical students, residency programs, fellowships, and research initiatives. We develop virtual 3D patients to improve medical training by simulating plausible, life-threatening scenarios in infants and children for Pediatric Hospitalists. Medical simulation is widely used to enhance the readiness of medical professionals [Davila and Price 2023; Motola et al. 2013]. Children’s National’s Division of Pediatric Hospital Medicine’s Resuscitation Simulation Team (HRST) leads training for attending Hospitalists responsible for the inpatient care of sick and injured children across our entire regional network. Our 3D patient simulations provide realistic depictions of the nuanced physical findings that are essential for identifying uncommon yet life-threatening medical conditions in children. 3D models applicable to pediatric simulations are underrepresented compared with those used for adult patients.
This work proposes a motion style transfer network that transfers motion style between different motion categories using variational autoencoders. The proposed network effectively transfers style among various motion categories and can create stylized motion unseen in the dataset. The network contains a content-conditioned module to preserve the characteristics of the content motion, which is important for real applications. We implement the network with variational autoencoders, which enable us to control the intensity of the style and mix different styles to enrich motion diversity.
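A minimal sketch of how style intensity control and style mixing could be realized in a VAE latent space; `style_encoder`, `decoder`, and the latent layout are hypothetical stand-ins, not the paper's actual modules.

```python
import torch

def mix_styles(style_encoder, decoder, content_code, style_clips, weights, intensity=1.0):
    """Blend several style latent codes and scale their influence before decoding.

    style_encoder, decoder, and the latent layout are hypothetical stand-ins for
    the paper's VAE modules; only the mixing arithmetic is illustrated here.
    """
    # Encode each style clip to the mean of its approximate posterior.
    style_codes = [style_encoder(clip).mean for clip in style_clips]
    # A convex combination of style codes mixes different styles.
    weights = torch.tensor(weights) / sum(weights)
    mixed = sum(w * z for w, z in zip(weights, style_codes))
    # Scaling toward the zero (prior-mean) code controls style intensity.
    mixed = intensity * mixed
    # Decode conditioned on the content code so content characteristics are preserved.
    return decoder(content_code, mixed)
```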
We propose a method of learning a policy for human-like locomotion via deep reinforcement learning based on a human anatomical model, muscle actuation, and biologically inspired rewards, without any inherent control rules or reference motions. Our main ideas are to provide a dense reward based on metabolic energy consumption at every step during the initial stages of learning, transitioning to a sparse reward as learning progresses, and to adjust the initial posture of the human model to facilitate the exploration of locomotion. Additionally, we compare and analyze differences in learning outcomes across various settings beyond the proposed method.
Controlling agent behaviors with reinforcement learning (RL) is of continuing interest in multiple areas. One major focus is simulating multi-agent crowds that avoid collisions while locomoting to their goals. Although avoiding collisions is important, it is also necessary to capture realistic anticipatory navigation behaviors. We introduce a novel methodology that includes: 1) an RL method for learning an optimal navigational policy, 2) position-based constraints for correcting policy navigational decisions, and 3) a crowd-sourcing framework for selecting policy control parameters. Based on the optimally selected parameters, we train a multi-agent navigation policy, which we demonstrate on crowd benchmarks. We compare our method with existing works and show that our approach achieves superior multi-agent behaviors.
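The position-based correction step could resemble the following sketch, which iteratively projects overlapping agents apart, position-based-dynamics style; the constraint form and parameters are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def apply_position_constraints(positions, radii, iterations=4):
    """Project agent positions out of overlap, PBD-style.

    A minimal sketch of position-based collision constraints that could correct
    RL policy navigation decisions each step; parameters are illustrative only.
    """
    p = positions.copy()
    n = len(p)
    for _ in range(iterations):
        for i in range(n):
            for j in range(i + 1, n):
                d = p[i] - p[j]
                dist = np.linalg.norm(d)
                min_dist = radii[i] + radii[j]
                if 1e-6 < dist < min_dist:
                    # Split the penetration correction equally between the two agents.
                    correction = 0.5 * (min_dist - dist) * d / dist
                    p[i] += correction
                    p[j] -= correction
    return p
```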
"FORMS" is a new digital art concept that combines the fields of dance movement and machine learning techniques, specifically human pose detection, to create a real-time and interactive visual experience. This project aims to explore the relationship between dance and visual art by creating a framework that generates abstract and literal visual models from the dancers’ movements. The main objective of this project is to enhance the perception of dance movement by providing a new layer of visual composition. The proposed framework provides different visual forms based on human pose detection, creating a novel and real-time visual expression of the dance movement. The human pose detection model used in this project is based on state-of-the-art deep learning techniques, which analyze the positions and movements of different parts of the human body in real-time. This model allows the framework to capture movements of the dancers and translate them into unique visual forms. The case study showcases the potential of "FORMS" by demonstrating how professional young dancers can use the framework to enrich their performance and create new visual perceptions of dance movement. This study contributes to the cultivation of body awareness, understanding of the dance movement and overall enrichment of the art experience. The use of machine learning techniques showcases the potential of technology to enhance and expand the boundaries of artistic expression. The "FORMS" project is a novel and interdisciplinary approach that bridges the fields of art and technology, providing a new way to experience and perceive the dance movement.
Metro Re-illustrated is a project that explores incremental generation of stylized paintings of city metro maps using neural networks. It begins with an interactive system for labeling time-series data on city metro maps and generating reference images. These images are fed into a neural painter that incrementally generates oil painting-like strokes on virtual canvases. The generated paintings demonstrate blending and layering features of oil paintings while capturing the progressive nature of urban development.
Communication between humans and everyday objects is often implicit. In this paper, we create speculative scenarios in which users can have a conversation, as explicit communication, with everyday objects that do not usually have conversational artificial intelligence (AI) installed, such as a book. We present a design fiction narrative and a conversational system about conversations with everyday objects. Our user test showed positive acceptance of conversations with everyday objects, and users tried to build human-like relationships with them.
In recent years, the metaverse has received considerable attention. We believe that as this technology develops, humans will be able to dine in a virtual space while maintaining a sense of immersion. Therefore, we investigated whether the taste of food is influenced by the polygon count of CG models using VR/AR technology. We created CG models and overlaid their images onto the actual food via an HMD. The subjects then consumed the food with the CG image overlaid and answered a questionnaire. Results showed that the higher the polygon count, the less hardness was perceived, and that the toon-like model was more likely to affect the taste.
This study proposes Bstick-Mark2, a handheld virtual reality (VR) haptic controller that monitors and controls input from five fingers in real time with pressure sensors and linear motors. Bstick-Mark2 is designed and fabricated to allow users to use both hands along with a head-mounted display (HMD) while roaming freely, based on Bluetooth technology. When a user holds the haptic controller, the input from the five pressure sensors is transmitted to a microcontroller unit (MCU) to independently control the movements of the five linear motors. The pressure and position data of the linear motors are sent through a Bluetooth module embedded in the controller to a computer connected to a VR display, where they drive interaction with virtual objects and virtual hand movements in the Unity game engine. Bstick-Mark2 can withstand 22 N of force per finger to sustain the pressing force of a male finger and is compact enough for users to handle easily. It enables sensations of grabbing and controlling objects while interacting with VR content.
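A rough sketch of the kind of pressure-to-motor mapping the controller might perform; the actual logic runs in the MCU firmware, and the spring-like finger model, stiffness values, and travel range below are assumptions (only the 22 N per-finger limit comes from the abstract).

```python
def pressure_to_motor_targets(pressures, stiffness, max_travel_mm=20.0, max_force_n=22.0):
    """Map five finger pressure readings (N) to linear-motor target displacements (mm).

    Illustrative only: the real controller implements its own mapping on the MCU;
    this sketch clamps to the 22 N rated force and uses a simple spring model.
    """
    targets = []
    for p, k in zip(pressures, stiffness):
        force = min(p, max_force_n)                   # clamp to the rated per-finger force
        displacement = min(force / k, max_travel_mm)  # spring-like finger compliance
        targets.append(displacement)
    return targets

# Example: five fingers pressing with different forces against equal stiffness.
print(pressure_to_motor_targets([5.0, 10.0, 22.0, 1.0, 0.0], [2.0] * 5))
```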
Alice in Wonderland Syndrome (AIWS) is a rare perceptual disorder affecting visual processing, the perception of one's body, and the experience of time. This condition can be congenital or result from various insults to the brain. There is growing interest in AIWS as a window into how different areas of the brain work together to construct reality. We developed a virtual reality (VR) simulation of this condition as a psychoeducational tool for students in the psychological and medical sciences and for caregivers, allowing them to experience the perceptual distortions common in AIWS and to reflect on the nature of perception.
Although VR concerts offer a unique way for audiences to watch a performance, people still tend to attend live performances in person because the co-presence and shared experience are difficult to perceive in VR. To address this issue, we propose Actualities, a live XR performance system that integrates onsite and online concerts to create a seamless experience across multiple displays. Our system utilizes various sensors to detect signals from musical instruments and onsite audiences, digitalizing onsite performance elements into a virtual world. We project the visuals onto screens and live-stream the content for audiences to watch on various devices, and we also designed several interactive elements that let the audience interact with the public display. To evaluate our system, we conducted exploratory research to help us refine the system and improve the cross-reality experience.
In this work, we propose a proof-of-concept system that combines real and virtual racing, allowing professional racing drivers and e-racers to compete in real time. The real race car is imported into a racing game, and the virtual game car is exported into the real world in AR. This allows e-racers to compete against professional racing drivers and gives the audience an exciting experience from the grandstand through AR. Our system was deployed during an actual racing event, and we gathered initial feedback from users. Both e-racers and the audience expressed excitement about our system, indicating strong potential for this new form of mixed reality racing.
In this paper, an MR visualization method based on sound field modeling is proposed. Using a small amount of measurement data, the sound field is modeled with equivalent sources and room shapes acquired via SLAM. From the modeled sound field, the room impulse responses estimated at the target grid points are then animated to visualize the sound field using MR technology. Consequently, the animation of the sound field in MR clearly represents how sound propagates, including reflections.
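The equivalent-source model can be illustrated with a minimal free-field sketch in which each grid point's pressure is a sum over point sources weighted by Green's functions; fitting source strengths to measurements and incorporating the SLAM-acquired room geometry are omitted, and all names below are illustrative.

```python
import numpy as np

def pressure_at_points(grid_points, source_positions, source_strengths, freq, c=343.0):
    """Estimate complex sound pressure at grid points from equivalent point sources.

    A minimal free-field Green's function sketch; the paper's model additionally
    fits source strengths to measurements and uses SLAM-acquired room geometry.
    """
    k = 2.0 * np.pi * freq / c                                # wavenumber
    p = np.zeros(len(grid_points), dtype=complex)
    for q, r_s in zip(source_strengths, source_positions):
        r = np.linalg.norm(grid_points - r_s, axis=1)         # distance to each grid point
        p += q * np.exp(-1j * k * r) / (4.0 * np.pi * r)      # free-field Green's function
    return p
```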
Walking in virtual environments larger than the physical one can lead to collisions with physical boundaries. Locomotion techniques such as Redirected Walking (RDW) and Overlapping Architecture (OA) aim to overcome this limitation, but their combination has yet to be investigated in large physical spaces with resets. In this work, a hybrid locomotion method was implemented that combines RDW and OA. A user study was conducted in which participants collected items in a virtual environment with multiple rooms. The study showed that the distance walked between resets increased substantially, demonstrating clear advantages of combining OA and RDW.
Virtual space can eliminate the physical constraints of real space. Three-dimensional (3D) digital twins enable us to experience the manipulation of cultural assets that are inaccessible in real space. However, most 3D models obtained using current reconstruction techniques are static; they cannot be manipulated dynamically. In this study, we reproduce a dynamic 3D model of a simple articulated object with a rotating joint from a static point cloud, using only a video of the motion and a manually added rotation axis as reference. The reconstructed cultural asset is used for first-person experiences in an augmented reality environment.
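The articulation itself reduces to rotating the moving part's points about the manually specified axis; a minimal Rodrigues-rotation sketch is shown below (estimating the joint angle from the reference video is not covered, and the function names are illustrative).

```python
import numpy as np

def rotate_part_about_axis(points, axis_point, axis_dir, angle_rad):
    """Rotate the points of an articulated part about a manually specified axis.

    A minimal Rodrigues-rotation sketch of the articulation step; estimating the
    angle from the reference video is outside this snippet.
    """
    k = axis_dir / np.linalg.norm(axis_dir)
    v = points - axis_point                               # move the axis to the origin
    cos_a, sin_a = np.cos(angle_rad), np.sin(angle_rad)
    rotated = (v * cos_a
               + np.cross(k, v) * sin_a
               + np.outer(v @ k, k) * (1.0 - cos_a))      # Rodrigues' rotation formula
    return rotated + axis_point
```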
In this paper, we present a novel approach to remove object motion blur in 3D scene renderings using deformable neural radiance fields. Our technique adapts the hyperspace representation to accommodate shape changes induced by object motion blur. Experiments on Blender-generated datasets demonstrate the effectiveness of our method in producing higher-quality images with reduced object motion blur artifacts.
Automatic line-drawing colorization of anime characters is a challenging problem in computer graphics. Previous fully automatic colorization methods suffer from limited accuracy and require colorization artists to validate and correct errors. We propose to improve colorization accuracy by introducing a “pre-colorization” step into our production pipeline, in which the user manually colorizes partial regions of the line drawings before automatic colorization. The pre-colorized regions serve as clues for colorizing the remaining regions and improve the colorization accuracy. We identify the optimal regions to pre-colorize to obtain the best automatic colorization performance.
We present a novel free-viewpoint video (FVV) framework for capturing, processing, and compressing volumetric content for immersive VR/AR experiences. Compared with previous FVV capture systems, we propose an easy-to-use multi-camera array consisting of time-synchronized mobile phones. To generate photo-realistic FVV results from sparse multi-camera input, we improve novel view synthesis by introducing a visual-hull-guided neural representation, called VH-NeRF. VH-NeRF combines the advantages of explicit models from traditional 3D reconstruction with the implicit representation of Neural Radiance Fields. Each dynamic entity’s VH-NeRF is learned under supervision from the reconstructed visual hull data and can be further edited for complex, large-scale dynamic scenes. Moreover, our FVV solution supports effective compression and transmission of multi-perspective videos, as well as real-time rendering on consumer-grade hardware. To the best of our knowledge, our work is the first solution for photo-realistic FVV captured by a sparse multi-camera array, and it allows real-time live streaming of large-scale dynamic scenes for immersive VR and AR applications on mobile devices.
One of the most intuitive instruction methods in robot navigation is the pointing gesture. In this study, we propose a method using an omnidirectional camera to eliminate the user/object position constraint and the left/right constraint of the pointing arm. Although the accuracy of skeleton and object detection is low due to the high distortion of equirectangular images, the proposed method enables highly accurate estimation by repeatedly extracting regions of interest from the equirectangular image and projecting them onto perspective images. Furthermore, we found that learning the likelihood of the target object further improves the estimation accuracy.
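The reprojection step can be sketched as a gnomonic projection from the equirectangular image to a virtual perspective view centered on the region of interest; the sketch below uses nearest-neighbor sampling and illustrative parameter names, and is not the authors' implementation.

```python
import numpy as np

def equirect_roi_to_perspective(equi_img, yaw, pitch, fov_deg, out_size):
    """Reproject a region of interest of an equirectangular image to a perspective view.

    A minimal gnomonic-projection sketch of the step used before re-running
    skeleton/object detection; sampling is nearest-neighbor for brevity.
    """
    h_e, w_e = equi_img.shape[:2]
    f = 0.5 * out_size / np.tan(0.5 * np.radians(fov_deg))
    u, v = np.meshgrid(np.arange(out_size) - out_size / 2,
                       np.arange(out_size) - out_size / 2)
    # Rays of a virtual pinhole camera, rotated to the yaw/pitch of the ROI center.
    dirs = np.stack([u, -v, np.full_like(u, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    cy, sy, cp, sp = np.cos(yaw), np.sin(yaw), np.cos(pitch), np.sin(pitch)
    R = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]) @ \
        np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    d = dirs @ R.T
    lon = np.arctan2(d[..., 0], d[..., 2])
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))
    x = ((lon / np.pi + 1.0) * 0.5 * (w_e - 1)).astype(int)
    y = ((0.5 - lat / np.pi) * (h_e - 1)).astype(int)
    return equi_img[np.clip(y, 0, h_e - 1), np.clip(x, 0, w_e - 1)]
```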
This work introduces SegAnimeChara, a novel system for transforming AI-generated anime images into game characters while retaining their unique features. Using volume-based body pose segmentation, SegAnimeChara can efficiently segment body parts from generated images in a zero-shot manner based on the OpenPose human skeleton. Furthermore, the system integrates a semantic segmentation pipeline based on the text prompts of the existing Text2Image workflow. The system preserves the game character’s unique outfit and reduces redundant duplicate text prompts for semantic segmentation.
This paper introduces a system for visualizing 3D sound pressure distribution that uses a minimum variance distortionless response (MVDR) beamformer with Light Detection and Ranging (LiDAR) technology to localize sound sources. Using LiDAR to capture 3D data, the proposed system calculates the time-averaged output power of the MVDR beamformer at the virtual source position for each point in the point cloud. The results are then superimposed onto the 3D data to estimate sound source locations. The proposed system provides a more visually comprehensible display of the sound pressure distribution in 3D.
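A minimal narrowband sketch of the per-point MVDR power computation described above; the steering-vector model, regularization, and broadband handling are simplified assumptions.

```python
import numpy as np

def mvdr_power_map(snapshots, mic_positions, points, freq, c=343.0, eps=1e-3):
    """Time-averaged MVDR output power at candidate source points of a point cloud.

    `snapshots` is (num_mics, num_frames) of complex STFT bins at `freq`; the
    spherical-wave steering vector and diagonal loading below are simplifications.
    """
    k = 2.0 * np.pi * freq / c
    # Sample covariance matrix averaged over time frames.
    R = snapshots @ snapshots.conj().T / snapshots.shape[1]
    R_inv = np.linalg.inv(R + eps * np.trace(R).real / len(R) * np.eye(len(R)))
    powers = np.empty(len(points))
    for i, p in enumerate(points):
        d = np.linalg.norm(mic_positions - p, axis=1)
        a = np.exp(-1j * k * d) / d                      # spherical-wave steering vector
        powers[i] = 1.0 / np.real(a.conj() @ R_inv @ a)  # MVDR output power
    return powers
```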
With increased environmental protection activities, smartphone-enabled cleaning activities to deter street littering are gaining attention. We propose a method to analyze litter-on-road images captured by a smartphone camera mounted on a bicycle, requiring no conscious effort from users (Fig. 1). First, the user mounts the smartphone on a bicycle and starts the developed application, which extracts still images from the captured video. The still images are then categorized using machine learning, and the type of trash is annotated in the images. Finally, to predict the distribution of trash, the probability of influence from the surrounding environment, such as convenience stores and bars, is calculated using the machine learning model. This paper discusses the efficacy of our developed system for acquiring and analyzing road imagery. As a first effort, we verify the accuracy of tagging PET bottles, cans, food trays, and masks using a learning model generated by Detectron2.
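A hedged sketch of how Detectron2 inference could be used to tag the four litter classes; the config choice, weight path, and class list below are assumptions, not the authors' released model.

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Hypothetical class list and weight path; the paper's actual model and labels may differ.
CLASSES = ["pet_bottle", "can", "food_tray", "mask"]

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = "litter_model_final.pth"        # assumed custom-trained weights
cfg.MODEL.ROI_HEADS.NUM_CLASSES = len(CLASSES)
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
predictor = DefaultPredictor(cfg)

frame = cv2.imread("road_still_0001.jpg")           # still image extracted from the video
instances = predictor(frame)["instances"].to("cpu")
for cls_id, score in zip(instances.pred_classes.tolist(), instances.scores.tolist()):
    print(f"{CLASSES[cls_id]}: {score:.2f}")        # tag detected litter with its class
```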
This study presents a novel system that enhances air cannon tactile perception using synchronous galvanic vestibular stimulation (GVS). We conducted a user study with a within-subjects design to evaluate the enhancement effects of synchronous GVS on air cannon tactile sensations across multiple body locations. Results demonstrated significant improvements without affecting the magnitude of physical body sway, suggesting potential applications in virtual reality, particularly for augmenting existing air vortex ring haptics use cases.
We propose sPellorama, an immersive tool that enables Virtual Reality (VR) content creators to quickly prototype scenes from verbal input. The system first converts voice input to text, then utilizes a text-guided panorama generation model to produce the described scene. The panorama is then applied to the skybox in Unity. Previously generated panoramas are preserved in the photosphere and remain ready to be viewed. A pilot study shows that our tool can enhance the process of discussion and prototyping for VR content creators.
Children with autism may have difficulties in learning daily living skills due to repetitive behavior, which poses a challenge to their independent living training. Previous studies have shown the potential of using interactive technology to help children with autism train daily living skills. In this poster, we present Tidd, an interactive device based on desktop augmented reality projection, designed to support children with autism in daily living skills training. The system combines storytelling with Applied Behavior Analysis (ABA) therapy to scaffold the training process. A pilot study was conducted on 13 children with autism in two autism rehabilitation centers. The results showed that Tidd helped children with autism learn bed-making and dressing skills while engaging in the training process.
Crossed mirror arrays (CMAs) have recently been employed in simple retinal projection augmented reality (AR) devices owing to their wide field of view and nonfocal nature. However, they remain inadequate for AR devices for everyday use owing to the limited visibility of the physical environment. This study aims to enhance the transmittance of the CMA by fabricating it with half-silvered acrylic mirrors. Further, we evaluated the transmittance and quality of the retinal display. The proposed CMA successfully achieved sufficient retinal projection and higher see-through capability, making it more suitable for use in AR devices than conventional CMAs.
We propose a method to improve the rendering speed of glossy surfaces, including anisotropic reflection, using prefiltering. The key idea is to prefilter an environment light map for multiple primary normals. Furthermore, we propose an interpolation method to smoothly connect the boundaries generated by switching between multiple prefiltered environment maps. As a result, we are able to render objects with glossy surfaces more accurately, even in real time.
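One plausible way to realize the boundary interpolation is to blend the per-primary-normal prefiltered lookups with weights that fall off with the angle to each primary normal; the weighting function and `sharpness` parameter below are illustrative assumptions.

```python
import numpy as np

def blend_prefiltered_maps(normal, primary_normals, prefiltered_lookups, sharpness=8.0):
    """Interpolate between environment maps prefiltered for several primary normals.

    A minimal sketch of the boundary-smoothing idea: `prefiltered_lookups` holds the
    radiance already fetched from each per-primary-normal map, and the weighting
    scheme here is an assumption, not the paper's exact interpolation.
    """
    n = normal / np.linalg.norm(normal)
    cosines = np.array([max(np.dot(n, pn), 0.0) for pn in primary_normals])
    weights = cosines ** sharpness          # favor maps whose primary normal is closest
    weights /= weights.sum()
    # Weighted sum of the per-primary-normal prefiltered radiance values.
    return sum(w * m for w, m in zip(weights, prefiltered_lookups))
```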
Lenticular lenses exhibit a color-changing effect depending on the viewing angle and a vanishing effect in certain directions. In this study, we propose two fabrication methods for edible lenticular lenses: a mold-forming method and a knife-cutting method using a knife with the inverse structure of a lenticular lens created by an SLA 3D printer. We also evaluate the properties of the resulting products. The index of refraction (IOR) of the material is optimized using ray-tracing simulation.
Neural Radiance Fields (NeRF) trained on pre-rendered photorealistic images represent complex medical data in a fraction of the size, while interactive applications synthesize novel views directly from the neural networks. We demonstrate a practical implementation of NeRFs for high resolution CT volume data, using differentiable rendering for training view selection.
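A speculative sketch of loss-driven view selection, under the assumption that candidate poses are ranked by the disagreement between the current NeRF and the differentiable reference renderer of the CT volume; `nerf_render` and `reference_render` are hypothetical callables, and the paper's actual criterion may differ.

```python
import torch

def select_training_views(nerf_render, reference_render, candidate_poses, k):
    """Greedily pick candidate poses where the NeRF differs most from the reference.

    A speculative sketch of loss-driven training view selection; both render
    functions are hypothetical callables mapping a camera pose to an image tensor.
    """
    errors = []
    with torch.no_grad():
        for pose in candidate_poses:
            pred = nerf_render(pose)
            target = reference_render(pose)
            errors.append(torch.mean((pred - target) ** 2).item())  # per-view MSE
    ranked = sorted(range(len(candidate_poses)), key=lambda i: errors[i], reverse=True)
    return [candidate_poses[i] for i in ranked[:k]]
```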
We present Reverse Projection, a novel projective texture mapping technique for painting a decal directly into the texture of a 3D object. Designed for games, the technique works in real time. Because the projection is computed in local texture space and looks outward, users on devices ranging from low-end Android phones to high-end gaming desktops can enjoy personalizing their assets. We believe our proposed pipeline is a step toward improving the speed and versatility of model painting.
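A minimal CPU sketch of projective texture mapping performed in texture space, under the assumption that each texel's world position is available; blending, occlusion handling, and the GPU implementation are omitted, and all names are illustrative.

```python
import numpy as np

def paint_decal_into_texture(texture, texel_world_pos, decal_img, proj_matrix):
    """Write a projected decal directly into an object's texture.

    Each texel's world position is projected by the decal projector's
    view-projection matrix; texels landing inside the unit frustum take the
    decal color. This is a conceptual sketch, not the paper's pipeline.
    """
    h, w = texture.shape[:2]
    dh, dw = decal_img.shape[:2]
    # Homogeneous projection of every texel's world position into decal clip space.
    ones = np.ones((h, w, 1))
    clip = np.concatenate([texel_world_pos, ones], axis=-1) @ proj_matrix.T
    ndc = clip[..., :2] / np.clip(clip[..., 3:4], 1e-6, None)
    inside = (np.abs(ndc[..., 0]) <= 1) & (np.abs(ndc[..., 1]) <= 1) & (clip[..., 2] > 0)
    u = ((ndc[..., 0] * 0.5 + 0.5) * (dw - 1)).astype(int)
    v = ((0.5 - ndc[..., 1] * 0.5) * (dh - 1)).astype(int)
    texture[inside] = decal_img[v[inside], u[inside]]
    return texture
```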
We share our experience of using containers (Docker and Singularity) in computer graphics teaching and research support, as well as the use of the same containers in HPC. We use OpenISS sample containers for this purpose, containers that are open-source and publicly available.
Improvements in the science and art of computer-graphics rendering, particularly the shift in recent decades toward more physically based models in both real-time and offline rendering, have motivated improvements in material models. However, real-world materials are often still significantly more complex in their observable light scattering than the shading models currently used to represent them in renderers. To represent these complexities at higher visible fidelity, improved methods for material acquisition and representation are desired, and one important area of continued study is the capture and representation of spatially varying physical materials. We present ongoing efforts toward acquiring and representing these spatially varying properties, building on recent work on parameterization techniques to improve the efficiency of material acquisition.
This short paper introduces VirtualVoxel, a novel rasterization-based voxel renderer that enables real-time rendering of voxel scenes at 4M³ resolution. VirtualVoxel combines the benefits of virtual texture and virtual geometry, allowing for efficient storage and rendering of texture and geometry information simultaneously.
Similar to the VirtualTexture approach, VirtualVoxel streams the appropriate level of detail into GPU memory, ensuring optimal performance. However, unlike previous graph-based voxel visualization methods, VirtualVoxel stores data linearly and employs the rasterization rendering pipeline, resulting in highly efficient modification of large scenes.
VirtualVoxel’s high-performance capabilities make it ideal for a wide range of applications, from video games to scientific visualization. It can render scenes at resolutions of up to 4M³ voxels in real time, providing artists and designers with a powerful new tool for creating stunning graphics.
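A small sketch of the kind of distance-based LOD selection a virtual-texture-style streaming system could use to request voxel pages; the formula and parameters are illustrative assumptions, not VirtualVoxel's actual heuristic.

```python
import math

def required_voxel_lod(distance, voxel_size, pixel_footprint, screen_height, fov_y):
    """Pick the voxel mip level whose projected size roughly matches one pixel.

    Illustrative only: an assumed heuristic for deciding which level of detail
    to stream into GPU memory, not the renderer's actual selection rule.
    """
    # World-space size covered by one screen pixel at this distance.
    world_per_pixel = 2.0 * distance * math.tan(0.5 * fov_y) / screen_height
    target = max(world_per_pixel * pixel_footprint, voxel_size)
    return max(0, int(math.floor(math.log2(target / voxel_size))))

# Example: request coarser pages as the camera moves away from a 1 cm voxel grid.
for d in (1.0, 10.0, 100.0):
    print(d, required_voxel_lod(d, 0.01, 1.0, 1080, math.radians(60)))
```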