We present a novel approach to generating emissive textures for luminous objects using direct 3D supervision from a 3D model dataset. To this end, we construct Emissive Objaverse, a dataset built on the recently proposed Objaverse dataset, and propose 3D Lighter, a method using neural fields with generative latent optimization.
Our group previously proposed an aerial display method that combines a flying screen suspended from a drone with dynamic projection mapping onto the screen. In this study, we extend that work in two ways. First, we propose a large-screen structure with a specially designed LED marker. The structure is suited to suspension from a drone, and the marker allows the screen's center position to be estimated from a captured image. We demonstrated successful, stable projection of laser patterns onto a screen prototype suspended from a flying drone, with the projection system approximately 16 m from the screen. Second, we propose a method that enables long-range dynamic projection mapping using a high-brightness projector, and we demonstrated stable projection of a 2D character animation onto a manually moved screen approximately 16 m from the projection system.
One problem with walking while using AR is reduced readability of displayed text: head motion causes the displayed text to shake. Text is typically anchored in either the screen coordinate system (SCS) or the world coordinate system (WCS), each effective at different viewing distances. We propose methods that correct text shaking by combining the SCS and WCS.
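The abstract leaves the combination rule unspecified; one plausible reading is that each label's pose is a filtered blend of a screen-anchored (SCS) target and a world-anchored (WCS) target. The Python sketch below illustrates only that idea; the class, the blend and smoothing parameters, and the EMA filter are assumptions, not the paper's method.

```python
import numpy as np

class StabilizedLabel:
    """Hypothetical per-frame placement of an AR text label: mix a
    screen-anchored (SCS) target with a world-anchored (WCS) target,
    then low-pass filter the result to suppress shaking from head
    motion. `blend`, `alpha`, and `depth` are illustrative values."""
    def __init__(self, world_anchor, blend=0.5, alpha=0.2, depth=1.5):
        self.world_anchor = np.asarray(world_anchor, dtype=float)
        self.blend, self.alpha, self.depth = blend, alpha, depth
        self.pos = None

    def update(self, head_pos, head_fwd):
        # SCS target: a point `depth` meters in front of the viewer.
        scs_target = np.asarray(head_pos) + self.depth * np.asarray(head_fwd)
        # Blend the head-following target with the fixed world anchor.
        target = self.blend * scs_target + (1 - self.blend) * self.world_anchor
        # Exponential moving average smooths residual jitter.
        self.pos = target if self.pos is None else \
            self.alpha * target + (1 - self.alpha) * self.pos
        return self.pos
```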
Individuals are besieged by anxiety and stress throughout the world. Solutions to improve mental well-being vary, and Autonomous Sensory Meridian Response (ASMR) is one method that has been shown to reduce stress. To that end, we introduce asmVR, a novel approach to enhancing ASMR experiences using multi-modal triggers. Combining online and offline modes, asmVR enhances ASMR tingling by offering immersive VR environments and remote ASMRist embodiments. Initial user testing shows enhanced tingles and stress-relief potential, along with new possibilities for VR in psychological therapy.
We propose an automatic auditory VR generation system based on natural language input and apply it to VR exposure therapy, a promising treatment for Post-Traumatic Stress Disorder (PTSD). The system consists of a user interface built on a Large Language Model (LLM), an auditory event dataset whose entries carry “subject” and “verb” metadata, and a spatial audio generator.
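As a rough illustration of how such metadata could be queried, the sketch below matches an LLM-extracted (subject, verb) pair against dataset entries; the schema and field names are assumptions, not taken from the paper.

```python
def find_events(dataset, subject, verb):
    """Look up auditory events whose metadata matches the (subject, verb)
    pair extracted from the user's text by the LLM. `dataset` is assumed
    to be a list of dicts such as
    {"subject": "dog", "verb": "bark", "audio": "dog_bark.wav"}."""
    return [e for e in dataset
            if e["subject"] == subject and e["verb"] == verb]

# e.g. events = find_events(events_db, "helicopter", "hover")
# The matched audio clips would then be placed by the spatial audio generator.
```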
This research aims to create an interactive experience that alleviates the anxiety of pediatric patients and fosters empathy within their families and the medical community. We developed a novel medical preparation system integrating projective and tangible interfaces. Through our work, children can intuitively understand their illnesses and medical procedures in an age-appropriate and engaging manner.
The Proteus effect is the phenomenon in which the self-image evoked by an avatar influences the user's behavior. Previous studies have shown that the Proteus effect influences food-related behaviors, but it is not clear whether it influences perceptions of food. In this study, we therefore investigated whether an avatar's body shape affects beverage perception when drinking a beverage in a virtual environment. Specifically, we measured the sense of body ownership toward the avatar, the taste of the beverage, the amount of beverage consumed, and purchase intention. The results showed that a gradual transition in body shape while drinking significantly improved the sense of body ownership, particularly the feeling that the virtual body belonged to oneself. We also report that the larger body shape significantly increased purchase intention. Furthermore, we found that image congruence between the avatar and cola induces better taste perception.
In this study, we propose Conversation Echo, a system that reflects the topics of a conversation in a VR environment in real time. The method uses AI to transcribe speech, extract conversation topics, and generate panoramic images, dynamically changing the VR environment in real time. It aims to realize an experience that sparks new conversation topics and inspiration.
With the development of mixed reality technology, the convergence of virtual and real experiences has emerged as a trend in interactive narratives. In this paper, we introduce “Crossing Narrative”, an interactive narrative experience that seamlessly blends virtuality and reality by utilizing real-world views and bystanders. We discuss specific methods for designing cross-reality narrative experiences, focusing on three key aspects of cross-reality interaction: diegetic objects, transition effects, and bystanders' avatars. Our design methods aim to enhance the audience's social interactions and understanding of the immersive narrative.
This paper describes a research project on the development of a VR fire training system. We aimed to create a multi-sensory experience that simulates a real-world fire scene. To achieve this, we developed a motion simulator to provide physical stimulation, haptic firefighting nozzles to provide a realistic sense of tool use, and a firefighting suit that can transmit the hot/cold sensations of VR content to the entire body. We evaluated firefighter and public satisfaction with the VR firefighting experience, and outlined future research challenges to improve usability in the field.
We propose the use of motion capture and immersive virtual reality to explore the embodiment and user perception associated with prosthetic limbs. We developed a virtual reality simulation in which a user controls an avatar with an amputated arm in first-person view with full-body motion capture, to experimentally investigate how the movement speed of an autonomous prosthetic limb affects its embodiment, usability, competence, warmth, and discomfort. In a within-subjects experiment, participants performed a reaching task using a virtual prosthetic lower arm that moved at six different speeds along a minimum-jerk trajectory. Results showed that extremely fast and extremely slow movements equally reduced embodiment, usability, and competence, whereas moderate speeds maximized embodiment and usability while reducing discomfort. Our findings provide insights into developing autonomous prosthetic limbs with higher user satisfaction that take embodiment, user perception, and usability into consideration.
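For reference, the minimum-jerk profile named above has a standard closed form (Flash and Hogan, 1985); a minimal sketch follows, with movement duration standing in for the speed conditions. Endpoints and durations are illustrative, not the study's values.

```python
import numpy as np

def minimum_jerk(x0, xf, duration, t):
    """Classic minimum-jerk position profile: smooth motion from x0 to xf
    over `duration` seconds, with zero velocity and acceleration at both
    ends. Varying `duration` yields different movement speeds, analogous
    to the study's six speed conditions (exact values not given here)."""
    tau = np.clip(t / duration, 0.0, 1.0)          # normalized time in [0, 1]
    return x0 + (xf - x0) * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)

# e.g. sample a 1.2 s reach from 0.0 m to 0.4 m at 90 Hz:
ts = np.arange(0.0, 1.2, 1 / 90)
trajectory = minimum_jerk(0.0, 0.4, 1.2, ts)
```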
This paper presents a novel augmented reality tourism system using miniature dioramas. It offers a unique, immersive experience simulating aerial exploration of tourist attractions. Our system employs advanced 3D scanning and tracking for user interaction with attraction features, along with informative voice guides. This fresh approach enhances engagement and learning in diverse contexts.
Human dietary experiences are influenced by multiple senses. To promote healthy eating and reduce the consumption of unhealthy food, we developed FoodMorph, a virtual reality system that visually overlays inedible, simulated textures onto food. By presenting users with these textures, FoodMorph aims to diminish interest in and intake of unhealthy food, while also assessing the dining enjoyment associated with common non-food textures. We found that a concrete texture on food tends to yield the lowest enjoyment score, which could potentially affect users' dietary habits.
In this work, we propose a novel approach to texture generation, making use of recent advances in latent diffusion models [Rombach et al. 2022], unlocked by ControlNet [Zhang and Agrawala 2023], which introduces control inputs to generation pipelines. We find that a special condition, in which the mesh is encoded into UV space, can serve as a control input, producing textures that are geometrically and visually coherent and of high quality. Using this approach, we can generate a unique, text-guided look for existing meshes in a matter of seconds.
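A minimal sketch of this kind of pipeline using Hugging Face diffusers is shown below. The paper presumably uses its own ControlNet trained on UV-space mesh encodings; the checkpoint names, conditioning image, and prompt here are placeholders.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

# Hypothetical conditioning image: mesh geometry (e.g. normals or positions)
# rasterized into UV space; producing it is out of scope for this sketch.
uv_condition = Image.open("mesh_uv_normals.png")

# Any ControlNet checkpoint trained on a comparable condition could stand in;
# the normal-map checkpoint below is a placeholder, not the paper's model.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-normal", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

# Because the condition lives in UV space, the generated image can be applied
# to the mesh directly as a texture, without a projection step.
texture = pipe("weathered bronze statue", image=uv_condition,
               num_inference_steps=20).images[0]
texture.save("texture.png")
```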
We present Ignis, a GPU Eulerian fluid solver capable of simulating and rendering at VR resolutions and refresh rates. Core to our approach are an approximate shadowing technique and an adaptive dithering technique, which allow us to cheaply render fully lit volumes at high resolutions and under a range of visibility conditions. We discuss the design decisions which enable our solver to run at interactive rates.
We present an interactive approach to estimating the relative camera pose of two panoramas shot in the same indoor environment. Compared to the trivial interactive baseline, which would require the user to precisely select eight or more pairs of matching points by mouse clicks, our method only needs the user to select one pair of matching walls with two mouse clicks or keyboard strokes. Our method is based on the key observation that, in most cases, there exists at least one pair of roughly matched walls in the room layouts estimated by neural networks, and such a pair alone is sufficient to generate an accurate relative camera pose. Tested on a real-world indoor panorama dataset, our method outperforms current state-of-the-art automatic methods by large margins, justifying the small amount of additional human effort. Through user studies, we found that matched wall pairs can be easily recognized and selected by humans in a relatively short time, indicating that such an interactive approach is practical.
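To make the geometry concrete, the sketch below recovers a floor-plane pose from a single matched wall under strong simplifying assumptions (known endpoint correspondence, upright cameras at equal height); the paper's actual estimation from layout predictions is more involved.

```python
import numpy as np

def relative_pose_from_walls(wall_a, wall_b):
    """Estimate a 2D relative camera pose (yaw plus floor-plane translation)
    from one pair of matched walls, such that p_a = R @ p_b + t.

    wall_a, wall_b: (2, 2) arrays of the wall's two floor-plane endpoints
    in each camera's coordinate frame, with corresponding endpoints in the
    same order. A simplified sketch, not the paper's algorithm."""
    da = wall_a[1] - wall_a[0]
    db = wall_b[1] - wall_b[0]
    # Rotation that aligns the wall direction in frame B with frame A.
    yaw = np.arctan2(da[1], da[0]) - np.arctan2(db[1], db[0])
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s], [s, c]])
    # Translation that maps the rotated wall midpoint of B onto A's.
    t = wall_a.mean(axis=0) - R @ wall_b.mean(axis=0)
    return R, t
```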
In this paper, we propose a generative model that learns to synthesize 4D facial expressions from neutral landmarks. Existing works mainly focus on generating sequences guided by expression labels, speech, and so on, but they are not robust to changes of identity. Our LM-4DGAN utilizes neutral landmarks to guide facial expression generation, adding an identity discriminator and a landmark autoencoder to a basic WGAN to achieve better identity robustness. Furthermore, we add a cross-attention mechanism to the existing displacement decoder, making it better suited to the given identity.
Medical models play an instrumental role in replicating a patient’s unique anatomy, enhancing surgical outcomes, and fostering effective communication between doctors and patients. While 3D printing technology offers bespoke solutions for such models, producing designs with multiple overhanging tissues often requires advanced 3D printers – putting them beyond the reach of conventional FDM fabrication. To bridge this gap, we present a cost-efficient, multi-stage, printing-molding hybrid manufacturing technique that incrementally solidifies complex medical models. Leveraging our adaptive optimization algorithm, we determine the optimal molding direction and the most streamlined manufacturing process with the fewest stages. As a testament to our method’s efficacy, we successfully printed a liver model featuring overhanging tumors.
Generative Artificial Intelligence (AI) is a fast-growing technology, well known for generating high-quality design drawings and images in seconds from a simple text input. However, users often feel uncertain about whether generative art should be considered created by the AI or by themselves. Losing the sense of ownership of the outcome might impact the learning process and confidence of novice designers and design learners who seek to benefit from generative design tools. In this context, we propose OwnDiffusion, a design pipeline that utilizes generative AI to assist the physical prototype ideation process for novice product designers and industrial design learners while preserving their sense of ownership. The pipeline incorporates a prompt-weight assessment tool, allowing designers to fine-tune the AI’s input based on their sense of ownership. We envision this method as a solution for AI-assisted design, enabling designers to maintain confidence in their creativity and ownership of a design.
Legibility refers to the ease with which handwritten content can be read and understood accurately. However, existing approaches to handwriting beautification either rely on the result of handwriting recognition and accumulate errors from the recognition system or do not address the alignment problem and are difficult to generalize to other languages. This paper presents a novel approach to improve handwriting legibility by straightening the written content. It utilizes a recurrent neural network that operates without the need for recognition, supports connected writing, and accommodates various writing styles. The results obtained with this method demonstrate significant improvements in handwriting alignment. Moreover, a single neural network model can effectively cater to multiple languages within the same writing system.
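As an illustration of what a recognition-free model could look like, the sketch below runs a GRU over a sequence of pen points and predicts a per-point alignment offset. The architecture, input encoding, and sizes are guesses, not the paper's design.

```python
import torch
import torch.nn as nn

class StraightenRNN(nn.Module):
    """Sketch of a recognition-free straightening network: read the
    handwritten trajectory as a sequence of pen points (x, y, pen_state)
    and predict a per-point (dx, dy) correction that aligns the writing.
    Because it never classifies characters, it is script-agnostic."""
    def __init__(self, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(input_size=3, hidden_size=hidden,
                          num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 2)       # per-point offset

    def forward(self, strokes):                # strokes: (B, T, 3)
        h, _ = self.rnn(strokes)
        return strokes[..., :2] + self.head(h)  # corrected point positions
```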
In this poster, we present a method for recovering surface details from blurry images. We achieve this by transforming the input position features of a neural implicit surface and radiance field with a blur kernel and simulating the motion-blur process through a weighted average over the different transformations. At test time, we discard the blur kernel to obtain clearer surface reconstructions. Experimental results on the blurred DTU dataset demonstrate that our approach is robust to blurry inputs and effectively reconstructs surface details.
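A minimal sketch of the training-time blur simulation, under the assumption that the kernel is a small set of rigid transformations with learned weights, is given below; all names are illustrative.

```python
import torch

def blurred_radiance(positions, dirs, field, kernel_transforms, weights):
    """Simulate motion blur for training from blurry images: query the
    radiance field at several rigidly transformed copies of each sample
    position and average them with learned weights. At test time the
    kernel is dropped and `field` is queried directly.

    field(p, d) -> per-sample radiance; kernel_transforms: list of (R, t);
    weights: learnable 1-D tensor, one logit per transformation."""
    out = 0.0
    for (R, t), w in zip(kernel_transforms, torch.softmax(weights, dim=0)):
        p = positions @ R.T + t          # transformed input positions
        out = out + w * field(p, dirs)   # weighted radiance average
    return out
```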
In this study, we propose a projection mapping technique designed to connect rooms in disparate locations virtually, creating a continuous, immersive space.
Although remote-communication technologies have become widespread in recent years, the physical presence of display screens often hinders the sense of realism and immersion. This study therefore aims to create a sense of continuity between a local room and a remote room, irrespective of display device constraints, by utilizing a wide-area image projection technique in a room. We first realize a projection mapping system that can project images over a wide area, even in limited indoor spaces. The system improves the sense of connection with remote locations by rendering a seamless illusion of continuity between the remote and local environments. We conduct user studies and show the effectiveness of the proposed method.
The Rule of Thirds is a well-known heuristic in photo composition; the professional photography community both uses it and derides it. We report on an experiment testing the validity of the Rule of Thirds in the simplest case: composition of a single object. Our results show that our participants overwhelmingly preferred a centered object to one positioned according to the Rule of Thirds. We speculate why this is so and point to other research that addresses how we can take advantage of this “salient centeredness”.
We propose a novel and comprehensive general-purpose object tracking system named Self-supervised Centric Open-set Object Tracking or ‘SCOOT’. Our SCOOT encompasses a self-supervised appearance model, a fusion module for combining textual and visual features, and an object association algorithm based on reconstruction and observation. Through this system, we unlock new possibilities for enhancing the capability of open-set object tracking with the aid of language cues in real-world scenarios.
Music serves as a tangible embodiment of the performer’s expression, bearing a distinct musicality that transcends auditory perception. This study delves into the distinctive musicality inherent to musicians through physical data analysis. It introduces a novel approach to the auditory experience by incorporating tactile stimulation via vibration and pressure to present an innovative channel for conveying musicality to the audience.
The primary aim is to enrich music appreciation by expanding the scope of performers’ expressive capacities, thereby transforming the conventional paradigm of the music experience.
Text-driven methods have recently gained substantial attention in image and 3D model generation. A critical component of these methods is CLIP (Contrastive Language-Image Pre-training), which computes semantic similarities between input texts and resulting images. This paper introduces a text-driven approach to tree modeling that adopts a CLIP-based optimization technique. Tree models are generated with an L-system, and we use genetic algorithms for optimization, with CLIP similarity as the fitness function. The efficacy of our method is demonstrated through various examples.
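A minimal sketch of this optimization loop is shown below, using OpenAI's CLIP for fitness; the rendering, mutation, and crossover operators for L-system genomes are assumed to be supplied by the caller and are not the paper's implementations.

```python
import random
import torch
import clip  # OpenAI CLIP; an assumption about the implementation used

model, preprocess = clip.load("ViT-B/32")

def clip_fitness(image, prompt):
    """CLIP similarity between a rendered tree image (PIL) and the prompt."""
    with torch.no_grad():
        img = model.encode_image(preprocess(image).unsqueeze(0))
        txt = model.encode_text(clip.tokenize([prompt]))
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return (img @ txt.T).item()

def evolve(population, prompt, render, mutate, crossover, n_gen=30):
    """Minimal GA over L-system genomes: keep the fitter half each
    generation and refill with mutated crossovers of elite parents."""
    for _ in range(n_gen):
        population.sort(key=lambda g: clip_fitness(render(g), prompt),
                        reverse=True)
        elite = population[: len(population) // 2]
        children = [mutate(crossover(*random.sample(elite, 2)))
                    for _ in range(len(population) - len(elite))]
        population = elite + children
    return population[0]   # best genome found
```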
Redirected Walking (RDW) is a locomotion technique that enables users to explore extensive virtual environments while confined to a limited real-world environment by manipulating orientation and coordinates within the virtual environment. Previous research has shown that RDW can be made more effective by wearing a knee-tightening device. In this study, we examined the effects of two different knee supporters (band-type and soft-type) on the applicable gain of RDW. A significant difference in applicable gain between the two supporter conditions was confirmed. This result indicates that knee tightening affects the applicable gains of RDW and suggests that a more comfortable RDW experience may be possible with appropriately shaped supporters.
Understanding visual perception of materials is critical for informing image-based approaches to real-time rendering. This poster presents a new cue to translucency that can be efficiently modeled using graphical rendering.
Recently, Neural Implicit Representations (NIRs) have gained popularity for learning-based 3D shape representation. General representations, i.e., ones that share a decoder across a family of geometries, have multiple advantages, such as the ability to generate previously unseen samples and to smoothly interpolate between training examples. These representations, however, impose a trade-off between reconstruction quality and the memory footprint stored per sample. Globally conditioned NIRs fail to capture intricate shape details, while densely conditioned NIRs demand excessive memory. In this work, we suggest using a neural network to approximate a grid of latent codes while sharing the decoder across the entire category. Our model achieves significantly better reconstruction quality than globally conditioned methods while using less memory per sample to store a single geometry.
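The sketch below illustrates the core idea under assumed layer sizes: a small per-shape network approximates the latent-code grid, and a decoder shared across the category maps (latent, position) to a signed distance.

```python
import torch
import torch.nn as nn

class LatentFieldNIR(nn.Module):
    """Sketch: instead of storing a dense latent grid per shape, a small
    per-shape MLP maps a query point to its local latent code, and a
    category-level decoder (shared across all shapes) predicts the SDF.
    Layer sizes are illustrative, not the paper's architecture."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.latent_net = nn.Sequential(       # per-shape, replaces the grid
            nn.Linear(3, 128), nn.ReLU(),
            nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(          # shared across the category
            nn.Linear(latent_dim + 3, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1))

    def forward(self, xyz):                    # xyz: (..., 3) query points
        z = self.latent_net(xyz)               # approximated local latent code
        return self.decoder(torch.cat([z, xyz], dim=-1))  # signed distance
```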
Virtual reality (VR) shopping has been attracting attention for its potential to offer people a new purchasing experience. However, most previous studies on VR shopping have dealt with systems that imitate real stores. We therefore focused on VR shopping systems that do not imitate real stores (non-store-type systems) and evaluated their usability. First, we conducted an experiment comparing the shopping experience between a store-type system and a circularly displayed non-store-type system. The results revealed that the non-store-type system was superior in usefulness and ease of mobility but inferior in spatial recognition and likability. On the basis of these results, we proposed a retractable non-store-type system and conducted an experiment to clarify its usability. The retractable non-store-type system was rated better than the store-type system in visibility and ease of mobility. To create a better system, we then fixed several problems with the retractable system that were pointed out by the experiment participants.
This study focuses on oil-painting brush-style transfer in deep convolutional network-based style painting models. We propose an SVG gradient vectorization process that preserves brush-stroke structures while avoiding the generation of a large number of paths. Most images in non-photorealistic rendering are raster images, which suffer from blurriness and quality degradation when zoomed in, whereas vector graphics offer advantages such as scalability and detail preservation. However, existing SVG vectorization methods struggle with images containing gradient colors. The proposed method vectorizes each brush stroke, analyzes the positions of the main color tones, and incorporates gradient color control points; finally, the vectorized stroke results are stacked and merged. Experimental results demonstrate that this process preserves brush-stroke structure and reproduces gradient color effects in non-photorealistic style transfer, enhancing editing flexibility and print quality for brush-style transfer.
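As a small illustration of emitting a gradient-filled stroke rather than many flat-colored paths, the following sketch uses the svgwrite package; the path data, gradient axis, and color stops are illustrative.

```python
import svgwrite

def stroke_to_svg(path_d, stops, filename="stroke.svg"):
    """Emit one vectorized brush stroke as a single SVG path filled with a
    linear gradient instead of many flat-colored sub-paths. `path_d` is the
    stroke outline; `stops` is [(offset, color), ...] sampled along the
    stroke's main color tones."""
    dwg = svgwrite.Drawing(filename)
    grad = dwg.linearGradient(start=(0, 0), end=(1, 0))  # along the stroke
    for offset, color in stops:
        grad.add_stop_color(offset=offset, color=color)  # gradient control point
    dwg.defs.add(grad)
    dwg.add(dwg.path(d=path_d, fill=grad.get_paint_server()))
    dwg.save()

# e.g. a stroke shading from deep to light blue:
stroke_to_svg("M 10 80 Q 95 10 180 80 L 180 120 Q 95 50 10 120 Z",
              [(0.0, "#1b3a6b"), (0.6, "#3f6fb5"), (1.0, "#a9c4e8")])
```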
The majority of existing music visualization methods rely mostly on frequency, tempo, and volume, rendered in real time as animated images while the music plays. Visualizing music as static images is rarely addressed. In this paper, we propose visual signatures: static images generated using artificial intelligence to visualize the mood of the music.