One popular form of tele-presence guidance VR/MR system features local trainees using immersive interfaces (such as a headset and controllers), while the remote trainer employs a desktop interface, for the sake of familiarity, efficiency, and convenience, to control the presence and actions of their avatar from a third-person perspective. Nowadays, desktop (or non-VR) interfaces may extend beyond the keyboard/mouse to include speech and upper-body/arm/hand gestures as well. Even so, it is not clear whether any guidelines exist as to which interfaces offer the best usability for effective avatar control. This poster presents a pilot study investigating this question with regard to using the keyboard, speech, and hand gestures (and mixing them appropriately) in an MR-based military training environment in which the user controlled a squad-leader character making tactical signals to following squad members. Results indicated that mixing the interfaces according to the characteristics of each subtask produced the highest usability and preference, reduced workload, and even strengthened the sense of embodiment with the avatar.
The ChoreoSurf system is a scalable surface system with a shape-memory alloy actuator that can bend in eight directions. This system can mount actuators on the surface layers of various three-dimensional shapes. Applications include a tabletop system, an interactive wall, a tentacle tower, a kinetic dress, and a kinetic wig.
Interactive Media Art (IMA) offers a promising avenue for preserving and promoting cultural heritage by engaging audiences through immersive experiences. However, its potential to support the creation of cultural content and empower practitioners has been largely underutilized. Taking shadow puppetry, a traditional art form often featured in IMA, as an entry point for examining how current IMA practices can be reimagined to better serve the needs of cultural custodians, this paper adopts a practitioner-centered design approach and aims to develop IMA content that not only showcases traditional art but also supports its sustainable innovation and development. While our research is still in its early stages, short-term experiments suggest that IMA can serve as an effective tool for creative support, assisting artists in producing their work. The long-term impact of these systems on artistic output remains to be explored.
This study investigates the design of response layouts for large language models in XR environments for vehicle settings. Our formative study identified usability challenges with the current linear layout, such as difficulties in extensive scrolling and mid-air interactions. Based on these findings, we designed an improved layout that utilizes multiple windows for efficient access to diverse content, with a cohesive placement to minimize motion sickness. A user study with 24 participants compared the proposed interface against a conventional web browser interface through various information-seeking tasks. The results demonstrated that the proposed interface enhanced overall usability compared to the baseline.
The acquisition of motor skills is intrinsically related to human movement dynamics. Traditional methods for measuring muscle activation via electromyography (EMG) sensors are often prohibitively expensive. Previous approaches developed algorithms to infer muscular information from other modalities, though predominantly concentrating on gross motor activities. In contrast, dexterous skills, such as piano performance, mostly engage small muscles, for which reliable and cost-effective solutions remain underexplored. We introduce PianoKeystroke-EMG, an approach to estimate EMG data from readily accessible data, such as piano keystroke motions. Further evaluation demonstrates that our system outperforms baselines in accuracy and versatility, potentially offering a practical tool for pianists to enhance their performance techniques.
This paper proposes a system based on a multimodal large language model (MLLM) to assist non-expert users without prior experience in machine learning (ML) development. The MLLM assistant in our system interactively helps users compile their requirements and create appropriate training data while building an ML model. It has been reported that users often struggle to define training data that comprehensively covers all samples or aligns with their needs. To prevent such failures, the MLLM assistant monitors the user’s interaction process and translates users’ vague needs into concrete ML formulations through chat, ultimately facilitating the creation of appropriate training data.
This work explores the idea of speculative ambient media, which integrates reflexive and speculative elements into environmental objects. Unlike traditional ambient media that provides background information, speculative ambient media subtly promotes reflexivity on societal and environmental issues. Through a case installation, we illustrate how such media might inspire contemplation without being intrusive. The installation features slow-changing projections derived from plastic waste to raise environmental awareness and challenge perceptions of human impact on nature. Speculative ambient media enhances environmental consciousness and fosters dialogue. Future research will investigate its application in various contexts to further assess its potential for stimulating discussion and reflection.
This study proposes a system that recognizes finger-pointing gestures and estimates their 3D coordinates to provide an intuitive and natural interface for human-computer interaction. The study delineates a method commonly used in human conversations to describe locations and adapts it for interface application. It focuses on pose estimation using a stereo camera and provides mathematical and technical explanations, demonstrating the application of this interface to a four-wheeled mobile robot. The interaction between the user and the robot is detailed, alongside demonstrations and analyses of the errors that arise in the process and their causes. The study proposes methods to minimize these errors and explores additional potential applications of the finger-pointing gesture interface, highlighting its feasibility.
Immersive head-mounted displays (HMDs) provide a highly immersive experience by blocking the view of the real space, but using them for extended periods in daily life is challenging due to difficulties in switching tasks between virtual and real spaces and communicating with people nearby. We propose HidEye, an interaction method that enables users to easily switch content by covering one eye. The goal is to seamlessly integrate tasks in virtual and real space through natural movements, without the need for special sensors or other devices on the HMD.
For professional players, observing their opponents’ serves in detail typically involves watching video recordings, usually from broadcast footage. However, broadcast videos often capture players from a side view and lack immersiveness. In this work, we aim to reconstruct 3D table tennis games, incorporating AI-predicted player movements and ball placements, based on any given broadcast table tennis video. Moreover, an editing tool is developed for setting the table position and correcting the wrongly estimated player or ball placement positions. The ball trajectory is then simulated based on the corrected player and ball placement positions. With the proposed simulation system, players can train their reactions and responses to different types of serves in a VR environment from the first-person perspective. They can also observe their opponent’s as well as their own movements from any chosen viewing angle, enhancing their training experience.
We introduce V-Wire, a single-wire system designed to simplify hardware prototyping and improve fault detection in educational settings. V-Wire addresses key challenges in hardware development by reducing wiring complexity, eliminating port assignment constraints, and simplifying troubleshooting. The system uses a single closed-loop wire for two-way communication and power supply, allowing multiple modules to be connected easily and disconnections to be detected straightforwardly. This paper describes the system’s architecture, implementation, and its significant advantages in educational applications. The system can support beginner developers in connecting hardware devices easily.
Inspired by the human ability to sense the presence of people walking behind them through floor vibrations, we propose a concept to convey presence by transmitting footsteps and vibrotactile cues from one space to another. Based on this concept, we developed a prototype that transmits footsteps captured by multiple microphones installed on the floor surface to another space and presents a moving vibration source by switching between multiple transducers on the floor’s underside. Preliminary experiments with the prototype confirmed that the direction and trajectory of the vibration source can be perceived to some extent.
Curtains are functional textiles that play a crucial yet invisible role in our daily lives. Despite their widespread use for decoration, blocking heat, privacy, etc., they are still passive in our surroundings. This paper presents a design exploration and development of an interactive curtain interface using capacitive sensing. We augment the capabilities of everyday curtains into touch-sensitive surfaces to facilitate embodied interactions through physical manipulation and gestures. By interfacing these curtains with a smart home environment to control lights, fans, and other appliances, we show that interactive curtains are technically feasible, thus paving the way for novel Curtain UIs as a medium of tangible, embodied, and embedded interaction in a ubiquitous computing scenario.
A human-robot collaboration system, “Telecobot,” is proposed, in which a remote user allocates tasks to multiple telepresence robots via digital twin spaces. The digital twin spaces of the local sites where the robots work are used as a medium for observing the real spaces from a remote site and for giving instructions to the smart robots on site. Telecobot supports a remote user in understanding the real-time status of geographically distributed local sites and in allocating tasks to each robot. We assume that the robots are smart enough to execute the series of tasks assigned by the remote user. This division of roles is a major characteristic of Telecobot: the remote user is responsible for judging the situation and building a series of tasks via the digital twin space, while the robots working at each local site execute the individual tasks following the information built by the remote user in the digital twins of the local sites. An experiment assuming telecollaboration with smart robots working at nursing care sites demonstrated this division of roles for care work and showed the potential for increasing work productivity using the Telecobot interface.
We present Signal2Hand, a method that reconstructs hand-depth images from body-worn sensor signals. To do this, we trained a VQ-VAE using pairs of body-worn sensor signals and hand-depth images. We conducted a preliminary study of the method and evaluated the reconstruction results using four evaluation metrics.
This study presents “Affective Wings,” a concept involving a soft floating robot designed to enable proximal interactions and direct physical contact with humans to support emotional connection in an indoor environment. Leveraging the capabilities of lighter-than-air robots combined with a controllable wing mechanism, we provide a novel approach to human-robot interaction prioritizing safety and rich dynamic movements. The robot, designed to fly freely within human living spaces, can engage in both physical contact and non-physical contact behaviors, such as hugging, tapping, perching, mimicking, and waving.
Time Light is an innovative interface developed as part of a Mixed Reality (MR) system designed to deepen the understanding of the renowned Takamatsuzuka Kofun’s murals. This interface allows users to compare and observe both the severely deteriorated murals, affected by looting and plaster degradation at the time of discovery, and the estimated reconstructed images that illustrate how the murals might have appeared when originally created. Although Time Light was developed to enhance the understanding of the Takamatsuzuka Kofun, it can be applied to any murals with both discovered and reconstructed images. By using Time Light, users can experience a unique perspective on the historical and cultural significance of these national treasures.
This proposal presents a pseudo-force feedback design based on spatially localized sound, tailored for the visually impaired. Sound location is adjusted to create an auditory conflict between the perceived hand position in virtual space and its actual position in the real world, thus inducing a force sensation. Two water-stream test scenarios were crafted in an Augmented Reality application built with MediaPipe for hand tracking and OpenAL for sound rendering.
Traditional text-based methods for learning thermodynamics are often inefficient according to neuroscientific principles. Multi-sensory inputs, particularly visual and haptic, have been shown to enhance memory retention and comprehension by engaging the hippocampus. This paper presents Thermiapt, a haptic device designed to integrate sensory experiences of quantitative thermodynamic concepts, such as heat capacity. The device is not only simple and reproducible but also has the potential to significantly enhance understanding and retention in educational settings by leveraging multi-sensory learning.
Large language models (LLMs) are increasingly recognized for their potential, but their application in fostering children’s poetry learning remains underexplored. This paper introduces Li Bai the Youth, an interactive installation that leverages a virtual agent powered by an LLM. We present a technical pipeline incorporating text-to-speech, lip synchronization, and other functionalities to enable real-time conversation between users and the virtual agent, Li Bai. Additionally, we implemented strategies to enhance the user experience during interaction. Li Bai the Youth offers a novel approach to poetry learning, promoting cultural heritage engagement.
With the development of mid-air interaction, the digital preservation and interactive learning of intangible cultural heritage (ICH) have become increasingly significant. However, the intrinsic significance of the cultural symbols embedded within much ICH remains largely obscure to the general public. This paper introduces "Echoes of Antiquity", an interactive installation that utilizes Leap Motion for gesture recognition and generative AI for image processing to vividly illustrate the symbolic elements of Guqin culture, thus bridging the existing chasm in public understanding and appreciation of Guqin’s rich cultural heritage. Specifically, our system utilizes Leap Motion to capture gestures and delivers AI-generated images as feedback, thereby enhancing the understanding and retention of Guqin’s cultural heritage through the seamless integration of motion and visual cues.
In skill acquisition, it is crucial not only to mimic an expert’s body movements but also to understand their underlying intentions, which can be achieved by observing their hand movements and gaze. However, previous systems that assisted in understanding an expert’s intent made it difficult to grasp the difference between one’s own posture and the expert’s. To overcome this limitation, we propose Jaku-in, which displays the expert’s hand movements and gaze in a three-dimensionally reconstructed space using a point cloud. The pass-through functionality of head-mounted displays allows novices to observe and emulate a life-size expert from multiple viewpoints: first-person, third-person, and from behind. This research initially focuses on the Japanese tea ceremony, a practice in which the precision and beauty of physical movements are fundamental. This showcases the effectiveness of Jaku-in in this domain and suggests potential applications in other intricate skill domains.
This study proposes a novel screen that exploits the freezing phenomenon of soap films. Although soap films are thin and transparent, in low-temperature environments beautiful ice crystals form on their surface, reducing transparency and enabling image projection onto the film. We explore the application of frozen soap films as multi-layered projection screens that leverage the aesthetic qualities of the freezing process, and discuss potential extensions of the technique.
Paper has long been an indispensable part of our lives, and various types of paper are used in everyday situations and in specialized fields alike. However, each type of paper must be used for a different purpose, a limitation we attribute to the fixed physical parameters of paper. We therefore focused on the fact that the traditional manufacturing process of paper follows a unified procedure of dissolving pulp in water and drying it, regardless of the resulting paper's physical properties. By combining the papermaking process with an extension technique that regulates the paper's moisture content, we can vary the paper's physical parameters, making our paper dynamic and giving it multifunctional properties not found in conventional paper. In this paper, we propose applications of this new paper’s functionality to improve the interaction between computers and humans.
This paper introduces a real-time holographic media system that converts 2D or RGBD videos into 3D holograms. The core of this system includes a Linux host that extracts depth information from 2D images and transmits it via packets, and a holography processor leveraging high-bandwidth memory (HBM) to generate volumetric holograms with 8-layer depths at 30 frames per second (fps). Demonstrations highlight the potential of the proposed system for advancing future holographic media technologies.
We propose an optical system that displays mid-air images moving between the inside and outside of a mirror while also appearing reflected in it. Previous research has not achieved all three of the following goals at once: a standing image that floats in space, moves across the mirror’s surface, and is reflected in the mirror. The proposed system, “MMM,” achieves all three goals and presents a worldview in which the mirror is an adjacent virtual world. A magic-mirror application that brings characters from a display into the real world has been developed. In this application, the mirror space connects the cyberspace inside the display with the real world.
Displaying large-scale holograms with a wide field of view (FoV) requires ultra-high-resolution data, often reaching tera-scale sizes. We propose an out-of-core diffraction method that utilizes multiple SSDs simultaneously to manage tera-scale holography within limited memory constraints. To enhance I/O bandwidth, we introduce a block-wise I/O approach for diffraction computation. Additionally, our method employs a compressed data transfer scheme to reduce I/O costs. As a result, our method performs diffraction up to 4.2× faster than alternative methods. We also validate our approach by generating a 256K² resolution hologram, which would typically require 4 TB of memory using conventional diffraction methods, on a GPU with only 24 GB of memory.
This paper presents DiskPlay, a projection-based platform designed for extended holographic displays through projection techniques on rotating multi-layered disks. This system performs dynamic projection mapping on rotating disks according to their shape, projecting a different image for each layer. It provides special visual effects such as stereoscopic images and interactions such as rearranging the disks or manually adjusting the rotation speed to change the image style. This paper details the implementation methodology, and showcases two types of applications, stacking with depth and interaction with rearranging disks. It also discusses the usefulness and limitations of the system.
“Flying Your Imagination” investigates the innovative integration of Virtual Reality (VR), Artificial Intelligence (AI) technology, and embodied interaction design to revitalize and preserve China’s significant intangible cultural heritage (ICH) of kite culture. We designed a VR kite simulation environment that allows users to immerse themselves in the joy of kite flying and making within a virtual space. The system incorporates AI to generate visual designs for kites and offers diverse interactive scenarios, opening up new avenues for transmitting cultural heritage.
In this work, we propose a new spectator experience for engaging racing fans even when they cannot be at the race track or on non-racing days. Audiences can watch an e-racer driving on a simulator race against pre-recorded car data from an actual race at the Suzuka Circuit. Using mobile phones as AR displays, users can watch the two cars race against each other on a virtual 3D race track that replicates the real track in AR. The experience was deployed at a motor show, allowing visitors to try it. Feedback gathered from the visitors showed that many felt excited by this kind of racing and reported a sense of immersion in the AR experience, indicating the strong potential of this new spectator experience.
This study presents a Media Bus prototype for facilitating digital storytelling experiences through extended reality (XR), head-mounted displays (HMDs), and transparent OLEDs (TOLEDs) within the city of Seoul. The system offers an immersive tourism experience by integrating precise location tracking, which combines Visual Positioning Service (VPS) and Global Positioning System (GPS), along with TOLED displays and wearable augmented reality (AR) content. Two rounds of expert interviews were conducted to guide the development and assess the prototype. The preliminary interview provided foundational concepts and technical requirements, while the subsequent interview focused on evaluating the prototype and suggesting potential improvements. The results indicate that the proposed system has significant potential for enhancing urban tourism experiences in an innovative manner. However, challenges remain in improving location accuracy and refining user interfaces. This research presents a novel application of XR and is anticipated to advance smart city initiatives and cultural tourism efforts.
Utilizing volumetric video capture technology and augmented reality, we have developed an extended reality prototype to support the development of a hybrid Japanese Noh performance involving neurodiverse sound artists and a professional Noh singer. Through augmented reality, the volumetric video capture of the Noh performer is integrated into the live production to accommodate the needs of the neurodiverse artists. The recorded performer, augmented within a physical diorama of a Noh theatre stage, is used flexibly as a surrogate virtual avatar by the sound artists in developing, rehearsing, and performing. Through the artists’ reflections, we offer preliminary insights into the process of conceiving a performance mediated through extended reality tools.
In texture representation by projection mapping, it is crucial that the appearance of the presented virtual object matches that of the real object to maintain the quality of the user experience. This study investigates the feasibility of reproducing a virtual object whose appearance perceptually matches that of a real object with fine texture, achieved by using high pixel density projection within an optically reduced projection area of the projector.
This paper explores the application of Generative Artificial Intelligence (GenAI) techniques to the design of Role Playing Games (RPGs) in Extended Reality (XR) environments. We developed the game Reborn of the White Bone Demon, which utilizes AI speech emotion recognition technology to generate story lines and game assets in real-time based on the player’s conversations with NPCs. This enhances the player’s immersion and personalized experience, demonstrating the potential of GenAI in enhancing the XR gaming experience.
Direct Volume Rendering (DVR) effectively visualizes volumetric data, allowing focus on specific regions of interest (ROIs) while retaining surrounding context. In mixed reality (MR), DVR offers an immersive and intuitive way to interact with volumetric data. However, the ability to adjust transfer functions (TF), which control the appearance of DVR, has been limited in MR settings, often relying on predefined TFs that restrict real-time customization. In this paper, we introduce a novel system designed to enhance DVR within MR environments by providing an interactive TF editor. This system allows users to intuitively adjust the optical properties of volumetric data through a user-friendly interface. Key features include node creation and deletion for defining ROIs, a color palette and opacity control for appearance settings. Immediate visual feedback is achieved by converting the user-defined 1D TF into a 2D texture that is applied in real-time using a DVR Unity shader.
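To make the TF-to-texture step above concrete, here is a minimal sketch in Python/NumPy (rather than the system's Unity/C# implementation) of baking a node-based 1D transfer function into a small 2D RGBA texture that a DVR shader could sample; the function name `bake_tf` and the node format are illustrative assumptions, not the system's actual API.

```python
import numpy as np

def bake_tf(nodes, width=256, height=4):
    """nodes: list of (scalar position in [0,1], (r, g, b, a)), sorted by position."""
    xs = np.array([p for p, _ in nodes])
    rgba = np.array([c for _, c in nodes])                    # (n, 4)
    grid = np.linspace(0.0, 1.0, width)
    # Piecewise-linear resampling of the node-based TF into a dense LUT.
    lut = np.stack([np.interp(grid, xs, rgba[:, k]) for k in range(4)], axis=-1)
    # Tile the 1D LUT into a thin 2D texture, since 2D textures are the most
    # convenient format to upload and sample in engine shaders.
    return np.tile(lut[None, :, :], (height, 1, 1)).astype(np.float32)

tex = bake_tf([(0.0, (0, 0, 0, 0)), (0.4, (1.0, 0.5, 0.2, 0.1)), (1.0, (1, 1, 1, 1))])
print(tex.shape)  # (4, 256, 4); re-baked and re-uploaded whenever a node changes
```

Re-baking on every node edit is what makes immediate feedback cheap: the LUT is tiny, so the update cost is negligible compared to the volume rendering itself.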
NatureBlendVR is an immersive experience designed to enhance well-being through forest bathing. This system integrates XR technology with interactive, bio-responsive physical elements. Through the experience, participants find themselves in a blended environment where physical components of nature are integrated into their immediate surroundings. The experience gradually transitions from the real world to a designed virtual forest, where digital elements align perfectly with the physical environment. This fosters a deep sense of presence, allowing users to feel fully immersed as their embodied sensations align with the virtual interactions they encounter.
This project focuses on designing an Augmented Reality (AR) application for climate justice communication. It examines ways to use spatial graphics/animation/sound for multi-user simulations with strategies regarding economic and environmental conflicts.
This study introduces an innovative mixed reality (MR) system for managing tremor disorders, including essential tremor and Parkinson's disease. The system combines an ergonomic hand motion assistance device that mimics natural hand movements to stabilize tremors with MR-based rehabilitation exercises that dynamically adjust in difficulty. Telehealth capabilities enable remote monitoring and consultations, enhancing accessibility. An iterative design process informed by feedback from medical professionals, patients, and design experts led to significant improvements in motor control, task performance, and user autonomy. This comprehensive MR intervention represents a substantial advancement in tremor disorder treatment, offering a versatile and user-friendly solution.
Our VR game simulates the experiences of takeaway riders, using real-time data to create scenarios that foster empathy between consumers and riders while highlighting the challenges faced by riders. It employs narrative prompts, empathy-based rewards, and assessment scales to measure cognitive and affective empathy improvements. The VR format elicits stronger emotional resonance than 2.5D prototypes, demonstrating the potential of immersive technologies to foster mutual understanding across different social groups.
In this study, we propose a novel illegal parking detection system based on simultaneous localization and mapping (SLAM). The system identifies the presence of illegal parking in real-time based on user-defined parking zones. Furthermore, considering the increasing deployment and pilot testing of unmanned patrol vehicles, we explore the application of this system to enhance the efficiency of automated illegal parking detection and management. The proposed system demonstrates high accuracy and real-time detection capabilities in identifying illegal parking and is expected to significantly contribute to addressing the issue of illegal parking in urban centers in the future.
Precision pose detection is increasingly demanded in fields such as personal fabrication, Virtual Reality (VR), and robotics due to its critical role in ensuring accurate positioning information. However, the conventional vision-based systems used in these fields often struggle to achieve high precision and accuracy, particularly when dealing with complex environments or fast-moving objects. To address these limitations, we investigate Laser Speckle Imaging (LSI), an emerging optical tracking method that offers promising potential for improving pose estimation accuracy. Specifically, our proposed LSI-based tracking method (SpecTrack) leverages captures from a lensless camera and a retro-reflective marker with a coded aperture to achieve multi-axis rotational pose estimation with high precision. Extensive trials on our in-house testbed show that SpecTrack achieves an accuracy of 0.31° (std = 0.43°), significantly outperforming state-of-the-art approaches and improving accuracy by up to 200%.
Recent advancements in font generation models include GANs, diffusion models, and transformers. This study introduces a GAN-based model, zi2zi Self-Attention, which enhances the zi2zi model by incorporating Residual Blocks (ResNet blocks) and Self-Attention Layers in the encoder and decoder. These enhancements improve the ability to capture complex font details, mimicking the writer’s style while preserving the source image’s content. The study also introduces a Perceptual Loss, using high-level features from the VGG-19 network to improve visual consistency and quality. The Adam optimizer with appropriate learning rates and beta parameters is employed to improve convergence speed and stability. For comparison, four other mainstream models are evaluated: MX-Font, CF-Font, SDT, and Font-Diffuser. These models differ from zi2zi in their focus on extracting content characters and writing-style features rather than relying solely on adversarial networks. Experimental results show the potential and effectiveness of zi2zi Self-Attention in generating high-quality, visually appealing handwritten fonts.
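As an illustration of the perceptual-loss idea above, here is a hedged PyTorch sketch comparing high-level VGG-19 feature maps of generated and target glyph images; the chosen layer indices (relu1_2, relu2_2, relu3_4, relu4_4) and the L1 feature distance are common defaults, not necessarily this paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights

class PerceptualLoss(nn.Module):
    """Compare high-level VGG-19 features of generated vs. target glyphs."""
    def __init__(self, layers=(3, 8, 17, 26)):  # relu1_2, relu2_2, relu3_4, relu4_4
        super().__init__()
        feats = vgg19(weights=VGG19_Weights.DEFAULT).features.eval()
        for p in feats.parameters():
            p.requires_grad_(False)              # VGG stays frozen
        self.feats, self.layers = feats, set(layers)

    def forward(self, fake, real):
        # Inputs are 3-channel image batches; grayscale glyphs can be
        # replicated across channels before calling this loss.
        loss, x, y = 0.0, fake, real
        for i, layer in enumerate(self.feats):
            x, y = layer(x), layer(y)
            if i in self.layers:
                loss = loss + nn.functional.l1_loss(x, y)
            if i >= max(self.layers):
                break
        return loss
```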
Automated floor plan generation that aligns with exterior wall boundaries is crucial in architectural design. Existing methods using GANs lack accuracy and require raster-to-vector conversion. Our study employs Diffusion Models and Transformers, incorporating self-attention mechanisms between exterior wall coordinates and room corners. This approach effectively generates diverse, accurate floor plans in vector format conditioned on exterior boundaries, addressing key challenges unmet by previous methods.
In this study, we aim to quantify the sense of depth and perceived position in mid-air images. In the experiment, the participants were instructed to indicate the nearest frontal point and zenith of both mid-air images and actual spherical objects. The results suggest that the frontal surface of mid-air images is generally perceived on the image plane or behind it. However, when depth is recognized, it appears slightly less pronounced compared to actual objects.
A space-variant Gaussian blur is used to render blur brush effects, where regions of an image or video are blurred for stylistic effect or to obscure information. Exact methods of rendering this effect are typically too slow to provide user feedback. We propose a novel approximation technique that produces both real-time and final renders.
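One common way to approximate a space-variant Gaussian blur, sketched below in Python/SciPy (and not necessarily the authors' technique), is to pre-blur the image at a few fixed sigmas and blend per pixel between the two nearest levels according to a sigma map:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def approx_variable_blur(img, sigma_map, sigmas=(0.0, 2.0, 4.0, 8.0)):
    """img: (H, W) float image; sigma_map: (H, W) desired blur radius per pixel."""
    # Preprocess: a small stack of uniformly blurred copies (fast, separable).
    stack = np.stack([img if s == 0 else gaussian_filter(img, s) for s in sigmas])
    sigmas = np.asarray(sigmas)
    # Per pixel, find the two bracketing levels and blend linearly between them.
    idx = np.clip(np.searchsorted(sigmas, sigma_map), 1, len(sigmas) - 1)
    lo, hi = sigmas[idx - 1], sigmas[idx]
    t = np.clip((sigma_map - lo) / np.maximum(hi - lo, 1e-6), 0.0, 1.0)
    rows, cols = np.indices(img.shape)
    return (1 - t) * stack[idx - 1, rows, cols] + t * stack[idx, rows, cols]
```

The uniform blurs can be computed once per frame, so interactive brush edits only re-evaluate the cheap per-pixel blend.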
Previous work in Human-Food Interaction has investigated how to embed tags into food. One promising approach involves controlling the internal structure using food 3D printers. However, this method is not widely accessible due to the current limitations of food 3D printers. This paper explores alternative fabrication techniques, molding and stamping, for embedding unobtrusive tags inside foods. Our preliminary evaluations showed that the proposed methods can embed tags and have the potential to employ a wide range of materials with lower costs compared with the food 3D printing technique. These findings suggest that low-cost edible tagging technologies could become more accessible and versatile.
We propose a novel approach that achieves huge speed-ups for shortest path computations on 2D binary images at the cost of slight inaccuracies. This is done by migrating arc flag-based shortest path methods, which deliver large speed-ups for run-time shortest path computations but require a computationally expensive preprocessing step, to work on downsampled versions of the input images. Through extensive tests, we found that our approach hugely reduces the cost of the preprocessing step, thereby resolving a performance bottleneck of arc flag-based methods.
Can we estimate 3D human pose from ultra-low resolution thermal images (e.g., 8 × 8 pixels)? This study explores this possibility. Thermal images capture radiation intensity, minimizing personal information exposure, and are commonly used in devices like air conditioners. We propose a framework that uses 8 × 8 thermal images for 3D human pose estimation, enhancing privacy and efficiency. To overcome challenges from subject and ambient temperature variations, we employ adversarial learning with discriminators for subject and temperature, ensuring robust and invariant feature extraction.
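The adversarial invariance scheme above is commonly realized with a gradient reversal layer; the sketch below shows that standard construction in PyTorch as an assumption about the general mechanism, not this paper's exact architecture (`invariance_loss` and the discriminator interface are illustrative).

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips (and scales) gradients on the way back."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

def invariance_loss(features, labels, discriminator, lam=1.0):
    # The discriminator tries to predict the nuisance label (subject identity
    # or a temperature bin) from pose features; the reversed gradient pushes
    # the feature extractor to discard exactly that information.
    logits = discriminator(GradReverse.apply(features, lam))
    return F.cross_entropy(logits, labels)
```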
Estimating room layouts from 360° panoramic images is an essential task in computer vision, enhancing 3D scene understanding from 2D images. Handling high-quality panoramas necessitates a Super-Resolution (SR) technique that is both lightweight and capable of arbitrary-scale SR, optimized for efficient inference. In this study, we propose a simple model that employs a modular arbitrary-scale technique. Additionally, our model incorporates quantization to maximize efficiency during user inference, making it well suited for processing high-quality panoramic images.
The coloring process in anime (Japanese animation) production constitutes a critical yet time-consuming aspect of creating animated works. This study introduces a colorization system for anime line art using a reference image. The proposed system colors line art based on the features of corresponding regions in the reference image, significantly reducing the time required for the coloring task.
In practical applications of GANs for image generation, large training sets may be collected from random databases or web pages. For example, a collection of human face images for a training set may include unexpected impure samples, such as cartoon or anime faces, that hinder realistic, high-quality human face generation. Manually removing such impure samples is extremely time-consuming and impractical due to the large volume of data. In this work, we propose a novel method to automatically remove impure samples during GAN training.
Separating material and illumination from images allows for realistic manipulation, which is crucial for image editing and augmented reality. In this work, we propose a training strategy for intrinsic decomposition networks that combines unsupervised domain adaptation with self-supervised constraints and sparse-guided fine-tuning in the target domain. Through two complex optimization steps, even a lightweight CNN can achieve outstanding material and illumination separation in real single images. The final separation results and various realistic image manipulation applications demonstrate the effectiveness and potential of our proposed method.
In this work, a Multimodal Autoencoder is proposed in which images are reconstructed using both image and text inputs, rather than images alone. Two new loss terms are introduced: an Image-Text loss \(L_{im\text{-}te}\) that measures the similarity between the input image and the input text, and a Text-Similarity loss \(L_{te\text{-}sim}\) that measures the similarity between the text generated from the reconstructed image and the input text. Our experiments demonstrate that images reconstructed using both modalities (image and text) are of significantly higher quality than those reconstructed from either modality alone, highlighting how features learned from one modality can enhance another in the reconstruction process.
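A hedged PyTorch sketch of the two loss terms follows; `image_encoder`, `text_encoder`, and `caption_model` are hypothetical placeholders, and a CLIP-style joint embedding is assumed for measuring image-text similarity.

```python
import torch.nn.functional as F

def image_text_loss(image, text_tokens, image_encoder, text_encoder):
    # L_im-te: penalize low cosine similarity between the *input* image and
    # the *input* text in a shared embedding space.
    zi = F.normalize(image_encoder(image), dim=-1)
    zt = F.normalize(text_encoder(text_tokens), dim=-1)
    return 1.0 - (zi * zt).sum(dim=-1).mean()

def text_similarity_loss(recon_image, text_tokens, caption_model, text_encoder):
    # L_te-sim: caption the *reconstructed* image, then compare the generated
    # text against the original input text.
    gen_tokens = caption_model(recon_image)
    zg = F.normalize(text_encoder(gen_tokens), dim=-1)
    zt = F.normalize(text_encoder(text_tokens), dim=-1)
    return 1.0 - (zg * zt).sum(dim=-1).mean()
```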
We propose a wirelessly powered and communicable position tracker implemented in a 10 mm cubic volume. Measurement results with a prototype show that the proposed adaptive power receiver (APR) maintains constant wireless power delivery regardless of the tracker’s position. Additionally, the system achieves a maximum localization error of 4.75 mm.
We propose automatic generation of sound signatures — short pseudowords that can be associated with certain features of geometric shapes, such as roundness or spikiness. This research relates to sound symbolism, a research area connected to synesthesia in which people associate information with particular sounds. First, we propose a method that automatically calculates a measure of the roundness of geometric shapes. We verify the obtained results against the numerous user tests conducted over the past century by sound-symbolism researchers. Next, we propose a deep learning-based method for generating new sound signatures for different shapes.
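For intuition, one classical roundness measure (an assumption, not necessarily the authors' metric) is the isoperimetric quotient 4πA/P², which equals 1 for a circle and falls toward 0 for spiky polygons; a minimal NumPy sketch:

```python
import numpy as np

def roundness(poly):
    """poly: (n, 2) array of polygon vertices, ordered along the boundary."""
    x, y = poly[:, 0], poly[:, 1]
    # Shoelace formula for area; summed edge lengths for perimeter.
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    perim = np.sum(np.linalg.norm(poly - np.roll(poly, -1, axis=0), axis=1))
    return 4.0 * np.pi * area / perim**2

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float)
print(roundness(square))  # ~0.785; a spiky "kiki"-like star polygon scores far lower
```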
In Hand-Object Interactions (HOIs), contact information between the hand and the object can be utilized to estimate the deformation area and to improve the relative positioning of the hand and the object. However, mapping contact information onto unknown and deformable objects is still challenging due to their shape transformation. In this study, we present a preliminary attempt to reconstruct the surface of a soft object and identify the contact area on that surface in HOIs. Using our system composed of multiple RGB cameras and a single thermal camera, our proposed method successfully visualizes the contact areas on the object’s surface.
We propose a novel method to reconstruct a 3D Gaussian scene from images that contain both static and dynamic distractors, e.g., vehicles and pedestrians, by using text-guided segmentation and inpainting techniques. Our approach generates masks for distractors, detects heavily masked regions, and inpaints them to suppress defects and artifacts during scene optimization while ignoring the distractors.
We propose an algorithm to align the center of mass of an input shape to a prescribed point. Unlike previous approaches that rely on hollowing the inner structure, our method deforms the outer shape by incorporating As-Rigid-As-Possible (ARAP) energy. Our method provides two editing modes: one prioritizing speed and the other focusing on accuracy. Combining these two modes, users can interactively explore the design space of shapes while achieving the desired center of mass positioning.
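The abstract implies a constrained deformation problem; the LaTeX sketch below is an assumed formalization consistent with that description, not the authors' exact formulation.

```latex
% Deform vertices V so the ARAP energy stays small while the solid's
% center of mass c(V) hits the prescribed target c*.
\min_{V} \; E_{\mathrm{ARAP}}(V)
\quad \text{s.t.} \quad
c(V) = c^{*},
\qquad
c(V) = \frac{1}{\lvert \Omega(V) \rvert} \int_{\Omega(V)} \mathbf{x} \, \mathrm{d}V
```

The two editing modes plausibly correspond to how strictly the constraint is enforced per iteration: a soft penalty for fast interactive feedback versus tighter satisfaction for the accurate mode.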
Single-view 3D scene reconstruction is challenging due to limited single-image information. Existing methods often overlook the importance of image feature extraction. We propose a novel approach that integrates semantic segmentation with an implicit function, using a semantic-guided image encoder to enhance feature extraction for improved background and object reconstruction. A categorical attention module based on semantic segmentation further aids object reconstruction. Experimental results show our method’s strong performance in both background and object reconstruction.
In this paper, we present an approach that takes point cloud data of a scene and automatically produces a lightweight 3D reconstruction of the buildings and their facade details found in the scene. Our 3D reconstructions are described using a procedural grammar to allow for lightweight storage and processing, as well as scalability. The preliminary evaluation of our approach considers the terrestrial laser scan dataset semantic3d.net, which includes a wide range of outdoor scenes and features considerable variety in point cloud density. Our results show that our approach is able to extract most of the buildings and correctly detect most of the facade openings.
In this study, we present a method that enables individuals to generate 3D garment designs from basic sketches, thus making a previously specialized creative field more accessible. Our approach takes a single freehand sketch as input and employs generative deep learning techniques to automatically create high-fidelity 3D garment models using a conditional diffusion 3D generation network. Our proposed method comprises two key components: 1) a pre-training phase that uses a 3D prior, drawing on a varied dataset of clothing shapes in 3D to grasp the essential characteristics of these forms, and 2) a sketch-to-prior mapping module that establishes a connection between the sketches and the pre-trained manifold, facilitating the effective generation of 3D shapes. Our method not only overcomes existing limitations but also introduces new possibilities for shape prototyping and exploration in fashion design, thus broadening the scope of creativity for a more diverse audience.
In digital light processing (DLP) 3D printing, fixed control protocol parameters limit adaptability and lead to conservative settings, reducing printing efficiency. Existing methods that optimize parameters through physical models linked to the printing process do not account for real-time environmental changes, often resulting in potential print failures. This paper proposes a dynamic control scheme for DLP 3D printing that combines physical models with real-time data. Initially, a physical model of the printing process is developed to pre-generate the printing protocol. Then, a multimodal data capture scheme is designed to access the current state of the material and model. Finally, the printing protocol parameters are dynamically updated based on the analysis. Experimental results demonstrate that this method optimizes both printing time and success rate, significantly enhancing overall printing performance.
The prefracture method is a practical way to implement real-time object destruction, which is otherwise hardly achievable within performance constraints, but its heuristic nature can produce unrealistic results. To mitigate this, we approach the clustering step of prefractured mesh generation as unordered segmentation of point cloud data and propose leveraging a deep neural network trained on a physics-based dataset. Our paradigm successfully predicts the structural weaknesses of objects, which heuristics have captured only in a limited way, exhibiting ready-to-use results of remarkable quality.
Real-time 3D reconstruction is crucial for remote collaboration, but adapting to dynamic environments remains challenging. We introduce Incremental Gaussian Splatting (I-GS), a novel approach for efficient 3D reconstruction in changing scenes. I-GS focuses on human-induced changes, using images from an operator-worn monocular camera. By processing only regions consistent with the current state, I-GS achieves sequential updates. Our evaluation demonstrates that I-GS outperforms conventional Gaussian Splatting, providing accurate spatial representations resilient to moving objects. This advance significantly improves the quality of remote collaboration in dynamic environments.
We present a simple yet powerful method for segmenting 3D Gaussian splatting models using 2D segmentation masks from a limited number of viewpoints. Our approach leverages inference-time gradient backpropagation, making it both easy to implement and fast enough for interactive segmentation. The underlying formulation ensures high accuracy, making this method a highly effective tool for 3D segmentation and related downstream tasks such as AR, VR, 3DGS editing, and digital twins.
This study aims to extend the applicability of motion style transfer methods to be robust for diverse and complex motions akin to those found in real-world data. We introduce a new training method that requires no style labels, the metadata needed by conventional motion style transfer methods, enabling a model to learn from almost any motion dataset. This contribution mitigates performance degradation for motions absent from style-labeled motion datasets. Compared to conventional methods, the proposed method demonstrates superior transfer performance, particularly under conditions featuring more diverse and complex motions.
Smear effects are a visualization of 2D motion blur. In Japanese animation, because each keyframe of fast-moving objects might remain on the screen for more than one frame, significant differences often appear between consecutive frames, resulting in a choppy effect. To eliminate this, smear effects play an important role. In Japanese hand-drawn animation, this technique creates jagged outlines at the edges of fast-moving objects. This not only eliminates the choppy effect but also emphasizes the intensity and speed of the object’s movement. The proposed method is specifically designed to accommodate the common hand-drawn technique of "Animating on Ones, Twos, and Threes" [Ryan 2021], which often appears alongside the smear effect. Our method is based on skeletal animation techniques, combined with a hierarchical processing mechanism to displace model vertices. Unlike other methods that only displace vertices in a specific direction, our method includes a bending mechanism to represent curved motion trajectories. Additionally, our method allows users to freely choose which vertices in the model to apply the effects to. Finally, the deformed model is rendered from 3D to 2D, achieving this Japanese animation-style smear effect.
We consider the task of controllable and diverse motion synthesis from a single sequence as an alternative to data-dependent text-to-motion methods, which pose ambiguities in data ownership and privacy. Recent works in hierarchical single-shot synthesis have paved the path for unconditional generation and editing tools; however, the methods that focus on 3D animation have failed to control the diversity of the generated motions. In this paper, we propose integrating variational inference into single-shot GANs, aiming to encode and control the low-frequency generating factors of the single motion sample. Our experiment showcases the ability of our VAE-GAN model to control the diversity of its generations while preserving their plausibility and quality.
3D animators traditionally use a "pose to pose" approach, whereas motion capture (MoCap) tools generate a pose for every frame, making the motion challenging to edit. We argue that current keyframe extraction methods are inadequate for human editing. To address this, we propose a novel approach that bridges the gap between MoCap animations and traditional 3D artist tools. Our contributions include learning a neural control rig as a differentiable proxy for more accurate key pose interpolation and formulating the task as an optimization problem, solved efficiently with a greedy dynamic programming algorithm.
Decomposing human body motion into content (such as walking, jumping, or punching) and style (such as angry, old, or proud) is a challenging task in style transfer, often causing unrealistic content-style combinations. This study demonstrates a straightforward method for obtaining statistical style differences (ΔStyle), which in turn effectively replace the original style of a motion with another target style. The strength of the transferred style is adjusted using a predefined parameter of our model. Experimental results show that the proposed ΔStyle significantly improves style transfer quality, particularly in cross-content scenarios.
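Read literally, a statistical style difference suggests a mean-offset construction in some motion-feature space; the sketch below illustrates that reading as an assumption (the paper's feature space and model are abstracted away, and `alpha` stands in for the predefined strength parameter).

```python
import numpy as np

def delta_style(neutral_feats, style_feats):
    """Style offset as the gap between per-style feature means.
    neutral_feats, style_feats: (n_frames, d) motion feature arrays."""
    return style_feats.mean(axis=0) - neutral_feats.mean(axis=0)

def transfer(content_feats, d_style, alpha=1.0):
    # Shift the content motion's features toward the target style;
    # alpha scales the strength of the transferred style.
    return content_feats + alpha * d_style
```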
Traditionally, style transfer methods have extracted a style element from a user-provided "style motion" and a content element from a "content motion", then combined the two to produce the transferred style in the content motion. In the process, they attempt to separate movements into style and content elements in a binary way. However, ambiguity exists between the definitions of style and content: an action such as crossing your arms when you are angry (style) while walking (content) is not clearly categorized as either. In this work, we define "Gesture Style" as a short-term action that reveals style (not content) within a motion sequence; crossing one's arms in the example above is a Gesture Style, as it expresses anger but is performed only briefly. We propose a new style transfer method with a Gesture Style Generator, which transfers style to the output motion in the conventional manner while also incorporating generated Gesture Style.
Fluid simulations in Computational Fluid Dynamics (CFD) often involve solving complex Partial Differential Equations (PDEs) with varying parameters, leading to high computational costs and slow inference times. Graph neural networks (GNNs) offer a promising alternative by efficiently representing mesh spaces and reducing these costs. In this work, we introduce a pioneering Physics-Informed Graph Neural Network (PIGNN) approach, utilizing a multi-GNN processor architecture that assigns each output parameter to a dedicated GNN processor with individual decoders. This innovative method reduces the training epochs for PIGNN by half while preserving accuracy. Additionally, we adopted mixed-precision training and achieved further improvements in reducing the training time to one-quarter of the original network while maintaining VRAM usage.
This paper presents a novel control method for steady fluid flows, such as rivers and waterfalls, simulated using SPH. Our system allows the user to place target points through which the fluids should pass, and the fluid flows can be controlled with these points. This is achieved through the control of repulsive forces on terrain surfaces. We detect positions on the terrain that affect fluid particles approaching the target points, and repulsive forces at these positions are automatically adjusted through a feedback control mechanism. Our method achieves the local control of steady fluid flows, and we demonstrate its effectiveness through several examples.
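A minimal sketch of the feedback loop described above: repulsive force magnitudes at the detected terrain positions are nudged so that approaching particles pass closer to the user's target points. The gain, the error definition, and the `measure_miss_distance` helper are assumptions for illustration, not the paper's exact controller.

```python
def update_repulsion(forces, targets, measure_miss_distance, gain=0.1):
    """forces: dict {terrain_position: repulsion magnitude};
    targets: user-placed target points the flow should pass through."""
    for pos in forces:
        # Signed error: how far, on average, particles influenced by this
        # terrain position currently miss their associated target point.
        err = measure_miss_distance(pos, targets)
        forces[pos] += gain * err                 # proportional correction per step
        forces[pos] = max(forces[pos], 0.0)       # repulsion stays non-negative
    return forces
```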
We developed a display that presents high-resolution stereoscopic images floating in mid-air. Conventional methods have not yet achieved the presentation of realistic images floating in mid-air due to issues such as resolution and the prominence of the display medium. Our method realizes a thin directional display that can present directional images at the observer’s position and display high-resolution naked-eye stereoscopic images. Additionally, by combining our method with a curved transparent reflective panel, we can present stereoscopic images in mid-air that are observable from all directions.
In interactive environments, the notion of space as a cognitive entity with the ability to autonomously perceive and explore its surroundings has yet to be fully developed. This paper presents a lightweight real-time interactive projection system that utilizes Particle Life algorithms and grid computing to enable spatial awareness and understanding of the internal environment, allowing for immediate projection adjustments in response to changes. The results show that interactive environments created by light as an information medium possess flexibility and broad application potential.
This paper presents an advanced automotive holographic head-up display (HUD) system for providing three-dimensional holograms, extended virtual image distance (>10m), large field of view (>15 degrees), and wide monocular eye-box (>2cm) without distortion and optical aberrations. Our approach ensures high-quality holographic image projection and comfortable viewing, providing a practical solution for next-generation automotive HUDs.
Tingle Tennis, a menstrual-period sensory simulation game, leverages Virtual Reality (VR) technology, haptic feedback, and IoT techniques to create an immersive experience that highlights the physical, psychological, and social challenges athletes face during their menstrual cycles. By providing users with a situated life experience, the VR application aims to foster empathy and raise awareness about the importance of addressing menstruation-related issues in the psychology of sports and exercise. The overall experience is divided into tutorial, preparation, tennis training, and final match stages, with users assuming the role of a tennis athlete over a span of three days. The project’s novelty lies in its comprehensive approach to menstruation-related issues in sports: hardware devices mimic and control the pain of menstrual cramps based on the user’s condition, while the experience conveys the mental state of an athlete competing during her period. Users also have to use various items to help them cope with this situation and be fully prepared for the final match.
Recent advancements in virtual and mixed reality have created opportunities to explore how sensory experiences influence emotional and psychological states. Our installation, Sensory Cravings, leverages mixed reality to create immersive environments simulating the emotional impact of everyday consumables like coffee, alcohol, and desserts. Using Meta’s hand-tracking technology and Arduino-based sensor feedback, this work integrates visual, auditory, gustatory, olfactory, and tactile feedback to mitigate modern stress and promote emotional well-being. By incorporating real food choices that guide the virtual experience, the system enhances interactivity and immersion while offering innovative ways to perceive the real world within VR settings, adding to the existing body of work on multisensory integration in virtual cognitive environments.
GentlePoles are wooden pole-like actuators that gently rotate to guide people without relying on text signs or staff. By arranging poles with individually controllable rotation direction and speed, the system gently and subtly directs people. For example, changes in rotation can convey messages such as "go" or "stop". In environments like airports, these poles can serve as triggers for passengers to access the information they need, while providing visual reassurance to surrounding passengers. The goal is to gently present information in public spaces with low cognitive load, addressing challenges such as staff shortages and information overload.
We developed a prototype system called the "Individual Diffusion Auralize Display," which independently generates each instrument’s sound using multiple parametric array loudspeakers (PALs) and monitor speakers. The system adjusts the reflection points of the sounds based on the players’ positions. In this study, we demonstrated that the system meets the expected requirements, with the monitor in particular offering an optimal acoustic environment for multi-person viewing.
We present a technique for fitting high dynamic range illumination (HDRI) sequences using anisotropic spherical Gaussians (ASGs) while preserving temporal consistency in the compressed HDRI maps. Our approach begins with an optimization network that iteratively minimizes a composite loss function, which includes both reconstruction and diffuse losses. This allows us to represent all-frequency signals with a small number of ASGs, optimizing their directions, sharpness, and intensity simultaneously for an individual HDRI. To extend this optimization into the temporal domain, we introduce a temporal consistency loss, ensuring a consistent approximation across the entire HDRI sequence.
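The composite objective above can be sketched as follows; the loss weights, the pooling-based stand-in for the diffuse term, and the exact parameter-difference form of the temporal term are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def low_freq(env):
    # Cheap stand-in for extracting the low-frequency (diffuse) component
    # of an environment map given as a (C, H, W) tensor.
    return F.avg_pool2d(env, kernel_size=8)

def asg_sequence_loss(pred_envs, gt_envs, asg_params, w_diff=0.1, w_temp=0.01):
    # Per-frame reconstruction of the ASG-rendered environment maps.
    recon = sum(F.mse_loss(p, g) for p, g in zip(pred_envs, gt_envs))
    # Diffuse loss: match large-scale lighting even where sharp lobes differ.
    diffuse = sum(F.mse_loss(low_freq(p), low_freq(g))
                  for p, g in zip(pred_envs, gt_envs))
    # Temporal consistency: penalize frame-to-frame jumps in the ASG
    # parameters (directions, sharpness, intensity) across the sequence.
    temporal = sum(F.mse_loss(a, b)
                   for a, b in zip(asg_params[1:], asg_params[:-1]))
    return recon + w_diff * diffuse + w_temp * temporal
```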
This paper proposes a method for relighting a single terrain image to match user-specified times of day or weather conditions by estimating albedo and depth using deep learning. By employing a two-stage network to remove fog and lighting effects, our method enables accurate albedo estimation even for terrain images containing fog, which has been difficult for previous methods. We validate the effectiveness of our approach by demonstrating several examples and comparisons with an existing method.
Stroke-based rendering aims to reconstruct an input image in an oil-painting style by predicting brush stroke sequences. Conventional methods perform this prediction stroke by stroke or require multiple inference steps due to limits on the number of strokes that can be predicted at once. This procedure leads to inefficient translation speeds, limiting their practicality. In this study, we propose MambaPainter, capable of predicting a sequence of over 100 brush strokes in a single inference step, resulting in rapid translation. We achieve this sequence prediction by incorporating a selective state-space model. Additionally, we introduce a simple extension to patch-based rendering, which we use to translate high-resolution images, improving the visual quality with a minimal increase in computational cost. Experimental results demonstrate that MambaPainter translates inputs to oil painting-style images more efficiently than state-of-the-art methods. The codes are available at this URL.
Translucent objects exhibit distinct appearances due to subsurface scattering, but rendering them realistically often involves time-consuming parameter adjustment. This paper proposes an efficient visualization method based on Principal Component Analysis (PCA) and Radial Basis Function (RBF) interpolation that lets users explore subsurface scattering parameters interactively, enabling faster computation and quicker discovery of optimal settings.
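A minimal sketch of one plausible arrangement using off-the-shelf tools (the paper's actual mapping between parameter space and appearance may differ): PCA compresses precomputed parameter samples into a low-dimensional exploration space, and an RBF interpolator answers interactive queries without re-running the renderer.

    import numpy as np
    from sklearn.decomposition import PCA
    from scipy.interpolate import RBFInterpolator

    # Hypothetical precomputed data: scattering parameter vectors and
    # corresponding rendered appearance features (stand-in random values).
    params = np.random.rand(200, 6)
    appearance = np.random.rand(200, 64)

    pca = PCA(n_components=2).fit(params)   # 2D slider space for the user
    coords = pca.transform(params)
    rbf = RBFInterpolator(coords, appearance)

    # Interactive query: drag a 2D point, get an interpolated appearance
    # preview instead of re-rendering the subsurface scattering.
    preview = rbf(np.array([[0.1, -0.3]]))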
This paper presents a novel approach to optimizing realistic fur rendering for CG animation using Unreal Engine (UE). We introduce a progressive method combining three key techniques to enhance rendering efficiency while maintaining high quality. These techniques substantially reduce GPU memory use, enabling more complex and realistic results. Our approach streamlines production workflows, reducing both time and costs, and offers broader potential applications in the film industry.
Conventional projection mapping cannot adequately represent fine three-dimensional surface textures such as fur. To solve this problem, we propose a method that adds minute, high-speed vibrations in the depth direction to the projection target and projects images at high speed, synchronized with the target's instantaneous depth position. In this paper, we report on the proposed method and its projection results.
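As a toy illustration of the synchronization idea, assuming a sinusoidal vibration (the actual actuation and timing hardware are not specified in the abstract), one can map the target's instantaneous depth to the index of the depth-layer image to project:

    import numpy as np

    def frame_index(t, freq_hz, amplitude_mm, n_layers):
        """Pick which depth-layer image to project at time t so the
        projected texture stays registered to the vibrating target."""
        depth = amplitude_mm * np.sin(2.0 * np.pi * freq_hz * t)
        # Map depth in [-A, A] to a layer index in [0, n_layers - 1].
        u = (depth + amplitude_mm) / (2.0 * amplitude_mm)
        return int(round(u * (n_layers - 1)))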
Superimposed projection using a projector array allows images to be projected from multiple directions, which helps minimize the shadows cast by obstacles. This paper describes the implementation of a multidirectional superimposed projection system and analyzes the projection results through both actual projection and simulation.
We present a novel gradient traversal algorithm for real-time volume rendering of large, unstructured, dense datasets. Our key contributions include a two-pass approach, consisting of a gradient estimation pass with random offsetting and a divergent gradient traversal refinement pass, that achieves significant per-step improvements over traditional methods. By leveraging modern GPU capabilities and maintaining uniform control flow, our method enables interactive exploration of complex, dynamic, unstructured volumetric data under real-time constraints, addressing a critical challenge in scientific visualization and medical imaging.
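The following sketch is a loose interpretation of the two-pass idea under stated assumptions; volume_sample, the offset distribution, and the single refinement step are placeholders, not the paper's algorithm:

    import numpy as np

    def two_pass_gradient(volume_sample, pos, step, rng):
        """Pass 1: estimate the local gradient with randomly offset
        central differences to decorrelate sampling artifacts.
        Pass 2: refine by stepping along the estimated gradient."""
        grad = np.zeros(3)
        for axis in range(3):
            offset = np.zeros(3)
            offset[axis] = step * (1.0 + 0.5 * rng.uniform(-1.0, 1.0))
            grad[axis] = (volume_sample(pos + offset) -
                          volume_sample(pos - offset)) / (2.0 * offset[axis])
        norm = np.linalg.norm(grad)
        if norm > 0.0:
            return grad, pos + step * grad / norm
        return grad, pos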
This paper describes research on applying latent correction to image outpainting using a diffusion model. The purpose of this study is to suppress the unwanted tendencies that frequently appear when outpainting an artwork and degrade the coherence of the result. The core of the proposed method is to apply the error-correction value computed in the known input region to the newly generated area during the denoising process in latent space. The effectiveness of the proposed method was verified quantitatively and qualitatively through experiments.
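A heavily simplified sketch of the core idea; the propagation scheme below is an assumption, since the abstract does not specify how the known-region error is spread into the generated region during denoising:

    import torch

    def latent_correction(z_pred, z_ref, known_mask, strength=1.0):
        """Measure the denoising error on the known input region and
        apply a corresponding correction to the outpainted region of the
        latent. known_mask is 1 where ground truth is available."""
        error = (z_ref - z_pred) * known_mask
        # Per-channel mean error over the known region (one plausible
        # summary; the actual correction rule may differ).
        mean_err = (error.sum(dim=(-1, -2), keepdim=True) /
                    known_mask.sum().clamp(min=1.0))
        return z_pred + strength * mean_err * (1.0 - known_mask)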
By integrating heritage, design, and technology, “Alive Yi” innovatively empowers intangible cultural heritage through interactive generative design. Using our extensive pattern database, we visualize traditional Yi minority embroidery by applying particle and 3D effects in TouchDesigner with Leap Motion, emphasizing their symbolic meanings. This interactivity allows users to transform heritage visuals in real-time, fostering deeper engagement and appreciation. Future plans include using AI tools to create designs appealing to youth. Despite technical challenges, this approach promises cultural resilience in the digital age, offering a blueprint for leveraging computational creativity to preserve and sustain intangible culture.
Traditional art, such as painting and sculpture, has long been a cornerstone of cultural expression. AI art generators such as DALL-E and MidJourney, leveraging developments in machine learning, now create artwork in human-made art styles within seconds. Human value-centered design considerations are essential when implementing AI for traditional artists, balancing automation against meaningful human control to produce a positive sense of creation.
This research explores artistic agency in human-computer collaborative painting through embodied AI. We present INDRA, an Interactive Deep-Dreaming Robotic Artist designed to enhance fine-art capabilities and assist users dynamically. INDRA is conceived as an artist-friendly AI agent intended to improve the relationship between AI and creatives.
This study aims to analyze public awareness of environmental sustainability, particularly focusing on the ecological importance of bees and their impact on human food production. Real-time search query data were collected and analyzed to identify trends related to bees and environmental keywords. Artificial intelligence techniques and data visualization methods were employed to create new forms of media art. The results of the study emphasize the importance of bees and can be used as a tool to promote a shift in public awareness regarding environmental issues.
This study evaluates the effectiveness of a proposed visualization method for enhancing communication regarding skill improvement between wheelchair users and coaches. Two measurements were conducted to observe changes in communication dynamics and skill development. The study involved recording and analyzing a coach’s evaluation of a user’s 40-m sprint utilizing video, pose estimation, and sensor data graphs. The interactions between the coach and the user were subsequently analyzed to elucidate how communication evolved and to identify any improvements in the skills of users. The findings suggest that integrating these visualization tools may enhance discussions on skill enhancement and contribute to more effective training outcomes.
This research proposes a dynamic wall art piece that utilizes color changes through photoelasticity. To achieve this effect, we have developed a device that stretches a single sheet of gel material from multiple points. This device allows for gentle, organic color transitions that cannot be replicated by pixel-based displays like LCDs, offering a unique aesthetic experience as wall art.
Our approach aims to use electromyography (EMG) and vocal metrics to enhance physiological feedback on the voice, comparing a professional singer with students to analyze differences in muscle control and vocal quality.
Since the pick-and-roll (PnR) is a commonly used tactic in basketball, thorough analysis of it can improve a team's overall performance. In this work, we propose PnRInfo, a tool that automatically detects PnR plays in basketball footage and visualizes the corresponding tactical information in a 3D simulation space to facilitate interactive discussion of PnR tactics.
This study investigates whether awareness of color blindness can be enhanced through a virtual colorblind gaming experience. We conducted two user studies, one with undergraduates and one with working adults, using ‘colorblind’ color schemes in a digital game to explore whether such an experience improves players' assessment of their own knowledge about color blindness and their awareness of its associated disadvantages in society, workplaces, and private life. The undergraduates showed increased awareness overall, but little change regarding workplace disadvantages. In contrast, the working adults showed a significant improvement in their assessment of knowledge but not in the other aspects. Thus, a virtual colorblind gaming experience can enhance awareness of color blindness, yet how the experience is interpreted may vary considerably with the players' backgrounds.