Thanks to the optical duality between a projector and a camera, an image from the viewpoint of the projector can be computationally generated from camera images; however, to achieve dual photography in real time, the geometric configuration of the projector and the camera must be strongly constrained. In this research, to substantially relax this constraint, we improved the optical components of the projector–camera system by employing a stabilizer that guarantees the two epipolar planes remain parallel. As a result, the projector can be moved more freely to change the synthesized virtual viewpoint in real time.
Extracting the direct light component from light transport helps to enhance the visibility of a scene. In this paper, we describe a method to improve the visibility of a target object by capturing transmissive rays without scattering rays using a synchronized projector–camera system. A rolling shutter camera and a laser raster-scanning projector are placed side by side, and both epipolar planes are optically aligned on a screen plane placed between the projector and the camera. This paper demonstrates that our method can visualize an internal object inside diluted milk.
Extending the non-uniform rational B-spline (NURBS) representation to arbitrary topology is one of the most important steps toward defining geometry suitable for isogeometric analysis (IGA). The approach must be NURBS-compatible and able to handle non-uniform knot intervals. To achieve this goal, we present a novel patching solution that defines one Bézier patch for each non-zero knot interval face of the control grid. The construction reproduces bi-cubic NURBS in regular faces and defines bi-quintic Bézier patches for irregular faces. The method also supports non-uniform sharp features along extraordinary points. Experimental results demonstrate that the new surfaces improve surface quality for non-uniform parameterization.
“Bee Wave” is a kinetic art sculpture driven by a shape memory alloy. When heated, the alloy contracts and pulls a lever in the drive mechanism that tugs the acrylic wings, making them sway like the wings of a bee in nature (Figure 1). Since no motor is used, the piece is quiet and elegant. Each triangular-cone cell behaves like a cell in a cellular automaton: it has a mechanism to receive, store, and process signals transmitted between cells, and it influences its three neighboring cells, so that a trigger propagates from an individual cone through the whole group like a traveling wave. The finished work incorporates biomimetic characteristics and appearance: a biological form, a bee-inspired mechanism that conceals audio-conversion behavior, a shape memory alloy that moves like a living creature, and cellular-automaton signal transmission as in the Game of Life [Conway 1970]. These mechanisms are “mixed” to create this work, Bee Wave.
This work proposes a wirelessly powered underwater display that can control its position by changing its own buoyancy. The proposed display is equipped with a motor-based actuator that controls its effective volume for buoyancy control. The newly proposed underwater AC wireless power transfer (WPT) function drives both the power-consuming OLED display and the actuation motor. This work confirmed that the proposed WPT function can deliver mW-class power even to LEDs with volumes in the hundreds of mm³.
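The buoyancy-control principle the display exploits can be sketched with Archimedes' principle (the mass and volume values below are illustrative, not from the paper): the display hovers when the buoyant force on its effective volume balances its weight, and ascends or descends as the actuator expands or shrinks that volume.

```python
# Minimal sketch of buoyancy-based depth control; all numbers are
# hypothetical, chosen only to illustrate the principle.

RHO_WATER = 1000.0   # kg/m^3, fresh water
G = 9.81             # m/s^2

def net_vertical_force(mass_kg, effective_volume_m3):
    """Positive result -> display rises; negative -> it sinks."""
    buoyancy = RHO_WATER * G * effective_volume_m3
    weight = mass_kg * G
    return buoyancy - weight

def neutral_volume(mass_kg):
    """Effective volume at which the display hovers (net force = 0)."""
    return mass_kg / RHO_WATER

# A hypothetical 50 g display hovers at an effective volume of 50 cm^3.
m = 0.05
v0 = neutral_volume(m)                    # 5e-5 m^3 = 50 cm^3
print(net_vertical_force(m, v0))          # ~0 -> hover
print(net_vertical_force(m, v0 * 1.1))    # > 0 -> ascend
print(net_vertical_force(m, v0 * 0.9))    # < 0 -> descend
```

The actuator therefore only needs to modulate the effective volume around the neutral point to command vertical motion.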
Due to technological limitations of VR tracking methods, micro-motions are often difficult to incorporate in Virtual Reality (VR), whereas macro-motions are a popular interaction method. In this poster, we introduce VRTwitch, a forearm-mounted wearable device that senses micro hand motions. VRTwitch uses an array of reconfigurable miniaturized radar sensors placed around the hand to capture subtle finger movements for gesture detection, enabling enhanced interaction in VR space. As a demonstration, we created a simple interactive VR shooting game that requires precise finger motion for virtual gun manipulation.
Existing haptic interfaces providing both tactile and kinesthetic feedback for virtual object manipulation are still bulky, expensive, and often grounded, limiting users’ motion. In this work, we present a wearable, lightweight, and affordable embedded system aiming to provide both tactile and kinesthetic feedback in 3D applications. We created a PCB for the circuitry and used inexpensive components. Kinesthetic feedback is provided to the user’s hand through a 3D-printed exoskeleton and five servo motors placed on the back of the glove. Tactile feedback is provided through fifteen coin vibration motors placed on the inner side of the hand, vibrating at three levels. The system is ideal for prototyping and can be customized, making it scalable and upgradable.
This work introduces the Self-Stylized Neural Painter (SSNP), which creates stylized artworks in a stroke-by-stroke manner. SSNP consists of a digital artist, a canvas, and a style-stroke generator (SSG). By using the SSG to generate style strokes, SSNP creates paintings in different styles based on the given images. We design the SSG as a three-player game based on a generative adversarial network to produce the pure-color strokes that are crucial for mimicking physical strokes. Furthermore, the digital artist adjusts stroke parameters (shape, size, transparency, and color) to reconstruct the content of the reference image in as much detail as possible, improving fidelity.
Fast denoising of Monte Carlo path-traced images is highly desirable. Existing learning-based real-time methods concatenate auxiliary buffers (i.e., albedo, normal, and depth) with noisy colors as input. However, such structures cannot effectively extract the rich information in the auxiliary buffers. In this work, we augment a U-shaped kernel-prediction network with a sparse auxiliary feature encoder. Sparse convolutions focus solely on regions whose inputs have changed and reuse history features in other regions. With sparse convolutions, the computational complexity of the auxiliary feature encoder is reduced by 50–70% without an apparent performance drop.
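The reuse idea behind sparse convolutions can be sketched in a toy form (our illustrative code, not the paper's network): a convolution output only needs recomputing where its input support changed between frames; everywhere else the cached result from the previous frame is still valid.

```python
import numpy as np

# Toy sketch: recompute a 3x3 convolution only at output pixels whose input
# support changed since the last frame, reusing cached outputs elsewhere.

def conv3x3_at(img, y, x, kernel):
    """Dense 3x3 convolution evaluated at a single interior pixel."""
    return float((img[y-1:y+2, x-1:x+2] * kernel).sum())

def sparse_conv3x3(img, kernel, cached_out, changed):
    """`changed` marks input pixels that differ from the previous frame; an
    output pixel needs refreshing iff any pixel in its 3x3 support changed."""
    out = cached_out.copy()
    h, w = img.shape
    need = np.zeros_like(changed)
    for y, x in zip(*np.nonzero(changed)):      # dilate mask by one pixel
        need[max(y-1, 0):y+2, max(x-1, 0):x+2] = True
    for y in range(1, h-1):
        for x in range(1, w-1):
            if need[y, x]:
                out[y, x] = conv3x3_at(img, y, x, kernel)
    return out

# Frame 1: full dense pass; frame 2: one input pixel changes, so only a
# 3x3 neighborhood of outputs is recomputed.
rng = np.random.default_rng(0)
img = rng.random((32, 32))
k = np.ones((3, 3)) / 9.0
dense = np.zeros_like(img)
for y in range(1, 31):
    for x in range(1, 31):
        dense[y, x] = conv3x3_at(img, y, x, k)
img2 = img.copy()
img2[10, 10] += 1.0
out = sparse_conv3x3(img2, k, dense, img2 != img)
```

In an auxiliary-buffer encoder the same principle applies per feature channel, which is where the reported 50–70% cost reduction comes from: most screen regions change little between frames.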
Recently, top-k fake sample selection has been introduced to provide better gradients for training Generative Adversarial Networks (GANs). Since the method does not guarantee class balance among selected samples in class-conditional GANs, certain classes can be completely ignored during training. In this work, we propose class-standardized critic-score-based sample selection, which enables class-balanced sample selection. Our method achieves improved FID and Intra-FID scores compared to prior top-k selection.
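The core selection step can be sketched as follows (our notation, not the paper's code): standardize critic scores within each class before taking the top-k, so a class the critic scores systematically lower still contributes samples.

```python
import numpy as np

# Sketch of class-standardized top-k selection: z-score critic scores
# per class, then take the global top-k of the standardized scores.

def class_standardized_topk(scores, labels, k):
    z = np.empty_like(scores, dtype=float)
    for c in np.unique(labels):
        m = labels == c
        z[m] = (scores[m] - scores[m].mean()) / (scores[m].std() + 1e-8)
    return np.argsort(z)[-k:]   # indices of the k best standardized scores

# Toy example: the critic scores class 1 uniformly lower; plain top-k picks
# only class-0 samples, while the standardized version mixes both classes.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)
scores = np.concatenate([rng.normal(5.0, 1.0, 50),   # class 0: high scores
                         rng.normal(0.0, 1.0, 50)])  # class 1: low scores
plain = np.argsort(scores)[-10:]
balanced = class_standardized_topk(scores, labels, 10)
print(np.bincount(labels[plain], minlength=2))     # heavily skewed to class 0
print(np.bincount(labels[balanced], minlength=2))  # both classes represented
```

Standardizing removes the per-class offset in critic scores, so selection reflects within-class sample quality rather than which class the critic currently favors.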
This paper introduces a U-Net-based neural network that determines line structure beyond junctions more accurately than previous work [Guo et al. 2019]. In addition to the inputs of the previous work, we feed 3D information to our neural network. We also propose a method to generate the training dataset automatically. Rendering results produced with stylized line rendering [Uchida and Saito 2020] show that our neural network improves the streams of strokes.
We propose a transformer-based model that learns Square Word Calligraphy, writing English words in the format of a square that resembles a Chinese character. To achieve this, we compose a dataset by collecting the calligraphic characters created by artist Xu Bing and labeling the position of each letter in the characters. Taking English letters as input, our modified transformer-based model learns the positional relationship between letters and predicts the transformation parameters needed to reassemble them as a Chinese-style character. Comparisons between our predicted characters and the corresponding characters created by the artist indicate that the proposed model performs well on this task, and we also create new characters to show the “creativity” of our model.
In recent years, texture mapping in 3D modeling has improved remarkably for realistic rendering. However, small errors in reconstructed 3D geometry cause serious errors in texture mapping. To address this problem, most prior methods are devoted to refining the 3D geometry without visual cues. In this work, we refine 3D geometry based on color consistency and surface normals using a deep neural network. Our method optimizes the location of each vertex to maximize the quality of the associated textures.
In recent studies, object classification with deep convolutional neural networks has shown poor generalization to occluded objects due to the large variation of occlusion situations. We propose a part-aware deep learning approach for occlusion-robust object classification. To demonstrate the robustness of the method to unseen occlusion, we train our network without occluded object samples and test it with diverse occlusion samples. The proposed method shows improved classification performance on CIFAR10, STL10, and vehicles from the PASCAL3D+ dataset.
In this poster, we present eyemR-Talk, a Mixed Reality (MR) collaboration system that uses speech input to trigger shared gaze visualisations between remote users. The system uses 360° panoramic video to support collaboration between a local user in the real world in an Augmented Reality (AR) view and a remote collaborator in Virtual Reality (VR). Using specific speech phrases to turn on virtual gaze visualisations, the system enables contextual speech-gaze interaction between collaborators. The overall benefit is more natural gaze awareness, leading to better communication and more effective collaboration.
We propose a drone flight training system (DFT MR) using a novel mixed reality (MR) technique to lower the risk of drone crashes for beginners and to enhance the realism of drone control. Recently, virtual reality (VR) and augmented reality (AR) have been widely employed in training systems. However, current AR drone training systems usually lack physical feedback from wind or air resistance, and VR training environments are not equivalent to the real world. The proposed MR system integrates the advantages of both VR and AR by transmitting high-fidelity simulated physical control feedback, such as obstacle collision and wind influence, from VR to the MR device. The system not only benefits those who suffer from VR sickness but also provides novice drone operators with crash-free virtual drones for training toward their drone licenses.
Existing stage monitor mixing systems are inefficient and cannot accommodate communication between musicians and sound engineers. We introduce ARMixer, which allows musicians to perform self stage-monitor mixing through gestures in augmented reality, providing an intuitive mixing experience. We performed two usability tests and found that ARMixer is acceptable to users and offers excellent psychoacoustic intuitiveness in controlling mixing parameters by gesture and in identifying mixing targets.
Augmented Reality (AR) and Virtual Reality (VR) techniques are widely used to experience and interact with 3D digital content. However, users must choose either AR or VR separately, with different devices and different interaction modalities. This work presents ARinVR, a novel concept that seamlessly and synchronously integrates handheld mobile AR and head-worn VR, allowing users to see an augmented scenario on a phone screen and in a VR space at the same time. We developed a prototype system that enables users to operate a smartphone and observe mobile AR scenarios in a VR environment as naturally as in the real world. The system extends the AR space and occludes the physical surroundings, enabling users to stay focused on the task and enhancing spatial perception.
In this research, we investigate the impact of real-time biofeedback-based emotion-adaptive Virtual Reality (VR) environments on immersiveness, game engagement, and flow state, using physiological signals such as the electroencephalogram (EEG), electrodermal activity (EDA), and heart rate variability (HRV). To this end, we designed VR-Wizard, a personalized emotion-adaptive VR game akin to a Harry Potter experience, whose objective is to collect items in the forbidden forest. Users initially train the system through a calibration process. They then explore the forest while environmental factors adapt based on a ‘MagicMeter’ indicating the user’s real-time emotional state. The overall goal is to provide more personalized, immersive, and engaging emotional virtual experiences.
We present an MR system for experiencing the cartoon-like event of sliding down a cliff and stopping by sticking in a knife, with enhanced tactile and muscle stimuli. A player grabs a knife device and presses it against a wall device that rotates upward, opposite to the player who virtually slides down. We demonstrate MR content in which a player experiences sliding down a virtual cliff while handling the knife device on the rotating wall device.
Augmented Reality (AR) and Artificial Intelligence (AI) technologies have become ubiquitous in the public’s daily lives. In the AR space, many notable industry players are working to bring smaller, more ergonomic, and more powerful AR headsets and smart glasses to market, with the eventual goal that these devices will replace mobile phones for a wide variety of tasks. With the improvements in computing power on AR devices, a similar surge in AI-powered technologies running on these devices is also evident.
In this work, we present an AR application that explores Human-AI interaction in a computer vision context by providing an integrated platform for video analytics and AI model fine-tuning.
In this paper, we present a prototype system capable of associating real objects with virtual models and turning the tabletop into imaginary virtual scenes. A user can interact with these objects while immersed in the virtual environment. To accomplish this, a vision-based system recognizes and tracks the real objects in the scene online. The corresponding virtual models are retrieved based on their tactile shapes, and are displayed and moved on a head-mounted display (HMD) according to the tracked object poses. The experiment demonstrates that our prototype system finds reasonable associations between real and virtual objects, and that users are interested in the novel interaction.
The recent emergence of point cloud streaming technologies has spawned new ways to digitally perceive and manipulate live data of users and spaces. However, graphical rendering limitations prevent state-of-the-art interaction techniques from achieving segmented bare-body user input to manipulate live point cloud data. We propose BridgedReality, a toolkit that enables users to produce localized virtual effects in live scenes without an HMD, wearable devices, or virtual controllers. Our method uses body tracking and an illusory rendering technique to achieve large-scale, depth-based, real-time interaction with multiple light field projection display interfaces. The toolkit circumvents time-consuming 3D object classification and packages multiple proximity effects in a format understandable by middle schoolers. Our work can offer a foundation for multidirectional holographic interfaces, GPU-simulated interactions, teleconferencing and gaming activities, as well as live cinematic-quality exhibitions.
Currently, training in restaurants often takes the form of on-the-job training (OJT), where trainees are accompanied by trainers who guide them as they serve customers. However, it is difficult for trainers to objectively understand the cognitive and decision-making processes of trainees, which makes OJT guidance extremely difficult; objective measurement indices are also needed. To mitigate these issues, we developed a VR job-training system that focuses on two elements that pose challenges in actual service: awareness and priority-setting.
This research is based on the concept of computing embedded within a tangible product that acts as both input and output device, eliminating the need for a traditional computer to provide feedback or guidance. The idea is inspired by the traditional geoboard and targets children aged 5 to 8. The main goal is to integrate technology seamlessly into a physical manipulative; using this product, kids can build complex shapes, giving them a memorable learning experience.
Small, local groups of users who share private resources (e.g., families, university labs, business departments) usually have few usable authentication options. For these groups, existing solutions either require excessive personal information (e.g., biometrics), do not distinguish individual users (e.g., shared passwords), or lack security measures when the access key is compromised (e.g., physical keys). We propose an alternative by designing HWAuth: an inclusive group authentication system built on a shared text that is uniquely identifiable for each user. Each user enters the same textual password, but the individual handwriting style of the text distinguishes each user. We evaluated the usability and security of our design through a user study with 30 participants. Our results suggest that (1) users who enter the same shared password are discernible from one another, and (2) users were able to log in consistently using HWAuth.
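The identification step can be sketched in a simplified form (this is our illustration, not HWAuth's actual pipeline, and the two-dimensional features are hypothetical): each user enrolls a template of handwriting features for the shared password, and a login attempt is matched to the nearest template within a distance threshold.

```python
import numpy as np

# Toy sketch of template-based handwriting identification: every user writes
# the same shared password, but per-user feature templates tell them apart.

def enroll(samples):
    """Average several enrollment attempts into one template per user."""
    return np.asarray(samples, dtype=float).mean(axis=0)

def identify(templates, attempt, threshold):
    """Return the closest enrolled user, or None if no template is near enough."""
    names = list(templates)
    dists = [np.linalg.norm(templates[n] - attempt) for n in names]
    i = int(np.argmin(dists))
    return names[i] if dists[i] <= threshold else None

# Hypothetical 2D feature vectors (e.g., mean stroke speed, mean pressure).
templates = {
    "alice": enroll([[1.0, 0.1], [0.9, 0.2]]),
    "bob":   enroll([[0.1, 1.0], [0.2, 0.9]]),
}
print(identify(templates, np.array([0.95, 0.15]), threshold=0.5))  # alice
print(identify(templates, np.array([5.0, 5.0]), threshold=0.5))    # None
```

The threshold trades usability against security: a tight threshold rejects imposters and forgeries more often, at the cost of occasional false rejections of legitimate users.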
This work introduces a prediction-driven real-time dynamics method that uses a graph-based state buffer to minimize the cost of mispredictions. Our technique reduces the average time needed for dynamics computation on the main thread by running the solver pipeline on a separate thread, enabling interactive multimedia applications to increase the computational budget for graphics at no cost perceptible to the end user.
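The buffering idea can be sketched in a much-simplified form (our toy code; the paper's buffer is graph-based, while this sketch keys states by frame index): the solver thread simulates ahead under predicted inputs and saves each state, so when the actual input matches the prediction the result is free, and a misprediction only costs a short re-simulation from the last saved state.

```python
# Minimal sketch of a prediction buffer with rollback on misprediction.

class StateBuffer:
    def __init__(self):
        self.states = {}            # frame -> (state, predicted_input)

    def save(self, frame, state, predicted_input):
        self.states[frame] = (state, predicted_input)

    def resolve(self, frame, actual_input, step):
        """Return the state for `frame`: reuse it if the prediction held,
        otherwise re-simulate one step from the previous saved state."""
        state, predicted = self.states[frame]
        if predicted == actual_input:
            return state                                # hit: zero extra cost
        prev_state, _ = self.states[frame - 1]
        corrected = step(prev_state, actual_input)      # miss: recompute
        self.states[frame] = (corrected, actual_input)
        return corrected

def step(state, inp):               # toy one-step "dynamics solver"
    return state + inp

buf = StateBuffer()
buf.save(0, 0, predicted_input=None)
buf.save(1, step(0, 1), predicted_input=1)     # simulated ahead assuming input 1
hit = buf.resolve(1, actual_input=1, step=step)    # prediction held -> reuse
miss = buf.resolve(1, actual_input=2, step=step)   # misprediction -> rollback
print(hit, miss)
```

Because resolution touches only saved states, the main thread's per-frame cost is bounded by the misprediction rate rather than by the full solver pipeline.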
This paper presents a novel method for synthesizing sound effects for fluid animation. Previous approaches synthesize fluid sound by physically-based simulation, but they incur a huge computational cost. To address this, we propose a data-driven method for synthesizing sound effects for fluids, focusing in this paper on liquid sound. A liquid sound database consisting of a set of recorded sound clips is prepared in advance, and the most suitable clip for an input liquid motion is automatically retrieved from the database. Retrieval is achieved by comparing the waveform of each sound clip with a history of the liquid's velocity computed by the simulation. The velocity history is computed for the regions where liquid sound is expected to occur. A distance between the velocity history and the waveform of each clip in the database is then calculated, and our system chooses the clip with the minimum distance. Once the database is prepared, our method achieves fast synthesis of liquid sound for simulated liquid motion. In this paper, we use a small database containing a few sound clips and evaluate the effectiveness of our retrieval approach as a preliminary experiment.
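The retrieval step can be sketched in a toy form (the feature choice here is our simplification, not the paper's exact distance): compare the velocity history of the sounding region against a coarse amplitude envelope of each database clip and return the clip with the smallest distance.

```python
import numpy as np

# Toy sketch of envelope-based clip retrieval for liquid sound.

def envelope(waveform, n_bins):
    """Coarse amplitude envelope: mean |amplitude| over n_bins segments."""
    chunks = np.array_split(np.abs(waveform), n_bins)
    return np.array([c.mean() for c in chunks])

def normalize(x):
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min() + 1e-12)

def retrieve(velocity_history, clips):
    """Index of the clip whose envelope best matches the velocity history."""
    target = normalize(velocity_history)
    dists = [np.linalg.norm(normalize(envelope(c, len(target))) - target)
             for c in clips]
    return int(np.argmin(dists))

# Toy usage: a speeding-up splash should match the clip whose loudness grows.
t = np.linspace(0.0, 1.0, 8000)
growing = np.sin(2 * np.pi * 440 * t) * t        # amplitude ramps up
steady = np.sin(2 * np.pi * 440 * t) * 0.5       # constant amplitude
velocity = np.linspace(0.0, 1.0, 8)              # accelerating liquid motion
best = retrieve(velocity, [growing, steady])
print(best)
```

Because the cost is one distance evaluation per clip, retrieval stays fast even as the simulated motion changes every frame; only building the database is done offline.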
This paper proposes an interactive semi-automatic system for manga colorization. In our system, users colorize monochrome manga images interactively by scribbling the desired colors. The proposed method creates a high-quality colorized image by feeding the original monochrome image, after grayscale adjustment, and a flat-colored image generated from the scribbles into a colorization network. Experiments show that the colorized results yielded by the proposed method are much better than those of existing methods.
We propose a hierarchical abstract drawing method for many densely arranged 3D objects of almost the same shape. First, an abstract drawing of all the 3D objects is produced based on a few primary colors, one of the global properties of the objects. Then, abstract painting is added, focusing on a part of each object as a local property. Several abstractly rendered scenes are shown, confirming the effectiveness of our method.
MatCap is an expressive approach to shading 3D models and is promising for the production of typical Japanese-style cel-shaded animations. However, we experienced an asset management problem in our previous short film production: we needed to manually create many MatCap assets to achieve variations shot by shot, or even frame by frame. In this work, we identify requirements for shading systems in Japanese animation production and describe our procedural MatCap system, which satisfies these requirements. Procedural MatCap generates customizable MatCap assets fully procedurally at run time, drastically improving asset manageability.
There is high demand for dynamic, occlusion-robust illumination to improve lighting quality for portrait photography and assembly. Achieving such illumination with a light field conventionally requires multiple projectors. This paper proposes a dynamic, occlusion-robust illumination technique that employs a light field formed by a lens array instead of multiple projectors. Dynamic illumination is obtained by introducing a feedback system that follows the motion of the object. The designed lens array provides a wide viewing angle, making the system robust against occlusion. The proposed system was evaluated through projections onto a dynamic object.
In this paper, we propose rendering synthetic underwater images using a revised image formation model, toward modeling restoration. Underwater images suffer from low contrast, color cast, and haze due to the dynamically varying properties of water, floating particles, and submerged sediments. As a result, light traveling from the surface to the object is subject to direct scattering, backscattering, and forward scattering. We propose to model this varying nature of light to render synthetic underwater images based on depth information and the inherent optical properties of the Jerlov water types.
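The attenuation-plus-backscatter structure can be sketched with a standard simplified formation model (the coefficients below are illustrative, not measured Jerlov values, and forward scattering is omitted for brevity): scene radiance is attenuated exponentially with depth while backscattered veiling light fills in.

```python
import numpy as np

# Simplified underwater image formation sketch:
#     I(x) = J(x) * exp(-beta * d(x)) + B_inf * (1 - exp(-beta * d(x)))
# J: scene radiance, d: depth, beta: per-channel attenuation,
# B_inf: per-channel veiling (backscatter) light.

def render_underwater(J, depth, beta, B_inf):
    """J: HxWx3 radiance in [0,1]; depth: HxW meters;
    beta: per-channel attenuation (1/m); B_inf: per-channel veiling light."""
    t = np.exp(-depth[..., None] * np.asarray(beta))   # per-channel transmission
    return J * t + np.asarray(B_inf) * (1.0 - t)

# Red attenuates fastest underwater, so distant pixels shift toward blue-green.
J = np.ones((2, 2, 3)) * 0.8                  # bright gray scene
depth = np.array([[1.0, 1.0], [10.0, 10.0]])  # near row, far row
beta = [0.6, 0.12, 0.08]                      # R, G, B attenuation (illustrative)
B_inf = [0.05, 0.35, 0.45]                    # blue-green veiling light
img = render_underwater(J, depth, beta, B_inf)
```

Swapping in per-channel `beta` and `B_inf` values for different Jerlov water types is what lets a single depth map produce a family of synthetic training images.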
This paper proposes a light source selection method for photon mapping combined with recent deep-learning-based importance sampling. Although applying such neural importance sampling (NIS) to photon mapping is not difficult, a straightforward approach can sample inappropriate photons for each light source, because NIS relies on approximating a smooth continuous probability density function on the primary sample space, whereas light source selection follows a discrete probability distribution. To alleviate this problem, we introduce a normalizing flow conditioned on a feature vector representing the index of each light source. When the neural network for NIS is trained to sample visible photons, we achieve lower variance with the same sample budget compared to a previous photon sampling method using Markov chain Monte Carlo.
Conventionally, it has been difficult to realize both photorealistic and geometrically consistent 3DCG rendering on head-mounted displays (HMDs) due to the trade-off between rendering quality and low motion-to-photon latency (M2PL). To solve this problem, we propose a novel rendering framework in which the server renders RGB-D images of a 3D model from optimally arranged rendering viewpoints, and the client HMD geometrically compensates for M2PL by employing depth-image-based rendering (DIBR). Experiments with real smart glasses show that the proposed method displays binocular images closer to the ground truth than conventional approaches.
Shadows are an important factor in the perception of object position and shape. In this study, we investigated the effect of shadows on the perceived shape of a mid-air image by projecting shadows of different shapes onto a mid-air image displayed in real space. Specifically, participants individually viewed one oval cylinder with a shadow and one without a shadow, and were forced to choose which had the greater thickness in the depth direction. We found that shadow shape can change the perceived thickness of the mid-air image.
In this work, we present an analysis tool that helps golf beginners compare their swing motion with that of experts. The proposed application synchronizes videos with different swing-phase timings using latent features extracted by a neural-network-based encoder and detects key frames where discrepant motions occur. We visualize synchronized image frames and 3D poses, helping users recognize the differences and the key factors important for improving their swing.
Modern deep neural networks have enabled an applicable level of speech-driven facial animation, producing natural and precise 3D animation from speech data. Nevertheless, many existing works are weak in drastic emotional expression and in the flexibility of the animation. In this work, we introduce emotion-guided speech-driven facial animation, which simultaneously performs classification and regression on the speech data to generate a controllable level of evident emotional expression in the facial animation. Extensive experiments indicate that our method generates more expressive facial animation with controllable flexibility compared to previous approaches.