Path tracing is ubiquitous for photorealistic rendering of a wide range of light transport phenomena. At its core, path tracing involves the stochastic evaluation of complex, recursive integrals, leading to high computational cost. Research efforts have thus focused on accelerating path tracing either by improving the stochastic sampling process to achieve better convergence or by using approximate analytic evaluations for a restricted set of these integrals. Another interesting line of research focuses on integrating neural networks into the rendering pipeline, where these networks partially replace stochastic sampling and approximate its converged result. The analytic and neural approaches are attractive from an acceleration point of view. Formulated properly and coupled with advances in hardware, they can achieve much better convergence and eventually lead to real-time performance. Motivated by this, we make contributions to both avenues to accelerate path tracing. The first set of efforts aims to reduce the computational effort spent on stochastic direct lighting calculations from area light sources by instead evaluating them analytically. To this end, we introduce the analytic evaluation of visibility in a previously proposed analytic area light shading method. Second, we add support for anisotropic GGX to this method. This relaxes an important assumption and enables the analytic rendering of a wider set of light transport effects. Our final contribution is a neural approach that attempts to reduce yet another source of high computational load: the recursive evaluations. We demonstrate the versatility of our approach with an application to hair rendering, which exhibits one of the most challenging recursive evaluation cases. All our contributions improve on the state of the art and demonstrate photorealism on par with reference path tracing.
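For context, the direct-lighting integral targeted by the first contribution can be written as follows; the notation here is illustrative background and is not taken from the thesis.

```latex
% Direct illumination from an area light A with emitted radiance L_e,
% BRDF f_r, visibility V, and geometry term G; Monte Carlo estimator on the right.
L_o(x,\omega_o) = \int_{A} f_r(x,\omega_o,\omega_i)\, L_e(y,-\omega_i)\, V(x,y)\, G(x,y)\, \mathrm{d}A_y
               \approx \frac{1}{N}\sum_{k=1}^{N} \frac{f_r\, L_e\, V\, G}{p(y_k)}
```

Analytic area-light shading replaces the stochastic estimator on the right with a closed-form evaluation of the integral; handling the visibility term V and anisotropic GGX BRDFs within that closed-form setting are the two extensions described above.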
This doctoral research focuses on generating expressive 3D facial animation for digital humans by studying and employing data-driven techniques. The face is the first point of interest during human interaction, and interacting with digital humans is no different. Even minor inconsistencies in facial animation can disrupt user immersion. Traditional animation workflows produce realistic results but are so time-consuming and labor-intensive that they cannot meet the ever-increasing demand for 3D content in recent years. Moreover, recent data-driven approaches focus on speech-driven lip synchrony, leaving out the expressiveness that resides throughout the face. To address this emerging demand and reduce production effort, we explore data-driven deep learning techniques for generating controllable, emotionally expressive facial animation. We evaluate the proposed models against state-of-the-art methods and ground truth, quantitatively, qualitatively, and perceptually. We also emphasize the need for non-deterministic approaches, in addition to deterministic methods, to ensure natural randomness in the non-verbal cues of facial animation.
Simulating complex cuts and fractures robustly and accurately benefits a broad spectrum of researchers. This includes rendering realistic and spectacular animations in computer graphics and interactive techniques, as well as conducting material strength analysis in industrial design and mechanical engineering. In this thesis, we develop a graph-based Finite Element Method (FEM) model that reformulates the hyper-elastic strain energy for fracture simulation and thus adds negligible computational overhead over a regular FEM. Our algorithm models fracture on the graph induced in a volumetric mesh with tetrahedral elements. We relabel the edges of the graph using a computed damage variable to initialize and propagate the fracture. Following that, we extend graph-based FEM to simulate dynamic fracture in anisotropic materials. We further enhance this model by developing novel probabilistic damage mechanics for modelling materials with impurities using a random graph-based formulation. We demonstrate how this formulation can be used by artists for directing and controlling fracture. Finally, we combine graph-based FEM with a Galerkin multigrid method to run fracture and cutting simulation at a real-time, interactive rate even for high-resolution meshes.
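To make the edge-relabeling idea concrete, below is a minimal, hypothetical sketch in Python; the data structures, damage-accumulation rule, and threshold are illustrative assumptions, not the thesis's actual formulation.

```python
# Hypothetical sketch of damage-driven edge relabeling on the graph induced
# by a tetrahedral mesh. All names and the damage rule are illustrative.
from dataclasses import dataclass

@dataclass
class Edge:
    elements: list[int]          # indices of tetrahedra incident on this edge
    damage: float = 0.0          # accumulated damage in [0, 1]
    label: str = "intact"        # relabeled to "broken" when damage saturates

def relabel_edges(edges, strain_energy, dt, rate=1.0, threshold=1.0):
    """Accumulate per-edge damage from the strain energy of incident elements
    and relabel edges whose damage exceeds the threshold as broken."""
    for e in edges:
        drive = sum(strain_energy[t] for t in e.elements) / len(e.elements)
        e.damage = min(threshold, e.damage + rate * drive * dt)
        if e.damage >= threshold:
            e.label = "broken"   # fracture initiates or propagates across this edge
    return [e for e in edges if e.label == "broken"]
```

In such a formulation, fracture propagates simply as the set of broken edges grows over time steps, which is consistent with the negligible overhead over a regular FEM solve claimed above.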
Sketching is one of the most natural ways of representing an object pictorially. With the advent of Virtual Reality (VR) and Augmented Reality (AR) technologies, 3D sketching has become more accessible. However, converting these sketches into 3D models in a manner consistent with the artist's intent remains challenging. Learning-based methods provide an effective alternative paradigm for solving this classical problem of sketch-based modelling on interactive platforms. Surface patches, inferred from 3D sketch strokes, can be treated as primitives that are assembled to create complex object models. We present our proposed framework for this problem and discuss our solution to one of its core challenges. Our deep neural network-based method allows users to create surfaces from a stream of sparse 3D sketch strokes. We also show the integration of our method into an existing Blender-based 3D content creation pipeline. This serves as a basis for solving the more complex problem of creating complete 3D object models from sparse 3D sketch strokes.
This thesis presents research on building photorealistic avatars of humans wearing complex clothing in a data-driven manner. Such avatars will be a critical technology for enabling future applications such as VR/AR and virtual content creation. Loose-fitting clothing poses a significant challenge for avatar modeling due to its large deformation space. We address the challenge by unifying three components of avatar modeling: a model-based statistical prior from pre-captured data, a physics-based prior from simulation, and real-time measurement from sparse sensor input. First, we introduce a two-layer representation that separates body and clothing, allowing us to disentangle the dynamics of the pose-driven body part and the temporally dependent clothing part. Second, we combine physics-based cloth simulation with a physics-inspired neural rendering model to generate rich and natural dynamics and appearance even for challenging clothing such as skirts and dresses. Last, we go beyond pose-driven animation and incorporate online sensor input into the avatars to achieve more faithful telepresence of clothing.
This paper briefly examines the relationship between music recordings and narrative structures found in video games. It highlights the immersive and time-consuming nature of game quests, where players invest substantial time exploring virtual worlds to reveal the narrative. The paper explores the potential of spatialised music experiences, particularly in the context of Dolby Atmos and the concept of degrees of freedom (3DoF and 6DoF). It investigates the application of spatial algorithms and game-engine middleware to create immersive music experiences that dynamically adapt to the listener's spatial and temporal position. The research acknowledges the scarcity of studies on compositional narrative design frameworks specifically tailored for completely spatialised music, particularly regarding the virtual spatialisation of individual instruments. It recognizes that the prevalence of non-diegetic music in games, which serves to represent the player's emotional state, has impeded the exploration of spatialised music elements. Consequently, the paper proposes the adoption of narrative design frameworks derived from video games as a promising approach for songwriters to develop Quest Driven Spatial Songs (QDSS).
A virtual animatable hand avatar capable of representing a user's hand shape and appearance and of tracking its articulated motion is essential for an immersive experience in AR/VR. Recent approaches use implicit representations to capture geometry and appearance, combined with neural rendering. However, they fail to generalize to unseen shapes, do not model lighting, leading to baked-in illumination and self-shadows, and cannot capture complex poses. In this thesis, we 1) introduce a novel hand shape model that augments a data-driven shape model and adapts its local scale to represent unseen hand shapes, 2) propose a method to reconstruct a detailed hand avatar from monocular RGB video captured under real-world environment lighting by jointly optimizing shape, appearance, and lighting parameters using a realistic shading model in a differentiable rendering framework incorporating Monte Carlo path tracing, and 3) present a robust hand tracking framework that accurately registers our hand model to monocular depth data using a modified skinning function with blend shapes. Our evaluation demonstrates that our approach outperforms existing hand shape and appearance reconstruction methods on all commonly used metrics. Furthermore, our tracking framework improves over existing generative and discriminative hand pose estimation methods.
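For reference, the standard linear blend skinning with blend shapes that such registration typically builds on can be written as follows; the notation is illustrative background, and the thesis's modified skinning function extends beyond this baseline.

```latex
% Linear blend skinning with blend shapes: skinning weights w_{ik}, pose-dependent
% bone transforms T_k(\theta), and blend shapes b_{ij} with coefficients \beta_j.
v_i' = \sum_{k} w_{ik}\, T_k(\theta) \Big( v_i + \sum_{j} \beta_j\, b_{ij} \Big)
```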
Conventional cameras, inspired by human vision, only capture the intensity and color of light, restricting our ability to fully comprehend the visual environment. My thesis aims to transcend the limitations of traditional RGB imaging by delving into three underexplored light properties: polarization, ray direction, and time-of-flight. First, this thesis leverages the polarization of surface reflections for geometry estimation, reflectance separation, and lighting estimation, resulting in accurate estimation and editing of visual appearance. Next, this thesis demonstrates how to leverage reflections on multi-view images of glossy objects to recover the reflected 3D environment, essentially turning the glossy objects into cameras. Finally, this thesis develops imaging systems that exploit the distribution of time-of-flight of photons, also known as transients, to reconstruct objects hidden from the camera's line of sight. By harnessing the full potential of light, this work lays the foundation for next-generation vision systems that can recover information seemingly invisible to human eyes and conventional cameras.
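As background for the polarization-based estimation mentioned above, a standard shape-from-polarization relation describes how measured intensity varies with polarizer angle; this notation is generic background rather than the thesis's specific model.

```latex
% Intensity behind a linear polarizer at angle \phi_{\mathrm{pol}}: a sinusoid whose
% phase \phi (angle of polarization) relates to the surface azimuth, and whose degree
% of polarization \rho relates to the zenith angle through the Fresnel equations.
I(\phi_{\mathrm{pol}}) = \frac{I_{\max}+I_{\min}}{2}
  + \frac{I_{\max}-I_{\min}}{2}\cos\!\big(2(\phi_{\mathrm{pol}}-\phi)\big),
\qquad \rho = \frac{I_{\max}-I_{\min}}{I_{\max}+I_{\min}}
```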