ChatAvatar revolutionizes the process of creating hyper-realistic 3D facial assets with physically-based rendering (PBR) textures through AI-driven conversations, specifically text-to-avatar interactions. By leveraging advanced diffusion technology and our comprehensive Production-Ready Facial Assets dataset, we generate CG-friendly assets that adhere to industry standards and seamlessly integrate into popular platforms like Unity, Unreal Engine, and Maya. This groundbreaking technology opens up new horizons for immersive virtual experiences, pushing the boundaries of realism and interactivity.
In the real-time demo, we demonstrate how to create a musical performance using juggling movement as an instrument. We have equipped juggling balls with accelerometers, gyroscopes, and WiFi modules. The system measures acceleration and rotation within a small hardware footprint, allowing us to map various events to music. We provide hardware and software platforms to ease the creation of real-time live performances for artists and researchers alike.
A movement-driven controller can be used to create music, trigger media or lights in theater performances, serve as a lighting console or VR controller, or track performance in sports or scientific experiments. We provide OSC and MIDI APIs that are widely used in the aforementioned fields.
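As an illustration of this kind of event-to-music mapping, the sketch below uses the python-osc library to turn an acceleration spike from a ball's IMU into an OSC message; the address, threshold, and receiver port are hypothetical placeholders, not the platform's actual API.

```python
# Minimal sketch (address, threshold and port are hypothetical): map a catch
# event, detected as an acceleration spike from a ball's IMU, to an OSC message
# that a music application, lighting console or VR host can listen for.
import math

from pythonosc.udp_client import SimpleUDPClient  # pip install python-osc

CATCH_THRESHOLD_G = 3.0                      # illustrative spike threshold, in g
client = SimpleUDPClient("127.0.0.1", 9000)  # OSC receiver, e.g. a DAW or show controller

def on_imu_sample(ball_id: int, ax: float, ay: float, az: float) -> None:
    """Called for every accelerometer sample streamed from a juggling ball."""
    magnitude = math.sqrt(ax * ax + ay * ay + az * az)
    if magnitude > CATCH_THRESHOLD_G:
        # One OSC message per detected catch; the receiver decides how to
        # map it to a note, a sample trigger, or a lighting cue.
        client.send_message("/juggle/catch", [ball_id, magnitude])
```

The same event could equally be emitted as a MIDI note-on; OSC is shown here only because its addresses make the mapping explicit.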
We present an AI-based tool for interactive material generation within the NVIDIA Omniverse environment. Our approach leverages a state-of-the-art latent diffusion model with notable modifications that adapt it to the task of material generation. Specifically, we employ circular-padded convolution layers in place of standard convolution layers. This adaptation ensures the production of seamlessly tiling textures, as the circular padding facilitates seamless blending at image edges. Moreover, we extend the capabilities of our model by training additional decoders to generate material properties such as surface normals, roughness, and ambient occlusion. Each decoder utilizes the same latent tensor generated by the denoising UNet to produce a specific material channel. Furthermore, to enhance real-time performance and user interactivity, we optimize our model using NVIDIA TensorRT, resulting in improved inference speed for an efficient and responsive tool.
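A minimal PyTorch sketch of these two ideas follows; the layer sizes, channel counts, and decoder names are illustrative placeholders rather than the production model. It shows how `padding_mode="circular"` makes convolutions wrap around image borders so outputs tile seamlessly, and how several decoders can share one latent tensor to emit different material channels.

```python
# Illustrative sketch (layer sizes and names are placeholders, not the actual model):
# circular padding wraps the convolution around the image borders so the left edge
# blends into the right and the top into the bottom, yielding tileable outputs;
# several lightweight decoders consume the same latent to produce material channels.
import torch
import torch.nn as nn

class TileableDecoder(nn.Module):
    """Decodes a shared latent into one material channel (albedo, normal, ...)."""
    def __init__(self, latent_channels: int, out_channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(latent_channels, 64, kernel_size=3, padding=1,
                      padding_mode="circular"),
            nn.SiLU(),
            nn.Conv2d(64, out_channels, kernel_size=3, padding=1,
                      padding_mode="circular"),
        )

    def forward(self, latent: torch.Tensor) -> torch.Tensor:
        return self.net(latent)

# One decoder per material property, all fed the same latent tensor.
decoders = {
    "albedo":    TileableDecoder(latent_channels=4, out_channels=3),
    "normal":    TileableDecoder(latent_channels=4, out_channels=3),
    "roughness": TileableDecoder(latent_channels=4, out_channels=1),
    "occlusion": TileableDecoder(latent_channels=4, out_channels=1),
}

latent = torch.randn(1, 4, 64, 64)  # stand-in for the denoised latent
maps = {name: dec(latent) for name, dec in decoders.items()}
```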
We introduce live character conversational interactions in Intermediated Reality to bring real-world objects to life in Augmented Reality (AR) using Artificial Intelligence (AI). The AI recognizes live speech and generates short character responses, syncing the character’s facial expressions with the speech audio. The Intermediated Reality AR warping technique achieves a high degree of realism in the animated facial expressions and movements by reusing the optical appearance of live video taken directly from the device’s embedded camera. The proposed applications of Intermediated Reality with AI are exemplified through the captivating fusion of these technologies in interactive toy storytelling broadcasts and social telepresence scenarios. This innovative combination allows individuals to engage with AI characters in a natural and intuitive manner, creating new opportunities for social engagement and entertainment.
We present a novel approach to 3D content creation via a platform that connects reality to AI on paper.
Traditional narrow-phase collision pipelines for rigid body dynamics typically consist of collision calculations between primitives, with a dedicated deepest-point function for each primitive pair, and a much slower general collision calculation between pairs of convex shapes. The mixture of different collision functions makes these methods CPU-cache unfriendly and unsuitable for modern GPU hardware. For many reinforcement learning applications, data transfers become the bottleneck of the whole pipeline.
To achieve stable behavior of rigid bodies without jittering, even on flat ground, it is necessary to use multiple contact points for each pair of objects. This is achieved either by keeping track of old collision points or by adding noise to the collision pair. With the first approach, a full set of collision points is obtained only after multiple frames; with the second, the computational cost increases by a factor equal to the number of samples.
In this project, we propose to tackle these issues by introducing: a) quadric decomposition, a data representation for both primitives and general shapes that describes each object as a union of intersections of planar and quadric inequalities; b) calculation of the deepest point as a maximization of depth over a set of quadric inequalities; c) prediction of the desired number of contact points as a maximization of the surface area of the intersection polygon; and d) a recurrent neural network (RNN) based solver to speed up the optimization process.
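The sketch below illustrates the deepest-point idea under stated assumptions: each shape is an intersection of quadric inequalities q_i(x) = x^T A_i x + b_i^T x + c_i <= 0, and a stand-in depth measure is maximized with an off-the-shelf SciPy solver rather than the RNN-based solver described above; the two spheres are purely for demonstration.

```python
# Hedged sketch (not the paper's exact formulation): a shape is an intersection
# of quadric inequalities q_i(x) <= 0. As a stand-in for the deepest point, we
# look for the point inside both shapes that maximizes t subject to
# q_i(x) <= -t for every quadric of both shapes; t > 0 indicates overlap.
import numpy as np
from scipy.optimize import minimize

def quadric(A, b, c):
    """Return q(x) = x^T A x + b.x + c as a callable."""
    A, b = np.asarray(A, float), np.asarray(b, float)
    return lambda x: float(x @ A @ x + b @ x + c)

# Demo shapes: unit sphere at the origin and a unit sphere centered at (1.5, 0, 0).
sphere_a = [quadric(np.eye(3), np.zeros(3), -1.0)]
sphere_b = [quadric(np.eye(3), np.array([-3.0, 0.0, 0.0]), 1.25)]

def deepest_point(quadrics_a, quadrics_b):
    quads = quadrics_a + quadrics_b
    # Decision variable y = (x, t); maximize t  <=>  minimize -t.
    objective = lambda y: -y[3]
    # SciPy inequality constraints require fun(y) >= 0, i.e. -q(x) - t >= 0.
    constraints = [{"type": "ineq", "fun": (lambda y, q=q: -q(y[:3]) - y[3])}
                   for q in quads]
    res = minimize(objective, np.zeros(4), constraints=constraints, method="SLSQP")
    return res.x[:3], res.x[3]  # point and its depth

point, depth = deepest_point(sphere_a, sphere_b)
print(point, depth)  # expected: roughly (0.75, 0, 0) with positive depth
```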
We present a novel live platform enhancing stage performances with real-time visual effects. Our demo showcases real-time 3D modeling, rendering and blending of assets, and interaction between real and virtual performers. We demonstrate our platform’s capabilities with a mixed reality performance featuring virtual and real actors engaging with an in-person audience.
Roblox is investing in generative AI techniques to revolutionize the creation process on its platform. By leveraging natural language and other intuitive expressions of intent, creators can build interactive objects and scenes without complex modeling or coding. The use of AI image generation services and large language models aims to make creation faster and easier for every user on the platform.
We present a novel marker-based motion capture (MoCap) technology. Instead of relying on initialization and tracking for marker labeling as traditional solutions do, our system is built upon real-time, low-latency data-driven models and optimization techniques, offering new possibilities and overcoming limitations currently present in the MoCap landscape. Although we similarly begin with unlabeled markers captured through optical sensing within a capture area, our approach diverges in that we follow a data-driven and optimization-based pipeline to simultaneously denoise the markers and robustly and accurately solve the skeleton per frame.
As with traditional marker-based options, our work demonstrates higher stability and accuracy than inertial and/or markerless optical MoCap. Inertial MoCap lacks absolute positioning and suffers from drift, making comparable positional accuracy almost impossible to achieve. Markerless solutions lack a strong prior (i.e., markers) to increase capturing precision, while, due to their heavier workload, the capture rate cannot easily scale, resulting in inaccuracies during fast movements.
On the other hand, traditional marker-based motion capture relies heavily on high-quality marker data, assuming precise localization, outlier elimination and consistent marker tracking. In contrast, our approach operates without such assumptions, effectively mitigating input noise including ghost markers, occlusions, marker swaps, misplacement and mispositioning. This noise tolerance enables our system to function seamlessly with lower-cost, lower-specification cameras. Our method introduces body-structure invariance, enabling automatic marker-layout configuration by selecting from a diverse pool of models trained with different marker layouts.
Our proposed MoCap technology integrates various consumer-grade optical sensors, leverages efficient data acquisition, achieves precise marker-position estimation and allows for spatio-temporal alignment of multi-view streams. Subsequently, by incorporating data-driven models, our system achieves low latency and real-time performance. Finally, efficient body-optimization techniques further improve the final MoCap solve, enabling seamless integration into applications requiring real-time, accurate and robust motion capture.
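As an illustration of what a per-frame body-optimization step can look like, the sketch below fits a rigid segment to its labeled markers with a standard Kabsch/Procrustes alignment; this is a generic stand-in shown under that assumption, not our actual solver.

```python
# Illustrative sketch only (a classic building block, not the actual solver):
# given corresponding local-layout and observed marker positions for one rigid
# segment, recover the rotation R and translation t that best align them.
import numpy as np

def fit_segment_pose(layout_local: np.ndarray, observed: np.ndarray):
    """layout_local, observed: (N, 3) arrays of corresponding marker positions."""
    mu_l, mu_o = layout_local.mean(axis=0), observed.mean(axis=0)
    H = (layout_local - mu_l).T @ (observed - mu_o)   # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))            # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_o - R @ mu_l
    return R, t                                       # observed ≈ R @ local + t
```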
In conclusion, real-time communities can benefit from our MoCap system, which is: a) affordable, using low-cost equipment; b) scalable, with processing on the edge; c) portable, with easy setup and spatial calibration; d) robust to heavy occlusions, marker removal and limited camera coverage; and e) flexible, requiring no super-precise marker placement, super-precise camera calibration, per-actor body calibration or manual marker configuration.
Tell Me, Inge… is a cross-platform VR/XR experience that allows the public to interact with Holocaust survivor Inge Auerbacher. Users are immersed in Inge’s memories as a child survivor of the Theresienstadt ghetto, asking questions in their own words and listening to her video responses combined with hand-drawn 3D animations within a 360-degree canvas. All reconstructions of Inge’s memories were based on her personal testimony, historical documentation, and location surveys. We created a novel pipeline for authoring 3D Quill animations in the PlayCanvas WebXR engine. The system is driven by StoryFile’s Conversa platform for AI training and conversational video streaming.