SIGGRAPH Real-Time Live! '24: Special Interest Group on Computer Graphics and Interactive Techniques Conference Real-Time Live!

AI in mixed reality - Copilot on HoloLens: Spatial computing with large language models

Mixed reality together with AI presents a human-first interface that promises to transform operations. Copilot can assist industrial workers in real time through speech and holograms: generative AI searches technical documentation, service records, training content, and other sources, and Copilot then summarizes the results to provide interactive guidance.
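
As a rough sketch of the retrieve-then-summarize pattern described above (and not Microsoft's actual Copilot pipeline), the following shows the shape of the loop with a toy corpus, a placeholder relevance scorer, and a placeholder summarization step.

```python
# Minimal retrieve-then-summarize sketch. The tiny corpus, the word-overlap
# scorer, and summarize() are illustrative placeholders, not Copilot or
# HoloLens APIs.
CORPUS = [
    "Torque spec for valve housing bolts: 24 Nm, tightened in a star pattern.",
    "Service record 2023-11: replaced the seal on pump B after a pressure drop.",
    "Training module 4: lock-out/tag-out procedure before opening the panel.",
]

def score(query: str, doc: str) -> float:
    # Cosine similarity on word sets; a real system would use text embeddings.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / ((len(q) * len(d)) ** 0.5)

def retrieve(query: str, k: int = 2) -> list[str]:
    return sorted(CORPUS, key=lambda doc: score(query, doc), reverse=True)[:k]

def summarize(query: str, passages: list[str]) -> str:
    # Placeholder for the LLM call that turns passages into spoken guidance.
    return f"For '{query}': " + " | ".join(passages)

query = "torque for the valve housing bolts"
print(summarize(query, retrieve(query)))
```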

Enhancing Narratives with SayMotion's text-to-3D animation and LLMs

SayMotion, a generative AI text-to-3D animation platform, uses deep generative learning and advanced physics simulation to transform text descriptions into realistic 3D human motions for applications in gaming, extended reality (XR), film production, education, and interactive media. SayMotion addresses the complexity of animation creation by employing a Large Language Model (LLM) fine-tuned for human motion, together with AI-based animation editing components, including spatial-temporal inpainting via a proprietary Large Motion Model (LMM). SayMotion pioneers the animation market by offering a comprehensive set of AI generation and AI editing functions for creating 3D animations efficiently and intuitively. With an LMM at its core, SayMotion aims to democratize 3D animation for everyone through language and generative motion.
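
As a toy illustration of what spatial-temporal inpainting means for a motion clip, the sketch below marks a joint-and-frame region to be regenerated while the rest of the clip stays fixed; the array layout, joint indices, and fill step are assumptions, not SayMotion's actual interface.

```python
import numpy as np

# Toy illustration of spatial-temporal inpainting on a motion clip.
# The (frames, joints, 3) layout and joint indices are assumptions; a real
# Large Motion Model would synthesize new motion for the masked region,
# conditioned on the surrounding clip and a text prompt.
frames, joints = 120, 24
clip = np.random.default_rng(0).standard_normal((frames, joints, 3)).astype(np.float32)

mask = np.zeros((frames, joints), dtype=bool)
mask[30:60, 15:20] = True        # spatial (a few joints) x temporal (frames 30-59)

context = clip.copy()
context[mask] = 0.0              # region handed to the generator; the rest is kept fixed
print("values to regenerate:", int(mask.sum()) * 3)
```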

GenUSD: 3D scene generation made easy

We introduce GenUSD, an end-to-end text-to-scene generation framework that transforms natural language queries into realistic 3D scenes, including 3D objects and layouts. The process involves two main steps: 1) A Large Language Model (LLM) generates a scene layout hierarchically. It first proposes a high-level plan that decomposes the scene into multiple functionally and spatially distinct subscenes. Then, for each subscene, the LLM proposes objects with detailed positions, poses, sizes, and descriptions. To manage complex object relationships and intricate scenes, we introduce object layout design meta functions as tools for the LLM. 2) A novel text-to-3D model generates each 3D object with surface meshes and high-resolution texture maps based on the LLM’s descriptions. The assembled 3D assets form the final 3D scene, represented in the Universal Scene Description (USD) format. GenUSD ensures physical plausibility by incorporating functions to prevent collisions.
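
A minimal sketch of the assembly step, assuming the LLM's layout has already been parsed into a nested dictionary and each generated asset exists on disk. It uses the standard pxr USD Python API, but the dictionary schema, prim names, and file paths are hypothetical.

```python
from pxr import Usd, UsdGeom, Gf

# Hypothetical layout as it might come back from the LLM planning step:
# subscenes -> objects with position, rotation, size, and a generated asset path.
layout = {
    "subscenes": [
        {"name": "ReadingNook", "objects": [
            {"name": "Armchair", "asset": "assets/armchair.usd",
             "position": (1.2, 0.0, -0.5), "rotation": (0.0, 90.0, 0.0),
             "size": (1.0, 1.0, 1.0)},
        ]},
    ],
}

stage = Usd.Stage.CreateNew("scene.usda")
UsdGeom.Xform.Define(stage, "/World")

for sub in layout["subscenes"]:
    UsdGeom.Scope.Define(stage, f"/World/{sub['name']}")
    for obj in sub["objects"]:
        xform = UsdGeom.Xform.Define(stage, f"/World/{sub['name']}/{obj['name']}")
        xform.AddTranslateOp().Set(Gf.Vec3d(*obj["position"]))
        xform.AddRotateXYZOp().Set(Gf.Vec3f(*obj["rotation"]))
        xform.AddScaleOp().Set(Gf.Vec3f(*obj["size"]))
        # Reference the mesh + textures produced by the text-to-3D model.
        xform.GetPrim().GetReferences().AddReference(obj["asset"])

stage.GetRootLayer().Save()
```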

Real-time AI and the Future of Creative Tools

KREA is a novel platform that integrates AI into the creative process, aiming to enhance how artists interact with artificial intelligence. This system combines recent advances in AI research, distributed systems, and human-computer interaction to facilitate real-time visual content creation. KREA’s architecture, which we term an “inverted videogame” system, enables AI-powered creation through web browsers, addressing challenges in latency, state management, and reliability in dynamic GPU compute environments. This paper presents the key innovations of KREA, including its unique distributed architecture, real-time AI generation capabilities, and user interface designed for intuitive AI-assisted creativity.
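
One way to picture the "inverted videogame" architecture is a game loop that lives on GPU servers while the browser only exchanges input state and generated frames. The sketch below is a minimal server loop under that assumption, using Python's asyncio and the websockets package as a stand-in transport; KREA's actual stack, endpoints, and message format are not public.

```python
import asyncio
import json

import websockets  # assumed transport; not KREA's actual infrastructure

async def generate_frame(state: dict) -> bytes:
    """Placeholder for the GPU diffusion step that renders the next frame."""
    await asyncio.sleep(0.03)                # stand-in for a ~30 ms generation budget
    return json.dumps({"frame_for": state}).encode()

async def session(ws):
    """One client session: the 'game' runs server-side on GPUs, and the browser
    only sends lightweight input state and receives freshly generated frames."""
    state = {"prompt": "", "strength": 0.8}
    async for message in ws:
        state.update(json.loads(message))    # latest user input wins
        await ws.send(await generate_frame(state))

async def main():
    async with websockets.serve(session, "0.0.0.0", 8765):
        await asyncio.Future()               # serve until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```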

Mesh Mortal Kombat: Real-time voxelized soft-body destruction

Mesh Mortal Kombat uses a new non-physical voxel constraint based on Gram-Schmidt orthonormalization within a position-based dynamics (PBD) framework. After the input mesh is voxelized, we create eight PBD particles per voxel, constrained to approximate the voxel's shape. Breakable face-to-face constraints between voxels, also based on Gram-Schmidt, are created for pairs of neighboring voxels. The regular voxel grid allows efficient parallel processing in a compute shader using Gauss-Seidel iterations with only one voxel-constraint partition and three face-constraint partitions, minimizing the number of compute shader dispatches and memory barriers per frame. Destruction is achieved by disabling face-to-face voxel constraints when the strain limit is exceeded. Anisotropic materials can be created by setting different strain limits on x-, y-, and z-faces, allowing meshes to be peeled into sheets, shredded into strips, or fractured into voxels.
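
A minimal numpy sketch of such a Gram-Schmidt voxel constraint, assuming one particle-ordering convention and a simple stiffness-weighted projection; the production version runs in a compute shader as described above.

```python
import numpy as np

# Sketch of a Gram-Schmidt-based voxel shape constraint. Particles are assumed
# ordered so that bit 0 of the index selects +x, bit 1 selects +y, and bit 2
# selects +z of the unit voxel.

def gram_schmidt_frame(p):
    """Orthonormal frame from a deformed voxel's averaged edge vectors.
    p: (8, 3) array of the voxel's PBD particle positions."""
    u = (p[1] - p[0] + p[3] - p[2] + p[5] - p[4] + p[7] - p[6]) / 4.0  # local x
    v = (p[2] - p[0] + p[3] - p[1] + p[6] - p[4] + p[7] - p[5]) / 4.0  # local y
    e0 = u / np.linalg.norm(u)
    v = v - (v @ e0) * e0                 # Gram-Schmidt step
    e1 = v / np.linalg.norm(v)
    e2 = np.cross(e0, e1)                 # completes a right-handed frame
    return np.stack([e0, e1, e2], axis=1)  # columns are the frame axes

def project_voxel(p, rest_local, stiffness=1.0):
    """One PBD projection of the non-physical voxel constraint.
    rest_local: (8, 3) rest-pose particle offsets from the voxel centre."""
    centre = p.mean(axis=0)
    R = gram_schmidt_frame(p)
    goal = centre + rest_local @ R.T      # rigidly rotated rest shape
    return p + stiffness * (goal - p)     # pull particles toward their goals
```

Breakable face-to-face constraints would apply the same kind of projection to the particles bordering a shared face of two neighboring voxels and are simply disabled once the measured strain on that face exceeds its per-axis limit, which is what produces peeling, shredding, or voxel fracture.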

MOVIN TRACIN': Move Outside the Box

We introduce MOVIN TRACIN’, an advanced AI-driven motion capture system that leverages a single LiDAR sensor for precise, real-time, full-body tracking. TRACIN’ streamlines the user experience by eliminating the need for traditional marker suits or body-mounted sensors while delivering motion capture accuracy comparable to conventional systems. The system captures full-body motion at 60 frames per second by upsampling a 20 fps LiDAR point cloud sequence, making it particularly effective for applications demanding high responsiveness, such as live streaming, metaverse platforms, gaming, and interactive media art.

Unlike traditional motion capture systems that often require multiple sensors or high-quality cameras with extensive calibration, our approach uses a single LiDAR sensor with instant calibration. This significantly reduces setup costs, time, and effort, making the system accessible to general users without the need for expert mocap engineers.

To address the limitations of a single-LiDAR setup, such as self-occlusion, we utilize a generative AI model that produces the most plausible output pose from the learned motion distribution, even with sparse or incomplete input point clouds. The model maintains real-time performance with an inference time of around 50 ms, making it suitable for latency-sensitive applications.

MOVIN TRACIN’ also provides an API and SDK for developers, enabling easy integration and streaming into game engines such as Unity and Unreal.
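
The sketch below shows one hypothetical way a 60 Hz pose stream could be pushed to a listener inside a game engine over UDP; it is illustrative only and does not reflect the actual MOVIN TRACIN' API-SDK or its message format.

```python
import json
import socket
import time

# Hypothetical pose-streaming sketch. A small receiver script inside Unity or
# Unreal would parse these UDP packets and drive an avatar. Port, packet
# layout, and joint encoding are all assumptions.
def stream_poses(pose_source, host="127.0.0.1", port=9000, fps=60):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    frame_time = 1.0 / fps
    for pose in pose_source:                 # e.g., 60 Hz joint transforms
        packet = json.dumps({"t": time.time(), "joints": pose}).encode()
        sock.sendto(packet, (host, port))
        time.sleep(frame_time)

# Example: stream a dummy 2-second clip of 24 identity-quaternion joints.
dummy_clip = [[[0.0, 0.0, 0.0, 1.0]] * 24 for _ in range(120)]
stream_poses(dummy_clip)
```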

In summary, MOVIN TRACIN’ represents a significant advancement in motion capture technology, offering accessible single-sensor setup with instant calibration, real-time full-body tracking at 60 fps, and straightforward integration with game engines.

Phases: Augmenting a Live Audio-Visual Performance with AR

Our performance combines stage visuals with Augmented Reality (AR) to demonstrate a novel way of experiencing live events. Our goal is to enhance the sociability of such events by showcasing the potential of AR glasses over smartphone technology, which forces the audience to look at their phones and isolates them from their environment. Our project explores how the latest AR and visual technologies enhance the spectator experience while retaining the social and inclusive environment of live performances.

Rodin: 3D Asset Creation with a Text/Image/3D-Conditioned Large Generative Model for Creative Frontier

"Rodin" leverages a Latent Diffusion Transformer and a 3D ConditionNet to facilitate the rapid generation of 3D assets from text, images, and direct inputs. This system streamlines the creation process, producing production-ready models optimized for real-time engines. This "Real-Time Live!" highlights Rodin’s innovative framework, its seamless integration into professional workflows, and its transformative impact on the 3D content creation industry.

Warudo: Interactive and Accessible Live Performance Capture

Warudo innovates virtual performances (VTubing) using accessible tracking technologies, procedurally-generated avatar motions, and a visual programming system to customize audience interactions. Its seamless approach enables creators with no technical background to design unique, collaborative live performances. We demonstrate an immersive and real-time VTubing performance driven by the SIGGRAPH audience.

Revolutionizing VFX Production with Real-Time Volumetric Effects

Zibra AI introduces a unified approach that integrates both the creation and rendering of VFX directly within game engines, aiming to streamline and enhance the production process. The new workflow is powered by two major innovations: ZibraVDB, an OpenVDB compression tool that enables real-time rendering of volumetric VFX, and Zibra Effects, a highly performant real-time VFX simulation technology. Both tools support in-engine workflows for Unreal Engine, Unity, or any other custom engine. ZibraVDB achieves up to a 150x compression rate while maintaining the high visual fidelity required for advanced production workflows such as virtual production and AAA game development; combined with a highly efficient custom render pass, it delivers significant rendering performance gains. Zibra Effects enables physics-based VFX simulation of liquids, smoke, and fire directly inside a game engine. It leverages proprietary AI-based SDF compression technology that reduces collider memory footprint by 40x on average. In combination with ZibraVDB, it covers the VFX creation workflow end to end: users can choose between real-time interactive effects or cache them into the ZibraVDB format for efficiency.
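
To get a feel for why volumetric caches need aggressive compression, the sketch below inspects a single frame of a hypothetical smoke cache with OpenVDB's Python bindings; the file name and the "density" grid name are assumptions, and ZibraVDB's own format and API are not shown.

```python
import pyopenvdb as vdb  # OpenVDB's Python bindings

# Inspect one frame of a (hypothetical) smoke cache to see how much data a
# volumetric sequence carries before a tool like ZibraVDB compresses it for
# real-time playback.
for grid in vdb.readAllGridMetadata("smoke_0001.vdb"):
    print("grid:", grid.name)

density = vdb.read("smoke_0001.vdb", "density")
voxels = density.activeVoxelCount()
print(f"active voxels: {voxels}")
print(f"~raw float data: {voxels * 4 / 2**20:.1f} MiB per frame")
```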