AI3D Desktop is an XR app platform that lets anyone create 3D models with AI in an intuitive and delightful way through skeuomorphic design motifs. Innovations include creating virtual humans and other AI3D objects with a hybrid desktop draft-to-reticle approach, AI3D user-created-content e-commerce presented as a virtual catalog, and aiQuery and sQuery language layers in RealityScript that make it easy to chain AI models and spatial computing HCI elements together (also represented as gears and cogs!). We showcase these approaches for all AI3D methods: text to 3D, text to image to 3D, image to 3D, and 3D to 3D, as well as multimodal AI3D methods for text/image/3D to scene and text/image/3D to interactive 3D. The current implementation targets Apple Vision Pro.
Balboa Park Alive! is a series of immersive, interactive phone-based installations that leverage Augmented Reality (AR) to explore the biodiversity of the San Diego/Tijuana region. The application combines Niantic Lightship ARDK, Mapbox, and mobile hand tracking in service of situated and embodied encounters with 3D digital renderings of some of the plants, insects, and animals that make this region one of the top biodiversity hotspots in the world. Unlike related conservation-centered mobile AR apps, Balboa Park Alive! prioritizes first-hand perspective-taking and evidence-based methods of inquiry, using guided movement and visualizations to invite families to engage with their surroundings and each other.
Bird listening is an application for finding and appreciating ten types of wild birds through spatialized bird sounds. When the user faces the direction of a bird’s call and holds a digital device in front of them, the bird appears through the leaves of the tree.
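The "facing the direction of a bird’s call" condition can be illustrated with a small heading check. This is only a minimal sketch under assumptions: the abstract does not describe how the app decides that the user is facing a bird, and the function names, the use of compass headings in degrees, and the 15-degree tolerance are all hypothetical.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two compass headings, in degrees."""
    # Shift by 180, wrap into [0, 360), then shift back so the result
    # lies in [-180, 180) before taking the absolute value.
    diff = (a_deg - b_deg + 180.0) % 360.0 - 180.0
    return abs(diff)


def is_facing(device_heading_deg, bird_bearing_deg, tolerance_deg=15.0):
    """Return True when the device points at the bird within a tolerance.

    tolerance_deg is a hypothetical threshold chosen for illustration;
    it is not taken from the application.
    """
    return angular_difference(device_heading_deg, bird_bearing_deg) <= tolerance_deg
```

Wrapping the difference keeps the check correct across north, so a device heading of 355 degrees and a bird bearing of 5 degrees count as 10 degrees apart, not 350.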
CoCapture is a mobile app tailored for group photography, allowing users to easily take great group photos and stunning scenic shots during travels, outings, or social events without the need for selfie sticks, tripods, or external help. The application helps capture cherished moments so that memorable experiences endure. In CoCapture’s method, users first capture a background shot, then take turns behind the camera, photographing the rest of the group so that every member, including the person who wants the group photo, appears in a shot. Once three photos are taken, CoCapture allows users to seamlessly merge these images into a complete group photo that includes all members.
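The merge step can be pictured as mask-based compositing. This is a minimal sketch under stated assumptions, not CoCapture's actual pipeline: the abstract does not say how the images are merged, so the sketch assumes the three photos are already aligned, that a per-pixel person mask is available for each group shot (segmentation is out of scope here), and that images are plain nested lists of pixel values. The function name `composite` is hypothetical.

```python
def composite(background, layers):
    """Paste masked pixels from each layer onto a copy of the background.

    background: 2D list of pixel values (the background shot).
    layers: list of (image, mask) pairs, where image has the same shape as
            the background and mask is a 2D list of booleans marking the
            pixels (assumed to be people) to copy from that image.
    """
    # Copy row by row so the original background is left untouched.
    result = [row[:] for row in background]
    for image, mask in layers:
        for y, mask_row in enumerate(mask):
            for x, keep in enumerate(mask_row):
                if keep:
                    result[y][x] = image[y][x]
    return result
```

Later layers overwrite earlier ones where masks overlap, so the order of the group shots decides who appears in front in this simplified model.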
Napkinmatic Ubiquitous is a seamless spatial computing app platform connecting AI to the real world – and back again. Simply take a picture of anything (or import an image) to load an AI Swiss army knife menu that lets you “chat with/narrate the image”, load relevant 3D models, chain models, create 3D from the image with AI3D [Chang 2023e], transform the image into a masterpiece painting with ControlNet [Zhang et al. 2023] [Chang 2023b], add interior design to an empty room, and more, then augment the result back into the real world.
The GEAR.sg App displays 3D models of The GEAR, a recently built research lab in Singapore. Using the smartphone camera, the app presents the building’s models and onsite photos on AR picture cards, which illustrate floor plans and allow users to move, flip, and combine the virtual models on them.
Applying large language models (LMs) to unlock natural-language interactions with game NPCs is an emerging frontier. However, today's LMs are prohibitively expensive to run locally, especially on mobile platforms, because of limited runtime resources, thermal issues, and strict FPS requirements. In this work, we identified a series of mobile-friendly small LMs called TinyStories [Eldan and Li 2023] that can generate simple but coherent English. We repurposed these models to generate NPC dialogue and combined them with Unity ML-Agents, creating an NPC that can perform complex actions and interact verbally with the player. Our implementation runs on current-generation mobile platforms at the level of performance required for real-time gaming. With this work, we offer a glimpse into the future of gaming: interacting with complex NPCs using natural speech. We also demonstrate that it is possible to run LMs alongside Unity ML-Agents on mobile with great results, which can significantly enhance accessibility and interactivity in games. Lastly, we want to draw the attention of academia and industry to mobile- and edge-friendly LMs, as we believe this will be fertile ground for exploration.
VolumeMatic is an Apple Vision Pro app that creates interactive volume apps from natural-language text, using the “chat to create” motif. The user can easily create 3D content with various AI3D creation methods, such as text to 3D and image to 3D, from different models and providers through an abstracted AI3D multimodal API. We use detected objects and the semantic relations among different HCI elements to enable natural-language creation of interactive spatial computing apps. With the app, we hope to launch an AI3D Foundation to help accelerate the advancement and impact of AI for 3D and interactive content (AI3D). We also present the AI3D Benchmark Card, which quickly summarizes the results from different models against a ground-truth mesh.