Rigging Magic

Ishayu Shikhare · Aidan Vogt

We are going to implement a physics simulation of a Magic 8 Ball on an NVIDIA GPU. We will simulate the fluid dynamics and the interaction between a die and the fluid, as well as an encasing body and the fluid. We will compare different design approaches to maximize performance while keeping the physics as realistic as possible.

Summary

We are going to simulate the fluid dynamics and the interaction between a die and the fluid as well as an encasing body and the fluid to simulate a magic 8 ball. This will be done as much as possible on the GPU — the goal is to simulate each particle on a different thread.

We aim to target the GHC GPUs by writing our simulator in CUDA. The workload is largely parallelizable across water particles, with interesting divergence cases since some particles will be interacting with the die.

Background

We will be using smoothed particle hydrodynamics (SPH) for the fluid simulation. The die will be regarded as one particle, and we will consider all of the various influences on the die in parallel. We think that every single step of this problem benefits from parallelism.

Pseudocode

High-level kernel structure and synchronization plan:

Die is shared
Die has interactions array

Each block writes deltas from all interacting particles into interactions array
Die sums array and then gets change

Loop over physics steps:
  Bucket all particles
  Spatial binning
  Thread sync

  Run fluid kernel
    Thread = particle and roughly block = bucket
    Figure out all interactions
    Figure out what threads are touching die
    Add die interactions to shared array
    Figure out who’s touching outside boundary
    Fix all particles running outside boundary
  Thread sync

  Run die kernel
    Sum all forces on die
    Update die

Challenges

Efficient neighbor search. In order for water particles to interact with each other, it needs to be able to know all the other water particles that are close to it. A brute force approach will be \(O(N^2)\) with the number of particles, which is unacceptable. We will need to experiment with more efficient particle representations that allow for finding neighbors faster.

Accumulating particle effects on the die. Every timestep, some of the particles will be interacting with the die. In order to update the die for the next timestep, we need a way to efficiently reduce all the contributions from those particles into a net force to be applied to the die.

Load balancing. If a thread has to do extra work when its particle is interacting with the die, then those threads will take longer — contributing to poor load balancing. We will need to address this through smarter work assignment schemes, or by splitting the particle processing into multiple phases.

Additional questions we’ll need to answer:

Design Tradeoffs

Platform Choice

We hope to target GHC and focus on using CUDA. This makes sense for our workload because it’s immensely parallel and in theory each particle is “simple” and does very similar things to each other, with divergence primarily around die/boundary interactions.

Resources: we will run our simulation on the GHC machines, and we will also take into account implementations found in research papers or open source projects for inspiration.

Goals and Deliverables

Plan to achieve

A realistic simulation of a single die suspended in a sphere of uniform fluid (every particle is the same) that runs and renders in real time. A demo suitable for the poster session, prioritizing a reasonable framerate (>= 30fps) while keeping approximations to a minimum.

Hope to achieve

Schedule

Week Focus
1 Learn the basic physics and scaffold out a CPU implementation.
2 Finish CPU implementation and vibe code a visualizer.
3 Start porting over to GPU.
4 Finish porting over to GPU.
5 Final optimizations and report.