Nadia Progress Report #3
Arrsh Khusaria / December 2025 (1041 Words, 6 Minutes)
PROGRESS REPORT WOHOO!
My exams just finished, and I thought I should work on NADIA before another big break for final exams, so here is the devlog.
The Goal: Build a spatial audio engine for a 5.1.4 Dolby Atmos setup. The Problem: I don’t own a 5.1.4 setup. I have a standard 2.1 system and a laptop.
This meant I couldn’t rely on “hearing” whether it worked. I had to rely on Math. If the vector math is correct, the sound must be correct.
Here is the deep dive into how I reverse-engineered the math of sound.
1. Relative Coordinates
The first challenge was coordinates. An audio engine shouldn’t care whether your room is 3 meters wide or 30 meters wide.
If I put a sound at [1.5, 3.0], is that the corner or the center?
The Analogy:
I imagined the room as a Unit Cube with coordinates ranging from -1 to +1. No matter how much you stretch the room, the “center” is always 0 and the “walls” are always 1.
- Center (Head): (0, 0, 0)
- Left Wall: X = -1.0
- Right Wall: X = +1.0
This makes the math “Room Agnostic.” Here is the normalization logic I wrote in relative_space.py:
# Center the room
room_center = dimensions / 2.0
# Calculate Relative Position
# If speaker is at [0, 5.5], and center is [2.75, 2.75]...
# We shift it to [-2.75, 2.75] and divide by center to get [-1.0, 1.0]
relative_pos = (absolute_pos - room_center) / room_center
Now, every speaker in the world exists in the same -1 to +1 universe.
A different way to understand this is that the relative coordinates act as percentages which can be applied to the dimensions of the room.
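To make the percentage idea concrete, here is a minimal, self-contained sketch of the normalization (the to_relative name and the 5.5 m square room are just illustrative assumptions, not the actual relative_space.py API):

```python
import numpy as np

def to_relative(absolute_pos, dimensions):
    """Map an absolute room position into the -1..+1 unit cube."""
    room_center = np.asarray(dimensions, dtype=float) / 2.0
    return (np.asarray(absolute_pos, dtype=float) - room_center) / room_center

# A 5.5 m x 5.5 m room, matching the numbers in the comments above:
print(to_relative([0.0, 5.5], [5.5, 5.5]))    # -> [-1.  1.]  (a corner)
print(to_relative([2.75, 2.75], [5.5, 5.5]))  # -> [0. 0.]    (the listener's head)
```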
2. The “Vector” in VBAP
Now, how do we tell the speakers how loud to be? I used Vector Base Amplitude Panning (VBAP).
The Analogy: Imagine you are mixing paint.
- Speaker A is Red paint.
- Speaker B is Blue paint.
- The Sound is Purple.
To get Purple, you need 50% Red + 50% Blue.
If you want Red-Violet, you need 80% Red + 20% Blue.
In Linear Algebra, the “Colors” are Vectors. If I want a sound P to appear somewhere between two speakers (L and R), I describe P as a mixture of L and R.
\[\vec{P} = g_1 \cdot \vec{L} + g_2 \cdot \vec{R}\]

The variables \( g_1 \) and \( g_2 \) are the scaling factors. They are also the Volume Gains.
The Math Trick: To find \( g_1 \) and \( g_2 \), we treat the speaker vectors as a Basis Matrix \( \mathbf{M} \). If we multiply the speaker matrix by the gains, we get the position. So, if we invert the speaker matrix and multiply it by the position, we get the gains.
\[\vec{g} = \mathbf{M}^{-1} \cdot \vec{P}\]
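To make the trick concrete, here is a small sketch of that inversion for a single pair of speakers in 2D (the speaker directions and the sound position are made-up numbers, not values from NADIA):

```python
import numpy as np

# Unit vectors pointing from the listener toward two speakers (made-up layout):
L = np.array([-np.sqrt(0.5), np.sqrt(0.5)])  # front-left, 45 degrees left of center
R = np.array([ np.sqrt(0.5), np.sqrt(0.5)])  # front-right, 45 degrees right of center

# Basis matrix with one speaker vector per column, so P = M @ g
M = np.column_stack([L, R])

# A sound slightly right of center, in front of the listener
P = np.array([0.3, 0.8])

# Invert the basis to recover the gains: g = M^-1 @ P
g = np.linalg.inv(M) @ P
print(g)  # ~[0.35, 0.78] -- the right speaker gets more volume, as expected
```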
3. The “Pancake” Crash

The math above works for one pair of speakers. But a room has 5, 7, or 10 speakers. I needed to stitch them together into a mesh of triangles.
My first attempt was the Convex Hull algorithm. Imagine wrapping a rubber sheet around all your speakers.
- Expectation: A beautiful geodesic dome.
- Reality: A crash.
LinAlgError: Singular Matrix.
The Diagnosis: Most 5.1 setups are flat. They are a ring, not a sphere. When you try to wrap a 3D volume around flat points, the “volume” of the triangles is zero. You cannot invert a matrix with zero volume. It’s the matrix equivalent of dividing by zero.
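Here is a tiny reproduction of that diagnosis: three coplanar speaker vectors (all at ear height) span zero volume, so NumPy refuses to invert the basis (coordinates are made up for illustration):

```python
import numpy as np

# Three speakers, all at ear height (Z = 0): a flat ring, not a dome.
M = np.column_stack([
    [ 1.0, 0.0, 0.0],   # right
    [-1.0, 0.0, 0.0],   # left
    [ 0.0, 1.0, 0.0],   # front
])

print(np.linalg.det(M))  # 0.0 -- the basis spans zero volume
np.linalg.inv(M)         # raises LinAlgError: Singular matrix
```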
The Fix: The Dimension Detective
I realized I had two different shapes, and I needed two different analogies to solve them. I wrote layout_analyzer.py to inspect the room topology.
Case A: The Pizza (2D Ring)
If the speakers are flat (Vertical Spread < 0.5):
- I ignore height completely.
- I sort the speakers by angle around the center.
- I connect neighbors like Slices of a Pizza.
- Sound pans from Slice A -> Slice B -> Slice C.
Case B: The Dome (3D Atmos)
If the speakers are tall (Vertical Spread > 0.5):
- Now we have a volume.
- I switch to the Geodesic Dome logic (3D Convex Hull).
- Sound pans across the surface of the dome.
if spread > 0.5:
    # "Dome Mode" - Use 3D Convex Hull
    return "3D", speakers
else:
    # "Pizza Mode" - Force Z to 0.0
    return "2D", flattened_speakers
4. The “Baker” (The Cheat Sheet Optimization)
We could run that np.linalg.inv() function every frame.
But matrix inversion is \( O(n^3) \). Doing that 60 times a second is wasteful.
I realized: The triangles don’t move. The relationship between Speaker A, Speaker B, and Speaker C is constant.
So I built “The Baker.”
It pre-calculates the inverted matrix for every triangle in the mesh and saves it to panes.json.
The Output (Simple Lookup):
{
  "ids": [0, 1, 6],
  "inv_matrix": [[0.2, -0.4, 0.0], [-0.2, 0.4, 0.0], [0.0, 0.0, 1.0]]
}
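A minimal sketch of what the baking step might look like, assuming each “pane” is one triangle of three speaker vectors (the bake_panes name is illustrative, not the real Baker code):

```python
import json
import numpy as np

def bake_panes(speakers, triangles, path="panes.json"):
    """Pre-invert the basis matrix of every triangle and dump the results to disk."""
    panes = []
    for tri in triangles:
        basis = np.column_stack([speakers[i] for i in tri])  # 3x3, one speaker per column
        panes.append({
            "ids": [int(i) for i in tri],
            "inv_matrix": np.linalg.inv(basis).tolist(),
        })
    with open(path, "w") as f:
        json.dump(panes, f, indent=2)
```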
Now, the runtime engine is incredibly dumb.
- Find which “Pizza Slice” or “Dome Triangle” the sound is in.
- Load that triangle’s pre-inverted matrix from the cheat sheet.
- Multiply: Gains = Matrix × Position_Vector.
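In code, the per-frame work is basically one multiply (a sketch only; frame_gains and the pane dict shape just follow the panes.json example above):

```python
import numpy as np

def frame_gains(position, pane):
    """One frame of work: multiply a pre-baked inverse by the sound position."""
    inv = np.array(pane["inv_matrix"])   # loaded once from panes.json, reused every frame
    return inv @ np.asarray(position)    # three gains, one per speaker in pane["ids"]
```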
5. The “Underground” Bug (The Cage)
I verified the math with a test script.
- Ceiling Test: (0, 0, 1.0) -> Hits Top Speakers.
- Floor Test: (0, 0, -1.0) -> Disaster.
The gains were huge negative numbers.
Why?
My virtual speakers were at ear height (Z = -0.1). I was trying to play a sound at the floor (Z = -1.0).
Geometrically, I was asking the math to reconstruct a point that was outside the vector cone.
The Analogy: The speaker mesh is a Cage. You can fly the sound anywhere inside the cage or on the bars (the mesh surface). If you try to put the sound outside the cage (underground), the only way to reconstruct it is with negative gains, and you can’t have negative sound.
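That also gives a cheap way to detect the problem: a point outside the cage is exactly a point whose gains come out negative no matter which triangle you try, so a sign check flags it (an illustrative sketch, not the eventual fix):

```python
import numpy as np

def inside_cage(position, panes, tol=1e-9):
    """A point is inside the mesh if at least one triangle can reproduce it
    with all-non-negative gains; 'underground' points fail in every triangle."""
    for pane in panes:
        gains = np.array(pane["inv_matrix"]) @ np.asarray(position)
        if np.all(gains >= -tol):
            return True
    return False
```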
Next Steps
I have the map (panes.json). I have the math verified.
The Python prototype is complete.
Next up is writing the actual real-time sound stuff in C. I won’t port this code to C, since it’s all pre-processing, and skipping the port makes my life easier. I’ll only do it if it ever becomes a bottleneck.
As you could’ve already guessed, I was really hungry while writing this, so I am going to grab a pizza. This eating session might last from a few days to a few months, but I’ll come back eventually.