Background

Hand Tracking: If you are curious about what hand tracking is, think VR headsets without the controllers. Here is a company video showing a couple of the spatial interactions, set to a nice marketing jingle: 1 minute hand tracking demo

Ultraleap

Our hand tracking software is deployed in various commercial settings, but mainly in AR and VR headsets. Varjo and Pico both include our hand tracking, and we also sell the hardware and software for touch-free kiosks used in restaurants and airports, namely the 3di.

Aspirationally, the hand tracking is paired with a haptic feedback system, so you can feel the virtual objects you are interacting with. The haptic system is very much a WIP, but the hand tracking is fairly mature. It’s pretty futuristic and worth checking out if you’re into that sort of thing.

Some Context

Name of the Game: We make and sell our own tracking hardware, but OEMs increasingly want to use their own cameras in headsets and just run our software. Even worse, they want to deploy them for dual use, like sharing a camera system between 6DoF tracking and hand tracking. This means we need to be able to run our models under a wide variety of conditions.

My Team

The job of our team was to find better ways to track hands on different devices, in different lighting conditions, and with different gestures.

Earlier Role as a Machine Learning Engineer

Starting out, I was mainly responsible for testing models by hand (no pun intended). To convince customers our models would work well for them, we needed a robust benchmark that could run on simulated data and serve as an honest point of comparison. I built a system of performance metrics that runs on simulated data and is used to compare models, giving our customers quantitative estimates of things like latency, false positive rate, and hand-tracking-specific metrics such as palm normal error. This system is now used to test all of our models and is a key part of our CI/CD pipeline.
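
To give a flavour of what one of these metrics looks like, here is a minimal sketch of palm normal error computed over simulated frames. The function name and array layout are my own shorthand for illustration, not the actual benchmark code.

```python
import numpy as np

def palm_normal_error_deg(pred_normals: np.ndarray, gt_normals: np.ndarray) -> float:
    """Mean angular error (degrees) between predicted and ground-truth palm normals.

    Both inputs are (N, 3) arrays, one row per simulated frame.
    """
    pred = pred_normals / np.linalg.norm(pred_normals, axis=1, keepdims=True)
    gt = gt_normals / np.linalg.norm(gt_normals, axis=1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())
```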

To iterate quickly internally, we also needed a way to catch specific performance regressions. To this end, I expanded what was basically a set of gesture-based unit tests into a comprehensive regression suite that gets more accurate the more it is used.
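
As a rough sketch of what such a regression gate can look like (the metric names, baseline values, and tolerance below are purely illustrative, not the real suite):

```python
# Illustrative baseline values for a couple of made-up gesture metrics.
BASELINE = {"pinch_angle_error_deg": 4.2, "point_jitter_mm": 1.1}
TOLERANCE = 0.10  # flag any metric that regresses by more than 10% over baseline

def find_regressions(current: dict) -> list:
    """Compare freshly measured metrics against the stored baseline."""
    failures = []
    for name, baseline_value in BASELINE.items():
        value = current.get(name)
        if value is not None and value > baseline_value * (1 + TOLERANCE):
            failures.append(f"{name}: {value:.2f} vs baseline {baseline_value:.2f}")
    return failures

if __name__ == "__main__":
    print(find_regressions({"pinch_angle_error_deg": 4.9, "point_jitter_mm": 1.0}))
```

As new gestures and failure modes turn up, they get added as new entries with their own baselines, which is what I mean by the suite getting more accurate as it gets used.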

Later Role as a Senior Machine Learning Engineer

As I got more experience, I moved more towards the research side, trying to train robust, well-behaved networks.

Together with our synthetic data team, I designed and implemented experiments around different architectures and dataset parameters like lighting, camera geometry, and gesture coverage to improve occlusion performance.
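
Roughly speaking, each experiment is a point in a grid over those dataset parameters. A toy sweep might look like this (the parameter names and values are illustrative, not the real rendering pipeline's schema):

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class RenderConfig:
    lighting: str           # e.g. "indoor", "low_light"
    camera_baseline_mm: float
    fov_deg: float
    gesture_set: str        # e.g. "core", "occlusion_heavy"

def sweep():
    """Enumerate dataset variants for an occlusion ablation."""
    for lighting, baseline, fov, gestures in product(
        ["indoor", "low_light"], [40.0, 64.0], [90.0, 120.0], ["core", "occlusion_heavy"]
    ):
        yield RenderConfig(lighting, baseline, fov, gestures)

if __name__ == "__main__":
    for cfg in sweep():
        print(cfg)
```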

I learned a lot about balancing tradeoffs between things like dexterity and robustness. For example, some models needed to track hands to centimeter accuracy at close range, others needed to keep tracking through lighting changes, while others still needed great depth estimation.

Here is a video showing our models running on a Lynx R1 mixed reality headset, one of many pieces of hardware we got our hands on: https://www.youtube.com/watch?v=LwOsgEXb9YY

I also became interested in finding better ways to train models, and started running curriculum learning experiments. My day-to-day work involved reading papers and using the ideas to design machine learning experiments for our detection and tracking problems. Research-wise, I designed and implemented experiments in curriculum learning, model compilation for inference optimization, and model architecture design. Later on, when attention models took off in full force, I did some work on vision transformers and shipped a suitable sub-quadratic alternative architecture.
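
For the curriculum learning part, the core idea is simply to order training samples by difficulty and widen the pool as training progresses. A minimal sketch, assuming each sample already has a scalar difficulty score (occlusion amount is one plausible choice; this is not a description of our exact setup):

```python
import numpy as np

def curriculum_batches(difficulties, batch_size, epochs, seed=0):
    """Yield (epoch, index_batch) pairs, gradually admitting harder samples.

    `difficulties` is a 1-D array of per-sample difficulty scores.
    """
    rng = np.random.default_rng(seed)
    order = np.argsort(difficulties)                  # easiest samples first
    n = len(order)
    for epoch in range(epochs):
        # Pool grows linearly from the easiest 30% of samples to the full dataset.
        frac = 0.3 + 0.7 * epoch / max(1, epochs - 1)
        pool = order[: max(batch_size, int(frac * n))]
        shuffled = rng.permutation(pool)
        for start in range(0, len(shuffled), batch_size):
            yield epoch, shuffled[start:start + batch_size]

if __name__ == "__main__":
    scores = np.random.default_rng(1).random(1_000)   # stand-in difficulty scores
    for epoch, batch in curriculum_batches(scores, batch_size=128, epochs=5):
        pass  # feed `batch` (sample indices) to the training step here
```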

tl;dr I worked on training low-power, on-device vision models for real-time pose tracking, optimizing tradeoffs between model performance, latency, power usage, accuracy, and robustness.