I am a Computer Science PhD student at the University of Colorado Boulder. I am broadly interested in representation learning for spatial and geographic applications.
Before graduate school, I worked as a Machine Learning Scientist at Orbital Sidekick in San Francisco. I was also a research intern in NASA Jet Propulsion Laboratory's Machine Learning and Instrument Autonomy Group, where I worked on improving the robustness of machine learning models on board large-scale airborne and spaceborne imaging spectrometers, advised by Andrew Thorpe and Steffen Mauceri.
I graduated from The Chinese University of Hong Kong with a major in Financial Technology and Computer Science. While at CUHK, I worked on Decentralized Machine Learning at the Network Science and Optimization Laboratory with Professor Hoi-To Wai, and on Adversarial Robustness with Professor Bei Yu.
Service is a major component of my non-research endeavors. Since 2024, I have worked with the Prison Mathematics Project as both a student instructor and an advocate during pre-clemency instructional sessions. I also serve as a volunteer tutor at Boulder County Jail's math circle. Since 2021, I have been affiliated with the 414LIFE program to end youth violence in Milwaukee, Wisconsin.
Geographic data is fundamentally local, with patterns clustering in specific areas like population centers and coastlines. Current machine learning models distribute representational capacity uniformly across the globe, limiting performance at the fine-grained resolutions needed for localized work. We propose using spherical Slepian functions to build a geographic location encoder that concentrates computational resources within regions of interest while scaling efficiently to high resolutions. For applications requiring broader geographic context, we introduce a hybrid approach combining Slepian and spherical harmonic functions that balances local and global performance while remaining well-behaved at the poles and preserving distances on the sphere's surface. Testing across five tasks spanning classification, regression, and image-enhanced prediction shows that our Slepian-based encodings outperform existing methods and remain effective across various neural network designs.
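To make the idea of a location encoder concrete, here is a minimal sketch of the global, spherical-harmonic half of such an encoding: it maps a (longitude, latitude) pair to a fixed feature vector that a downstream network can consume. The Slepian basis itself is obtained by solving a concentration eigenproblem over the region of interest and is omitted here; the function names and degree cutoff below are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch (not the paper's code): real spherical-harmonic
# location features for a (longitude, latitude) point, using SciPy's
# sph_harm. The degree cutoff `max_degree` is illustrative.
import numpy as np
from scipy.special import sph_harm  # complex Y_n^m(theta, phi)

def sh_encode(lon_deg, lat_deg, max_degree=10):
    """Map (lon, lat) to a (max_degree + 1) ** 2 dimensional feature vector."""
    theta = np.deg2rad(lon_deg % 360.0)   # azimuthal angle in [0, 2*pi)
    phi = np.deg2rad(90.0 - lat_deg)      # polar angle in [0, pi]
    feats = []
    for n in range(max_degree + 1):
        for m in range(-n, n + 1):
            y = sph_harm(abs(m), n, theta, phi)
            if m < 0:                      # sine component of the real basis
                feats.append(np.sqrt(2) * y.imag)
            elif m == 0:                   # zonal harmonic, already real
                feats.append(y.real)
            else:                          # cosine component of the real basis
                feats.append(np.sqrt(2) * y.real)
    return np.array(feats)

emb = sh_encode(lon_deg=-105.27, lat_deg=40.01)  # Boulder, CO
print(emb.shape)  # (121,)
```

In a hybrid scheme like the one described above, Slepian features would augment or replace these global features inside the region of interest, concentrating capacity where the data lives.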
Within the context of representation learning for Earth observation, geographic Implicit Neural Representations (INRs) embed low-dimensional location inputs (longitude, latitude) into high-dimensional embeddings through models trained on geo-referenced satellite image or text data. Despite the common aim of geographic INRs to distill Earth's data into compact, learning-friendly representations, we lack an understanding of how much information is contained in these Earth representations, and where that information is concentrated. The intrinsic dimension of a dataset measures the number of degrees of freedom required to capture its local variability, regardless of the ambient high-dimensional space in which it is embedded. This work provides the first study of the intrinsic dimensionality of geographic INRs. Analyzing INRs with ambient dimensions between 256 and 512, we find that their intrinsic dimensions fall roughly between 2 and 10 and are sensitive to changes in spatial resolution and input modalities during INR pre-training. Furthermore, we show that the intrinsic dimension of a geographic INR correlates with downstream task performance and can capture spatial artifacts, facilitating model evaluation and diagnostics. More broadly, our work offers an architecture-agnostic, label-free metric of information content that can enable unsupervised evaluation, model selection, and pre-training design across INRs.
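As one concrete way to estimate intrinsic dimension, the sketch below applies the TwoNN estimator (Facco et al., 2017), which infers dimensionality from the ratio of each point's two nearest-neighbor distances. The abstract does not commit to a specific estimator, so treat this as an illustrative assumption; `embeddings` stands in for INR outputs on a grid of locations.

```python
# Hypothetical sketch: TwoNN intrinsic-dimension estimate applied to a set of
# high-dimensional embeddings, e.g. a geographic INR evaluated on lon/lat points.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_id(embeddings):
    """Estimate intrinsic dimension from the ratio of 2nd/1st NN distances."""
    nn = NearestNeighbors(n_neighbors=3).fit(embeddings)
    dists, _ = nn.kneighbors(embeddings)   # column 0 is the point itself
    r1, r2 = dists[:, 1], dists[:, 2]
    keep = r1 > 0                          # drop exact duplicates
    mu = r2[keep] / r1[keep]
    # Maximum-likelihood fit: mu follows a Pareto law with shape d.
    return len(mu) / np.log(mu).sum()

# Sanity check: a 2-D manifold embedded in 256 ambient dims should give ID ~ 2.
rng = np.random.default_rng(0)
z = rng.normal(size=(2000, 2))
emb = np.tanh(z @ rng.normal(size=(2, 256)))
print(twonn_id(emb))  # ~2
```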
Arjun Ashok Rao,
Esther Rolf,
"Using Multiple Input Modalities Can Improve Data-Efficiency and OOD Generalization for ML with Satellite Imagery", TerraBytes Workshop @ ICML 2025, Proceedings of Machine Learning Research (PMLR).
Spotlight, Best Poster Award
Preliminary version accepted to ML4RS Workshop @ ICLR 2025.
Oral, Best Student Paper Award [Paper]
A large variety of geospatial data layers is available around the world, ranging from remotely sensed raster data like satellite imagery, digital elevation models, predicted land cover maps, and human-annotated data, to data derived from environmental sensors such as air temperature or wind speed data. A large majority of machine learning models trained on satellite imagery (SatML), however, are designed primarily for optical input modalities such as multi-spectral satellite imagery. To better understand the value of using other input modalities alongside optical imagery in supervised learning settings, we generate augmented versions of SatML benchmark tasks by appending additional geographic data layers to datasets spanning classification, regression, and segmentation. Using these augmented datasets, we find that fusing additional geographic inputs with optical imagery can significantly improve SatML model performance. Benefits are largest in settings where labeled data are limited and in geographic out-of-sample settings, suggesting that multi-modal inputs may be especially valuable for the data-efficiency and out-of-sample performance of SatML models. Surprisingly, we find that hard-coded fusion strategies outperform learned variants, with interesting implications for future work.
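To make "fusing additional geographic inputs with optical imagery" concrete, here is a hedged PyTorch sketch of input-level (early) fusion, one simple hard-coded strategy: auxiliary layers are resampled to the image grid and concatenated channel-wise before the encoder. Class names, channel counts, and shapes are illustrative assumptions, not the benchmark code.

```python
# Hypothetical sketch of early fusion: extra geographic layers (e.g. elevation,
# land cover) are concatenated with the optical bands before a small CNN.
import torch
import torch.nn as nn

class EarlyFusionNet(nn.Module):  # illustrative name
    def __init__(self, n_optical=4, n_aux=2, n_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(n_optical + n_aux, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, n_classes),
        )

    def forward(self, optical, aux):
        # aux: (B, n_aux, H, W) geographic layers aligned to the optical patch
        return self.backbone(torch.cat([optical, aux], dim=1))

model = EarlyFusionNet()
optical = torch.randn(8, 4, 64, 64)  # e.g. RGB + NIR patch
aux = torch.randn(8, 2, 64, 64)      # e.g. elevation + air-temperature layers
print(model(optical, aux).shape)     # torch.Size([8, 10])
```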
Methane is a highly potent greenhouse gas, and accurately measuring methane emissions is an important step in addressing global environmental change. A large number of human-caused methane emissions originate from small ground structures known as "point-source emitters". These structures typically span a few meters wide and release highly concentrated methane. Current work on detecting these emissions uses machine learning models trained to identify point-source emissions from aerial imagery. However, these models require large amounts of expensive labeled data and have shown an inability to adapt to new terrain and environments. To increase the amount of methane emission data available to scientists, we use a collection of synthetic methane measurements. This data is generated with a mathematical model built to realistically replicate the three-dimensional distribution of methane observed in aerial imagery. We train a Convolutional Neural Network on the generated synthetic data and test our model on realistic datasets containing airborne imagery from diverse terrain and environmental conditions. Training models on a combination of synthetic and real methane data helps reduce false positives on previously unseen scenes.
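Mechanically, combining synthetic and real data for training can be as simple as concatenating the two sets and shuffling; the following is a minimal PyTorch sketch under that assumption, with placeholder tensors standing in for the actual plume retrievals and simulator outputs.

```python
# Hypothetical sketch: training on a mix of real and synthetic plume samples.
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Placeholder stand-ins for real airborne retrievals and simulated plumes.
real = TensorDataset(torch.randn(500, 1, 128, 128), torch.randint(0, 2, (500,)))
synthetic = TensorDataset(torch.randn(5000, 1, 128, 128), torch.randint(0, 2, (5000,)))

# Shuffling the concatenated set interleaves real and synthetic samples.
loader = DataLoader(ConcatDataset([real, synthetic]), batch_size=32, shuffle=True)
for x, y in loader:
    ...  # standard CNN training step on the mixed batch
    break
```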
The Chinese University of Hong Kong Network Science and Optimization Laboratory
Advisor: Prof. Hoi-To Wai
Winter + Fall Research Intern | Topic: Decentralized Optimization
In decentralized consensus optimization, data is partitioned privately among \(N\) workers, and the goal is to solve \(\min_{\theta} \frac{1}{N}\sum_{i=1}^{N} f_{i}(\theta)\), where each worker \(i\) holds only its own objective \(f_{i}(\theta)\), while ensuring that all \(N\) workers reach consensus on the parameter \(\theta\). The catch: the \(N\) workers are distributed over a sparse graph topology and can only communicate with their immediate neighbours. Applications include sensor networks and privacy-preserving machine learning.
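As a concrete illustration (not the specific algorithm from this internship), below is a minimal NumPy sketch of decentralized gradient descent (DGD) on a ring topology: each worker gossips with its immediate neighbours through a mixing matrix W, then takes a local gradient step on its private objective. Problem sizes and the step size are illustrative.

```python
# Minimal DGD sketch: N workers on a ring, each with a private least-squares
# objective f_i(theta) = 0.5 * ||A_i theta - b_i||^2.
import numpy as np

rng = np.random.default_rng(0)
N, d = 8, 5
A = rng.normal(size=(N, 20, d))                   # worker i's private data
b = rng.normal(size=(N, 20))
grad = lambda i, th: A[i].T @ (A[i] @ th - b[i])  # gradient of f_i

# Doubly stochastic mixing matrix for a ring: average self + two neighbours.
W = np.zeros((N, N))
for i in range(N):
    W[i, i] = W[i, (i - 1) % N] = W[i, (i + 1) % N] = 1 / 3

theta = np.zeros((N, d))                          # one local iterate per worker
for _ in range(500):
    # One DGD step: gossip with neighbours, then a local gradient step.
    theta = W @ theta - 0.01 * np.array([grad(i, theta[i]) for i in range(N)])

print(np.std(theta, axis=0).max())  # small spread: workers near consensus
```

With a fixed step size, DGD drives the workers to a small neighbourhood of the consensus optimum; diminishing step sizes or gradient-tracking variants remove the residual error.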
Nov 2020 – May 2021
The Chinese University of Hong Kong Department of Computer Science and Engineering
Advisor: Prof. Bei Yu
Summer Research Intern | Topic: Adversarial Robustness
Adversarial examples, which can be visualized as imperceptible 'distribution' shifts in the data, are a natural consequence of the gap between high-dimensional inputs and the (locally) linear models trained on them. They generalize across different architectures and can be used in a 'black-box' fashion to threaten real-world deep learning models. The most common strategy for defending against test-time attacks has been to train models on adversarial data, thus ensuring some 'robustness' against standard attacks.
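As a concrete illustration of that defense, here is a minimal PyTorch sketch of one FGSM adversarial-training step (Goodfellow et al., 2015): craft a perturbed batch with a single signed-gradient step on the input, then train on it. The model and data are placeholders, and eps is the perturbation budget.

```python
# Minimal FGSM adversarial-training step; model and batch are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
eps = 0.1                                  # L-infinity perturbation budget

x = torch.rand(32, 1, 28, 28)              # placeholder image batch in [0, 1]
y = torch.randint(0, 10, (32,))

# Craft adversarial examples: one signed-gradient ascent step on the input.
x.requires_grad_(True)
loss = F.cross_entropy(model(x), y)
grad_x, = torch.autograd.grad(loss, x)
x_adv = (x + eps * grad_x.sign()).clamp(0, 1).detach()

# Train on the perturbed batch to gain robustness against this attack.
opt.zero_grad()
F.cross_entropy(model(x_adv), y).backward()
opt.step()
```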