Rohit Patel

Research and Publications

WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-world scenarios, with MSL and Meta Reality Labs, NeurIPS
The first benchmark designed to evaluate visual question answering capabilities of multi-modal AI assistants on wearable devices like smart glasses. Unlike prior benchmarks built on high-quality third-person imagery, WearVQA reflects the challenges of egocentric interaction with occluded, poorly lit, or blurry visual inputs. The benchmark comprises 2,500 curated image-question-answer triplets spanning 7 image domains, 10 cognitive task types, and 6 wearables-specific quality issues. Open-source and proprietary multi-modal LLMs achieved only 24–52% accuracy on WearVQA, with substantial drops on lower-quality images and reasoning-heavy tasks.

Understanding Reinforcement Learning for Model Training, and future directions with GRAPE
This paper provides a self-contained, from-scratch exposition of key algorithms for instruction tuning of models: SFT, Rejection Sampling, REINFORCE, Trust Region Policy Optimization (TRPO), Proximal Policy Optimization (PPO), Group Relative Policy Optimization (GRPO), and Direct Preference Optimization (DPO). Explanations of these algorithms often assume prior knowledge, lack critical details, and/or are overly generalized and complex. Here, each method is developed step by step using simplified, explicit notation focused on LLMs, aiming to eliminate ambiguity and provide a clear and intuitive understanding of the concepts. By minimizing detours into the broader RL literature and connecting concepts directly to LLMs, we eliminate superfluous abstractions and reduce cognitive overhead. Following this exposition, we review the literature on newer techniques and approaches beyond those covered in detail. Finally, we present new ideas for research and exploration in the form of GRAPE (Generalized Relative Advantage Policy Evolution).

The Llama 3 Herd of Models, with Meta GenAI team
This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.

Meta Llama 3.2, with Meta GenAI team
Meta AI's Llama 3.2 release introduces small and medium-sized vision LLMs and lightweight text models optimized for edge and mobile devices. These models support context lengths of 128K tokens and excel at on-device tasks such as summarization and instruction following. Llama Stack distributions simplify deployment across environments and include integrated safety features. This update improves Llama's capabilities, modifiability, and cost efficiency for generative AI applications.

Meta Llama 3 large language model, with Meta GenAI team
Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many available open-source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety.

Costly Inspection and Money Burning, with Can Urgun
A principal designs a mechanism to allocate an indivisible, productivity-increasing good to one of many agents. Monetary transfers are not allowed. Instead, we consider the interplay between two instruments that have previously been studied only in isolation: “costly verification of the agent’s type” and “money burning”. Using a graph-theoretic approach, we characterize the optimal mechanism completely.

Costly Verification by an Intermediary in a Two-Sided Market
This paper studies the players in a two-sided market when the good sold is of uncertain value, network effects are in play, and verifying the value of the good is costly. It describes the mechanism that maximizes the intermediary's revenue.

Indices for Dynamic Pricing in the Event Ticketing Industry
This paper introduces new price indices and measures to facilitate dynamic pricing in the sports ticketing industry.

A Seller's Problem and Costly Verification, with Can Urgun
This paper outlines the revenue-maximizing strategy for a seller when the buyer's willingness to pay is unknown but can be discovered at a cost (e.g., through market research).

Differentiating Reputations in Dynamic Duopolies, with R. Andrew Butters
This paper outlines how reputation in a marketplace can be used to boost prices and revenue in dynamic duopolies, and how firms end up with different product qualities and prices in a dynamic equilibrium.

Speaking

Upcoming

CES 2026 | Las Vegas, NV | January 6-9, 2026
MIT EmTech | Athens, GR | March 19-20, 2026

Past

AUTONOMOUS | Virtual | December 3-4, 2025
Tech Basel Miami AI Summit | Miami, FL | December 3, 2025
ISG AI Impact Summit | New York, NY | November 17-18, 2025
TechCrunch Disrupt | San Francisco, CA | October 27-29, 2025
NDSML Summit | Stockholm, SE | October 22-23, 2025
Ai4 | Las Vegas, NV | August 11-13, 2025
J.P. Morgan Quantitative Conference | New York, NY | June 2025
SINFO 32 | Lisbon, PT | February 2025
The AI Summit New York | New York, NY | December 2024
