
Rohit Patel

I find joy in math and coding. At the moment, I work at MSL (Meta Superintelligence Labs) on our AI models. Before Meta, I worked with C-suite executives of private equity-owned businesses to drive impact. I enjoy sharing what I learn; here is a plug for an article I wrote, Understanding LLMs from Scratch, that people seemed to like (it became one of Medium's most shared stories of 2024). I hope you enjoy it. I also enjoy speaking about topics I care about.

Research and Publications

WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-world scenarios, with MSL and Meta Reality Labs, NeurIPS
The first benchmark designed to evaluate visual question answering capabilities of multi-modal AI assistants on wearable devices like smart glasses. Unlike prior benchmarks with high-quality third-person imagery, WearVQA reflects the challenges of egocentric interaction with occluded, poorly lit, or blurry visual inputs. The benchmark comprises 2,500 curated image-question-answer triplets spanning 7 image domains, 10 cognitive task types, and 6 wearables-specific quality issues. Open-source and proprietary multi-modal LLMs achieved only 24–52% accuracy on WearVQA, with substantial drops on lower-quality images and reasoning-heavy tasks.

Understanding Reinforcement Learning for Model Training, and future directions with GRAPE
This paper provides a self-contained, from-scratch exposition of key algorithms for instruction tuning of models: SFT, Rejection Sampling, REINFORCE, Trust Region Policy Optimization (TRPO), Proximal Policy Optimization (PPO), Group Relative Policy Optimization (GRPO), and Direct Preference Optimization (DPO). Explanations of these algorithms often assume prior knowledge, lack critical details, and/or are overly generalized and complex. Here, each method is discussed and developed step by step using simplified and explicit notation focused on LLMs, aiming to eliminate ambiguity and provide a clear and intuitive understanding of the concepts. By minimizing detours into the broader RL literature and connecting concepts to LLMs, we eliminate superfluous abstractions and reduce cognitive overhead. Following this exposition, we provide a literature review of new techniques and approaches beyond those detailed. Finally, new ideas for research and exploration in the form of GRAPE (Generalized Relative Advantage Policy Evolution) are presented.
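To give a flavor of the kind of algorithm the paper develops, the group-relative advantage at the heart of GRPO can be sketched in a few lines. This is a simplified illustration of the general idea, not code from the paper; the reward values are made up.

```python
import numpy as np

# Rewards for a group of sampled completions for the same prompt.
rewards = np.array([0.2, 0.9, 0.5, 0.4])

# GRPO-style group-relative advantage: standardize each completion's
# reward against the group's mean and standard deviation.
advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Completions above the group mean get a positive advantage and are
# reinforced; those below the mean get a negative advantage.
```

The standardization means no separate value network is needed to estimate a baseline; the group itself serves as the baseline.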

The Llama 3 Herd of Models, with Meta GenAI team
This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.

Meta Llama 3.2, with Meta GenAI team
Meta AI's Llama 3.2 release introduces small and medium-sized vision LLMs and lightweight text models optimized for edge and mobile devices. These models support context lengths of 128K tokens and excel in on-device tasks like summarization and instruction following. Llama Stack distributions simplify deployment across environments with integrated safety features. This update enhances Llama's capabilities, modifiability, and cost efficiency, driving innovation in generative AI applications.

Meta Llama 3 large language model, with Meta GenAI team
Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety.

Costly Inspection and Money Burning, with Can Urgun
A principal designs a mechanism to allocate an indivisible, productivity-increasing good to one of many agents. Monetary transfers are not allowed. Instead, we consider the interplay between two instruments previously studied only in isolation: “costly verification of the agent’s type” and “money burning”. We use a graph-theoretic approach and characterize the optimal mechanism completely.

Costly Verification by an Intermediary in a Two Sided Market
This paper studies the players in a two-sided market when the good sold is of uncertain value, network effects are in play, and there is a cost to verifying the value of the good. It describes the optimal mechanism to maximize the revenue for the intermediary.

Indices for Dynamic Pricing in the Event Ticketing Industry
This paper introduces new price indices and measures to facilitate dynamic pricing in the sports ticketing industry.

A Seller's Problem and Costly Verification, with Can Urgun
This paper outlines the optimal strategy to maximize the revenue of a seller when the buyer's willingness to pay is unknown, but can be discovered at a cost (e.g., through market research).

Differentiating Reputations in Dynamic Duopolies, with R. Andrew Butters
This paper outlines how reputation in a marketplace can be used to boost prices and revenue in dynamic duopolies, and how firms end up with different product qualities and prices in a dynamic equilibrium.

Advisory

BayPine | Digital Advisory Board
I serve on BayPine's Digital Advisory Board, where I advise on the use and implications of AI across private equity sourcing and investments, as well as portfolio companies' value creation plans.

Speaking

I regularly speak on topics including AI, evaluations, and building effective AI agents. I've spoken at conferences such as CES, MIT Emtech, and TechCrunch Disrupt:

  • eMerge Americas | Miami, FL | April 22-24, 2026
  • Fintech Americas | Miami, FL | March 24-26, 2026
  • MIT Emtech | Athens, GR | March 19-20, 2026
  • CES 2026 | Las Vegas, NV | January 6-9, 2026
  • AUTONOMOUS | Virtual | December 3-4, 2025
  • Tech Basel Miami AI Summit | Miami, FL | December 3, 2025
  • ISG AI Impact Summit | New York, NY | November 17-18, 2025
  • TechCrunch Disrupt | San Francisco, CA | October 27-29, 2025
  • NDSML Summit | Stockholm, SE | October 22-23, 2025
  • Ai4 | Las Vegas, NV | August 11-13, 2025
  • JP Morgan Quantitative Conference | New York, NY | June 2025
  • SINFO 32 | Lisbon, PT | February 2025
  • The AI Summit New York | New York, NY | December 2024

Articles and Essays

Some of my writing lives outside of research papers. Here are a few essays and explainers I have published online.

The Kind Of Intelligence We Are Building With AI, And What It Means To Be Human | Forbes
A reflection on the kind of intelligence current AI systems embody, and what that means for how we think about being human. It contrasts creative fluency in modern models with their continuing weaknesses in rigorous logic and reasoning.

What does the future of AI look like if we hit the LLM scaling wall? | Medium
An essay on what comes next if frontier LLM scaling slows down. The argument is that small models, scaled inference, and AI agents may become the more important path forward.

Understanding reinforcement learning for model training from scratch | Medium
A first-principles walkthrough of how pre-trained models become instruction-tuned models, covering supervised fine-tuning, rejection sampling, and reinforcement learning methods for LLM training.

An intuitive treatment of Negative log-likelihood, Cross entropy, KL divergence, and Importance sampling | Medium
A plain-English treatment of core ideas behind modern model training, including negative log-likelihood, cross entropy, KL divergence, and importance sampling.
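The four quantities that essay covers can be illustrated in a few lines of NumPy. This is my own minimal sketch with toy distributions, not code from the article:

```python
import numpy as np

# Two toy discrete distributions over three outcomes.
p = np.array([0.7, 0.2, 0.1])  # "true" distribution
q = np.array([0.5, 0.3, 0.2])  # model distribution

# Negative log-likelihood of one observed outcome (index 0) under q.
nll = -np.log(q[0])

# Cross entropy H(p, q) = -sum_i p_i log q_i.
cross_entropy = -np.sum(p * np.log(q))

# KL divergence KL(p || q) = sum_i p_i log(p_i / q_i),
# which equals H(p, q) minus the entropy of p.
kl = np.sum(p * np.log(p / q))

# Importance sampling: estimate E_p[f(x)] using samples drawn from q,
# reweighting each sample by the ratio p(x) / q(x).
rng = np.random.default_rng(0)
f = np.array([1.0, 2.0, 3.0])          # arbitrary function values f(x)
xs = rng.choice(3, size=100_000, p=q)  # samples from q, not p
estimate = np.mean(f[xs] * p[xs] / q[xs])
exact = np.sum(p * f)                  # E_p[f] computed directly
```

With enough samples, the importance-sampling estimate converges to the exact expectation under p even though every sample came from q.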

Understanding LLMs from Scratch Using Middle School Math | Medium
A self-contained explanation of how LLMs work, starting from simple arithmetic and building up to transformers. It is written to make the core mechanics accessible without assuming prior machine learning background.

How to do the Price-Volume-Mix waterfall right | Medium
An explanation of how to correctly decompose revenue changes into price, volume, and mix effects, and why common PVM waterfall analyses often get the attribution wrong.
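As a toy illustration of the idea (my own sketch with made-up numbers and one common convention, not the article's method), a two-product price-volume-mix decomposition might look like:

```python
# Toy price-volume-mix decomposition for a two-product portfolio.
prices_old = {"A": 10.0, "B": 20.0}
prices_new = {"A": 11.0, "B": 19.0}
units_old = {"A": 100, "B": 50}
units_new = {"A": 80, "B": 70}

rev_old = sum(prices_old[k] * units_old[k] for k in prices_old)
rev_new = sum(prices_new[k] * units_new[k] for k in prices_new)

# Price effect: price changes applied at the new volumes.
price_effect = sum(
    (prices_new[k] - prices_old[k]) * units_new[k] for k in prices_old
)

# Volume effect: change in total units at the old average price.
total_old = sum(units_old.values())
total_new = sum(units_new.values())
volume_effect = (total_new - total_old) * (rev_old / total_old)

# Mix effect: the remainder, i.e. the shift between products
# at old prices once price and volume effects are removed.
mix_effect = (rev_new - rev_old) - price_effect - volume_effect
```

In this example total units are unchanged, so the volume effect is zero and almost all of the revenue change comes from mix, the shift toward the higher-priced product B. Attributing that shift to price or volume is exactly the kind of mistake the article discusses.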

Resume