One of the most requested features for New York Times Cooking was metric versions of the recipes, but the effort required to create accurate conversions that users could trust made it impossible to deliver.
With the advent of large language models, the Cooking team hoped this feature could finally be achieved, but their initial results were disappointing. Even with extensive prompting, the models' outputs didn't adhere to editorial standards, contained incorrect conversion calculations, and introduced critical errors into the recipe instructions.
Based on our research developing LLM workflows, I identified a path forward:
- Creating a golden dataset of examples and an evaluation system.
- Making the task smaller by converting individual ingredients and steps instead of full recipes.
- Splitting the task into multiple stages, with the LLM handling text understanding and generation and regular code handling the mathematical calculations.
- Identifying areas where more knowledge was needed: LLMs don't know the correct ratios for converting volume to weight, so we needed to create a knowledge base for this calculation.
With these principles we were able to complete recipe conversion for the entire archive of 24,000 recipes (which required over 1 million individual LLM generations).
The conversions are highly accurate, and the first-of-its-kind system to convert dynamically from volume to weight makes the recipes dramatically more usable for audiences that prefer metric units.
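
To make the split between LLM work and regular code concrete, here is a minimal sketch of the deterministic half of such a pipeline, plus a tiny golden-dataset check. It is an illustration under stated assumptions, not the production system: `ParsedIngredient` stands in for the structured output an LLM extraction stage might return, and the `GRAMS_PER_CUP` ratios, the rounding to the nearest 5 g, and the `GOLDEN` cases are all hypothetical placeholders.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional

# Hypothetical knowledge base: grams per US cup. Illustrative values only,
# not the ratios used in the real system.
GRAMS_PER_CUP = {
    "all-purpose flour": 125,
    "granulated sugar": 200,
    "butter": 227,
}

# US volume units expressed as fractions of a cup.
CUPS_PER_UNIT = {
    "cup": Fraction(1),
    "tablespoon": Fraction(1, 16),
    "teaspoon": Fraction(1, 48),
}


@dataclass
class ParsedIngredient:
    """Structured output we assume an LLM extraction stage would produce."""
    quantity: Fraction
    unit: str
    name: str


def to_grams(item: ParsedIngredient, round_to: int = 5) -> Optional[int]:
    """Deterministic stage: convert volume to weight with plain arithmetic.

    Returns None when the knowledge base has no ratio for the ingredient
    or unit, so the case can be flagged instead of guessed.
    """
    grams_per_cup = GRAMS_PER_CUP.get(item.name)
    cups_per_unit = CUPS_PER_UNIT.get(item.unit)
    if grams_per_cup is None or cups_per_unit is None:
        return None
    grams = item.quantity * cups_per_unit * grams_per_cup
    # Round to a kitchen-friendly increment (assumed here to be 5 g).
    return round(grams / round_to) * round_to


# Hypothetical golden dataset: (parsed ingredient, expected grams).
GOLDEN = [
    (ParsedIngredient(Fraction(2), "cup", "all-purpose flour"), 250),
    (ParsedIngredient(Fraction(1, 2), "cup", "butter"), 115),
]


def accuracy(cases) -> float:
    """Share of golden cases where the conversion matches the expected value."""
    hits = sum(1 for item, expected in cases if to_grams(item) == expected)
    return hits / len(cases)
```

Because the arithmetic lives in ordinary code, a wrong result can be traced to either the parsed input or a knowledge-base entry, and the golden set can be re-run after every prompt or ratio change.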

Converting baked recipes accurately was critical.
A quick re-ranking of major landmarks in human civilisation:
- 18000BC: first use of pottery
- 4000BC: wheel invented
- 3999BC to 2023: writing, electricity, penicillin, etc
- 2024: New York Times cooking app adds metric conversions so no one has to Google “how many gms in stick of butter???”
— Richard Adams (@richada.bsky.social) Dec 11, 2024 at 12:19 PM
Users were enthusiastic.