Evaluating LLMs with Leva
with Kieran Klaassen
About This Episode
In this episode of the Ruby AI Podcast, host Valentino Stoll talks with special guest Kieran Klaassen, a prominent figure in the Ruby AI space. Kieran recently gave a talk at the San Francisco Ruby Meetup about his new gem, Leva, which focuses on LLM evaluations in Ruby. He discusses his background, his passion for AI and Ruby, and his journey building AI products, including Cora, a tool that helps manage email inboxes by using AI to categorize and summarize emails. Together, Valentino and Kieran explore the process, challenges, and best practices of creating AI-driven gems and tools in Ruby, the importance of evaluations, and the fun and creative aspects of integrating AI into Ruby on Rails projects.
Mentioned in the show:
Kieran Klaassen – Ruby developer, creator of Cora and Leva.
Leva gem – Kieran's LLM evaluation framework for Rails.
Jumpstart Pro – “the best Ruby on Rails SaaS template out there”.
Stepper / Stepper Motor (workflow engine) – a “journey” with steps for background jobs.
Jaccard Index – a metric for set similarity (|A∩B|/|A∪B|).
LangSmith – a platform for building production-grade LLM applications.
Morph LLM – “The Fastest Way to Apply AI Edits” (4500+ tokens/sec).
Friday AI Agent – an AI-powered coding agent that handles PRs from start to finish.
DSPy.rb – a framework for building AI agents and optimizing prompts.
Highlights:
00:00 Introduction and Guest Welcome
00:53 Kieran's Background and AI Journey
01:20 Building AI Tools and the Leva Gem
03:47 Challenges and Best Practices in AI Development
07:16 Evaluations and Real-World Applications
07:36 Community Recognition and Adoption
12:37 Prompt Engineering and Model Testing
22:06 Leveraging AI for Workflow Optimization
28:35 Visualizing Workflows and Tools
31:44 Exploring Hybrid Orchestration Layers
33:15 Debating Deterministic Workflows vs. Agent Flows
34:28 The Fun of Experimenting with AI and Ruby
34:55 Building Gems and Learning Through Creation
40:03 The Value of Rails in AI Development
46:28 Evaluating AI Outputs and Metrics
50:40 Annotation and Continuous Improvement
53:50 Future of AI and Rails Integration
54:54 Closing Thoughts and Recommendations
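The Jaccard Index listed in the show notes, |A∩B|/|A∪B|, can be sketched in a few lines of plain Ruby. This is only an illustration of the formula, not code from Leva or from the episode: the method name `jaccard_index`, the email labels, and the choice to return 1.0 for two empty sets are all assumptions made for the example.

```ruby
require "set"

# Jaccard index of two collections: |A ∩ B| / |A ∪ B|.
# Hypothetical helper for illustration; not part of the Leva gem.
def jaccard_index(a, b)
  set_a = a.to_set
  set_b = b.to_set
  union = set_a | set_b

  # Assumed convention: two empty sets count as identical.
  return 1.0 if union.empty?

  (set_a & set_b).size.to_f / union.size
end

# Hypothetical labels: what a human annotator expected vs. what a model predicted.
expected  = %w[billing urgent]
predicted = %w[billing newsletter]

# One shared label out of three distinct labels, so the score is 1/3.
puts jaccard_index(expected, predicted)
```

A score of 1.0 means the two label sets match exactly and 0.0 means they share nothing, which makes the metric a simple way to compare a model's multi-label output against an expected set.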