Episode 8

Real-World Ruby AI: Practical Systems That Work

with Amanda Bizzinotto


About This Episode

In this episode of the Ruby AI Podcast, co-hosts Joe Leo and Valentino Stoll, alongside guest Amanda Bizzinotto from Ombu Labs, delve into the ongoing controversy within the Ruby community involving Ruby Central, Shopify, and Bundler/RubyGems. After Valentino and Amanda share their perspectives on the situation, the conversation transitions into Amanda's journey and current work in AI and machine learning at Ombu Labs. The episode highlights various AI initiatives, including an AI bot that streamlines internal processes, automated Rails upgrade roadmaps, and multi-agent architectures aimed at improving efficiency on Rails projects. Amanda also discusses the challenges of integrating AI into consultancy services and shares insights on the tools and strategies used at Ombu Labs. The podcast concludes with updates on Amanda's recent work, Joe's announcements about upcoming projects including Phoenix's public release, and Valentino's discovery of a new user interface for Claude Swarm.

00:00 Introduction and Welcome

00:26 Ruby Community Controversy

04:37 Amanda's AI Journey

08:45 AI in Business and Consultancy

Full Transcript

Joe Leo (00:01)
Hey everybody. Welcome to the Ruby AI Podcast. I am your co-host Joe Leo, and I'm here with my co-host Valentino Stoll. Say hi, Valentino. And we're joined by Amanda, who's going to tell us about some of the AI initiatives happening at Ombu Labs. Say hi, Amanda. It's great to have you here today. And I thought, and I don't know,

Valentino Stoll (00:15)
Hi, everybody.

Joe Leo (00:30)
I didn't get anybody who said no to this, so I thought we could jump right in and talk about the controversy that is roiling the Ruby community today. And that is, and maybe we're a couple of days late, but that's okay, we're keeping a schedule here: we are plunged into this controversy of Ruby Central and Shopify and DHH and Bundler and RubyGems. Valentino, who do you have in this struggle?

Valentino Stoll (01:02)
To be honest, I haven't been following it too closely. I like to stay away from controversy. But it seems like there was some mishandling of commit access to the GitHub repositories and organization. I don't really know who's in the wrong where, but, you know.

Joe Leo (01:06)
Alright.

Mm-hmm.

Valentino Stoll (01:29)
If I had been a long-time committer to a project and just suddenly lost access to something I had committed to for a long time, with good-faith efforts and contributions, I'd be a little upset too.

Joe Leo (01:41)
Yeah.

I think, you know, maybe this is my own ignorance, but the only thing I know about André Arko is that he has maintained Bundler for 15 years. That is his thing. I mean, we had him at NYC.rb and he gave an impassioned talk, this is years ago, about Bundler, about what he had done, about what the future holds. It was very compelling. And when these kinds of...

people in our community who do these things, who work on these projects as a labor of love, then have it either taken away or have the reins pulled without any real reason, no accusation of malfeasance, nothing... I think it really does burn me up.

I don't think that's right. And when we dig a little deeper, you find, with RubyGems and Bundler, there's this conflation of the concept of RubyGems, you know, the open source library, and rubygems.org, the website. And I've found that this conflation genuinely confuses a few people, but then is used to mask explanations to a whole bunch of other people. And...

I don't like the way that is being sent out into the community as justification: well, we need rubygems.org to be the site where you can get all of these gems. And that's great, but that's not the open source library that everybody is maintaining, which is what we're talking about. I find it interesting, you know...

There seems to have been a huge push from Shopify, and I understand the stakes of Bundler not being available or RubyGems not being available; those are pretty high. But I don't know many companies better equipped, of their own accord, to stem the risks of those kinds of things than Shopify, right?

I mean, they have all the people to do it. They had people on standby ready to take over support of Bundler and RubyGems once they kicked the maintainers off temporarily. And, you know, I think that

Ruby Central lost some trust, maybe just, I don't know, some amount of trust, bringing DHH in to speak at RailsConf. And I think the way they did it was not great. But I mean, for me: there's a lot not to like about the guy, but he did invent Rails. And so I'm like, okay, so he's at RailsConf.

I mean, I love it, but I kind of expected it. This, I do think, is different. This erodes a lot more trust, because it's digging into people who have selflessly given themselves to the Ruby community for thousands of hours over a decade and a half or more.

So let me step off my soapbox for a second. Amanda, what do you think?

Amanda Bizzinotto (05:13)
Yeah, I haven't been following it too closely either, but it seems like it could have been handled better, especially from a communication standpoint. And it's definitely generated some friction within the community, that's for sure.

Joe Leo (05:29)
All right, I can tell that we're not in comfortable territory for my co-host and my guest. So that's all I'll say at the moment. But trust me, I've got lots more to say, so find me after the show. Let's get into you, Amanda. Before we get into Ombu Labs, let's get into you, the person. What is...

What is the thing that is drawing you into artificial intelligence today? What's exciting you about it? What's your journey like to get there?

Amanda Bizzinotto (06:04)
Sure, yeah. Well, I got my start in data science and machine learning; I worked a little bit with that before I joined Ombu. And it's always been an area of interest. Even though I kind of shifted a little to more operational work, data has always been part of the role for me. And so with the boom that we saw when ChatGPT came out and all the GPT models,

it became more and more a day-to-day topic. And so it just felt like a natural thing to check out, just the natural evolution, coming from a machine learning background and working with machine learning models. So yeah, I had a chance to work with GPT-2 before GPT was all the rage. So it's been really interesting and really fun

Joe Leo (06:58)
Mm-hmm.

Amanda Bizzinotto (07:01)
following all this evolution and how things are shifting, and also how it's shaping how we work and what we can realistically do. And also, honestly, it's making a lot of this technology more accessible than before, when you had to deal with a whole lot of historical data and train models and host them. There's still obviously a place and a time for traditional machine learning, but it does make some things a lot more accessible.

Joe Leo (07:30)
Yeah, you said something interesting about coming to this from an ML background. And, you know, AI as we know it today stands on the shoulders of the ML that has been researched and developed and built for years and years. But most people, and I think most developers, don't really experience it that way. They experience it as: well, we're building an API integration so that we can send our queries to an LLM and parse the results and do something with it. So do you find that disconnect when you come to a sort of traditional, you know, boutique consultancy like Ombu Labs?

Amanda Bizzinotto (08:16)
Yes, yes, there's definitely a bit of a disconnect. I mean, our team is primarily Rails software engineers, and so for the things that we're building, a lot of times it is approached in that sense: okay, we need to build an API integration, talk to an LLM, and we'll be able to get what we want out of it. And, you know, for the vast majority of what we realistically do with AI and cloud providers, that works.

But there are definitely some situations, especially when it comes to optimizing how you interact with an LLM, really understanding how those models work and what you can realistically do with them, or very specific use cases where you might need to fine-tune or otherwise customize a large language model, where it really makes a difference to understand what's underneath: the architecture that powers these GPT models. They're all built on top of the transformer, and the transformer really hasn't changed a lot from the 2017 paper that introduced it, though there have been some evolutions. That understanding can really help you get the most out of these integrations in the most efficient way possible.

Joe Leo (09:27)
This is probably Valentino territory, but I'm curious to know how the transformer has changed in the last eight years.

Amanda Bizzinotto (09:35)
As I said, it hasn't changed a lot. If you look at the 2017 paper that introduced it, it was conceived as an architecture to power machine translation, right? And for the GPT models that we have today, it's really just powering text generation; we're not necessarily doing machine translation. So you don't need the full encoder-decoder; mostly you have one of the two and that's it.

We have seen some papers recently, some from Meta, for example, with the Llama series, that modified it a little bit to include adapters or routers within the transformer, which can then lead to smaller specialized models. That has been one of the main evolutions. But in terms of the overall characteristics of how it works, how the attention mechanism works, which is what really makes these models powerful, the backbone is still there.

Joe Leo (10:29)
That was interesting. And, you know, you're part of a consultancy; I've run a consultancy for 11 years, so the kinds of things that you and your team experience are probably familiar to me. I'm curious how things have changed over the last two to three years, let's say, since the advent of ChatGPT, since everybody in the world started saying, well, we need AI with this, when they may or may not really understand what they mean by AI, and may or may not understand what it means to integrate it into their project. So how do you navigate the varying levels of AI competence and demand among the customers of Ombu Labs?

Amanda Bizzinotto (11:32)
Yeah, that's a good question. To a certain extent, everyone is curious about AI, right? It's all the rage; everyone's doing it. So we see basically three different types of situations. You have people who are curious about AI and believe they could do something with it, but they're not sure what. And so there's that whole process of finding the right fit: what can AI actually do for you? How can it help you? What's the most efficient way to integrate it?

Then we have people who have a very clear idea of what they want to do; they just don't know how, because there's a lot of stuff out there. There are a lot of libraries, a lot of new concepts. So it's: I think I can solve this problem with AI, I have a clear idea of what I want out of this, but how do I do it? And the third, which I think is probably the most interesting, is people who are convinced that their problem needs AI, and you need to convince them that it actually doesn't.

Because there are some situations where you are actually better off training a classifier, for example, than trying to do whatever you're trying to do with a language model. There are also situations where a rule-based system is going to give you 95% of the result at 10% of the cost, so it really doesn't make a lot of sense to rely on an LLM for that.
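That trade-off is easy to picture in code. Here is a minimal, hypothetical Ruby sketch of the idea (illustrative only, not Ombu Labs code): cheap, auditable regex rules handle the predictable cases, and only the ambiguous remainder would ever be sent to a model.

```ruby
# Hypothetical rule-based classifier: rules cover the common cases,
# and only the leftovers pay for a language model call.
RULES = {
  billing: /\b(invoice|refund|charge|payment)\b/i,
  access:  /\b(password|login|locked out|2fa)\b/i,
  bug:     /\b(error|exception|crash|500)\b/i
}.freeze

def classify(text)
  RULES.each { |label, pattern| return label if text.match?(pattern) }
  :needs_llm # escalate only this small slice to an LLM
end

puts classify("I was double charged on my last invoice") # => billing
puts classify("the app feels slower lately")             # => needs_llm
```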

Joe Leo (12:31)
Mm-hmm.

Yeah. Yeah, I hear you there.

Valentino Stoll (12:52)
That's really funny. Classification is usually the first thing people reach for with an LLM. They're like, oh, it can bucket data for you. And it's really not that great at it once you put a lot of throughput through it, right? At first it might seem like it's doing a good job, like if you're in ChatGPT or something. But if you've ever used traditional ML classification, like...

Joe Leo (13:07)
Mm-hmm.

Amanda Bizzinotto (13:07)
Yeah.

Valentino Stoll (13:22)
Even a bad one is good, you know, in comparison.

Joe Leo (13:26)
Mm.

Amanda Bizzinotto (13:27)
Yeah, exactly. And also, LLMs aren't great with tabular data. So if you have a whole lot of historical tabular data, you're probably better off training a traditional machine learning model.

Valentino Stoll (13:37)
Interesting. Yeah, so I'm curious: as you onboard more and more businesses and varying use cases, are there specific use cases that have surfaced as an obvious choice, an easy recommendation for businesses? Where are you starting to see the most useful applications? For a business, I guess it will depend on the kind of business, but...

Amanda Bizzinotto (14:11)
Yeah, it depends a lot on the kind of business. But honestly, one question we get a lot is how we're using AI to make the Rails upgrade process more efficient. At the end of the day, Ombu Labs is the maker of FastRuby, and FastRuby is really focused on Rails maintenance and Rails upgrades. So the main use case we get asked about is: how can AI help with my Rails upgrades? And so...

we've been spending a significant amount of time and effort on that, because we truly believe that with the information we have and the power of language models, it is very possible to make that process faster and more efficient, even if that means a hybrid human-agent team working on a Rails upgrade. Other than that, there are always the small...

use cases. I say small because they're typically a little simpler than these very specialized coding agents, but they can help you save time, especially on very manual processes. Things like automating part of your routing process, part of your sales process, part of your marketing process. And we do that for ourselves as well. We recently built a small agent

that basically helps our marketing team with our newsletter. We have a bi-weekly newsletter; our team suggests links, and, as you can imagine, they're fairly technical links, and our marketing team doesn't always follow what's happening in those articles. So now we have a little dashboard the marketing team can use to organize those links, and an agent in the background that interprets each article and generates the snippets for them: okay, here's what you should say, or here are the main topics.

And so it helps us automate that process. It's saved a lot of the time that goes into creating a newsletter by reducing the back and forth between the marketing team and our engineering team: okay, what is this about? Where does this go? Does this work? It's really helped with that.

Joe Leo (16:17)
Yeah, that's cool. I want to go back for a minute to what you said about FastRuby and upgrading Rails. That is actually where Phoenix started. We started with the idea that we could have an LLM kind of go end to end; well, maybe not one LLM, but we could build some agents that could go end to end and upgrade a Rails project. But, you know,

there are some really gnarly Rails projects out there, and upgrading is a really multifaceted task when you get into all of the different downstream gem versions, all the gem dependencies, all of the tests that need to be upgraded or changed to support the changes happening in each dependency,

to say nothing of Rails itself, right, or Ruby itself, and the changes that come with those upgrades. So, long story short, we said: okay, that's a really tough problem to solve with SaaS software, which is what we were going for. So we changed direction; we pivoted a little bit. I'm curious to know what you've learned when you've tried to shorten the timeline or replace some of the manual work with

an LLM or an agent.

Amanda Bizzinotto (17:46)
Sure, yeah. Indeed, it's a pretty complex process. The way we've been tackling it: we have a very established process that we use to upgrade Rails, and we've started automating it step by step. The first tool that we built is actually the automated roadmap to upgrade Rails. The way every Rails upgrade starts for us is with a roadmap, which is basically an analysis where our engineers go in, analyze your code base, and create an action plan. The question for us was: can we automate this, or at least part of it? And as it turns out, we can. It's not a hundred percent like a human roadmap, but it gets you 85, 90% of the way there, depending on the project. Now, it's all static analysis, of course, so any warnings that you would only get at runtime it can't handle. But it is pretty good

Joe Leo (18:35)
Mm-hmm.

Amanda Bizzinotto (18:40)
at that repetitive task our engineers used to do manually, where you take a deprecation warning, look at it, grep the code base to see if whatever is being deprecated is actually used there, and then put it in the action plan. Our automated roadmap does that for you. It's basically an agent that just does that: it's highly specialized in taking data that's already processed, the information about a deprecation warning, and then using tools that mimic what a human would do

to search through the code base and try to find usages of it. And here's how that helps: we can generate a roadmap for you that includes every single deprecation warning raised between two different versions of Rails, but it's a lot; some version jumps have 90, 100 warnings. The agent can pare it down to the 10 or 12 that are actually relevant for you.
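As a rough illustration of that "tools that mimic what a human would do" idea, here is an editor's sketch, with hypothetical names, of the kind of code-search tool such an agent might call. It is not FastRuby's implementation, just the shape of the static check described above.

```ruby
# Hypothetical tool: given a deprecated API, find real usages so the
# agent can drop irrelevant warnings from the roadmap. Static text
# search only, like the roadmap itself.
require "find"

def usages_of(deprecated_call, root: ".")
  hits = []
  Find.find(root) do |path|
    next unless path.end_with?(".rb")
    File.foreach(path).with_index(1) do |line, lineno|
      hits << { file: path, line: lineno, code: line.strip } if line.include?(deprecated_call)
    end
  end
  hits
end

# Keep a warning in the action plan only if the deprecated call is used:
warnings = ["update_attributes", "ActiveRecord::Base.connection_config"]
relevant = warnings.select { |w| usages_of(w, root: "app").any? }
puts relevant
```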

Joe Leo (19:23)
Yeah.

Sure.

Okay. Now, the way you and I met, actually, was the night at Artificial Ruby when you gave a talk on the AI-powered Slack bot. You know, initially, as you were introducing the concept, I thought: all right, this sounds a little vanilla, just pour some words into an LLM and see what you get back. But it's not; it actually goes a little deeper than that, and I was impressed by its functionality. So do you want to tell us a little bit about that, and then, since that was months ago, how has it evolved since then?

Amanda Bizzinotto (20:14)
Sure, yeah, absolutely. That one was a pretty interesting case. We had a big problem that started years ago, really, with our team members having a really hard time finding information. We have information distributed in a lot of places. So it really started as: can we concentrate this information all in one place and then put a chatbot on top of it? Because that's what chatbots are really good at. And so we have primarily

three sources of information, right? We have Slack, with learning channels and all that. We have our knowledge base, where all of our internal information is. And we have a blog. Those are the three primary sources of textual information. But then we also started having two additional sources of information last year. One was office hours, which are recorded. And then we also have these talks that team members give once a month, which are also recorded.

And finding information in a video recording is painful. You don't know if it's the right recording; you spend 30, 40 minutes of your life watching a video, you get to the end, and it's the wrong one. Nobody likes that. And so, with the evolution of all these transcription models, it just seemed like a pretty good use case. So it was a pretty interesting application. We basically got all of this data

Joe Leo (21:11)
Yeah.

Mm-hmm.

Amanda Bizzinotto (21:38)
into one centralized place, so transcribed videos, text-based information, Slack information, formatted somewhat similarly, and then we could do RAG on it. And then we put it in a Slack bot that the team can just access and search. As for how it's been evolving: we had to shift priorities, so we didn't do a lot of follow-up for a while, but we're doing that now.

And the way it's evolving is to add some additional capabilities. Right now we really just focus on our engineering knowledge base and technical knowledge. So we're trying to see: can we add other things? Can we make it smart enough to separate, okay, this is a technical question, I'm going to go here; this is a support or company-policy question, I'm going to go there? But this person is a contractor, so there's some information they don't have access to. Or this person is

a team member, but not in an HR capacity, so there's information they don't have access to. And so it creates this interesting permission problem. We've also been investing in adding some guardrails, really, to help prevent hallucinations. It still hallucinates a little bit, so we've been adding guardrails to help prevent that in situations where it doesn't have data, or where information is too similar

Joe Leo (22:33)
Mm-hmm.

Amanda Bizzinotto (22:57)
for it to identify that it's unrelated. We have some situations where someone asks a question specifically about Rails 4.2, but it pulls information from an article about Rails 6, because it's similar enough.
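A common mitigation for that last failure mode is to pair the vector search with a plain relational filter. Here is a sketch of the idea in Ruby, using the Neighbor gem Amanda mentions near the end of the episode; the Chunk model and its rails_version column are my assumptions for illustration, not the bot's actual schema.

```ruby
# Hypothetical retrieval: a hard metadata filter runs before vector
# similarity, so a "similar enough" chunk about the wrong Rails version
# never reaches the prompt.
class Chunk < ApplicationRecord
  has_neighbors :embedding # pgvector column, via the neighbor gem
end

def retrieve(question_embedding, rails_version: nil, k: 5)
  scope = rails_version ? Chunk.where(rails_version: rails_version) : Chunk.all
  scope.nearest_neighbors(:embedding, question_embedding, distance: "cosine").first(k)
end
```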

Joe Leo (23:10)
I see, yeah.

Valentino Stoll (23:12)
I really loved this talk. I went back and watched it; I missed the event. But I think it puts into perspective probably the most common use case: trying to extend the knowledge of these LLMs. You really broke down what all the problems are in a great way, and I really liked how you solved them all. And so I'm curious, you know...

Joe Leo (23:15)
Hmm, yeah.

Amanda Bizzinotto (23:38)
Thank you.

Valentino Stoll (23:42)
part of the problem is, you know, hallucinations, like you mentioned. So I'm curious what you use for those kinds of guardrails, and maybe what your approaches are for grounding information. How do you make sure that it generates from, and references, that knowledge base? What is your strategy for managing all that kind of complexity?

Amanda Bizzinotto (24:09)
Sure, yeah. So we've been leveraging a couple of libraries out there. They're primarily Python libraries, so you have to kind of connect to them. But there's one called Guardrails AI that offers a really robust system of guardrails that helps you prevent a whole bunch of problems with LLMs. One of the things it helps prevent is hallucinations, and it does that by basically allowing you to hook into a natural language inference

model. That's one of those models trained on logical problems: given A and B, is C true? And what it does is take your source data and the generated response and evaluate, sentence by sentence: given the information I have, can this sentence be true? That's how it helps you prevent hallucinations, and it rethinks the response if it cannot be grounded in your source information.
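In Ruby terms, the check she describes boils down to a loop over the answer's sentences. A hedged sketch: the Guardrails AI validator itself runs in Python, so this assumes a small NLI model exposed behind a self-hosted HTTP endpoint (the /entail route and its response shape are hypothetical).

```ruby
# Sketch of an NLI-based grounding check. Each sentence of the answer is
# tested against the retrieved sources; any unsupported sentence means
# the bot should regenerate or abstain.
require "net/http"
require "json"

def entailed?(premise, hypothesis, threshold: 0.7)
  res = Net::HTTP.post(
    URI("http://localhost:8080/entail"), # assumed self-hosted NLI service
    { premise: premise, hypothesis: hypothesis }.to_json,
    "Content-Type" => "application/json"
  )
  JSON.parse(res.body)["entailment"].to_f >= threshold
end

def grounded?(sources_text, answer)
  answer.split(/(?<=[.!?])\s+/).all? { |sentence| entailed?(sources_text, sentence) }
end
```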

Joe Leo (25:05)
That's interesting.

Amanda Bizzinotto (25:06)
We've

Valentino Stoll (25:07)
What are your, just

Amanda Bizzinotto (25:07)
had a couple of other... sorry, go ahead.

Valentino Stoll (25:10)
real quick: what are your go-to models for that grounding fact-checking?

Amanda Bizzinotto (25:15)
We've just been using the Guardrails AI ones. They have this hub of guardrails, and you can use them directly from the hub. So we've been using the out-of-the-box one. Another interesting case: sometimes we want to make the information available to everyone using the bot, but not everyone has access to the same information, right? Because we have multiple teams working on different client projects,

Valentino Stoll (25:18)
Okay.

Nice.

Amanda Bizzinotto (25:41)
and the one thing we never want to share is client information. So you can use models like Microsoft Presidio to remove any kind of sensitive information or PII from the response, or even from the data. It anonymizes things before they're even sent to the language model, so you don't leak sensitive information, either to OpenAI or to someone who benefits from the technical information there but really shouldn't have access to any private details.
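Presidio ships as containerized analyzer and anonymizer services, so from a Rails app the scrub can be two HTTP calls. A hypothetical sketch (the ports and payloads follow the standard presidio-analyzer and presidio-anonymizer images, but treat them as assumptions and adjust for your deployment):

```ruby
# Hypothetical PII scrub with self-hosted Microsoft Presidio, applied
# before a snippet is sent to the LLM or shown to the wrong audience.
require "net/http"
require "json"

def post_json(url, payload)
  JSON.parse(Net::HTTP.post(URI(url), payload.to_json,
                            "Content-Type" => "application/json").body)
end

def scrub_pii(text)
  findings = post_json("http://localhost:5002/analyze",
                       { text: text, language: "en" })
  post_json("http://localhost:5001/anonymize",
            { text: text, analyzer_results: findings })["text"]
end

# scrub_pii("Ping Jane Doe at jane@client.com")
#   # => "Ping <PERSON> at <EMAIL_ADDRESS>"
```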

Joe Leo (26:09)
So the PII is stored in this data store. What's the data store?

Amanda Bizzinotto (26:16)
So, it's not in the data store. Sometimes we have information in the source, like our knowledge base, for example. The vector database that we use is Qdrant, and we don't have PII in Qdrant; we have PII in the original source. So sometimes when you fetch information from there, there might be a snippet with a comment that has something sensitive in it.

Joe Leo (26:38)
I see.

Yeah.

Amanda Bizzinotto (26:45)
Because the knowledge base is a lot more gated, of course. We have notes, and as projects evolve they get deleted at the end of the project, but while a project is happening, teams might add information there. And because it's a closed space, just for the people who already have access to that information, you might end up with some sensitive information in there.

Joe Leo (27:03)
Yeah, yeah, of course. That's interesting. And then it's filtered out before a response is sent through to Slack. Yeah.

Valentino Stoll (27:14)
Yeah, so I'm curious what your strategies are around knowledge-base updating processes. How do you manage that pipeline reliably? As you update an article somewhere, there's some kind of lagging ingestion process, but your users still need to talk to the bot the same way. So how do you handle a transition like: oh, I'm going to use a different chunking strategy or something? Do you ever change that? Yeah, I'm curious.

Amanda Bizzinotto (27:54)
Yeah, that's an interesting question. So far we haven't changed it, in the sense that we still use a sliding-window chunking strategy for all of that content. What we do is pre-process the text before it goes into the chunking pipeline. We have a prompt that basically tells an AI workflow: okay, take this document, analyze it, break it down this way, put it in this format, rewrite it. Then that can go into chunking. We needed to standardize information that comes in a lot of different formats: the way you write a knowledge base article, versus the way you write notes for yourself about something you're currently working on, versus a talk, those are very different. So we needed to find a common ground there, a way to format this information in such a way that we could have some consistency in the chunking and the vector retrieval, so as not to confuse the retrieval process.
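For reference, the sliding-window part is only a few lines of Ruby. A generic sketch (the window and overlap sizes are illustrative, not Ombu Labs' settings):

```ruby
# Sliding-window chunking: consecutive chunks share `overlap` words, so a
# sentence that straddles a boundary is still retrievable from one window.
def sliding_window_chunks(text, window: 200, overlap: 50)
  words = text.split
  chunks = []
  (0...words.length).step(window - overlap) do |start|
    chunks << words[start, window].join(" ")
    break if start + window >= words.length # final window reached the end
  end
  chunks
end

chunks = sliding_window_chunks(File.read("knowledge_base_article.md"))
```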

Valentino Stoll (29:03)
Yeah, that makes a lot of sense. That aligns with a lot of my experience too. But I'm curious, and this is a good transition into prompting strategies and agentic workflows: it seems like you have a preference for creating specific agents that perform tasks for you in specific ways. Is that right?

Amanda Bizzinotto (29:14)

Yes. But for a lot of the things that we do, agents are a good tool mostly because they can make autonomous decisions. Especially for internal tools, or things that are really just there to save us time and automate processes, it's relatively low risk, in the sense that there's nothing in there that needs the high level of security or auditing that, for example, a workflow would give you.

Valentino Stoll (30:03)
Right.

Amanda Bizzinotto (30:04)
And if the agent makes a mistake or hallucinates a little, it's internal; our team is going to catch it, and that's it. So agents are a pretty good tool. But we do have some systems that are just a single LLM integration, just a call to a language model directly, because you really don't need an agent; it's the kind of task an LLM is simply good at, so we send it there and get the result back. Summarization, for example, is a good one. And we also have some that

require a higher level of observability and auditing, where we use workflows instead. It's not an agent, in the sense that it's not autonomous in its decisions; it just follows the steps that we outlined. But it can still use tools and all that. You just know exactly what's going to happen.

Valentino Stoll (30:48)
So can you enlighten our listeners on what you mean specifically by workflows? Because I feel like the industry is still trying to normalize some terminology here, and maybe some people think of workflows in a different way.

Amanda Bizzinotto (31:05)
Fair enough, yes, absolutely. Sorry. Yeah, I think there's a lot of variation in how those terms are defined. So, what I mean by workflow, and the way I've been trying to differentiate workflows from agents with our team: a workflow is basically a set of steps that you execute, in order. An agent is a little less predictable, in the sense that it can make decisions, like which tools to use when, for example.

Going back to the automated roadmap: that is an agent, because it has different tools available to it, and based on the query it decides, I'm going to use this tool; okay, now I have this result, I'm going to use this other tool next. By contrast, we have one tool internally that helps us automate the reports we send to our maintenance clients, just reports on the state of the project and all of that. It really is just a tool for our team; it helps them automate part of that process and makes the review easier. For that we need a higher level of observability, and so what we have there is a set of steps: take this information from here, do this; now go there, take this information, do this. It's not an agent, in the sense that the process has absolutely no decision-making power. It just executes what we tell it, but there are LLMs in each step.
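To make the distinction concrete, here is an editor's sketch of the two shapes in Ruby. The llm object is an assumed client with a complete(prompt) method returning a String; none of this is Ombu Labs' code.

```ruby
# Workflow: a fixed, ordered set of steps. Every run does the same thing,
# so each step is easy to log, trace, and audit.
def report_workflow(llm, project_stats)
  summary = llm.complete("Summarize these project stats:\n#{project_stats}")
  llm.complete("Rewrite as a client-facing status report:\n#{summary}")
end

# Agent: the model itself picks the next tool, in a loop, until it
# decides to answer. More autonomous, less predictable.
def agent_loop(llm, task, tools)
  notes = +"Task: #{task}"
  loop do
    action = llm.complete(
      "#{notes}\nTools: #{tools.keys.join(', ')}\nName the next tool, or reply FINAL: <answer>"
    ).strip
    return action.delete_prefix("FINAL:").strip if action.start_with?("FINAL:")
    result = tools.key?(action) ? tools[action].call : "unknown tool"
    notes << "\n#{action} => #{result}" # record the tool result, then loop
  end
end
```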

Joe Leo (32:30)
And is there an opportunity, with a workflow versus an agent, maybe it doesn't matter which one, but is there an opportunity for observability and monitoring in a workflow that may not be available in an agent?

Amanda Bizzinotto (32:46)
Observability tools have gotten significantly more robust, so I wouldn't say there's necessarily a huge opportunity with workflows that you don't have with agents. But if you take a library like LangChain, for example, or LlamaIndex, and you take the agents that are available out of the box, there are some things happening underneath that you might not get as good a look at. You can still see the process, you can still see the tools that were called and the order of execution, but you're losing predictability. With a workflow, where you know exactly which steps are going to be executed and when, you have a lot more predictability, and you can add a whole lot of logging and tracing around each one of those activities, because there's nothing underneath that you don't control. You control the prompts, you control the outside system calls, you control everything.

Joe Leo (33:44)
Hmm.

Valentino Stoll (33:45)
What do you use for observability?

Amanda Bizzinotto (33:48)
Right now we're using Langfuse. It integrates well with our tools. But our recommendation for anyone already using tools like Datadog, for example, or for really large production systems, is to use those integrated observability tools. Datadog has an LLM observability product that's pretty good, and New Relic added one as well. It really helps. So, yeah: we've been using Langfuse.

Valentino Stoll (34:15)
Nice. Yeah, I like Langfuse. So, I'm curious, because you have some great articles, by the way, on prompt engineering techniques, and I love your ReAct pattern implementation article as well. If you're interested in learning, these are great examples, because they cite the literature on why

Amanda Bizzinotto (34:27)
Thank you.

Valentino Stoll (34:43)
different prompting techniques are effective and what they're good at. So I recommend people go take a look. But I'm curious: when things start getting more complicated and the reasoning aspects start coming into play, is ReAct still something you would pursue? Do you see other strategies working their way in?

How do you start to think about these systems as you introduce reasoning into these workflows or agents? Do you have some thinking mechanisms, or a thought process you go through?

Amanda Bizzinotto (35:30)
Yeah, absolutely. So ReAct is a really robust choice, and something we always consider, because it's pretty powerful in the sense that it integrates tool calling with a chain-of-thought prompt, which is itself a pretty robust prompting strategy. We basically create an agent that has its own scratch notebook, so to speak: okay, I need to do this; call the tool, get the result; all right, I did this, I got this, what do I do next?

And it automates that loop for you. So it's a really powerful one. But there are obviously other strategies as well that are pretty good. There's a planner-executor strategy that's really nice, especially when you have very complex workflows. You basically use a planner first that takes a task and says: okay, this is the plan,

this is how we're going to execute this task. Then it starts calling subagents that are each specialized in one thing and will execute it, and the planner can modify the plan as it goes and as it gets more information. So planning is a really, really robust tool as well, especially for complex workflows. One thing we've been looking at more and more recently is multi-agent systems, because as we tackle more and more complex use cases,

it gets to the point where, if your agent is trying to do too many things, it's going to start failing more often. So multi-agent systems can be very effective at orchestrating multiple agents that work collaboratively on complex tasks. And you can also get creative with it. We've had situations where you have a blog writer and then a reviewer that

Joe Leo (37:04)
Mm-hmm.

Amanda Bizzinotto (37:27)
reviews the writer's work; but then maybe you have three writers, and each one gives you a different version, and now the reviewer decides: okay, which version is best, or how do I compose this? You can also have agents that call other agents as tools, where one of the tools available to your agent is a different agent specialized in a different thing. So this can be very, very flexible. But yeah, at the end of the day,

the ones we've been using the most are either a planner-and-reviewer strategy, tying the same agent into a plan, execute, review flow, or the ReAct agent.
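For readers who haven't seen the pattern, the loop Amanda describes is small. Here is a compressed, hypothetical ReAct-style sketch in Ruby; the llm client is an assumed interface and the two tools are stand-ins, not anyone's production agent.

```ruby
# Minimal ReAct-style loop: the model alternates reasoning ("Thought")
# with tool calls ("Action"), and each tool result is appended to the
# scratchpad ("Observation") before the next step.
require "shellwords"

TOOLS = {
  "search_codebase" => ->(q)    { `grep -rn #{q.shellescape} app/ lib/` },
  "read_file"       => ->(path) { File.read(path) }
}.freeze

def react(llm, question, max_steps: 8)
  scratchpad = +"Question: #{question}\n"
  max_steps.times do
    step = llm.complete(<<~PROMPT)
      #{scratchpad}
      Reply with "Thought: ..." then either "Action: tool[input]"
      (tools: #{TOOLS.keys.join(', ')}) or "Final: <answer>".
    PROMPT
    return step[/Final:\s*(.+)/m, 1] if step.include?("Final:")
    scratchpad << step << "\n"
    tool, input = step.match(/Action:\s*(\w+)\[(.*)\]/)&.captures
    result = TOOLS[tool] ? TOOLS[tool].call(input) : "unrecognized action"
    scratchpad << "Observation: #{result}\n"
  end
  "(no answer after #{max_steps} steps)"
end
```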

Joe Leo (38:09)
Yeah, I like that. We evolved Phoenix the same way, you know, starting with a single agent, and I think Planner was the first of the multi-agent chain. Reviewer is actually the most recent, because Reviewer needed its own set of guidelines to say: okay, well, what is a good

Valentino Stoll (38:09)
Yeah.

Joe Leo (38:33)
Rails test, right? And how do we grade them? So now we have a grader that goes back over them. Well, I guess you could say at first the reviewer was something like: do these tests pass or fail, and let's get the failing ones out of there, because nobody wants that. Then it went into: actually, are these good tests? Let's grade them. And it kind of evolved from there. So that's interesting. And I'm curious: we've talked a lot about the work that's going on

internally at Ombu Labs, but of course you also have services in which you're building these kinds of solutions for customers. So have you gotten into some of these multi-agent architectures on behalf of your customers as well?

Amanda Bizzinotto (39:24)
Not so far. We're really hoping to get a use case where a multi-agent architecture would make sense, but so far we haven't had one for a customer, so it's really been mostly internal tools. Indirectly, though, the one significant application of multi-agents that we have is for client-facing work, because one of the questions we get asked the most is:

how are you using AI to make Rails upgrades more efficient? That's where one of our current efforts is going: developing basically a multi-agent system that can be your upgrading pair. Initially it's a tool to help our engineers perform Rails upgrades in a more efficient way.

Joe Leo (39:58)
Mm-hmm.

Amanda Bizzinotto (40:19)
But eventually we want to grow it into something we can also offer customers who need to upgrade their own applications: hey, here's an agent that can help you and guide you through the process. It won't be an agent that upgrades for you, because, as you mentioned, upgrading can be a gnarly process. But it will be an agent capable of building on top of not just the data that's available out there, but also the data we've accumulated over the years through the number of upgrades we've done, and helping you find the best solutions to those problems.

Joe Leo (40:30)
Mm-hmm.

Yeah, cool.

So I am mindful of the time, and I know we want to get in our usual ending of interesting AI and Ruby finds. It doesn't have to be AI and Ruby, but interesting finds. So should we go to that next? Or do you have more questions? Yeah, go for it. Go, go, go. That's okay.

Valentino Stoll (41:14)
Wait, one last thing; it can be a brief segment. I just noticed from your examples, Amanda, that you use almost no gems; it's all Faraday. So I'm curious: are there any Ruby AI gems out there whose interfaces you do like, especially for building agents and things like this? Are there ones that you see as promising, at least? Curious what your thoughts are there.

Amanda Bizzinotto (41:47)
Sure, yeah. So, yes, we actually started keeping track internally of Ruby projects that we find interesting in this space. I had a chance to talk to Justin Bowen about Active Agent; I've been looking into that a lot, and we've actually used it for an internal example and are hoping to do more with it. So that's one gem that I really like that can help with the agent process.

We've used some features of the library formerly known as Langchain.rb, and it's getting really, really mature as well. And recently I've also been looking into ways to better integrate DSPy.rb into our projects. So yeah, there's definitely a lot of interesting things out there.

Joe Leo (42:29)
Hmm.

Amanda Bizzinotto (42:39)
But to be honest, I think the gems we've used the most have been pgvector, which helps us do vector search with Postgres, and, along the same lines, Neighbor, which integrates that with Rails. A lot of those examples were built before the projects I mentioned were very mature, with the exception maybe of Langchain.rb, which is why we were basically doing the integration, doing everything,

by hand, so to speak. It also helps that our team is very interested in these AI developments, as I'm sure everyone is, and excited to get to work on some of these internal tools. And I think it's important to know what's underneath you. So I've been focusing on that a little bit as well, so that we can shape that knowledge before we actually go and use a library. You're using a library to build an agent; so what actually is an agent? What's underneath it? What's happening?

A lot of times we like to give fancy names to things that are actually not that complicated.

Valentino Stoll (43:44)
That's true. And I happen to agree: if you're just summarizing something, maybe you don't need to pull in a giant library like Langchain.rb, you know, to run that single API call. Well, thanks for sharing. I said I'd keep it brief. Honestly, we should have you back on to just tear through all of...

Joe Leo (43:52)
Mm-hmm.

Amanda Bizzinotto (43:53)
Yeah.

Thank you.

Joe Leo (44:03)
Yeah, no, that's good.

Valentino Stoll (44:09)
all of these great libraries and how you're making use of different things. Because I think your applications, from a raw perspective, are very well put together. So I'd be interested to find out more at another time, for sure. Cool.

Amanda Bizzinotto (44:24)
Thank you. Yeah, absolutely. Happy to.

Joe Leo (44:25)
Yeah, me too.

Amanda Bizzinotto (44:27)
Have a great.

Joe Leo (44:30)
All right, so let's move into the final segment.

V, what do we call the final segment? It's escaping me; this is Interesting Ruby Stuff. At the end of each episode, we like to share one thing that has been really interesting to us, one thing that's either inspired us or just given us something interesting to think about. So we'll do that this week, and I'll start. I have two. The first is that

Valentino Stoll (44:39)
I do not have an LLM open.

Joe Leo (45:05)
The Well-Grounded Rubyist, Fourth Edition, is going to be released in its early access program. So if you've ever seen this book, any of the first three editions, another one is on its way. And if you are interested in getting a discount on the early access

Manning edition of The Well-Grounded Rubyist, we will have codes for you in the show notes and in the Substack email. So take a look and pick up a copy. The other exciting thing is that Phoenix is making a public release this month. We are going to have a special Artificial Ruby event on October 15th.

Our lead engineer, Steve Brudds, is going to come talk to us a little bit about Phoenix. On that day we will have a private beta sign-up for anybody who's there and wants to check it out, followed by a larger release about a week or two after. We're really excited about this. Until now, Phoenix has really just been sort of enterprise B2B; we've gotten some great customers and some great feedback, but we're ready to

open up the doors a little bit and let everybody in. So look for that coming a little later this month. All right, how about you, Amanda?

Amanda Bizzinotto (46:37)
Cool, yeah. Well, we just released our automated Rails upgrade roadmap about four weeks ago. It's available, it's free to use, and you can find it on the FastRuby website. I think we've also linked it here in the chat. So yeah, you can use our little automated roadmap; that's very exciting. And in terms of what's next for us,

Joe Leo (46:56)
We have.

Amanda Bizzinotto (47:05)
yeah, like I mentioned, we're working on agents that can potentially pair with one of our engineers and make Rails upgrades a little more effective. It's still early stages, but we're pretty excited about it, and we're hoping to be able to test it in a couple of months. So that's been really interesting.

Joe Leo (47:25)
Very cool. And yeah, we'll have that in the notes. We're excited about that.

Valentino Stoll (47:27)
Yeah.

Yeah, I'm going to have to give that a spin and see what kind of plan it makes. I'm curious.

Joe Leo (47:35)
Mm-hmm. Yeah.

Amanda Bizzinotto (47:36)
Nice.

And if you have any feedback, just let us know. We're always happy to hear how we can improve it.

Valentino Stoll (47:41)
Yeah, you got it.

Yeah, I mean, upgrades are so painful. And creating a roadmap, I wouldn't even have thought of that. I would have just gone and done it and then left notes for somebody, and then just had CI keep trying.

Joe Leo (47:58)
Yeah.

Yeah.

Valentino Stoll (48:05)
So I love the approach of creating a plan and having something more definitive. I think everybody needs that. So yeah, I appreciate everything you guys are putting together for that. Mine doesn't really work.

Joe Leo (48:13)
Mm-hmm.

I appreciate your brute-force method, Valentino. I'd like to see it at work. Just force it. Forced Upgrades by Valentino Stoll is coming. The forced-upgrade gem.

Amanda Bizzinotto (48:23)
Hahaha

Yeah.

Thanks.

Valentino Stoll (48:33)
Brute-force your way through with a GitHub AI action.

Joe Leo (48:35)
Yeah.

Valentino Stoll (48:41)
Yeah, don't do that, please.

Amanda Bizzinotto (48:44)
I mean, we have some LLMs suggesting deleting your tests to fix them. So, you know: no tests, no failures, just...

Joe Leo (48:48)
Oh yeah, that's a good one. Yeah, I know plenty of people that have taken that advice.

Valentino Stoll (48:51)
It's true.

I've seen too many LLMs just be like: okay, I guess I'll just remove this test, because it's not related to the changes I'm making.

Joe Leo (49:00)
Yeah. Right.

Valentino Stoll (49:07)
Okay, I have one to share. It actually came from the last Artificial Ruby meetup, and it's Swarm UI. There's the Claude Swarm project from Shopify, and the author of that, parruda, has this Swarm UI, which is a Rails app that serves as a user interface for Claude Swarm.

If you use the Claude Swarm project at all, which I recommend you also check out, there's now a nice Rails interface where you can create projects and spin up coding agents to perform tasks. It's pretty fun. I haven't gone too deep into it, but my preliminary tests have been successful. So yeah, kudos to him. Go check that out.

Joe Leo (49:51)
Oh yeah.

Yeah, I like this.

Very cool, thank you.

All right. Well, I think we've reached the end of our show. Amanda, I want to thank you very much for being such a great guest and enlightening us. We wish you all the best in your continuing work, especially all of these internal tools and the work that Ombu Labs is doing. These kinds of things are important as we continue to mature as a community, and as language and tool builders around AI.

Amanda Bizzinotto (50:43)
Thank you, yeah, I appreciate it. Thanks for having me. This has been a lot of fun. Thank you.

Joe Leo (50:48)
Sure thing. We'll see you again soon. All right, everybody. Thanks for listening and we'll see you again soon. Bye bye.

Amanda Bizzinotto (50:51)
See you soon.

Valentino Stoll (50:55)
Peace.

Amanda Bizzinotto (50:56)
Thank you, bye bye.

Want to modernize your Rails system?

Def Method helps teams modernize high-stakes Rails applications without disrupting their business.