Episode 19

You Can't Vibe-Code Trust: Scaling AI Safely with Bekki Freeman

with Bekki Freeman

About Bekki Freeman

Bekki Freeman is a staff software engineer at Caribou Financial and the organizer of Rocky Mountain Ruby, held annually in Boulder, Colorado.

About This Episode

Valentino Stoll and co-host Joe Leo open the episode noting OpenAI is winding down its Sora video app and discuss the broader difficulty of building durable AI businesses. Guest Bekki Freeman, staff software engineer at Caribou Financial and organizer of Rocky Mountain Ruby, shares details on the conference (Boulder, Colorado at eTown, September 28–29; CFP opening soon).

The conversation focuses on safely scaling AI use in an 8-year Rails monolith: preparing messy codebases with dead code and metaprogramming, strengthening test harnesses and coverage, improving documentation, and being explicit about desired patterns rather than letting AI copy existing bad ones.

They discuss PR review bottlenecks from increased AI-generated PRs, ideas like specialized AI review agents, stronger RuboCop rules, pairing and mobbing, and remote knowledge-sharing practices, plus security cautions and what AI may and may not replace—tech-debt work versus taste.

Full Transcript

Valentino Stoll (00:01.047)
Hey everybody, welcome to another episode of the Ruby AI Podcast. I am Valentino Stoll, joined by my lovely co-host Joe.

Joe Leo (00:09.549)
Hi, I'm the lovely co-host. I've got something to hit you with right off the bat, Valentino. Sora is no mora. OpenAI announced today that, let me get this right from CNN, it is winding down Sora, the video generation app launched to so much fanfare last year. And I have a question for you. What are you going to do? The deal with Disney that was announced in December is now dead.

Joe Leo (00:38.935)
You can no longer automate your favorite Disney characters. What are you going to do with your time?

Valentino Stoll (00:46.593)
Find another model. I do think it's funny that they couldn't afford video.

Joe Leo (00:48.271)
Find another model. Yep. Yep.

Joe Leo (00:57.358)
Yeah, it's true. It's true. Well, you know, I read it and I thought, what I first thought was like, kind of like, okay, good. And my second thought was, actually, it's really hard to do some of this stuff. And if you're trying to focus on everything, I mean, it's a little bit of a response to people who are just like, well, in two years, you know, AI will just do X, where X is anything. Movie production, you know.

a full accounting system, like whatever it is. And it's not because AI can't do it or it's incapable. It's because a business has to be built around this and it's hard. And you know, we didn't, we didn't get here with like three years worth of effort and you're not going to just kind of take it over with one company or three companies doing every industry that we can comprehend right now. So there's my rant and thank you for indulging me. Now let's introduce

our guest today, the wildly competent engineer and community organizer, Bekki Freeman. We are so happy to have you on the show.

Bekki Freeman (02:05.675)
Thank you for having me. I'm so excited.

Joe Leo (02:08.672)
We're excited too. I'm excited first, and I'm going off script here, not that we really have a script, but I'm kind of curious about your work with Rocky Mountain Ruby and how it's going and what it's gonna look like this year.

Bekki Freeman (02:21.889)
Yes!

We have dates for this fall. So excited. And now you're catching me off guard because I don't have a calendar up in order to tell you the exact date. So it's going to be the end of September and it's going to be in the beautiful Boulder, Colorado as always. Same venue, E-Town, which we all love. And our dates are September 28th and 29th. Would love to see you all.

Joe Leo (02:26.697)
Alright.

Joe Leo (02:40.811)
Mm-hmm.

Bekki Freeman (02:48.747)
We're going to be opening our request for proposals in the next couple months and we'll start getting our lineups set and then tickets will go on sale as soon as we have our schedule.

Joe Leo (02:54.732)
All right.

Joe Leo (03:02.495)
I'm gonna fire off a proposal, I'll tell you what, because I missed the window for RubyConf because of laziness. And I would like to remedy that with Rocky Mountain Ruby. Oh yeah. That's great.

Bekki Freeman (03:14.093)
Common theme. Yes. We would love to have your proposal. That sounds great. I will make sure you get the message when we go live.

Yeah. So Spike and I have been doing Rocky Mountain Ruby for a few years now, and we just love it. It is such a welcoming and kind atmosphere. It's, we build in a lot of social time, which some people groan at, but the truth is, in reality, people love it. We have icebreakers. We make sure that everybody is very friendly about inviting a new stranger into their group to get to know. We have open lunch so that everyone can go and talk. And it's just a fun conference. And

Joe Leo (03:40.735)
Mm-hmm.

Bekki Freeman (03:57.506)
Boulder in the fall is gorgeous as well. So a lot of people will come early for the weekend and go hiking and then come and see us for the conference.

Joe Leo (04:06.632)
Yeah, I love that. I do love the getting-to-know-you stuff. I do groan at it when it's announced, and then I do it and I always enjoy it. You know, Ancient City Ruby, I don't think they have it anymore, but that used to be along similar lines in Jacksonville Beach, where, yeah, you spent about 60% of the time in conference sessions and then you really went out and got to know everybody the rest of it. So I'm very excited about that and I love just the...

the, if that is the right word, premier regional conferences, of which this is definitely one. And I really love that you're still doing that work. I know. Yeah, looks good.

Bekki Freeman (04:42.861)
Thank you. We love it. Yep, I'm wrapping my Rocky Mountain Ruby Stella dinosaur right now. Yes, so if you want to see this year's Stella, you have to come to the conference.

Joe Leo (04:53.77)
All right. So I reached out to you so that we could talk a little bit about AI and Ruby. I thought, you know, you're at a, I guess, relatively new, you're about a year into your new role. And I thought, well, I bet there's some interesting stuff going on. And when we had just a brief chat, I thought, yeah, this is good. We haven't done a show, you know, quite in this theme before, which is, hey, you know, here's a high-risk legacy system.

Bekki Freeman (05:07.469)
Mm-hmm.

Joe Leo (05:21.49)
Legacy, I mean, eight years is not that long, right? But it's been around for a little while and you're coming into it. And of course you've got opinions about AI and so does everybody else on your team. And you've got, you know, potential use cases for AI. And then that kind of hits reality. And that's what I want to talk about today. So I guess just give us an overview. You know, what are you working on? Where are you working? And what's the team and the application, the system like?

Bekki Freeman (05:39.053)
Love it.

Bekki Freeman (05:48.814)
Absolutely. Yeah, so I'm a staff software engineer at Caribou Financial and we focus on refinancing auto loans. Our mission is to give people financial freedom and believe it or not, there are people with car loans that are 30 % interest. And so the goal of our company is to get people down to a more reasonable interest rate, save them money every month, get them the gap protections they might need so that if they have a car accident, heaven forbid.

that they don't go into like financial distress because of it. And because we have real customers, we have to be very thoughtful about changes we make in the code. And that means we can't YOLO vibe code, just, you know, sit there, just like in front of our computer being like, Claude, just make this thing for me, because we have real people who depend on our software.

And so we are more cautious about it. And the way I like to think about anything in software is as a system. That's very much where my brain goes. And an engineering organization, along with the software it owns and gardens, is a system: many inputs, many outputs, a lot of dependencies. And so where we're at right now is we're trying to scale our use of AI as a software development team from

every engineer doing random things and experimenting to an actual system wide use that has the ability to scale safely. And that's like my fun project right now is trying to figure out how do we scale AI development without destroying our code or reducing our quality.

Joe Leo (07:31.015)
Valentino, that's kind of the job description for you, isn't it? At Gusto? Yeah.

Valentino Stoll (07:34.87)
Yeah, pretty much. We have a very similar problem that we're trying to solve and it's ongoing. Right. So yeah, so I'm curious, you know, where do you even start with something like that? Like what are your... Yeah, right?

Bekki Freeman (07:44.726)
Yes.

Joe Leo (07:50.949)
Help Valentino because he needs some direction.

Bekki Freeman (07:51.041)
You start at the beginning.

Bekki Freeman (07:56.499)
All right, this is gonna be fun then because I need help too. So we're gonna brainstorm and we're gonna come up with some really good ideas. So I think when I started looking at this and reading all the blog articles as we all do and doing all the research, what I first saw was that the code base has to be ready for AI to come in and build the context it needs. Everything in engineering is about context. If you don't understand all of the moving parts and all of the dependencies.

you can have a problem. And if we have an eight-year-old monolith that at some point thought it should be microservices and then realized that was not the right idea, you also have some like spaghetti.

Joe Leo (08:33.059)
Just wanna cut in here and say every Ruby code base that is at least eight years old has this problem. All of them have had an identity crisis at one time. Go on.

Bekki Freeman (08:41.864)
I'm glad we're not unique, and no shade to microservices, a tool for the right problem, you know, and our problem was not a microservice problem. But we do have a lot of, like, spaghetti in our code and some things that are half done and some projects that got killed halfway done. And so for the AI bots, for our friend Claude specifically, trying to understand our code, it's going to go down the wrong path

Joe Leo (08:48.087)
Sure.

Bekki Freeman (09:10.294)
quite frequently, because it'll go find this class and be like, this is what I can use. Well, no, actually that's dead. Doesn't work anymore. It's just never been removed. And so that's why I'm saying we have to kind of start at the beginning and prep the code base. So we found a couple of very specific issues where we've had to really shore up our practices. So one is in metaprogramming. I think in Ruby, we love our metaprogramming. That makes the code a little bit tough to search.

I think Claude's favorite command is grep. And when you can't grep for a specific string, you really do have to be very careful. So that leads to the next safety mechanism, which is test harnesses. We need to make sure we have very robust testing so that if we do break something, that it will be caught in CI or caught before it goes into prod. I was going to say, I don't think that's my train.
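
As an aside for readers: the greppability problem Bekki is describing looks something like this. The class and method names below are invented for illustration, not Caribou's code:

```ruby
# Metaprogramming defeats grep: the methods below are never written as
# "def refund_status", so `grep -r "def refund_status" app/` finds nothing,
# even though the method exists at runtime.
class Loan
  # Hypothetical status fields, for illustration only.
  STATUS_FIELDS = %w[refund_status payoff_status].freeze

  STATUS_FIELDS.each do |name|
    define_method(name) { "#{name}: pending" }
  end
end

loan = Loan.new
loan.refund_status  # => "refund_status: pending", works at runtime, hard to find in source
```

A human reader hits the same wall as Claude here: the only searchable string is the constant that feeds `define_method`, which is why ungreppable dynamic definitions deserve extra care before turning agents loose.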

Joe Leo (09:40.096)
Hehe.

Joe Leo (09:59.541)
Sorry, that's New York City. I'll mute that.

Bekki Freeman (10:04.616)
My train has a much more aggressive honking sound. Yeah, so then we had to shore up our test harnesses quite a bit. And there were some areas where we had 20% code coverage on unit tests. There were some areas that we had integration-level coverage, but not unit level, and then some vice versa. And as I started digging through that, I started finding dead code. So now we've removed that. So you can see how, like...

building the system starts way at the beginning before we can even use Claude to really improve our code or add new features. And I could go on, there's more and more and more, but the big one that comes next after this is what kind of documentation do we have in the code base so that we can actually give our agents more information upfront so that they have a better starting point for any questions we ask it or any features we want it to put in.
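
For readers who want to try the dead-code hunt Bekki describes, Ruby's standard-library Coverage module can report which lines of a file never execute. A minimal sketch, with a throwaway temp file standing in for real legacy code:

```ruby
require "coverage"
require "tempfile"

# Write a tiny "legacy" file so we have something to measure.
legacy = Tempfile.new(["legacy", ".rb"])
legacy.write(<<~RUBY)
  def used_method
    :used
  end

  def dead_method
    :never_called
  end
RUBY
legacy.close

Coverage.start        # must start before the file is loaded
require legacy.path
used_method           # exercise only one of the two methods

per_line = Coverage.result[legacy.path]
# per_line holds one execution count per source line (nil for lines with
# no executable code). A count of 0 marks code that never ran: dead code,
# or a hole in your test harness.
dead_lines = per_line.each_index.select { |i| per_line[i] == 0 }
```

In a real app you would point this (or a tool like SimpleCov, which wraps the same module) at the test suite rather than a single script, but the principle is the same: zero-count lines are the first candidates for deletion before letting an agent build context on them.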

Joe Leo (10:57.96)
Yeah, I like that as a starting point. I have this example from this morning that I wanted you to react to. So this is Codex, but the results are pretty much the same with Claude Code. I love Codex now. Valentino, I'm ready to fight you over Claude Code versus Codex, but not at the moment. So, okay, so I am working on a legacy code base, and I'm just in this exploratory phase. I'm gonna start making some changes. So I was just like, very simple prompt. I just wanted to see what it would do. And I said, look, I always want...

Codex to be TDD and I want it to, you know, use best practices for design. It's a code base that I'm not, or sorry, a language I don't work with all the time, it's not Ruby. So I don't know how to really drill in. Always use Git worktrees, you know, PR workflow, et cetera. Anyway, so it gives me this boilerplate that had this bullet point in it and it said, prefer existing design patterns and best practices in the code base. Keep changes surgical and coherent with current architecture. And I immediately thought of you because I thought,

Bekki Freeman (11:55.34)
Noted.

Joe Leo (11:56.337)
I don't want it to use the patterns that are here. I don't know it that well, but I know it's not doing the right things. You know, I want it to use best practices and refactor, you know, incrementally to those, to those patterns. But I think by default Codex at least is going to say whatever's here is the way this person wants it. And so I'm going to kind of keep that structure.

Bekki Freeman (12:01.974)
right.

Bekki Freeman (12:15.624)
Yes, and that is such a double-edged sword because on the one hand, yes, we don't want new patterns all over the place because it makes the code impossible to understand. But on the flip side, if you have bad patterns in place, having those replicate is not good. Our example of that situation was we were not using the subject pattern and described class pattern in our RSpec files.

Joe Leo (12:24.541)
Mm-hmm.

Bekki Freeman (12:41.574)
And Claude was like, cool, I'll keep doing the same thing. And so we actually had to tell it, no, no, please use those patterns even though they don't exist yet, and be really explicit about what patterns we wanted to change. And that's part of building that system around scaling AI: some of the patterns you want, some of the patterns you don't. And now you have to tell Claude which ones are the good ones.
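
Concretely, "be really explicit about what patterns we want" usually means writing the rules down where the agent will read them. A sketch of what that instruction might look like in a CLAUDE.md file; the wording is hypothetical, not Caribou's actual file:

```markdown
## RSpec conventions (follow these even where older specs do not)

- Use `described_class` instead of repeating the class name inside a spec.
- Define a `subject` for the object under test and reference it in examples.
- Do not copy the style of existing spec files; these rules take precedence.
```

The last bullet does the heavy lifting: without it, agents tend to treat whatever is already in the repo as the house style.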

Joe Leo (12:44.593)
Right.

Valentino Stoll (13:02.784)
Yeah, it's funny, going against the Rails conventions really is just fighting everybody at this point. More than ever. Yeah, I tend to agree with that. Setting up the right conventions, even ones of your own that you do want, is a hard challenge. Even, you know, Claude sometimes will listen to your Claude files and sometimes won't. But...

Joe Leo (13:11.258)
Yeah.

Joe Leo (13:29.252)
Yeah, that's another thing I'm finding, right?

Bekki Freeman (13:31.766)
Yes.

Valentino Stoll (13:32.16)
There are ways that I've found it to work better, because it identifies the Claude files in subdirectories. So you can have very specific directories have their own rules and memories, which is helpful. But I'm curious, you mentioned testing. Testing is kind of critical. I would say don't even start using Claude unless you have testing in place and working well.
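
For readers, the per-directory setup Valentino mentions looks roughly like this; Claude Code picks up a CLAUDE.md in a subdirectory when working on files under it (the paths are generic Rails ones, not any particular app):

```text
CLAUDE.md              # repo-wide rules and memories
app/
  models/
    CLAUDE.md          # model-specific conventions
  views/
    CLAUDE.md          # e.g. "no instance variables in partials"
spec/
  CLAUDE.md            # RSpec patterns: subject, described_class
```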

Bekki Freeman (13:59.276)
100%.

Valentino Stoll (14:02.185)
But how do you deal with the test growth? Do you have any rules in place for, OK, no AI can make tests? You mentioned best practices for trying to get it to use the right subject pattern or things like that. How involved are you? Are you becoming test-driven?

Bekki Freeman (14:28.861)
It's such a big question. So I'm trying to think how to distill down the answer. We are still developing our rules and best practices around tests. Our test suite is slow already, and we have to make a decision. Do we want to let it get slower by adding more tests and then work on getting it faster? Like, what's our priority? Is it speed or is it coverage? And both are right.

So we have to just choose which is most important right now. And so right now where we are focused is in increasing the test coverage. So a specific example I have: in part of my experiments of developing our system for AI-driven development, I just started yoloing some PRs. I didn't necessarily merge them yolo, but I started yoloing them. And I said, you know, Claude, go

refactor this code to use modern Rails practices and good OOP, things like that, and use test-driven development and blah, blah. And I put up this PR and it actually looked really good. I was like, actually, this is fine. I think this is okay. And then one of my reviewers, my human reviewers, pointed out that there were some meta methods

that Claude didn't notice and that would now just throw a NoMethodError because it hadn't fixed those. And I was like, that's interesting. So a human found this, but Claude didn't, but also our test harness did not find this. And so then I took a step back and I said, okay Claude, we only have 60% test coverage over here. Please increase this, get all the branch coverage, et cetera, et cetera. And then I merged those two PRs and said, okay, now,

Would it fail on this change? Nope, still didn't fail. I had to keep digging into it. And so I think right now my kind of policy, or what I'm doing myself in my experiments, is do all the test coverage first in a separate PR before I even try to tackle the rest. So yes, more TDD, but confession: I would much rather have the AI write tests for me than sit there and do it myself.
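
One way to make "get all the branch coverage" enforceable rather than aspirational is a coverage gate in the spec helper. A minimal sketch, assuming the simplecov gem; the thresholds are illustrative numbers, not Caribou's:

```ruby
# spec/spec_helper.rb (fragment)
require "simplecov"

SimpleCov.start "rails" do
  enable_coverage :branch                 # track branches, not just lines
  minimum_coverage line: 60, branch: 50   # fail the suite below these floors
end
```

With a gate like this, a coverage-first PR of the kind Bekki describes ratchets the floor up, and later refactoring PRs cannot quietly lower it.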

Joe Leo (16:40.856)
Yeah, I get it. We want the honest truth here. I think most people are in that camp. Most developers are in that camp. But then it does become this challenge, both at an individual level and at a team-wide level, which is how do you know, where is it that you insert

Valentino Stoll (16:44.137)
Yeah, same.

Joe Leo (17:09.111)
you know, the human judgment, the human behavior to get the highest leverage or return on your investment. So just answer that one for us real quick so we can... Yes.

Bekki Freeman (17:16.105)
Yeah, it's really hard. Yeah, you're asking me all the hard questions. So if I may take this one quite broad. So we've thought, okay, Claude and Gemini, they can do pretty decent PR reviews in the aspect of like conventions or, you know, no method errors, things like that, unsafe code. They're pretty good at that, but they don't really understand yet, especially these review bots, the full context and all the side effects and all of the unintended consequences of a change.

And so we thought, okay, the human should come in on that aspect, the more complex, of like, the human can understand all these pieces that Claude can't see and apply that knowledge and that, you know, historical context to this PR and make sure that the design is really good, that the implementation of the intended feature design is really good.

Where I got stuck is I said to myself, that's too late. Have you ever put up a PR that you, like, lovingly, you know, grew, and you sent it through everything and then you put it up and then someone's like, you did this all wrong. This is in the wrong place. The design's wrong. You shouldn't have used a service object here. And like, so to circle back to your actual question: pair programming. I'm

Joe Leo (18:14.581)
Hmm.

Joe Leo (18:23.413)
Right.

Joe Leo (18:27.433)
Yeah.

Joe Leo (18:33.622)
Mmm.

Bekki Freeman (18:34.475)
like seriously going back to everything we write should be paired, because then you get all of that design collaboration upfront, you know, at the best, most efficient time to have it, rather than at the end when the PR is ready to ship.

Joe Leo (18:49.221)
I love this, and actually we talked a little bit about this with Kinsey, didn't we, Valentino? Where we were talking about a pair, you know, like letting a third, letting an agent into your pair, and how that might kind of positively impact the application and positively impact the efficiency. So have you done any of that, or has your team?

Bekki Freeman (19:18.293)
We're starting to do more of it, and we are quite early on in this system because it is like 30 things that we're trying to adjust at once. So we are working on that. And right now our biggest bottleneck is PR review time. And I would assume that at Gusto you're kind of in the same boat. And we have gone from an average of 20 open PRs to 70 because we're putting up so many more PRs, but we still only have the same team size to review them. And

Joe Leo (19:21.907)
Yeah.

Joe Leo (19:25.813)
Yeah.

Bekki Freeman (19:47.787)
That's where I'm building the system further forward: how do we fix the PR review problem? And we haven't solved it, but everything I'm reading says that if you have good test harnesses and you have good canary environments, you don't necessarily have to read the code as much and as manually. So then you do want to have that pair programming because you don't want just one person kind of doing this in a silo. You want

to have more eyes on it. But that's where we're stuck right now is what do we do about PR review bottlenecks?

Joe Leo (20:25.369)
Let me tell you something. Every company that I know of has this problem. I was having a conversation about this just yesterday with one of my engineers about their client. And so it's rampant. And of course, this goes all the way up to Amazon, right? AWS, who had these outages a few weeks ago, where they have applied something called controlled friction, which is just slowing down and having more people look at the code.

So that's really not an answer, right? Like, go slower is not an answer. So I don't think that this is solved, but I do know that it's a rampant problem. There are just PRs open everywhere. And of course, you know, it's, it's almost like you don't even want to tell the business that all this is ready. It's like, just deploy it. What's the worst that could happen, right? Like, we know the worst that could happen because our jobs are on the line. But, but I think it's,

Bekki Freeman (20:55.444)
Yeah.

Bekki Freeman (21:15.294)
Yes.

Joe Leo (21:21.189)
Today, it's a problem without an answer, but I'm very interested to know what you think an answer might be or what it might look like. And then, yeah, I mean, I guess, what are you thinking?

Bekki Freeman (21:28.5)
Yeah.

Bekki Freeman (21:33.845)
Yeah, it's a great place to start. So what I'm thinking right now, and this is not a novel idea, this is just from reading what other people are doing, what other people are finding success with, is where we're headed is toward very specific agents that review PRs. So the one agent would be an expert on Active Record queries: query efficiency, query performance,

correct database access. Another agent PR reviewer might be the observability specialist. So it will look and say, hey, I don't see any way that we can actually see when your code is breaking or see if it's operating successfully. So I want you to add more observability here, or tweak it for best practices, whatever. And then one joke was, like, we need a Bekki agent PR

reviewer, because Bekki will always call you out on instance variables in a partial. So you might as well just have a PR review agent that just looks for instance variables in partials. And the joke with that one is that it's not going to be mad at you. It's going to be disappointed that you put an instance variable in your partial. You know, things like that, that just

Joe Leo (22:36.752)
Yeah.

Joe Leo (22:49.903)
Oh no, it's so much worse.

Bekki Freeman (22:57.412)
do more than static analysis, but kind of look at the conventions. And, you know, like, you can tell Claude that it's on the Rails core team and it's an expert on Rails: is this following all of the Rails conventions? Is this how a core team member would have written this PR? And try to get a little bit closer to at least not having to read, like, word by word for any little tiny issue in conventions or

Joe Leo (23:07.277)
Mm.

Bekki Freeman (23:27.466)
logical errors, things like that. So that's one thing we're looking at. And we've started building some of these. We already have RuboCop and we're trying to bolster our RuboCop and get our to-do file from what, 3000 lines to like maybe a hundred. I think once we get down to that point, we won't have to do as much style review, things like that. Yeah, what are you thinking?

Joe Leo (23:43.566)
Hmm.

Valentino Stoll (23:50.751)
Yeah, I'm curious, 'cause what this sounds like is, like, the hardest problem that anybody has is really how do you, how do you define meaning, right? Under different contexts, right? Like, what does it mean? Like, you could just say, okay, be the best Ruby engineer and follow the best practices. But, like, it doesn't mean the same thing for every instance, right? Like, you might follow different conventions in a test file than you might in a service object than you might in a front-end component, right?

Bekki Freeman (24:04.873)
Yes.

Valentino Stoll (24:20.468)
The meaning of what it means to be a good Ruby programmer is different in each of those contexts, depending on what team you're on, what project you're on. And so I think about this a lot. Like, how are you kind of, like, defining this meaning in a way that can improve future use cases, right? Like, if you have, like, this specification bot, are you, like, storing that information in a specific way, in a specific place, so that it can resurface when it visits a test file?

What are your approaches to capturing that and saving it?

Bekki Freeman (24:58.068)
The more specific it is, I feel like the better it can be, right? Like, so if you have the Hotwire Turbo Stream expert bot, it has the meaning of what good use of all of the Hotwire functionality looks like. And so I wouldn't want that one looking at the services for Rails conventions. I'd want them to be very specific to different pieces of our code. And then...

What do we do? We pointed at the documentation, I guess, and then pointed at a bunch of recent blog articles. I've definitely had Claude look at some Rails 6 conventions for our Rails 7 app and apply the wrong fixes and the wrong review comments because we're on the wrong version of Rails compared to what it was looking at. So I think it has to be explicit and specific, but then also concise.

I think brevity is important, because as soon as the context falls over, it starts doing wacky things. But every app has a different purpose in life. So I don't think all of us can use the same review agents, because we all have, you know, we're all solving slightly different problems. So we have different concerns. You know, if you're looking at a query on a million-row database table,

you might make a trade-off to have an N+1 query instead of pulling the entire table into memory, for instance. But you need to, like, know which issue you're dealing with, or which is the most critical to your system. And so, yeah, what is the meaning of meaning?

Joe Leo (26:26.954)
All right.

Joe Leo (26:40.17)
That's all we need to figure out here, and then we can go home. I actually really like your answer, Bekki. And it sounds like, you know, the thing that caught my attention with creating review agents, for example, is that, yes, you're right, your review agents might not be my review agents on my application. And of course I may work on five applications and I need different agents for all of them. But within your company, you might create a review agent and that might be the best thing for Caribou,

Joe Leo (27:07.302)
no matter who the engineer is, right? Depending on the circumstance. Are there, now you're in this position where you're trying to implement changes across a team, which means that you're going to have to deal with humans, unfortunately. And so I'm curious to know what other things have worked and simultaneously with that, what other misgivings or challenges have you had to overcome with the humans?

Bekki Freeman (27:07.689)
Mm-hmm.

Bekki Freeman (27:33.598)
Yes, the humans are the center of everything we do and they're a huge part of our system. So there's a couple things. So people have opinions. That's one lovely thing about humans. We all have opinions. And so we can do this big collaborative exercise of trying to gather all the information in everybody's perspectives, but then who makes the final decision? Because at some point we need to make a decision.

and some people aren't going to agree with that decision. And so making sure that we're following change management practices, that people, even if they don't necessarily agree with the decision, they know why it was made and they understand and can be bought into it. It's not always a fast process, but I think it's really important in terms of building our engineering culture. And then there's also, we have to be respectful of people who are like, am I just writing Claude files that are gonna take my job?

at some point, like I'm teaching Claude how to do my job and that has a fear component with it as well. The training I'm finding is also hard because we are remote, so we're not at the same water cooler. And so we can't find out what people are doing or what processes and policies have been put in place by accident. We have to be very intentional about it.

And if you've heard the adage, people have to hear something seven times before they actually internalize it. Now picture you're the leader in the organization saying, use this Claude plugin before pushing your PR. And you start to feel like a broken record because you have to say it seven times to make sure everybody's actually seen it at least once, has seen the directive or seen the policy change. And so we are definitely struggling with how do we promote.

the changes that we're making, the improvements we're making, and then how do we get people to use all of the tools that we're putting in place.

Joe Leo (29:29.164)
Yeah, well, first let me say that that adage, that you have to say something seven times before people hear it once, is as annoying as it is true. In 12 years of running a company, I have found it to be true over and over and over again. And so you just kind of get used to it. I'm just kind of like, well, I know I'm saying this for the fifth time, but it's just not, it hasn't stuck yet, so I'm just gonna keep saying it. More importantly,

you've talked about this water cooler effect, I think that's really interesting, or the lack thereof, because you're remote. And I could have four different conversations with, you know, four different engineers at Def Method, and they're going to tell me four different ways that they use AI, even if they're on the same team, even if they're, you know, working on a similar application. And so I get that that can be a real challenge. But I guess, are there ways

that you can help your team to kind of keep up with the latest or even be intrigued or excited by sort of developments. Because the thing is, even if you implement something, it's gonna have to change in a couple of weeks, right? Things change so quickly.

Bekki Freeman (30:34.493)
Yeah, yep. We've been trying a few things here, and it's tough because we are also spread across the country. So we have four time zones. And so, you know, lunch for one time zone is dinner for the other. So it is always hard to get us all in one room. But we have a Friday technopalooza, is what we call it, where we just share

random things we've been working on that involve the engineering organization. So sometimes it's product changes. Sometimes it's a new tech-related thing that we've discovered or worked on. And lately it's been turning a lot into sharing AI practices and knowledge, and AI-based experiments that we've been doing. We also have a Slack channel called Knowledge Share, where it's like, oh, I found this random trick that's really making my life better, I'm going to go share it in this channel.

And while that is kind of just a scrolling feed of stuff, at least if you see it, and a month later it becomes relevant to you, you can actually go search for it. So at least it's there. The hardest part, when people say we just need to document everything: I don't buy that, because that's only half of it. The other half is people have to read it. And I think 90% of documentation that exists on a system is not read by anybody,

Joe Leo (31:38.681)
Yeah.

Bekki Freeman (31:56.265)
and nobody even knows to go look for it. And that to me is, when people say just go document it, or go write up a policy or a procedure, I don't buy it, because writing it doesn't actually help if no one knows to go read it. So you still have to keep doing the knowledge-share piece of it. So just creating spaces where people can talk and can share ideas has been really important. Another incidental one that I found is we have a deployment call.

Joe Leo (31:59.042)
Mm-hmm.

Bekki Freeman (32:25.927)
We don't do continuous releases; we do scheduled releases. And when those happen, there's a call, and only one or two people have to be on that call. But on that call, we're just doing a lot of watching pods roll and watching builds go through. And so it turns into a really great water cooler. So even if I'm not assigned to be on that release, I go to that meeting,

Bekki Freeman (32:49.481)
because it ends up being a really nice spot to share information with teammates and also get to know our remote coworkers. And usually we do end up talking about what our latest experiment is, or, I've been trying to add tests to this class and holy cow, it is crazy. And then someone else will be like, oh, I worked on that, so I know that the reason it's crazy is this other thing over here. And you really do get a lot more organic knowledge sharing in these shared spaces.

Joe Leo (33:11.659)
Yeah.

Joe Leo (33:17.279)
Yeah, I like that.

Valentino Stoll (33:17.744)
Yeah, you mentioned a really great point: how do you bring the human back into these new workflows that we're creating for ourselves? And that's a great one, having these release water coolers. That's fun. I'm curious, where else can we bring the human back? And I know one thing that has worked well recently is,

Joe Leo (33:23.254)
Yeah.

Valentino Stoll (33:46.003)
before working on a feature, doing a mob pairing session, where you have, you know, five to seven people all working on the design, or whatever new thing is going to be worked on, as a collective experience where one person's driving the agent session. And we're like, no, that's not exactly what we want, right? And, well, why don't we want that? Right? And teasing it apart, kind of doing the old school, like on a whiteboard almost,

Bekki Freeman (33:52.467)
love it!

Valentino Stoll (34:14.362)
but in an agent session on a Zoom or something, right? Breaking apart the thing, the problem you're really trying to solve, right? What other ways are you thinking about this kind of problem, and forcing the, you know, what problem are we solving and how are we going to solve it together kind of thing?

Bekki Freeman (34:18.066)
Totally.

Bekki Freeman (34:24.496)
I love it.

Bekki Freeman (34:40.296)
We had, so this is gonna sound, I guess, counterintuitive, but we had some issues this week with some of our pods falling over, just running out of CPU. And getting like seven of us on the call troubleshooting it, even though it was high pressure, high stress, and nobody wants to be in the situation where we have an outage, it was one of my most fun days at work in the last month.

Because yeah, we were all just together hammering on this problem. And the funniest thing is, we were all asking Claude independently what's wrong, what's causing this. And Claude gave each of us a different answer. So we had five different smoking guns from Claude, and not one of them was the root cause. So.

Joe Leo (35:32.68)
That is kind of shocking. Maybe it shouldn't be shocking, but it's shocking to me.

Valentino Stoll (35:33.936)
Ha ha ha!

Bekki Freeman (35:37.097)
And Claude was pointing at this particular Sidekiq job that we had. We'd refactored a Sidekiq job to pull more stuff into memory rather than doing more database queries. And Claude was like, this is the smoking gun, this is definitely what caused your issue. High risk, high impact. It had siren emojis in its report. It was very, very adamant that this was the issue. And then after we researched it for like 20 minutes, we're like,

Valentino Stoll (35:37.522)
That's really funny.

Bekki Freeman (36:06.312)
that job wasn't even scheduled to run, what are you talking about? And it was like, oh, well, I guess I didn't really know if it would run or not. But we, you know, we had this lovely mob troubleshooting session. We all got to laugh about it. We all got to be very human, even while we were using our assistants to help us troubleshoot the problem.
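For readers following along, here is a toy Ruby sketch of the kind of refactor described above: trading many small database lookups for one bulk load into memory. The `Account` struct, the `store` hash standing in for a database table, and both methods are illustrative assumptions, not code from Caribou's app.

```ruby
require "set"

# Stand-in for an ActiveRecord-style model (purely illustrative).
Account = Struct.new(:id, :balance)

# Before: one lookup per id. In a real app, each fetch would be a
# separate database round trip.
def total_balance_per_record(store, ids)
  ids.sum { |id| store.fetch(id).balance }
end

# After: one bulk load, then all work happens in memory. Cheaper on
# the database, but the whole working set now lives in RAM at once,
# which is the kind of change that can later show up as pod CPU or
# memory pressure.
def total_balance_bulk(store, ids)
  wanted = ids.to_set
  loaded = store.values.select { |a| wanted.include?(a.id) } # simulated SELECT ... WHERE id IN (...)
  loaded.sum(&:balance)
end

store = (1..5).to_h { |i| [i, Account.new(i, i * 100)] }
p total_balance_per_record(store, [1, 2, 3]) # prints 600
p total_balance_bulk(store, [1, 2, 3])       # prints 600
```

Both versions compute the same answer; only the query and memory profile differs, which is why a refactor like this looks suspicious during an incident even when it turns out to be innocent.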

Valentino Stoll (36:27.666)
You know, that makes me think this is almost like a missing product: a way to get everybody's bots to collectively join in on a single session, where there's a mediator trying to solve a problem, like incident response, but where you, the human, and your bot both join, right?

Joe Leo (36:47.739)
Yeah, I like that idea.

Bekki Freeman (36:48.278)
Oh goodness. So now on the next incident call, only everybody's AI agents will join and no actual humans, and they'll just... So much for bringing humanity back.

Valentino Stoll (36:54.514)
You know, humans... I've ruined the whole thing for you. No, what I do think about is this Moltbook experiment, and maybe a way to sidetrack people into humanity is by getting their AIs distracted, in a Moltbook fashion, where the bots are socializing and giving the humans space.

Joe Leo (36:55.354)
Yeah.

Yeah

Joe Leo (37:12.293)
Mm-hmm.

Yeah.

Bekki Freeman (37:23.634)
Right.

Joe Leo (37:23.931)
But there was a dating app that came out on the back of that, you know, where the agents get together and they kind of work out whether or not the humans would make a good match.

Bekki Freeman (37:28.848)
Right.

Bekki Freeman (37:36.231)
Lovely. We're fairly cautious in how we deploy our agent tooling, because we deal with a lot of sensitive data and so we have a very high security bar. And so we aren't doing things like Moltbook and OpenClaw, where we can just YOLO, go out anywhere and do stuff. We're much more cautious with what permissions we give it. And when I hear some of the stories of people using OpenClaw, I am shocked that

they're doing this with their company's code and their company's infrastructure. And I wonder how long it is before things start going very badly.

Joe Leo (38:17.85)
Well, bad is in the eye of the beholder. If you work at a company where your primary purpose is to fix these kinds of things, yeah, it's probably going just fine. I'm doing fine, yeah. Now, I also use OpenClaw, and I also like to vibe code and YOLO, but I only do it on things where I'm the only user. And when I'm using OpenClaw, it's in a sandbox on a DigitalOcean droplet where the blast radius is very small.

Bekki Freeman (38:25.992)
So you're happy, you're like, this is great. Keep doing that, keep doing that.

Bekki Freeman (38:37.255)
Right.

Joe Leo (38:47.575)
Does it surprise me that people are not doing that? Not really. I mean, people have, you know, engineers, I think, included there, there are just a subset of engineers that have a very high risk tolerance, or a very low amount of caring for things that go wrong at their company. Either way. Yeah.

Bekki Freeman (39:06.396)
Yes. Yeah, your software has no bugs if you have no users. Therefore, users cause bugs.

Joe Leo (39:11.619)
That's absolutely true.

Users cause bugs. And if I'm the only idiot that's annoyed by the stupid thing I did with Codex in YOLO mode, well, it's like a tree falling in the woods.

Bekki Freeman (39:21.906)
Ha ha ha!

Valentino Stoll (39:25.679)
I mean, in all fairness, as developers, we cause bugs. The code wouldn't exist without us, right?

Bekki Freeman (39:26.62)
Ha ha.

Joe Leo (39:31.149)
Yeah, well, yeah, that's fair.

Yeah, yeah. I mean, you bring up a valid point there, because what everybody is reporting, including here today, is that because of AI codegen tools, we have more PRs than we can possibly handle, which means more code, almost exclusively. Right, PRs almost always add more code than they delete. And that means that we have this throughput issue where

Bekki Freeman (39:36.562)
Very true.

Joe Leo (40:03.541)
the answer seems to be, for a lot of people, well, hey, we've got to open this up. We've got to get this stuff moving, because the bottleneck is now the human, or non-human, review stage. But what that's going to get you, if you solve that problem, is a lot more code. And code is a liability. So what do we do about that?

Bekki Freeman (40:27.589)
If I had the answer to that, I could probably retire. This is the question.

Joe Leo (40:31.233)
Ha ha ha!

Valentino Stoll (40:32.369)
I've got a solution. I call it the Optibot, and it's, you know, following the no-code presentation he gave a long time ago, and it just starts deleting code, you know? And it's like, you know, do you really need this, and can it be a Google Sheet? Right.

Joe Leo (40:34.869)
Yes, let's hear it.

Bekki Freeman (40:35.002)
Alright.

Joe Leo (40:43.061)
Davide Gramia. You know what?

Joe Leo (40:55.359)
Right, right. I like that.

Valentino Stoll (40:58.673)
You know, just going through code one by one and being like, what is this really doing, right? And being like, yeah, we don't need this, let me just delete this, the tests pass, right? Almost like the Chaos Monkey from Netflix, you know, but for your code, just pruning it. And being like, yeah, there's a reason you don't have a test to cover this: it's probably because you don't need it, right? And then just remove it.
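Valentino's chaos-monkey-for-dead-code idea can be sketched in a few lines of Ruby. This is a hypothetical toy, not a real tool: the `prune_if_unused` helper and its default `test_command` are invented for illustration, and, as the joke itself admits, a green suite only proves that the tests don't exercise the deleted code.

```ruby
require "fileutils"
require "tmpdir"

# Tentatively delete a candidate file, run the test suite, and keep
# the deletion only if the suite still passes; otherwise restore it.
def prune_if_unused(path, test_command: "ruby -e 'exit 0'")
  backup = File.join(Dir.mktmpdir, File.basename(path))
  FileUtils.cp(path, backup)   # keep a copy so we can roll back
  FileUtils.rm(path)           # tentative deletion

  if system(test_command)      # suite green: nothing covered this file
    true
  else
    FileUtils.cp(backup, path) # suite red: the code was load-bearing
    false
  end
end

# Demo with a throwaway file and a fake always-green "suite".
Dir.mktmpdir do |dir|
  candidate = File.join(dir, "maybe_dead.rb")
  File.write(candidate, "# unused helper\n")
  p prune_if_unused(candidate) # prints true (file was pruned)
  p File.exist?(candidate)     # prints false
end
```

In a real repo you would run this on a branch, point `test_command` at the full suite (say, `bundle exec rspec`), and still review every deletion by hand.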

Bekki Freeman (40:59.271)
That is so true.

Joe Leo (41:06.475)
Yeah

Yeah.

Joe Leo (41:15.489)
Right?

Mm-hmm.

Bekki Freeman (41:19.205)
Yes.

Bekki Freeman (41:23.822)
Thank you.

Yeah, you know, a lot of the tech podcasts I listen to, they talk about how people aren't going to buy SaaS software anymore, because they're just going to vibe code up their own software to do whatever they need to do. And I am with you, Valentino: I think more software is not good, less software is good. And I actually see that kind of falling off a cliff at some point, where people are like, I don't want to maintain this,

you know, Trello board, I'll go pay the 20 bucks a month or whatever, because maintaining this YOLO Trello app is more expensive in my time than just paying the $20 a month. I mean, we only have so much energy and focus, and I don't think most business owners want to maintain 20 pieces of software themselves.

Joe Leo (41:59.527)
Right. Yeah.

Joe Leo (42:18.321)
I think you're absolutely right. And that's the thing: people keep considering this stuff in isolation. Like, okay, I'm going to rip out my instance of Salesforce, right? That was a famous example. And I'm going to build it myself. Okay, great. Well, now you've got a CRM to maintain. If you actually do that, which I think is insane, are you then also going to rip out another app, and another one? Do I really want to maintain everything that I depend on today

Bekki Freeman (42:30.811)
Yes.

Bekki Freeman (42:39.345)
Yup.

Joe Leo (42:45.307)
with a bunch of applications that are all being vibe coded and vibe managed or maintained? Absolutely not. So what I think is more likely is the threat: maybe if I'm an enterprise or a large company, hey, I can just go and build this myself, so you better give me a discount, or you better give me this. And that's not an original thought; that was something that was brought up in the Citrini Research paper that sent Wall Street scrambling for a couple of days.

Bekki Freeman (42:59.931)
Mmmmm

Joe Leo (43:14.51)
I don't agree with all of it, but I think that's a smart thing. Like if it's giving you negotiating power, then maybe the business model needs to shift a little bit and maybe you need to be a little bit more thoughtful about what differentiates you as a company other than just the software. But I don't think that's strikingly different. I think that's just, you know, enhancing what has already been true.

Valentino Stoll (43:36.815)
Yeah, you know, it reminds me of Basecamp's whole effect on the software industry in general, in that they had one great product that they focused on exclusively and iteratively worked on, just delivering a quality product for that very specific thing. And they started getting rid of stuff, right? Like, they had Campfire as a chat service, and they were like, you know what, we don't want to deal with that, we're not going to develop that anymore.

Joe Leo (44:04.656)
Yeah, well, neither did anybody else. That was a good move. Yeah.

Valentino Stoll (44:06.607)
Yeah, right? Nobody else did. I mean, now they've revisited it for some reason, but, you know, they're not offering it as a service, right? You're not paying them money for it. And, you know, there's a proliferation of that idea, right? Like, okay, we have a very specific thing, it's giving us customers, and we're happy with the customers we have, right? And we don't need this exponential growth month over month in order to sustain that state of being, right?

Joe Leo (44:11.334)
Okay.

Joe Leo (44:15.378)
Mm-hmm.

Valentino Stoll (44:36.185)
And I thought, okay, that's the future, right? Like, people are gonna really dig in. And they did for a bit, and then they're like, well, you know, I want more money. And everybody thought, if I just add on all these other features, we'll attract more customers, right? And then you get those customers and they're like, why doesn't this thing work in the corner of the app that I wanna use it for? And you're like, well, we'll get there.

Joe Leo (44:57.786)
Right.

Joe Leo (45:02.138)
Yeah.

Valentino Stoll (45:02.801)
And I feel like this is the same effect that we're seeing, right? Just with more growth, right? And more adoption. Like, yeah, sure, you can vibe code your thing, and it'll serve a very specific purpose to start, and then people will be like, oh, what I really want to do is be able to call this thing from my phone and just have it talk to me, right? And then, oh yeah, how hard could that be, right? And somebody builds that, and...

Bekki Freeman (45:03.185)
Yeah.

Joe Leo (45:26.169)
Right.

Valentino Stoll (45:28.229)
then it has bugs and issues, and, well, I can't really deal with that right now, so just stop using that for a little bit, right? That's what ends up happening. You get these people that adopt it and start to get excited, and then it falls apart on them, and they get less excited and less excited and more frustrated. And then they're just like, all right, I'm just gonna pay somebody else to do this thing, right? That does this very specific thing.

Bekki Freeman (45:38.001)
Right.

Bekki Freeman (45:47.559)
Yes, I was going to pay somebody. And I think, yeah. So, like, WordPress has been around for what, 40 years at this point? And I see so many small business owners that, you know, they got WordPress because it's like, you don't need a developer, you don't need to code anything. And 40 years later, the people who are using WordPress still don't know how to manage plugin updates without their whole site going down. And so,

Joe Leo (45:50.842)
Yeah

Bekki Freeman (46:17.126)
when I think about people who are still struggling to do the minor IT work of their low-code systems, I just don't envision those same business owners being excited about maintaining their AI agent software over time. Or, like, oh, I got this notice that my key is compromised, what do I do? And that happens a few times and you're like, this isn't worth it anymore, even though it saved me 20 bucks a month.

It's not worth it.

Joe Leo (46:49.182)
Yeah, I hear you there. So I want to step back a bit, and I want you to think about the AI engineering of the future. And when I say the future, I guess I mean like two years; that's enough time for things to be wildly different than they are today. So, I mean, what feels real to you and your team, what can you reasonably envision doing, and what feels like it's kind of just hype and will fall by the wayside?

Bekki Freeman (47:19.386)
The first thing that comes to mind as being more achievable with AI is staying on top of tech debt. The biggest issue when you have a business is that you're trying to write software that improves the business, that brings the business forward. And so the things that get deferred are the maintenance aspects, like upgrading Rails or

like switching from single quotes to double quotes in the whole code base, those kinds of things. And so I definitely see AI being an unlock in being able to maintain modern standards and software in a more ongoing, organic way. I think that's going to be huge. And we're already using it to really accelerate our tech debt management today. So I could see, in like two years, that just being

Joe Leo (47:49.004)
Heh heh.

Bekki Freeman (48:14.808)
a thing that happens every night automatically, without us even having to be involved. The thing that I am still not sure is going to be ready is what I guess some people call taste. Like, I can tell the agent that I want a vibe-coded Trello board, but I can't

Joe Leo (48:19.947)
Mm-hmm.

Bekki Freeman (48:42.54)
necessarily explain, correctly or in the right way, how I want it to flow or feel when I use it. Again, those are the human aspects of our software. In the end, most of our software is written for human use. Some of it is talking to other software, and maybe we'll get further there. But at the end of the day, there are people using it, and if it doesn't feel good to them...

Joe Leo (48:51.402)
Mm-hmm.

Bekki Freeman (49:09.166)
And I just don't know how we would explain that to an AI agent.

Joe Leo (49:15.881)
Yeah, I think that's insightful. Somewhere between there's no accounting for taste and hey, just building off of what was already done is kind of easy. It was always kind of easy. Now it's easy and fast. But that doesn't mean that you're going to be able to craft what the next thing is that people want to use and see and feel and touch. Do I have that roughly correct?

Bekki Freeman (49:41.434)
Yeah, I think that's a good way of looking at it. So almost like we're all gonna become product managers where we have to be able to describe the purpose of the system in a way that the AI agents can understand.

Joe Leo (49:54.025)
Uh oh, we're going to have to cut that out. We have to tell Paul to get that. Yeah. Okay, I'm sorry to interrupt you. They will all become product managers. And what else, what else happens after that? I'm still reeling from that one.

Bekki Freeman (50:02.758)
That's right.

Bekki Freeman (50:10.038)
I mean, maybe the apocalypse? I don't know.

It's an exciting time because I'm not sure anybody really knows where we're headed right now.

Joe Leo (50:22.801)
Yeah, you're right about that. But we love getting people's sort of read on the future because yeah, because a lot of people see things differently and we just don't know. But it's certainly an exciting time we're living in today. Valentino, you have any parting questions?

Valentino Stoll (50:46.468)
Yeah, I guess, so have you tried Codex? And do you prefer one or the other?

Joe Leo (50:52.457)
I was coming back to it.

Bekki Freeman (50:55.92)
Oh man, you're putting me on the spot. Okay. So I love RubyMine, so there aren't a lot of the other IDEs that I've picked up, because RubyMine is my favorite. But I love Claude Code's CLI interface; I'm just very fond of it. So no, I have not picked up Codex. I haven't even picked up, what is it called, Claude Cowork or? Yeah.

Joe Leo (51:20.584)
Claude Cowork. It's like a vanilla Claude bot.

Bekki Freeman (51:27.078)
Yeah. And then Google's Antigravity is actually quite good. Yeah, Antigravity is pretty good. So I'm going to totally cop out on this and I'm going to say they're both great. I love them equally. Sorry, guys.

Joe Leo (51:34.695)
Mmm.

Joe Leo (51:46.855)
There's no way that's true. All right.

Valentino Stoll (51:48.624)
I'll take that as a win for me.

Joe Leo (51:51.911)
I'm just going to tell you right now that you're both incorrect on this. And no, I know. That's true. What I know is that on the small apps that I'm using, Codex is flying past Claude Code. Now, in Claude Code I have superpowers, you know, enabled and all that. And now, and maybe you don't use this anymore, but now when superpowers starts planning, I'm like, oh God, oh my God, just kill me, it's going to take forever. I had it working on something very simple, kind of just adding,

you know, linting and some kind of boilerplate stuff to an app yesterday. And God, it's asking me these clarifying questions, and then it's creating this plan and writing the plan. And meanwhile, Codex is on another branch just flying through issues, flying through them. Now, it's a smaller code base, not huge like Gusto, and not whatever the gap is between us and Caribou, but still. I mean, worth a shot. Worth taking a look at.

Bekki Freeman (52:37.359)
Really?

Bekki Freeman (52:53.613)
Okay, I will go assign someone to do that.

Joe Leo (52:56.101)
You have to assign somebody to do that. Assign an agent to test out that agent and have it report back to you.

Bekki Freeman (53:03.983)
Yes, but then I need another agent to summarize the report because I don't want to read it.

Joe Leo (53:07.146)
I know, I know, and then another one to decide. Well, Bekki, it's been great having you on the show. We'd love to have you on again, would love to talk about the conference, for example, maybe right before or right after you have it. I wish you all the best of luck in planning and organizing, and yeah, hope to see you around soon.

Bekki Freeman (53:13.728)
that's great.

Valentino Stoll (53:14.671)
It's funny.

Bekki Freeman (53:23.299)
Yes!

Bekki Freeman (53:29.061)
Thank you.

Bekki Freeman (53:32.823)
Absolutely. And I look forward to seeing you both at Rocky Mountain Ruby, because I know you're going to be the first ones to buy tickets.

Joe Leo (53:38.402)
Well, yeah, yeah, as soon as I, well, I'm not even gonna need to buy a ticket, because I'm gonna submit a proposal, which I did not do a couple of weeks ago. And Valentino, will I see you this evening at Artificial Ruby?

Valentino Stoll (53:39.439)
try my best.

Bekki Freeman (53:43.269)
Because you're going to get in, yes.

Bekki Freeman (53:48.441)
Love it.

Valentino Stoll (53:54.111)
Unfortunately not. I've got some contractors at home I gotta get back to. Contract killers. No, I'm just kidding.

Joe Leo (53:56.132)
All right.

Contract killers. Contract killers at home. All right, well, I'll be there on my own, telling people to stop using Claude Code. Just kidding. All right, well, thanks, everybody, for listening, and we'll see you again soon.

Valentino Stoll (54:13.007)
That's funny.

Want to modernize your Rails system?

Def Method helps teams modernize high-stakes Rails applications without disrupting their business.