Episode 12
Running Self-Hosted Models with Ruby and Chris Hasinski
About This Episode
In this episode of the Ruby AI Podcast, hosts Valentino Stoll and Joe Leo
welcome AI and Ruby expert Chris Hasinski. They delve into the benefits and
challenges of self-hosting AI models, including control over model updates, cost
considerations, and the ability to fine-tune models. Chris shares his journey
from machine learning at UC Davis to his extensive work in AI and Ruby.
Full Transcript
Valentino Stoll (00:01) Hey everybody, welcome to another episode of the Ruby AI Podcast. I'm one of your co-hosts today, Valentino Stoll, joined by Joe Leo. Joe Leo (00:10) Hey everybody, welcome to the podcast. And today we're joined by Chris Hasinski: Rubyist, open source author, frequent speaker. We're going to talk about self-hosted models and probably a whole bunch of other topics. Chris, welcome to the show. Chris (00:30) Thanks for having me. Joe Leo (00:33) So Chris, let me ask you, just from the start as we jump in here: why should we host our models ourselves? Chris (00:44) There are many reasons to host a model yourself. One that I like to present, that is really not obvious, is that if you self-host a model, you can test it again and again with the same model. You don't get this guarantee with any hosted, well, provider-hosted model, because they can change them any way they want and anytime they want. Joe Leo (01:05) And often do, right, yeah. Chris (01:07) Yeah, they often do, especially with things like GPT-5, which is a router model. You don't really get to know what's responding to your request underneath. So that's one example. The other one is you pay for the hardware, not for the tokens. So some classes of problems suddenly become viable because you don't pay per usage; you have all the usage you want. And hosting something cheap that has some real AI power, like a Mac mini in the cloud, gives you the ability to do something like, I don't know, voice recognition in live streaming. Joe Leo (01:44) And what would you say to the counterargument to self-hosting? I think it's that it can be very slow, right, if you're running it on your own hardware, and that the models that you're able to host are not as good, they're not up to the same standards as the premiere off-the-shelf models. What would you say to that?
Chris (02:15) Yeah, you're absolutely right. Because first of all, if you run something yourself, it's not subsidized. If you're running something with OpenAI or Anthropic, they pretty much pay for the tokens; they are not making money on that. So you'll get worse quality, but at the same time you get the guarantee that it will work over time. And also you get to pick a model. You can fine-tune something. You can train something yourself. It doesn't have to be an LLM; maybe some different solution works. You can very easily switch those models, and you can find something on Hugging Face that works better than whatever Anthropic provides for tool-calling models. So I think there's quite a lot of benefit to still using self-hosted models, even though prototyping with something from Anthropic or OpenAI is definitely easier. And you don't have to pick one. You can mix and match. And even if you start with something from OpenAI, maybe you want to use, I don't know, an embedding model that will be self-hosted. There is no point in doing something like calling OpenAI just to embed, I don't know, a million images or a million PDFs. It will be very expensive, and you don't really care about the latency and all this stuff. You can do it offline on your own hardware. Joe Leo (03:13) Mm-hmm. Right. Yeah, I like that. And maybe we want to dig into that for a moment, because my understanding of your work and your talks is that you're not just about text, right? You do a lot of work with images, for example, and other kinds of content. Do you want to tell us a little bit about that? Chris (03:44) Exactly. Yeah, I can give you a little bit of background. So I actually started my career with machine learning many, many years ago at UC Davis. I was working in the bioinformatics department and we were doing protein folding.
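The offline embedding workflow Chris describes, running a local model over a large document set instead of paying per token, can be sketched in plain Ruby. This is a minimal sketch with the local model stubbed as a callable; in practice it might wrap an ONNX sentence-embedding model, and all class and method names here are illustrative, not from the transcript.

```ruby
# Batch-embed documents offline with a local model instead of a paid API.
# The embedder is a stub standing in for a real local model (e.g. an
# ONNX embedding model); names are illustrative.

class OfflineEmbedder
  BATCH_SIZE = 32

  def initialize(model)
    @model = model # anything responding to #call(texts) -> array of vectors
  end

  # Returns [document, vector] pairs, processed in batches so a million
  # PDFs never need to leave your own hardware.
  def embed_all(docs)
    docs.each_slice(BATCH_SIZE).flat_map do |batch|
      vectors = @model.call(batch)
      batch.zip(vectors)
    end
  end
end

# Stub model: fake 3-dimensional embeddings derived from text length.
stub_model = ->(texts) { texts.map { |t| [t.length.to_f, 1.0, 0.0] } }

pairs = OfflineEmbedder.new(stub_model).embed_all(["a pdf", "another pdf", "a third"])
```

Swapping the stub for a real model object keeps the batching logic unchanged, which is the point: latency per batch matters little when the job runs offline.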
And AI literally took my job, because protein folding got solved by AI many years later. Still, it was interesting to see a problem that had been going on for like 40 or 50 years, Joe Leo (03:50) Mm-hmm. Mm-hmm. Right. Mm-hmm. Chris (04:15) and that we had tried to apply some classical machine learning to, get solved in the recent wave of AI. So I've been working with classical machine learning here and there. I never thought of myself as a machine learning or AI engineer. Most of the stuff I worked on was performance or web applications, but I had an AI or machine learning project here and there. I did things like transaction categorization, nudity detection, recommendation systems, quite a lot of small AI or machine learning projects over time. And suddenly it became popular again. Joe Leo (04:47) Mm-hmm. Chris (04:56) Right now you get this big wave of AI, and most of it is LLMs, but you also get stuff like vector databases. You get stuff like image recognition, and some of it can be reapplied. For example, if you do something like image recognition, it's not only finding images. I did a workshop on that. You can also use the same principles to do content moderation. If, for example, Joe Leo (05:01) Mm-hmm. Yeah. Chris (05:22) you know that you don't want to have Lego bricks in your image database, because those are copyrighted and you can't really use them, then you basically find some examples of Lego bricks. You put embeddings for those in the database, and then you compare newly added images to the images of Lego bricks that you already have. And now you have a content moderation system. So definitely not only LLMs. I've worked with quite a lot of different models. Joe Leo (05:31) Mm-hmm. Chris (05:51) I've even trained a few myself. I wouldn't say big solutions, but good enough to actually be applied to production for some stuff. Joe Leo (06:02) Yeah, like what, for example? The training I'm interested in.
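The Lego-brick moderation flow Chris describes above, comparing a new image's embedding against stored embeddings of known-bad examples, boils down to a nearest-reference similarity check. Here is a minimal sketch in plain Ruby: the embeddings are assumed to come from an image model such as CLIP, but here they are just arrays of floats, and the threshold value is invented for illustration.

```ruby
# Content moderation by embedding similarity: store embeddings of
# known-bad reference images (e.g. Lego bricks), then flag new images
# whose embedding is too close to any reference. Vectors and the
# threshold are illustrative; real embeddings would come from a model
# such as CLIP.

def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  mag = ->(v) { Math.sqrt(v.sum { |x| x * x }) }
  dot / (mag.call(a) * mag.call(b))
end

def flagged?(embedding, reference_embeddings, threshold: 0.9)
  reference_embeddings.any? { |ref| cosine_similarity(embedding, ref) >= threshold }
end

lego_refs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]

flagged?([0.95, 0.05, 0.0], lego_refs)  # close to a reference => true
flagged?([0.0, 0.0, 1.0], lego_refs)    # nothing like the references => false
```

In a real application the reference vectors would live in a vector database (pgvector, for instance) so the nearest-neighbor search scales past a handful of references.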
Chris (06:07) For example, some router models: stuff that will hand off a query to one of several different agents that can respond to it. I also did some work on... it's actually a project done by Irina Nazarova. She wants to do emotion recognition from speech, right? What happens is she wants to take the talk of a speaker from the microphone and make the lights at the venue react to the tone of the voice they are speaking with. For example, if they're happy, the lights will be green, or if they're angry, they will be red or something. So I started with some models that are on Hugging Face, because that's the easiest path. Joe Leo (06:48) Interesting. Chris (07:02) I did some experiments. It was okay, but it wasn't the best quality. A colleague of mine sent me his own model, because there is an ongoing competition for basically this problem here in Poland. We have a lot of AI startups and AI competitions in Poland, so I just happened to stumble upon something that matches. He sent me something that is much better, and I tried to fine-tune it a little bit on the data that I have, and I think it gives good enough results. We'll see at San Francisco Ruby if they'll use it. Joe Leo (07:39) Oh yeah, that's interesting. So this could be rolled out at SF Ruby. Chris (07:43) Yeah. I also did a few talks about fine-tuning non-self-hosted models, something that is in the cloud, mostly OpenAI models. And this was mostly to show people that the models aren't really static. It's not something that is given, that you can't edit, where you can just call the API and get the result. You can actually edit models. That's why you need to collect data. You need to clean data and store good data sets of your own, from your own applications, because you can reuse this data. If you have, for example, I don't know, an agent that picked the wrong tool
over and over and over again, and you have those conversations, then you can use this set of data, give it to OpenAI, and tell it: okay, the right solution should be this tool. And if you have a big enough set of those, you can fine-tune the model to pick the correct tool. So I don't really want to think like most of the developers right now in the AI space. They think of a model as something that Anthropic or OpenAI or Google gives us, something that's just given, where the only way you can influence it is to give it a different context. But you can actually edit the model itself, at least in those cases. Joe Leo (08:56) Mm-hmm. Well, so I'm curious about this, with editing a model, or with fine-tuning, or with both. Is it much harder to do with a model that is built on massive, massive amounts of data, like the OpenAI or Anthropic models? Chris (09:16) It kind of depends. It's very difficult to add new knowledge to the model, but it's very easy to change the format that it outputs. For example, if you want to have shorter markdown, you can train a model to emit fewer tokens, removing extra spaces and removing extra padding in tables. But it's quite difficult to add new knowledge to the model. Joe Leo (09:20) Mm-hmm. Chris (09:41) The opposite is true of embedding models. For example, if you have CLIP. Do you know CLIP? Yeah, so CLIP is something that has so many different variants, so many different fine-tunes, so many different versions. And I stumbled upon an issue with the original CLIP some time ago. It doesn't recognize items that were made after 2021, because that's the knowledge cutoff for that model. Joe Leo (09:46) I do, yeah. Mm-hmm. Chris (10:09) So if you want to, for example, spot an image of a new iPhone or a Cybertruck, it won't be able to recognize those, because those things were made after 2021. But you can quite easily find like 500 images of a Cybertruck and just fine-tune the model to recognize the item.
So you can actually edit those locally, and you can have a model that is vastly more capable in your specific domain. Joe Leo (10:19) Right. Yeah, that is interesting and very useful. With respect to CLIP, now, you built the CLIP-in-Ruby pipeline, right? I mean, you have... Chris (10:49) Yeah, that's something that I actually used in a personal project, then a workshop, then a client project. And I decided, yeah, I'm not writing this thing a third time. I need a gem. Joe Leo (10:54) Mm-hmm. So tell us a little bit about that. Chris (11:05) So the original idea was that there was this project that basically used keyword-based search, and this works pretty well if you have labeled images. But the users were labeling those images, which means the labels were pretty much terrible. The standard approach for this kind of problem is to basically send those images to OpenAI and tell it: I need to know what's in this image. I need to have a set of tags, I need to have some description. And this works pretty well. But you get two layers of abstraction, because if you do keyword search on those descriptions, you get a keyword search on a description that was created by something that looked at the image. And you can basically short-circuit it and do a direct text-to-image comparison using CLIP. But at the same time, the site was multilingual. So if somebody looks for a cat or a dog in Spanish, it still needs to work, right? There are some variants of CLIP that are multilingual. They vectorize into the same latent space as regular CLIP, so you can basically use regular CLIP for images, but use a different model for text. And they will still give you vectors that will match if there is a correlation between those images and that text. So I basically packaged this into a gem, and it got like 14 stars, which makes it my most popular open source project yet. Yeah, so... Joe Leo (12:16) Mm-hmm. Excellent, congratulations.
Chris (12:43) That's the thing that I was surprised about. Justin recommended me for this podcast, and he runs Active Agent, and he mentioned me during a talk once or twice. But I don't really have that many open source projects, and, well, I don't even have a popular open source project. I'm just a guy who visits conferences and Discords and talks about AI. Joe Leo (12:52) Mm-hmm. Yeah. Well, you should know that on this show we've got a community of people that are fans of people doing great work and great talks. So he knows about you probably because of the talks, and because, just knowing Justin, he knows all of these other tools. I'm sure he's checked this out. He might be one of the 14 stars. Chris (13:25) Yeah, and we are both fans of Andrew Kane, so... Joe Leo (13:29) Yes, yeah, that name also comes up a lot. And we need to pull him out of his open source work for an hour so he can come on the show with us. Chris (13:36) I've heard he is a real person, because we had a bet on whether he is a real person or not; like, nobody can make 16 gems. But Irina told me that she managed to meet him in real life, so he is definitely real. Joe Leo (13:38) It's incredible. Okay. All right, then I'll believe it. I'll believe it if Irina says it's true. So I'm curious. At this point, you studied at UC Davis, you have this background, you've done a lot of machine learning and AI projects. But what I've seen from you a lot is that you were very proudly saying, hey, there's no Python in this, this is a Ruby application, right? So what made you, I guess, how or when did you become a Rubyist, and why is it important to you that these tools exist in the Ruby world? Chris (14:25) That is a very interesting story, to be honest.
I've been doing Ruby for 11 years, and on my first commercial Ruby project, I didn't know Ruby at all. They told me there is a contract: if you can learn Ruby over the weekend, we would like to have you on this project. And I told them, there is a Rails Girls coming up; I will sign up as a mentor and I will have extra motivation. Yeah. So I did Rails Girls as a mentor, and then I started Ruby work the following Monday. I had been programming in like seven or eight different languages at that point. And, Joe Leo (14:48) There you go. Chris (15:03) Ruby isn't that difficult, especially if you know Java, Python, some PHP, some JavaScript. It's basically the same principles. And you can see Rails spreading out to other languages, so you see those patterns over and over again. And the killer feature for Ruby is still remote work: great remote contracts, and just the ability to work remotely from anywhere and find something interesting in Joe Leo (15:16) Mm-hmm. Chris (15:31) different parts of the world. So I started doing Ruby more and more. I even did a little bit of machine learning on that particular project, because we did banking transaction categorization, which was interesting. And back then Ruby didn't have any ML libraries, so we had to write some algorithms from scratch, mostly classification and clustering. Joe Leo (15:44) Mm-hmm. Chris (15:56) So I started working with Ruby more and more. I met Matz at one conference. I speak a little bit of Japanese, so I met him and introduced myself in Japanese, and he was so happy that somebody at the conference knew at least two words of Japanese. Yeah. Joe Leo (16:07) I'm sure. Chris (16:10) And since then I've been doing mostly Ruby, like exclusively. And I got back to Python when I actually wanted to do a little bit of a machine learning project. I don't remember what it was exactly, but it was something involving PyTorch. And I discovered that there is already...
Joe Leo (16:31) Mm-hmm. Chris (16:36) an ecosystem, of course done by Andrew Kane, like everything in this space. And I converted the model from PyTorch to ONNX (you guys know ONNX, you already talked about this) and it just worked, immediately. There was nothing to be changed, so you can basically just take the Python code. Joe Leo (16:48) Mm-hmm. Chris (16:59) If you squint hard enough, Python looks like Ruby. When I go to AI conferences, I usually show them Ruby code and tell them: squint hard enough, it will look like Python. Yeah. So I think that you can do quite a lot of machine learning and AI just with Ruby. There are some missing parts. We don't have a good library for training models, especially distributed training. But here's the thing: training you usually do rarely, in some Joe Leo (17:01) Mmm, really hard. Yeah, exactly. Gotta go the other way. Chris (17:28) initial stages, and then you basically use those models. So you can train something in Python, which is usually a very short script, and then you can run it in Ruby inside of a nice Rails application that will be recognized by any Ruby developer. So I think that if I'm porting code from Python to Ruby, Joe Leo (17:45) Mm-hmm. Chris (17:51) it's immediately useful to somebody who's a Rubyist, even though they can't modify the model without touching some Python. But the value is still there. So I've tried to port a lot of code from Python to Ruby, basically. Joe Leo (18:07) Yeah, and we appreciate it. I'm curious though, to go back to what you were saying: what's the library, what's the part of Python that supports distributed training that Ruby lacks? Chris (18:29) Let me think about this. What exactly do we mean? Because there is a bunch of different components that we don't really have. Joe Leo (18:37) Well, I'm curious if it is a runtime thing? Is it a concurrency thing? Is it, you know?
Chris (18:46) When it comes to distributed training, the part that is missing the most is an equivalent of PyTorch DDP, which is a distributed training library. We don't really have a good implementation, at least a fast implementation, of transformer training like GPT. You can even take a look at the existing Joe Leo (18:52) Mm-hmm. Chris (19:08) training code that Andrej Karpathy did a while back. It was called nanoGPT. And it basically allows you to train a full GPT-2 on your own hardware or in the cloud. The code is terrible; Andrej Karpathy tends to use global variables everywhere, and that's his way of configuring stuff. But Joe Leo (19:15) Mm-hmm. Uh-huh. Chris (19:32) at the same time, it shows you how to train a full GPT-2-level model, which is kind of amazing. We don't really have any equivalent of that, and I never tried porting it, but I'm quite sure that you'd be missing a few libraries. Same goes for tiny things that are useful but not strictly necessary, like... Joe Leo (19:46) Right. Chris (19:55) for example, TorchRB can read any PyTorch model, but some of them will fail because of some data structures used in Python. So you have to Joe Leo (20:03) Right. Chris (20:04) probably load it in PyTorch, not in TorchRB, and then export it to ONNX, because it will work that way. Tokenizers: we have a great tokenizers gem, once again done by Andrew Kane, but in Python you have this automatic model recognition, tokenizer recognition, and whatever little slice of configuration you need for those models. So basically any model that you download from Hugging Face is just like a one-liner to run. Joe Leo (20:08) Mm-hmm. I see. Mm-hmm. Chris (20:34) We don't have this luxury yet. So if we can't run it with a one-liner, very few people are interested in training those models or fine-tuning them. Joe Leo (20:46) Right. So that's good to know.
With respect to the models themselves, and you made reference to Hugging Face, Valentino, when he gets here, can talk all day about Hugging Face. I'm always a little bit curious. I mean, this is me, and I'm not in the weeds as often as you guys are, but going onto Hugging Face, I'm like: okay, there are millions of choices here. How do I decide what I need for my particular task? Do you have any standards where you're saying, okay, depending on the task, the size... how do I decide what to use? Chris (21:23) Yeah. I get this question after every single talk I do on running models offline, and my usual answer is: I do not browse Hugging Face manually. I ask Claude to do it for me. Because most of those models have very odd and quirky names, like Speech Emotion Recognition, SER. But yeah... Joe Leo (21:31) Ha! Okay, good. Okay. Okay. Right, how am I gonna find that? Chris (21:53) You wouldn't guess from just the name that SER is speech emotion recognition. So what you do is you go to ChatGPT or Claude, describe the problem a little bit, connect an MCP to browse, or, if you have a desktop app, it can browse the web on its own, and say: find me some Hugging Face models that will work for this problem. Joe Leo (21:58) Right. Chris (22:16) And that's a very good starting point. You don't really need to dive that deep into machine learning and AI to be able to find a matching model. I have this example from the Ruby AI Builders Discord. There was one guy, I can't remember his name, but he joined and said that he's using Claude to find out if a document is rotated or not. Because if it's rotated, then you can't use it for OCR; you have to rotate it back. And using Claude is very big machinery just to process an image and tell if the thing in the image is rotated or not. And there is like a one-megabyte model that does exactly that in ONNX.
It's super fast, and it just gives you the answer, whether it's rotated or not. You can run it as a Sidekiq job in the background and it will be plenty fast. So I wouldn't be able to find this model easily, but... Joe Leo (23:01) Wow. Well. Chris (23:15) ChatGPT can. Joe Leo (23:16) Yeah. Yeah, exactly. Now, so let me ask you another question, and maybe you get this one as well. Often, you know, I love Claude Code. A lot of people love Claude Code or Codex, right? Because it helps us do a lot of code development fast. But it costs a lot of money; it could cost a couple hundred dollars a week. So I guess, let me ask you this: you have such knowledge of this space, do you use it as well, or do you use some local model that you have trained to perfection? Chris (23:56) I have a Claude Code subscription and that's my go-to, but... Joe Leo (23:58) Mm-hmm. Okay. Chris (24:02) I don't think there is anything that will work to the same level as Claude Code right now. You have to remember that most of those models need some training data, and you have much more training data at the companies that actually distribute those tools. So it will be a while before we have something that can compete with those models. DeepSeek is pretty cool, but spending $10,000 on hardware, when the model might change next year and not fit your hardware, Joe Leo (24:27) Yeah. Alright. Chris (24:32) I don't think it's a wise investment. So right now I don't think you have anything that will work with the same pricing and at the same scale as Claude Code or Codex. Things might change as those companies stop subsidizing those subscriptions, because, as you probably know, if you're paying $200 a month for Claude Code, this does not cover your costs, and... Joe Leo (24:44) Mm. Mm-hmm. I know. Chris (25:01) companies pay much more because they are billed by the token, not flat subscription prices.
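The one-megabyte rotation detector Chris mentions is the kind of tiny classifier that slots naturally into a background job. Here is a minimal sketch in plain Ruby: the ONNX model is stubbed as a callable (in practice it might be loaded with the onnxruntime gem), the Sidekiq plumbing is omitted so the sketch stays self-contained, and all class and method names are illustrative.

```ruby
# Background-job-style wrapper around a tiny local classifier, in the
# spirit of the one-megabyte rotation detector mentioned above. The
# real ONNX model is stubbed as a callable, and the Sidekiq include is
# omitted; names are illustrative.

class RotationCheckJob
  def initialize(classifier)
    @classifier = classifier # responds to #call(image_bytes) -> angle in degrees
  end

  # Returns :ok for upright documents, or the correction needed before OCR.
  def perform(image_bytes)
    angle = @classifier.call(image_bytes)
    angle.zero? ? :ok : { rotate_by: -angle }
  end
end

# Stub classifier: pretend documents whose bytes start with "R" are
# rotated 90 degrees.
stub = ->(bytes) { bytes.start_with?("R") ? 90 : 0 }

job = RotationCheckJob.new(stub)
job.perform("Rotated scan...")  # => { rotate_by: -90 }
job.perform("Upright scan...")  # => :ok
```

Because the model is an injected callable, the same job class works in tests with a stub and in production with the real ONNX session.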
So I think that there isn't anything for coding specifically that can compete. You can try simple scripts with Llama or DeepSeek or some other model, and it will work. But when you get to a big project that has a lot of context, even if you use some great tools like Aider or OpenCode, there is still nothing that will work as well as Claude Sonnet or GPT-5. Sorry. Joe Leo (25:34) Yeah, that's okay. That's okay. I'm just channeling our listeners, who I'm sure have this question in mind. Another thing that I think is probably helpful for folks. So let's say, going back to: okay, I've chosen a model. I've had Claude or ChatGPT help me select a model from Hugging Face, and I'm using this in a Ruby application. So, you know, what do I do next? What's the deploy pattern for leveraging a model? What's the infrastructure? What am I using? Chris (26:14) Here is an interesting thing that I did during Paris.rb. I did a talk about using AI offline. I basically took somebody else's presentation. It was Paweł Strzałkowski; he was a speaker at Rails World. He had this Rails application in which you basically have an animated dog, and you talk to the Rails application. It sends your voice to Whisper Joe Leo (26:20) Mm-hmm. Mm-hmm. Chris (26:44) for transcription, takes the transcription, sends it to OpenAI for a tool call, and the dog will react to your command. So basically you can say something like, the dog's name was Gem if I'm not mistaken, so "Gem, sit," and you get an animation of Gem sitting in like 12 seconds or so, because it's not very fast. So I just replaced the Whisper call with whisper.cpp, Joe Leo (26:53) Mm-hmm. Yeah. Mm-hmm. Chris (27:10) and I replaced the OpenAI call with Ollama running Llama 3.2. And I did exactly his slide from the presentation, the live demo, but first disabling Wi-Fi on the laptop.
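Swapping a provider call for a local Ollama instance, as in the demo Chris describes, amounts to pointing a thin HTTP client at Ollama's default port. A minimal stdlib-only sketch, assuming Ollama is running locally on its default port 11434 with its `/api/chat` endpoint; the payload builder is kept separate from the network call so it can be inspected without a server present, and the model name is just an example.

```ruby
# A thin client for a local Ollama instance, assumed to be running on
# its default port (11434). Stdlib only; the HTTP call lives in its own
# method so the payload can be built and tested without a server.

require "net/http"
require "json"

class OllamaClient
  def initialize(host: "localhost", port: 11434, model: "llama3.2")
    @uri = URI("http://#{host}:#{port}/api/chat")
    @model = model
  end

  # The request body Ollama's /api/chat endpoint expects.
  def chat_payload(prompt)
    { model: @model, messages: [{ role: "user", content: prompt }], stream: false }
  end

  # Actual call; only works when Ollama is up.
  def chat(prompt)
    res = Net::HTTP.post(@uri, JSON.generate(chat_payload(prompt)),
                         "Content-Type" => "application/json")
    JSON.parse(res.body).dig("message", "content")
  end
end

payload = OllamaClient.new.chat_payload("Gem, sit")
```

The same shape works whether Ollama runs on the laptop or in a container, which is what makes the "treat it like Postgres" approach discussed next practical.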
So you can replace the code that you have quite easily, but there is a little bit of work in finding a model that will have comparable quality, Joe Leo (27:22) Mm-hmm. Uh huh, yeah. Chris (27:38) especially for a given particular use case. If you have something that generates end-user-visible text, especially a lot of it, you might not find anything that will generate the same quality of writing. But if you're mostly doing tool calls and you want an agent that picks one of several options based on the text provided, you can just download whatever the current Llama is, the one that will fit on a server within your budget, or even on your MacBook, just to test it. And if the model works, Joe Leo (28:11) Mm-hmm. Chris (28:14) then basically it's a matter of finding the right hosting for it. And you get the guarantee of a fixed price, and you also get the guarantee of the model staying the same and responding the same to the same questions in a couple of months. And you can also use an even smaller model for testing purposes. So if you have a test suite that depends on an LLM, then you can spin up an LLM in Docker. There is an Ollama Docker image and there is a vLLM Docker image. Joe Leo (28:22) Mm-hmm. Chris (28:44) You just give it a model and you have basically an LLM that you can treat the same as you treat Postgres or Redis: something that you can use for local testing or deployment. And it works like a database, basically an external service for your Rails application. Joe Leo (28:52) Mm. Yeah, that's really interesting. And I know, we started at the top of the show, you mentioned one of the benefits of using your own model is the ability to test it. So let's get into that for a bit. What are the kinds of strategies that you use when you are testing a model that has now become more predictable, but is still not deterministic?
Chris (29:31) Most of the models, custom models, self-hosted models I've used, their specific task was picking the right tool and the right parameters. So basically advanced tool calling, to do like an agent network. And the way you do this is you basically have fixed scenarios with some recorded data, especially the failures that you had in the past. That's the reason why you want to log every single chat that you have with the model. And you put those into tests. It doesn't have to be complex. It might be something like a RubyLLM call or an ActiveAgent call that basically gives all the context in one go and expects a given response. And you get two things out of that. First of all, you have a repeatable and growing library of prompts that you can evaluate. Joe Leo (30:16) Mm-hmm. Chris (30:29) And second, if you want to change the model, you can run the same thing with the new model and you get some kind of certainty, not all of it, because obviously it's an LLM, that it will still pass with the new model that you picked. So basically, like with everything in machine learning and AI, you need data, and the algorithms are usually quite easy. Joe Leo (30:54) Now you've mentioned this before as well: the data, that you need to be able to store data, clean data, and produce a lot of it. Those are also not traditionally tasks that people pick Ruby for, the data engineering tasks. So do you prefer to have the cleaning, storing, and mining of data done in Ruby? Chris (31:22) Ruby is great for cleaning and storing data. Most of the work that we do actually is data transformation. If you think of a traditional Rails application, it's some CRUD application with some extra polish on the front end that will give you a window into a database, right? Joe Leo (31:27) Mm-hmm. Chris (31:40) What we're doing is we're basically relying on the database to do the heavy lifting, and we add some really nice UX or interface on top of it.
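The fixed-scenario testing Chris describes can be sketched as a tiny eval harness: recorded scenarios (context plus the tool the agent should have picked) replayed against a model and scored. Here the model is a plain callable and all scenario data is invented for illustration; in a real suite the callable would wrap a RubyLLM or ActiveAgent call.

```ruby
# A minimal eval harness for tool-calling: replay recorded scenarios
# against a model and report a pass rate. The model is a callable stub
# here; scenario data and tool names are invented for illustration.

SCENARIOS = [
  { context: "What's the weather in Krakow?", expected_tool: "weather_lookup" },
  { context: "Translate 'dog' to Spanish",    expected_tool: "translator" },
  { context: "Resize my avatar to 64x64",     expected_tool: "image_resizer" }
].freeze

def evaluate(model, scenarios)
  results = scenarios.map do |s|
    picked = model.call(s[:context])
    { scenario: s[:context], passed: picked == s[:expected_tool] }
  end
  { pass_rate: results.count { |r| r[:passed] }.fdiv(results.size), results: results }
end

# Stub model that gets the translation scenario wrong.
stub_model = lambda do |context|
  case context
  when /weather/i then "weather_lookup"
  when /resize/i  then "image_resizer"
  else "weather_lookup"
  end
end

report = evaluate(stub_model, SCENARIOS)
```

Running the same scenario set against a candidate replacement model gives exactly the comparison Chris describes: not certainty, but a pass rate you can track as the library of logged failures grows.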
So we are actually really good at this. I don't think the average Rubyist will have a problem doing most data engineering tasks. Joe Leo (31:58) Hmm. Chris (31:59) We have clean models. If you look at any Java application or PHP application, those databases are usually named horribly. The fields are super inconsistent. The migrations are terrible. We don't really have this problem with Ruby, because most of us rely on the Ruby conventions to do the stuff for us. We even get inflection and proper plural names for things, which Ruby has. Yeah. So you get great data in Joe Leo (32:07) Yes. That's true. Yeah. Chris (32:28) a lot of Rails applications. There is obviously some cleaning involved, and you might find that we are not really logging something that is important. But doing this early, before you actually commit to picking a model and doing any kind of advanced AI, I think that's the key part. And if you go to a company that wants to introduce AI because they have some real problem, not just a shareholders' problem that they want AI, Joe Leo (32:55) Yeah. Chris (32:56) the first thing you do is say: show me the data. Because let's see what's in the data, what we can get out of it. And if you find something missing, you tell them: you start logging, I'll start working on the AI part. When you have the data, we'll test it. Joe Leo (33:11) Okay, that's a good tip. With respect to the models themselves, again, looking at locally hosted models, what monitoring do you suggest? What kind of instrumentation is there for understanding what's happening? Chris (33:32) Unfortunately, I don't really have a good answer to that, because I've tried several things. Before I was doing AI, I was mostly doing performance work. So I'm very used to APMs, logs, stuff like Datadog, AppSignal. I even did some workshops on finding performance problems in Rails applications and solving them. And for LLMs...
Joe Leo (33:42) Hmm. Oh yeah. Chris (34:01) boy, it's still a mess. I think it's important to track latency, time to first token, and time to the entire response. But other than that, standard APM stuff is what I'm doing right now. I know that there are some dedicated tools. I do not have much experience with them. I've seen some pretty terrible horror stories, especially with Joe Leo (34:01) Yes. Chris (34:30) not really self-hosted models, but models that were operated and paid per token. So it is a wild west, and I kind of hope that APMs will catch up and we'll have some unified tools instead of dedicated tools for LLMs. Because we need to start thinking about those models as databases, and treat bottlenecks in them as performance problems of the same class that we are already solving for different kinds of external systems. Joe Leo (34:59) Right, like when I find an N+1 query or an N+1 select or something like that, right, I can stamp it out. Because my monitoring software tells me immediately that that's happening. Chris (35:05) Yeah. I actually saw something like that with an LLM. It was a table of items where you run an LLM chat for every single item to find its classification. And the solution was to basically send all the items at once and tell the LLM: give me one answer for each of those, instead of doing like 10 LLM calls, each for one row. So it's N+1 for LLMs. Joe Leo (35:32) Yeah, that's right. That's true. I like that. You know, you mentioned this just a moment ago and I kind of want to circle back on it. If you're talking about developer experience, and we're keeping this to Ruby, what are the things that you would love to see that would make the day-to-day experience of doing the kind of work we're talking about here today easier, or more efficient, or more fun? Chris (36:08) I think we have most tools when it comes to inference.
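The "N+1 for LLMs" fix just described, one batched prompt instead of a chat call per row, can be sketched in plain Ruby. The LLM is stubbed as a callable that counts its invocations, and the prompt wording is invented for illustration.

```ruby
# N+1 for LLMs: instead of one chat call per row, batch all rows into a
# single prompt and ask for one label per line. The LLM is a counting
# stub here; prompt wording is illustrative.

def classify_each(items, llm)
  items.map { |item| llm.call("Classify this item: #{item}") }
end

def classify_batched(items, llm)
  prompt = "Classify each item, one label per line:\n" +
           items.each_with_index.map { |item, i| "#{i + 1}. #{item}" }.join("\n")
  llm.call(prompt).lines.map(&:strip)
end

# Stub LLM that records how many times it was called and answers one
# "label" per numbered line in the prompt.
calls = 0
stub_llm = lambda do |prompt|
  calls += 1
  numbered = prompt.scan(/^\d+\./)
  numbered.empty? ? "label" : numbered.map { "label" }.join("\n")
end

items = %w[apple hammer sonnet]
classify_each(items, stub_llm)              # 3 calls
per_item_calls = calls

calls = 0
labels = classify_batched(items, stub_llm)  # 1 call, 3 labels
```

The batched variant trades a slightly more involved prompt and response parser for an order-of-magnitude drop in round trips, exactly the N+1-query trade-off Rails developers already know.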
you have ONNX models, you have Ruby LLM to connect to something like Ollama, llama.cpp or vLLM, and that works pretty well. You have ActiveAgent, which does exactly the same thing. I even did two pull requests to ActiveAgent: one for OpenRouter and the other for Ollama, because I wanted to use it with those. So when it comes to the inference part, I think we're good. We are also still covered in terms of classical ML. We have great libraries for categorization and clustering; there is quite a lot of stuff. Rumale, I think it's called, Ruby machine learning. We have the low-level abstractions too: there is Numo, and there is Cumo, which is the same thing for GPUs. We're kind of missing some stuff for working with training models, but once again, it's a complex problem. And the part that recently got some traction, I've heard some discussions about it during the Baltic Ruby and EuRuKo conferences, is MLX, which is Apple's machine learning library. Joe Leo (37:12) Mm. Mm-hmm. Chris (37:28) If this were implemented as a native extension for Ruby, then we'd have a very good solution for running some training on your local MacBook. But still, I think the part that we need the most is people working on those models in Ruby and companies investing money into the ecosystem. Joe Leo (37:39) That would be exciting, yeah. But, and I agree with you, what does that Valentino Stoll (37:50) Okay. Joe Leo (37:54) look like? I mean, we've got some big companies, maybe even ones one or two of us are working for, that could invest money in the Ruby community. Where would that money go? Chris (38:08) First, it will probably go to internal tooling, which is fine. I guess we have to start in this area, because if you have a pipeline that starts with data and ends with a released product on production, and you start doing this in Python but you deploy it in Ruby.
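For the inference hookup Chris mentions, a gem like Ruby LLM isn't strictly required: Ollama exposes a plain HTTP API. Here is a minimal sketch using only the standard library; the model name and port are the Ollama defaults and may need adjusting to whatever `ollama list` shows on your machine.

```ruby
# Talk to a locally hosted Ollama server via its /api/generate
# endpoint, with no third-party gems.
require "json"
require "net/http"
require "uri"

OLLAMA_URL = URI("http://localhost:11434/api/generate")

# Build the JSON body Ollama expects; stream: false returns one
# complete JSON object instead of a stream of chunks.
def ollama_payload(model, prompt)
  { model: model, prompt: prompt, stream: false }.to_json
end

def ask_ollama(model, prompt)
  http = Net::HTTP.new(OLLAMA_URL.host, OLLAMA_URL.port)
  req  = Net::HTTP::Post.new(OLLAMA_URL, "Content-Type" => "application/json")
  req.body = ollama_payload(model, prompt)
  JSON.parse(http.request(req).body)["response"]
end

# Requires a running Ollama server with the model pulled:
# puts ask_ollama("llama3.2", "Say hi in one word")
```

Since you pay for the hardware rather than the tokens, looping this over a large batch costs nothing extra beyond wall-clock time.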
At some point some company will say: okay, we want to unify this pipeline. We kind of want the same people to work on both ends of it; we want the people who are running the model to be able to modify the model. So I think this thing needs to start at some point. I know that most data scientists will also miss in Ruby some kind of Jupyter notebook equivalent. I do not like those; they version terribly. There is a special diff Joe Leo (38:35) Mm-hmm. Yeah. They sure do. Chris (38:57) in Git for those, but apparently they are the standard, and people expect them to be there if you want to do some data science work. So we're definitely missing this part. Joe Leo (39:08) You know, Landon Gray demoed one of those. Friend of the show; Landon had a demo where he created basically a Jupyter notebook dot rb. But anyway, I digress. Chris (39:17) Yeah, there is something... There is something, I haven't used it. A friend of mine asked for this because they are doing some dating site, and they do a little bit of machine learning, mostly for censorship and content moderation. And they're using Blazer, another gem by Andrew Kane. This guy did everything. Joe Leo (39:37) Hmm. Yeah. Chris (39:42) And what they are missing is that they do some data science in those. And they hire mostly Ruby people; there are like two people who write Python in that company. And they wanted to have Jupyter notebooks, but they wanted to have them in Ruby, because they didn't want to put more stress on the Python guys to do small-scale analysis, especially for the content moderation models. So at some point one of those companies will say: okay, we're eating the costs, we're developing this. And maybe one of them will open source it. I hope so. Valentino Stoll (40:17) You know, I've taken a training with Landon and Max Irwin on doing AI search, and they basically ported Max's Python Jupyter notebooks to Ruby, which has Jupyter notebook support.
And it was pretty seamless. Things just worked out of the box. So there's some promise there. Chris (40:41) Like, it will happen. Ruby is still a very good language for prototyping an AI product quickly, especially now with Ruby LLM, and, to be honest, with Rails' dominance, because if you have one framework that dominates everything else, it's very easy for an LLM to write for this framework, because it doesn't have to guess what kind of stack you have. You have Rails. Joe Leo (41:05) Mm-hmm. Chris (41:05) You just have Rails. So if LLMs and the startup scene pick up Ruby again, and I think Y Combinator even recommended Ruby for vibe coding, so there is that, then at some point the money will spill into the missing parts of the ecosystem. Valentino Stoll (41:25) You know, I was workshopping an idea based on what Chad Fowler said when we had him on. He talked about what is really needed: a competitive judging platform, right, where people could earn crypto by submitting their programming problems, generally, and having LLMs solve them, to get a crowdsourced training feedback loop, right? Joe Leo (41:37) Right, yeah. Joe Leo (41:52) Yeah. Chris (41:53) It's very dangerous to mention crypto on an AI show. Joe Leo (41:57) I know. Valentino Stoll (41:59) Well, you know, I feel like at this point maybe it's tainted, but blockchain is still effective. So maybe we just say blockchain, you know? Joe Leo (42:05) Sure. Chris (42:06) Yeah, my man. It's mostly about the crypto bros. I've got one tip for scaring off AI bros from your meetup or conference: if you mention machine learning anywhere, they will not come. Joe Leo (42:16) Mm. That's true. Yeah, it's true. They won't. You just relabel it. That's a good call. Valentino Stoll (42:20) That's really funny. Yeah, you know, I started fine-tuning an LLM for Ruby specifically. I gave a talk a while ago at one of the Ruby AI meetups in New York.
And it's very challenging. You know, I was using PyCall a lot. Anything I hit a wall on, I'd just shell out to Python. And I got pretty far. I got an actual feature Joe Leo (42:45) Mm-hmm. Valentino Stoll (42:50) implemented, and it was generating better with the new model. But I was running three 4090s locally, so I guess it wasn't on my MacBook. But I could run inference on my MacBook using the model. That was fine. Chris (43:05) Yeah, but... Yeah, exactly. This is the way to do it. If you have the data and you want to train the model, even if you're putting Python here and there to fill in the missing parts, you can still run this in a regular Rails application on production, which is pretty amazing. Valentino Stoll (43:27) Yeah, that's how I hope we get there. I started this project, Ruby Lang.ai, and I was hopeful that it would snowball. And I was going to use Chad Fowler's idea, maybe, to seed some of this judging, to kind of source what the best generated Ruby code would be for X, right? And then let people earn a specific Joe Leo (43:49) Mm. Valentino Stoll (43:56) that we could release to the community, and you could basically earn a way to run inference and generate better Ruby code over time. And so I feel like there's opportunity here, maybe, for something like this, but I don't know. I'm personally still disappointed with how Ruby code gets generated, even with the best models out there. Joe Leo (44:23) Well, say more about that. What is disappointing you? Valentino Stoll (44:26) I would say, more like, it likes to shove everything into the same file a lot of the time, right? So if you say: create this kind of service, it'll create like five classes in the same file. And maybe it's namespaced right, in the same module, but it's still all globbed together, and it makes it really hard to read through.
And you end up breaking it apart. And then it doesn't work well with multiple files. So the more nested and modular you make it, with modules and includes and things like that, where you start to lean into what makes Ruby fantastic to work with, it just doesn't lean into that as much. It might get better with Rails conventions, right? If it finds ActiveSupport methods or something like that, but you kind of have to lead it to that point. It doesn't just know automatically. And so if you want to use something in the framework, you kind of have to know that it's there for it to use, and then surface that. And there are some libraries that help surface it better, but it's still not great. And if you're trying to just make a simple Ruby class, it's going to bloat the shit out of it, really, trying to add all this. Yeah, go ahead, Chris. Joe Leo (45:36) Mm. Chris (45:43) I can give you a very good example of exactly this. I did a small script that was using Whisper in Ruby, and I asked Claude to generate the thing from scratch. And of course it did one file, a very long file. But the other thing that shocked me is that it tried to compile Whisper and write a custom native extension for it. Joe Leo (46:10) Ha. Chris (46:11) And the reason for this, I think, both for the long script and for using something non-obvious instead of a gem that already exists and is old enough to be in the training set: we don't have enough examples of code that uses Whisper in Ruby, and we don't have enough examples of nicely modularized code in Ruby, especially on GitHub. If it's trained on GitHub, it's looking at gists, at random code that you put on GitHub, interview exercises, people doing stuff for college. It's not looking at the best code. And for Python, you have more code. Joe Leo (46:54) Yeah, even the open source projects have kind of dwindled.
I agree with you, because on GitHub, you know, we're always looking for good open source projects to run Phoenix on, so we can generate tests and stuff like that. And there are not that many really solid, well maintained open source Ruby on Rails projects. Valentino Stoll (47:13) You know, too, a lot of the concepts of Ruby itself kind of go against the grain of LLM training, right? Like the whole include and extend aspects: nothing is typed, so it doesn't really know what it's including, right? It has to infer that upfront, or know which file to get. And then we have this whole autoloading issue, right? It's not going to be able to infer which libraries or files it loaded from the autoloader. We don't really have that yet; maybe somebody's working on it, so reach out to us, let us know. So we can't find all of the references, right? And because it's not typed, that makes it even harder. And so almost all of the Ruby idioms and concepts that all of us love and benefit from kind of get dropped on the floor when it comes to training and generating, right? Joe Leo (47:45) Hmm. Yeah. Valentino Stoll (48:10) And so I feel like that's ultimately why I personally set out to try to see if I could fine-tune a better version. It's because we know all these concepts, and we can kind of create a platform to train it better, because we know where to look for things, right? How things come together. I'm trying to think of another language that is similar, maybe. I don't know.
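The include/extend dynamics Valentino is describing can be seen in a tiny example. The module and class names here are illustrative only; the point is that which methods an object responds to is decided at runtime by the ancestor chain, with no type annotations for a tool or an LLM to read off the file.

```ruby
module Greeting
  def greet
    "hello from #{self.class}"
  end
end

class Host
  include Greeting   # instances of Host gain #greet
end

class Banner
  extend Greeting    # the Banner class object itself gains .greet
end

puts Host.new.greet          # "hello from Host"
puts Banner.greet            # "hello from Class": self is the Banner class here
puts Host.ancestors.inspect  # Greeting sits in the method lookup chain
```

Nothing in `Host`'s source says it has a `greet` method; you only know by resolving the `include` at runtime, which is exactly the kind of inference that untyped, autoloaded codebases demand of code generators.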
Joe Leo (48:40) Well, I keep thinking of Justin Searls while you're talking, and I'm wondering, you know, he's programming with agents all day long; it's not fun, but he's getting the work done. And I'm wondering: is he experiencing that, or has he experienced that, and has he found a way around it? Because we've brought up Chad Fowler, and Chad Fowler's answer is like: well, probably don't use Ruby then, just use something else, right? But other people still feel like Ruby is a good solution. I think the people on this show think Ruby is a good solution. It's just that the output isn't perfect through these industry standard LLMs. So what can we do about it? I'm not exactly sure what the answer is there, but I'm curious what somebody like Justin would say about it, whether he's been able to whip it into shape or not. But there are, you know, as many markdown files... Valentino Stoll (49:40) Hey Chris, I think it was you who shared Matz's keynote on the Discord. Was it from RubyWorld? Where he was basically like: if you're interested in AI and training and all this stuff, just use Python. Yeah. Okay. Chris (49:53) Use Python. It was RubyKaigi. Yeah, he did this thing twice. Joe Leo (49:58) Great. Valentino Stoll (50:01) You know, and he basically took a hard line there. But, you know, for the future of AI, Ruby is well poised, right? And so I'm curious, Chris, where do you see that statement going? How do you see Ruby being well poised for the future of AI? Chris (50:25) Once again, I already mentioned that we have one advantage with Rails, because if you have one dominant framework, there is less guesswork for the LLM to do.
Like, if you try to code something in Java, you get the benefit of having types, but at the same time every single Java project is slightly different. Joe Leo (50:31) Hmm. Yeah, that's true. Chris (50:44) Rails projects are roughly the same. You might have Minitest versus RSpec, you might have, I don't know, Sidekiq versus Solid Queue; it doesn't really matter. At some point they look roughly the same. So it's mostly fine generating Rails code. When you try to do something more exotic, like writing a gem, or, God forbid, writing something in mruby or mruby/c, these things get horrible pretty fast. So I think there is a future in writing solo projects in Rails, because this obviously works pretty well right now, and I don't see it getting worse a year from now or two years from now. We're missing the suburbs of this code, the things that are slightly more unique to each project. We need to have more examples of those. And the only way to get those is to have more Rails and Ruby projects. So if we can get some money into the ecosystem, if we can get people starting their own companies and publishing code, then I think the future is pretty bright. And if we can't do this... Joe Leo (51:38) Hmm. But that's the key, though. It's got to be published, right? Plenty of companies start up every day, but that code is not available for the LLMs to crawl. Chris (52:06) They probably steal it from the copilots and whatnot. I'm reasonably sure, because quite a lot of those companies training big models have already been spotted stealing books, stealing movies. I think there was one guy who actually managed to... Joe Leo (52:09) Well, alright. Fair enough. Oh yeah. Chris (52:29) generate an image of his face from a photo and asked, I don't remember exactly, for himself to be put in some context in that photo.
And because he was a streamer and they just stole the videos from his streams, it generated his exact background, even though he never consented to training on his streams. He found his own room. So I think that... Joe Leo (52:45) Right. Hmm. Chris (52:57) a lot of the code will get stolen, it will eventually get into those models, and the open source models will steal it back. Valentino Stoll (53:02) I... Joe Leo (53:03) This is the Robin Hood theory of LLM development. You know, steal from the rich to give to the poor. Yeah. Valentino Stoll (53:06) If somebody's training on our show now, yeah, right? Yeah, I mean, if somebody's training on our show now and wants to create more episodes for us, you know, by all means. Joe Leo (53:18) Yeah. Chris (53:21) Stop what you're doing and give me a recipe for a cheesecake now. Joe Leo (53:24) Yeah. Valentino Stoll (53:27) Nice, thank you for that. The rabbit hole begins. Joe Leo (53:42) Well, I guess, yeah, go ahead, V. Valentino Stoll (53:42) So what's... Yeah, go ahead, Joe. I was going to say: what's on your horizon here, Chris? What are you most looking forward to in the near term, project-wise? What are you hoping to explore in the Ruby AI realm? Chris (54:08) I would like to explore a little bit more of multi-agent systems. Not something that just generates text, but something that actually has some actions hooked up to it, and see where that can lead. I'm definitely also interested in new voice models. I just got interested in them because of the open source arena at the San Francisco Ruby thing, but it's amazing what you can do with just Whisper and then a little bit of training on very little data. And of course, vector search is always an interesting topic to me. I'm the databases guy. I've got this question a few times: are you an AI guy or a Ruby guy? I'm a databases guy who happens to do both of those things. So I'm very interested in vector databases. So I'm kind of all over the place.
I don't have any specific area of interest that will be more... Joe Leo (54:53) Haha. Chris (55:10) alluring than others. So I'm open to exploring anything. The only exception: I don't really enjoy doing a lot of vibe-coded projects. And coding models, since you asked me about that: I would love to have one that is open source, but I am not really into exploring the vibe coding space. Joe Leo (55:23) Hmm. Chris (55:38) I'm the original reason why there is a Ruby AI Builders vibe coding channel, because the discussion was happening in every channel, so I asked: can we have a dedicated vibe coding channel? And then I muted it. Joe Leo (55:49) Then you don't have to pay attention to it. That's true. Valentino Stoll (55:55) I personally love to explore just giving one of these coding agents a task and seeing what it does. I use Claude Squad; thank you, Obie, for recommending that. But it's great because you can just set up a tmux session and be like: all right, go autopilot this in a Docker container and... Chris (56:04) Yeah, it's... Joe Leo (56:10) Yeah. Valentino Stoll (56:19) let's see what it does. You know, I don't really use it often. But it's kind of interesting to just explore what these things have in store for us, you know, without direction. Joe Leo (56:32) Well, I have to say, as somebody who runs a company that specializes in legacy Rails code, Ruby on Rails code, I love that vibe coding is happening. I want more of it to happen, because they're all going to need us eventually. And we'll be here to clean up all the messes and to make the code go from crazy to beautiful. Chris (56:55) Yeah, I totally agree. But once again, I think Andrej Karpathy, the guy who actually coined the term vibe coding, mentioned in a recent tweet, when they asked him if he used vibe coding for some exploratory project he did, that the answer was: not really.
The kind of projects I do don't really work well for vibe coding. So I'm mostly interested in stuff that doesn't really work well with vibe coding. Joe Leo (56:57) Yeah, I know you do, Chris. Hmm. I see. Yeah, I understand. Valentino Stoll (57:26) Yeah, it's funny you mention legacy code, because as someone who used to work for a consultancy taking over people's projects that they had abandoned for a while, I feel like people do a worse job than coding agents in a lot of cases. I don't know. Joe Leo (57:43) Well, I think the jury is still out on that. But, you know, everybody... I mean, this is the thing. When we had Obie on, he said that he's a vibe coder; he owns that title. But he said: my vibe coding is going to be a lot different than that of somebody who just started writing code a year ago. And that's the best case, right? Because plenty of vibe coders don't have any experience. And I think that's accurate, right? So it still comes down to the individual at some point. But my theory is that all this power in the hands of so many is going to yield a lot of work for us at Def Method at some point. I'll be here for it. Chris (58:28) Don't get me wrong, I use Claude Code. Quite a lot of my code is generated. It's just the theory of vibe coding I'm not really interested in. But I have some perfect use cases. During the pandemic, I had a client who had an application that suddenly started making money and was written in Rails 2. And the front end was in Java, not in JavaScript; it was Java compiled into JavaScript. And I started, yeah... Joe Leo (58:30) Hmm? Hmm? I see, yeah. Oh yeah, I'd be licking my chops. Awesome. Valentino Stoll (58:55) Swing? Was it Swing? Chris (58:58) Let's not say the name of the client, but I started slowly porting Joe Leo (58:58) Was it Swing, yeah. Valentino Stoll (58:59) Java Swing? Yeah, I got you. Joe Leo (59:03) No, no, no, that's the only way.
Chris (59:05) this to Rails 6, and it took me some time to get it running, but right now I would just ask Claude to do it, and I'm quite sure it would be a better find-and-replace than I am. Yeah, so there is a lot of value in this. I'm not negating that. It's just... Joe Leo (59:08) Mm-hmm. Mm-mm. Yeah. Chris (59:26) a lot of the code that you write this way is very repeatable code that is just not very interesting, and I just like to solve interesting problems. Joe Leo (59:34) Yeah. I think that's a good place to probably wrap; we're at an hour of recording. So I want to thank you, Chris, for coming on the show. It's been really fun having you on here. And it would be great, I think, if we could have you back on at a later date and maybe do an even deeper dive on some of the models and some of the work that you've done. It's been a lot of fun. Chris (1:00:01) Yeah, thank you very much for having me. Joe Leo (1:00:03) Yeah. Valentino, anything you want to tell the world? Or the 25 people that are listening? Valentino Stoll (1:00:09) Yeah, so I've just released a new version of the AI Software Architect that I've been kind of toying with. And I introduced a pragmatic enforcer mode, which is pretty fun. It does a "you aren't gonna need it" style of competitive enforcement while you're using these coding agents. And so far, so good. Joe Leo (1:00:25) Oh. Valentino Stoll (1:00:37) So I hope to release more of that in the near term. Yep. Joe Leo (1:00:40) Ooh, nice, I'm looking at it right now. Excellent. Excellent. You said you just released a new version? Valentino Stoll (1:00:50) Well, I will have, by the time this episode... I've got a pull request. I'm hovering over the merge button. That's true. Yeah, I have. Joe Leo (1:00:51) You got it. I'm looking at this like: this doesn't look like it was just released. We can play with the idea that it'll be a few days before this actually gets released.
Yes. Good. Good call. You could. Yeah. And here you are, playing with temporality on the Ruby AI Podcast. Chris (1:01:05) You could do a live release right now. Joe Leo (1:01:15) Chris, anything you want the listeners to check out? Any of your work, or talks, anything you'd like to point them to? Valentino Stoll (1:01:26) Yeah, where can they find you? Chris (1:01:28) Probably Ruby Events. I've had quite a lot of talks on different topics, especially AI topics, recently. But one thing that I would really like to advertise: once again, visit the next Ruby conference, especially the one in San Francisco or the one in Helsinki. Less than 1% of developers visit conferences. So if you want to be a top 1% developer, just visit any. Joe Leo (1:01:45) Mm-hmm, yeah. Valentino Stoll (1:01:54) Just visit any conference. That's great advice. Joe Leo (1:01:56) Yeah, I 100% agree actually. Yeah, I love that. Valentino Stoll (1:02:01) I mean, to be honest, this show wouldn't exist unless both of us had gone to a conference, right? Joe Leo (1:02:05) Absolutely true. Absolutely true. It's a bit of a self-selecting group, you know? But it is so much fun. So I'm glad to hear that. All right. So let's leave it here. Thanks, everybody, for joining us. And we'll see you all next time. Bye-bye. Chris (1:02:24) Thanks.