Episode 74: A Tale as Old as Time

Amos King

Chris Keathley

The Elixir Outlaws now have a Patreon. If you’re enjoying the show then please consider throwing a few bucks our way to help us pay for the costs for the show.

Support Elixir Outlaws

Episode Transcript

Amos: Alright.

Chris: Hey, Amos.

Amos: How you doin’, Chris?

Chris: I’m good. Actually, I’m not good, in like a thousand ways, but I did - it took a year and a half, but I replaced one of our main Ruby things with Elixir just last week. It’s done.

Amos: Nice.

Chris: It didn’t actually take a year and a half. It was like - we just maintained the Ruby thing for a long time and then it was actually like three months of work to replace it. But it’s done and that’s awesome. It feels so good.

Amos: Was it like microservice or what was it?

Chris: It was like a lot of background processing Kafka stuff. So, it was originally written in Ruby. Went through a bunch of iterations in Ruby, and finally, friend of the show, Jason, and I were fed up. We said ‘we’re fed up! I’m done with this nonsense. Give me a runtime that can do more than one thing at a time, please’. And we rewrote it in Elixir into a thing that can do more than one thing at a time.

Amos: What was the process of rewriting, ‘cause I have a few of those and I have lots of ideas of how to approach it and I’m wondering how you approached it.

Chris: Yeah, that’s a great question. I think that’s a really, really great question. ‘Cause I think most of the time, everybody will tell you not to rewrite stuff. Which is, you know. There’s reasons to not rewrite things. I… I mean, it was like a no-brainer in our case.

Amos: What if I call it replace instead of rewrite?

Chris: Yeah, well - I think. There’s all these ways to talk about a rewrite without calling it a rewrite. And, in fact, we did rewrite the thing, but we also made some substantial improvements, and the way that was staggered, which I think is the real question you’re asking - “How do you go about doing something like that?” - is, um, it required many, many, many phases. In fact, the actual code is not what took us that long to do. It’s the rolling it out such that you don’t break everything that required the three months to do. And so the way we staggered the work is - we actually had a bunch of things that we did in Ruby before we started moving to Elixir.

Amos: Can you elaborate on that or is it internal?

Chris: Yeah, so we essentially did a bunch of pre-work. And the reason we did a bunch of pre-work is I am a huge believer in the idea that if you want to replace a service or a thing with another equivalent thing but in a different language or a different runtime or whatever, you first have to do the replacement and then do the improvement. Like, those have to be two distinct steps. And you can't actually do the improvement until you’ve done the replace step. And it goes back to the stuff that we’ve talked about on the show before, which is kind of akin to API design, and it kind of goes back to the whole notion of, you know, why I think non-breaking changes aren’t really, like, a thing, and why the idea that changing only the internals of a thing is, like, opaque to the rest of the world isn’t really a thing either. Because what inevitably happens is that you attempt to replace the thing and even if you do a one-to-one translation into your fancy new runtime, you’re still going to screw stuff up. Or the semantics of the thing are going to be wrong or whatever. And it’s going to turn out that all the other consumers of that service or those messages or whatever were dependent on the broken behavior. On the thing that wasn’t correct. But they’ve worked around that system. So if you change the payloads - if you change the Kafka payloads you’re producing, even if they’re right now - you’ve potentially and probably broken the consumers of it ‘cause they’re not anticipating it being correct. They’ve built their systems anticipating it being incorrect. So I’m a huge believer in the idea that if you want to improve something - and by improve something, I mean ‘we want to get this thing into Elixir and then we’re going to be able to run this massive concurrency thing and do batch processing and be really highly parallel and concurrent’ - you can’t actually do any of that work until the old thing is dead.
The old thing actually has to be completely replaced before you can do any improvement. And so I’ve been known to do things like replace a service and have it be less good even though it’s in Elixir. I’ve been known to do the completely wrong thing just so it’s compatible with the old thing.

Amos: Are you saying bug-for-bug or are you saying like - make it slower?

Chris: Yeah, like, make it slower - or do it more inefficiently or do the wrong thing and leave notes where it’s like ‘this is incorrect. This shouldn’t be how this is’. Now that we’re seeing the fullness of being able to look at the entire system and be able to understand it - we know this is incorrect but we’re doing it anyway because downstream systems rely on that.

Amos: So you’re - those are all going into the new codebase? Those notes and everything?

Chris: Yeah. Yeah, so you do that - you do all that first. That all has to happen first. You can’t simultaneously extract something from an existing service and also try to change it and improve it. That just doesn’t work. You have to replace it sort of - whole cloth. ‘Cause if you try to do both, you’ll get bogged down and you’ll never complete it. Or - it’s been my experience that if you do it all at once, you’ll never complete the task. And this goes back to the ‘you’ll never be able to complete it’ - you need to put it into production as quick as possible without breaking the old thing - transparent to everybody else. No one should know that it’s in production. But it needs to be in production, and so a big part of the fixes that we did in Ruby was making sure that every item we put - most of them were already, but we needed to do some initial extraction of stuff, we needed to ensure that for certain notification-sending things it wasn’t possible to send, like, duplicate notifications out. We had to do all of this work ahead of time to make sure everything was idempotent. Because if it wasn’t idempotent, then we couldn’t run the old Ruby stuff next to the new Elixir stuff, and that was key.
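
The idempotency requirement Chris describes can be sketched roughly as follows. This is a minimal Python illustration, not the team's actual Ruby or Elixir code: the `NotificationSender` class, the `(event_id, user_id)` key, and the in-memory set are all assumptions made for the example. In a real system the set would presumably be a store shared by the old and new consumers.

```python
class NotificationSender:
    """Sends each notification at most once, keyed by an idempotency key.

    Hypothetical sketch: the in-memory set stands in for a shared store
    (for example a table with a unique index) that both the old and the
    new consumer would write to.
    """

    def __init__(self):
        self._sent_keys = set()
        self.delivered = []  # records what was actually sent

    def send(self, event_id, user_id, message):
        key = (event_id, user_id)
        if key in self._sent_keys:
            # Another consumer already handled this event for this user.
            return False
        self._sent_keys.add(key)
        self.delivered.append((user_id, message))
        return True


sender = NotificationSender()

# The old consumer and the new consumer both process the same event.
first = sender.send("event-1", "user-42", "Game starting")
second = sender.send("event-1", "user-42", "Game starting")
```

Because the second call is a no-op, it does not matter which of the two side-by-side systems gets to an event first.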

Amos: So your plan was to run them together.

Chris: And we did run them together, essentially. Once we did all of this ahead-of-time work. So we did a bunch of things in Ruby to fix things and to make things ready for us to start porting the launch - and as soon as we ported anything, we pushed it to production because at that point it was safe to do. So then, if for whatever reason we did have to drop the project, like - the world wasn’t going to move so far ahead of us that we couldn’t catch back up again. Because as we moved stuff out of Ruby and into Elixir, we went, got that working, and then went back into Ruby and deleted it. Basically, the window of time where there were two things running in production that could become divergent - where somebody needs to add something to Ruby, and then, like, do you go and update the Elixir one to match the thing in Ruby - that whole sort of, like, middle period - you need to, like, make that window tiny. That window needs to be measured in hours.

Amos: ‘Cause then you have a bug fix in two places.

Chris: Exactly. Now you’re back to - this will never get done if you have to make changes in two places - and it’s essentially the same as having a long-living branch. We all know the costs of having a branch that lives for too long - it gets out of touch with master. You have to keep rebasing it, if you diverge too far away, you’re going to have these crazy conflicts, you have to try to keep up with everything… The point being, if you get out of sync too much, it’s never going to get done. So you need to get it into production as quick as possible, and getting it into production needs to be safe. And that’s basically the key - like, if you can get into production as quickly as possible, then you’re good to go. So we did a bunch of work to ensure we could get this stuff running side-by-side for at least a small window of time.

And then, as soon as it looked good and we’d verified it in production and everything was, like, looking cool, we would go in and delete the Ruby stuff.

Amos: You had, like, a release of the Elixir app. Watch 30 minutes of time.

Chris: Or, some amount of time. Let it soak overnight.

Amos: Already, already have the Ruby staged with the deletion so you’re not trying to play with it after.

Chris: Yeah, and then once we were done, we just went in and deleted it, redeployed the Ruby stuff and let it do its thing and like we just slowly ported stuff over like that bit by bit by bit.

Amos: So from the time that you actually released the first Elixir bit at all until you removed the absolute last bit of Ruby - how long was that? Was that three months? Six months?

Chris: Yeah, that was like a three-month project. I mean there was like a little bit of work to - I don’t remember exactly the timelines but there was a fair bit of work to like just make sure that was safe and just scaffold a bunch of stuff out and spin a bunch of stuff up and then kind of like incrementally roll these things out. But by the time it was done - basically, the benefit of that is no one noticed. We had to send an email out to the business that was like ‘we did this thing’. People that are managers knew we were working on it but at the end of it, we sent out this email that was like ‘so now this is just better’. We did it and hopefully nobody knows - ‘cause if anybody noticed, then we've done our job incorrectly. If you’re ever doing a rewrite of stuff and people notice that you're doing it - like, you sort of failed. There should never be a cutover point where you’re like ‘now we're going to go to production and now it's like a big deal and now we do this giant cutover’. Like, those things don't work. They never work. And the first reason is, like, they never get done. If your plan is ‘we're going to take six months and rewrite this thing and then take all of our traffic and point it at the new thing’ - well, first of all, that's never actually a project that's going to happen. You're never going to get that project done. Second of all, even if by some miracle it does get done - and, like, it'll probably take double the amount of time that you say it is going to take - once you get it into production, it'll be broken. Nothing about it will work and everyone will notice what you're doing. And you’ve failed, at that point. That's not acceptable as far as I'm concerned. So - you have to think of it as - how do we get this out today? Like what's the step one that we can take today to put this into production? Because it has to live in production and we have to be able to cut over incrementally.
We have to - to minimize the possibility of failure. We have to minimize the possibility of users seeing that failure. All those things. Especially if it’s something critical to your business and let's face it, the only reason to rewrite anything is because it's critical to your business. The stuff that doesn’t get rewritten is the superfluous stuff that no one cares about. You only rewrite stuff that ends up mattering ‘cause that's the stuff that everybody's touching, everybody needs to depend upon, it needs to be more reliable - all that kind of stuff. Unless you just are really bored, you're not going to go rewrite, you know, the widget service that no one uses or gets like 10 requests every couple years or something like that.

Amos: So, so I'm trying to figure out a way to port stuff over because of the same reasons - like give me a language that can do more than one thing at a time. And we have a pretty crufty code base that we’re trying to move over. My original thought - and you’re kind of changing my mind a little bit, maybe - but I think was to go to something more like what Steve Bussey did at, at Salesloft and that is - take something that doesn't have to write to a database. Like some kind of reporting dashboard type thing that has a lot of data on it and move it over and allow it to take advantage of like a GenServer and Phoenix websockets and channels and stuff like that.

Chris: That can work as well. There are lots of ways to do it.

Amos: But the moving processing over there was probably the bigger bang for the buck if I can figure out how to do that. ‘Cause a lot of stuff that we do we want to be doing in more parallel fashion.

Chris: Right. Yeah, I mean the tricks are always going to be - I actually don’t think it matters too much what you pick. It matters a lot how you choose to do it. So obviously if you can pick a problem that fits this shape which is like - you can put some sort of proxy up in front of all your calls. You know, let’s say it’s a web service. You can put a proxy in front of that - and actually, as part of this, we did have a web service we needed to hit. We actually did extract that functionality as well, as part of this whole big thing. And so literally like one of the first tickets we did was we went into our front door API and added - well, it's a little library. It’s based on something, Scientist, that GitHub maintains. I wrote a version of that for Elixir but it basically allows you to execute two code paths concurrently but always returns the result of, you know, your control - like the thing that you want to maintain. But it does things like compare the response time and it also compares the result from both of them.

Amos: What's this thing called?

Chris: It’s called Alchemy.

Amos: And it's open?

Chris: Yeah, it’s on my GitHub. It’s like the first thing I ever created for Elixir.

Amos: Nice.

Chris: But it lets you do things like compare the results and filter the results and that kind of stuff, and so that was literally the first thing we did. And the other service wasn't even really alive yet. It had, like, an ‘up’ endpoint.
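
The control/candidate idea described here can be sketched in a few lines. This is a hypothetical Python illustration of the pattern only. It is not the API of Alchemy or of GitHub's Scientist, and the names `run_experiment`, `old_path`, `new_path`, and `broken_path` are invented for the example. It also runs the two paths one after the other for simplicity, whereas the library discussed runs them concurrently.

```python
import time


def run_experiment(control, candidate, compare=lambda a, b: a == b):
    """Run both code paths but always return the control's result.

    The candidate's outcome only ends up in the report, so a slow, wrong,
    or crashing candidate can never change what the caller sees.
    """
    started = time.perf_counter()
    control_result = control()
    report = {
        "control_seconds": time.perf_counter() - started,
        "candidate_seconds": None,
        "matched": False,
        "candidate_error": None,
    }
    try:
        started = time.perf_counter()
        candidate_result = candidate()
        report["candidate_seconds"] = time.perf_counter() - started
        report["matched"] = compare(control_result, candidate_result)
    except Exception as error:  # the candidate must never break callers
        report["candidate_error"] = repr(error)
    return control_result, report


def old_path():
    return {"id": 1, "score": 10}


def new_path():
    return {"id": 1, "score": 11}  # deliberately different from the control


def broken_path():
    raise RuntimeError("new service is not up yet")


result, report = run_experiment(old_path, new_path)
safe_result, broken_report = run_experiment(old_path, broken_path)
```

In both runs the caller gets the old path's answer. The mismatch and the failure show up only in the report, which is what makes it safe to ship before the new service really works.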

Amos: So, Oracle property testing in production.

Chris: Exactly.

Amos: I like it.

Chris: So we just had it short circuit our new thing - out of our bad thing - and then just always return a good thing, but we put that in production immediately because then it was safe. It was totally safe to deploy a new service and we could literally track in a Datadog graph how close we were getting to correct. For certain things, you're going to be fuzzy, because there’s timing involved and there is asynchrony involved, and so you just got to be close. So we did a lot of like - close checks. Like, how close are we getting?
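
A "close check" of this kind could look something like the sketch below. It is an assumption-laden Python illustration: the `MatchRate` class and the 5% tolerance are made up for the example, and nothing here calls a real metrics API. In practice the ratio would presumably be reported as a gauge to whatever metrics system is in use.

```python
import math


class MatchRate:
    """Tracks how often the new path agrees with the old one, within a tolerance."""

    def __init__(self):
        self.total = 0
        self.matched = 0

    def record(self, control_value, candidate_value, rel_tol=0.05):
        # rel_tol is a hypothetical tolerance; timing and asynchrony mean
        # exact equality is too strict for some values.
        self.total += 1
        if math.isclose(control_value, candidate_value, rel_tol=rel_tol):
            self.matched += 1

    @property
    def ratio(self):
        return self.matched / self.total if self.total else 0.0


rate = MatchRate()
rate.record(100.0, 101.0)  # within 5%, counts as a match
rate.record(100.0, 150.0)  # too far apart
rate.record(20.0, 20.0)    # exact match
```

Graphing `rate.ratio` over time gives a single line that should climb toward 1.0 as the port gets closer to correct.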

Amos: So you mentioned Datadog. Do you think that having metrics, telemetry-type data in the old and new is like a prerequisite?

Chris: Oh, yeah, yeah, yeah. You need to be able to measure all of that stuff. I mean maybe it is for your use case, but it's probably not acceptable to roll out a brand-new service and then double the latency.

Amos: Well, yeah. If you’re doing a service especially.

Chris: Well, and for background processing - with Kafka processing, one of the main things we wanted to do was decrease error rates and increase throughput. That was a major reason we wanted to do this. Um, ‘cause there was stuff in Ruby, man. I’m so over it. Ruby is such a joke. Like, and the tools around it are such a joke. Like, as long as that thing has run in production, if it ever dropped a database connection, it couldn't recover from it. I've never seen it be able to recover from it.

Amos: We've run into that.

Chris: How is that not a solved problem? Like, what in the world, team?

Amos: If a writer goes down and a reader is promoted to a writer, the whole system stops working.

Chris: I've seen it multiple times. The only solution is to restart the entire Ruby VM. We tried - I read so much ActiveRecord source code trying to figure out how that crap worked and that was impossible.

Amos: Our only thought is - we put a - like a catch at the top level of the system, like, if I get a MySQL error, disconnect and then reconnect, but with ActiveRecord, I think that you have to do that with every model.

Chris: Oh, so let me tell you about adding giant amounts of try-catches around everything and then including the same four lines everywhere which is - drop the connection, force it to reconnect, make sure you catch that error in case it can't reconnect so you don't retry and create a crazy loop and blow the whole thing up. Yeah, let me tell you all about that. I literally - we spent so long just trying to get ActiveRecord to reconnect to databases that we eventually gave up. It was like - we just can't do it. It's actually impossible, as far as I can tell. Every permutation of try-catch-rescue-ActiveRecord-MySQL-error - oh my gosh. Like, I was at my wit's end.
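
The "same four lines" have roughly the shape below. This Python sketch only illustrates the bounded reconnect-and-retry idea. It says nothing about whether ActiveRecord can be made to reconnect, and `ConnectionLost`, `with_reconnect`, `flaky_query`, and `reconnect` are hypothetical names invented for the example.

```python
class ConnectionLost(Exception):
    """Hypothetical error raised when the database connection drops."""


def with_reconnect(operation, reconnect, max_attempts=3):
    """Run operation; on ConnectionLost, reconnect and retry a bounded number of times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionLost:
            if attempt == max_attempts:
                raise  # give up rather than loop forever
            reconnect()  # if this raises, the error propagates to the caller


calls = {"operation": 0, "reconnect": 0}


def flaky_query():
    # Fails on the first two calls, then succeeds.
    calls["operation"] += 1
    if calls["operation"] < 3:
        raise ConnectionLost("connection dropped")
    return "rows"


def reconnect():
    calls["reconnect"] += 1


result = with_reconnect(flaky_query, reconnect)
```

The attempt cap is the part that prevents the "crazy loop": once the limit is hit, the error is re-raised so something outside the process can restart it.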

Amos: Yeah, you just need to tell your containers to restart.

Chris: It is actually staggering to me that it's like, essentially an unsolved problem. How do people run this crap in production?

Amos: I don't know. I don't know how I did it for so long either. I remember running into those problems and we had stuff set up to do what you said like - if you get a MySQL error, just exit and then we had something like Monit that would restart it.

Chris: Yeah, so. So that's fixed and then I'll - I mean obviously the other benefits are like, our throughput is way higher now. I mean we had tail latencies in the Ruby consumers that would just be, like, many seconds. I don’t know why that happens. Like, memory bloat. We had to constantly restart the thing… It was just a total mess and beyond that, at the end of day, none of us - the people who really knew how to write Ruby at Bleacher Report are gone. We're essentially an Elixir shop, at this point. Like we know how to write Elixir. We hire people now and there are still some of the old guard, and I've been guilty of this, who assume that people know Ruby because they know Elixir and they don't. They know Elixir. And maybe like, JavaScript.

Amos: That’s nice to see too. ‘Cause at one point, everybody that you hired that did Elixir was either an Erlang developer or a Ruby developer and most of them were Ruby developers. And now - now we’re, we're getting new people into Elixir that are like - we came from .NET or Python or whatever. It's nice to see that community growth right there. I think that's a good metric.

Chris: Yeah. I'm really excited about it. But that was another reason to move away because we didn’t know how to debug that crap in production. You know, there's no like - there's no tracing. There's no ability just to get into the system and start poking around. Or maybe there is but very few of us know how to do that. Maybe there's a handful of people who, like, really understand the real Ruby VM, but I certainly don't. It’s a critical enough system that we need that amount of like visibility into it. We need to be able to like remote into the running box and start using Recon to poke around and figure out why stuff’s not working. But, yeah, it’s been great. Every - it's the - it's a tale as old as time. We dropped from 50 EC2 instances to two ECS tasks. So it’s like not even a fair comparison.

Amos: Wow!

Chris: That was just in production, so. And the performance is better. That’s the other thing is the performance is way better. And it’s doing more. By and large the Elixir consumers are doing way more than the Ruby ones did. So it's pretty great.

Amos: That’s amazing.

Chris: The absolute max latency that we've seen from any Elixir consumer for any given message that it’s processing is about half of the average latency of the Ruby consumer. So max is half of average.

Amos: Wow. Did you find any downstream systems were not able to handle the new speed?

Chris: No, nothing couldn't keep up ‘cause most of the other downstream stuff was already Elixir. And the downstream stuff isn’t as time critical as the rest of it was. Like, this is very publicly user-facing when it gets slow. And that was the other great thing, like - once we did it, once we ported everything, then - now we’re in the phase of ‘okay, sweet, how do we make this even better’ and there's a lot of making it even better to be done, which is really cool and that's, that's the really exciting stuff. ‘Cause now we can actually start to optimize it. So - we knew it would be faster. I'm pleasantly surprised with how much faster it is. To be honest, I would have taken 0% faster if it just ran without me having to worry about it every day. Like if it just ran in production. So, we got the best of all worlds. It’s been really fun. And it’s been really cool to see a project like that, that you wanted to do for a long time, through to the end. It’s one of the few times I've been a part of a replacement or rewrite project that really did kind of - that went all the way to the finish line. Like, the old thing is done. We nuked the code out of the repo, it just sits there with like a README at this point and just a few crufty files and stands as a testament of time and the people who came before us.

Amos: You’re leaving it around for code archaeology?

Chris: Basically, yeah. Like the repo - I’m not going to, to delete the repo from git. It’s just going to sit there forever. Be sort of a - an object lesson on VM choice. This is the other thing - this is the thing that I can’t get over - we essentially did a one-to-one rewrite of what I consider to be not highly optimized code and - essentially like we're swapping the VM. And we’re not doing anything fancy. We’re not doing a ton of extra concurrency. The concurrency that we use - it’s for fault tolerance more than it’s for parallelism and it's still just. Like. Better. Basically, just because you swapped out the VM. That's bonkers. Like, it's just the VM.

Amos: Well, yeah. I've had similar experiences in like small cases, where I swapped out something that was in Ruby, or Node actually, and swapped in a small amount of Elixir code and ended up having to build a pool because otherwise I was killing the downstream service, so I had to build it so that when you check the resource back into the pool, it actually wouldn't let it be checked out for a certain amount of time. Just so it would slow it down enough.
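
A pool with a cooldown on check-in, as Amos describes, might be sketched like this. It is a hypothetical single-threaded Python illustration, not his Elixir implementation: the `CooldownPool` name, the 2-second cooldown, and the injectable `clock` parameter (there so the example is deterministic) are all assumptions.

```python
import time
from collections import deque


class CooldownPool:
    """A resource pool where a checked-in resource must rest before reuse."""

    def __init__(self, resources, cooldown_seconds, clock=time.monotonic):
        self._cooldown = cooldown_seconds
        self._clock = clock
        # Each entry is (resource, earliest time it may be checked out again).
        # New resources are ready immediately.
        self._idle = deque((resource, float("-inf")) for resource in resources)

    def checkout(self):
        """Return a ready resource, or None if all are busy or cooling down."""
        if self._idle and self._idle[0][1] <= self._clock():
            return self._idle.popleft()[0]
        return None

    def checkin(self, resource):
        # Entries are appended in time order, so the head of the deque is
        # always the resource that becomes ready first.
        self._idle.append((resource, self._clock() + self._cooldown))


now = [0.0]  # fake clock so the example does not need to sleep
pool = CooldownPool(["conn-a"], cooldown_seconds=2.0, clock=lambda: now[0])

conn = pool.checkout()
pool.checkin(conn)
too_soon = pool.checkout()        # still cooling down
now[0] = 2.5
after_cooldown = pool.checkout()  # ready again
```

The cooldown caps how often each resource can hit the downstream service, which is a crude but effective form of back pressure.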

Chris: Gotta get some back pressure in there.

Amos: Yeah… I've shut down Elasticsearch that way. Mongo. That's really easy to do. And I can't remember - there was a third thing that I ended up shutting down. I was like, “Look how much faster this is gonna be.” No - it wasn't faster because we killed the downstream thing. It ended up being faster ultimately, but day one, we turned it on and we were like, “Oh, crap!”

Chris: So, yeah, everything about it's been better. Our visibility into the system’s better as well. Part of that's ‘cause we understand the tracing, you know, ‘cause we had a hand in a lot of that. It’s easier to find errors now. It's easier to see kind of like what's going on. And it’s not all upsides, I will say. Like, there are more lines of code in the Elixir version, but a lot of that is because so much of the Ruby stuff is just implicit. Like, all the ActiveRecord junk - like you’re randomly doing database calls that you don't even really see. And the other big thing is error handling. Like, all the Ruby stuff will just throw exceptions without really knowing why, and in all the Elixir stuff, you know, you’re typically returning - you either explicitly know you're going to raise an exception or you have a pattern match - like, a direct pattern match on the Elixir result type - the ok/error tuple. Because of that, the error handling became much more explicit in the Elixir version, so there’s a lot more code. But that was a pretty reasonable tradeoff to make.

Amos: What was the hardest thing about building it this way?

Chris: The hardest thing? The actual rewriting of all the - well, there’s a handful of things. One is - entropy had not been contained very well in the Ruby one. I think that's just like inherent to how code bases like that grow. Entropy just spreads. But honestly the hardest thing is like, politics. The hardest thing is just convincing people that this is a good use of time.

Amos: So how did you do that?

Chris: I wrote a document. That’s what I spend half of my time doing - is writing up why we’re going to do something. You know - justify why to do it. ‘Cause I think that is actually important. Like you need to figure out why you want to do something to validate it. Like, you don't want to do it just because it's fun. You need to do it because it provides like, some sort of value. Some of those things are easier to measure. Like, we can pretty easily measure money and in this case we could save a fair bit of money on just server costs or the time expenditure. You know, you can make it back in a pretty reasonable time. It’s enough money that it's super worth doing. Then you can also say, like, you know, here's the development costs. Here’s how hard it is to work on stuff right now. We believe that we can make it easier for you to work on these things. We believe we can make it easier to test and deploy to production and deploy to all of our staging environments and all these kinds of things and all these benefits, right. So, it's a real big - if you can show people actual wins and show them a plan - and I think, really plot out what the steps are going to be. It's funny, like, when we first showed people our plan - which was like a many step plan, right? ‘Cause again it goes back to the idea that I have which is, you have to do this in chunks, you have to do it in a safe way, you have to take the time to make sure that no one's going to notice you did it. I was literally told by someone higher up that that was the opposite of what we needed to do - we needed to make a huge deal out of this and get everybody like, aware of it and we needed to do one big cutover and all this kind of stuff and I was like, that's never gonna work. So I’m just not doing that. Because they saw the list of ten steps and then, you know, were like - that's going to take too long. We’ll never get buy-in to do this.
That person’s not with the company anymore so I don’t feel bad maligning them as I’m doing right now.

Amos: You didn’t say their name, so we’re good.

Chris: Of course not. That’d be terrible. I’d be a jerk. But, luckily, other people trusted us that we were on the right track of how to roll this out and so having that plan in place is really, really important. Having actual, tangible metrics of like why we should do it is really important and I think the big thing, and I think this is a thing that people skip over a lot because it seems obvious to the people sort of in the middle of it, is - you need to qualify what success looks like. You need to be able to show people like, if we do this, here’s how we’ll know we are successful. And here's how we can literally measure that success. It might be saving a bunch of money on your AWS bill. It also might be throughput or it might be whatever, like - whatever metric you want, whatever key metric you're trying to optimize for. But you do need a way to say we will know we're done and successful when this happens, and that goes a long way towards putting people's minds at ease.

Amos: So, do you get down to the nitty-gritty when you specify those goals? Is it like - we're going to, we're going to lower our AWS bill by at least 5%, or is it like we are going to hit this number - I don't know, but maybe at least 5% is still that. Is it - I'm just going to lower our AWS bill? Or is it I'm going to lower it to X?

Chris: If you're talking about measuring success, you need to be pretty specific about what it is that you're actually trying to achieve. And in certain cases, you may not know. For instance, we knew we were going to have better performance. We didn’t know how much. We were pleasantly surprised by how much we could save and we figured that out pretty quickly. Once we had the first little bit of it done, we could kind of extrapolate out how hot are we running right now, are we going to be able to have X number of throughput per compute resources, etc. We could do that math to find the line on how much money we think we’re actually going to be able to save. And that took a little bit of pre-work. You can't just guess at those things. That's not useful. You need to actually go and try to measure it and do a real benchmark and find out. But, you know, once we can do that, we could actually say like we're going to save X number of dollars. So you have to do a little bit of work. I think it's really worth doing that, because it validates - not everything needs to do this. Like some things are small, right? Some things you just - it doesn't take much because you intuitively understand it, but when you ask for 3 months - when you ask for an entire quarter of your fiscal year to go work on something, you need to be able to provide value. This is just the name of the game, whether you like it or not. I don't like capitalism but this is part of the game. I also want to build cool stuff and part of building cool stuff is showing that what you're going to do has value, and the more specific you can make that value, the better you're going to be. Like, don't make crap up, but, you know, show them some sort of success metrics. Which is hard. That's the hard work that's not fun.
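
The extrapolation Chris mentions, from measured throughput per compute resource to an estimated saving, is simple arithmetic. In the Python sketch below every number is invented for illustration and none of them come from the show. The function names and the 30% headroom are likewise assumptions.

```python
import math


def tasks_needed(peak_messages_per_second, messages_per_second_per_task, headroom=0.3):
    """Estimate how many tasks are needed, keeping some capacity in reserve."""
    usable_per_task = messages_per_second_per_task * (1 - headroom)
    return math.ceil(peak_messages_per_second / usable_per_task)


def monthly_savings(current_monthly_cost, tasks, cost_per_task_per_month):
    """Difference between what is paid today and the projected cost."""
    return current_monthly_cost - tasks * cost_per_task_per_month


# Hypothetical inputs: a benchmark of one new consumer plus today's bill.
tasks = tasks_needed(
    peak_messages_per_second=5000,
    messages_per_second_per_task=2000,
    headroom=0.3,
)
savings = monthly_savings(
    current_monthly_cost=10000,
    tasks=tasks,
    cost_per_task_per_month=150,
)
```

The only input that has to be measured rather than looked up is the per-task throughput, which is why a real benchmark of the first ported piece matters.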

Amos: It's hard knowing, like you said. You know, you know that you're going to probably be more performant, but how much? And that’s hard to come up with without writing it. You could show them a ‘hello world’ app in Rails and a ‘hello world’ app in Elixir and be like, “Here’s the times,” but that doesn’t extrapolate out to “We’re processing files that have a million lines in them.”

Chris: Well, and it’s really hard to quantify stuff like - this code base is going to be easier to use. It's going to be easier to do development work and that's really hard because that’s 100% subjective. Not even - not questionable, that is a subjective metric.

Amos: Unless you're me and then it's objective. It’s an objective truth.

Chris: So, you have to find stuff that’s slightly more quantifiable. You could certainly list those as benefits - like, we believe that we can reduce the amount of complexity by this. This goes back to the idea of not using weasel words. Like, refactoring. You know, like actually say what you're going to do and say why you think it's going to be beneficial and then people can choose to believe you or not based on their own subjective measurements of the thing. But the more objective things that you can get, like - we believe we can hit this throughput, you know. Those are really tangible things, so. Unless one of your success metrics is - I have this new employee and they're going to get up to speed and ship features within a week or whatever. That’s a thing you could write down. It’s going to be hard, probably, to craft that. It’s going to be hard to execute that plan.

Amos: I have definitely seen places where one of the things that you can write down is - you're going to be able to keep your staff. Because they all want to leave and you need to hand them a carrot. Plus, you have all of these other benefits too. Like, the old code base is so hard to work in that nobody wants to be there. And they all have an interest in this new thing that has all kinds of benefits over the old one. So if you do this, you’ll probably keep them and get the added benefits.

Chris: I mean, we definitely included the fact that most of our employees don’t understand Ruby at a deep level as a reason to do this for sure. I am a firm believer in that you should use your favorite tools. You should use your favorite hammer to hit most nails. Even if they’re not nails.

Amos: Screw with a hammer. That's fine.

Chris: I’ve probably said this before, but one of my favorite professors from college - he was probably more filled with anecdotes than any sort of actual teaching. I learned a lot of anecdotes from this dude. But he was rad and he would totally like, talk to you in very real language about what it was like to be a real engineer. And one of the things he would always say is - if you give me a big enough engine, I'll make this building fly and I think there's like a lot of wisdom in that, in that you can pretty much make whatever tool you want work. And what matters so much more than the tool is your excitement and energy and your team’s excitement and energy, and their knowledge, how much infrastructure you have around that specific tool… Like, you can totally make it work. If we were a different company with a different team, we would have made Ruby work. Not a problem. Lots of people make Ruby work and they run it at really high scales, so I'm completely convinced that somebody could have made those Ruby systems function. That's not us. We're not the team to do that because none of us care. None of us like Ruby. Or the ones of us who do are completely fine replacing it. Like have no love for it in as much as they're not like, ready to get in and, like, fix it - fix the Ruby code. We were all very excited about it being in Elixir. That’s our team makeup and so it makes sense for us. It just so happens to be that this is a good fit for Elixir as well. There's plenty of things that we do with Elixir that aren’t what fits, quote unquote, for Elixir. But we do them anyway. We do them anyway because that's the tool that we like, that’s the tool you understand and that's what it will be like and at the end of the day it doesn't matter. Slack makes PHP work. PHP is not on its face a good fit for a real time chat company.

Amos: Is that why my Slack is broken all the time?

Chris: Well, I mean... I cannot say, but like, there’s plenty of - GitHub runs on Ruby. Pinterest ran on Python for a long time.

Amos: Facebook is…

Chris: Facebook had to build their own VM, so… none of those I would classify as the quote unquote “best tool for the job,” because that's a dumb saying that doesn't mean anything. Like, the best one for the job is the tool that your team is using, that they love, that they are happy to be working in. And of course, languages and VMs have tradeoffs. They all have different characteristics and those do matter. Those differences are an order of magnitude less important than does your team want to use this every day? It does not matter that Java is faster at math than Erlang is. Like, if your team hates Java. Or hates any of the languages that run on the JVM or if your team doesn’t know how to tune the JVM. Or if you have no experience with JMX metrics. There's all these reasons to choose things. The capabilities of the VM are certainly in that list, but they’re so much less important than ‘my team wants to do this, the whole team is excited about this’. If you’ve got a whole team of people who are excited about something, that is your choice. There is no other choice because they'll just figure out how to make it work. So for us it made total sense to make it in Elixir and we’re much happier. It just so happens to be that Elixir is just really well suited for this problem as well.

Amos: I’m at this point where I'm just like really happy for you. I can see in your demeanor that you’re better.

Chris: I needed a win. I needed an emotional victory, and I feel very fulfilled that it's now, that it’s happened in the past week. I sort of needed that, you know, emotionally.

Amos: Well, congratulations to you and, and Jason Stewart and everybody else at Bleacher Report.

Chris: It’s good. It’s been great. I’m very, very happy.

Amos: I don't really have anything else for today. I don't know if you have anything. We are keynoting virtual Elixir Conf EU.

Chris: I’ll be - I don’t know if by the time this comes out it will matter, but I'll be speaking at Code Beam virtual, which is on the 28th of May, so this won't be out at the time - that might be coming out the same day, which might be slightly too late.

Amos: Find the recording if it’s past. And Elixir Conf EU is - it’s the 18th? 19th?

Chris: It's June. What I know is that it's June.

Amos: It’s June sometime. So, virtual.elixirconfeu.com. And you’ll find the right dates whether or not we’re right. You and Anna and I are doing a keynote together.

Chris: So we can see how that goes.

Amos: We can, we can barely talk to each other weekly. I don't know how we're going to get through 30 minutes.

Chris: It’s early in the morning too, for us, so.

Amos: That was one of my questions. It might be live. It might be pre-recorded. You never know. You'll have to show up to find out.

Chris: Either way, it will be unique. It’s gonna be an event. I can't promise you that it’s going to be good, right? Train wrecks aren't good but they are unforgettable.

Amos: So, come and cheer at us or laugh at us. Either way - we're pretty excited. Alright, on that note, enjoy your lunch.

Chris: Yup. See you next week.
