The Not Unreasonable Podcast

Actuaries and Data Scientists at Root

August 17, 2020 David Wright Season 1 Episode 49

My guests for this episode are from Root Insurance: Matt Bonakdarpour, VP of Data Science, Alex Carges, Chief Actuary and Isaac Espinoza, who leads Root's reinsurance efforts. 

This episode covers everything that I, at least, have been burning to ask some people deeply steeped in the data science of auto rating. Does territory really matter? Does technology really matter? Actuaries vs Data Scientists? Do such distinctions matter in the limit? 
Listen in for more!

Twitter: @davecwright
Surprise, It's Insurance mailing list
LinkedIn
Social Science of Insurance Essays

David Wright :

My guests today are from Root Insurance. We have Matt Bonakdarpour, VP of Data Science, leading efforts on risk scoring, pricing, reserving, and marketing. Matt's background is in quantitative finance. We have Alex Carges, Chief Actuary, who leads the actuarial efforts on ratemaking and reserving at Root. Alex previously worked at The Hartford and Allied World. And finally, we have Isaac Espinoza, like myself a reformed quant and actuary, who leads reinsurance at Root. Isaac previously worked at Greenlight Re. Gentlemen, welcome to the show.

Matt Bonakdarpour :

Thanks for having us.

David Wright :

Okay, first question. So let's say I had a model for a personal auto portfolio that only has two parameters. The first is the garaging address. And the second is a perfect, perfect moral hazard score, where I know the driver's predisposition to filing spurious claims through either neglect or malice. How good is this model at predicting claims cost?

Matt Bonakdarpour :

Maybe I'll jump in first here. So in this hypothetical world, what are competitors doing?

David Wright :

A great question. So let's say that they have a less good model of the same thing, although we can dig into that assumption later on. So they're trying to do the same thing, but you guys have a better model for the moral hazard score in particular.

Matt Bonakdarpour :

Yeah, yeah. In that case, I think it would do just fine. You know, if a good model is a model that lets the company become profitable, provide fair and adequate rates compared to competitors, and not get adversely selected against, I think that can be achieved. Meaning, you know, if our competitors are using those variables, or a subset of them, but we're superior in predicting loss costs, we would be able to do it. However, I think the predictive power of the model could be drastically improved by adding other variables.

David Wright :

Okay. So let me... let's just keep going on this. So let's say the rest of the world has the current state of the art, whatever they're doing now, whatever your knowledge is of that, and you can explain some of that if it's relevant to answering this question better. But up against them, how close can you get with a model like this? And what would be missing?

Matt Bonakdarpour :

Yeah, okay. So I think this model would be far, far off. So, first of all, the variable that kind of handles fraudulent behavior matters, to put it simply. You know, let's say there are no people like that; then you wouldn't need that variable. But if there were people like that, that variable would handle it.

David Wright :

Yeah. Are there people like that in reality?

Matt Bonakdarpour :

I would say so.

David Wright :

Okay, let's say yes.

Matt Bonakdarpour :

Yeah. So then what's left is zip code, and the question is: does the zip code basically explain the rest of the variance of the loss costs? And I think the answer to that is no. And, you know, I think the purpose of the zip code or territory factor has changed over time. Maybe originally it was mostly used for things like traffic and so on. I actually think there's some actuarial literature about this, about, like, pre-collision and post-collision type variables, and they show that territory over time has become like a post-accident variable that actually tries to capture things like how litigious that zip code is, things like that. I forget, I think it's Feldblum or someone else who wrote this paper in the late 90s. And so there's a host of other predictor variables that are going to explain the variability of loss costs, the ones used today: things like previous accident history and violation points, and telematics, for example. If you zoom into a specific zip, there's quite a bit of variability across all those variables, and they are predictive of loss costs. And so if you ignore those, you're missing out on segmentation relative to your competitors.

David Wright :

Alex, anything to add to that?

Alex Carges :

Yeah, yeah. Thanks, Matt, for starting the discussion here. I think I would also just break down the problem into types of variables. With territory, if you look out your window, do all your neighbors act the same, drive the same? There are other variables that start to get at those individuals. And there's another slew of variables that get at what type of vehicle the individual is driving, which obviously has a huge impact on repair costs. And how old the vehicle is might be how probable it is that you get a breakdown in the middle of driving, which obviously creates a dangerous situation. So territory probably just won't cut it these days.

David Wright :

Let me defend territory for a second. And I know the paper you're talking about, Matt, and I will put a link up to it in the show notes as well. That paper really surprised me, because isn't it interesting that the effect of territory can change over time? The territory is the same place, right? But the way in which it adds predictive power changes, which is fascinating to me. So think about what you get with territory: you get a lot of information, right? Maybe you get demographic information, and you certainly get income distribution information. You know, the zip code where I live is a much wealthier one than the one a few down from me, which is Paterson, New Jersey, where radically different kinds of people live, determined by the zip code, right? You can get into all sorts of really ugly conversations there, redlining and the like, if you want to. Along with that, you get traffic density, right? And you get legal environment, of course; it's going to capture a proxy for legal environment, and that's pretty tough to get any other way. And so the insight that I'm just fascinated by, and I'm interested to get more reaction from you on, is that you get a lot of stuff for free with one parameter, one piece of information about somebody. What you don't necessarily get is a view into the person's mind or their behavior, because there are people in my town that I think aren't great people, right? No naming names. But that, I would imagine, is somewhat uniformly distributed, the good-and-bad-person kind of thing; it's not obvious to me that it would vary by geography. And if you think about another variable, like traffic density, maybe you could enhance that with location, but I don't know. The parsimony of the model, which only uses one thing to capture many different proxies, seems to me to be attractive. Push back: tell me, where am I off?

Matt Bonakdarpour :

Yeah. Well, actually, I agree with everything you said. I think, you know, "how important a variable is" is actually an ill-posed question. So I could ask you: if you could only use one variable in your pricing model, which one would it be? And you might have one answer, and maybe it is zip code. But then I could ask you another one, which is: all right, you have the industry-standard plan; which variable, if you were to remove it, would hurt you the most? You might say a different variable in that case, right? And it might not be zip code.

David Wright :

Yeah. Well, what is the answer to that question?

Matt Bonakdarpour :

Yeah, telematics is up there for sure. Zip code is probably the answer to one of those questions. The other is kind of an open research question we've been tackling internally forever.

David Wright :

Alex, anything more to add to that?

Alex Carges :

No, I think what you talked about kind of highlights that when you use zip code as a proxy for territory, as in our discussion here, you do have all those elements: the demographics, and certainly jurisdiction, which is useful. But you have the average effect versus getting at the individual's behavior. I mean, even traffic density: not everybody in the same zip code drives the same roads. And so you can actually get down to the individual level these days and figure out, you know, who is actually driving during rush hour, or on dangerous intersections. Or, in Ohio, we have a lot of roundabouts. So do you drive through those roundabouts, which are super confusing (I don't recommend them, for the civil engineers listening), or do you avoid them? That can make a big difference.

David Wright :

Yeah. So to me you're introducing the need for granularity in the model, right? And you have this trade-off, perhaps, in adding more variables. You know, the classic trade-off: add many more variables and you wind up overfitting the model, or maybe you don't have enough fit data in the pieces, and so you need a lot of data to overcome that. And maybe most companies don't have that. How do you guys think about the need for scale when adding variables to a model? Do you measure that explicitly, as in "we don't have enough data for another one here"? How do you think through that?

Matt Bonakdarpour :

Yeah, absolutely. You know, there are standard approaches from the statistics and machine learning literature to kind of serve that bias-variance trade-off you're talking about: not overfitting too much, but also not creating too parsimonious a model, where you can't describe variation at all. And there are standard approaches to curtail overfitting, like cross-validation, which allows you to hold out data sets and ensure that you're not just memorizing the data set you're training on, that you can actually make predictions on data you've never yet seen. As we collect more data, as any carrier collects more data, they can create more and more complex models by adding in variables that are shown to be predictive out of sample. And that's really the fundamental principle I think you need to think about when you're adding variables in.
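
To make the holdout idea concrete, here is a minimal sketch of cross-validating a claim-frequency model. It assumes scikit-learn, and the features and claim counts are simulated for illustration; nothing here is Root's actual model or data.

```python
# A minimal sketch of out-of-sample validation for a claim-frequency model.
# Assumes scikit-learn; the data and features here are hypothetical.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))          # e.g. driver age, prior accidents, ...
y = rng.poisson(np.exp(0.3 * X[:, 0]))    # simulated claim counts

model = PoissonRegressor(alpha=1.0)       # alpha penalizes model complexity
# 5-fold cross-validation: each fold is held out once and scored,
# so the score reflects data the model never trained on
scores = cross_val_score(model, X, y, cv=5,
                         scoring="neg_mean_poisson_deviance")
print(scores.mean())                       # out-of-sample deviance, not training fit
```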

David Wright :

Alex and Isaac, I don't know if you guys remember the current exams; I think exam five would probably be the one. But holdout data sets: why on earth don't actuaries have that as part of the literature? Maybe they do now, things are changing pretty rapidly. But you know, I was blown away when I started studying data science after taking the actuarial exams. You had this whole idea of cross-validation and overfitting, which to me never got anywhere near the actuarial literature. Do you guys have any reaction to that? Are you as horrified as me, having worked with Matt?

Alex Carges :

Yeah, this is Alex. I think overfitting keeps us up at night, generally, here. Especially as the world goes from a place where we're constantly scouring for new datasets to a world that has more data than we can possibly handle. And so the techniques have to change along with that. And I think that's why you're seeing so many data scientists start to work with actuaries; we certainly are at Root. I mean, you can tell just by the three of us here on the call, but particularly my teams and Matt's teams work hand in hand on a number of problems. Because you have to be able to comb through this plethora of data in a really disciplined way in order to get the right outcomes. So yeah, cross-validation, true out-of-sample and out-of-time sample testing is extremely important. And no cheating, as we put it.

Matt Bonakdarpour :

To give a little bit of credit to the actuarial community, I think the recent study material released on modeling does talk about holdout data sets. I did search the text for cross-validation; I didn't see any mention of that, but there is mention of holdout datasets, so there's some transference of knowledge across the domains.

David Wright :

Yes, well, maybe it's just underdeveloped, and I'm sure actuaries are going to kind of move forward in that direction, I have no doubt. But it is fascinating to me that they had never done it before, right? If you think about actuaries as, in some ways, the original data scientists: a hundred years ago there were actuaries doing this stuff with life tables, at least, before, I would imagine, there was any kind of applied statistics being done at almost any other organization. I mean, I don't know, maybe Matt, you have a perspective on that. But actuaries missed this boat, man, like, big time. Any thoughts on how or why that might be, from you guys?

Alex Carges :

I mean, I don't know if that's the right characterization. Certainly in my past, well before Root, there was a lot of use of very disciplined, very rigorous model testing that included all of this. I think there is an interesting effect for actuaries in general, and maybe it's part of our personality, maybe it's part of the competitive nature of the insurance industry, but not everything that we do is written down. Maybe similar to stock trading; Matt used to do that in his past. If you find a good model, you're not going to publish it, necessarily. Only when it's 20 years out of date do you then publish and say, hey, this is what you should do. And everybody that reads that, or is taking exams, maybe giggles a little bit and says, yeah, that's what you would have done 20 years ago.

David Wright :

There is that effect. But if you think about who the ultimate arbiter of actuarial techniques is, and certainly push back against this view if you'd like, Alex or Matt, it's the regulator, right? So when is the regulator going to say: show me your cross-validation score, and your test set, and all of those things? Is this real or not? Have they ever done that for you guys?

Matt Bonakdarpour :

If you read rate filings these days, I think all the carriers have to publish out-of-sample holdout results. They do. Is it a requirement? Well, I don't know if it's a requirement, but they do want to show they're doing the work. But I think you made a good point: I think regulatory lag is part of why it takes so long. GLMs were probably invented 20 or 30 years before they were used in the auto industry in the 90s, or whenever GLMs started. And another part, I think, is the career path for professionals in the insurance industry. There's an actuarial career path with, like, a codified set of standards, professional and ethical, and you go through certain exam coursework, and that's the curriculum for getting into insurance. And so for a long time, if that's the way you were filling the roles of people creating pricing models, then your competitors aren't using the more advanced techniques, and there's no reason for you to use them. Until there is this blending of data scientists and people from other disciplines coming into insurance, which has happened probably in the last decade more than any time before that, where new methods are coming in and, just to stay competitive, you need to use them. But you know, if the standard methods work because your competitors aren't using more sophisticated ones, there would be no need.

David Wright :

So I imagine you guys are using more sophisticated methods than you're putting in your rate filings, probably, right? How do you adapt the work you do internally? So Matt, you wander in and you're like, hey, let's just throw out the neural networks, guys. And then there's a "timeout, pal, that's not what's going to get into a rate filing; you're not going to put in a bunch of weights." Right? So how do you translate insight you might gain from one set of tools into something that you're actually going to put into the canonical methodology for finding rates?

Alex Carges :

Yeah, I think the first step is I have Matt print out all of the neural network weights, and then I send it to a list of the DOIs for review.

Matt Bonakdarpour :

Yeah, so, neural networks: maybe that would be an interesting conversation after this question, for the insurance industry. But yeah, for sure, I think we're still kind of making the transition to more modern methods. You don't see tree-based approaches being filed with regulators, and I think part of the reason is, again, the curriculum, the education. There is this idea that GLMs are more, say, interpretable. Sure, I see where people are coming from when they say that. However, you can fit a GLM and get a coefficient that makes no sense, right? Because there's a correlation between the variables.

David Wright :

The sign is flipped, right? Yeah.

Matt Bonakdarpour :

Yeah, because of other variables. Yet it could hold up out of sample, and so in that case it's not really interpretable; you don't get that guarantee with the GLM. You can look at the coefficients, you can tell a story. Over time, I think, as companies are able to educate regulators, and also as regulators go through a more modern curriculum and understand how these tree-based models work and why we should really trust them, especially out of sample, I think we'll see more of them filed. The only difficulty, or the main one in my opinion, is: how do you ensure you're not discriminating against protected attributes with a tree-based model? That's hard. With a GLM it's hard to do too, by the way, but it's even harder with a tree-based model. And there is a growing literature in what they call fairness, accountability, and transparency in machine learning; there's a conference that focuses on this issue. What is fairness? How do you measure it quantitatively? And as that research matures, I think we're going to start seeing it become part of the regulatory process as well.
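
As a toy illustration of the coefficient problem Matt describes, here is a sketch, using statsmodels on made-up data, of a GLM coefficient that looks sensible on its own and then shrinks or flips once a correlated variable enters the model:

```python
# Toy illustration of a GLM coefficient changing when a correlated
# variable is added. Data here is simulated, not a real rating model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 50_000
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)   # x2 highly correlated with x1
# True process: claim frequency is driven by x2 only
y = rng.poisson(np.exp(-2 + 0.5 * x2))

# Fit on x1 alone: its coefficient is clearly positive (a proxy for x2)
m1 = sm.GLM(y, sm.add_constant(x1), family=sm.families.Poisson()).fit()
# Fit on both: x1's coefficient can shrink toward zero or flip sign,
# even though both models may validate fine out of sample
m2 = sm.GLM(y, sm.add_constant(np.column_stack([x1, x2])),
            family=sm.families.Poisson()).fit()
print(m1.params)
print(m2.params)
```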

David Wright :

I'll put in a quick plug for a previous podcast with Cathy O'Neil, author of Weapons of Math Destruction, which is very much consistent with this. But, you know, tell me in a bit more detail, if you can, about what tools you can't put into rate filings, or how you might turn one thing into another. I assume, Alex, that's one of the important points of collaboration between the actuaries and data scientists.

Alex Carges :

Yeah, and we have to walk a fine line here, as we're deep into kind of secret-sauce territory. But on the regulation, I guess from my perspective I'm a little more optimistic about it. I think regulators are more open-minded than many of us believe them to be about filing things directly. But there has to be that level of transparency, and you have to be willing to sometimes share that transparency publicly, which creates hurdles. There are a number of statistical techniques, though, that you can use to reduce the complexity of models. You can even take the output of one model and use it as the target of another model. Those types of techniques can help tactically. But I think, really, the regulatory story is to be able to prove that you're not unfairly discriminating against people, that you are using the right procedures: here are the inputs, here are the outputs, here's the evaluation. I think that goes a long way. Another hurdle, just internally, is you've got to have the right tech stack to use some of these techniques and to get them out to production. So I think there are some technical capabilities that you need to have in order to even use these things practically.
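
The output-as-target trick Alex mentions is essentially model distillation. A hedged sketch of the idea, with scikit-learn and simulated data standing in for whatever real models a carrier might use:

```python
# Sketch of distillation: fit a complex "teacher" model, then use its
# predictions as the target for a simpler, more transparent "student".
# Data is simulated; nothing here is an actual filed model.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(20_000, 4))
y = np.exp(0.4 * X[:, 0] - 0.2 * X[:, 1] ** 2) + rng.normal(0, 0.1, 20_000)

teacher = GradientBoostingRegressor().fit(X, y)   # complex, hard to file
y_soft = teacher.predict(X)                       # teacher's predictions
student = LinearRegression().fit(X, y_soft)       # simple, explainable form
print(student.coef_)   # the student approximates the teacher's signal
```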

David Wright :

Right, so you guys are in the personal auto business, and, you know, I've worked with personal auto organizations in my past, particularly startups and younger, smaller ones. And the kind of going line in that business, which I learned through my clients the hard way sometimes, is that this works fine for non-standard auto. That is, the much more price-sensitive customers who are much more fickle, who move faster to new carriers, and who are tougher risks to underwrite as a result. You can win at that. But what about preferred auto? The great, glorious people like me, who bought my auto policy from GEICO five years ago, six years ago, ten years ago, I don't know how long ago it was, and I haven't looked at the damn thing since. Right? I don't know how much I'm paying. I'm a great customer, a phenomenal customer. Who gets me? Nobody gets me. They keep me because I can't be bothered to even look at the thing. So how do you compete with GEICO to get me? How do you guys win at this kind of business? Because presumably this is a goal of yours, to get people like me.

Alex Carges :

I think you bring up... I mean, it's definitely a valid point. There are lots of different customer segments, and they value different things. One, I'll do a shameless plug: we're not only in auto these days. We branched out to renters, and actually we have a home partnership going on too. So we actually can get to a bundle, and if you're interested, you could download our app and see how much you're paying versus a quote. But for discussion purposes: I think actuaries sometimes, if I can make a broad statement, can reduce the problem to "we just need to price better." In business strategy in the insurance industry, there's a lot more to it than that. Pricing is a necessity; you have to be good at that, especially in personal auto, especially in high-transactional-flow businesses. But you also have to build incredibly strong claims operations, customer service operations; you have to have good organizational decision-making; you have to be disciplined; you have to be able to recruit. Those things are not easy either. And so all those things have to come together into the right strategy, to get that product-market fit across different customer segments and what they value; otherwise you're not going to be able to get the customer. So fairness might resonate with one customer, while another customer might not care about fairness at all. They just want ease of doing business, or they just want to compare ten quotes all the time.

David Wright :

So it's a marketing-messaging problem. And Matt, in your title, or at least in the bio you guys gave me, marketing's in there. And marketing is a scaled communication problem, right? So presumably that's susceptible to data science as well. I assume you're part of the solution of how to compete in that domain. What can you tell us?

Matt Bonakdarpour :

Yeah, absolutely. I think Alex really hit it there. Part of it is what we stand for, which is fairness, and that resonates with people, especially more and more in this environment. You know, telematics is really decreasing the reliance on variables that are perhaps just correlated with risk rather than causally related. And as a result, we stand for fairness, and I think that resonates with the population you're talking about, David, at least a subset of them. And then the other part is doing it right: providing a great customer experience and value beyond just predicting future loss costs. David, I think you have a few kids; when they start driving, you might be interested in how good they are at driving. That's something we could provide you through the telematics score and through an app that's very easy to use. So it's really about providing value, beyond just the insurance product, from these datasets. To answer the last question you asked, about marketing: certainly we want to make sure we are providing the best experience to our customers and getting out a message that's truthful but also resonates, and that's how we capture the segment we're interested in. But we're interested in any segment; we have a very wide underwriting umbrella, and we're not just targeting a very niche customer. And the data science approach to marketing is largely about allocating your budget effectively. That's really a huge part of the insurance problem. For customers who stay a long, long time, we'd be willing to spend more marketing dollars because of the return on that investment. So that kind of decision-making, we want to do it in a systematic way, and it does require data science models and getting very rigorous about how to do that allocation.

Unknown Speaker :

To add to Matt, one thing worth noting, David, is that we've been talking about a lot of the actual pricing and a little bit about telematics, but we have nearly 200 developers and data scientists spread throughout the organization. So in addition to our bread and butter, which is the telematics side, they also get involved on the claims side, on the actuarial side, on the marketing side, you name it. And on the marketing side, as Matt mentioned, for higher-value customers such as yourself, someone with much stronger retention attached, we're willing to pay more in terms of acquisition cost, because we're factoring in the higher lifetime value of bringing someone like you on. So in the real world, we modify our budgets according to how long and how profitable we expect our customers to be.
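
A back-of-envelope sketch of the lifetime-value logic described here: the maximum acquisition cost you can justify scales with expected retention. The function and all the numbers are hypothetical, not Root's actual economics:

```python
# Hypothetical sketch: the most you can pay to acquire a customer is the
# present value of the margin you expect over their lifetime with you.
def max_acquisition_cost(annual_premium: float,
                         target_margin: float,
                         annual_retention: float,
                         discount_rate: float = 0.05) -> float:
    """Present value of expected future margin from one customer."""
    ltv = 0.0
    survival = 1.0                      # probability the customer is still here
    for year in range(30):              # truncate the horizon for simplicity
        ltv += survival * annual_premium * target_margin / (1 + discount_rate) ** year
        survival *= annual_retention    # some customers churn each year
    return ltv

# A sticky customer supports a much higher marketing bid than a churny one:
print(max_acquisition_cost(1200, 0.10, 0.90))   # high retention
print(max_acquisition_cost(1200, 0.10, 0.50))   # shops every renewal
```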

David Wright :

So, you know, what probably prompts this question for me is the point that I think, Alex, you were making there about the technology stack being an important piece of infrastructure, a capability. Some of the greatest competitive advantages in the world are the ones where your competitors couldn't do it even if they wanted to, right? And it seems to me that redesigning the way an insurance organization uses this data, analyzes this data, and produces results could be such a thing. It's very hard for a company, and, shoot, Isaac, you probably know this as well as I do: how many insurance companies are changing their systems? Like, every single one, all the time. And if you're always changing your systems, guess what, your systems suck. They suck because everything in transition sucks. And then as soon as you get it done, you're going to change it again. Right? So if you have stable, truly, perfectly integrated systems, it seems to me that can be an extraordinary competitive advantage. What's your reaction to that, guys?

Matt Bonakdarpour :

I absolutely agree with you. And across the entire organization; it doesn't just have to do with pricing, right? Your systems for pricing, claims, marketing, customer service: everything needs to be integrated. You can't have separate systems, either, right? These datasets really inform each other to provide the best product experience for the customer, and starting from scratch allows us to do that. Like you said, a lot of people are changing. I think there are probably also a lot of companies who don't change because of the innovator's dilemma, worried about rocking the boat and churning the customer experience. And from a data science perspective, I think it's incredibly valuable to think hard about the infrastructure. You know, I've never seen a case, or very rarely ever seen a case, where we have a business problem, we put a smart person in a room with a blackboard, and they emerge with an industry-changing idea. I've never seen that happen. But what can happen is creating an infrastructure for researchers to test ideas quickly and rigorously, and then providing the infrastructure to facilitate putting them into production right away. Not a year from now, not two years from now, but right away. And that's what we were able to do: starting from scratch, we set that infrastructure up. And that's the way our organization handles data science and actuarial projects.

David Wright :

So can you tell me how that works for a researcher in your organization? What data do they get? In my imagination, there's somewhere some kind of data lake; they pull a bunch of data into a local environment that they just mess with, or maybe they don't pull it into a local environment, but they interact with it with whatever cool things they want to use, Python, Colab, whatever, and try crazy stuff. And then they say, okay, this works, and then they have to hand it off to somebody, to some kind of planning process, to get it integrated, so to speak. Tell me more about that process, how it works, and what's smooth about it, or what you think is cool about it.

Matt Bonakdarpour :

Absolutely. So I think thinking about the research problem as separate from the production problem is a mistake; you have two different code paths then, right? If you research, let's say, just a pricing model, and you do everything in what we call research land, and you vet the model, and everything looks good in research land, then you have to go and re-implement that completely from scratch, working with an engineer who perhaps has no domain knowledge, and then you put it into production. There's no guarantee that the results you get in production land are going to match the results you got in research land. Oftentimes they don't. So I think a huge, fundamental principle is that you need to collapse those code paths. When you get it working in research land, you're done. You basically flip the switch, you plug it into the production path, and it's the same code, the same process. That, I think, is a huge step change, and you see this happen not just in insurance; you see it in finance, you see it in every quantitative business where these kinds of infrastructures are separate and you've got to do this handoff. Then you have a team that reconciles it and says, hey, is production matching research? It's not, and everything I thought was working in research, now I don't even know if it's working in production.
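
A minimal sketch of the single-code-path principle Matt describes, assuming scikit-learn and joblib; the pipeline, data, and file name below are hypothetical stand-ins:

```python
# One definition of the model pipeline, shared by research and production,
# so there is nothing to re-implement at handoff. Illustrative only.
import numpy as np
import joblib
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import PoissonRegressor

def build_model() -> Pipeline:
    """Single definition used by both research land and production land."""
    return Pipeline([
        ("scale", StandardScaler()),
        ("glm", PoissonRegressor(alpha=1.0)),
    ])

# Research land: fit, validate, and freeze the exact artifact
rng = np.random.default_rng(4)
X, y = rng.normal(size=(1000, 3)), rng.poisson(1.0, 1000)
joblib.dump(build_model().fit(X, y), "model.joblib")

# Production land: load the identical artifact, same code, same process
model = joblib.load("model.joblib")
print(model.predict(X[:5]))
```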

David Wright :

So you're saying that there's a... let me rephrase this question, actually. What do you lose by doing that? Do you lose research flexibility? Because it seems to me that makes a lot of sense, of course, but there's a reason why other people set things up the other way, and I imagine it's to maximize the amount of creativity and freedom you give to the researchers.

Matt Bonakdarpour :

Yeah, that's a great point. You're absolutely right. So I think the first thing you lose is talent pool, because you need people who can do both the research and the implementation in production, and that makes the talent very difficult to find. Now, data science curricula have become widespread across many universities; there are data science departments, and it's more and more the case that people are educated in both computer science and software engineering and statistics. So it is becoming easier, but it's not easy. But you're absolutely right, we have to give people the flexibility to go outside the system architecture, and we do that. And that kind of goes to how you handle short-term projects that you can put into production quickly versus long-term bets. And that's more of a management philosophy: how do we allocate and do that kind of division of labor, how much resource is allocated to the long-term goal versus the short term? We are always looking toward the long-term goal. But then, at the end of the day, you try this harebrained idea, it works, and you compare it against whatever the baseline is, what you have in production. If that improvement is big enough, now you go and re-architect the system, re-architect meaning get that flexibility into the production system, so now everyone can operate in this new space. That's usually how we handle it.

David Wright :

So this is all very engineering, very data science. Right, Alex? What are the actuaries doing in all this? Where do they live in this? How do they interact with this really cool tech stack?

Alex Carges :

Yeah, I mean, at Root we really integrated the teams, so they're hand in hand. And coming in as an actuary who has worked with some data scientists in the past, but really reflecting on some of the differences I see, especially observing as we build up the teams at Root: one of the main advantages of the data scientists we've recruited is that we've gotten many really, really deep experts in very specific techniques. So we have a natural language processing expert, and we even have a GLM expert. But they don't necessarily know insurance; almost none of them came with an insurance background, though we do have a few. And many of them don't know different techniques outside their own, because they've spent the last ten years really researching a specific technique. When we bring actuaries in, they're kind of inverted as far as the insurance domain knowledge and the technique knowledge: we have thin, usually practical-level understandings of some of the techniques, and sometimes we've used them in the past, but we have really deep insurance economics and almost game-strategy knowledge of the insurance product. And so we marry those up to make sure we're getting the best output. Because even tree-based methods like random forests are great if you have infinite data and the data is perfect, but everyone on this call knows the insurance world, the real world, is never like that. In fact, we have to throw out tons of data, just because we know it's hand-entered and it's complete garbage, or we collected it one way today and a different way tomorrow, and it means something completely different even though it lands in the same column name. So that's how we marry those two forces together. And what you see over a few years, after these people have collaborated, is that they start to blend into one sort of profession, and I think that's actually where the actuarial field is going. And also, I think maybe we'll have a subfield of data science that's just insurance. Maybe that's just my hopeful thinking, but it would certainly help recruiting.

David Wright :

Yeah, well, it seems like you're breaking ground in that very direction. It feels to me like a unification. You know, here's a proposal, or a supposition: if you took the limit of the domain knowledge of a data scientist in insurance to infinity, you'd get an actuary. Is that true? I mean, what's wrong with that model?

Matt Bonakdarpour :

I totally agree. My opinion is, if sufficiently enlightened, there is no difference between an actuary and a data scientist, right? Data scientists who've been in the insurance industry for a couple of years are indistinguishable from actuaries. In the same way, actuaries who just spend their time in R and Python fitting models, doing model diagnostics, putting them into production, and then monitoring them are indistinguishable from data scientists. There's no difference, in my opinion.

David Wright :

But the domains are different, right? Oh, sorry, Alex, did you have something you wanted to add to that?

Alex Carges :

No, I was just going to make a wisecrack. Matt's always telling me that data scientists are bigger, stronger, more handsome.

David Wright :

The domain is different, and the domain matters, right? And Matt, I'm wondering what surprised you, maybe, about the data. I mean the things you wouldn't expect, the quirks of the data in the insurance business. What's different between that and, say, the quant finance world you came from?

Matt Bonakdarpour :

Yeah, so I was in the high-frequency trading space for a while, so time series were our bread and butter there. And until recently, in insurance it was mostly just a rectangular data set: you have a row per policy or something like that, and then you fit models on top of that. Time series are quite different. However, with the advent of telematics, time series are now in the insurance industry, so that dimension is collapsing a little bit. The finance datasets were much bigger, much, much bigger, and part of that could be because Root is a smaller startup, so we're still growing our book of loss experience. However, telematics datasets are petabytes now, so again, that difference is collapsing too. Really, when telematics was introduced to insurance, a lot of the differences in big data kind of went away. There are certainly cultural differences, especially when you compare high-frequency trading to insurance from a data science perspective. In high-frequency trading, you make a change to a trading algorithm and you basically know within a week whether it worked or not. If I were to ask you, David, how long do you think it takes to have confidence in a rating plan update, what would you say? Three years? Yeah, right. So it takes a few years, maybe two years. I'm very uncomfortable with that; I'm still kind of wrapping my head around it. We're doing a lot of work to tighten that feedback loop, and there are leading indicators you can look at and model-driven ways to make it tighter, but that makes it different. And when you have that tight feedback loop, it just feels higher stakes: you put in a change, and you know by the end of the week whether it blew up. If the feedback loop is three years, you've probably forgotten what you did after six months, and you've probably made a few more changes since then. How do you do attribution to a change from years ago, whether it worked or not?

David Wright :

Well, let me... I can't believe the following fact of my own experience: I work with some data scientists now, and the words "look-ahead bias" were something I heard for the first time not that long ago, and definitely only since I left the insurance industry. Matt, or Alex, do you guys think about look-ahead bias? Could you define it for me? Is this the right way of phrasing it?

Matt Bonakdarpour :

You take a try. I know what you're talking about.

David Wright :

Okay, so look-ahead bias. Maybe this is the definition as my group uses it: with look-ahead bias, you have a bunch of data, in a time series in particular, and that data could change in the future. The reason why this is blowing my mind is that this is exactly what actuaries exist to solve, right? And so here's an example of a data science model that our team was working on. We have years of claims data, and we're holding out data sets and fitting some predictive model to the claims costs. And the holdout is a couple of specific years, the most recent years. So let's say up to 2017 is the model-fit dataset, 2018 is the cross-validation set, and 2019 is some test set. And I was part of the model review, and I was saying, well, what about the cyclicality of this business? Things will be different in 2018 and 2017 and 2016. So look-ahead bias is the problem here where, if you include 2019 in the fit set, to the point you're making there, Matt, you have changes that happen to that data. In the non-insurance framing of this, it's like, well, there could be updates to the data set: it was dirty, and they made corrections or enhancements or whatever to the data, say the 2015 data, in an ETL run in 2018, because they realized they screwed something up. And in the actuarial world, that's even worse. And that whole idea: I was amazed to see that it exists outside of the actuarial profession, that it has a name, which I guess is not that common a name. It's something where they have rediscovered, in some ways, actuarial techniques to adjust for it, or maybe they haven't, it doesn't matter, and they use other things to handle it. Any reaction to that?

Matt Bonakdarpour :

Okay, yeah, totally. So we call that kind of analysis "out of time," and it's kind of a second level of rigor: you ensure that when you're evaluating out of sample, you're also evaluating forward in time, into the future. Now, that might look great, but you still might not get the predictive power improvement you expect after you launch that model, because some other change happens after you launch it. That really comes down to monitoring model decay. You have to look at, say, the inputs to your model: what's the distribution of telematics scores, is that shifting over time, or prior insurance history, and so on. If there are shifts in the distributions in your data set, then you might expect the model not to handle it well. You should also continuously do the out-of-time analysis in real time: hey, the predictive power was only this good the month after I launched it, or the next month, or the month after that. That is certainly top of mind. Like Alex said, it keeps us up at night. It's one of those known unknowns, right? You can't really know about it now. If it's cyclical, though, and you've seen the cycle in the past, you can account for that in the model. If there's seasonality, for example, you can put in protections to ensure that your model isn't just picking up on the seasonality when it should be picking up on other things, and you remove that part. And loss trending, I think, is a good example of an actuarial approach to solving a problem like inflation.
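
One common way to monitor the input drift Matt mentions is a population stability index (PSI). A small sketch with simulated telematics-style scores; the threshold comment is a widely used rule of thumb, and none of this is Root's actual monitoring:

```python
# Sketch of model-decay monitoring: a population stability index comparing
# the distribution of a model input at training time vs. live traffic.
# Scores are simulated; the 0.25 threshold is a conventional rule of thumb.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI between a baseline (training) sample and a recent sample."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    actual = np.clip(actual, edges[0], edges[-1])    # keep drifted values in range
    e = np.histogram(expected, edges)[0] / len(expected) + 1e-6
    a = np.histogram(actual, edges)[0] / len(actual) + 1e-6
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(3)
train_scores = rng.normal(0.0, 1.0, 100_000)   # input distribution at fit time
live_scores = rng.normal(0.5, 1.2, 100_000)    # drifted live distribution
print(psi(train_scores, live_scores))           # PSI > 0.25 commonly flags drift
```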

David Wright :

Alex, any reaction to that?

Alex Carges :

Yeah. Locally here at Root, we call some of these threats "cheating" as well, and you have to be extremely careful that none of your variables are cheating into the future. One of the classic quick-analysis things that I think we've probably all seen done incorrectly in the past is: you have a book of business, and you say, okay, I'm just going to cancel everybody who had a loss, and now, next year, I'm not going to have any losses. So we just have to be really careful not to do those things. It sounds simple on the surface, but it can actually be extremely hard. One of the more subtle things that can happen: say you want to implement a new credit model. I think most people are careful along this dimension now, but okay, you go and retro-score all of your policies for this new credit model, or a new credit score a vendor is selling you, something like that. Well, if you scored it with the most recent data, that actually includes financial hardships that happened over the period you're testing, and some of those financial hardships are also related to potential insurance losses, especially on the home front, where your house burns down or something like that. That's a severe, traumatic experience that will show up in loss history and then in your credit score. But if you went back to the credit score you would have rated with before the loss, at the time you would have written the policy, you might not have seen that hardship in the credit history. So you really have to maintain these disciplined approaches, which is, again, where I think actuarial science is going to continue to go, and where, honestly, data science shows us up a little bit on the historical track record. I think we can learn from them.
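
The retro-scoring trap Alex describes is usually avoided with a point-in-time ("as of") join. A sketch using pandas, with hypothetical table and column names:

```python
# Sketch of a point-in-time join: each policy gets the credit score that
# was available on its effective date, not the latest score, so post-loss
# hardships cannot leak into training. Names and values are hypothetical.
import pandas as pd

policies = pd.DataFrame({
    "policy_id": [1, 2],
    "effective_date": pd.to_datetime(["2018-03-01", "2019-06-01"]),
}).sort_values("effective_date")

scores = pd.DataFrame({
    "policy_id": [1, 1, 2],
    "score_date": pd.to_datetime(["2018-01-15", "2019-02-01", "2019-05-20"]),
    "credit_score": [700, 640, 720],   # policy 1's score dropped after a hardship
}).sort_values("score_date")

# merge_asof keeps only the most recent score ON OR BEFORE each effective
# date; policy 1 correctly gets 700, not the later post-hardship 640
pit = pd.merge_asof(policies, scores,
                    left_on="effective_date", right_on="score_date",
                    by="policy_id", direction="backward")
print(pit[["policy_id", "credit_score"]])
```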

David Wright :

How do you think about risk management as an organization? These are interesting sources of risk which don't come solely from an insurance portfolio, although there are also sources of risk in the insurance portfolio, and Isaac, no doubt, in the reinsurance role you are thinking hard about those as well. But if I could put it in this immense way: how do you, as an organization, think about risk management? Is there an effort that crosses domains, or do you tend to think of risks separately? You have risks that emerge from, let's say, changing models versus risks that emerge from hurricanes, and they're pretty distinct. How do you guys assess and evaluate and deal with risks in your organization?

Alex Carges :

Yeah, maybe I'll talk about the operational side. But Isaac, do you want to talk first about the insurance risks?

Isaac Espinoza :

Sure, yeah, I can go first on the insurance risk side. So definitely I'm thinking about it more from the aggregate level, about our whole portfolio. Fortunately, in the auto insurance industry, catastrophe is not a major risk for us. The largest contributing perils are going to be things like hail storms, floods, and tornadoes, as opposed to the typical drivers on the property side, which are going to be earthquakes and hurricanes. But we do have in place a program that we purchase annually to protect us from catastrophe risk; in fact, any single event, no matter how big, should contribute less than half a point to our loss ratio in the worst-case scenario. So we feel like we have that under pretty good control. In addition to that, we protect ourselves from severity losses. We have a retention in the double-digit thousands of dollars per risk, but we buy above that, so for any extremely high liability loss we're protected across our entire book. And your timing is actually good for that question, too, because with an auto book, in addition to catastrophe risk and what we call severity risk, I think the real driver in an auto book is going to be more around frequency. One thing we're looking into deeply is protecting our tail risk on the frequency side, which is an aggregation of attritional losses coming in higher than expected. So we're actually looking at that right now, and we think it could be pretty good for managing our risk in the tail.
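
For readers unfamiliar with the per-risk excess-of-loss structure Isaac describes, here is a tiny sketch of how recoveries work; the retention and limit below are made-up numbers, not Root's actual program:

```python
# Sketch of a per-risk excess-of-loss layer: the reinsurer pays the part
# of each loss above the retention, up to the purchased limit.
# All figures are hypothetical.
def xol_recovery(loss: float, retention: float, limit: float) -> float:
    """Reinsurance recovery for a single loss under a per-risk XoL layer."""
    return min(max(loss - retention, 0.0), limit)

# e.g. a $50k retention with a $1m limit above it:
for loss in (30_000, 250_000, 2_000_000):
    ceded = xol_recovery(loss, retention=50_000, limit=1_000_000)
    print(f"loss {loss:>9,}: cedes {ceded:>9,.0f}, retains {loss - ceded:>9,.0f}")
```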

Alex Carges :

It's interesting, we're kind of living through an event right now. I mean, luckily for us, though not great for society in general, driving has plummeted, and frequency with it. It's recovered a little bit as the economy recovers, and obviously there are state differences, but you can see the capability of the world to really shock everybody. In auto's case it was favorable, but it certainly could be unfavorable just the same. Your question also went into operational risk and model risk, and I think reserving in general has a lot of this inside the actuarial profession. And I think sometimes, especially as we get excited about sophisticated models, I liken it to The Lord of the Rings: a one-model-to-rule-them-all type of framework. We often want to do that. But it's also dangerous to have one model telling you everything, because if that model is wrong, and certainly we know it's wrong at least in some cases, then you're going to be wrong everywhere. So there's a philosophy of having multiple models that are not correlated with each other, that come from different methodologies, different perspectives, even different levels of information, which is actually useful, rather than just having the one model that is the most complex and most sophisticated thing. And what you do is ensemble them together, if you want a fancy word for weighted averages, and you get the diversification effect of the errors. Hopefully, if your models are different, they're not all wrong at the same time. That really can help the corporation shed some of that model risk, or even that operational risk. It's easy to see in reserving: you don't want just one reserving model, because if that model is wrong and you're using it to set your reserves and to price, then everything is wrong at the same time. You want multiple models so that you can see where the differences are and diversify some of that risk.
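
A minimal sketch of the weighted-average ensembling Alex describes: when the errors of the methods are independent, an inverse-variance blend of reserve estimates has a smaller standard error than any single method. The estimates, methods, and weights here are invented for illustration:

```python
# Sketch of ensembling reserve estimates: a weighted average of models
# with independent errors diversifies the error. Numbers are made up.
import numpy as np

# Three reserve estimates (in $m) from methods with roughly independent
# errors, e.g. chain ladder, Bornhuetter-Ferguson, frequency/severity
estimates = np.array([102.0, 95.0, 110.0])
std_errs = np.array([10.0, 12.0, 15.0])

# Inverse-variance weights minimize the blend's variance when errors
# are independent
w = (1 / std_errs**2) / np.sum(1 / std_errs**2)
blend = w @ estimates
blend_se = np.sqrt(1 / np.sum(1 / std_errs**2))
print(f"blend {blend:.1f} (+/- {blend_se:.1f}) "
      f"vs best single method +/- {std_errs.min():.1f}")
```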

David Wright :

How do you make decisions about it? How do you, as an organization, say, yes, we're going to add this as another part of the ensemble to minimize risk? I mean, how elevated is that question? There must be a committee, I guess; some people talk about it together. How do you think through these decisions? Because it's hard, and you're not going to be right, to your point there, right?

Alex Carges :

Committees, that word, like, makes me shiver now. But we do have discussions.

David Wright :

By consensus, yeah. Yeah.

Alex Carges :

I think it is a struggle for humans to move from this kind of point-estimate world to a range-of-estimates world in the decision-making process, and it's taken a lot of discipline, reiteration, self-training, and group training to move to that. Because the reality is that, especially in insurance, we're dealing with uncertainty, always. And point estimates are useful, but what's even more useful is: what is the range of outcomes we're likely to see in the future, for whatever your question is? And then, given that range, whether we're on the high end or the low end or the middle, what is the best strategic direction for the company? That's kind of how we frame these things. We try to get the narrowest, most predictive range, and then make plans A, B, and C according to that.

David Wright :

You know, we've really only got a couple minutes left here, but the impression I get is that there is probably, necessarily, an element of conservatism to this. And I think back to the beginning of our conversation today. An idea that came into my mind as you were talking was that in order to change, let's say, the actuarial profession, you kind of need to change the curriculum, which is a collective action problem, right? You have to get at least some kind of plurality or majority of actuaries on board with the change before the change can happen. And to me, that's a kind of risk management. It's like, well, there are new things going on, but are they real? I mean, you're talking about needing a couple of years to tell whether any technique is working, a particular implementation of it, so how do you know whether the technique itself is so universally applicable that you need to teach it to all actuaries? So there's this thing that slows actuaries down with respect to change. But it also seems to me that there's a benefit there, that everybody's on board, and that helps everybody change at the same time. So there's this trade-off I'm imagining, and I'm interested to get your guys' reaction to it: consensus in the industry can help, actually; there's transparency, and the regulators are more comfortable because everybody's doing the same thing. But in order to move that consensus, you need a fair bit of evidence. So do you agree with the premise that consensus helps, and with my kind of diagnosis of maybe why actuaries take a little longer to come to the data science world?

Matt Bonakdarpour :

I totally agree with that, and with the "data helps" part in particular. You know, Alex was saying we have discussions and we have ranges. But there are some predictions we're making where we are confident, because of the backtesting, that this prediction is the best prediction we have. And it's really in the tails where we're highly uncertain, and like you said, it's very important to quantify that uncertainty. That's where we need the actuarial judgment; that's where it becomes just paramount. It's so important. And I agree: to get these data science techniques more prevalent, it's going to be a data-driven process. But in those tails of uncertainty, I think it'll take quite some time to get people comfortable. So I think there will be a middle ground, where things that are easier to predict based on our backtesting start getting traction with the more sophisticated techniques, but the long tails with huge error bars will still require actuarial judgment. Until, you know, we're able to explicitly write down what our utility function is, how risk-averse or risk-seeking we are, and then make decisions based on that.

David Wright :

Thank you, guys; we're over time. So I wonder if you can close with maybe some plug for Root. I assume you're hiring. What would you like to say in closing? Where can people go to learn more? Take it away.

Alex Carges :

Yeah, I think this discussion really does highlight how we are bridging data science and actuarial techniques together and really moving the science of ratemaking, reserving, insurance economics, and game strategy forward, with the speed of technology in the world today. And to the earlier question: I don't think the world is going to wait for the CAS to change its curriculum. I think the CAS is doing a decent job of moving that forward anyway, but the world will not wait. If it's valuable, companies are going to find people who can do these things, and we're definitely one of those companies seeking out talent across the board in data science and actuarial.

Matt Bonakdarpour :

Yeah, we have open positions in data science, actuarial, and engineering for anyone who's passionate about the rigorous development and effective deployment of modern techniques in the insurance industry. It's a joy to work on these problems. It's a fun challenge.

Alex Carges :

It's fun stuff.

David Wright :

Okay, joinroot.com, I think that's where they go.

Matt Bonakdarpour :

Yep, the careers page on joinroot.com.

David Wright :

Great. My guests today have been Matt, Alex, and Isaac. Guys, thank you so much. Great conversation. Appreciate it.