Open access
Published: 08 February 2024

Women and insurance pricing policies: a gender-based analysis with GAMLSS on two actuarial datasets

Giuseppe Pernagallo 1, Antonio Punzo 2 & Benedetto Torrisi 2

Scientific Reports volume 14, Article number: 3239 (2024)


  • Engineering
  • Mathematics and computing

In most of the United States, insurance companies may use gender to determine car insurance rates. In addition, several studies have shown that women over the age of 25 generally pay more than men for car insurance. We therefore investigate whether the distributions of claims for women and men differ in location, scale and shape by means of the GAMLSS regression framework, using microdata provided by U.S. and Australian insurance companies, so that this evidence can support policy makers’ decisions. We also develop a parametric-bootstrap test to investigate the tail behavior of the distributions. When covariates are not considered, the distribution of claims does not appear to differ by gender. When covariates are included, the regressions provide mixed evidence for the location parameter; however, the spread of the distribution is lower for female claimants. Our research suggests that, at least in the contexts analyzed, there is no clear statistical reason for charging higher rates to women. Although this paper provides evidence supporting unisex insurance pricing policies, its reliance on country-specific data limits the generality of the results; it therefore aims to promote further research on this topic with different datasets to corroborate our findings and draw more general conclusions.


The research question of this paper stems from a popular belief, common in many countries. There are numerous quips about female drivers, who are often depicted as less skilled than men. In Italy, for example, men often say “donne al volante, pericolo costante”, which can be (approximately) translated as “women driving, peril thriving”. Although the issue may seem frivolous, it is of great importance to insurers, risk analysts and policy makers. If women were indeed worse customers for insurers, gender would be an important variable for modelling insurance-related data. This study uses claims data to determine whether insurers are statistically justified in treating women and men differently.

We have two main research objectives. Firstly, we look for the best model for the loss distribution (a much-debated issue in the literature) and investigate whether gender makes a difference in aspects of the distribution such as location or scale. Secondly, we evaluate whether gender affects the magnitude of losses, controlling for other available covariates. In particular, we emphasise the largest claims (the right tail of the loss distribution), which are of particular importance to insurance companies.

In summary, our contribution is mainly empirical but also partly methodological. Empirically, we aim to provide evidence to answer the important policy question of whether gender is a relevant variable for insurers. These results are limited by the available data, but have important economic value for both insurers and policy makers (see “Potential limitations” for a discussion). Methodologically, we propose the use of many statistical models neglected in similar works and introduce a bootstrap test for differences between groups in the tails of their distributions.

There are several studies related to this issue. Sivak and Schoettle 1 study how the two genders are represented in six different crash scenarios. Although the results may be influenced by several factors, the authors find that, in some scenarios, male-to-male crashes tend to be under-represented, whereas female-to-female crashes tend to be over-represented.

A study of particular interest for insurers was carried out by Massie et al. 2 on passenger-vehicle travel data. The authors find elevated crash involvement rates per vehicle-mile of travel for young drivers (aged 16–19) and older drivers (75 and over). Men are more likely to be involved in a fatal crash, whereas women are more frequently involved in injury crashes and in all police-reported crashes. Santamariña-Rubio et al. 3 provide contrasting evidence: first, they find an interaction effect between gender and age in road traffic injury risk; second, men show excess risk compared with women in some age groups, while the opposite holds in others, depending on the severity of the injury and the mode of transport.

Several studies have shown that, in general, women drive more cautiously than men 2, 4, 5, 6, 7. Moreover, as documented in Regev et al. 8, p. 131, “driver’s age and gender have also been shown to affect the severity of crash outcomes (i.e. the risk of fatal injury given a crash)”, with male and elderly drivers more likely than female and young drivers to sustain fatal injuries in a crash 9, 10.

The theme of this paper is purely economic: if gender affects the likelihood of being involved in a crash or the severity of a car accident (and therefore the economic losses of a company), then insurance companies may charge different rates. The debate is still open. For example, a recent HuffPost article (Car Insurance Companies Charge Women Higher Rates Than Men Because They Can, by Elaine S. Povich, 2019, HuffPost) reported that several studies in 2017 and 2018 showed that women over 25 generally pay more than men for auto insurance. As reported in the article, in many cases (and for the same policy) women paid $500 more than men for no reason other than their gender. The European Union, as reported by The Guardian (How an EU gender equality ruling widened inequality, by Patrick Collinson, 2019, The Guardian), introduced rules to prevent gender discrimination by car insurance companies, a practice at odds with the principle of unisex pricing. One may argue that the use of gender is fully regulated by legislators, but this is not true in many relevant geographical contexts. As reported by Business Insider (Car insurance rates are going up for women across the US—here’s where they pay more than men, by Shayanne Gal and Tanza Loudenback, 2019, Business Insider), insurance companies in 44 US states can use gender to determine a driver’s car insurance rate, whereas only California, Hawaii, Massachusetts, Montana, North Carolina, and Pennsylvania have banned the practice. Therefore, the present study is of particular interest to legislators in many states.

Risk classification is necessary in the insurance industry. Hence, some form of differentiation is needed to operate optimally in the market, but such decisions require a “fair justification” 11. For analysts, this means that gender-based price discrimination should be statistically motivated. In the literature, loss or claims data have generally been analysed without differentiating by gender, which is surprising given the huge number of studies in the field. These studies (see “Literature review”) consider many aspects of the data, from distributional properties to predictive models. In this paper we check whether similar results hold when the data are separated by the claimant’s gender, using two important datasets provided in R packages.

We believe that our study is of interest for five main reasons. First, to our knowledge this is the first study that fits distributions to claims data separately by gender; studies in this field are generally concerned only with finding the best model for the whole distribution. Second, our empirical analysis shows the good performance of many statistical models neglected in the field. Third, we introduce a new parametric test to check whether the value at risk (VaR) computed for female data differs from that computed for male data. The test has been conceived for our case study but can also be used in other contexts. Fourth, we show the power of GAMLSS modelling when dealing with asymmetric and/or non-mesokurtic data, or when a researcher aims to modify existing distributions, for example via truncation or by adjusting for zeros; this approach can yield considerable benefits in modelling economic or financial data. Last but not least, we provide guidance for policy makers, encouraging fair pricing.

The paper is structured as follows. “Literature review” reviews the existing literature. “Data” describes the data used in the empirical analysis. “Methodology” illustrates the adopted statistical methodology. “Distribution fitting results” describes the results of the regression analysis when the available covariates are not included (hereafter often referred to as “distribution modelling/fitting”), and also tests for differences between the two distributions. “Regression results” shows the regression results when the available covariates are included in the analysis: we check whether gender is related to insurers’ claims, considering both the whole distribution and the tail of the data. In “Potential limitations”, we discuss a series of shortcomings that could undermine the validity of our results. “Conclusions and policy implications” concludes the paper. Appendices A, B, and C are available as online supplementary material.

Literature review

Distribution modelling

Regarding the first research question of this paper, we need to understand whether the claims of females and males differ in distribution. Many works have shown that the distribution of insurance losses is generally heavy-tailed 12, 13, unimodal hump-shaped or multimodal 14, 15, 16, and skewed 13, 17, 18. Moreover, it is important to account for the positive support of the distribution 16, 19, 20, 21, 22.

Among the many parametric models proposed in the literature for the loss distribution 19, Eling 18 assesses the performance of the following classical distributions: Normal, Student’s t, hyperbolic, generalized hyperbolic, normal inverse Gaussian, variance gamma, gamma, Weibull, Cauchy, skew-normal, skew-t, logistic, log-normal, exponential, Pareto, chi-square and geometric. As pointed out by Eling 18, the Pareto distribution is a relevant statistical model in catastrophe insurance, especially for describing large losses, and many authors have used it as a starting framework for modelling losses and lifetime data, or in any context characterised by heavy-tailed distributions 23, 24, 25. The more flexible generalized Pareto family, although promising for fitting insurance data, has not found the same favour among researchers, probably because estimation methods such as maximum likelihood and the method of moments are undefined in some regions of the parameter space, which makes fitting difficult 26.

Recently, some authors have focused on more sophisticated, but also more flexible, composite 14, 24, compound 16, 22 and mixture 20, 27, 28 models. All these approaches share the principle of combining the characteristics of two or more distributions, thereby modelling aspects of the data that a single distribution cannot capture.

We contribute to this already large body of work in several ways. Firstly, we fit well-known, but also less used, parametric models to important car insurance datasets. Secondly, we avoid the boundary bias issue 29, 30, which in our case means allocating probability mass to negative values, by considering distributions with positive support or by applying suitable transformations to distributions defined on the whole real line; we accomplish the latter by truncation or by a log-transformation. Thirdly, while the aforementioned works consider the whole set of claims, we fit the competing models separately by gender to see whether relevant differences exist. Finally, we test whether gender makes a difference in all (or some) of the parameters of the model used to describe the distribution of claims, and we introduce a bootstrap-based parametric test to see whether there are significant differences between the value at risk (VaR) predicted by the various fitted models for females and males.

Regression modelling

With the second research question we want to assess whether gender has an effect on the magnitude of the claims, controlling for other available covariates. However, traditional regression techniques are problematic when dealing with actuarial data. Rousseeuw et al. 31 point out that in many applications (such as insurance data), outliers have relevant effects on the estimates. Traditional ordinary least squares (OLS) regression does not satisfy the requirement of robustness, because it is sensitive to outliers. Indeed, OLS implicitly assumes an underlying Gaussian distribution for the errors 32, whereas insurance data, as discussed in “Distribution modelling”, depart severely from a Gaussian distribution. For these reasons, traditional OLS is not suitable for our purpose. Among the many alternatives that address these problems, quantile regression has gained favour with many analysts because quantiles, such as the median, are less sensitive to outliers; moreover, quantile regression models are distribution-free 33. However, Rigby et al. 34 note that “quantile (and expectile) regressions are less reliable in the extreme tails of the distribution because of sparsity of data points”. For this reason, the authors consider an alternative procedure for modelling the tail of a distribution from a regression perspective, which is used in the present work (see “Regression results”). From the point of view of an insurer, knowing the behaviour of the data in the tail of the distribution is fundamental to preventing and adequately assessing the largest losses. We therefore also explore the relationship between extreme losses and gender.

Data

We worked with two important insurance datasets, chosen because they provide enough covariates and a variable for gender. It is important to note that, while these are large and reliable datasets, they are country-specific, and therefore our results are difficult to generalize. An in-depth discussion of this issue is provided in “Potential limitations”.

The automobile bodily injury claims ( AutoBi ) dataset

The first dataset, called “Automobile Bodily Injury Claims” (AutoBi), is freely available in the R package insuranceData. It derives from a 2002 survey conducted by the Insurance Research Council (IRC), a division of the American Institute for Chartered Property Casualty Underwriters and the Insurance Institute of America. The survey asked participating companies to report claims closed with payment during a designated 2-week period. The sample available in the package consists of 1340 bodily injury liability claims.

The variable of interest is the claimant’s total economic loss (abbreviated as Loss), in thousands of dollars, for claims from a single state. Using the variable Clmsex, i.e. the claimant’s gender, we split the losses into those for men and those for women. This split causes the loss of some observations, since the claimant’s sex is not recorded for all reported losses. The variable Loss is also the dependent variable in the regression model; for the description of that model and the included covariates, see “AutoBi”. This dataset is also used, among others, by Frees 35 in his book.

The automobile claim datasets in Australia ( ausprivauto0405 )

The second dataset is freely available in the R package CASdatasets and is named “Automobile claim datasets in Australia”. Specifically, we use the dataset ausprivauto0405, which contains 67,856 observations representing 1-year vehicle insurance policies taken out in 2004 or 2005 in Australia. Of these policies, 4624 have at least one claim; the remaining observations have a claim amount of zero. All losses are expressed in Australian dollars, but for scaling purposes we rescaled the data to hundreds of dollars. In this case there are no missing observations. The rescaled variable ClaimAmount is also the dependent variable for the regression model; all the information regarding the model is provided in “ausprivauto0405”. This dataset is also used, among others, by De Jong and Heller 36 in their book. Given the many zeros, all the models considered for this dataset are zero adjusted, i.e. they include a probability mass at 0 37. In this way we obtain two different views of the phenomenon: the first dataset focuses only on losses, whereas the second also considers policyholders without reported losses, thereby accounting for the possibility that car accidents may be more frequent depending on the driver’s gender.


Methodology

As already detailed in “Literature review”, we evaluate the variable of interest, namely the Loss variable (denoted by Y), from the point of view of its distribution (“Distribution modelling”) and as a function of some covariates of interest \({\varvec{Z}}\) (“Regression modelling”), giving particular attention to the Gender variable. For the sake of uniformity, we handle both research objectives under a model-based paradigm that uses the very flexible family of generalized additive models for location, scale and shape (GAMLSS). GAMLSS were proposed by Rigby and Stasinopoulos 38 to overcome some of the limitations of generalized linear models (GLMs), such as the assumption that the response variable follows an exponential family distribution, and of generalized additive models (GAMs). In the GAMLSS methodology, the systematic part of the model is expanded to allow equations not only for the mean, but also for the other parameters (scale and shape) of the distribution of the response variable.

The GAMLSS regression framework

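A GAMLSS model can be expressed as

\[
\begin{cases}
Y\mid \varvec{Z} \sim {\mathcal {D}}(\mu ,\sigma ,\nu ,\tau ) \\
g_1(\mu ) = \eta _1 = \varvec{Z}_1^{\top }\varvec{\beta }_1 + \sum _{j=1}^{J} s_{1j}(\varvec{z}_{1j}) \\
g_2(\sigma ) = \eta _2 = \varvec{Z}_2^{\top }\varvec{\beta }_2 + \sum _{j=1}^{J} s_{2j}(\varvec{z}_{2j}) \\
g_3(\nu ) = \eta _3 = \varvec{Z}_3^{\top }\varvec{\beta }_3 + \sum _{j=1}^{J} s_{3j}(\varvec{z}_{3j}) \\
g_4(\tau ) = \eta _4 = \varvec{Z}_4^{\top }\varvec{\beta }_4 + \sum _{j=1}^{J} s_{4j}(\varvec{z}_{4j})
\end{cases}
\tag{1}
\]
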
where \({\mathcal {D}}(\mu ,\sigma ,\nu ,\tau )\) is a four-parameter distribution (although it can have fewer or more parameters), with \(\mu\) and \(\sigma\) usually characterizing location and scale, respectively, and with \(\nu\) and \(\tau\) known as shape parameters (usually related to skewness and kurtosis). We denote by \(i=1,\ldots ,4\) the generic i th equation in the system; \(\eta _i\) is the predictor of the i th parameter, \(g_i(\cdot )\) is a monotonic link function relating the parameter to its predictor (in the empirical part of the paper we use the default link functions of the commands gamlss, gamlssML, and gamlssZadj), \(\varvec{Z}_i\) is a vector of covariates, \(\varvec{\beta }_i\) is the coefficient vector, and \(s_{ij}(\cdot )\) is a nonparametric smoothing function applied to the covariate \(\varvec{z}_{ij}\), \(j=1,\ldots ,J\), with J denoting the number of covariates. The smoothing terms \(s_{ij}(\cdot )\) introduce nonlinearities into the model; they are unspecified functions estimated with a scatterplot smoother in an iterative procedure called the local scoring algorithm 39.

The form of \({\mathcal {D}}(\mu ,\sigma ,\nu ,\tau )\) is general and only requires the distribution to be in parametric form; it can be any distribution (including highly skewed and kurtotic continuous and discrete distributions), and it can model heterogeneity (e.g., cases where the scale or shape of the distribution of the response variable changes with the explanatory variables). All distributions defined on \((0,\infty )\) can be zero adjusted to \([0,\infty )\) by including a probability mass at zero using the gamlss.inf package 40. The resulting distribution can have up to five parameters: the four parameters of the original distribution defined on \((0,\infty )\) plus a parameter \(\xi _0=p_0=P(Y=0)\in \left( 0,1\right)\) representing the probability mass at 0. Computationally, the function gen.Zadj() creates a mixed (continuous-discrete) probability density function (pdf) given by

\[
f_Y(y\mid \mu ,\sigma ,\nu ,\tau ,\xi _0)=
\begin{cases}
\xi _0 & \text{if } y=0,\\
(1-\xi _0)\, f(y\mid \mu ,\sigma ,\nu ,\tau ) & \text{if } y>0,
\end{cases}
\]

where \(f(y|\mu ,\sigma ,\nu ,\tau )\) denotes the pdf on \((0,\infty )\) .
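To make the zero-adjusted construction concrete, the following Python sketch (NumPy and SciPy, not gamlss.inf) puts a point mass \(\xi _0\) at zero and gives the continuous part the density \((1-\xi _0) f(y)\) for \(y>0\). The helper names `za_pdf`, `za_cdf` and `za_rvs` are hypothetical, the log-normal base distribution and its parameters are assumptions chosen purely for illustration, and \(\xi _0 = 1 - 4624/67{,}856 \approx 0.932\) is simply the share of ausprivauto0405 policies without a claim, used as an illustrative input rather than a fitted value.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Hypothetical helpers: zero-adjusted (ZA) version of a frozen SciPy
# distribution `base` with support (0, inf); xi0 = P(Y = 0).
def za_pdf(y, xi0, base):
    y = np.asarray(y, dtype=float)
    # point mass xi0 at zero, rescaled continuous density for y > 0
    return np.where(y == 0, xi0, np.where(y > 0, (1.0 - xi0) * base.pdf(y), 0.0))

def za_cdf(y, xi0, base):
    y = np.asarray(y, dtype=float)
    # the cdf jumps by xi0 at zero, then follows the continuous part
    return np.where(y < 0, 0.0, xi0 + (1.0 - xi0) * base.cdf(y))

def za_rvs(n, xi0, base, rng):
    # a claim occurs with probability 1 - xi0; only then draw a positive amount
    positive = rng.uniform(size=n) >= xi0
    out = np.zeros(n)
    out[positive] = base.rvs(size=positive.sum(), random_state=rng)
    return out

# Illustrative inputs (assumptions): log-normal base, share of zero claims
base = stats.lognorm(s=1.2, scale=np.exp(2.0))
xi0 = 1 - 4624 / 67856
continuous_part, _ = quad(lambda t: (1 - xi0) * base.pdf(t), 0, np.inf)
print(round(xi0 + continuous_part, 6))   # total probability of the mixture
sample = za_rvs(10_000, xi0, base, np.random.default_rng(0))
print((sample == 0).mean())              # share of simulated zeros
```

The point mass plus the integral of the rescaled continuous part sums to one, which is what makes the mixture a proper distribution on \([0,\infty )\).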

How the research objectives of the paper are handled

Firstly, we look for the best model for the loss distribution (see “Distribution modelling” for related literature) and investigate whether Gender makes a difference in aspects of the distribution such as location or scale. We handle this first objective by regressing all the parameters \(\mu\), \(\sigma\), \(\nu\) and \(\tau\) of \({\mathcal {D}}(\mu ,\sigma ,\nu ,\tau )\) on Gender only, i.e. on a single covariate (\(Z_1=Z_2=Z_3=Z_4=Z\)) in (1). Thus, if the loss distribution differs by gender, which we can identify from the significance of the coefficients \(\beta _1\), \(\beta _2\), \(\beta _3\) and \(\beta _4\) in (1), we can also detect which aspect(s) of the distribution (location, scale and/or shape) are affected.
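With a single binary covariate and no smoothing terms, each equation in (1) reduces to an intercept plus one gender coefficient. The Python sketch below illustrates this for a two-parameter log-normal, assuming an identity link for \(\mu\) and a log link for \(\sigma\), fitted by maximum likelihood with SciPy on simulated data. The function names, the 0/1 coding of gender and the finite-difference standard errors are our assumptions; the sketch does not reproduce the fitting algorithms of the gamlss package. A small Wald p value for a gender coefficient flags the parameter (location or scale) that gender affects.

```python
import numpy as np
from scipy import optimize, stats

def negloglik(beta, y, g):
    """Negative log-likelihood of a log-normal whose mu and log(sigma) are
    both linear in a 0/1 gender indicator g (hypothetical coding)."""
    b10, b11, b20, b21 = beta
    mu = b10 + b11 * g               # identity link for mu
    sigma = np.exp(b20 + b21 * g)    # log link keeps sigma positive
    # log-normal log-density = normal log-density of log(y) minus log(y)
    return -np.sum(stats.norm.logpdf(np.log(y), loc=mu, scale=sigma) - np.log(y))

def numerical_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at x."""
    k = len(x)
    H = np.empty((k, k))
    E = np.eye(k) * h
    for i in range(k):
        for j in range(k):
            H[i, j] = (f(x + E[i] + E[j]) - f(x + E[i] - E[j])
                       - f(x - E[i] + E[j]) + f(x - E[i] - E[j])) / (4 * h * h)
    return H

def fit_gender_model(y, g):
    """ML estimates, standard errors and Wald p-values for (b10, b11, b20, b21)."""
    ly = np.log(y)
    start = np.array([ly.mean(), 0.0, np.log(ly.std()), 0.0])
    res = optimize.minimize(negloglik, start, args=(y, g), method="BFGS")
    H = numerical_hessian(lambda b: negloglik(b, y, g), res.x)
    se = np.sqrt(np.diag(np.linalg.inv(H)))     # observed-information SEs
    p = 2 * stats.norm.sf(np.abs(res.x / se))   # two-sided Wald p-values
    return res.x, se, p

# Simulated losses (assumption): same location, smaller spread when g = 1
rng = np.random.default_rng(1)
g = rng.integers(0, 2, size=2000)
y = np.exp(rng.normal(1.0, np.exp(0.2 - 0.3 * g)))
beta, se, p = fit_gender_model(y, g)
print("estimates:", np.round(beta, 3))
print("Wald p-values:", np.round(p, 4))
```

With only one binary covariate the ML estimates coincide with the group-wise log-scale means and standard deviations, which is a convenient check on the optimiser.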

We try several models for the loss distribution, not only to have a large set of candidates within which to look for the best one, but also to make the evaluation of gender differences more robust to model misspecification. Thanks to the package gamlss and its extensions 41, 42, we consider both classical distributions already defined on \((0,\infty )\) and new distributions on \((0,\infty )\). These new distributions are created from those with support \((-\infty ,\infty )\), either by the inverse log (i.e. exponential) transformation, using the function gen.Family() with argument type="log", or by truncation, using the function gen.trun() 42. In detail, we consider the following 30 parametric models: Box-Cox Cole and Green, Box-Cox Power Exponential, Box-Cox t, Burr, Dagum (Burr III), Exponential, Gamma, Generalized Beta type 2, Generalized Gamma, Generalized Inverse Gaussian, Generalized Pareto, Inverse Gamma, Inverse Gaussian, Log-Gumbel, Log-Johnson’s SU, Log-Logistic, Log-Normal, Log-Power Exponential, Log-Skew Normal Type 2, Log-Skew t Type 5 43, Log-t Family, Pareto Type 2, Truncated Exponential Gaussian, Truncated Johnson’s SU, Truncated Logistic, Truncated Normal, Truncated Power Exponential, Truncated Skew t Type 5 43, Truncated t Family, Weibull.
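The two constructions can be illustrated outside R. In the following Python sketch, SciPy distributions stand in for the gamlss families and the function names are hypothetical: a log-family density follows from the change of variables \(Y=\exp (X)\), giving \(f_Y(y)=f_X(\log y)/y\), and a density truncated below at zero is the original density renormalised by \(P(X>0)\). We do not claim that this mirrors the internals of gen.Family() or gen.trun(); the Student's t parameters are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def log_family_pdf(y, base):
    """Density of Y = exp(X) for X ~ base on the real line (change of variables)."""
    y = np.asarray(y, dtype=float)
    return base.pdf(np.log(y)) / y

def left_truncated_pdf(y, base, lower=0.0):
    """Density of X given X > lower, for X ~ base on the real line."""
    y = np.asarray(y, dtype=float)
    return np.where(y > lower, base.pdf(y) / base.sf(lower), 0.0)

# Usage: a "log-t" and a "truncated t" on (0, inf), built from the same
# Student's t with illustrative (assumed) parameters
t_base = stats.t(df=4, loc=1.0, scale=2.0)
grid = np.array([0.5, 1.0, 5.0, 20.0])
print(log_family_pdf(grid, t_base))
print(left_truncated_pdf(grid, t_base))
```

Both constructions remove all probability mass from negative values, which is the boundary bias issue the models are designed to avoid.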

The distributions were fitted by maximum likelihood (ML). Note that, for the ausprivauto0405 dataset, we did not fit all the distributions because of computational problems related to the zero-adjusted routine 44; given the large number of distributions considered, excluding these models should not be a great loss. Once the regression models are fitted, we rank them by the Akaike information criterion (AIC 45) and the Bayesian information criterion (BIC 46), the most popular criteria in the actuarial literature 16, 18, 27, 28.
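The ranking step uses \(\text{AIC}=-2\ell +2k\) and \(\text{BIC}=-2\ell +k\log n\), where \(\ell\) is the maximised log-likelihood, k the number of estimated parameters and n the sample size. The following Python sketch applies them to four SciPy distributions fitted by ML with the location fixed at zero, on simulated log-normal losses. The candidate set is a small illustrative subset of the 30 models, SciPy's parameterisations differ from the gamlss families, and the names are hypothetical.

```python
import numpy as np
from scipy import stats

# Small illustrative subset of candidates (SciPy parameterisations,
# not the gamlss families used in the paper)
CANDIDATES = {
    "Log-Normal": stats.lognorm,
    "Gamma": stats.gamma,
    "Weibull": stats.weibull_min,
    "Inverse Gaussian": stats.invgauss,
}

def rank_by_ic(y):
    """Fit each candidate by ML (location fixed at 0) and rank by AIC."""
    y = np.asarray(y, dtype=float)
    n = y.size
    rows = []
    for name, dist in CANDIDATES.items():
        params = dist.fit(y, floc=0)            # (shape, loc=0, scale)
        loglik = np.sum(dist.logpdf(y, *params))
        k = len(params) - 1                     # loc was fixed, not estimated
        aic = -2 * loglik + 2 * k
        bic = -2 * loglik + k * np.log(n)
        rows.append((name, k, aic, bic))
    return sorted(rows, key=lambda row: row[2])  # smallest AIC first

rng = np.random.default_rng(2)
y = rng.lognormal(mean=1.0, sigma=1.0, size=2000)
for name, k, aic, bic in rank_by_ic(y):
    print(f"{name:16s} k={k}  AIC={aic:9.1f}  BIC={bic:9.1f}")
```

When all candidates have the same number of parameters, as here, AIC and BIC produce the same ranking; they can disagree only when models of different complexity compete.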

Secondly, to assess the impact of Gender on Loss while controlling for other covariates, we again use the GAMLSS regression framework to model both the whole distribution and its tail. The research question in this case is whether female claimants generate higher losses for insurers, so that the application of higher rates could be supported by a “fair justification” 11. The use of heavy-tailed distributions addresses the problem of extreme values in actuarial datasets. Nonetheless, knowing how gender affects the mean or another parameter of the loss distribution is less interesting for insurers than knowing its impact on the tail of the distribution, where the highest losses are placed. To study this portion of the data without resorting to nonparametric methods such as quantile regression, which is less reliable in the tails 34, or to more complex approaches such as entropic/symbolic methods 47, we use the procedure of 34, 48, described in “Regression results”.

Comparing the tail behaviour

Comparing the tails of the female and male distributions provides important information for insurers because of the link with VaR. In detail, we define a parametric (model-based) bootstrap test that can be schematized as follows.

1. Compute the sample values at risk, \(\text {VaR}^F_\alpha\) and \(\text {VaR}^M_\alpha\), separately for females and males but at the same probability level \(\alpha\), and compute the test statistic \(\text {AD}_{\text {obs}}=\left| \text {VaR}^F_\alpha -\text {VaR}^M_\alpha \right|\).

2. Fit the GAMLSS model of interest, \({\mathcal {D}}(\mu ,\sigma ,\nu ,\tau )\) or \({\mathcal {D}}(\mu ,\sigma ,\nu ,\tau ,{\xi }_0)\) depending on the available data, to the whole data of size \(n=n_{\text {F}}+n_{\text {M}}\), where \(n_{\text {F}}\) and \(n_{\text {M}}\) are the sample sizes for females and males, respectively.

3. For \(r=1,\ldots ,B\):

   (a) generate two samples of sizes \(n_{\text {F}}\) and \(n_{\text {M}}\) from the model fitted at step 2;

   (b) compute the AD statistic, say \(\text {AD}_r\), on the generated samples.

Under \(H_0\) (the VaRs for males and females do not differ), \(\text {AD}_1,\ldots ,\text {AD}_B\) are equally likely, and the p value of the testing procedure can be computed as

\[
p\text{ value}=1-F_{\text {Boot}}\left( \text {AD}_{\text {obs}}\right) ,
\]

where \(F_{\text {Boot}}\left( \cdot \right)\) is the (stepwise) cumulative distribution function of \(\text {AD}_1,\ldots ,\text {AD}_B\) 49.

In the real-data analyses, whose results are described in “Distribution fitting results” and “Regression results”, we use a sufficiently large number of bootstrap replicates (\(B=1000\)); moreover, as is usual in insurance practice and literature, we consider the probability levels 0.95 and 0.99.
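The steps of the bootstrap test can be sketched in Python as follows. As assumptions: a log-normal fitted to the pooled sample stands in for the GAMLSS model of step 2, the sample VaR at level \(\alpha\) is taken as the empirical \(\alpha\)-quantile (NumPy's default interpolation), and the p value is computed as the share of bootstrap statistics exceeding \(\text {AD}_{\text {obs}}\), which equals one minus the bootstrap cdf evaluated at \(\text {AD}_{\text {obs}}\). Function names and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats

def empirical_var(y, alpha):
    """Sample VaR at level alpha: the empirical alpha-quantile of the losses."""
    return np.quantile(y, alpha)

def var_bootstrap_test(y_f, y_m, alpha, B=1000, seed=None):
    """Parametric bootstrap test of equal VaRs (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Step 1: observed statistic AD_obs = |VaR_F - VaR_M|
    ad_obs = abs(empirical_var(y_f, alpha) - empirical_var(y_m, alpha))
    # Step 2: one model for the pooled data, i.e. no gender effect under H0
    # (a log-normal stands in for the GAMLSS model of interest)
    pooled = np.concatenate([y_f, y_m])
    model = stats.lognorm(*stats.lognorm.fit(pooled, floc=0))
    # Step 3: simulate both groups from the pooled fit and recompute AD
    ad_boot = np.empty(B)
    for r in range(B):
        sim_f = model.rvs(size=len(y_f), random_state=rng)
        sim_m = model.rvs(size=len(y_m), random_state=rng)
        ad_boot[r] = abs(empirical_var(sim_f, alpha) - empirical_var(sim_m, alpha))
    # p value = 1 - F_Boot(AD_obs), F_Boot being the empirical cdf of AD_1..AD_B
    p_value = np.mean(ad_boot > ad_obs)
    return ad_obs, p_value

# Usage on simulated losses in which the male tail is heavier (assumption)
rng = np.random.default_rng(3)
y_f = rng.lognormal(mean=1.0, sigma=0.8, size=600)
y_m = rng.lognormal(mean=1.0, sigma=1.6, size=700)
for alpha in (0.95, 0.99):
    print(alpha, var_bootstrap_test(y_f, y_m, alpha, B=1000, seed=4))
```

Simulating both groups from a single pooled fit is what encodes the null hypothesis: any difference between the simulated VaRs is due only to sampling variability at the observed group sizes.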

Distribution fitting results

AutoBi data

We start with the AutoBi data described in “The automobile bodily injury claims (AutoBi) dataset”. Supplementary figures C.1–C.3 in Appendix C (online) show histograms and normal Q–Q plots for the total losses (Supplementary figure C.1), for the losses reported by female claimants (Supplementary figure C.2), and for the losses reported by male claimants (Supplementary figure C.3). On the histograms we also superimpose a kernel density estimate (red line) to indicate the shape of the density of the observed data. For readability, the horizontal axis of the histograms in Supplementary figures C.1–C.2 is restricted to 250.

From the Q–Q plots we see that the distribution of losses for both females and males cannot be approximated by a Gaussian distribution (which is unsurprising); furthermore, as expected, the underlying distributions appear to be right-skewed and heavy-tailed. All the histograms confirm another recurrent feature of insurance loss data: a large number of small losses and a smaller number of high losses 16, 18. However, the maximum loss is registered for a female claimant (1067.70), whereas the maximum for male claimants is much smaller (222.41). In all three cases the kernel density estimate suggests a similar distribution, highly right-skewed and highly peaked. More detailed information on the differences between the groups can be obtained from the descriptive statistics in Table 1. The mean loss is higher for females than for males; however, the median (less sensitive to extreme values) shows no remarkable differences. Nonetheless, the variability (and hence the risk) is much higher for females, as shown by the range and the standard deviation. The female data are also more skewed and exhibit more pronounced leptokurtosis. The VaR shows that an insurer should expect (at the 99% level) higher losses for male policyholders.
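Summaries of the kind discussed for Table 1 can be computed as in the following Python sketch. The function name and the choice of reporting excess kurtosis (zero for a Gaussian) are our assumptions, and Table 1 may use different definitions. The toy input shows how a single extreme value pulls the mean far above the median, the same mechanism behind the gap between mean and median for female claimants, whose maximum loss is 1067.70.

```python
import numpy as np
from scipy import stats

def describe_losses(y):
    """Descriptive statistics of the kind reported in Table 1 (our selection)."""
    y = np.asarray(y, dtype=float)
    return {
        "n": y.size,
        "mean": y.mean(),
        "median": np.median(y),
        "sd": y.std(ddof=1),                      # sample standard deviation
        "range": y.max() - y.min(),
        "skewness": stats.skew(y),                # > 0 for right skew
        "excess_kurtosis": stats.kurtosis(y),     # 0 for a Gaussian
        "VaR_0.95": np.quantile(y, 0.95),         # empirical quantiles
        "VaR_0.99": np.quantile(y, 0.99),
    }

# Toy losses: one extreme value moves the mean but not the median
for key, value in describe_losses([1, 2, 3, 4, 100]).items():
    print(f"{key:16s} {value:.3f}")
```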

Supplementary Tables A.1–A.3 in Appendix A (online) show the results of the distribution fitting, which can be summarized as follows. First, the best models include the Box-Cox t (selected by both AIC and BIC as the best model for the total losses and the females’ losses), the Truncated t and the Truncated Skew t. Similar results are obtained for female and male claimants, with good performance of the Log-Johnson’s SU model, while the Generalized Pareto and the Log-Power Exponential are also competitive. Second, we do not observe drastic differences in the selection of models for females and males. Finally, distributions often neglected in applied works, such as the Generalized Pareto or the Log-Johnson’s SU, represent good alternatives to traditional models, whereas the variants of the normal distribution perform poorly for these data.

To check whether gender may explain differences in the loss distribution, we ran a GAMLSS regression for each model, as described in the first part of “How the research objectives of the paper are handled”. The results are reported in Table 2. The gender coefficient was significant only for a few distribution parameters and for a small number of distributions. This is strong evidence against the hypothesis that the loss distribution is affected by gender, regardless of the parametric model considered.

Supplementary Tables A.4–A.6 in Appendix A (online) show the VaR at 95% and 99% (computed numerically) for the three groups of data for each of the selected models. We compared these results with the observed VaRs. In this case the ranking is very different because it uses a different criterion: the best distribution is the one that minimises the absolute distance from the empirical VaR. In summary, the results change considerably with the confidence level. Furthermore, the results for males in this case seem to differ from those for females. This is reasonable, since extreme values are located in the tail of the distribution. To test whether these tail differences are statistically significant, we performed the parametric bootstrap test illustrated in “Comparing the tail behaviour”; the results are reported in the left part of Table 3. For many models the differences are statistically significant; therefore, we conclude that for these models the tail behaviour differs by gender. This does not necessarily imply that female claimants are riskier than male claimants; it simply means that their VaRs are different.
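The VaR-based ranking criterion can be sketched in Python as follows: each fitted model's quantile at level \(\alpha\) is compared with the empirical quantile, and models are ordered by the absolute gap. The helper name, the input format (a dict of frozen SciPy distributions) and the three candidate models are assumptions for illustration, not the gamlss models of the paper.

```python
import numpy as np
from scipy import stats

def rank_by_var_distance(y, fitted, alpha):
    """Rank fitted models by |model VaR - empirical VaR| at level alpha.

    `fitted` maps a model name to a frozen scipy.stats distribution already
    fitted to y (hypothetical input format).
    """
    var_emp = np.quantile(y, alpha)
    gaps = {name: abs(model.ppf(alpha) - var_emp) for name, model in fitted.items()}
    ranking = sorted(gaps.items(), key=lambda item: item[1])  # closest first
    return var_emp, ranking

# Usage on simulated losses: the ranking may change with the level alpha
rng = np.random.default_rng(5)
losses = rng.lognormal(mean=1.0, sigma=1.0, size=3000)
fitted = {
    "Log-Normal": stats.lognorm(*stats.lognorm.fit(losses, floc=0)),
    "Gamma": stats.gamma(*stats.gamma.fit(losses, floc=0)),
    "Weibull": stats.weibull_min(*stats.weibull_min.fit(losses, floc=0)),
}
for alpha in (0.95, 0.99):
    var_emp, ranking = rank_by_var_distance(losses, fitted, alpha)
    print(alpha, round(float(var_emp), 2), [name for name, _ in ranking])
```

Because this criterion looks at a single quantile rather than at the whole likelihood, it can rank models quite differently from AIC or BIC, and the ranking at 95% need not match the ranking at 99%.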

ausprivauto0405 data

We now analyse the distribution fitting results for the ausprivauto0405 data. Supplementary figures C.4–C.6 in Appendix C (online) show histograms and normal Q–Q plots for the total losses (Supplementary figure C.4), for the losses reported by female claimants (Supplementary figure C.5) and for the losses reported by male claimants (Supplementary figure C.6). Recall that, for scaling purposes, the variable ClaimAmount is expressed in hundreds of dollars; furthermore, since we are considering only reported losses, the zeros are excluded for the moment. In this case there was no need to restrict the horizontal axis of the histograms. The histograms and normal Q–Q plots confirm the findings for the first dataset, which characterise most claims data: non-normality due to severe right skewness and heavy tails, with most observations concentrated in the first bins of the histograms. An analysis of the plots including the zeros would be redundant.

Table 4 shows the descriptive statistics for the ausprivauto0405 data (zeros excluded), whereas Table 5 shows the same statistics with the zeros included. Unlike in the other dataset, losses for males are higher than for females, in both mean and median, and more variable. The females’ loss distribution is slightly more peaked but less skewed, whereas the males’ distribution including the zeros shows higher kurtosis and skewness. The VaR at 99% shows that an insurer should expect higher losses from male policy holders.
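Statistics of the kind reported in Tables 4 and 5 can be computed with a sketch such as the one below. The estimators used by the authors are not stated in this section, so the choices here are assumptions: population standard deviation, moment-based skewness and (non-excess) kurtosis, and the empirical VaR taken as the linearly interpolated sample quantile (R's default type 7). The `(gender, amount)` record layout and the function names are hypothetical.

```python
import math
from statistics import fmean, median, pstdev


def empirical_var(values, level):
    """Empirical VaR as the linearly interpolated sample quantile (R type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * level
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def describe(values):
    """Moment-based summary of a loss sample (needs a non-constant sample)."""
    m = fmean(values)
    s = pstdev(values, m)  # population standard deviation
    m3 = fmean([(v - m) ** 3 for v in values])
    m4 = fmean([(v - m) ** 4 for v in values])
    return {
        "n": len(values),
        "mean": m,
        "median": median(values),
        "sd": s,
        "skewness": m3 / s ** 3,
        "kurtosis": m4 / s ** 4,  # not excess kurtosis: a normal gives about 3
        "VaR95": empirical_var(values, 0.95),
        "VaR99": empirical_var(values, 0.99),
    }


def describe_by_gender(records, drop_zeros=True):
    """records: iterable of (gender, claim_amount); optionally exclude zero claims."""
    groups = {}
    for gender, amount in records:
        if drop_zeros and amount == 0:
            continue
        groups.setdefault(gender, []).append(amount)
    return {g: describe(v) for g, v in groups.items()}
```

Calling `describe_by_gender` once with `drop_zeros=True` and once with `drop_zeros=False` gives the two versions of the table (zeros excluded and included).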

Supplementary Tables B.1 – B.3 in Appendix B (online) show the results of the distribution fitting. The ZA Generalized Gamma was selected as the best model by both AIC and BIC for the total claims as well as for the female and male claims. The ZA Log-Skew Normal, the ZA Log-Johnson’s SU and the ZA Generalized Inverse Gaussian were competitive models for all three groups of data. Table 6 shows that, for this dataset, gender seems to play a role in explaining differences in the location parameter and, for some distributions, also in the dispersion parameter. As for the AutoBi data, there is only weak evidence that gender explains the shape of the distribution.

Supplementary Tables B.4 – B.6 in Appendix B (online) show the estimated VaR values at 95% and 99% under the ZA parametric models. The ZA Truncated Power Exponential, ZA Generalized Pareto and ZA Log-Skew Normal describe the tail behaviour of these data well. As in the previous dataset, the rankings differ between the two confidence levels. In this case, however, the VaR bootstrap tests show no significant differences between the tails of the male and female claim distributions at the 95% level, whereas significant differences emerge at the 99% level (see Table 3).
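Because the fitted models are zero-adjusted, their VaR is the quantile of a mixture with a point mass at zero: if \(P(Y=0)=\xi_0\) and the positive part has distribution function \(F_W\), then \(F_Y(y)=\xi_0+(1-\xi_0)F_W(y)\) for \(y>0\). The Python sketch below computes that quantile and applies the ranking rule used in the text (closest theoretical VaR to the empirical VaR). The lognormal positive part is only a stand-in for the ZA families actually fitted, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def za_var(level, xi0, positive_quantile):
    """VaR of a zero-adjusted loss Y with P(Y = 0) = xi0.

    For y > 0, F_Y(y) = xi0 + (1 - xi0) * F_W(y), so the level-p quantile is 0 when
    p <= xi0 and F_W^{-1}((p - xi0) / (1 - xi0)) otherwise.
    """
    if level <= xi0:
        return 0.0
    return positive_quantile((level - xi0) / (1.0 - xi0))


def lognormal_quantile(mu, sigma):
    """Quantile function of a lognormal, used only as a stand-in positive part."""
    std = NormalDist()
    return lambda u: math.exp(mu + sigma * std.inv_cdf(u))


def rank_by_var_distance(empirical, model_vars):
    """Order candidate models by |theoretical VaR - empirical VaR|, closest first."""
    return sorted(model_vars, key=lambda name: abs(model_vars[name] - empirical))
```

For instance, with a hypothetical share of zeros \(\xi_0 = 0.93\), any confidence level at or below 0.93 gives a VaR of zero, and the 95% VaR is the 0.02/0.07 quantile of the positive part; with many zeros, only the upper portion of the positive distribution drives the 95% and 99% VaRs.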

Regression results

In this section we tackle the second research question of the paper, i.e. whether gender affects the claims distribution after controlling for the other available covariates. We fit regression models on the whole dataset and on the right tail of the data. The former approach quantifies the effect of gender on the conditional location, scale and shape of the losses, the latter its effect on the largest claims. This information is of considerable importance for insurance companies, because it affects their solvency and their policies. The GAMLSS framework allows us to exploit the results of the distribution fitting by using the best model as the underlying distribution.

The choice of the functions \(g_i(\cdot )\), \(i=1,\ldots ,4\), used to model the parameters of the considered models (see “The GAMLSS regression framework”) is limited to those available in the gamlss package. To model the tail of the data we followed a different approach 34, 48, whose steps are summarised below.

1. We fitted an \(\alpha\) (95% and 99%) smooth quantile curve for LOSS (or ClaimAmount) against the explanatory variables, using the R package cobs with automatic selection of the smoothing parameter.

2. We selected the cases above the \(\alpha\) quantile curve, so as to work only with the tail of the data.

3. We fitted a suitable GAMLSS truncated distribution to the tail data, with the fitted \(\alpha\) quantile as truncation parameter. Since fitting regressions with all the candidate distributions is computationally prohibitive, the distribution was chosen among the best models obtained in “Distribution fitting results”. For the whole dataset we used the best model for the total claims distribution, while for the tail of the data we used the model whose theoretical VaR was closest to the empirical VaR. For the ausprivauto0405 dataset we used GAMLSS zero-adjusted distributions.

4. We fitted regression models to assess the magnitude of the gender coefficient on the distribution of claims, using, for the tail of the data, the truncated distribution determined in step 3.
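A compact Python sketch of these four steps follows, under simplifying assumptions that depart from the paper's tools: the smooth cobs quantile curve is replaced by one empirical \(\alpha\) quantile per level of a single categorical covariate, and the GAMLSS truncated distribution is replaced by an exponential left-truncated at the fitted quantile, whose maximum-likelihood rate has a closed form. The function names and the `(covariate_level, female, loss)` row layout are hypothetical.

```python
import math
from collections import defaultdict


def quantile_type7(values, level):
    """Linearly interpolated sample quantile (R's default type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * level
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def tail_gender_effect(rows, alpha=0.95):
    """rows: (covariate_level, female, loss) with female in {0, 1}.

    Returns (b1, rates): b1 is the Female coefficient on the log-rate of a
    left-truncated exponential fitted to the cases above the alpha quantile.
    """
    # Step 1 (simplified): one empirical alpha quantile per covariate level,
    # in place of a smooth cobs quantile curve
    by_level = defaultdict(list)
    for level, _, loss in rows:
        by_level[level].append(loss)
    threshold = {lv: quantile_type7(v, alpha) for lv, v in by_level.items()}

    # Step 2: keep only the cases above their fitted quantile
    tail = [(f, loss - threshold[lv]) for lv, f, loss in rows if loss > threshold[lv]]

    # Step 3: exponential left-truncated at the fitted quantile; its log-likelihood
    # sum(log(rate) - rate * excess) is maximised by rate = n / sum(excess)
    excess, count = defaultdict(float), defaultdict(int)
    for f, e in tail:
        excess[f] += e
        count[f] += 1
    rates = {f: count[f] / excess[f] for f in count}

    # Step 4: with log(rate) = b0 + b1 * Female, the MLE is b1 = log(rate_F / rate_M);
    # b1 < 0 means a lower rate, i.e. larger expected excess losses, for females
    return math.log(rates[1] / rates[0]), rates
```

The exponential is used only because, being memoryless, its left-truncated version reduces to an exponential for the excess over the threshold; the heavier-tailed truncated families used in the paper require numerical optimisation (in R, for example, with the gamlss.tr package, which is our assumption since this section does not name the package).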

The AutoBi dataset contains the following explanatory variables:

Attorney : whether the claimant is represented by an attorney.

Clmsex : claimant’s gender.

Marital : claimant’s marital status (= 1 if married, = 2 if single, = 3 if widowed, and = 4 if divorced/separated).

Clminsur : whether or not the driver of the claimant’s vehicle was uninsured.

Seatbelt : whether or not the claimant was wearing a seatbelt.

Clmage : claimant’s age.

As before, the dependent variable of the regression model is Loss, the claimant’s total economic loss (in thousands of dollars). To estimate the regression models, we create dummy variables for Attorney (1 if yes), Clmsex (1 if female), each marital status, Clminsur (1 if yes) and Seatbelt (1 if yes). To avoid the dummy variable trap, we exclude from the regression the dummy for divorced/separated, which becomes the reference category. Because of missing observations, we use listwise deletion to eliminate the rows with missing information, so the final dataset has 1091 rows.
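The preprocessing described above (listwise deletion, then dummy coding with divorced/separated as the reference category) can be sketched in Python as follows. The record layout, the field names and the boolean encoding of the binary variables are hypothetical; the raw AutoBi data may code these variables differently.

```python
MARITAL = {1: "married", 2: "single", 3: "widowed", 4: "divorced_separated"}
BINARY = ("attorney", "female", "clminsur", "seatbelt")


def design_rows(records, reference=4):
    """Listwise deletion followed by dummy coding.

    records: dicts with boolean fields named in BINARY, 'marital' coded 1-4 as in the
    text, numeric 'age' and 'loss'; None marks a missing value.
    """
    rows = []
    for rec in records:
        if any(value is None for value in rec.values()):
            continue  # listwise deletion: drop the whole record
        row = {name: int(rec[name]) for name in BINARY}
        for code, label in MARITAL.items():
            if code != reference:  # the reference category gets no dummy
                row[label] = int(rec["marital"] == code)
        row["age"] = rec["age"]
        row["loss"] = rec["loss"]
        rows.append(row)
    return rows
```

A divorced/separated claimant is then identified by all three marital dummies being zero, so the intercept absorbs that category and the design matrix keeps full rank.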

Tables 7 and 8 show the results of the GAMLSS regressions. We could not fit the best model for the 99% quantile because there are too few cases above it to fit a suitable regression model. Figure 1 shows the wormplots for the AutoBi data. We also used other graphical diagnostic tools and estimated many other models, but we omit them from this paper for brevity. The interested reader can contact the corresponding author for further details.

Figure 1. Wormplots of models I–IV (Tables 7, 8) for the AutoBi data. Upper panels: model I (left) and model II (right). Lower panels: model III (left) and model IV (right).

AutoBi : regression model on total claims

In Table 7 we report the results of two regression models. In model I we model only the equation for the parameter \(\mu\), using all the data and all the explanatory variables. The best model, as suggested by the analysis in “AutoBi data”, is the Box-Cox t distribution. The coefficient of interest is that of Clmsex. Female claimants are associated with a positive and significant (at 5%) increase in the insurer’s losses (in thousands of dollars). The fit of the model is adequate, as shown by its wormplot in Fig. 1 (upper-left panel). However, better estimates can be obtained by also modelling the other parameters, i.e. the scale parameter \(\sigma\) and the skewness and kurtosis parameters \(\nu\) and \(\tau\). To this end, we estimated several models. These models do not exhaust all possible cases: since four equations can be modelled with several explanatory variables, the number of possible specifications is very large. Not only can we change the set of explanatory variables (all models with one variable, with two variables, with three variables, and so on), but each combination can also be tested in each of the four equations (one for the location, one for the dispersion parameter, and so on). We nevertheless tried to cover all the cases relevant to the research question of this paper, namely the models that retain the gender variable and that, among those including it for which the algorithm converged, performed best according to information criteria and graphical tools such as wormplots.
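To make the size of this search space concrete, the short Python calculation below counts specifications under simplifying assumptions of ours: the six AutoBi covariates are treated as six candidate terms (the marital dummies enter as one block), each of the four parameter equations may include any subset of them, and smoothers, link functions and alternative distributions are ignored.

```python
def n_specifications(n_vars, n_params=4):
    """Models obtained when each of n_params parameter equations may contain any
    subset of n_vars candidate regressors (intercepts always included)."""
    return (2 ** n_vars) ** n_params


def n_with_gender(n_vars, n_params=4):
    """Specifications in which the gender variable enters at least one equation:
    all specifications minus those built from the other n_vars - 1 regressors only."""
    return n_specifications(n_vars, n_params) - n_specifications(n_vars - 1, n_params)


if __name__ == "__main__":
    # Attorney, Clmsex, Marital (one block), Clminsur, Seatbelt, Clmage
    print(n_specifications(6), n_with_gender(6))
```

Under these assumptions there are 16,777,216 specifications, 15,728,640 of which include Clmsex in at least one equation, so an exhaustive search is out of reach and a guided search is needed.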

Among the many models we estimated, model II is the best in terms of computational feasibility (by which we mean that some models were not computationally feasible and/or required excessive computing time), AIC and BIC, and goodness of fit as shown by the wormplot. Its wormplot (Fig. 1, upper-right panel) shows a better fit, since all the points lie within the 95% confidence band delimited by the two elliptic curves. The coefficient of Clmsex keeps the same sign and approximately the same magnitude, whereas Clmsex does not significantly affect the other parameters of the distribution. Finally, the significant coefficients of the other explanatory variables are economically reasonable. For example, in the \(\mu\) equation, the insurance company tends to pay larger amounts if the claimant is represented by an attorney, and the loss for the company increases with the claimant’s age, probably because older people suffer more severe injuries in car accidents.
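A worm plot is a detrended normal Q–Q plot of the normalized quantile residuals \(r_i = \Phi^{-1}\{F(y_i \mid \hat\theta_i)\}\). The Python sketch below computes its coordinates and a 95% pointwise band from the fitted values \(F(y_i \mid \hat\theta_i)\) of a continuous model. The plotting positions and the band formula (the standard error of a sample quantile mapped to the normal scale) are our assumptions and may differ in detail from the curves drawn by the gamlss software; the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def worm_plot_points(fitted_cdf, level=0.95):
    """Worm-plot coordinates from fitted CDF values u_i = F(y_i | fitted parameters).

    Returns (z, deviation, lower, upper) per observation, where deviation is the
    sorted normalized quantile residual minus its expected normal quantile z.
    """
    n = len(fitted_cdf)
    residuals = sorted(STD.inv_cdf(u) for u in fitted_cdf)
    crit = STD.inv_cdf((1 + level) / 2)
    points = []
    for i, r in enumerate(residuals, start=1):
        z = STD.inv_cdf((i - 0.5) / n)  # plotting position (assumed convention)
        p = STD.cdf(z)
        # pointwise standard error of a sample quantile, mapped to the z scale
        se = math.sqrt(p * (1 - p) / n) / STD.pdf(z)
        points.append((z, r - z, -crit * se, crit * se))
    return points


def share_inside(points):
    """Fraction of worm-plot points lying within the pointwise band."""
    return sum(lo <= d <= hi for _, d, lo, hi in points) / len(points)
```

The band widens in the tails because \(\phi(z)\) shrinks there, which produces the characteristic elliptic shape; a flat worm inside the band indicates that the fitted distribution is adequate, while a sloped worm signals a misfit in scale.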

AutoBi : regression model on the tail of data

The analysis of the tail of the data is reported in Table 8. In this case the best distribution is selected according to the VaR estimation results reported in online Supplementary table A.4. Once again, we first estimate a model (III) only for the \(\mu\) equation, with all the explanatory variables (Widowed is dropped because, among the 54 cases in the tail, there were not enough observations for this category). The other model (IV) is again the best one in the sense specified in “AutoBi : regression model on total claims”. In model IV we include a smoother for Clmsex (pb is a smoothing additive term based on P-splines) in both the \(\mu\) and \(\sigma\) equations. Modelling the other equations as well is not possible because of the low number of cases in the tail of the data.

These results are probably more interesting for an insurer. The coefficient of Clmsex is strongly significant and negative in both models: in the tail, female claimants are associated with lower losses for insurers, so the largest losses arise from male claimants, as also found in other works 9, 10. Model IV also shows that Clmsex has a negative effect on the scale parameter, i.e. female claimants reduce the spread in the tail of the distribution. The wormplots of both models III and IV show a satisfactory fit (Fig. 1, lower-left and lower-right panels, respectively). Once again, the presence of an attorney is associated with the largest losses for the company.


The ausprivauto0405 dataset contains 9 variables. The dependent variable in our study is ClaimAmount, the sum of claim payments. Here we do not use the term loss, because ClaimAmount also contains zeros. The explanatory variables available in the dataset are:

Exposure : the number of policy years.

VehValue : the vehicle value in thousands of Australian dollars.

VehAge : the vehicle age group, divided into 4 classes: old cars, oldest cars, young cars and youngest cars. We created a dummy variable for each category.

VehBody : the vehicle body group divided into 13 classes: Bus, Convertible, Coupe, Hardtop, Hatchback, Minibus, Motorized caravan, Panel van, Roadster, Sedan, Station wagon, Truck and Utility. We created a dummy variable for each category.

Gender : the gender of the policyholder. We created a dummy variable for female claimants ( Female ).

DrivAge : the age of the policyholder divided into 6 classes: old people, older working people, oldest people, working people, young people and youngest people.

ClaimOcc : a dummy variable that indicates the occurrence of a claim.

ClaimNb : the number of claims.

We proceed as for the AutoBi dataset, the only difference being that for this dataset we use the zero-adjusted GAMLSS framework. Here too we estimated several models but, for brevity, we report only the relevant cases, which, as mentioned earlier, are those that retain the gender variable and were selected as the best among the models for which the algorithm converged.
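For reference, a zero-adjusted (ZA) model can be written as below, with \(\xi_0\) the probability of a zero claim (the parameter that appears in the \(\xi _0\) equation of the regression tables) and \(f_W\) the density of the positive part, for example the Log-Johnson’s SU. The symbol \(f_W\) is our notation, and the exact parametrisation in the paper's framework section may differ. Because the log-likelihood splits into a term in \(\xi_0\) alone and a term in the parameters of \(f_W\) alone, the \(\xi_0\) equation behaves like a binary regression for the occurrence of a zero claim, while the remaining equations are fitted to the positive claims only.

```latex
f_Y(y \mid \xi_0, \mu, \sigma, \nu, \tau) =
\begin{cases}
  \xi_0, & y = 0,\\
  (1 - \xi_0)\, f_W(y \mid \mu, \sigma, \nu, \tau), & y > 0,
\end{cases}

\ell = \sum_{i:\, y_i = 0} \log \xi_{0i}
     + \sum_{i:\, y_i > 0} \Big[ \log(1 - \xi_{0i})
     + \log f_W(y_i \mid \mu_i, \sigma_i, \nu_i, \tau_i) \Big].
```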

ausprivauto0405 : regression model on total claims

We started with the ZA Generalized Gamma (GG) as underlying distribution, since it was the best one for modelling the total amount of claims (online Supplementary table B.1, Appendix B). Unfortunately, for this model the regression algorithm does not converge, which undermines the reliability of the estimates. We therefore tried the second and third best models according to Supplementary table B.1 (Appendix B, online), the ZA Log-Skew Normal Type 2 and the ZA Truncated Power Exponential, but they had the same problem, so we discarded them to preserve the reliability of the regression model. For the fourth best model, the ZA Log-Johnson’s SU, the algorithm converged.

Model V in Table 9 is the best in terms of computational feasibility, AIC, BIC, and wormplot. We should, however, warn the reader that better models could be obtained by removing the variable Female, but this is not the purpose of this paper. Even though the coefficient of ClaimOcc in the \(\xi _0\) equation is not significant, we include it to obtain a satisfactory wormplot (Fig. 2, upper-left panel). We did not model the equation for the \(\tau\) parameter as well, because this would have greatly increased computation time. To give an idea, model V in Table 9 converged after 220 iterations, whereas a model with all variables in the four parameters did not converge even after 1500 iterations (a run of about 24 h on an Intel Core i7-6500U CPU with 16 GB of RAM).

The variable Female significantly affects both the \(\mu\) and \(\sigma\) parameters with a negative sign, which means that for female claimants the location and spread of claims are lower than for male claimants. The coefficient of Female on the parameter \(\nu\) is not significant. We also tried a model in which Female also entered the \(\xi _0\) equation, but its coefficient was far from significant. As in the AutoBi dataset, gender has the same effect on the spread; but in this dataset, which also includes policies with no claims, female claimants seem to be better clients for insurers in terms of the location parameter as well.
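The link functions of model V are not reported in this section. Assuming a log link for \(\mu\) and \(\sigma\), a coefficient \(\beta\) on Female means that, with the other covariates held fixed, the parameter for a female policyholder equals the male value multiplied by \(\exp(\beta)\), which is below 1 when \(\beta < 0\). The Python sketch below does that arithmetic; the function names are hypothetical and the coefficient in the example is illustrative, not an estimate from Table 9.

```python
import math


def multiplicative_effect(beta):
    """Factor applied to a log-link parameter when a dummy switches from 0 to 1."""
    return math.exp(beta)


def percent_change(beta):
    """The same effect expressed as a percentage change."""
    return 100.0 * (math.exp(beta) - 1.0)


if __name__ == "__main__":
    beta = -0.2  # hypothetical Female coefficient
    print(multiplicative_effect(beta), percent_change(beta))
```

With the hypothetical \(\beta = -0.2\), the parameter for women is about 18% lower than for men; under an identity link, by contrast, \(\beta\) would be an additive shift on the parameter's own scale.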

ausprivauto0405 : regression model on the tail of data

We now turn to the tail of the distribution. Since we are now dealing with data above the 95% and 99% quantiles, all the zeros drop out of the analysis and only positive losses remain. The regression framework therefore becomes again the traditional GAMLSS, with no need for zero adjustment. Moreover, including ClaimOcc becomes redundant, because the tail contains only realised claims.

Table 10 shows the results of the best model, among many competing ones, for the cases above the 95% quantile. The Truncated Power Exponential was chosen on the basis of the comparison between the empirical VaR and the VaR predicted by the models (online Supplementary table B.4, Appendix B). One may notice that the VaR analysis was conducted with ZA distributions, but this is a minor concern, since the wormplot shows that the model fits the data well (Fig. 2, upper-right panel). The coefficient of Female is positive and significant in the \(\mu\) equation, meaning that claims in the tail increase for female claimants, whereas the coefficient of Female in the scale equation is not significant. We excluded the variable from the \(\nu\) equation because it was not significant and severely worsened the goodness of fit.

Figure 2. Wormplots of models V–VIII (Tables 9, 10, 11) for the ausprivauto0405 dataset. Upper panels: model V (left) and model VI (right). Lower panels: model VII (left) and model VIII (right).

Table 11 shows two possible models for the behaviour of extreme losses (cases above the 99% quantile). Both models fit well, as the wormplots in Fig. 2 (lower panels) show, but model VII should be preferred in terms of AIC and BIC. In model VIII the variable Female was removed from the equation for the location parameter because it was not significant. The choice of the underlying distributions is again determined by computational feasibility and by the results of Supplementary table B.4 (Appendix B, online). The coefficient of Female is negative and significant at 10% for the location parameter in model VII and for the dispersion parameter in model VIII. These results are in line with the tail behaviour observed in the AutoBi dataset (Table 8).
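Statements such as “significant at 10%” refer to a test of whether a coefficient equals zero. As a hedged illustration, the Python sketch below computes a two-sided Wald z-test p-value from an estimate and its standard error. The function names and the numbers in the example are hypothetical, not taken from Table 11, and regression software may report t-based p-values instead, which differ little when the residual degrees of freedom are large.

```python
from statistics import NormalDist


def wald_p_value(estimate, std_error):
    """Two-sided p-value of the Wald z-test for H0: coefficient = 0."""
    z = estimate / std_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def significance_level(p):
    """Smallest conventional level (1%, 5%, 10%) at which H0 is rejected, if any."""
    for alpha in (0.01, 0.05, 0.10):
        if p < alpha:
            return alpha
    return None


if __name__ == "__main__":
    p = wald_p_value(-0.30, 0.17)  # hypothetical estimate and standard error
    print(p, significance_level(p))
```

With these illustrative values the p-value is about 0.078, so the coefficient would be significant at 10% but not at 5%, which is the situation described for the Female coefficients above.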

Potential limitations

In this section, we address a series of shortcomings that could undermine the validity of our results.

Finding adequate data for actuarial studies is a real problem. Since researchers usually need micro-data, these data must contain enough information, especially when one aims to run regressions. In our case a suitable dataset must report the claimant’s gender and a sufficient number of other variables to avoid endogeneity problems. Furthermore, the ideal dataset should include a large number of observations and refer to a relevant geographical context, so that useful policy proposals can be drawn. Finding such data was not easy, and we think that the data used in our study are a good compromise. The AutoBi dataset allows us to study the American context, where gender-based pricing is currently a relevant issue. Moreover, the ausprivauto0405 dataset allows us to extend the analysis to a different geographical context, including policy holders with no claims.

One may argue that the data used are old. We think that this is not a serious problem, for several reasons. It is customary in actuarial studies to work with important, established datasets, and working with reliable and significant data is more important than working with new data. Furthermore, as already mentioned, finding data is very difficult. The literature is full of works that use older but established datasets. For example, the Danish fire losses dataset contains data gathered over the period 1980–1990, yet it is still one of the most used in contemporary studies 18, 27; Fuzi et al. 33 used private car policies from 2001; Blostein and Miljkovic 28 used data for the period 1988–2001. Another relevant aspect is that the distribution of claims generally presents the same statistical features over time and across countries.

We are aware that many other variables should ideally be included in the model, such as the location and time of the accident, its cause (drugs, disregard of traffic rules, etc.), and so on. Nonetheless, to our knowledge, a dataset with such detailed information is not freely accessible, and the data used in this paper are among the most complete we could find. We must stress, however, that the use of country-specific data limits the conclusions to the cases analyzed; further research using the same methodology on different data would therefore help corroborate the results of this work. In this regard, we hope that more insurers will make their data freely available to advance actuarial research.

The regression models used in our analysis serve to study the relationship between gender and claims; however, no causal effect can be inferred from this setup. Even designing a study capable of assessing a causal effect is troublesome, because car accidents, and hence claim amounts, are too complex for any experiment to be devised. The lack of data makes this problem even worse. Nonetheless, the study of correlations is important to investigate whether a fair justification supporting a pricing practice exists.

External validity

One major drawback of using data from US and Australian companies is that general conclusions cannot be drawn for other countries. In general, a representative sample is needed to generalize the results to different countries. As one of the referees pointed out, it is reasonable to assume that our data are not representative of the many policy holders who have contracts with insurance companies. This obviously limits the application of our results to the scenarios analyzed, and their application to broader contexts depends strictly on how close one thinks our data are to a representative sample.

Despite this, our results are useful for several reasons. First, as we point out in “Introduction”, price discrimination based on gender is a particularly relevant issue in the US, so this work can provide statistical substance to the debate. Second, Australia and the USA are two prominent insurance markets worldwide. Third, even though driving habits vary from country to country, countries with similar backgrounds can still use the results of our analysis. Fourth, the loss distribution is characterized by stylized facts that make the present study useful for different data as well. Finally, our work can stimulate further empirical evidence on this topic, providing new insights into the external validity of our results.

Conclusions and policy implications

This paper provides several results that extend and enrich the existing literature, and they can be split into two parts. In the first part of the paper, we focus on finding the best statistical model to describe the distribution of claims. The data investigated are taken from two important R packages. The AutoBi dataset allows us to work on losses, as is commonly done in the literature 16, 18, 27, whereas the ausprivauto0405 data also include zeros, allowing us to adopt the zero-adjusted distribution framework. Moreover, we conduct the analysis not only on total claims but also by gender, and we analyse the tail behaviour of the data.

In the first part, we find that male and female claims can be approximated by similar distributions, for example the Truncated Skew t Type 5 or the Truncated t Family for the AutoBi dataset. Regarding the effect of gender on the parameters of the distribution, we find a significant difference in the location parameter of many distributions for the second dataset (Table 6). Finally, a parametric bootstrap test based on the difference between VaRs shows that, for many distributions, the tail distribution differs significantly between male and female claimants. Overall, this evidence suggests that few statistical differences exist between males and females; the differences detected mainly indicate that the best model for the data may differ by gender. Unfortunately, these results are limited by our reliance on the only available data we could find. This evidence, although based on sound statistical methodology, should therefore be supported by the analysis of additional data before it can be generalized.

The second part of the paper is devoted to building GAMLSS regression models to capture the “effect” of gender on the claims recorded by the insurer. Here we conduct the analysis using all the data and the tail (cases above the 95% and 99% quantiles). For female claimants the spread of losses appears to be lower than for male claimants. For the \(\mu\) parameter the results are mixed. For the AutoBi dataset we find evidence of a positive effect of female claimants on the location parameter when we consider all the data, whereas the effect is negative when we consider only the cases above the 95% quantile. For the ausprivauto0405 dataset we find evidence of a negative effect on location when considering all the data and extreme losses (cases above 99%), and a positive effect when considering cases above 95%. In our opinion, the negative effect on the location parameter for the whole ausprivauto0405 dataset is more reliable than the positive effect for the AutoBi dataset, because the inclusion of zeros accounts for the fact that women may be safer policy holders.

Nonetheless, the regression framework has some limits, mainly related to the high complexity of the computational routines and to the lack of data. We must rely on the adequacy of the control variables provided in the R packages. The strength of the empirical analysis is that the GAMLSS framework allowed us to study the phenomenon thoroughly, including equations for the other parameters of the distribution (quite often neglected in empirical works) and also exploiting the information carried by the zeros. The main limitation is the use of old, country-specific data, which reduces the scope of these results, although the analysis is robust and allows useful policy implications to be drawn for many countries.

In conclusion, our research shows that it is difficult to find a “fair justification” 11 for applying different rates to male and female claimants. Female claimants do, however, seem in most of the investigated cases to decrease the location parameter for extreme losses and when zeros are included. Furthermore, in our data female claimants have a beneficial effect on the scale parameter of claims, since for females the spread of losses decreases. We do not think that these results represent incontestable statistical reasons to differentiate policy rates by gender. Indeed, reading our results together with other works showing that female policy holders are safer than men, we see no clear reason to charge women higher rates. The same argument can be made for male policy holders: the evidence collected partly suggests that men may be riskier for insurance companies in some cases, but it is not strong enough to justify charging them higher rates. Future research can use the methodology presented in this paper to check whether similar results hold for different data. In any case, this paper offers guidance to policy makers in the countries considered on whether unisex pricing policies should be promoted.

Data availability

Data can be accessed by downloading the R packages reported in the paper.

Sivak, M. & Schoettle, B. Toward understanding on-road interactions of male and female drivers. Traffic Inj. Prev. 12 (3), 235–238 (2011).

Massie, D. L., Campbell, K. L. & Williams, A. F. Traffic accident involvement rates by driver age and gender. Accid. Anal. Prev. 27 (1), 73–87 (1995).

Santamariña-Rubio, E., Pérez, K., Olabarria, M. & Novoa, A. M. Gender differences in road traffic injury rate using time travelled as a measure of exposure. Accid. Anal. Prev. 65 , 1–7 (2014).

Åkerstedt, T. & Kecklund, G. Age, gender and early morning highway accidents. J. Sleep Res. 10 (2), 105–110 (2001).

Kim, K., Brunner, I. M. & Yamashita, E. Modeling fault among accident—involved pedestrians and motorists in Hawaii. Accid. Anal. Prev. 40 (6), 2043–2049 (2008).

Ma, L. & Yan, X. Examining the nonparametric effect of drivers’ age in rear-end accidents through an additive logistic regression model. Accid. Anal. Prev. 67 , 129–136 (2014).

Zhou, H., Zhao, J., Pour-Rouholamin, M. & Tobias, P. A. Statistical characteristics of wrong-way driving crashes on Illinois freeways. Traffic Inj. Prev. 16 (8), 760–767 (2015).

Regev, S., Rolison, J. J. & Moutari, S. Crash risk by driver age, gender, and time of day using a new exposure methodology. J. Saf. Res. 66 , 131–140 (2018).

Vorko-Jović, A., Kern, J. & Biloglav, Z. Risk factors in urban road traffic accidents. J. Saf. Res. 37 (1), 93–98 (2006).

Kim, J.-K., Ulfarsson, G. F., Kim, S. & Shankar, V. N. Driver-injury severity in single-vehicle crashes in California: A mixed logit analysis of heterogeneity due to age and gender. Accid. Anal. Prev. 50 , 1073–1081 (2013).

Thiery, Y. & Van Schoubroeck, C. Fairness and equality in insurance classification. Geneva Pap. Risk Insur. Issues Pract. 31 (2), 190–211 (2006).

Embrechts, P., McNeil, A. & Straumann, D. Correlation and dependence in risk management: Properties and pitfalls. Risk Manage. Value Risk Beyond 1 , 176–223 (2002).

Bernardi, M. & Maruotti, A. Skew mixture models for loss distributions: A Bayesian approach. Insur. Math. Econom. 51 , 617–623 (2012).

Cooray, K. & Ananda, M. M. A. Modeling actuarial data with a composite lognormal-pareto model. Scand. Actuar. J. 2005 (5), 321–334 (2005).

Jeon, Y. & Kim, J. H. T. A gamma kernel density estimation for insurance loss data. Insur. Math. Econom. 53 (3), 569–579 (2013).

Punzo, A., Bagnato, L. & Maruotti, A. Compound unimodal distributions for insurance losses. Insur. Math. Econom. 81 , 95–107 (2018a).

Lane, M. N. Pricing risk transfer transactions. ASTIN Bull. J. IAA 30 (2), 259–293 (2000).

Eling, M. Fitting insurance claims to skewed distributions: Are the skew-normal and skew-student good models? Insur. Math. Econom. 51 , 239–248 (2012).

Klugman, S. A., Panjer, H. H. & Willmot, G. E. Loss Models: From Data to Decisions Vol. 715 (Wiley, 2012).

Punzo, A., Mazza, A. & Maruotti, A. Fitting insurance and economic data with outliers: A flexible approach based on finite mixtures of contaminated gamma distributions. J. Appl. Stat. 45 (14), 2563–2584 (2018).

Punzo, A. A new look at the inverse Gaussian distribution with applications to insurance and economic data. J. Appl. Stat. 46 (7), 1260–1287 (2019).

Tomarchio, S. D. & Punzo, A. Dichotomous unimodal compound models: Application to the distribution of insurance losses. J. Appl. Stat. 47 (13–15), 2328–2353 (2020).

Guillen, M., Prieto, F. & Sarabia, J. M. Modelling losses and locating the tail with the Pareto positive stable distribution. Insur. Math. Econom. 49 (3), 454–461 (2011).

Scollnik, D. P. M. & Sun, C. Modeling with Weibull–Pareto models. N. Am. Actuar. J. 16 (2), 260–272 (2012).

Pernagallo, G. & Torrisi, B. An empirical analysis on the degree of gaussianity and long memory of financial returns in emerging economies. Phys. A Stat. Mech. Appl. 527 , 121296 (2019).

Brazauskas, V. & Kleefeld, A. Robust and efficient fitting of the generalized pareto distribution with actuarial applications in view. Insur. Math. Econom. 45 (3), 424–435 (2009).

Miljkovic, T. & Grün, B. Modeling loss data using mixtures of distributions. Insur. Math. Econom. 70 , 387–396 (2016).

Blostein, M. & Miljkovic, T. On modeling left-truncated loss data using mixtures of distributions. Insur. Math. Econom. 85 , 35–46 (2019).

Mazza, A. & Punzo, A. DBKGrad: An R package for mortality rates graduation by discrete beta kernel techniques. J. Stat. Softw. 57 (Code Snippet 2), 1–18 (2014).

Mazza, A. & Punzo, A. Bivariate discrete beta kernel graduation of mortality data. Lifetime Data Anal. 21 (3), 419–433 (2015).

Rousseeuw, P., Daniels, B. & Leroy, A. Applying robust regression to insurance. Insur. Math. Econom. 3 (1), 67–72 (1984).

Hill, R. C., Griffiths, W. E. & Lim, G. C. Principles of Econometrics (Wiley, 2018) ( ISBN 9781119342854 ).

Fuzi, M. F., Jemain, A. A. & Ismail, N. Bayesian quantile regression model for claim count data. Insur. Math. Econom. 66 , 124–137 (2016).

Rigby, R. A., Stasinopoulos, M. D. & Voudouris, V. Discussion: A comparison of GAMLSS with quantile regression. Stat. Model. 13 (4), 335–348 (2013).

Frees, E. W. Regression Modeling with Actuarial and Financial Applications. International Series on Actuarial Science (Cambridge University Press, 2010).

De Jong, P. & Heller, G. Z. Generalized Linear Models for Insurance Data (Cambridge Books, 2008).

Stasinopoulos, M., Enea, M., & Rigby, R. A. Zero adjusted distributions on the positive real line. (2017a).

Rigby, R. A. & Stasinopoulos, M. D. Generalized additive models for location, scale and shape. J. R. Stat. Soc. Ser. C (Appl. Stat.) 54 (3), 507–554 (2005).

Hastie, T. J. & Tibshirani, R. J. Generalized Additive Models (CRC Press, 2017) ( ISBN 9781351445962 ).

Enea, M., Stasinopoulos, M., Rigby, B. & Hossain, A. gamlss.inf: Fitting Mixed (Inflated and Adjusted) Distributions (2019). Accessed 12 Mar 2019.

Stasinopoulos, M. D. & Rigby, R. A. Generalized additive models for location scale and shape (GAMLSS) in R. J. Stat. Softw. 23 (7), 1–46 (2007).

Stasinopoulos, M. D., Rigby, R. A., Heller, G. Z., Voudouris, V. & De Bastiani, F. Flexible Regression and Smoothing: Using GAMLSS in R (CRC Press, 2017).

Jones, M. C. & Faddy, M. J. A skew extension of the t-distribution, with applications. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 65 (1), 159–174 (2003).

Tomarchio, S. D. & Punzo, A. Modelling the loss given default distribution via a family of zero-and-one inflated mixture models. J. R. Stat. Soc. Ser. A (Stat. Soc.) 182 (4), 1247–1266 (2019).

Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 19 (6), 716–723 (1974).


Schwarz, G. Estimating the dimension of a model. Ann. Stat. 6 (2), 461–464 (1978).

Pernagallo, G. An entropy-based measure of correlation for time series. Inf. Sci. 643, 119272 (2023).

Rigby, R. A., Stasinopoulos, M. D., Heller, G. Z. & De Bastiani, F. Distributions for Modeling Location, Scale, and Shape: Using GAMLSS in R. Chapman & Hall/CRC The R Series (CRC Press, 2019) (ISBN 9781000699968).

Bagnato, L., De Capitani, L. & Punzo, A. Testing serial independence via density-based measures of divergence. Methodol. Comput. Appl. Probab. 16 (3), 627–641 (2014).



The authors are grateful for the comments made by the three anonymous Reviewers and the Editor.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Authors and affiliations.

Department of Economics and Statistics “Cognetti de Martiis”, University of Turin, Lungo Dora Siena, 100A, 10153, Turin, Italy

Giuseppe Pernagallo

Department of Economics and Business, University of Catania, Corso Italia 55, 95129, Catania, Italy

Antonio Punzo & Benedetto Torrisi



All the authors contributed to all sections. The programming codes were written in R by G.P. and A.P.

Corresponding author

Correspondence to Antonio Punzo.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit .


About this article

Cite this article.

Pernagallo, G., Punzo, A. & Torrisi, B. Women and insurance pricing policies: a gender-based analysis with GAMLSS on two actuarial datasets. Sci Rep 14 , 3239 (2024).


Received : 24 May 2023

Accepted : 25 January 2024

Published : 08 February 2024




Insurance 2030—The impact of AI on the future of insurance

Welcome to the future of insurance, as seen through the eyes of Scott, a customer in the year 2030. His digital personal assistant orders him a vehicle with self-driving capabilities for a meeting across town. Upon hopping into the arriving car, Scott decides he wants to drive today and moves the car into “active” mode. Scott’s personal assistant maps out a potential route and shares it with his mobility insurer, which immediately responds with an alternate route that has a much lower likelihood of accidents and auto damage as well as the calculated adjustment to his monthly premium. Scott’s assistant notifies him that his mobility insurance premium will increase by 4 to 8 percent based on the route he selects and the volume and distribution of other cars on the road. It also alerts him that his life insurance policy, which is now priced on a “pay-as-you-live” basis, will increase by 2 percent for this quarter. The additional amounts are automatically debited from his bank account.

When Scott pulls into his destination’s parking lot, his car bumps into one of several parking signs. As soon as the car stops moving, its internal diagnostics determine the extent of the damage. His personal assistant instructs him to take three pictures of the front right bumper area and two of the surroundings. By the time Scott gets back to the driver’s seat, the screen on the dash informs him of the damage, confirms the claim has been approved, and reports that a mobile response drone has been dispatched to the lot for inspection. If the vehicle is drivable, it may be directed to the nearest in-network garage for repair after a replacement vehicle arrives.

While this scenario may seem beyond the horizon, such integrated user stories will emerge across all lines of insurance with increasing frequency over the next decade. In fact, all the technologies required above already exist, and many are available to consumers. With the new wave of deep learning techniques, such as convolutional neural networks (which contain millions of simulated “neurons” structured in layers), artificial intelligence (AI) has the potential to live up to its promise of mimicking the perception, reasoning, learning, and problem solving of the human mind (Exhibit 1). In this evolution, insurance will shift from its current state of “detect and repair” to “predict and prevent,” transforming every aspect of the industry in the process. The pace of change will also accelerate as brokers, consumers, financial intermediaries, insurers, and suppliers become more adept at using advanced technologies to enhance decision making and productivity, lower costs, and optimize the customer experience.

As AI becomes more deeply integrated in the industry, carriers must position themselves to respond to the changing business landscape. Insurance executives must understand the factors that will contribute to this change and how AI will reshape claims, distribution, and underwriting and pricing. With this understanding, they can start to build the skills and talent, embrace the emerging technologies, and create the culture and perspective needed to be successful players in the insurance industry of the future.

Four AI-related trends shaping insurance

AI’s underlying technologies are already being deployed in our businesses, homes, and vehicles, as well as on our person. The disruption from COVID-19 changed the timelines for the adoption of AI by significantly accelerating digitization for insurers. Virtually overnight, organizations had to adjust to accommodate remote workforces, expand their digital capabilities to support distribution, and upgrade their online channels. While most organizations likely didn't invest heavily in AI during the pandemic, the increased emphasis on digital technologies and a greater willingness to embrace change will put them in a better position to incorporate AI into their operations.

Four core technology trends, tightly coupled with (and sometimes enabled by) AI, will reshape the insurance industry over the next decade.

Explosion of data from connected devices

In industrial settings, equipment with sensors has been omnipresent for some time, but the coming years will see a huge increase in the number of connected consumer devices. The penetration of existing devices (such as cars, fitness trackers, home assistants, smartphones, and smart watches) will continue to increase rapidly, joined by new, growing categories such as clothing, eyewear, home appliances, medical devices, and shoes. Experts estimate there will be up to one trillion connected devices by 2025 (World Economic Forum, 2015). The resulting avalanche of new data created by these devices will allow carriers to understand their clients more deeply, resulting in new product categories, more personalized pricing, and increasingly real-time service delivery.


Increased prevalence of physical robotics

The field of robotics has seen many exciting achievements recently, and this innovation will continue to change how humans interact with the world around them. Additive manufacturing, also known as 3-D printing, will radically reshape manufacturing and the commercial insurance products of the future. By 2025, 3-D-printed buildings will be common, and carriers will need to assess how this development changes risk assessments. In addition, programmable, autonomous drones; autonomous farming equipment; and enhanced surgical robots will all be commercially viable in the next decade. By 2030, a much larger proportion of standard vehicles will have autonomous features, such as self-driving capabilities. Carriers will need to understand how the increasing presence of robotics in everyday life and across industries will shift risk pools, change customer expectations, and enable new products and channels.

Open-source and data ecosystems

As data becomes ubiquitous, open-source protocols will emerge to ensure data can be shared and used across industries. Various public and private entities will come together to create ecosystems in order to share data for multiple use cases under a common regulatory and cybersecurity framework. For example, wearable data could be ported directly to insurance carriers, and connected-home and auto data could be made available through Amazon, Apple, Google, and a variety of consumer device manufacturers.

Advances in cognitive technologies

Convolutional neural networks and other deep learning technologies currently used primarily for image, voice, and unstructured text processing will evolve to be applied in a wide variety of applications. These cognitive technologies, which are loosely based on the human brain’s ability to learn through decomposition and inference, will become the standard approach for processing the incredibly large and complex data streams that will be generated by “active” insurance products tied to an individual’s behavior and activities. With the increased commercialization of these types of technologies, carriers will have access to models that are constantly learning and adapting to the world around them—enabling new product categories and engagement techniques while responding to shifts in underlying risks or behaviors in real time.

The state of insurance in 2030

AI and its related technologies will have a seismic impact on all aspects of the insurance industry, from distribution to underwriting and pricing to claims. Advanced technologies and data are already affecting distribution and underwriting, with policies being priced, purchased, and bound in near real time. An in-depth examination of what insurance may look like in 2030 highlights dramatic changes across the insurance value chain.


Distribution

The experience of purchasing insurance is faster, with less active involvement on the part of the insurer and the customer. Enough information is known about individual behavior, with AI algorithms creating risk profiles, that cycle times for completing the purchase of an auto, commercial, or life policy are reduced to minutes or even seconds. Auto and home carriers have enabled instant quotes for some time but will continue to refine their ability to issue policies immediately to a wider range of customers as telematics and in-home Internet of Things (IoT) devices proliferate and pricing algorithms mature. Many life carriers are experimenting with simplified issue products, but most are restricted to only the healthiest applicants and are priced higher than a comparable fully underwritten product. As AI permeates life underwriting and carriers are able to identify risk in a much more granular and sophisticated way, we will see a new wave of mass-market instant issue products.

Smart contracts enabled by blockchain instantaneously authorize payments from a customer’s financial account. Meanwhile, contract processing and payment verification are eliminated or streamlined, reducing customer acquisition costs for insurers. The purchase of commercial insurance is similarly expedited as the combination of drones, IoT, and other available data provides sufficient information for AI-based cognitive models to proactively generate a bindable quote.


Highly dynamic, usage-based insurance (UBI) products proliferate and are tailored to the behavior of individual consumers. Insurance transitions from a “purchase and annual renewal” model to a continuous cycle, as product offerings constantly adapt to an individual’s behavioral patterns. Furthermore, products are disaggregated substantially into microcoverage elements (for example, phone battery insurance, flight delay insurance, different coverage for a washer and dryer within the home) that consumers can customize to their particular needs, with the ability to instantaneously compare prices from various carriers for their individualized baskets of insurance products. New products emerge to cover the shifting nature of living arrangements and travel. UBI becomes the norm as physical assets are shared across multiple parties, with a pay-by-mile or pay-by-ride model for car sharing and pay-by-stay insurance for home-sharing services, such as Airbnb. Some insurtech companies are already designing these types of products; Slice, for example, provides variable commercial insurance specifically tailored for home sharing.
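The pay-by-mile model mentioned above can be illustrated as a fixed monthly base charge plus a per-mile rate. This is a minimal sketch under stated assumptions: the class name, base charge, per-mile rate, and monthly cap are hypothetical values chosen for illustration, not figures from any carrier or product.

```python
from dataclasses import dataclass


@dataclass
class PayPerMilePlan:
    """Hypothetical pay-by-mile plan: fixed monthly base plus a per-mile charge."""

    base_monthly: float   # fixed monthly charge (hypothetical)
    rate_per_mile: float  # charge per mile driven (hypothetical)
    monthly_cap: float    # ceiling on the monthly bill (hypothetical)

    def monthly_premium(self, miles_driven: float) -> float:
        """Return the capped monthly premium, rounded to cents."""
        if miles_driven < 0:
            raise ValueError("miles_driven must be non-negative")
        raw = self.base_monthly + self.rate_per_mile * miles_driven
        return round(min(raw, self.monthly_cap), 2)


plan = PayPerMilePlan(base_monthly=30.0, rate_per_mile=0.06, monthly_cap=150.0)
for miles in (0, 500, 1000, 3000):
    print(f"{miles:>5} miles -> ${plan.monthly_premium(miles):.2f}")
```

In this sketch a low-mileage driver pays close to the base charge, while the cap bounds the bill for heavy users; whether a real product would use a cap at all is an assumption.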

The role of insurance agents has changed dramatically by 2030. The number of agents is reduced substantially as active agents retire and remaining agents rely heavily on technology to increase productivity. The role of agents transitions to process facilitators and product educators. The agent of the future can sell nearly all types of coverage and adds value by helping clients manage their portfolios of coverage across experiences, health, life, mobility, personal property, and residential. Agents use smart personal assistants to optimize their tasks as well as AI-enabled bots to find potential deals for clients. These tools help agents to support a substantially larger client base while making customer interactions (a mix of in-person, virtual, and digital) shorter and more meaningful, given that each interaction will be tailored to the exact current and future needs of each individual client.

Underwriting and pricing

In 2030, underwriting as we know it today ceases to exist for most personal and small-business products across life and property and casualty insurance. The process of underwriting is reduced to a few seconds as the majority of underwriting is automated and supported by a combination of machine and deep learning models built within the technology stack. These models are powered by internal data as well as a broad set of external data accessed through application programming interfaces and outside data and analytics providers. Information collected from devices provided by mainline carriers, reinsurers, product manufacturers, and product distributors is aggregated in a variety of data repositories and data streams. These information sources enable insurers to make ex ante decisions regarding underwriting and pricing, enabling proactive outreach with a bindable quote for a product bundle tailored to the buyer’s risk profile and coverage needs.

Regulators review AI-enabled, machine learning–based models, a task that requires a transparent method for determining the traceability of a score (similar to the rating factor derivations used today with regression-based coefficients). To verify that data usage is appropriate for marketing and underwriting, regulators assess a combination of model inputs. They also develop test policies for providers when determining rates in online plans to ensure the algorithm results are within approved bounds. Public policy considerations limit access to certain sensitive and predictive data (such as health and genetic information); these restrictions decrease underwriting and pricing flexibility and increase antiselection risk in some segments.
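The rating factor derivations mentioned above can be sketched with a regression that uses a log link (as in a Poisson or gamma GLM): each fitted coefficient β becomes a multiplicative rating factor exp(β), so every quoted premium can be traced back to the factors that produced it. The base premium, coefficients, variable names, and approved bounds below are all hypothetical, and the bounds check is only an illustration of what a regulator-style test policy might look like.

```python
import math

# Hypothetical log-link model: log(premium) = log(BASE_PREMIUM) + sum of active coefficients.
BASE_PREMIUM = 500.0  # exp(intercept); hypothetical
COEFFICIENTS = {      # hypothetical fitted coefficients (beta)
    "urban": 0.20,
    "young_driver": 0.35,
    "telematics_discount": -0.10,
}


def rating_factors(coefs: dict) -> dict:
    """Map each coefficient beta to its multiplicative rating factor exp(beta)."""
    return {name: math.exp(beta) for name, beta in coefs.items()}


def traced_premium(policy: dict) -> tuple:
    """Return the premium and the (variable, factor) pairs that produced it."""
    factors = rating_factors(COEFFICIENTS)
    premium = BASE_PREMIUM
    trace = []
    for name, active in policy.items():
        if active:
            premium *= factors[name]
            trace.append((name, round(factors[name], 4)))
    return premium, trace


def within_approved_bounds(value: float, lower: float, upper: float) -> bool:
    """Hypothetical test policy: the quoted rate must fall inside filed bounds."""
    return lower <= value <= upper


test_policy = {"urban": True, "young_driver": False, "telematics_discount": True}
premium, trace = traced_premium(test_policy)
print(round(premium, 2), trace, within_approved_bounds(premium, 400.0, 700.0))
```

Because the factors multiply, the same quote could be explained line by line to a reviewer, which is the kind of traceability the paragraph describes; machine learning models would need an equivalent explanation mechanism.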

Price remains central in consumer decision making, but carriers innovate to diminish competition purely on price. Sophisticated proprietary platforms connect customers and insurers and offer customers differentiated experiences, features, and value. In some segments, price competition intensifies, and razor-thin margins are the norm, while in other segments, unique insurance offerings enable margin expansion and differentiation. In jurisdictions where change is embraced, the pace of pricing innovation is rapid. Pricing is available in real time based on usage and a dynamic, data-rich assessment of risk, empowering consumers to make decisions about how their actions influence coverage, insurability, and pricing.

Claims

Claims processing in 2030 remains a primary function of carriers, but more than half of claims activities have been replaced by automation. Advanced algorithms handle initial claims routing, increasing efficiency and accuracy.

IoT sensors and an array of data-capture technologies, such as drones, largely replace traditional, manual methods of first notice of loss. Claims triage and repair services are often triggered automatically upon loss. In the case of an auto accident, for example, a policyholder takes streaming video of the damage, which is translated into loss descriptions and estimate amounts. Vehicles with autonomous features that sustain minor damage direct themselves to repair shops for service while another car with autonomous features is dispatched in the interim. In the home, IoT devices are increasingly used to monitor water levels, temperature, and other key risk factors and to proactively alert both tenants and insurers to issues before they arise.

Automated customer service apps handle most policyholder interactions through voice and text, directly following self-learning scripts that interface with the claims, fraud, medical service, policy, and repair systems. The turnaround time for resolution of many claims is measured in minutes rather than days or weeks. Human claims management focuses on a few areas: complex and unusual claims, contested claims where human interaction and negotiation are empowered by analytics and data-driven insights, claims linked to systemic issues and risks created by new technology (for example, hackers infiltrate critical IoT systems), and random manual reviews of claims to ensure sufficient oversight of algorithmic decision making.

Claims organizations increase their focus on risk monitoring, prevention, and mitigation. IoT and new data sources are used to monitor risk and trigger interventions when factors exceed AI-defined thresholds. Customer interaction with insurance claims organizations focuses on avoiding potential loss. Individuals receive real-time alerts that may be linked with automatic interventions for inspection, maintenance, and repair. For large-scale catastrophe claims, insurers monitor homes and vehicles in real time using integrated IoT, telematics, and mobile phone data, assuming mobile phone service and power haven’t been disrupted in the area. When power goes out, insurers can prefile claims by using data aggregators, which consolidate data from satellites, networked drones, weather services, and policyholder data in real time. This system is pretested by the largest carriers across multiple catastrophe types, so highly accurate loss estimations are reliably filed in a real emergency. Detailed reports are automatically provided to reinsurers for faster reinsurance capital flow.
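The threshold-triggered monitoring described above can be sketched as a comparison of sensor readings against per-sensor limits. In this minimal sketch the thresholds are fixed, hypothetical numbers standing in for the AI-defined thresholds of the scenario; the sensor names and values are illustrative only.

```python
from typing import Dict, List

# Hypothetical fixed thresholds standing in for model-defined ones.
THRESHOLDS: Dict[str, float] = {
    "water_level_cm": 2.0,
    "temperature_c": 60.0,
    "humidity_pct": 85.0,
}


def check_readings(readings: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return one alert message for every reading that exceeds its threshold."""
    alerts = []
    for sensor, value in readings.items():
        limit = thresholds.get(sensor)
        # Sensors without a configured threshold are ignored.
        if limit is not None and value > limit:
            alerts.append(f"{sensor}={value} exceeds threshold {limit}")
    return alerts


readings = {"water_level_cm": 3.5, "temperature_c": 21.0, "humidity_pct": 90.0}
for alert in check_readings(readings, THRESHOLDS):
    print(alert)
```

In practice each alert could feed an intervention workflow (inspection, maintenance, or repair); that routing is outside this sketch.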

How insurers can prepare for accelerating changes

The rapid evolution of the industry will be fueled by the extensive adoption and integration of automation, deep learning, and external data ecosystems. While no one can predict exactly what insurance might look like in 2030, carriers can take several steps now to prepare for change.

1. Get smart on AI-related technologies and trends

Although the tectonic shifts in the industry will be tech-focused, addressing them is not the domain of the IT team. Instead, board members and customer-experience teams should invest the time and resources to build a deep understanding of these AI-related technologies. Part of this effort will require exploring hypothesis-driven scenarios in order to understand and highlight where and when disruption might occur, and what it means for certain business lines. For example, insurers are unlikely to gain much insight from limited-scale IoT pilot projects in discrete parts of the business. Instead, they must proceed with purpose and an understanding of how their organization might participate in the IoT ecosystem at scale. Pilots and proof-of-concept (POC) projects should be designed to test not just how a technology works but also how successfully the carrier might operate in a particular role within a data- or IoT-based ecosystem.


2. Develop and begin implementing a coherent strategic plan

Building on the insights from AI explorations, carriers must decide how to use technology to support their business strategy. The senior leadership team’s long-term strategic plan will require a multiyear transformation that touches operations, talent, and technology. Some carriers are already beginning to take innovative approaches such as starting their own venture-capital arms, acquiring promising insurtech companies, and forging partnerships with leading academic institutions. Insurers should develop a perspective on areas they want to invest in to meet or beat the market and what strategic approach—for example, forming a new entity or building in-house strategic capabilities—is best suited for their organization.

This plan should address all four dimensions involved in any large-scale, analytics-based initiative—everything from data to people to culture (Exhibit 2). The plan should outline a road map of AI-based pilots and POCs and detail which parts of the organization will require investments in skill building or focused change management. Most important, a detailed schedule of milestones and checkpoints is essential to allow the organization to determine, on a regular basis, how the plan should be modified to address any shifts in the evolution of AI technologies and significant changes or disruptions within the industry.

In addition to being able to understand and implement AI technologies, carriers also need to develop strategic responses to coming macrolevel changes. As many lines shift toward a “predict and prevent” methodology, carriers will need to rethink their customer engagement and branding, product design, and core earnings. Auto accidents will be reduced through use of vehicles with self-driving capabilities, in-home flooding will be prevented by IoT devices, buildings will be reprinted after a natural disaster, and lives will be saved and extended by improved healthcare. Likewise, vehicles will still break down, natural disasters will continue to devastate coastal regions, and individuals will require effective medical care and support when a loved one passes. As these changes take root, profit pools will shift, new types and lines of products will emerge, and how consumers interact with their insurers will change substantially.

All of these efforts can produce a coherent analytics and technology strategy that addresses all aspects of the business, with a keen eye on both value creation and differentiation.

3. Create and execute a comprehensive data strategy

Data is fast becoming one of the most—if not the most—valuable asset for any organization. The insurance industry is no different: how carriers identify, quantify, place, and manage risk is all predicated on the volume and quality of data they acquire during a policy’s life cycle. Most AI technologies will perform best when they have a high volume of data from a variety of sources. As such, carriers must develop a well-structured and actionable strategy with regard to both internal and external data. Internal data will need to be organized in ways that enable and support the agile development of new analytics insights and capabilities. With external data, carriers must focus on securing access to data that enriches and complements their internal data sets. The real challenge will be gaining access in a cost-efficient way. As the external data ecosystem continues to expand, it will likely remain highly fragmented, making it quite difficult to identify high-quality data at a reasonable cost. Overall, data strategy will need to include a variety of ways to obtain and secure access to external data, as well as ways to combine this data with internal sources. Carriers should be prepared to have a multifaceted procurement strategy that could include the direct acquisition of data assets and providers, licensing of data sources, use of data APIs, and partnerships with data brokers.

4. Create the right talent and technology infrastructure

In augmented chess, average players enabled by AI tend to do better than expert chess players enabled by the same AI. The reason for this counterintuitive outcome lies in whether the individual interacting with AI embraces, trusts, and understands the supporting technology. To ensure that every part of the organization views advanced analytics as a must-have capability, carriers must make measured but sustained investments in people. The insurance organization of the future will require talent with the right mindsets and skills. The next generation of successful frontline insurance workers will be in increasingly high demand and must possess a unique mix of being technologically adept, creative, and willing to work at something that will not be a static process but rather a mix of semiautomated and machine-supported tasks that continually evolve.

Generating value from the AI use cases of the future will require carriers to integrate skills, technology, and insights from around the organization to deliver unique, holistic customer experiences. Doing so will require a conscious culture shift for most carriers that will rely on buy-in and leadership from the executive suite. Developing an aggressive strategy to attract, cultivate, and retain a variety of workers with critical skill sets will be essential to keep pace. These roles will include data engineers, data scientists, technologists, cloud computing specialists, and experience designers. To retain knowledge while also ensuring the business has the new skills and capabilities necessary to compete, many organizations will design and implement reskilling programs. As a last component of developing the new workforce, organizations will identify external resources and partners to augment in-house capabilities, helping carriers secure the support needed for business evolution and execution.

The IT architecture of the future will also be radically different from today’s. Carriers should start making targeted investments to enable the migration to a more future-forward technology stack that can support a two-speed IT architecture.

Rapid advances in technologies in the next decade will lead to disruptive changes in the insurance industry. The winners in AI-based insurance will be carriers that use new technologies to create innovative products, harness cognitive learning insights from new data sources, streamline processes and lower costs, and exceed customer expectations for individualization and dynamic adaptation. Most important, carriers that adopt a mindset focused on creating opportunities from disruptive technologies—instead of viewing them as a threat to their current business—will thrive in the insurance industry in 2030.

Ramnath Balasubramanian and Ari Libarikian are senior partners in McKinsey’s New York office, and Doug McElhaney is a partner in the Washington, DC, office.

The authors would like to acknowledge the contributions of Gijs Biermans, Bayard Gennert, Nick Milinkovich, and Erik Summers.



To isolate the factors that can raise—or lower—your insurance rates, we took more than 2 billion price quotes and reverse-engineered them

Published: July 30, 2015


Here’s the Math

Price me by how I drive, not by who you think I am!

The price of car insurance should be based on how well and how much we drive. Instead, companies charge based on credit history, shopping behavior and more. Your state's insurance commissioner can do something about that.

Sign our petition.

  • 50 State Insurance Commissioners, Members: National Association of Insurance Commissioners (NAIC)


Take one more moment right now to demand immediate action for fair car insurance. Tweet to @naic_news or use our free 800 line to call your own state's insurance commissioner. The nation's insurance commissioners set standards for auto insurance and make recommendations to state lawmakers. If they say we need to base rates on how well and how far we drive, not on our credit score, our shopping history, or our gender, then state lawmakers will take notice and companies will begin to change. Call 1-855-384-6331. #fixcarinsurance @naic_news


  • Survey Paper
  • Open access
  • Published: 25 September 2019

A survey on driving behavior analysis in usage based insurance using big data

  • Subramanian Arumugam 1 &
  • R. Bhargavi 1  

Journal of Big Data volume 6, Article number: 86 (2019)


The emergence and growth of connected technologies and the adoption of big data are changing the face of all industries. In the insurance industry, Usage-Based Insurance (UBI) is the most popular use case of big data adoption. UBI started as a simple, unitary Pay-As-You-Drive (PAYD) model, in which classifying good and bad drivers remained an unresolved task. PAYD then progressed to the Pay-How-You-Drive (PHYD) model, in which the premium for personal auto insurance is charged according to post-trip analysis. The drawback of the PHYD model is that it provides no proactive alerts to guide the driver during the trip. PHYD in turn progressed to the Manage-How-You-Drive (MHYD) model, in which drivers receive proactive engagement in the form of alerts while they drive. Together, the PAYD, PHYD and MHYD models serve as the building blocks of UBI, and the introduction of the MHYD model helps the insurance industry bridge the gap between insurer and customer. A growing number of insurers around the world are launching PHYD or MHYD models, and widespread customer adoption is seen to improve driver safety through the monitoring of driving behavior. Consequently, the data flow between insurers and their customers is increasing exponentially, which makes big data adoption a foundational brick in the technology landscape of insurers. The focus of this paper is a detailed survey of the categories of MHYD. The survey identifies the need to address aggressive driving behavior and road rage incidents during short-term and long-term driving. The exhaustive survey is also used to propose a solution that estimates the risk posed by aggressive driving and road rage incidents by considering the behavioral and emotional factors of a driver. The outcome of this research would help insurers assess driving risk more accurately and calculate personalized premiums based on driving behavior, with the greatest emphasis on risk prevention.


An accident is defined as an unfortunate incident that happens unexpectedly and unintentionally, typically resulting in damage or injury. Considering all the consequences that could follow an accident, there is reason to believe that a normal person does not drive with an ex-ante intention to cause one. Holding a valid driving license is a prerequisite for driving in any part of the world, and during the licensing process people are educated about the driving rules and safety measures to be followed. Notwithstanding all this, accidents happen and, surprisingly, the human factor is considered the foremost cause of accidents. Reasons such as distraction, drunkenness, speeding, running red lights and stop signs, recklessness, road rage, aggressiveness and drowsiness rank among the topmost human factors.

Self-driving vehicles promise to reduce the human factor, but their widespread use is many decades away. Until then, the total number of passenger car registrations will continue to rise. The growing trend in the number of passenger car registrations across the world is shown in Table 1.

The number of car registrations was around 800 million in 2011 and had grown to around one billion by 2018. It is estimated that this number might cross 2 billion by 2050 [1]. With more vehicles congesting the roads, the probability of accidents is also increasing. Road traffic crash statistics by geographical region are shown in Fig. 1. For example, road traffic fatalities per 100,000 population are 10.3 for Europe, 18.5 for South East Asia and 24.1 for Africa.
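The registration and fatality figures above are simple ratios. The Python sketch below, offered only as an illustration, shows how a rate per 100,000 population is formed and what average annual growth the quoted registration figures imply, assuming compound growth between the stated years; the function names are hypothetical.

```python
def rate_per_100k(deaths: float, population: float) -> float:
    """Road traffic fatalities per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population * 100_000


def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


# Registrations: ~800 million (2011) -> ~1 billion (2018) -> ~2 billion (2050, projected)
growth_2011_2018 = implied_annual_growth(800e6, 1e9, 2018 - 2011)
growth_2018_2050 = implied_annual_growth(1e9, 2e9, 2050 - 2018)

print(f"2011-2018: {growth_2011_2018:.2%} per year")  # 2011-2018: 3.24% per year
print(f"2018-2050: {growth_2018_2050:.2%} per year")  # 2018-2050: 2.19% per year
```

For instance, a hypothetical region with 103 deaths in a population of 1,000,000 gives `rate_per_100k(103, 1_000_000)` of about 10.3, the order of the European figure quoted above.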

Fig. 1 Road traffic crash

Worldwide mortality from road accidents between 2013 and 2018 [2] is shown in Fig. 2. However, these figures do not reflect the full picture, because the deaths of pedestrians and cyclists in accidents are not included in the statistics; if those numbers were included, overall mortality would be higher. The 2018 statistics of the World Health Organization [3] point out that:

Around 1.3 million people die in road crashes each year, an average of 3287 deaths a day, and 20–50 million people are injured or disabled every year.

More than half of all road traffic deaths occur among young adults aged 15–44.

Road traffic crashes rank as the 9th leading cause of death and account for 2.2% of all deaths globally.

Unless remedial action is taken, road traffic injuries are likely to become the fifth leading cause of death by 2030.

Fig. 2 Number of deaths in road accidents by year

The causes of accidents fall into three categories: bad weather or bad infrastructure (rain, potholes on the road), vehicle malfunction (manufacturing defects or wear and tear) and human factors (physiological or behavioral). Physiological mistakes arise from driver fatigue and drowsiness, whereas behavioral mistakes can take many forms, such as distracted driving, drunk driving, aggressive driving, road rage, hard acceleration, hard braking, hard cornering and speeding. Aggressive driving and road rage are behaviors that can lead to fatal or non-fatal road accidents, incidents of physical violence and even murders.

Aggressive driving involves driving a motor vehicle in an unsafe and hostile manner without regard for others. It includes unsafe road behaviors such as frequent or unsafe lane changes, running red lights and stop signs, wrong-way driving, improper turns, tailgating and disregarding traffic controls.

Road rage is an angry driving behavior exhibited by the driver, which includes making rude gestures, making physical and verbal threats, and exhibiting dangerous driving methods targeted towards another driver in an effort to intimidate or release frustration.

As road accidents increase, the frequency of insurance claims made by policyholders increases proportionately. The primary reason for insurers to introduce UBI is to measure, realistically and accurately, the risk to which customers are exposed and to charge a risk-based premium suggested by an actuary. This pricing method implies that policyholders who exhibit higher risk while driving pay a higher premium. The PAYD model follows a simple premise: a policyholder who drives more miles during a year exposes the vehicle to more hours of on-road risk, and consequently faces a higher risk of an accident than a policyholder who drives fewer miles. Milometer readings are therefore used to factor risk exposure into the motor insurance premium.
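The PAYD premise can be sketched as a premium that grows linearly with the miles recorded between two milometer readings. The tariff structure and the figures below (a fixed base charge plus a per-mile rate) are hypothetical assumptions for illustration, not an actuarial model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaydTariff:
    base_premium: float   # hypothetical fixed annual charge
    rate_per_mile: float  # hypothetical charge per mile of exposure


def payd_premium(tariff: PaydTariff, reading_start: float, reading_end: float) -> float:
    """Annual PAYD premium from two milometer readings.

    Mileage is the only rating factor, so two drivers who cover the same
    distance pay the same premium however safely they drive.
    """
    miles = reading_end - reading_start
    if miles < 0:
        raise ValueError("milometer reading decreased")
    return tariff.base_premium + tariff.rate_per_mile * miles


tariff = PaydTariff(base_premium=400.0, rate_per_mile=0.0625)
print(payd_premium(tariff, 12_000, 17_000))  # 712.5   (5,000 miles)
print(payd_premium(tariff, 12_000, 24_000))  # 1150.0  (12,000 miles)
```

The linear form makes the limitation of PAYD visible: nothing in the calculation depends on how the miles were driven.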

The growth of connected technologies has led to mature UBI solutions and well-developed telematics solutions that closely assess driving behavior patterns and factor them into personalized pricing for each customer [4]. Over the years, the methods of collecting driver behavior data, the parameters collected and the frequency of collection have changed, and insurance companies are receiving a large volume of driver behavior data as a result. Four ways of collecting data from customers are listed below:

Black box: An electronic device installed in the car that records information related to vehicle crashes or accidents and has only one-way outward interaction after a crash.

Dongle: An electronic device that gives a server access to the vehicle network. The insurer installs the device in the vehicle, and it supports only one-way interaction.

Embedded: Car manufacturers provide embedded telematics equipment in vehicles, such as remote diagnostics devices, navigation sensors and infotainment services.

Smartphones: Smartphone-based telematics is the latest addition to the existing methods. Smartphones work either as stand-alone devices or linked to the vehicle's information system to transmit a variety of information from the car. Smartphones can gather data in the following ways:

Smartphone sensors: In-built sensors such as accelerometers, gyroscopes and magnetometers are used. The advantages of using in-built sensors are low cost and low implementation complexity for detecting behavioral events such as hard acceleration and hard braking. The drawbacks are the need to keep the smartphone in a stable position and the difficulty of identifying hard cornering events. Table 2 shows a sample of research papers in which in-built smartphone sensors are used.

Global Positioning System (GPS) data: The GPS signal provides speed, latitude, longitude, course and altitude. Driver behavior events such as hard acceleration, hard braking, hard cornering and over speeding are detected from these values using big data technologies and different algorithms. The advantages are that the smartphone need not be kept in a stable position and that hard cornering events can be identified. The drawbacks are complex implementation and high computational cost. Table 3 shows a sample of industry solutions in which GPS data are used.
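As a rough illustration of how events can be derived from GPS values, the sketch below approximates longitudinal acceleration from consecutive speed fixes and flags hard acceleration, hard braking and over speeding. The thresholds and function names are hypothetical; hard cornering would additionally need course (heading) changes and is omitted here.

```python
from typing import List, Tuple

# Hypothetical thresholds in m/s^2; real programmes calibrate these.
HARD_ACCEL_THRESHOLD = 3.0
HARD_BRAKE_THRESHOLD = -3.0


def detect_events(samples: List[Tuple[float, float]],
                  speed_limit: float) -> List[Tuple[float, str]]:
    """Flag hard acceleration, hard braking and over speeding.

    `samples` is a time-ordered list of (timestamp_s, speed_m_s) pairs taken
    from GPS fixes. Longitudinal acceleration is approximated by the finite
    difference of consecutive speeds.
    """
    events: List[Tuple[float, str]] = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order fixes
        accel = (v1 - v0) / dt
        if accel >= HARD_ACCEL_THRESHOLD:
            events.append((t1, "hard_acceleration"))
        elif accel <= HARD_BRAKE_THRESHOLD:
            events.append((t1, "hard_braking"))
        if v1 > speed_limit:
            events.append((t1, "over_speeding"))
    return events


trace = [(0, 8.0), (1, 12.0), (2, 14.5), (3, 9.0), (4, 10.0)]
print(detect_events(trace, speed_limit=13.9))
# [(1, 'hard_acceleration'), (2, 'over_speeding'), (3, 'hard_braking')]
```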

Most research papers discuss using both GPS and in-built sensors to collect driving data. To avoid the limitations of in-built sensors, the proposed solution plans to use GPS to collect data from drivers. As soon as a driver starts a journey, the server starts receiving the various attributes of the behavior data at regular intervals. The data from all drivers at any given moment will be so large that traditional data management tools will not be able to store or process it efficiently. Big data technology makes it possible to handle this data deluge of huge volume, high velocity and veracity. While existing telematics solutions focus more on driver aberrations such as over speeding, hard acceleration and hard braking, this survey of MHYD found a need for a solution that detects aggressive and road rage drivers while they drive.

The proposed solution identifies these aberrations using big data and machine learning technologies. The outcome of this research is to alert drivers during inferred incidents of aggressive driving and road rage, to improve safety, and to calculate personalized motor insurance premiums.

The rest of the paper is organized as follows. The “Introduction” section introduces insurance and big data technologies. The “Background and overview” section briefly introduces big data and insurance, the types of insurance, UBI and the different types of UBI. The “Related works” section introduces the models of UBI and presents the classifications of MHYD, such as driving pattern monitoring, fatigue monitoring, drowsiness detection and driver distraction. The “Proposed solution for personalized premium calculation” section proposes a solution for detecting driving behavior and highlights its benefits. The “Conclusion and future research directions” section offers conclusions and future directions.

Background and overview

Big data is generally characterized by the “three V’s” principle put forward by Laney in 2001: an increasingly huge volume of data; a variety of data, including raw, unstructured and semi-structured data; and the velocity of the data, denoting the fact that data are produced, harvested and analyzed in real time. Gartner defines big data as “high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.” Over the years, other definitions have evolved, increasing the number of V’s from three to ten. Figure 3 shows the ten characteristics of big data [5] introduced by Firican in 2017.

Fig. 3 Characteristics of big data

Volume: Representation of the size of data, which plays a crucial role in determining the value of data. Insurance companies receive huge volumes of data every second from different drivers, which helps them predict driving behavior accurately.

Velocity: Representation of the speed of data generation from various insurance customers. Driving data is created, saved, analyzed and visualized at an increasing speed, making it possible to predict the driving behavior and visualize high volumes of data in the real driving environment.

Variety: Representation of heterogeneous data formats and the nature of the data, both structured and unstructured. In earlier days, insurance companies managed driving data using spreadsheets and traditional databases; today the insurance industry faces challenges in storing, mining and analyzing driving data.

Veracity: Representation of the quality or trustworthiness of the data. The data collected by the insurance company must be accurate to generate value and to help the enterprise predict driving behavior accurately.

Value: Representation of the business value of the data. The insights gleaned from big data can help insurance companies improve customer engagement, optimize operations, prevent threats and fraud, and capitalize on new sources of revenue.

Visualization: Representation of data in pictorial form to visualize its volume, velocity and variety. The insights or data gained by the insurance company are shared with insurance customers.

Variability: Representation of data in multiple ways. The challenges of big data for insurance companies not only arise from the sheer volume of data but also from the fact that the data are generated in multiple forms as a mix of unstructured and structured data.

Validity: Representation of accuracy and correctness of data. The personalized premium generated by the insurance companies should be accurate.

Vulnerability: Representation of the weaknesses that allow a potential threat to become an attack. Changes made to the big data by the insurance company should not harm the system or the premium calculation engine.

Volatility: Representation of the availability of data. Given the velocity and volume of big data in the insurance company, its volatility needs to be carefully considered. The insurance company needs to establish data rules to ensure rapid retrieval of customer information when required.

Technologies for creating, transmitting, storing and analyzing data have made giant strides in recent years. The fourth industrial revolution is manifesting in an exponential growth in the volume of available data, together with significant improvements in computational power and storage and the development of sophisticated algorithms to glean insights. Industries from all verticals, including insurance, are investing heavily in research and innovation in big data processing and analytics. Figure 4 shows the increasing investments in big data from 2012 to 2018. Insurers all over the world plan to increase their investments in big data over the next 3 to 5 years [6].

Fig. 4 Big data investments between 2012 and 2018

Driven by big data and the insights it provides through rich visualizations, companies are changing their core business propositions as well as their products and services. An IBM survey [7] of the latest trends in big data found that 74 percent of insurance companies have a yearly report highlighting the use of big data processing and analytics to create a competitive advantage over other industries. A study by Madan [8] reveals that almost 2.5 quintillion bytes of data are created each day by various sources.

The concept of insurance revolves around predicting future risk events, calculating the losses that could arise from them and setting the premium to charge for the underlying risk transfer. Insurance can be broadly classified as life or property and casualty (non-life) insurance, depending on the nature of the subject insured. Depending on whether the customer is an individual or a company, it can be further categorized into personal or commercial lines. Insurers have always depended on several types of data from multiple sources to infer their causal and correlative association with risk events.

Advances in big data and analytics are now transforming the insurance industry, as insurers can leverage big data to identify more accurate causal associations. In motor insurance, insurers have traditionally looked at the customer profile, historical claims and information from public records, such as traffic police fines or violations, to classify policyholders as good or bad drivers. UBI is a recent innovation in auto insurance that originated from the explosion of telematics data. Instead of depending on proxies and spurious correlations, insurers can now know exactly how a person drives a vehicle and calculate the premium accurately. UBI started with personal lines catering to individual customers and is now expanding to commercial lines, where it is applied to commercial fleet insurance.

UBI can be categorized into the following three types:

Pay-As-You-Drive (PAYD): The premium is calculated based on the number of miles driven, as recorded by the milometer. The milometer reading is the only parameter considered, so there is no distinction between good and bad drivers, and both pay the same premium.

Pay-How-You-Drive (PHYD): The premium is calculated based on the customer's driving pattern, such as speeding, hard acceleration, hard braking and hard cornering. A driver score is computed at the end of every trip, and the individual trip scores are normalized over the policy year to assess the driver's overall behavior, which determines the premium and discount.

Manage-How-You-Drive (MHYD): The premium is calculated in the same way as in PHYD, with the addition of real-time alerts and suggestions to the driver to ensure safety.
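The PHYD scoring step described above (a score per trip, normalized over the policy year) might be sketched as follows. The penalty weights, the distance-weighted averaging used for normalization and the linear discount curve are all assumptions made for illustration; insurers use their own, typically proprietary, formulas.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trip:
    distance_km: float
    hard_accelerations: int
    hard_brakings: int
    hard_cornerings: int
    speeding_events: int


# Hypothetical penalty points per event type.
PENALTIES = {"accel": 2.0, "brake": 3.0, "corner": 2.0, "speed": 4.0}


def trip_score(trip: Trip) -> float:
    """Score a single trip on a 0-100 scale (higher means safer)."""
    penalty = (PENALTIES["accel"] * trip.hard_accelerations
               + PENALTIES["brake"] * trip.hard_brakings
               + PENALTIES["corner"] * trip.hard_cornerings
               + PENALTIES["speed"] * trip.speeding_events)
    return max(0.0, 100.0 - penalty)


def policy_year_score(trips: List[Trip]) -> float:
    """Normalize trip scores over the policy year by weighting each by distance."""
    total_km = sum(t.distance_km for t in trips)
    if total_km <= 0:
        raise ValueError("no distance driven in the policy year")
    return sum(trip_score(t) * t.distance_km for t in trips) / total_km


def discount(score: float, max_discount: float = 0.30) -> float:
    """Hypothetical linear discount: 0 at a score of 50 or less, max at 100."""
    return max_discount * min(1.0, max(0.0, (score - 50.0) / 50.0))


year = [Trip(30.0, 1, 0, 0, 0), Trip(10.0, 2, 3, 1, 2)]
score = policy_year_score(year)
print(score)  # 92.75
print(f"{discount(score):.2%}")
```

Weighting by distance is one design choice among several; it stops a handful of short, eventful trips from dominating the yearly score.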

Over the years, the PAYD and PHYD models have reasonably stabilized with the emergence of a few dominant designs; however, MHYD is still evolving and more research is in progress, as shown in Fig. 5. The following sections present a thorough survey of four categories of MHYD (pattern monitoring, fatigue monitoring, drowsiness detection and driver distraction). A solution is also proposed to calculate the personalized premium from driving behavior using MHYD under UBI. The focus of this paper is only on the MHYD model.

Fig. 5 Classification of UBI

Related works

Recent studies have shown that customer satisfaction is higher among good drivers who participate in UBI programs [9]. UBI programs reward the best drivers and help insurance companies not only to classify and price them appropriately but also to engage with them and build better relationships. In this section, a review of existing work on MHYD is presented, and the classification is shown in Table 4.

The monitoring approach differs across the MHYD classifications, as summarized in Fig. 6. Driving pattern monitoring uses On-Board Diagnostics (OBD), smartphone sensors or GPS to capture data from the real driving environment.

Fig. 6 Summary of monitoring approaches

Driving pattern monitoring

The optimal way to prevent accidents is to monitor the driving pattern and alert the driver whenever an abnormal event occurs. The policyholder can also earn additional premium discounts for good driving behavior. Driving pattern monitoring falls into three classifications: pattern monitoring using GPS, using mobile phone sensors and using OBD, which are summarized in the following subsections.

Pattern monitoring using GPS

IBM Corporation [6] has developed geo-fencing services that help keep teen and commercial drivers within a defined boundary. If the driver goes beyond the boundary, the vehicle displays an alert on the car's dashboard screen and the designated contact also receives an alert. The parameters used are speed, hard braking, hard acceleration and hard cornering. In addition to the alerts, an insurance capability called “Next generation First Notice of Loss” is also provided.
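A geo-fence check of the kind described for IBM's service can be sketched with the haversine great-circle distance. This is an illustration under stated assumptions: the fence is a circle around a centre point, the Earth is treated as a sphere, and `notify` is a hypothetical stand-in for the dashboard and designated-contact alerts.

```python
import math
from typing import Callable, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical-Earth assumption

LatLon = Tuple[float, float]


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def inside_geofence(center: LatLon, radius_m: float, position: LatLon,
                    notify: Callable[[str], None]) -> bool:
    """Return True inside the circular fence; otherwise call `notify` and return False."""
    distance = haversine_m(center, position)
    if distance > radius_m:
        notify(f"Vehicle is {distance - radius_m:.0f} m outside the permitted area")
        return False
    return True


home = (40.0, -75.0)
print(inside_geofence(home, 5_000, (40.01, -75.0), print))  # True (about 1.1 km away)
print(inside_geofence(home, 5_000, (40.10, -75.0), print))  # prints the alert, then False
```

Polygonal fences would need a point-in-polygon test instead of a radius check; the circle keeps the sketch short.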

Allstate (a leading US insurance company) [10] has introduced a solution called Drivewise, which rewards drivers for everyday safe driving. Using big data technologies, the solution records the time and location of the vehicle during trips, the number of trips per day, the vehicle's speed, hard braking and mileage. After adopting Drivewise, Allstate reported that usual claims were reduced by 12%; the solution focuses only on PHYD. The proposed solution focuses on MHYD in addition to PHYD.

TD Insurance (a leading Canadian insurance company) [11] has adopted a solution called TD My Advantage, which collects and analyzes driving data and assigns a driving score to each trip. The insurance premium is calculated from the assigned score. The solution records speed, hard braking, acceleration and cornering using big data technologies. Even though it records all these behavioral parameters, it uses only the speed parameter to alert the driver. In the proposed solution for personalized premium calculation, all the behavioral parameters will be used to alert abnormal drivers.

Progressive (ranked as one of the best insurance companies in the United States) [12] has implemented a UBI program called Snapshot, which personalizes the insurance amount based on actual driving behavior. Snapshot rewards the average driver with a $130 discount, and drivers also get an automatic discount just for signing up. The program focuses only on post-trip analysis and user rewards. In our proposed model, proactive alerts will guide the driver while the trip is in progress.

State Farm (a large group of insurance and financial services companies in the United States) [13] has launched a UBI program called “Drive Safe and Save”. The recorded data include hard acceleration, hard braking, speeding and the time of day the vehicle is driven. The solution focuses on PHYD and also provides additional services such as roadside assistance, maintenance alerts and a stolen vehicle locator. The proposed research will provide an enhanced MHYD solution in addition to PHYD.

Nationwide (a leading US insurance company) [14] has presented a UBI program called SmartRide. It analyzes the collected data and gives personalized feedback to help drivers drive safely, in addition to providing discounts from 5% up to 40% for good drivers. The only parameters used are hard braking and hard acceleration. The program provides no real-time notifications, and drivers receive feedback only after the trip is over. In our research, we plan to notify abnormal drivers in real time, during the trip, as soon as an abnormal event occurs.
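The difference between post-trip feedback and the real-time notification planned in the proposed solution can be shown in a few lines. In this hypothetical sketch, events are always recorded for scoring, but only the real-time (MHYD-style) mode alerts the driver the moment an event arrives; the class and callback names are illustrative.

```python
from typing import Callable, List


class TripMonitor:
    """Contrast PHYD-style post-trip feedback with MHYD-style live alerts.

    `alert` is a hypothetical delivery callback (e.g. a voice prompt).
    """

    def __init__(self, real_time: bool, alert: Callable[[str], None]) -> None:
        self.real_time = real_time
        self.alert = alert
        self.events: List[str] = []

    def on_event(self, event: str) -> None:
        self.events.append(event)            # always kept for premium scoring
        if self.real_time:
            self.alert(f"Warning: {event}")  # MHYD: alert while driving

    def end_trip(self) -> List[str]:
        trip_events = list(self.events)
        if not self.real_time:
            # PHYD-style: the driver hears about the trip only afterwards
            self.alert(f"{len(trip_events)} abnormal event(s) on this trip")
        self.events.clear()
        return trip_events  # handed on to premium scoring


live: List[str] = []
monitor = TripMonitor(real_time=True, alert=live.append)
monitor.on_event("hard_braking")
monitor.end_trip()
print(live)  # ['Warning: hard_braking']
```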

Pattern monitoring using mobile phone sensors

Yu et al. [15] have proposed a system called D3 (“Fine-grained abnormal driving behavior detection and identification system”) to detect abnormal driving behavior in real time with high accuracy. SVM and neural network (NN) algorithms are used to detect abnormality. The authors collected 6 months of driving traces from a real driving environment. The parameters used are hard cornering and hard braking. D3 achieved an average total accuracy of 95.36 percent with the SVM classifier and 96.88 percent with the NN classifier. To improve accuracy, the proposed solution for personalized premium calculation plans to use at least 12 months of driving traces from the real environment.

Shi et al. [16] have proposed a solution that normalizes driving behavior based on a personalized driver model. Considering only the speed parameter, K-means clustering and neural network algorithms are used. The authors test the system only on simulated data, so real-time driving data are lacking. The proposed solution explores the use of driving traces from the real environment.
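Shi et al. use K-means clustering on the speed parameter. The generic one-dimensional K-means (Lloyd's algorithm) below illustrates the clustering step only, for example grouping average trip speeds into driving regimes; it is not the authors' personalized driver model, and the deterministic initialization is a simplifying assumption (k-means++ is a common alternative).

```python
from typing import List, Sequence, Tuple


def kmeans_1d(values: Sequence[float], k: int,
              max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's K-means on scalar data, e.g. average trip speeds in km/h."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    ordered = sorted(values)
    # Deterministic start: centroids spread across the sorted values.
    centroids = [float(ordered[i * (len(ordered) - 1) // max(k - 1, 1)])
                 for i in range(k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every value.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: mean of each cluster (keep old centroid if a cluster empties).
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = updated
    return centroids, labels


speeds = [42, 45, 47, 78, 80, 83, 120, 118]
centroids, labels = kmeans_1d(speeds, k=3)
print([round(c, 1) for c in centroids])  # [44.7, 80.3, 119.0]
print(labels)                            # [0, 0, 0, 1, 1, 1, 2, 2]
```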

Liu et al. [17] have designed a system called “Deep Sparse Auto Encoder (DSAE)”, which extracts hidden features for visualization using a driving behavior visualization method called a driving color map; this maps the extracted 3-D hidden features to the red, green and blue color space. The generated colors do not tend to appear biased (e.g., reddish or bluish). However, the visualization can yield different results under rotation in the color space, even for the same data. The parameters used are hard braking and hard cornering, and deep learning algorithms are used. In the proposed research, the plan is to use hard acceleration, hard cornering, hard braking and speed with different machine learning algorithms.
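One simple way to realize a driving color map is to rescale each of the three hidden-feature dimensions to the 0 to 255 range of an RGB channel. The sketch below assumes per-dimension min-max scaling over a trip; the mapping actually used in DSAE may differ, and the feature vectors would come from the trained encoder, which is not shown.

```python
from typing import List, Sequence, Tuple


def driving_color_map(features: Sequence[Sequence[float]]) -> List[Tuple[int, int, int]]:
    """Map 3-D hidden features (one vector per time step) to RGB colors.

    Each dimension is min-max scaled over the whole sequence to 0-255, so a
    color shows where a time step sits relative to the rest of the trip.
    """
    if not features:
        return []
    if any(len(f) != 3 for f in features):
        raise ValueError("expected 3-D feature vectors")
    lows = [min(f[d] for f in features) for d in range(3)]
    highs = [max(f[d] for f in features) for d in range(3)]
    colors: List[Tuple[int, int, int]] = []
    for f in features:
        channels = []
        for d in range(3):
            span = highs[d] - lows[d]
            scaled = (f[d] - lows[d]) / span if span else 0.0
            channels.append(round(255 * scaled))
        colors.append((channels[0], channels[1], channels[2]))
    return colors


hidden = [[0.0, 1.0, 2.0], [1.0, 0.0, 2.0], [0.5, 0.5, 4.0]]
print(driving_color_map(hidden))  # [(0, 255, 0), (255, 0, 0), (128, 128, 255)]
```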

Daptardar et al. [18] have experimented with a technique that uses a Hidden Markov Model (HMM) to detect lateral maneuvers and a jerk-energy-based technique to detect longitudinal maneuvers. The parameters used are hard acceleration and hard braking. Only an Android version is available, and the accuracy of the system is 95%. In the energy-based HMM research, the authors considered behavioral factors alone. In the proposed solution for premium calculation, both behavioral and emotional factors will be used with different machine learning algorithms to improve accuracy.
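Jerk is the time derivative of acceleration, so a jerk-energy measure rewards smooth longitudinal control and penalizes abrupt changes. The sketch below uses one common formulation (finite-difference jerk, squared and summed over the window, scaled by the sampling interval); the exact definition and threshold used by Daptardar et al. are not given here, so both are assumptions.

```python
from typing import Sequence

# Hypothetical threshold separating smooth from harsh longitudinal maneuvers.
JERK_ENERGY_THRESHOLD = 5.0


def jerk_energy(accels: Sequence[float], dt: float) -> float:
    """Approximate integral of squared jerk over a window of accelerations.

    `accels` are longitudinal accelerations (m/s^2) sampled every `dt` seconds;
    jerk (m/s^3) is their finite-difference derivative.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    jerks = [(a1 - a0) / dt for a0, a1 in zip(accels, accels[1:])]
    return sum(j * j for j in jerks) * dt


def is_harsh(accels: Sequence[float], dt: float) -> bool:
    """Flag a window whose jerk energy exceeds the hypothetical threshold."""
    return jerk_energy(accels, dt) > JERK_ENERGY_THRESHOLD


smooth = [0.5, 0.6, 0.7, 0.8]    # gentle, steady change in acceleration
abrupt = [0.5, -2.5, -3.0, 0.0]  # sudden braking, then release
print(round(jerk_energy(smooth, 0.5), 3), is_harsh(smooth, 0.5))  # 0.06 False
print(round(jerk_energy(abrupt, 0.5), 3), is_harsh(abrupt, 0.5))  # 36.5 True
```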

Zhao et al. [19] have applied the PHYD model to insurance. UBI is introduced into personal motor insurance, with premium assessment based on the time of usage, the distance driven and driving behavior. The UBI technology uses dongles, black boxes, embedded devices and smartphones, and UBI offers cashback, premium discounts and value-added services. Cars with telematics devices capture driving behavior information, which is analyzed and sent to the insurance company to calculate the customer's premium. Pricing is based on kilometers driven and on speeding, sharp parking and sudden acceleration per 100 km driven. The focus is on PHYD, whereas the proposed solution for personalized premium calculation focuses on MHYD with live voice alerts.
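Expressing events per 100 km, as in the pricing scheme of Zhao et al., normalizes behavior by exposure, so that a driver who covers a long distance is not penalized simply for driving more. The sketch below assumes event counts and trip distance are already available; the weights in the loading function are hypothetical.

```python
from typing import Dict, Mapping


def events_per_100km(counts: Mapping[str, int], distance_km: float) -> Dict[str, float]:
    """Normalize raw event counts by exposure: events per 100 km driven."""
    if distance_km <= 0:
        raise ValueError("distance must be positive")
    return {name: n * 100.0 / distance_km for name, n in counts.items()}


def risk_loading(rates: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Hypothetical premium loading: weighted sum of per-100-km event rates."""
    return sum(weights.get(name, 0.0) * rate for name, rate in rates.items())


counts = {"speeding": 6, "sharp_parking": 2, "sudden_acceleration": 4}
rates = events_per_100km(counts, distance_km=400.0)
print(rates)  # {'speeding': 1.5, 'sharp_parking': 0.5, 'sudden_acceleration': 1.0}

weights = {"speeding": 0.04, "sharp_parking": 0.02, "sudden_acceleration": 0.03}
print(f"loading: {risk_loading(rates, weights):.1%}")  # loading: 10.0%
```

A driver with 3 speeding events over 200 km gets the same rate (1.5 per 100 km) as one with 6 over 400 km.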

Tselentis et al. [9] have proposed a usage-based motor insurance solution for PAYD and PHYD, in which drivers pay a premium based on their driving behavior and degree of exposure. Drivers receive a financial incentive to improve their driving behavior, for example by reducing the number of harsh braking and acceleration events, or to reduce their degree of exposure, for example their annual mileage and the times of day they travel, which considerably reduces traffic risk. The authors focused on PAYD/PHYD and on providing personalized premiums to customers, with less emphasis on safety. The proposed model focuses on personalized premium calculation with a strong emphasis on safety as well.

Hu et al. [20] have designed a personalized driver model using a locally designed neural network and real-world Vehicle Test Data (VTD). In addition, an abnormality index is proposed to evaluate abnormal driving behavior quantitatively. The parameters used are speed, hard braking and hard acceleration. The authors also explained that blood pressure and blood alcohol level are useful physiological signals for indicating abnormal behavior, and they discussed in detail the importance of using behavioral, psychological, environmental and emotional factors to detect abnormal driving behavior. The lack of real-time driving data is considered a drawback of the VTD-based system. The proposed system for personalized premium calculation explores including emotional factors along with behavioral factors for driving behavior detection.
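The abnormality index of Hu et al. is built on their neural-network driver model, which is not reproduced here. As a loosely analogous stand-in, the sketch below scores how far the current driving signals deviate from the driver's own history using the mean absolute z-score; the signal names and the choice of z-scores are assumptions.

```python
import statistics
from typing import Mapping, Sequence


def abnormality_index(history: Mapping[str, Sequence[float]],
                      current: Mapping[str, float]) -> float:
    """Mean absolute z-score of current signals against the driver's own history.

    A stand-in for illustration only: 0 means "typical for this driver",
    larger values mean the trip departs further from the personal baseline.
    """
    z_scores = []
    for signal, past in history.items():
        if signal not in current or len(past) < 2:
            continue  # need at least two past values for a standard deviation
        sigma = statistics.stdev(past)
        if sigma == 0:
            continue  # no variation, so no meaningful z-score
        z_scores.append(abs(current[signal] - statistics.mean(past)) / sigma)
    return statistics.mean(z_scores) if z_scores else 0.0


history = {"speed_kmh": [50, 52, 48, 50], "hard_brakes_per_100km": [1, 3, 2, 2]}
print(round(abnormality_index(history, {"speed_kmh": 54, "hard_brakes_per_100km": 2}), 2))  # 1.22
```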

Zhou et al. [21] have identified aggressive/risky driving behavior patterns on horizontal curves using real-field Basic Safety Message (BSM) data. The parameters used are hard acceleration and hard braking. The Private Usage-Based Scoring (Pri-UBS) algorithm and the Probabilistic Usage Data Audition (Pro-UDA) protocol are used to identify abnormality. The authors rightly note that many environmental factors, such as real-time traffic and traffic regulations, could influence driving speed.

Pattern monitoring using OBD

Bergasa et al. [22] have built the DriveSafe mobile application for iPhone using the phone's camera, microphone, GPS and sensors (accelerometer, gyroscope). DriveSafe uses lane drifting/weaving, acceleration, braking and turning events to analyze driving behavior and calculate a driving score; it detects inattentive driving behaviors and generates alarms in unsafe situations, while encouraging a safer driving style. DriveSafe classifies drivers into two categories: normal and abnormal. The proposed solution for personalized premium calculation will classify drivers into four categories based on their driving behavior and health condition to improve driver safety.

Zhang et al. [23] have built a system called SafeDrive that detects abnormal driving behaviors from large-scale vehicle data using a State Graph (SG). The parameters used are hard acceleration and hard braking, and the accuracy of the SG system is 93%. The SG-based research considers behavioral factors but gives less importance to personalized premium calculation. The proposed solution for personalized premium calculation addresses the important objective of detecting abnormal driving behavior using both behavioral and emotional factors.

Nai et al. [24] have modeled UBI using the Fuzzy Risk Mode and Effect Analysis (FRMEA) method, assessing the risk level of each insured vehicle by analyzing driving style from the raw data collected by an OBD module. The risk modes used are jerking at low speed, constant speed changing and jerking at high speed. The parameters used are speed, hard acceleration and hard braking. Since the authors used the OBD device to collect raw data, alerting abnormal drivers is not considered. The proposed solution plans to use GPS to collect the raw data and alert abnormal drivers.

Several methods have been proposed by different researchers to monitor the driving pattern. Researchers around the world have attempted to capture real driving data using OBD devices, mobile sensors or GPS data. The parameters used by most researchers are speed, hard braking, acceleration and cornering. The algorithms used to monitor the driving pattern differ across these works, and each has its own advantages and disadvantages. Reference [6] developed a solution for teen monitoring. References [9, 10, 19] focused on providing reward points based on driving behavior, and [11, 12, 13, 14] on providing personalized premiums based on driving behavior. References [15, 16, 17, 18] used machine learning, big data and deep learning algorithms to classify driving behavior, and [20, 23] suggested considering factors beyond behavioral ones to detect driving behavior.

Fatigue monitoring

Fatigue monitoring is the use of technology to monitor a driver's behavior and determine their level of fatigue while driving [25]. The benefits of fatigue monitoring include improved decision-making and response times, increased productivity and a considerable reduction in accident severity. Various studies have suggested that around 20% of all road accidents are fatigue-related [26]. The following are some related works in this space:

Warwick et al. [27] have investigated a system that calculates the fatigue level from an eye image. If three out of five consecutive frames show closed eyes, an alert is issued to the driver. The method is simple, but its accuracy is low because it uses only one source of data, the eye image.
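The three-of-five-frames rule lends itself to a sliding window. The sketch assumes an upstream classifier (not shown) already labels each frame as eyes open or closed; the generator, whose name is illustrative, yields the frame indices at which an alert would be issued.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def eye_closure_alerts(frames: Iterable[bool], window: int = 5,
                       threshold: int = 3) -> Iterator[int]:
    """Yield the index of every frame at which an alert would be issued.

    `frames` is a stream of per-frame eye states (True = eyes closed) from an
    upstream eye-state classifier. An alert fires whenever at least `threshold`
    of the last `window` frames are closed.
    """
    recent: Deque[bool] = deque(maxlen=window)
    for i, closed in enumerate(frames):
        recent.append(closed)
        if len(recent) == window and sum(recent) >= threshold:
            yield i


states = [False, True, False, True, True, False, False, False]
print(list(eye_closure_alerts(states)))  # [4, 5]
```

Because overlapping windows can trigger on consecutive frames, a deployed system would probably debounce repeated alerts.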

Podder et al. [28] have developed a fatigue monitoring system using machine vision and the AdaBoost algorithm, with face and eye classifiers. Image preprocessing, face detection, eye state recognition and fatigue evaluation are used to identify the fatigue level. The system is complex to implement.

Chaitali et al. [29] have tested a driver fatigue detection and monitoring system using a smartphone. The driver's fatigue state is estimated from the eye blinking rate, yawning detection by tracking the mouth, head rotation detection, gaze tracking for detecting driver distraction, and stress detection from tracking the driver's facial expression. The smartphone's front camera captures the driver's image, and the rear camera is used for traffic sign detection. Big data and open computer vision technologies are used to track the face in images.

Qiao et al. [30] have presented a fatigue monitoring system that uses eye, mouth and face images. The phone's built-in camera is used to detect the driver's eyes. The face and eye blinks are captured using Haar-like features, and the mouth is detected for yawning with the Canny Active Contour Method. The recorded video is split into frames, which are then processed in real time.

Hu et al. [ 20 ] developed a machine-learning model that detects abnormal driving by analyzing normalized driving behavior. Three typical abnormal driving patterns are characterized and simulated: fatigued/drunk driving, reckless driving and phone use while driving. Only simulated data are used. Based on the analysis of normalized driving behavior, an abnormality index is proposed.
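The idea of scoring behavior against a normalized baseline can be made concrete with a small sketch. It is not the abnormality index defined in [ 20 ]. For illustration only, it assumes that normalization means z-scores against the driver's own historical trips, and that the index is the mean absolute z-score across features. The feature names `speed` and `hard_brakes` are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Trip = Dict[str, float]  # feature name -> value for one trip (hypothetical features)


def baseline(history: List[Trip]) -> Dict[str, Tuple[float, float]]:
    """Per-feature (mean, population std) over a driver's past trips."""
    return {
        f: (mean(t[f] for t in history), pstdev([t[f] for t in history]))
        for f in history[0]
    }


def abnormality_index(trip: Trip, base: Dict[str, Tuple[float, float]]) -> float:
    """Mean absolute z-score of a trip relative to the driver's own baseline."""
    scores = [abs(trip[f] - mu) / sigma for f, (mu, sigma) in base.items() if sigma > 0]
    return mean(scores) if scores else 0.0  # constant features are skipped


history = [{"speed": 50.0, "hard_brakes": 1.0}, {"speed": 60.0, "hard_brakes": 3.0}]
base = baseline(history)
print(abnormality_index({"speed": 65.0, "hard_brakes": 5.0}, base))  # 2.5
```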

Mandal et al. [ 31 ] proposed a system built from the following modules:

- head-shoulder detection
- face detection
- eye detection
- eye openness estimation
- fusion
- drowsiness measurement by percentage of eyelid closure (PERCLOS) estimation
- fatigue level classification

The system is considered easy and flexible to deploy in commercial vehicles, but the approach may not suit private vehicles such as cars.
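PERCLOS is commonly computed as the fraction of frames in a time window during which the eyes are at least 80% closed. The sketch below assumes that an upstream module supplies a per-frame eye-openness estimate in [0, 1]. The 80% closure criterion and the 0.15 fatigue limit are illustrative assumptions, not values taken from [ 31 ].

```python
from typing import Sequence


def perclos(openness: Sequence[float], closure_level: float = 0.8) -> float:
    """Fraction of frames whose eyelid closure is at least `closure_level`.

    `openness` holds per-frame eye-openness estimates in [0, 1]
    (1.0 = fully open), so closure = 1 - openness.
    """
    if not openness:
        raise ValueError("PERCLOS needs at least one frame")
    closed = sum(1 for o in openness if 1.0 - o >= closure_level)
    return closed / len(openness)


def is_fatigued(openness: Sequence[float], limit: float = 0.15) -> bool:
    """Flag fatigue when PERCLOS exceeds an assumed limit (0.15 is illustrative)."""
    return perclos(openness) > limit


window = [0.1, 0.1] + [0.9] * 8  # two nearly closed frames out of ten
print(perclos(window), is_fatigued(window))  # 0.2 True
```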

Al-Sultan et al. [ 32 ] studied driver behavior detection using a context-aware system in Vehicular Ad hoc Networks (VANETs). The system detects abnormal behavior exhibited by drivers and warns other vehicles. VANETs use Dedicated Short-Range Communication (DSRC) to let vehicles in close proximity communicate with each other or with roadside equipment. Four driver states are monitored: normal, drunk, reckless and fatigued. Each abnormal state is estimated from a different signal:

- fatigue from eye movements
- reckless driving from the driver's acceleration
- intoxication from speed control

A speed sensor, an accelerometer, GPS, cameras and an alcohol sensor are used. They work in coordination with traffic management centers, which provide information on traffic, weather and road conditions, along with adaptive hello messages (a vehicle alarm system).

Several methods have been proposed to monitor driver fatigue. Cameras and wearable devices are used to extract visual cues and contextual information, and the advantages and disadvantages of each technique have been described above. The studies [ 27 , 32 ] monitored fatigue using only the eye image; [ 28 , 30 , 31 ] used the face and eye blinking; and [ 29 ] used the eye image, eye blinking and head rotation.

Drowsiness detection

Drowsiness detection is a car safety technology that helps prevent accidents caused by the driver becoming drowsy. The following are some current systems that learn driver styles and detect when a driver is becoming drowsy.

Bergasa et al. [ 22 ] described a driver drowsiness detection system in which the driver's heart rate and breathing rate are measured with the BioHarness 3 sensor from Zephyr Technology. Detection relies on the premise that the breathing rate goes down and the heart rate goes up when a person falls asleep. The sensor data are processed with a filtering algorithm and the fast Fourier transform (FFT).
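The rate-estimation step can be illustrated with a spectral peak search. First window the sensor signal, then take its discrete Fourier transform, and finally read off the dominant frequency within a plausible band. The sketch below uses a plain direct DFT from the standard library so that it stays self-contained; a real system would use an FFT routine such as NumPy's. The Hamming window, the sampling rate and the default breathing band of 0.1 to 0.7 Hz are our assumptions, not parameters reported in [ 22 ]. The band can be moved to estimate heart rate instead.

```python
import cmath
import math
from typing import Sequence


def dominant_rate_per_min(signal: Sequence[float], fs: float,
                          f_lo: float = 0.1, f_hi: float = 0.7) -> float:
    """Estimate a periodic rate in cycles per minute from a sampled signal.

    Removes the mean, applies a Hamming window, evaluates DFT magnitudes for
    the bins whose frequency lies in [f_lo, f_hi] Hz and returns the peak.
    The default band (0.1-0.7 Hz, i.e. 6-42 breaths/min) is an assumption.
    """
    n = len(signal)
    mu = sum(signal) / n
    hamming = [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]
    x = [(s - mu) * w for s, w in zip(signal, hamming)]

    best_freq, best_mag = 0.0, -1.0
    for b in range(1, n // 2 + 1):  # positive-frequency bins only
        freq = b * fs / n
        if not f_lo <= freq <= f_hi:
            continue
        # Direct DFT coefficient for bin b (an FFT would compute all bins at once).
        coef = sum(x[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n))
        if abs(coef) > best_mag:
            best_freq, best_mag = freq, abs(coef)
    return best_freq * 60.0


# Synthetic breathing signal: 0.25 Hz sampled at 4 Hz for 60 s.
breathing = [math.sin(2 * math.pi * 0.25 * k / 4.0) for k in range(240)]
print(dominant_rate_per_min(breathing, fs=4.0))  # 15.0
```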

Ke et al. [ 33 ] addressed drowsiness detection using the heart rate measured on Android-based handheld devices. Big data and open computer vision (OCV) technologies are used to track the face in images. The smartphone's front camera captures the driver's image, and the rear camera is used for traffic sign detection. Eye, mouth and head parameters are used. In addition to drowsiness, the fatigue level is estimated from the eye blink rate, yawning (detected by tracking the mouth) and head movement.

Nai et al. [ 24 ] conducted experiments on a drowsiness detection system based on eye blinks, using a Hamming window and the FFT. ECG signals are acquired from a sensor, and the eye blink rate is obtained from the camera. Both are transferred via Bluetooth to an Android device, which performs the drowsiness detection. The system is reported to work on all Android devices.
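The eye blink rate that [ 24 ] obtains from the camera can be approximated by counting open-to-closed transitions in a per-frame eye-state sequence. The following is a minimal sketch. It assumes a fixed frame rate and a boolean eye-closed label for each frame, produced by a hypothetical upstream detector.

```python
from typing import Sequence


def blinks_per_minute(eye_closed: Sequence[bool], fps: float) -> float:
    """Blink rate from per-frame eye-closed labels at a fixed frame rate.

    A blink is counted at each open -> closed transition; a closure already
    in progress at the first frame is not counted.
    """
    if not eye_closed:
        return 0.0
    blinks = sum(1 for prev, cur in zip(eye_closed, eye_closed[1:]) if not prev and cur)
    minutes = len(eye_closed) / fps / 60.0
    return blinks / minutes


# One minute of labels at 1 frame/s containing two blinks.
labels = [False, True, False, False, True, True, False] + [False] * 53
print(blinks_per_minute(labels, fps=1.0))  # 2.0
```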

Rohit et al. [ 34 ] focused on drowsiness detection using electroencephalogram (EEG) signals from wearable sensors, together with machine learning and big data technologies. Support vector machines (SVMs) are used to classify the drowsy states. The EEG signals are also used to characterize the duration and frequency of the subjects' eye blinks.

Qian et al. [ 35 ] proposed a feature extraction method for drowsiness detection and compared its performance with traditional feature extraction methods. Bayesian Non-negative CP Decomposition (BNCPD) is used to extract common multiway features from group-level EEG signals. The method determines the CP rank automatically and extracts plausible multiway physiological information about individual states.

Qian et al. [ 36 ] designed a system that detects individual drowsiness from physiological features of EEG signals, using a Bayesian-Copula Discriminant Classifier (BCDC). The results have not been shown to generalize to other experimental environments.

Li et al. [ 37 ] collected data for a drowsiness detection system using wireless and wearable technology. A Brain Machine Interface (BMI) system handles signal sensing and processing for Driver Drowsiness Detection (DDD). An embedded Bluetooth low-energy module communicates with a fully wearable consumer device, a smart watch. The smart watch coordinates drowsiness monitoring and brain stimulation through its embedded closed-loop algorithm, so the system requires one.

Researchers have proposed a range of techniques for monitoring driver drowsiness. These techniques use technologies such as big data and machine learning to extract contextual information from cameras and wearables; their advantages and disadvantages are described above. The studies [ 22 , 24 , 33 ] monitored drowsiness using heart rate and breathing rate, [ 34 , 37 ] used wearables, and [ 35 , 36 ] used EEG signals.

Driver distraction

Driver distraction is any activity that diverts attention from driving, that is, anything that takes the driver's attention away from the task of safe driving. Examples include:

- talking or texting on the phone
- eating and drinking
- talking to passengers
- fiddling with the stereo, entertainment or navigation system

The figures on the dangers of cell phone use while driving are striking. At any given time during the day, approximately 660,000 drivers are attempting to use their phones while driving [ 38 ]. The following are some of the driver distraction detection systems in this space:

Sigari et al. [ 39 ] built a distraction detection system using eye, mouth and head images, with big data and an SVM with a polynomial kernel. Images of the driver's face are captured, and symptoms of fatigue and distraction are extracted from the eyes, mouth and head. The system's success rate is only 91.57%, because relatively few images were used to build the model.
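For readers unfamiliar with the polynomial kernel used in [ 39 ]: it replaces the inner product in an SVM with K(x, z) = (x · z + c)^d. The sketch below evaluates the kernel and the resulting SVM decision function for given support vectors. The support vectors, dual coefficients, bias, c and d are illustrative placeholders, not a trained model from [ 39 ].

```python
from typing import List, Sequence


def poly_kernel(x: Sequence[float], z: Sequence[float],
                c: float = 1.0, d: int = 2) -> float:
    """Polynomial kernel K(x, z) = (x . z + c) ** d."""
    return (sum(a * b for a, b in zip(x, z)) + c) ** d


def svm_decision(x: Sequence[float], support_vectors: List[Sequence[float]],
                 dual_coefs: List[float], bias: float) -> float:
    """SVM decision value f(x) = sum_i coef_i * K(s_i, x) + bias.

    Each coef_i stands for alpha_i * y_i from a trained model;
    the sign of f(x) gives the predicted class.
    """
    return sum(a * poly_kernel(s, x) for a, s in zip(dual_coefs, support_vectors)) + bias


print(poly_kernel([1.0, 2.0], [3.0, 4.0]))  # (11 + 1) ** 2 = 144.0
# Placeholder "model": two support vectors with opposite labels.
print(svm_decision([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0], 0.0))  # 3.0
```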

Abulkhair et al. [ 40 ] adapted a driver face monitoring system to identify fatigue and distraction. The system captures the driver's image and extracts symptoms of fatigue and distraction from the eyes, mouth and head. The extracted symptoms typically include:

- the percentage of eyelid closure over time
- eyelid distance
- eye blink rate and blink speed
- gaze direction
- saccadic eye movement
- yawning
- head nodding and head orientation

Haar-like features and the AdaBoost algorithm are used for image processing.
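Haar-like features, used in [ 40 ] together with AdaBoost, are differences between sums of pixel intensities over adjacent rectangles. An integral image turns each rectangle sum into a constant-time lookup. The sketch below shows only this feature computation, with no AdaBoost training or detection cascade, on a tiny made-up grayscale image. The function names are our own.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> List[List[int]]:
    """ii[y][x] = sum of img over rows < y and columns < x (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_vertical(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Two-rectangle Haar-like feature: top half sum minus bottom half sum.

    A region that is darker above than below gives a negative value.
    """
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)


img = [[1, 2], [3, 4]]  # tiny made-up grayscale patch
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 2, 2), two_rect_vertical(ii, 0, 0, 2, 2))  # 10 -4
```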

Yuen et al. [ 41 ] presented a driver distraction detection system based on face detection and head pose. A Kinect device, which processes one image at a time, records the 3-D head rotation angles and upper-body joint positions. The signals collected from the Kinect consist of a color image and a depth image of the driver inside the vehicle cabin. A Feed Forward Neural Network (FFNN) is used. The drawback of the system is its reliance on relatively costly Kinect devices.
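The sketch below illustrates what an FFNN such as the one in [ 41 ] computes. It runs a single forward pass from a feature vector through one hidden layer to a distraction probability. To keep it short, the input is reduced to three head rotation angles and the joint positions are omitted. The weights are arbitrary placeholders rather than trained values, and the layer sizes and activations are our assumptions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def dense(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> List[float]:
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def ffnn_forward(x: Sequence[float], w1: Matrix, b1: Sequence[float],
                 w2: Matrix, b2: Sequence[float]) -> float:
    """Input -> ReLU hidden layer -> sigmoid output, read as P(distracted)."""
    hidden = relu(dense(x, w1, b1))
    return sigmoid(dense(hidden, w2, b2)[0])


# Placeholder weights (not trained): 3 inputs, e.g. yaw/pitch/roll in radians.
w1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b1 = [0.0, 0.0]
w2 = [[2.0, 1.0]]
b2 = [-1.0]
print(ffnn_forward([0.5, -0.2, 0.1], w1, b1, w2, b2))  # 0.5
```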

Li et al. [ 37 ] proposed a system for early detection of driver drowsiness using a wireless and wearable BMI. Eye, mouth and head parameters are extracted. A Bluetooth low-energy module embedded in the BMI system communicates with a fully wearable consumer device. The study focuses only on participants' behavior changes before and after the simulation.

Hu et al. [ 20 ] studied drowsiness detection using a data-mining approach based on eye, mouth and head parameters. An abnormality index is proposed based on the analysis of normalized driving behaviors and is used to quantify the degree of abnormality. Peripheral vehicle behaviors during gaze transitions are analyzed, and classifiers are used to discriminate between cognitive distraction and neutral states. The classifiers have been trained to handle a variety of situations and provide high classification accuracy.

Multiple techniques have been proposed to detect driver distraction. Machine learning and big data methods are applied to real-time video captured by cameras, which is converted into frames for analysis. The advantages and disadvantages of each approach are described above. The study [ 41 ] monitored distraction using the face and head, while [ 20 , 37 , 39 , 40 ] used the eyes, mouth and head.

Proposed solution for personalized premium calculation

The prime challenge facing the insurance industry is personalized premium calculation based on driving behavior detection. Drivers need to be assigned a category based on how they drive, and real-time alerts are needed to steer their driving towards safety. Driving behavior detection is classified into two techniques [ 42 ]:

Real-time detection: Driving behavior is identified from a continuous stream of data collected at frequent intervals while the vehicle is being driven. The collected data help insurers to identify driver behavior precisely and calculate personalized premiums.

Non-real-time detection: The complete data are received after each trip, and driving behavior is identified through post-trip analysis.
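The difference between the two modes is mostly one of timing. The same detection rule can run incrementally as samples arrive, or once over a completed trip log. The following is a minimal sketch that assumes a stream of (timestamp, speed in km/h) samples and a hypothetical 120 km/h over-speeding limit.

```python
from typing import Iterable, Iterator, List, Tuple

Sample = Tuple[float, float]  # (timestamp in s, speed in km/h)
SPEED_LIMIT_KMH = 120.0       # hypothetical threshold, for illustration only


def realtime_detect(stream: Iterable[Sample]) -> Iterator[float]:
    """Real-time mode: yield each over-speed timestamp as soon as it arrives."""
    for t, speed in stream:
        if speed > SPEED_LIMIT_KMH:
            yield t  # an alert could be raised here, mid-trip


def post_trip_detect(trip_log: List[Sample]) -> List[float]:
    """Non-real-time mode: analyse the complete log once the trip has ended."""
    return [t for t, speed in trip_log if speed > SPEED_LIMIT_KMH]


log = [(0.0, 100.0), (1.0, 125.0), (2.0, 130.0), (3.0, 110.0)]
print(list(realtime_detect(iter(log))), post_trip_detect(log))  # [1.0, 2.0] [1.0, 2.0]
```

Both modes find the same events. The real-time mode, however, can act on each event before the trip ends.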

The following four factors influence driving behavior, and a better understanding of them will aid the development of more appropriate and effective solutions:

Behavioral factors: driving pattern, fatigue, drowsiness and driver distraction monitoring.

Environmental factors: traffic and road conditions.

Physiological/psychological factors: blood pressure and blood alcohol level.

Emotional factors: stress, fear, heart rate and anxiety.

The related work section reviewed all the MHYD categories and found that most research works consider only behavioral factors when identifying abnormal driving. Among behavioral factors, ample solutions exist to identify fatigued, drowsy and distracted drivers, but few identify abnormal drivers through driving pattern monitoring. Driving pattern monitoring relies on four vital parameters: acceleration, braking, cornering and speed. Yet only a few studies consider all four, and most consider only two or three. For example, Hu et al. [ 20 ] developed an abnormal driving behavior system using only speed, hard braking and hard acceleration. The authors considered only the behavioral factor, although they explained the importance of using two or more factors.

The proposed solution identifies abnormal driving behavior by considering both behavioral and emotional factors. It uses all four parameters of the behavioral factor and only one parameter, heart rate, from the emotional factor. The four behavioral parameters are expected to detect abnormal driving and to identify specific driver types: good driver, regular bad driver, unhealthy driver and road rage/aggressive driver. Classification and detection will use machine learning algorithms and big data technologies, with models trained on data from real driving environments. Once road rage/aggressive drivers are identified, alerts will be issued to them to improve their safety.
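The proposed solution leaves classification to machine learning. The mapping from the four behavioral parameters and heart rate to the four driver types can nonetheless be illustrated with a transparent rule-based stand-in. Every threshold and rule below, as well as the `TripFeatures` fields, is a hypothetical assumption chosen for readability. In the proposed system these boundaries would be learned from real driving data.

```python
from dataclasses import dataclass


@dataclass
class TripFeatures:
    hard_accel: float        # events per 100 km
    hard_brake: float        # events per 100 km
    hard_corner: float       # events per 100 km
    over_speed: float        # events per 100 km
    heart_rate_ratio: float  # mean trip heart rate / driver's resting heart rate


def classify_driver(f: TripFeatures) -> str:
    """Hypothetical rule-based stand-in for the ML classifier (thresholds assumed)."""
    events = f.hard_accel + f.hard_brake + f.hard_corner + f.over_speed
    elevated_hr = f.heart_rate_ratio >= 1.2
    if events >= 8 and elevated_hr:
        return "road rage/aggressive driver"  # harsh driving under arousal
    if elevated_hr:
        return "unhealthy driver"             # abnormal heart rate, calm driving
    if events >= 3:
        return "regular bad driver"           # harsh driving, normal heart rate
    return "good driver"


print(classify_driver(TripFeatures(2.0, 2.0, 2.0, 2.0, 1.3)))  # road rage/aggressive driver
```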

The implementation of the proposed solution is divided into three phases, as shown in Fig. 7.

Fig. 7 The three phases of the proposed solution

Data collection phase: GPS data will be collected from different users and stored in the big data environment after some pre-processing.

Rage and aggression phase: Road rage/aggressive drivers are detected and identified in this phase, which is divided into an offline part and an online part. In the offline part (modeling driving behaviors), a model is built from the collected data using machine-learning techniques and synchronized to the driver's smartphone. In the online part (monitoring driving behaviors), real-time readings from the driver's smartphone are compared with the synchronized model using a prediction (ML) algorithm. Finally, if any abnormal driving behavior is identified, a live voice alert is sent to the receivers or users.
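The offline/online split can be sketched in two steps. Offline, a compact model is fitted from historical readings and serialized, which here stands in for synchronization to the smartphone. Online, each incoming reading is scored against the deserialized model. The "model" here is deliberately simple: a per-feature upper bound of mean plus k standard deviations. This is our assumption, standing in for whichever ML algorithm the system would actually use. The feature names are hypothetical.

```python
import json
from statistics import mean, pstdev
from typing import Dict, List

Reading = Dict[str, float]  # hypothetical feature name -> value


def train_offline(readings: List[Reading], k: float = 3.0) -> str:
    """Offline part: learn a per-feature upper bound (mean + k * std) and serialize it."""
    model = {}
    for feature in readings[0]:
        values = [r[feature] for r in readings]
        model[feature] = mean(values) + k * pstdev(values)
    return json.dumps(model)  # stands in for the payload synchronized to the phone


def monitor_online(model_json: str, reading: Reading) -> List[str]:
    """Online part: names of features in a live reading that exceed the model's bounds."""
    model = json.loads(model_json)
    return [f for f, bound in model.items() if reading.get(f, 0.0) > bound]


history = [{"accel": 1.0, "speed": 50.0}, {"accel": 3.0, "speed": 70.0}]
model = train_offline(history, k=1.0)  # bounds: accel 3.0, speed 70.0
alerts = monitor_online(model, {"accel": 3.5, "speed": 65.0})
if alerts:
    print("voice alert:", alerts)  # the real system would hand this to text-to-voice
```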

Enterprise phase: Driver details are collected, including the number of normal trips, the number of road rage and aggressive driving trips, the time and date of travel, and the number of alerts. Insurance companies will use these data to calculate personalized premiums.
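The text does not specify how insurers would turn the collected trip data into a premium. The sketch below is a purely hypothetical illustration. It applies a discount that grows with the share of normal trips, plus a surcharge proportional to the share of road rage/aggressive trips and to the alert count. The base premium, rates and parameter names are invented for the example and do not represent an actuarial method.

```python
def personalized_premium(base: float, normal_trips: int, aggressive_trips: int,
                         alerts: int, max_discount: float = 0.20,
                         surcharge_per_share: float = 0.50,
                         alert_penalty: float = 0.005) -> float:
    """Hypothetical premium: base adjusted by the trip mix and the alert count."""
    total = normal_trips + aggressive_trips
    if total == 0:
        return base  # no driving data yet: charge the base premium
    aggressive_share = aggressive_trips / total
    discount = max_discount * (1.0 - aggressive_share)  # more normal trips, bigger discount
    surcharge = surcharge_per_share * aggressive_share + alert_penalty * alerts
    return round(base * (1.0 - discount + surcharge), 2)


print(personalized_premium(1000.0, normal_trips=10, aggressive_trips=0, alerts=0))  # 800.0
print(personalized_premium(1000.0, normal_trips=5, aggressive_trips=5, alerts=0))   # 1150.0
```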

Aggressive driving and road rage detection are less explored in the related work. The proposed methodology could serve as an add-on feature to existing applications such as Drivewise, Snapshot, SmartRide and Smiles, both to calculate personalized premiums and to ensure driver safety.

Uniqueness of the proposed solution

Real-time data collection: Large volumes of data, such as speed, latitude, longitude, altitude and course values, are extracted from GPS signals. Hard acceleration, hard braking, hard cornering and over-speeding events are then derived from these values by mathematical calculation, using big data processing on cloud instances.
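The following is a minimal sketch of how the four events can be derived from consecutive GPS fixes:

- longitudinal acceleration from the change in speed over time
- lateral acceleration (for cornering) as speed times the rate of change of course
- over-speeding from the speed itself

The thresholds and the `GpsFix` record are hypothetical, since the text does not state the values used.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GpsFix:
    t: float           # timestamp in seconds
    speed_kmh: float   # GPS ground speed
    course_deg: float  # heading, 0-360 degrees


# Hypothetical thresholds, for illustration only.
HARD_ACCEL = 3.0     # m/s^2
HARD_BRAKE = -3.5    # m/s^2
HARD_CORNER = 3.0    # m/s^2 of lateral acceleration
SPEED_LIMIT = 100.0  # km/h


def derive_events(fixes: List[GpsFix]) -> List[Tuple[float, str]]:
    """Return (timestamp, event) pairs derived from consecutive GPS fixes."""
    events = []
    for prev, cur in zip(fixes, fixes[1:]):
        dt = cur.t - prev.t
        if dt <= 0:
            continue  # skip duplicate or out-of-order fixes
        v_prev, v_cur = prev.speed_kmh / 3.6, cur.speed_kmh / 3.6  # km/h -> m/s
        accel = (v_cur - v_prev) / dt
        # Smallest signed heading change, so 359 -> 1 degree counts as +2, not -358.
        dcourse = (cur.course_deg - prev.course_deg + 180.0) % 360.0 - 180.0
        lateral = v_cur * math.radians(dcourse) / dt  # a_lat ~ speed * yaw rate
        if accel >= HARD_ACCEL:
            events.append((cur.t, "hard acceleration"))
        elif accel <= HARD_BRAKE:
            events.append((cur.t, "hard braking"))
        if abs(lateral) >= HARD_CORNER:
            events.append((cur.t, "hard cornering"))
        if cur.speed_kmh > SPEED_LIMIT:
            events.append((cur.t, "over speeding"))
    return events


trip = [
    GpsFix(0.0, 36.0, 90.0),
    GpsFix(1.0, 54.0, 90.0),    # +5 m/s in 1 s
    GpsFix(2.0, 36.0, 90.0),    # -5 m/s in 1 s
    GpsFix(3.0, 36.0, 110.0),   # 20 degree turn at 10 m/s
    GpsFix(4.0, 108.0, 110.0),  # sharp speed-up past the limit
]
for t, event in derive_events(trip):
    print(t, event)
```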

Driving factors: Both behavioral and emotional factors are considered to identify abnormal driving behavior on board the vehicle, using both raw and derived real-time data.

Driving behavior classification: Drivers are classified into four distinct types, which greatly helps precise personalized premium calculation. By contrast, most related works classify driving behavior into only two or three types, such as normal/abnormal or normal/abnormal/moderate.

Live voice alert: A text-to-voice engine is planned to convert personalized text into speech. While driving, the user receives live voice alerts or warnings based on their driving behavior, which helps improve driver safety.

Novel solution: A new methodology is introduced to detect road rage and aggressive drivers by collecting and processing large volumes of data of all types on cloud instances. Together, these form a complete ecosystem for driving behavior detection, an area that is less explored in the related work.

Conclusion and future research directions

The paper opens with an introduction and the motivation for offering insurance to customers on the basis of driving behavior detection. It presents data collection methods including dongles, black boxes, embedded systems and smartphones with sensors and GPS signals. The parameters adopted by industry and research papers for driving behavior detection are also presented. The need for big data technology to implement UBI in the insurance industry is described, along with the ten V's of big data. Existing research has focused mainly on volume, variety, velocity and veracity, with less attention to extracting value from big data.

UBI and its three classifications, PAYD, PHYD and MHYD, are presented. MHYD is divided into four categories: driving pattern monitoring, distracted driving monitoring, fatigue monitoring and drowsiness monitoring. Driving pattern monitoring is elaborated with data collection sources such as GPS, mobile phone sensors and OBD. The survey provides a comprehensive analysis of all the MHYD categories and finds considerable scope for research on the MHYD model. Related work is discussed in terms of:

- research objectives
- data collection techniques
- identified issues and gaps
- implementation algorithms
- driving parameters
- advantages and disadvantages
- inferences

The paper provides a foundation for further research on comprehensive driver behavior pattern monitoring and its applications.

The survey motivated us to propose a solution that identifies road rage/aggressive drivers and offers a personalized premium calculation method. The proposed methodology alerts a rage/aggressive driver during the trip when such an aberration is identified, and its objective and need are discussed. Three phases of the proposed solution for personalized premium calculation are highlighted, all using machine learning and big data: the data collection phase, the rage and aggression phase and the enterprise phase.

The proposed research could be extended in many directions, for example by considering additional environmental and physiological/psychological factors. Proactive alerting could be enhanced with machine-learning-based prediction and big data to improve the safety and lives of drivers, which would be a real service to society.

Availability of data and materials

Not applicable.


Abbreviations

BCDC: Bayesian-Copula Discriminant Classifier

BMI: Brain Machine Interface

BNCPD: Bayesian Non-Negative CP Decomposition

BSM: Basic Safety Messages

DDD: Driver Drowsiness Detection

DSAE: Deep Sparse Auto Encoder

DSRC: Dedicated Short-Range Communication

FFNN: Feed Forward Neural Network

FFT: Fast Fourier Transform

Fuzzy Risk Mode and Effect Analysis

GPS: Global Positioning System

HMM: Hidden Markov Model

MHYD: Manage-How-You-Drive

OBD: On-Board Diagnostics

OCV: Open Computer Vision

PAYD: Pay-As-You-Drive

PHYD: Pay-How-You-Drive

Private Usage-Based Scoring

Probabilistic Usage Data Audition

State Graph

SVM: Support Vector Machine

UBI: Usage-Based Insurance

VANETs: Vehicular Ad hoc Networks

Vehicle Test Data

References

1. Statista. Number of passenger cars and commercial vehicles in use worldwide. 2019.

2. World Health Organization. Global status report on road safety. 2013.

3. World Health Organization. Global status report on road safety 2015. 2015.

4. Verizon Connect. Advanced GPS fleet tracking software. 2019.

5. Firican G. The 10 Vs of Big Data. 2017.

6. IBM. Telematics for insurance: capitalizing on the rise in connected vehicles to enhance customer engagement and develop new value-added services. 2014.

7. IBM. Analytics: real-world use of big data in insurance. 2019.

8. Madan N. 3 ways big data can influence decision-making for organizations. 2018.

9. Tselentis DI, Yannis G, Vlahogianni EI. Innovative insurance schemes: pay as/how you drive. Transp Res Procedia. 2016;14:362–71.

10. Allstate. Stay smart on the road. 2019.

11. TD Insurance. TD MyAdvantage. 2019.

12. Progressive. Snapshot means BIG discounts for good drivers. 2019.

13. State Farm. Ajusto rewards safe driving. 2019.

14. SmartRide. Nationwide's SmartRide program rewards safe driving. 2019.

15. Yu J, Chen Z, Zhu Y, Chen Y, Kong L, Li M. Fine-grained abnormal driving behaviors detection and identification with smartphones. IEEE Trans Mob Comput. 2017;16(8):2198–212.

16. Shi B, Xu L, Hu J, Tang Y, Jiang H, Meng W, Liu H. Evaluating driving styles by normalizing driving behavior based on personalized driver modeling. IEEE Trans Syst Man Cybern Syst. 2015;45(12):1502–8.

17. Liu HL, Taniguchi T, Tanaka Y, Takenaka K, Bando T. Visualization of driving behavior based on hidden feature extraction by using deep learning. IEEE Trans Intell Transp Syst. 2017;18(9):2477–89.

18. Daptardar S, Lakshminarayanan V, Reddy S, Nair S, Sahoo S, Sinha P. Hidden Markov Model-based driving event detection and driver profiling from mobile inertial sensor data. In: IEEE Sensors; 2015.

19. Zhao J, Lim J, Chung HL, Leung S, Taffel M, Lo L. Introducing pay how you drive insurance; 2016.

20. Hu J, Xu L, He X, Meng W. Abnormal driving detection based on normalized driving behavior. IEEE Trans Veh Technol. 2017;66(8):6645–52.

21. Zhou L, Du S, Zhu H, Chen C, Ota K, Dong M. Location privacy in usage-based automotive insurance: attacks and countermeasures. IEEE Trans Intell Transp Syst. 2019;14(1):196–211.

22. Bergasa LM, Almeria D, Almazan J. DriveSafe: an app for alerting inattentive drivers and scoring driving behaviors. In: Intelligent vehicles symposium proceedings, Dearborn, MI, USA; 2014. p. 240–5.

23. Zhang M, Chen C, Wo T, Xie T, Bhuiyan MZA, Lin X. SafeDrive: online driving anomaly detection from large-scale vehicle data. IEEE Trans Ind Inf. 2017;13(4):2087–96.

24. Nai W, Chen Y, Yu Y, Zhang F, Dong D, Zheng W. Fuzzy risk mode and effect analysis based on raw driving data for pay-how-you-drive vehicle insurance. In: IEEE conference on big data analysis (ICBDA), Hangzhou, China; 2016.

25. Li Z, Sun G, Zhang F. Smartphone-based fatigue detection system using progressive locating method. IET Intell Transp Syst. 2016;10(3):148–56.

26. Driver fatigue and road accidents: a literature review and position paper. Royal Society for the Prevention of Accidents; 2001.

27. Warwick B, Symons N, Chen X. Detecting driver drowsiness using wireless wearables. In: Mobile ad hoc and sensor systems, Dallas, TX, USA; 2015.

28. Podder S, Roy S. Driver's drowsiness detection using eye status to improve the road safety. Int J Innov Res Comput Commun Eng. 2013;1(7):1490.

29. Chaitali Z, Kulkarni KYC. Driver aided system using open source computer vision. Int J Innov Res Comput Commun Eng. 2015;3(5):3779.

30. Qiao Y, Zeng K, Xu L, Yin X. A smartphone-based driver fatigue detection using fusion of multiple real-time facial features. In: Consumer communications & networking conference, Las Vegas, NV, USA; 2016.

31. Mandal B, Li L, Wang GS. Towards detection of bus driver fatigue based on robust visual analysis of eye state. IEEE Trans Intell Transp Syst. 2017;18(3):545–57.

32. Al-Sultan S, Al-Bayatti AH, Zedan H. Context-aware driver behavior detection system in intelligent transportation systems. IEEE Trans Veh Technol. 2013;62(9):4264–75.

33. Ke K, Zulman MR, Wu H, Huang YF. Drowsiness detection system using heartbeat rate in android-based handheld devices. In: First international conference on multimedia and image processing; 2016.

34. Rohit F, Kulathumani V, Kavi R. Real-time drowsiness detection using wearable, lightweight brain-sensing headbands. IEEE Trans Intell Transp Syst. 2017;11(5):255–63.

35. Qian D, Wang B, Qing X. Bayesian nonnegative CP decomposition-based feature extraction algorithm for drowsiness detection. IEEE Trans Neural Syst Rehabil Eng. 2017;25(8):1297–308.

36. Qian D, Wang B, Qing X. Drowsiness detection by Bayesian-copula discriminant classifier based on EEG signals during daytime short nap. IEEE Trans Biomed Eng. 2017;64(4):743–54.

37. Li G, Chung W. Combined EEG-Gyroscope-tDCS brain machine interface system for early management of driver drowsiness. IEEE Trans Hum Mach Syst. 2018;48(1):50–62.

38. WHO. 2013.

39. Sigari M, Pourshahabi M, Soryani M, Fathy M. A review on driver face monitoring systems for fatigue and distraction detection. Int J Adv Sci Technol. 2014;64:73–100.

40. Abulkhair MF, Salman HA, Ibrahim LF. Using mobile platform to detect and alerts driver fatigue. Int J Comput Appl. 2015;123:27–35.

41. Yuen K, Trivedi MM. An occluded stacked hourglass approach to facial landmark localization and occlusion estimation. IEEE Trans Intell Veh. 2017;2(4):321–31.

42. Chhabra R, Verma S, Krishna CR. A survey on driver behavior detection techniques for intelligent transportation systems. In: International conference on cloud computing, data science & engineering; 2017.



Author information

School of Computing Science and Engineering, Vellore Institute of Technology (VIT), Chennai, Tamil Nadu, India

Subramanian Arumugam & R. Bhargavi



SA proposed the idea of the survey, performed the literature review and analysis, and wrote the manuscript. BR provided technical guidance and assisted with editing the manuscript. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Subramanian Arumugam .

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article

Arumugam, S., Bhargavi, R. A survey on driving behavior analysis in usage based insurance using big data. J Big Data 6 , 86 (2019).


Received : 28 June 2019

Accepted : 09 September 2019

Published : 25 September 2019



  • Big data analytics
  • Driving behavior
  • Usage based insurance
  • Machine learning


Automobile insurance fraud detection in the age of big data – a systematic and comprehensive literature review

Journal of Financial Regulation and Compliance

ISSN : 1358-1988

Article publication date: 8 April 2022

Issue publication date: 2 August 2022

Purpose

The purpose of this paper is to survey the automobile insurance fraud detection literature of the past 31 years (1990–2021) and to present a research agenda that addresses the challenges and opportunities that artificial intelligence and machine learning bring to car insurance fraud detection.


Design/methodology/approach

A content analysis methodology is used to analyze 46 peer-reviewed academic papers from 31 journals plus eight conference proceedings, in order to identify their research themes and to detect trends and changes in the automobile insurance fraud detection literature according to content characteristics.

Findings

This study found that automobile insurance fraud detection is going through a transformation, in which traditional statistics-based detection methods are being replaced by data mining- and artificial intelligence-based approaches. It also found that cost-sensitive and hybrid approaches are up-and-coming avenues for further research.

Practical implications

This paper’s findings not only highlight the rise and benefits of data mining- and artificial intelligence-based automobile insurance fraud detection but also highlight the deficiencies observable in this field such as the lack of cost-sensitive approaches or the absence of reliable data sets.


Originality/value

This paper offers greater insight into how artificial intelligence and data mining challenge traditional automobile insurance fraud detection models. It also addresses the need to develop new cost-sensitive fraud detection methods and to identify new real-world data sets.

  • Literature review
  • Data mining
  • Automobile insurance fraud detection

Benedek, B. , Ciumas, C. and Nagy, B.Z. (2022), "Automobile insurance fraud detection in the age of big data – a systematic and comprehensive literature review", Journal of Financial Regulation and Compliance , Vol. 30 No. 4, pp. 503-523.

Emerald Publishing Limited

Copyright © 2022, Emerald Publishing Limited


International Conference on Smart Computing and Communication

SmartCom 2020: Smart Computing and Communication pp 163–172

Blockchain Technology in Automobile Insurance Claim Systems Research

  • Han Deng,
  • Chong Wang,
  • Qiaohong Wu,
  • Qin Nie,
  • Weihong Huang &
  • Shiwen Zhang
  • Conference paper
  • First Online: 17 April 2021


Part of the Lecture Notes in Computer Science book series (LNISA, volume 12608)

Recently, academia and industry have paid increasing attention to blockchain technology because it can be used to improve traditional companies. Insurance companies are one example of traditional, inflexible companies. Their operations are not transparent, are based on paper contracts and rely on human intervention, all of which result in low efficiency. Blockchain-based Internet of Things and smart contract technology offer transparency, tamper resistance, decentralization and other advantages, which enable the traditional processes and structures of the insurance industry to be improved. This paper summarizes the application of blockchain technology in auto insurance claim systems. By comparing the implementations of various systems, it describes the existing achievements and their advantages and disadvantages. In addition, it outlines key directions for future research on implementing blockchain.
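The tamper-evidence property attributed to blockchain above comes largely from hash chaining. Each record stores the hash of its predecessor, so altering any earlier claim invalidates its own link and every later one. The sketch below shows only this property, for a list of claim records, using SHA-256 from Python's standard library. It has no consensus, networking or smart contracts, and the record fields are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(record: Dict, prev_hash: str) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding of the record."""
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records: List[Dict]) -> List[Dict]:
    """Link claim records so that each block commits to its predecessor."""
    chain, prev = [], GENESIS
    for rec in records:
        h = block_hash(rec, prev)
        chain.append({"record": rec, "prev_hash": prev, "hash": h})
        prev = h
    return chain


def verify_chain(chain: List[Dict]) -> bool:
    """Recompute every hash; any edit to an earlier record breaks the chain."""
    prev = GENESIS
    for block in chain:
        if block["prev_hash"] != prev or block_hash(block["record"], prev) != block["hash"]:
            return False
        prev = block["hash"]
    return True


claims = [{"claim_id": 1, "amount": 1200}, {"claim_id": 2, "amount": 300}]
chain = build_chain(claims)
print(verify_chain(chain))           # True
chain[0]["record"]["amount"] = 9999  # tamper with an earlier claim
print(verify_chain(chain))           # False
```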

  • Internet of Things
  • Smart contract
  • Auto insurance
  • Vehicle system

This is a preview of subscription content, log in via an institution .

Buying options

  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
  • Available as EPUB and PDF
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

This work is supported by the National Natural Science Foundation of China (No. 61702180) and the Doctoral Scientific Research Foundation of Hunan University of Science and Technology (No. E52083).

Author information

Authors and affiliations.

Big Data Development and Research Center, Guangzhou College of Technology and Business, Guangzhou, 528138, China

Han Deng & Qin Nie

College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410000, China

Department of Accounting, Guangzhou College of Technology and Business, Guangzhou, 528138, China

Qiaohong Wu

School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, 411201, China

Weihong Huang & Shiwen Zhang

Corresponding author

Correspondence to Chong Wang.

Editor information

Editors and affiliations.

Texas A&M University – Commerce, Commerce, TX, USA

Meikang Qiu

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper.

Deng, H., Wang, C., Wu, Q., Nie, Q., Huang, W., Zhang, S. (2021). Blockchain Technology in Automobile Insurance Claim Systems Research. In: Qiu, M. (ed.) Smart Computing and Communication. SmartCom 2020. Lecture Notes in Computer Science, vol 12608. Springer, Cham.

Published: 17 April 2021

Publisher Name: Springer, Cham

Print ISBN: 978-3-030-74716-9

Online ISBN: 978-3-030-74717-6

eBook Packages: Computer Science Computer Science (R0)


A Study on Customer Awareness on Car Insurance Policies with Special Reference to

IJIRST - International Journal for Innovative Research in Science and Technology

Purpose: The purpose of this study is to understand customer awareness of car insurance policies, with special reference to United India Insurance, and to identify the key elements for improving customer awareness of insurance policies, based on a literature review and a case study of a successful vehicle insurance company. The study focuses mainly on customers' awareness of, and satisfaction with, the car insurance policies offered by the company. Research Design: The study uses probability sampling with random sampling techniques and was conducted in Shivamogga city with a sample of 150 respondents. Primary data were collected through a structured questionnaire; secondary data were collected from magazines, marketing journals, articles and books. Findings: The study found that respondents (policyholders) are not aware of the terms and conditions of the policy offered by the company, or of the procedures for claiming in the event of damage or loss. Results: United India Insurance is a well-known organization in the vehicle insurance business and a leading insurer in customer service, and customers are well satisfied with the price of the policies it offers. Conclusion: The study makes clear that most policyholders are not aware of the procedures, the terms and conditions, or the premium calculation, which is based on the vehicle's Insured Declared Value (IDV), age, model and other factors. Car insurance is an essential consideration for anyone who owns a car: having a car insurance policy makes customers feel protected against loss or damage caused by an accident.
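The conclusion notes that policyholders do not know how the premium is derived from the vehicle's IDV, age and model. As a rough illustration only, the Python sketch below computes an IDV from the listed price minus an age-based depreciation slab, then an own-damage premium reduced by a no-claim bonus, plus a fixed third-party premium. The depreciation slabs are the schedule commonly cited for Indian private cars and should be treated as an assumption to verify; the own-damage rate, third-party premium and no-claim bonus used in the example are hypothetical, not United India Insurance's actual rates.

```python
# Assumed age-based depreciation slabs: (maximum vehicle age in months, fraction of the
# listed ex-showroom price deducted). Commonly cited for Indian private cars; verify
# against current regulations before use.
DEPRECIATION_SLABS = [(6, 0.05), (12, 0.15), (24, 0.20), (36, 0.30), (48, 0.40), (60, 0.50)]


def insured_declared_value(ex_showroom_price: float, age_months: int) -> float:
    """IDV = listed price minus the depreciation for the vehicle's age slab."""
    for max_age, rate in DEPRECIATION_SLABS:
        if age_months <= max_age:
            return ex_showroom_price * (1 - rate)
    # Past five years the IDV is usually agreed between insurer and insured.
    raise ValueError("no depreciation slab for vehicles older than 60 months")


def annual_premium(ex_showroom_price: float, age_months: int, od_rate: float,
                   third_party_premium: float, ncb: float = 0.0) -> float:
    """Own-damage premium (IDV x od_rate, reduced by the no-claim bonus fraction ncb)
    plus a fixed third-party premium. od_rate, third_party_premium and ncb are
    hypothetical inputs here, not any insurer's published rates."""
    idv = insured_declared_value(ex_showroom_price, age_months)
    own_damage = idv * od_rate * (1 - ncb)
    return round(own_damage + third_party_premium, 2)


if __name__ == "__main__":
    # Hypothetical car: listed at 800,000, 18 months old, 3% OD rate, 20% no-claim bonus.
    print(insured_declared_value(800000, 18))  # 640000.0
    print(annual_premium(800000, 18, od_rate=0.03, third_party_premium=3000, ncb=0.20))  # 18360.0
```

Real tariffs add further loadings and discounts (engine capacity, add-on covers, taxes), so this only shows why an older car with a claim-free history pays less for own-damage cover.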

Related Papers

Journal of Emerging Technologies and Innovative Research

Josephine Stella A.

Motor insurance accounts for one third of the premium income of the general insurance industry in India. Economic growth and the resulting rise in living standards, together with greater customer choice and the entry of many automobile manufacturers, have led to a sharp increase in motor insurance. The main aim of motor insurance is to protect people from losses arising from accidents; it covers losses to the vehicle. Awareness of insurance is generally low in India, and it is very difficult to create a buying attitude towards the different kinds of insurance among prospective buyers. The General Insurance Corporation finds it difficult to identify clients and to convince them of the concept of an insurance policy. At the same time, a policy is valid for only one year. Lack of insurance awareness is the main problem in general insurance, particularly in motor insurance. Vehicle owners buy it only on the...

International Journal of Research in Commerce and Management

Dhiraj Jain

Vikas Kumar Gautam

International Journal of Management, Technology, and Social Sciences (IJMTS)

Srinivas Publication, Swati Basu Ghose

The primary purpose of vehicle insurance is to cover the vehicle against damage, personal injury and third-party liability. In addition, some insurance companies offer value-added services, such as roadside assistance, in return for the payment called the premium, and these services attract a large number of customers. However, our study shows that vehicle owners attach the greatest importance to the cost of insurance in terms of the annual premium. Primary data were collected through a questionnaire and analysed to ascertain the factors responsible for taking out vehicle insurance, the choice between private and public sector insurance companies, the preferred insurance companies among the major players in the field, the factors that influence customers' choice of a particular insurance company, customers' opinion of the affordability of the premium, customers' satisfaction with their chosen company, whether customers consider fast and efficient service a deciding factor, and whether the brand value of the company plays a role in customers' choice.

Turkish Online Journal of Qualitative Inquiry (TOJQI) Volume 12, Issue 2, March 2021: 801-815

Veera Venkat Satyanarayana Penumarthi

This study attempts to show how customers in Vijayawada perceive insurance services. The data were compiled from respondents' answers to a five-point Likert scale questionnaire. More than 377 people were surveyed to determine their level of knowledge about, and attitude towards, insurance services. The study finds that Vijayawada customers' perceptions of insurance services are strongly influenced by socioeconomic and demographic factors. Insurance businesses in Vijayawada may use these results as a basis for developing marketing plans that take socio-demographic and economic factors into account.

Dharmesh Motwani

IJIRIS: AM Publications, India

IJIRIS International Journal of Innovative Research in Information Security

The journey of insurance schemes in the new India began in 17th-century England. Insurance is a co-operative device that distributes the loss caused by a particular risk. When an insurance company faces any kind of mismanagement, it should look for a market for that policy instead of constantly lying to the public or to its clients. Market competition lowers insurance companies' prices and raises quality, but customers act according to their own views. In an unpredictable society insurance plays a protective role, but customers face many other problems. In this paper we discuss the problems faced by customers and the reasons why many people do not trust insurance companies. The paper focuses on the problems faced by customers in the insurance sector.

Pranjal Bezborah

Sathishkumar Ramasamy

The present study analyzes the attitudes of policyholders of the Life Insurance Corporation of India, with special reference to Tiruchirappalli district; the data were collected and analysed as required by the study. Primary data were collected from respondents through an interview schedule from June 2011 to March 2012. The study adopted a proportionate stratified random sampling method to select 500 respondents. The results revealed that age, education, marital status, family size, number of earning members, income and awareness influenced the policyholders' level of attitude, whereas sex, occupation and patronage mentality did not.
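Proportionate stratified sampling, as used in the study above, allocates the sample to each stratum in proportion to its share of the population, n_h = n · N_h / N. The study's actual strata and their sizes are not given here, so the Python sketch below uses hypothetical stratum names and sizes; it rounds with the largest-remainder method (one common choice, assumed here) so that the allocations add up to exactly 500.

```python
import math


def proportionate_allocation(strata_sizes: dict, n: int) -> dict:
    """Allocate n sample units as n_h = n * N_h / N, then round with the
    largest-remainder method so that the allocations sum exactly to n."""
    total = sum(strata_sizes.values())
    exact = {h: n * size / total for h, size in strata_sizes.items()}
    alloc = {h: math.floor(q) for h, q in exact.items()}
    shortfall = n - sum(alloc.values())
    # Hand the leftover units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for h in by_remainder[:shortfall]:
        alloc[h] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical stratum sizes (number of policyholders per stratum).
    sizes = {"stratum_A": 7000, "stratum_B": 2500, "stratum_C": 1500}
    print(proportionate_allocation(sizes, 500))
    # {'stratum_A': 318, 'stratum_B': 114, 'stratum_C': 68}
```

Respondents are then drawn by simple random sampling within each stratum, which keeps every stratum represented in the same proportion as in the population.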

IP Innovative Publication Pvt. Ltd.

As risk increases, insurance is needed to bear the losses. Insurance is an instrument used as financial protection against various contingencies. This paper examines customer perception of general insurance. A study was conducted in the Gwalior region with a sample of 200 respondents to determine the perceptions of customers (policyholders). Respondents' opinions on the related statements were collected on a 5-point scale. Reliability analysis, factor analysis and multivariate techniques were applied to the data. The results show that loyalty, transparency, proficiency, reliability and convenient services are the five factors extracted from the 18 statements, based on customers' expectations. The study indicates that customers in the studied area have different expectations of insurance companies.
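The study above applies a reliability analysis to 5-point scale responses. The abstract does not name the statistic used; Cronbach's alpha is the usual choice for such scales, so the Python sketch below assumes it, with made-up answers. It computes alpha = k / (k - 1) · (1 - Σ item variances / variance of total scores) for a respondents-by-items matrix.

```python
from statistics import pvariance


def cronbach_alpha(responses: list) -> float:
    """Cronbach's alpha for a respondents x items score matrix:
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)).
    Population variances are used throughout; the n/(n-1) factor cancels in the ratio."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variance = sum(pvariance(column) for column in zip(*responses))
    total_variance = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance / total_variance)


if __name__ == "__main__":
    # Made-up 5-point answers: 4 respondents x 3 statements.
    answers = [[4, 5, 4], [3, 3, 3], [5, 5, 4], [2, 3, 2]]
    print(round(cronbach_alpha(answers), 3))  # 0.962
```

Values around 0.7 or higher are conventionally read as acceptable internal consistency; the function raises `ZeroDivisionError` if every respondent has the same total score.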


  1. Research on the Features of Car Insurance Data Based on Machine Learning

    In this paper, the features of auto insurance data are analyzed, and the features that most affect auto insurance renewal are mined. Random forest (RF), gradient boosting decision tree (GBDT) and light gradient boosting machine (LightGBM) models are compared. The test results show that the LightGBM model performs best and is the most robust.

  2. Car Insurance Plans Could Make a Society Safer

    Hugo Benitez-Silva, Yong-Kyun Bae. The number of automobile recalls in the U.S. has sharply increased in the last decade and a half, and the number of units involved in these recalls is often ...

  3. Women and insurance pricing policies: a gender-based analysis ...

    In most of the United States, insurance companies may use gender to determine car insurance rates. In addition, several studies have shown that women over the age of 25 generally pay more than men ...

  4. Autonomous Vehicles and the Future of Auto Insurance

    To investigate the impact that the widespread deployment of autonomous vehicles (AVs) could have on automobile insurance in the United States, RAND Corporation researchers interviewed 43 subject-matter experts from 35 stakeholder organizations and conducted an extensive literature review. A key finding from their research is that the existing ...

  5. Insurance fraud detection: Evidence from artificial intelligence and

    It is primarily responsible for covering the cost of damages due to natural catastrophes and vehicle accidents, which includes auto insurance and third-party motor car liability insurance (Wang and Xu, 2018). With more trust in the positive growth of the insurance sector, more capital will join the insurance market, thus making the competition ...

  6. Insurance 2030—The impact of AI on the future of insurance

    Article (10 pages). Welcome to the future of insurance, as seen through the eyes of Scott, a customer in the year 2030. His digital personal assistant orders him a vehicle with self-driving capabilities for a meeting across town. Upon hopping into the arriving car, Scott decides he wants to drive today and moves the car into "active" mode.

  7. An Acceptance Approach for Novel Technologies in Car Insurance

    ... Car insurance, specifically ...

  8. Research on motor vehicle insurance underwriting risk ...

    Abstract. There are many risks in the underwriting process that affect the development of insurance, so risk management in motor vehicle insurance underwriting plays an important role. First, underwriting risks are analyzed using the analytic hierarchy process (AHP). Second, this article discusses how to establish a scientific and practical model to enhance the motor ...

  9. Journal of Risk and Insurance

    The Journal of Risk and Insurance (JRI) is the premier outlet for theoretical and empirical research on the topics of insurance economics and risk management. Research in the JRI informs practice, policy-making, and regulation in insurance markets as well as corporate and household risk management. JRI is the flagship journal for the American Risk and Insurance Association, and is currently ...

  10. Car Insurance Research Methodology

    Published: July 30, 2015. At the start of our car insurance pricing project, we engaged Quadrant Information Services, a private company that collects the mathematical pricing formulas that...

  11. (PDF) Research on the Features of Car Insurance Data Based on Machine

    In this paper, the features of auto insurance data are analyzed, and the features that most affect auto insurance renewal are mined. Random forest (RF), gradient boosting decision tree (GBDT) and ...

  12. A survey on driving behavior analysis in usage based insurance using

    The outcome of this research would help the insurance industry assess driving risk more accurately and propose a solution for calculating personalized premiums based on driving behavior, with particular emphasis on risk prevention.

  13. Automobile insurance fraud detection

    Mark Anthony Caruana & Liam Grech, pages 520-535. Published online: 04 Oct 2021.

  14. Automobile insurance fraud detection in the age of big data

    The purpose of this paper is to survey the automobile insurance fraud detection literature in the past 31 years (1990-2021) and present a research agenda that addresses the challenges and opportunities artificial intelligence and machine learning bring to car insurance fraud detection. Content analysis methodology is used to analyze 46 peer ...

  15. (PDF) Motor Insurance Claim Status Prediction using ...

    Based on this situation, this paper offers a new classification model for predicting health insurance claims based on SVM and BA. The metrics used for evaluation are accuracy, recall,...

  16. (PDF) A practical model for pricing optimization in car insurance

    the insurance policy and the outcome of applying optimal premium rates in several customer segments are shown. The sensitivity of car insurance customers to price and how this affects their retention has been a subject of intense analysis in the insurance market research literature. Different ...

  17. Blockchain Technology in Automobile Insurance Claim Systems Research

    This paper summarizes the application of blockchain technology in auto insurance claim systems. By comparing the implementations of various systems, it describes the existing achievements and their advantages and disadvantages. In addition, it outlines key directions for future research on implementing blockchain.

  18. (PDF) Car Insurance: Is No-Fault the Answer?

    auto-insurance is a variation of a pure no-fault compensation system, which is believed by many to lower auto-insurance rates significantly. This paper tests the claim that no-fault insurance lowers the overall premiums paid by motorists. It will compare the benefits and costs of the two legal regimes for auto-insurance that are in question:

  19. Bibliometric review of telematics-based automobile insurance: Mapping

    Furthermore, as reported in this research, the analysis of keyword co-occurrence and its subsequent network visualisation contributed to highlighting the knowledge framework relevant to telematics-based automotive insurance. The knowledge structure of car insurance studies using telematics was mapped using keyword co-occurrence analysis, and ...

  20. Machine Learning in Forecasting Motor Insurance Claims

    In this paper, we employ a series of Machine Learning algorithms (Support Vector Machines-SVM, Decision Trees, Random Forests, and Boosting) to forecast the average (mean) insurance claims amount per insured car per quarter and identify a subset of variables that are the most relevant in determining the average claims amount.

  21. Machine Learning-Based Predictions of Customers' Decisions in Car Insurance

    The paper is the result of cooperation with Aspartus Ltd. on predicting user decision in car insurance. We would like to thank Marcin Wójciuk, Adam Smółkowski and Robert Kluz for all their support in our experiments. Data used in this study are the property of Aspartus Ltd. To inquire about obtaining the dataset please contact the company.

  22. (PDF) Predictive Modeling of Insurance Claims Using ...

    The main objective of this research paper is to build an appropriate mathematical model that helps in forecasting third party claim amount for different categories of vehicles based on the...

  23. Why Car Insurance From Your Car Company Could Be a Better Deal in 2024

    Auto insurance prices skyrocketed in 2023. According to the Bureau of Labor Statistics, the average cost of auto insurance increased by 20.3% from December 2022 to December 2023, making this the ...

  24. (PDF) A Study on Customer Awareness on Car Insurance ...

    Car insurance is an essential consideration for anyone who owns a car: having a car insurance policy makes customers feel protected against loss or damage caused by an accident.

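Item 8 above states that underwriting risks are analyzed with the analytic hierarchy process (AHP). As a hedged illustration of that technique, not the cited article's model, the Python sketch below derives factor weights from a pairwise-comparison matrix using the normalised row geometric mean (an approximation of the principal eigenvector that is exact for perfectly consistent judgments) and checks Saaty's consistency ratio. The three risk factors and their judgments are hypothetical.

```python
import math

# Saaty's random consistency index (RI) for matrix orders 1 to 10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_weights(matrix: list) -> tuple:
    """Return (weights, consistency_ratio) for a reciprocal pairwise-comparison matrix.

    Weights use the normalised row geometric mean. lambda_max is estimated as the
    mean of (A w)_i / w_i, CI = (lambda_max - n) / (n - 1) and CR = CI / RI.
    """
    n = len(matrix)
    geo = [math.prod(row) ** (1 / n) for row in matrix]
    total = sum(geo)
    weights = [g / total for g in geo]
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    if n <= 2:
        return weights, 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    ci = (lambda_max - n) / (n - 1)
    return weights, ci / RANDOM_INDEX[n]  # KeyError beyond n = 10 (table not extended)


if __name__ == "__main__":
    # Hypothetical judgments for three underwriting risk factors:
    # driving record vs vehicle type vs usage region.
    comparisons = [[1, 3, 6], [1 / 3, 1, 2], [1 / 6, 1 / 2, 1]]
    weights, cr = ahp_weights(comparisons)
    print([round(w, 3) for w in weights])  # [0.667, 0.222, 0.111]
    print(cr < 0.1)  # True: CR below 0.1 is the usual acceptability threshold
```

A consistency ratio above 0.1 signals that the pairwise judgments contradict each other (for example A over B, B over C, but C over A) and should be revisited before the weights are used to score risks.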