Mihaela David, Danut-Vasile Jemna


In non-life insurance pricing, an accurate evaluation of claim frequency, a type of count data, is essential for setting a premium that reflects the policyholder’s degree of risk. Count regression analysis makes it possible to identify risk factors and to predict the expected claim frequency from the characteristics of policyholders. This paper examines several methodological aspects of count data models, as well as the risk factors used to explain claim frequency. In addition to the standard Poisson regression, Negative Binomial models are fitted to a French auto insurance portfolio. The best model is selected using the likelihood ratio test and information criteria, and, based on this model, the profile of the policyholders with the highest degree of risk is determined.
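The Poisson versus Negative Binomial comparison described above can be illustrated with a minimal sketch. It is not the authors' code and does not use the French portfolio: the claim counts are made up, the single risk factor (a hypothetical `young_driver` flag) stands in for the paper's covariates, and the Negative Binomial is assumed to take the NB2 form, Var(Y) = mu + alpha * mu^2. Because the only covariate is categorical (a saturated design with a log link), the maximum-likelihood mean of each risk class is the class sample mean under both models, so only alpha needs numerical optimization. Since alpha = 0 lies on the boundary of the parameter space, the likelihood ratio p-value is taken as half the chi-square(1) tail probability.

```python
import math

# Illustrative sketch only: hypothetical data, NB2 parameterization assumed.


def poisson_loglik(y, mu):
    """Poisson log-likelihood: sum of y*log(mu) - mu - log(y!)."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))


def nb2_loglik(y, mu, alpha):
    """NB2 log-likelihood with Var(Y) = mu + alpha * mu**2 and alpha > 0."""
    r = 1.0 / alpha
    return sum(
        math.lgamma(yi + r) - math.lgamma(r) - math.lgamma(yi + 1)
        + r * math.log(r / (r + mi)) + yi * math.log(mi / (r + mi))
        for yi, mi in zip(y, mu)
    )


def fit_group_means(y, group):
    """MLE of the mean per risk class: the class sample mean (same for Poisson and NB2)."""
    totals, sizes = {}, {}
    for yi, g in zip(y, group):
        totals[g] = totals.get(g, 0) + yi
        sizes[g] = sizes.get(g, 0) + 1
    means = {g: totals[g] / sizes[g] for g in totals}
    return [means[g] for g in group], len(means)


def fit_alpha(y, mu, lo=-10.0, hi=5.0, iters=100):
    """Maximize the NB2 log-likelihood over log(alpha) by golden-section search."""
    neg_ll = lambda t: -nb2_loglik(y, mu, math.exp(t))
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    for _ in range(iters):
        if neg_ll(c) < neg_ll(d):
            b, d = d, c  # minimum of -loglik lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d  # minimum of -loglik lies in [c, b]
            d = a + inv_phi * (b - a)
    return math.exp((a + b) / 2.0)


def compare_poisson_nb2(y, group):
    n = len(y)
    mu, n_classes = fit_group_means(y, group)
    ll_p = poisson_loglik(y, mu)
    alpha = fit_alpha(y, mu)
    ll_nb = nb2_loglik(y, mu, alpha)
    # LR statistic for H0: alpha = 0 (Poisson) against H1: alpha > 0 (NB2).
    lr = max(0.0, 2.0 * (ll_nb - ll_p))
    # Boundary case: p-value = 0.5 * P(chi2_1 > lr) = 0.5 * erfc(sqrt(lr / 2)).
    p_value = 0.5 * math.erfc(math.sqrt(lr / 2.0))
    k_p, k_nb = n_classes, n_classes + 1  # NB2 adds one parameter (alpha)
    return {
        "loglik_poisson": ll_p,
        "loglik_nb2": ll_nb,
        "alpha": alpha,
        "lr_stat": lr,
        "p_value": p_value,
        "aic_poisson": 2 * k_p - 2 * ll_p,
        "aic_nb2": 2 * k_nb - 2 * ll_nb,
        "bic_poisson": k_p * math.log(n) - 2 * ll_p,
        "bic_nb2": k_nb * math.log(n) - 2 * ll_nb,
    }


# Hypothetical claim counts for 20 policies and a hypothetical binary risk factor.
claims = [0, 0, 0, 0, 0, 0, 1, 0, 0, 3, 0, 0, 0, 2, 0, 0, 5, 1, 0, 4]
young_driver = [False] * 12 + [True] * 8

result = compare_poisson_nb2(claims, young_driver)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

The model with the lower AIC or BIC would be preferred. A real analysis with several continuous and categorical risk factors would need a general GLM fitting routine (for example iteratively reweighted least squares or a numerical optimizer over all coefficients) instead of the closed-form class means used here.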



Keywords

claim frequency, count data models, Poisson model, overdispersion, mixed Poisson models, negative binomial models, risk factors

JEL Codes

G22 - Insurance; Insurance companies; Actuarial Studies

