This thesis proposes the use of graphical models for credit scoring analysis. The application concerns behavioural scoring, defined by Thomas (1999) as ‘the systems and models that allow lenders to make better decisions in managing existing clients by forecasting their future performance’. The multivariate statistical models proposed for the application, known as chain graph models, make it possible to model appropriately the relations between the variables describing the behaviour of credit card holders. They are based on a log-linear expansion of the density function of the variables. They make it possible to depict oriented associations between subsets of variables, to detect the structure that gives a parsimonious description of the relations between variables, and to model more than one response variable simultaneously. They are particularly useful when there is a partial ordering among the variables, so that they can be divided into exogenous, intermediate and response variables. In graphical models the independence structure is represented by a graph: the variables are represented by nodes, joined by edges that show the probabilistic dependence among variables. A missing edge means that the two nodes are independent given the other nodes. This class of models is also useful because of the theory that links them to expert systems: once the model has been selected, it can be linked to an expert system to model the joint and marginal probability distributions of the variables. The first chapter introduces the statistical models most commonly used in credit scoring analysis. The second chapter introduces categorical variables. The information on the credit card holders is stored in a contingency table. The chapter also illustrates the notions of independence between two variables and of conditional independence among more than two variables.
The odds ratio is introduced as a measure of association between two variables; it is the basis of the model formulation. The third chapter introduces log-linear and logistic models, which belong to the family of generalized linear models. They are multivariate methods that make it possible to study the association between variables by considering them simultaneously. A log-linear parameterization is described in detail. One of its advantages is that it takes into account the ordinal scale on which the categorical variables are measured. It is also useful for finding a better categorization of the continuous variables. The results on maximum likelihood estimation of the model parameters are recalled, together with the iterative numerical algorithms used to solve the likelihood equations with respect to the unknown parameters. The score test is illustrated as a way to evaluate the goodness of fit of the model to the data. Chapter 4 introduces some of the main concepts of graph theory, together with the properties that allow the model to be depicted through a graph, and shows the interpretative advantages. The sparsity of the contingency table, which arises when there are many cells, is also mentioned, and the collapsibility conditions are considered. Finally, Chapter 5 illustrates the application of the proposed methodology to a sample of 70000 revolving credit card holders. The data were released by one of the biggest Italian financial companies operating in this sector. The variables are the socioeconomic characteristics of the credit card holder, taken from the form filled in by the customer when applying for credit. Every month the company updates the classification of customers as active, inactive or dormant according to the balance.
The proposed method was applied to find the conditional independences between the variables, in relation to the two responses, which are the account balance at two subsequent dates, and thereby to define the profiles of the most frequent users of the revolving credit card. The chapter ends with some concluding remarks. The appendix to the chapter reports the code for the statistical software used.
This thesis illustrates an application of graphical models to the analysis of behavioural credit scoring, or behavioural scoring. The latter is defined as ‘the systems and models that allow lenders to make better decisions in managing existing clients by forecasting their future performance’, according to Thomas (1999). The class of graphical models considered is that of chain graph models. These are multivariate statistical models that make it possible to model appropriately the relations between the variables describing the behaviour of the card holders. Because they are based on a log-linear expansion of the density function of the variables, they can also represent graphically the oriented associations involving subsets of variables. They also make it possible to identify the structure that represents these relations as parsimoniously as possible and to model more than one response variable simultaneously. They are useful when there is an ordering, even a partial one, among the variables that allows them to be divided into purely exogenous variables, groups of intermediate variables linked in a chain, and response variables. In graphical models the independence structure of the variables is represented visually by a graph. In the graph the variables are represented by nodes connected by edges, which show the probabilistic dependences between the variables. A missing edge implies that the two nodes are independent given the other nodes. These models are particularly useful because of the theory they share with expert systems: once the model has been selected, the expert system can be queried to model the joint and marginal probability distributions of the variables. The first chapter presents the main statistical models adopted in credit scoring. The second chapter deals with categorical variables.
The information on credit card holders is in fact summarized in contingency tables. The notions of independence between two variables and of conditional independence among more than two variables are introduced. Some measures of association between variables are listed; in particular, odds ratios are introduced, which form the basis for constructing the multivariate models used. The third chapter illustrates log-linear and logistic models, which belong to the family of generalized linear models. Being multivariate methods, they make it possible to study the association between variables by considering them simultaneously. In particular, a special log-linear parameterization is described that takes into account the ordinal scale on which some of the categorical variables are measured. It is also useful for finding the best categorization of the continuous variables. The results on maximum likelihood estimation of the model parameters are also recalled, with a brief mention of the iterative numerical algorithms needed to solve the likelihood equations with respect to the unknown parameters. Reference is made to the likelihood ratio test for evaluating the goodness of fit of the model to the data. The fourth chapter introduces graph theory, setting out its main concepts and highlighting some properties that allow the model to be represented visually by a graph, bringing out the interpretative advantages. The chapter also touches on the problem arising from the sparsity of the contingency table when its dimensions are large. Some methods adopted to deal with this problem are therefore described, with emphasis on the definitions of collapsibility.
The fifth chapter illustrates an application of the methods described to a sample of about sixty thousand holders of a revolving credit card issued by one of the largest Italian financial companies operating in the sector. The variables examined are those describing the socioeconomic characteristics of the card holder, which can be obtained from the form the customer fills in when applying for credit, and the status of the customer's account in two successive periods. Each month the company classifies customers as ‘active’, ‘inactive’ or ‘dormant’ according to the account balance. The aim of the work was to search for conditional independences between the variables, in particular with respect to the two target variables, and to define the profile of those who use the card most. The conclusions on the analyses carried out in the fifth chapter are reported in the last section. The appendix describes some of the main programs for the statistical software used for the computations.
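The statement in the abstract that a missing edge corresponds to conditional independence, with the odds ratio as the underlying measure of association, can be illustrated with a minimal sketch. The counts and the variable labels A, B and C below are hypothetical, invented purely for illustration; they are not the thesis data, and the code is not the software used in the thesis. In each stratum of C the odds ratio between A and B equals 1 (conditional independence, so no A-B edge in the graph), while the table obtained by summing over C shows a marginal association.

```python
def odds_ratio(table):
    """Odds ratio of a 2x2 table given as [[n11, n12], [n21, n22]]."""
    (n11, n12), (n21, n22) = table
    return (n11 * n22) / (n12 * n21)


# Hypothetical counts (assumption, not thesis data): one 2x2 table of
# A (rows) by B (columns) for each level of a third binary variable C.
partial_tables = {
    "C=0": [[40, 20], [20, 10]],
    "C=1": [[10, 30], [20, 60]],
}

# Conditional odds ratios of A and B within each level of C.
conditional = {level: odds_ratio(t) for level, t in partial_tables.items()}

# Marginal A-by-B table, obtained by summing the partial tables over C.
marginal = [
    [sum(t[i][j] for t in partial_tables.values()) for j in range(2)]
    for i in range(2)
]
marginal_or = odds_ratio(marginal)

print("conditional odds ratios:", conditional)
print("marginal table:", marginal)
print("marginal odds ratio:", marginal_or)
```

In this invented example the conditional odds ratios are both 1 while the marginal odds ratio is 1.75, which is why the table cannot be collapsed over C without changing the apparent A-B association.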
(2000). Metodi statistici multivariati applicati all'analisi del comportamento dei titolari di carta di credito di tipo revolving. (Specialization thesis, Università degli Studi di Perugia, 2000).
Metodi statistici multivariati applicati all'analisi del comportamento dei titolari di carta di credito di tipo revolving
PENNONI, FULVIA
2000
File | Access | Size | Format
---|---|---|---
Tesi_di_Laurea_Pennoni_Last.pdf | open access | 5.85 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.