It seems that every industry today is working to understand its customers’ needs in depth. Banks and financial institutions are no exception: they are leveraging data to derive insights about their business and their customers. But insights alone are no longer enough; they need to be timely and actionable as well.
Yes, it’s no longer just about analytics, but about real-time analytics. Teradata’s work implementing real-time credit scoring for a reputed bank in Turkey is one example of using real-time analytics to maximize revenue while minimizing cost and time. As one of the largest banks in Turkey, with more than 700 branches across the country and assets estimated at over ₺90,410,000,000, our client was focused on optimizing its credit scoring pipeline to improve assessments and yield more precise scores, faster.
The end-to-end pipeline had two parts: creating variables on the Vantage SQL Engine, and using Teradata performance to replicate the bank’s SAS-based credit scoring models on the Vantage Machine Learning Engine (MLE), with no loss in any model performance measure and with scoring done in-database. Variables were created in real time from data coming from the bank’s data warehouse (DWH) and from Turkey’s Credit Bureau. We started by optimizing the Oracle-based PL/SQL queries and translating them to Teradata SQL using SQL Engine (SQL-E) functions, Ordered Analytics, analytic data set (ADS) creation tactics, UDFs and stored procedures. This process let us create approximately 1,000 variables at run time for use in the machine learning model that predicts the credit score.
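To make the idea of ordered-analytics variables concrete, here is a minimal sketch in plain Python rather than Teradata SQL. It mimics what a `PARTITION BY ... ORDER BY ...` window would do: group records per inquiry, order them in time, and derive a few variables. The record layout, the variable names and the `build_variables` helper are all hypothetical; the actual variables used at the bank are not described in this post.

```python
from collections import defaultdict

# Hypothetical bureau records: (inq_number, month_index, balance).
records = [
    ("INQ-1", 1, 100.0),
    ("INQ-1", 2, 250.0),
    ("INQ-1", 3, 175.0),
    ("INQ-2", 1, 0.0),
    ("INQ-2", 2, 40.0),
]


def build_variables(rows, window=2):
    """Derive per-inquiry variables, similar in spirit to a SQL window
    with PARTITION BY inq_number ORDER BY month_index."""
    partitions = defaultdict(list)
    for inq_number, month, balance in rows:
        partitions[inq_number].append((month, balance))

    variables = {}
    for inq_number, history in partitions.items():
        history.sort()  # order each partition by month
        balances = [balance for _, balance in history]
        recent = balances[-window:]  # the last `window` records
        variables[inq_number] = {
            "n_records": len(balances),
            "last_balance": balances[-1],
            "max_balance": max(balances),
            "avg_recent_balance": sum(recent) / len(recent),
        }
    return variables


# One row of illustrative variables per inquiry number.
ads = build_variables(records)
```

In the database, the same logic would be expressed as window functions so that the variables are computed where the data lives, which is what makes run-time creation feasible.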
Once the ADS creation process for training and scoring was in place, we built a Random Forest model using MLE after exploring other algorithms as options. We tried various parameters, training sets and techniques before arriving at the final Random Forest model, which achieved an 80% default catch rate. The final, statistically satisfactory configuration was 100 trees, a maximum depth of 8 and a node size of 1, for a training set with a 22.19% default rate. Some oversampling and overfitting were deliberately introduced to compensate for the low default rate.
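The two ideas in this step, oversampling and the default catch rate, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the method used in the project: it assumes simple random duplication of defaulters, treats the 22.19% figure as the default rate after oversampling, and reads "default catch rate" as the share of actual defaults that the model flags (recall on the default class). The function names and the toy data are hypothetical.

```python
import random


def oversample(rows, target_rate, seed=0):
    """Duplicate randomly chosen defaulters (label 1) until they make up
    at least `target_rate` of the rows. Assumed technique, for illustration."""
    if not 0 < target_rate < 1:
        raise ValueError("target_rate must be between 0 and 1")
    defaults = [row for row in rows if row[1] == 1]
    if not defaults:
        raise ValueError("no defaulters to oversample")

    rng = random.Random(seed)
    out = list(rows)
    n_defaults = len(defaults)
    while n_defaults / len(out) < target_rate:
        out.append(rng.choice(defaults))
        n_defaults += 1
    return out


def default_catch_rate(y_true, y_pred):
    """Share of actual defaults (1) that were predicted as defaults."""
    actual = sum(1 for t in y_true if t == 1)
    caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return caught / actual if actual else 0.0


# Toy data: 95 good loans and 5 defaults, i.e. a 5% default rate.
rows = [("good", 0)] * 95 + [("bad", 1)] * 5
balanced = oversample(rows, target_rate=0.2219)
rate = sum(label for _, label in balanced) / len(balanced)

# Four of five actual defaults are flagged, giving a catch rate of 0.8.
catch = default_catch_rate([1, 1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 0, 0, 1])
```

Duplicating defaulters gives the trees more default examples to split on, at the cost of fitting those repeated cases more closely, which is the trade-off the paragraph above describes.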
An end-to-end simulated real-time pipeline running in 5.25 seconds, which included creating the ADS variables and scoring one INQ_NUMBER, was handed over to the IT and business teams at the bank. This was a major win for the account team: it brought the entire process down from less than 30 minutes to just 5.25 seconds, giving the bank’s credit loan department actionable insights and allowing it to make timely decisions.