Kaggle hosts data science competitions, and Home Credit, a consumer finance lender operating from central and eastern Europe to China and southeast Asia, used one to develop its credit risk decision-making based on alternative data.
“We chose to host a competition for several reasons,” says Kirill Odintsov, head of the data modelling team at Home Credit. “It was partly to develop our understanding of best practice, partly to connect with some of the best people in this area and partly to raise our profile.”
It worked on all three fronts. The competition, at the time the largest Kaggle had ever hosted, attracted more than 7,000 teams from over 100 countries, competing for a top prize of $35,000 out of total prize money of $70,000. The monetary reward, though, was only part of the reason competitors took part.
“The reason why this competition was so popular was twofold,” says Maggie Demkin, head of customer success and partnerships at Kaggle. “First, the data was very clean and easy to work with. Second, the question that Home Credit was looking to solve was challenging, so it inspired people to collaborate and share their ideas.”
There might also be another reason that made Home Credit’s contest so attractive. “Fintechs are typically garage startups with trendy propositions. But we are a global powerhouse that issued 37 million new loans in the past year alone, so any idea that can lead to even a slight improvement in our approval is not only extremely valuable to us, but can also be scaled up to incredible heights very rapidly. And that’s a big draw for Kaggle contestants,” says Radek Pluhar, Home Credit’s chief risk officer.
Key to participation is the intellectual satisfaction of finding new solutions or proving that a solution works. Universities can use Kaggle as a vehicle for learning, while host companies can form relationships with some of the best and brightest, not only from the ivory towers.
The Kaggle community is composed of professionals, students and enthusiasts. Home Credit has already employed some participants in Prague, the Czech Republic, and Gurgaon, India, and is also in discussions with other companies as a result of the competition.
“The competition helped to steer our evolution in certain directions; it helped us to stay sharp and on top of trends,” says Mr Odintsov.
Home Credit’s competition involved predicting whether individual loans would default. The company is increasingly offering loans online, where it is difficult to use traditional assessment methods. Finding a way to assess credit risk more quickly is key to improving customer service and reducing defaults.
“Crowdsourcing helps us to attract the world’s best-known researchers and top talents to work for us.”
The thrust of Home Credit’s work is developing the use of alternative, publicly available data, such as the time of day an individual’s mobile phone is switched on, to build a realistic profile quickly enough to support a lending decision.
The competition used real-life, anonymised data and asked participants to predict which loans would default. The predictions were compared against actual results to decide the winner.
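To make "compared against actual results" concrete, here is a minimal sketch of how a set of default predictions could be scored. The article does not say which metric decided the winner; area under the ROC curve (AUC) is assumed here purely as a common choice for this kind of problem, and the function name and sample numbers are hypothetical.

```python
def roc_auc(actual, scores):
    """Area under the ROC curve, computed from ranks (Mann-Whitney U).

    actual: list of 0/1 outcomes (1 = the loan defaulted)
    scores: predicted default probabilities, in the same order
    NOTE: AUC is an assumed metric; the article does not name the real one.
    """
    if len(actual) != len(scores):
        raise ValueError("actual and scores must have the same length")
    n_pos = sum(actual)
    n_neg = len(actual) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one defaulted and one repaid loan")

    # Rank loans by predicted score; tied scores share their average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic: how often a defaulted loan outranks a repaid one.
    pos_rank_sum = sum(r for r, y in zip(ranks, actual) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical submission: four loans, two of which actually defaulted.
actual_outcomes = [0, 0, 1, 1]
predicted_risk = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(actual_outcomes, predicted_risk))  # 0.75
```

A score of 1.0 would mean every defaulted loan was ranked as riskier than every repaid one, while 0.5 is what random guessing would achieve. A single number of this kind is what allows thousands of entries to be ranked on one leaderboard.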
“Kaggle is an important platform for academics, who don’t usually come into contact with real-life data,” says Mr Odintsov. “They have a lot of solutions, but no data. Kaggle provides them with the opportunity to try out ideas in a strong and supportive community.”
The biggest challenge for Home Credit was preparing the data; it took four to five months before the legal department was happy and the competition was built with advice from Kaggle. Defining clear, mathematically precise criteria for success was also important, since those criteria decide the winner.
But once the competition was up and running, there was additional work for Home Credit’s modelling team. Kaggle competitions are, crucially, an interactive process between host and competitors.
“If you want to have a positive impact on your brand, you have to make the time to answer questions during the competition,” says Mr Odintsov. Two people were dedicated to fielding questions for the three months of the competition; one of the biggest hurdles for the competitors was grasping the nature of the financial services industry, so Home Credit had to ensure there were staff available to respond.
And the results? It was never intended that the outcome would be a cut-and-paste solution. “It’s more about identifying best practice, developing our modelling,” says Mr Odintsov. “We could see that some more traditional methods like gradient boosting perform better in default prediction modelling than the deep neural network applications. Such tools have served us quite well in text-mining though. It generated lots of ideas and we are looking at hosting another competition soon.”
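For readers unfamiliar with gradient boosting, the sketch below shows the basic idea on a toy default-prediction problem: a sequence of very simple models (one-split "stumps") is built, each one fitted to the errors left by those before it. This is an illustrative sketch only, not the method used by Home Credit or by any competitor; the features, data, function names and settings are all hypothetical, and real entries would rely on mature libraries rather than hand-written code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Pick the one-feature threshold split that best fits the residuals."""
    best = None
    for f in range(len(X[0])):
        for threshold in sorted(set(row[f] for row in X)):
            left = [r for row, r in zip(X, residuals) if row[f] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[f] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, f, threshold, left_mean, right_mean)
    if best is None:  # no usable split: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_value, right_value = stump
    return left_value if row[f] <= threshold else right_value


def fit_boosting(X, y, n_rounds=50, learning_rate=0.3):
    """Boost stumps on the gradient of the log loss (y minus predicted p)."""
    p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    base = math.log(p / (1 - p))  # start from the overall default log-odds
    raw = [base] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(ri) for yi, ri in zip(y, raw)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        raw = [ri + learning_rate * predict_stump(stump, row)
               for ri, row in zip(raw, X)]
    return base, learning_rate, stumps


def predict_proba(model, row):
    base, learning_rate, stumps = model
    total = sum(predict_stump(s, row) for s in stumps)
    return sigmoid(base + learning_rate * total)


# Hypothetical toy data: [monthly income (thousands), late payments so far]
X = [[1, 0], [2, 0], [3, 1], [4, 3], [5, 4], [6, 5]]
y = [0, 0, 0, 1, 1, 1]  # 1 = the loan defaulted
model = fit_boosting(X, y)
print(predict_proba(model, [2, 0]), predict_proba(model, [5, 4]))
```

Each round nudges the predictions a little further towards the observed outcomes, which is one reason the approach tends to suit tabular data such as loan records.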
Home Credit BV
Home Credit BV is a global platform that centrally manages core strategy, technology, risk and products for consumer finance operations in emerging markets in Central and Eastern Europe, the Commonwealth of Independent States, China, and south and southeast Asia. Founded in 1997, Home Credit focuses on responsible lending primarily to people with little or no credit history. It has served over 119 million customers through a distribution network comprising 426,900 points of sale and online.
Customers are junior white-collar and blue-collar workers with limited credit records who are underserved by traditional banks. They are often first-time borrowers and sometimes have no bank account, so they can be difficult to assess from a risk perspective. Credit bureaus are limited in less developed countries, with information on perhaps only a third of the population. Working in areas where there are limited financial resources and a nascent financial industry requires a different approach to lending.
Kaggle
Kaggle is an online community of data scientists and machine-learners. The company, which is owned by Google, hosts machine-learning competitions in conjunction with commercial companies, which provide the data and prize money.
Registered users from around the world compete to solve real-life problems by building the best algorithm. Work is shared publicly, with immediate feedback on a leaderboard, and the prize money is exchanged for the legal right to use the winning solution.
Kaggle has more than two million members. Problems that have been aired on the platform range from predicting the effect of genetic variants for more personalised medicine, to using satellite data to track human footprints in the Amazon rainforest, to predicting what songs a user will listen to next.
The commercial benefits are clear, but not all data-learning projects are created equal. Competitions, which typically run for a few months, require a tightly defined problem and clean data. The minimum prize pool is $25,000, but Kaggle estimates a commercial budget of between $85,000 and $200,000; it can be a lot more for bespoke operations.
Achieving the best results can be time-consuming, whether in preparing the data or in answering questions from the community once the competition is up and running.