Diving into Data Lakes
Credit unions have one unique and incredibly powerful advantage: collaboration.
Introduction
A collaborative industry data lake is the next step to empower credit unions with capabilities that even large banks and fintech startups can only dream of at this point.
The days of the internally developed data warehouse are over. We only win as an industry by collaborating on a common data standard that supports both a forum for building and sharing applications and a data lake that ignites the research and innovation needed to remain relevant to our members.
We hope you find this ebook valuable and that it strengthens your understanding of business intelligence tools.
Sincerely,
- The Team at OnApproach
Part 1
Understanding Data Lakes
Defining and differentiating data lakes, data warehouses, and other analytic tools.
5 Reasons to Pool Your Data
By Mark Portz
Data continues to prove itself as a necessity for decision-making in financial institutions. For years, major banks and innovative companies such as Google and Amazon have taken advantage of “Big Data” to gain better insights into their customer base and make business decisions to position themselves for the future. The credit union industry is finally beginning to take advantage of its data and utilize new technologies. However, credit unions are much smaller than major banks and simply don’t have the same quantity of data that banks are able to collect from their customers. Fortunately, data pooling serves as a great solution to this problem. Here are 5 reasons your credit union should participate in data pooling:
1. Access to Diverse Data
“Why do I care about the data collected from a credit union on the other side of the country?” This is a frequently asked question when discussing data pools. Of course, it is a valid question. The economy may be different in December in Alaska compared to Florida. However, it is important to recognize that this diversity can actually be a major advantage that should not be overlooked.
As Joe Breeden of Deep Future Analytics explains in a podcast with Best Innovation Group, titled The CECL Effect – How the New Credit Loss Rule will alter Financial Analytics, data diversity is healthy for pooling and advanced analytics. In the podcast he states, “If we get folks spread around the country, in a shared blind repository, then it gives us a better overall view of the scaling of the risk versus economics and other things.” He continues to explain that “We leverage that pool to learn aspects that are in common, like economic sensitivities, but then also to calibrate to the individual… so you get the benefit of the whole, but specific to the individual institution.”
2. Affordable Access to Data Scientists
Data scientists are highly skilled, in high demand, and expensive. They play a major role in analyzing raw data and creating predictive insights from it (such as ALLL forecasting for CECL), which is why data scientists often earn $175k+ per year.
Credit unions simply don’t have the same assets and hiring power as Google, Microsoft, or the large banks, which makes hiring even a single data scientist a non-option for many of them. This is where the power of the data pool comes into play. If a data scientist works on a pool of data from, say, 50 credit unions, those 50 credit unions get to split the cost of the data scientist, making advanced analytics much more affordable.
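The cost-sharing argument above is simple arithmetic, and a minimal sketch makes it concrete. The function name is hypothetical, the $175k figure and the 50-member pool are the illustrative numbers from the text, and an even split is an assumption; a real pool might weight each share by asset size or usage, and the full cost of a data scientist is more than salary.

```python
def cost_per_credit_union(annual_cost: float, pool_size: int) -> float:
    """Split one shared annual cost evenly across the pool members."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    return annual_cost / pool_size


# Illustrative figures from the text: a $175k salary, pools of varying size.
SALARY = 175_000

for members in (1, 10, 50):
    share = cost_per_credit_union(SALARY, members)
    print(f"{members:>3} credit unions -> ${share:,.0f} each per year")
```

With 50 members, each credit union's share of the salary alone comes to $3,500 per year under these assumptions.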
3. Encrypted and Secure
Another common concern about data pooling is access to private information. In a proper data pool, all personally identifiable information (PII) is encrypted before it leaves the credit union’s firewall. In the pool, the data remains anonymized. Only after the data returns inside the firewall is it decrypted, using a decryption key that only the credit union holds.
Data scientists don’t need to know your individual members’ contact information, SSNs, etc., but all contributing organizations benefit from sharing data that provides insights into, for example, loan risk. After the analysis, you will never even be able to tell your data was pooled, except for the increased accuracy in your results.
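To illustrate the point that analysts need the loan attributes rather than the identifying fields, here is a minimal sketch that separates the two before a record leaves the credit union. It shows plain field removal, which is a simpler technique than the encryption described above and is not a substitute for it. All field names and values are hypothetical; a real core-system export will differ.

```python
# Hypothetical field names; a real core-system export will differ.
PII_FIELDS = {"name", "ssn", "email", "phone", "street_address"}


def strip_pii(record: dict) -> dict:
    """Return a copy of the record without direct-identifier fields."""
    return {key: value for key, value in record.items() if key not in PII_FIELDS}


# Made-up record for illustration only.
member_loan = {
    "name": "Jane Example",
    "ssn": "000-00-0000",
    "email": "jane@example.com",
    "loan_type": "auto",
    "credit_score": 712,
    "days_delinquent": 0,
}

pool_record = strip_pii(member_loan)
print(pool_record)
```

Only the loan attributes remain in `pool_record`; the original `member_loan` dictionary is left unchanged inside the firewall.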
4. Quantity of Data for Predictive Analytics
Predictive analytics is no longer a luxury, but a requirement for upcoming regulations such as CECL. As a rule, more data means more accurate results. Credit unions hold data that could reveal a great deal about their members, but only if it is analyzed collectively with the rest of the industry. An individual credit union simply does not have a large enough data set to perform accurate predictive analytics: 95% of the credit unions in the United States are below $3.0 billion in assets and do not have enough data to build accurate predictive models.
Fortunately, data pooling is coming to the rescue. Pooling data provides an opportunity to analyze a much larger data set. With a good model, each additional credit union participating in the pool helps to further decrease your margin of error and gives you more confidence in your data-driven decision making for the future.
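A rough way to see why pool size matters is to estimate a single rate, such as a loan default rate, and watch the margin of error shrink as loans are added. The sketch below uses the textbook normal approximation for a proportion. The 2% default rate and the 5,000 loans per credit union are hypothetical, and the formula assumes the pooled loans are independent and share one underlying rate, which real portfolios only approximate.

```python
import math


def margin_of_error(rate: float, n_loans: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for a proportion."""
    if n_loans < 1:
        raise ValueError("n_loans must be at least 1")
    return z * math.sqrt(rate * (1.0 - rate) / n_loans)


ASSUMED_RATE = 0.02    # hypothetical 2% default rate
LOANS_PER_CU = 5_000   # hypothetical loans contributed by each credit union

for n_credit_unions in (1, 10, 50):
    n = n_credit_unions * LOANS_PER_CU
    moe = margin_of_error(ASSUMED_RATE, n)
    print(f"{n_credit_unions:>3} credit unions, {n:>7,} loans: +/- {moe:.3%}")
```

Because the margin scales with one over the square root of the sample size, a 50-member pool narrows it by a factor of about seven relative to a single credit union under these assumptions.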
5. Near-Real-Time Industry Data for Peer-to-Peer Analysis
Although it is highly valuable, peer-to-peer analysis in near real time is currently very difficult for credit unions to perform. Typically, the best option is to compare data captured in 5300 Call Reports. However, this data is collected only once a quarter and likely published at least a month after collection. Valuable insights can be gained from this type of analysis, and it would be beneficial for credit unions to have access to this data before it is 4-5 months old. For example, if you realize your credit union is behind on loan origination, what changes can be made today versus 5 months from now?
A proper data pool makes it possible for credit unions to access industry data and perform analysis on data that is updated daily. This makes it possible to stay on top of industry trends before they have passed.
To learn more, listen to the Joe Breeden BIGcast about data pooling and CECL at https://www.big-fintech.com/Media/BIGcast/ArticleID/269/The-CECL-Effect-How-the-new-credit-loss-rule-will-alter-financial-analytics
Part 2
Getting Credit Unions to the Next Level
Re-imagining the applications of collaborative analytics.
By Peter Keers
Credit union interest in Big Data is at an all-time high. The promise of predictive analytics and other Big Data opportunities will be a key part of helping the industry compete more effectively with traditional banks and fintech upstarts.
However, where does the data for Big Data come from? The answer is simple: from the credit unions themselves. For example, the loan loss forecasts produced by CECL models will require data from many credit unions to increase their predictive accuracy.
While credit unions are eager to cash in on the Big Data boom, one of the costs is “contribution” of their own data to the Big Data “lake”. A data lake is a virtual “storehouse in the cloud” that holds a vast amount of data that can be used for Big Data analytics.
At this point, credit union decision makers often turn sour on Big Data. Why? The cost of “contribution” is too high. The credit union is obligated to protect the sensitive member data in its care. This data cannot simply leave the credit union’s firewall perimeter and be uploaded to the Data Lake.
The healthcare industry faced a similar conundrum regarding electronic medical records. As medical records evolved from paper to an electronic format, the opportunity to perform analytics on this data was gigantic. Yet, the Health Insurance Portability and Accountability Act (HIPAA), a law governing the privacy of patients' medical records, stood in the way.
To take advantage of this opportunity but still adhere to HIPAA, healthcare analytics companies devised processes to “de-identify” the sensitive data in medical records. In this way, no specific patient could be uniquely identified while analysts gleaned insights from millions of medical records uploaded by thousands of healthcare providers.
Credit union member data can be handled in a similar way. In fact, the same method for protecting patient privacy can be adapted to the data of credit union members.
In a 2015 publication from the National Institute of Standards and Technology (NIST), the concept of “de-identification” of data is explained. It is defined as, “…a tool that organizations can use to remove personal information from data that they collect, use, archive, and share with other organizations.”
The document describes the HIPAA Safe Harbor method, which specifies 18 specific types of data to be de-identified. The list has been altered to replace healthcare data types with credit union data types. The 18 types are:
a. The geographic unit formed by combining all ZIP codes with the same three initial digits contains more than 20,000 people; and
b. The initial three digits of a ZIP code for all such geographic units containing 20,000 or fewer people is changed to 000
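The ZIP code rule in items a and b can be sketched in a few lines. This is only an illustration of the rule as worded above: the function name is hypothetical, the population table is made up (a real implementation would take counts from census data), and treating an unknown prefix as a small area is an assumption chosen as the cautious default.

```python
def generalize_zip(zip_code: str, zip3_population: dict[str, int]) -> str:
    """Return the 3-digit ZIP prefix, or "000" if the prefix area is too small."""
    prefix = zip_code.strip()[:3]
    if len(prefix) != 3 or not prefix.isdigit():
        raise ValueError(f"not a ZIP code: {zip_code!r}")
    # Assumption: an unknown prefix is treated as a small area (safer default).
    population = zip3_population.get(prefix, 0)
    return prefix if population > 20_000 else "000"


# Hypothetical population counts per 3-digit prefix (not real census figures).
POPULATION = {"554": 1_200_000, "999": 8_500}

for zip_code in ("55401", "99901", "12345"):
    print(zip_code, "->", generalize_zip(zip_code, POPULATION))
```

The full five-digit ZIP code never leaves the credit union; only the three-digit prefix or the `000` placeholder does.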
An important consideration is how this data is de-identified. Removal of Direct Identifiers is at the heart of de-identification. The NIST document defines Direct Identifiers as “data that directly identifies a single individual. Examples of direct identifiers include names, social security numbers, and email addresses.”
The document notes numerous ways Direct Identifiers can be de-identified:
“Pseudonymization” is an extremely important topic in de-identification. Unlike the techniques above, it allows “linking information belonging to an individual across multiple data records or information systems, provided that all direct identifiers are systematically pseudonymized.”
In layman’s terms, this means that authorized parties can restore de-identified data to its original form. For example, suppose Member Number is de-identified via pseudonymization. The pseudonymized value is not comprehensible to any unauthorized party. However, when the data returns to the credit union, the pseudonymization can be reversed and the data integrated back into the database.
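One possible way to pseudonymize a Member Number is sketched below. It is an assumption for illustration, not the method NIST prescribes or the one any particular vendor uses. A keyed hash (HMAC-SHA256) turns each member number into a consistent token, so records for the same member still link together in the pool. Because a hash cannot be inverted, reversal relies on a lookup table that stays inside the credit union. The class name, key, and record fields are all hypothetical.

```python
import hashlib
import hmac


class Pseudonymizer:
    """Replaces member numbers with keyed tokens; reversal needs the local table."""

    def __init__(self, secret_key: bytes) -> None:
        self._key = secret_key
        # token -> member number; this table never leaves the credit union.
        self._reverse: dict[str, str] = {}

    def pseudonymize(self, member_number: str) -> str:
        digest = hmac.new(self._key, member_number.encode("utf-8"), hashlib.sha256)
        token = digest.hexdigest()
        self._reverse[token] = member_number
        return token

    def restore(self, token: str) -> str:
        return self._reverse[token]


# Hypothetical key and record, for illustration only.
p = Pseudonymizer(secret_key=b"hypothetical-key-kept-inside-the-firewall")
record = {"member_number": "0012345", "loan_balance": 18250.00}

# Outbound: the pool sees only the token, never the member number.
outbound = {**record, "member_number": p.pseudonymize(record["member_number"])}

# Inbound: a made-up result comes back carrying the same token.
inbound = {"member_number": outbound["member_number"], "risk_score": 0.07}
inbound["member_number"] = p.restore(inbound["member_number"])
print(inbound)
```

The same member number always produces the same token under a given key, which is what allows linking across records, while anyone without the key and the local table sees only an opaque string.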
Credit unions that can understand the what and how of data de-identification will be better prepared to take advantage of Big Data opportunities.
Company History
OnApproach began in 2005 as a data consulting company. In six years, OnApproach completed over 50 major projects for Fortune 500 companies, such as Toro and Land O'Lakes. In 2009, OnApproach completed an extensive reporting and analytics project for a credit union and realized the significant need for a standard enterprise data integration solution.
OnApproach became a Credit Union Service Organization (CUSO) in 2014 and received a patent for the M360 Enterprise analytic data model in 2015. To further its commitment to credit union analytics, OnApproach annually co-hosts the Analytics and Financial Innovation (AXFI) Conference with Best Innovation Group to provide a forum for industry collaboration.
OnApproach Today
OnApproach is the only CUSO dedicated to credit union success through a collaborative analytics ecosystem. By providing a secure and frictionless data experience, OnApproach empowers credit unions to take full control of their own data and their own futures. We exist to serve the credit union movement with technology and expertise required for the digital transformation of the industry business model. OnApproach’s collaborative ecosystem enables communities of users, data scientists, and application developers focused on analytics innovation.
OnApproach is the creator of the CU Analytics Ecosystem, a network of credit unions interconnected through a common data integration platform (leveraging the CUFX standards) that is powered by OnApproach M360 data integration middleware. The CU Analytics Ecosystem is a collaborative ecosystem that enables communities of users, data scientists, and application developers focused on innovation, driven by analytics.
Make the Most of Your Data
OnApproach is a CUSO dedicated to credit union success through a collaborative analytics ecosystem. By providing a secure and frictionless data experience, OnApproach empowers credit unions to take full control of their own data and their own futures.
We exist to serve the credit union movement with technology and expertise required for the digital transformation of the industry business model. OnApproach’s collaborative ecosystem enables communities of users, data scientists, and application developers focused on analytics innovation.
Learn more at OnApproach.com or