from CoreLogic
— this post authored by Bret Fortenberry
CoreLogic has determined the regions of the U.S. that have the highest correlation with the National Mortgage Fraud Risk Index, based on a tracking score [1]. The regions that are most highly correlated with fraud risk are the areas that will be the best predictors of nationwide mortgage fraud.
In fact, one can look at a few highly correlated regions to predict fraud risk on a national scale.
The heatmap (figure 3) shows the correlation of each region to the national trend. Mousing over a region shows the region name, the tracking score, and where that score falls, as a percentage, between the lowest and highest possible tracking scores (-1.0 to 1.0). The heatmap has two layers, one for states and one for CBSAs, which can be toggled in the top-right menu of the map. The CBSA layer is limited to the top 50 CBSAs by population.
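The percentage level shown on mouse-over can be illustrated with a small sketch. It assumes the percentage is a simple linear rescaling of the -1.0 to 1.0 range onto 0 to 100; the post does not state the exact formula, and the function name `score_to_percent` is hypothetical.

```python
def score_to_percent(score: float) -> float:
    """Map a tracking score in [-1.0, 1.0] onto a 0-100 scale.

    Assumption: the mapping is linear. The source does not give the formula.
    """
    # Keep the input inside the documented range before rescaling.
    clipped = max(-1.0, min(1.0, score))
    return (clipped + 1.0) / 2.0 * 100.0


if __name__ == "__main__":
    for s in (-1.0, 0.0, 0.49, 1.0):
        print(s, score_to_percent(s))
```

Under this assumption a score of -1.0 maps to 0%, 0.0 to 50%, and 1.0 to 100%, so California's 0.49 would sit at roughly 74.5%.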
California and Maryland have the highest correlation with the national trend for risk (see figure 2). The two states have tracking scores of 0.49 and 0.47, respectively. To put this in perspective, the next most highly correlated state is Massachusetts, with a tracking score of 0.1. All other states have a tracking score below 0. When California and Maryland are combined by averaging, the tracking score climbs to 0.72. Correlation typically increases as more regions are added, because the national score is a combination of all regions. Here, however, the combined correlation gets worse as more states are added in descending order of correlation. More than six states must be combined before the result beats California and Maryland alone.
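The idea of combining regions by averaging and then comparing the result with the national trend can be sketched as follows. This is a minimal illustration only: it uses the plain Pearson correlation rather than the tracking score, the helper names are hypothetical, and the series are made-up numbers, not CoreLogic data.

```python
from math import sqrt


def combine_by_averaging(series):
    """Element-wise mean of several equal-length regional risk series."""
    if not series:
        raise ValueError("need at least one series")
    length = len(series[0])
    if any(len(s) != length for s in series):
        raise ValueError("all series must have the same length")
    return [sum(values) / len(series) for values in zip(*series)]


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either series is flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


# Synthetic index values purely for illustration (not CoreLogic data).
national = [100.0, 104.0, 103.0, 108.0, 112.0]
region_a = [100.0, 106.0, 101.0, 107.0, 115.0]
region_b = [100.0, 101.0, 106.0, 110.0, 108.0]

combined = combine_by_averaging([region_a, region_b])

if __name__ == "__main__":
    for name, s in [("A", region_a), ("B", region_b), ("A+B", combined)]:
        print(name, round(pearson(s, national), 3))
```

In this toy data the averaged series follows the national series more closely than either region does on its own, which is the same effect the combined California and Maryland score shows.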
Finding the states that are correlated is good, but looking at smaller regions is better, because smaller regions have fewer contributing fraud factors to analyze. Along with the states, we also looked at the correlation for metropolitan areas, commonly referred to as core-based statistical areas (CBSAs). Using the same process, the number of CBSAs that best fit the national trend can be reduced to three. CBSAs are smaller than states and are less likely to be predictive of the national trend (see figure 1). However, combining only three CBSAs provides a strong correlation to the National Fraud Risk Trend. The three CBSAs are Baltimore-Columbia-Towson with a tracking score of 0.43, San Francisco-Oakland-Hayward with a tracking score of 0.26, and San Diego-Carlsbad with a tracking score of 1.6. Boston-Cambridge-Newton is the only other CBSA with a tracking score higher than 0. The top three CBSAs combined have a tracking score of 0.64. Combining more CBSAs increases the correlation only slightly. There are 935 CBSAs in the nation, and the three most correlated CBSAs cover only 12.2 million of the 319 million people (3.8%) in the U.S.
The national trend is not driven by the most populous CBSAs, as one might expect given that a larger volume of mortgages produces more fraud instances. Of the top three CBSAs by population (New York City, Los Angeles, and Chicago), the highest has a tracking score of -0.96, and the three combined have a tracking score of -1.0. The same is true for the CBSAs with the highest fraud risk (Miami, Daytona Beach, and New York City), each of which has a tracking score of -1.0.
Understanding the highly correlated regions will help to identify the contributing factors that lead to fraud. Across the nation, the number of potential factors is large, and once combinations of factors are considered, the number becomes very large. This makes it almost impossible to find the contributing factors. It is exciting to see that the correlated regions are limited to just a few CBSAs, because this might reduce the number of potential factors to the point that we can identify the ones that contribute.
Source
http://www.corelogic.com/blog/authors/bret-fortenberry/2017/08/who-are-the-geographic-influencers-for-fraud-risk.aspx
[1] Methods: The tracking score used for the ranking is computed from the correlation coefficient and the mean square error. The correlation coefficient is based on deviations from the mean rather than on the original data points, so on its own it is not adequate for comparing trend lines. The mean square error was used as a secondary metric to penalize the correlation coefficient score as the distance between the two trend lines increases. The correlation coefficient ranges from 1.0 for a perfect score to -1.0 for the lowest correlation. Scores can go below -1 due to the penalization by the mean square error. All scores below -1 were set to -1.
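A rough sketch of a score built this way is shown below. The footnote does not give the exact formula, so the form of the penalty (correlation minus a weighted mean square error), the `penalty_weight` parameter, and all function names are assumptions made for illustration. The series are assumed to be on a comparable, normalized scale, and the numbers are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either series is flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def mean_square_error(x, y):
    """Average squared distance between two equal-length trend lines."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def tracking_score(region, national, penalty_weight=1.0):
    """Hypothetical tracking score: correlation minus a weighted MSE penalty.

    Assumption: the penalty is subtractive. Scores below -1 are set to -1,
    as the methods note describes.
    """
    r = pearson(region, national)
    penalized = r - penalty_weight * mean_square_error(region, national)
    return max(-1.0, penalized)


# Made-up, normalized trend lines for illustration only.
national = [0.0, 0.2, 0.1, 0.4, 0.6]
same_shape_far_away = [v + 10.0 for v in national]

if __name__ == "__main__":
    print(tracking_score(national, national))
    print(tracking_score(same_shape_far_away, national))
```

The second series has the same shape as the national one, so its correlation coefficient alone would be near 1.0, but its large distance from the national trend line drives the penalized score down to the floor of -1. That is the behavior the mean square error term is meant to capture.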
©2017 CoreLogic, Inc. All rights reserved.