6  Association Rule Mining

6.1 Overview

Association Rule Mining, henceforth ARM in this section, is the process of identifying commonly occurring combinations within a dataset. The method allows one to explore and uncover potentially unknown associations and groupings in data that are not immediately evident upon direct human inspection.

A common application of these methods is assessing common purchases by customers in a store or e-commerce website (e.g. if a customer buys a portable music player, is it common for that customer to also purchase a USB cable, or Bluetooth headphones, or potentially both?). Manually inspecting or searching through the data with ad-hoc or independently built algorithms may not be an effective or efficient way to identify such patterns. When considering an application such as customer purchases, or transactional data, the number of possible item combinations grows exponentially with the number of distinct items, and brute-force methods that enumerate every combination can exhaust the available computational resources of a machine before identifying anything of use.

So, to examine transactions for potentially insightful or useful metrics, one needs to go about said search wisely and efficiently. This is where a handful of algorithms and methods come into play, including Frequent Pattern growth and Apriori.

What is a frequent pattern?

A frequent pattern is defined as a collection of items that occur together at or above a specified threshold within a dataset of transactions. When measuring frequent patterns, each item in the collection is unique, and multiple instances of the same item within an individual transaction are ignored so as to bring focus on the occurrence of unique items being grouped together within the dataset. Since the threshold can be specified by the person conducting association rule mining, "frequent" is a relative and subjective term in this context: it is based on the relative frequency of occurrence of the group of items, and on whether or not that relative frequency meets an arbitrary threshold.

Common items that are connected, grouped, or bundled together are referred to as frequent patterns within datasets.

How does Apriori work?

The Apriori algorithm leverages Bayesian probability and induction to… To perform Apriori ARM, one must set thresholds for the metric by which one is measuring the data. This is key, because depending on the number of unique items or the number of transactions within the dataset in question, the

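To search for these connections and associations, Apriori relies on the fact that every subset of a frequent pattern must itself be frequent, so any combination that contains an infrequent set of items can be discarded without being counted. Conditional probability (Bayes' rule) is then used to score the resulting rules; metrics such as entropy or Gini indices, which are used by decision trees, play no role in Apriori. The algorithm iteratively and inductively examines data for frequent patterns, generally in the following manner: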

  1. Calculate or determine the relative frequency of item groups or sets of length 1 (based on input threshold).

  2. Prune items that do not meet the input threshold from further consideration.

  3. For patterns of length \(n\), examine combinations of frequent patterns with length \(n-1\) and length 1 (e.g. for length 2, combine frequent item sets of length 1 and 1), and calculate those combinations’ relative frequencies.

  4. Retain only sets of items that meet the initially established threshold.

  5. Repeat steps 3 and 4 until no more sets of items of length \(n\) meet the specified threshold.

So one can see that as the Apriori algorithm proceeds, it has an eventual halting point that depends on the initially established thresholds. The predominant metric and threshold is the relative frequency of occurrence of an item (or combination of items). This relative frequency is also known as the support of the combination of items within the dataset.
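The steps above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation used in this study (which relied on R); the function name `apriori_frequent_itemsets` and the toy music-player baskets are assumptions made for the example.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    # Converting to sets ignores repeated items within one transaction.
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Steps 1-2: keep only single items that meet the threshold.
    current = {}
    for item in {i for b in baskets for i in b}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)
    singles = list(current)

    # Steps 3-5: grow itemsets by one item until none survive.
    while current:
        candidates = {a | b for a in current for b in singles if not b <= a}
        current = {}
        for c in candidates:
            # Prune: every subset one item shorter must already be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(c, len(c) - 1)):
                s = support(c)
                if s >= min_support:
                    current[c] = s
        frequent.update(current)
    return frequent

# Hypothetical baskets; the repeated "player" in the last one is ignored.
transactions = [
    ["player", "cable", "headphones"],
    ["player", "cable"],
    ["player", "headphones"],
    ["cable"],
    ["player", "cable", "headphones", "player"],
]
result = apriori_frequent_itemsets(transactions, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

With a threshold of 0.5, the pair cable and headphones (support 0.4) is dropped, so the three-item set is pruned without ever being counted against the data.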

What is an association rule?

An association rule goes beyond individual relative frequencies of items or combinations thereof, and begins to tell more about how strongly connected certain item combinations are within the data. ARM establishes a connection between an antecedent (or prior) and a consequent (or posterior) set of items within the dataset. The combination of an antecedent and a consequent is what forms an association rule. It is similar to asking the question: given that a customer has already placed items \(A\) and \(B\) in their shopping cart, what is the probability or likelihood that they will next place \(C\) in their basket? Knowing relative frequencies of individual items and combinations thereof across many transactions is necessary to answer this question, but does not by itself tell us how certain or strong those associations are.

How do we measure the strength of the rules?

Support has already been discussed in this section as the relative frequency of occurrence of a set of items within the source data. The strength of association rules is measured with metrics including support, confidence, and lift.

Confidence and lift tell us the most about the strength of a rule. Confidence ranges from 0 to 1, with 1 being the highest, and a high confidence tells us that the consequent occurs in most of the transactions that contain the antecedent.

  • Confidence - How often the items A and B occur together, relative to the number of times A occurs. This helps in that, if customers are buying A and B together but not C, C can be ruled out as a likely consequent. \(P(B|A) = \frac{P(A\cap B)}{P(A)}\)

  • One can define a threshold for minimum support and confidence as initial parameters when beginning to build association rules. Once these values are set, they serve as a filter that adjusts the number of rules that are found, and help determine how long or specific those rules can be. Generally, since the algorithm is inductive, lengthy rules are rare (i.e. they will have low support). By setting lower initial thresholds for support, more rules can be mined.

  • Lift - compares how often items A and B actually occur together with how often they would be expected to occur together if they were independent of one another. It separates co-occurrence that happens by chance from genuine association. \(\frac{P(A\cap B)}{P(A)\cdot P(B)}\)

    • The calculation of lift is based upon the definition of statistical independence - \(A\) and \(B\) are independent \(\iff\) \(P(A\cap B) = P(A)\cdot P(B)\). So the fraction \(\frac{P(A\cap B)}{P(A)\cdot P(B)}\) is the ratio of the observed joint frequency to the joint frequency expected under independence.

    • Lift values equal to 1 signify item occurrences that are independent of one another.

    • Lift values greater than 1 are akin to saying the whole is greater than the sum of its parts, and give greater credence to a calculated confidence value. The higher the lift, the more assurance we have that the confidence is meaningful and impactful.

    • Lift values less than 1 signify that there is an inverse relationship between the items in question, and that having one actually reduces the chances of the other occurring. The closer this value is to zero, the stronger the inverse relationship.
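The three metrics can be computed directly from transaction counts. The sketch below is illustrative only: `rule_metrics` and the toy baskets are hypothetical names and data, not part of the study's code.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent => consequent."""
    baskets = [set(t) for t in transactions]
    a, c = set(antecedent), set(consequent)
    n = len(baskets)
    n_a = sum(a <= b for b in baskets)          # baskets containing A
    n_c = sum(c <= b for b in baskets)          # baskets containing B
    n_ac = sum((a | c) <= b for b in baskets)   # baskets containing both
    if n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    return {
        "support": n_ac / n,                 # P(A and B)
        "confidence": n_ac / n_a,            # P(B | A)
        "lift": (n_ac * n) / (n_a * n_c),    # P(A and B) / (P(A) * P(B))
    }

# Hypothetical baskets for the music-player example.
transactions = [
    ["player", "cable", "headphones"],
    ["player", "cable"],
    ["player", "headphones"],
    ["cable"],
    ["player", "cable", "headphones"],
]
metrics = rule_metrics(transactions, ["player"], ["cable"])
print(metrics)
```

In this toy data the rule player => cable has a confidence of 0.75 but a lift just below 1, which shows how a rule can look strong by confidence alone while the items are, if anything, slightly less likely to appear together than independence would predict.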

How does one interpret association rules?

ARM does not establish causal relationships between antecedents and consequents. It is a frequentist method to examine relative probabilities within transaction data. When interpreting association rules, one can comment on the strength of identified rules using metrics like confidence and lift. Lift values for association rules are loosely analogous to Pearson R correlation values, with some differences in the range of the potential result (lift ranges from 0 upward with 1 as its neutral point, whereas Pearson R ranges from -1 to +1 with 0 as its neutral point):

  • a lift value substantially higher than 1 is analogous to a high, positive Pearson R value close to +1

  • a small lift value, very close to 0, is analogous to a low, negative Pearson R value close to -1

  • a lift value of 1 is analogous to a Pearson R value of 0

With this similarity to correlation, one can interpret mined rules with high lift and high confidence with statements such as “Customers who buy \(A\) and \(B\) almost always buy \(C\).” Similarly, for a very low lift and low confidence, “Customers who buy \(A\) rarely if ever also buy \(B\).” One should not interpret mined rules in such manners as “Customers buy \(B\) because they bought \(A\)” or “Customers who have \(A\) and \(B\) need \(C\).” The latter statements are causal in nature, and such relationships are not established via ARM.

6.2 ARM within this Study

ARM for the purpose of this study can help examine the findings from the CNN article, as well as explore other research questions with respect to the top 5 lenders. Associations in which the consequent is either loan approval or loan denial are of interest here. Furthermore, rules whose antecedents include the specific financial institution, one or more items of protected class information, and other important features should be examined to pursue answers to the research questions established in Chapter 1.

6.3 Data

Applying ARM to the collected HMDA mortgage data is somewhat of a challenge. The data is not organized in a way that is immediately conducive to searching for associations; it contains a mixture of quantitative and qualitative data. To perform ARM, we need transactional data - a list of all things that effectively went into the “basket” of each mortgage application. Additionally, numeric information is a detriment to identifying patterns: any variable or feature that sits along an interval or continuous scale can take on countless possible values, so few exact values repeat often enough to form frequent patterns.

Generally, to prepare this mortgage data for use in ARM, a few actions were necessary to establish features as available and usable:

  • perform discretization and binning of numeric variables into distinct categories

    • numeric variables were divided on percentile boundaries of width 20, including 0-20, 21-40, 41-60, 61-80, and >80.

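    • the binned variables that appear in the results below include income, interest_rate, loan_to_value_ratio, debt_to_income_ratio, tract_minority_population_percent, and tract_to_msa_income_percentage.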

  • add features of each mortgage application into a basket

  • transform the resulting baskets into a one-hot encoded data frame

  • pivot the data into single format (2 columns: transaction number and item)

The code to perform these transformations and prepare the data was written in Python and can be reviewed in Appendix C.
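A minimal sketch of the binning and pivoting steps follows. It is not the Appendix C code: the helper names (`percentile_edges`, `bin_label`, `to_single_format`), the nearest-rank percentile rule, the `column:value` item format for every column, and the sample rows are all assumptions made for illustration.

```python
from bisect import bisect_left

BIN_LABELS = ["0-20", "21-40", "41-60", "61-80", ">80"]

def percentile_edges(values):
    """Nearest-rank 20th, 40th, 60th, and 80th percentile cut points."""
    ordered = sorted(values)
    n = len(ordered)
    # (n * p + 99) // 100 equals ceil(n * p / 100), the nearest-rank position.
    return [ordered[(n * p + 99) // 100 - 1] for p in (20, 40, 60, 80)]

def bin_label(value, edges):
    """Map a number to the label of the percentile band it falls in."""
    return BIN_LABELS[bisect_left(edges, value)]

def to_single_format(rows, numeric_columns):
    """Return (transaction index, item) pairs, one pair per basket item."""
    edges = {
        col: percentile_edges([r[col] for r in rows if r[col] is not None])
        for col in numeric_columns
    }
    pairs = []
    for index, row in enumerate(rows):
        for col, value in row.items():
            if value is None:
                continue  # a missing value contributes no item
            if col in edges:
                value = bin_label(value, edges[col])
            pairs.append((index, f"{col}:{value}"))
    return pairs

# Hypothetical applications (income in thousands of dollars).
rows = [
    {"income": 30, "derived_sex": "Male"},
    {"income": 55, "derived_sex": "Joint"},
    {"income": 80, "derived_sex": "Female"},
    {"income": 120, "derived_sex": "Joint"},
    {"income": 300, "derived_sex": "Male"},
]
pairs = to_single_format(rows, numeric_columns=["income"])
for index, item in pairs:
    print(index, item)
```

Each output row has the same two-column shape as the transformed data shown below: a transaction index and a single item.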

Prior to performing the transaction transformation, the data is in the same state as it was after initial collection:

  state_code county_code       derived_sex action_taken purchaser_type
1         OH       39153 Sex Not Available            1              0
2         NY       36061              Male            1              0
3         NY       36061 Sex Not Available            1              0
4         FL       12011              Male            1              0
5         MD       24031             Joint            1              0
6         NC       37089             Joint            1              0

Examples of the data, post transformation:

  index                                variable
1     0                                     3.0
2     0 applicant_ethnicity:Not Hispanic/Latino
3     0                            income:21-40
4     0                      interest_rate:0-20
5     0                    applicant_race:White
6     0               loan_to_value_ratio:41-60

The transformed data can be found here

6.4 Code

The code to prepare the data into single transaction format, execute the Apriori algorithm, and measure metrics such as confidence, lift, and support was written in R and is located in Appendix C. Furthermore, the R code is embedded, but hidden, within the Quarto source code of this webpage. Examination of the source .qmd file will provide a view of the specific code used to generate the rules and visuals.

6.5 Results

The tables and figures below provide insight into the association rules mined from the dataset. The first three tables, Table 6.1, Table 6.2, and Table 6.3, cover the top association rules when the totality of the dataset is mined via Apriori.

However, some other tables and figures are necessary to examine the individual institutions, as what is frequent for one institution may be infrequent for another. Being able to dive deeper on the individual institutions and the relative frequency of approvals and denials for their organizations is of interest to the intent of this research.

Table 6.1: Top 15 Associations by Support
     lhs                                           rhs                                         support confidence  coverage     lift  count
[1]  {approve}                                  => {1 rooms}                                 0.8465587  0.9897986 0.8552837 1.001607 172124
[2]  {1 rooms}                                  => {approve}                                 0.8465587  0.8566580 0.9882108 1.001607 172124
[3]  {applicant_ethnicity:Not Hispanic/Latino}  => {1 rooms}                                 0.7152005  0.9893591 0.7228928 1.001162 145416
[4]  {1 rooms}                                  => {applicant_ethnicity:Not Hispanic/Latino} 0.7152005  0.7237327 0.9882108 1.001162 145416
[5]  {applicant_ethnicity:Not Hispanic/Latino}  => {approve}                                 0.6253676  0.8650905 0.7228928 1.011466 127151
[6]  {approve}                                  => {applicant_ethnicity:Not Hispanic/Latino} 0.6253676  0.7311815 0.8552837 1.011466 127151
[7]  {applicant_ethnicity:Not Hispanic/Latino,                                                                                             
      approve}                                  => {1 rooms}                                 0.6195837  0.9907512 0.6253676 1.002571 125975
[8]  {1 rooms,                                                                                                                             
      applicant_ethnicity:Not Hispanic/Latino}  => {approve}                                 0.6195837  0.8663077 0.7152005 1.012889 125975
[9]  {1 rooms,                                                                                                                             
      approve}                                  => {applicant_ethnicity:Not Hispanic/Latino} 0.6195837  0.7318852 0.8465587 1.012439 125975
[10] {applicant_race:White}                     => {1 rooms}                                 0.6052911  0.9917401 0.6103324 1.003571 123069
[11] {1 rooms}                                  => {applicant_race:White}                    0.6052911  0.6125121 0.9882108 1.003571 123069
[12] {aus:Desktop Underwriter}                  => {1 rooms}                                 0.5645626  0.9910383 0.5696678 1.002861 114788
[13] {1 rooms}                                  => {aus:Desktop Underwriter}                 0.5645626  0.5712977 0.9882108 1.002861 114788
[14] {applicant_race:White}                     => {approve}                                 0.5349397  0.8764727 0.6103324 1.024774 108765
[15] {approve}                                  => {applicant_race:White}                    0.5349397  0.6254529 0.8552837 1.024774 108765

Table 6.1 outlines the overall top 15 mined rules by support, or relative frequency. One can see some relatively frequent combinations here; however, this doesn’t mean they are useful or meaningful associations.

For instance, examining rule #14 with an untrained eye would be immediately concerning, as it could be read as saying that White applicants are approved simply because they are White. The support of 0.53 only means that 53% of all applications in the data are both from White applicants and approved. Moreover, the lift of this rule is quite close to 1, so this is actually a weak association within this data; recall that a lift value equal to 1 means that A and B are independent. While this value is greater than 1, as are all values in Table 6.1, all of them are very close to 1. As such, every rule in this table is a weak association and simply a result of frequency of presence in the data.
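As a check on this reading, the confidence and lift of rule #14 can be recomputed from the other columns of Table 6.1. The short sketch below assumes that the coverage column is the relative frequency of the left-hand side, so the coverage of rule #15 serves as \(P(\text{approve})\); the variable names are illustrative.

```python
# Values copied from Table 6.1.
support_ab = 0.5349397  # support of rule [14]: White and approve together
p_a = 0.6103324         # coverage of rule [14]: P(applicant_race:White)
p_b = 0.8552837         # coverage of rule [15]: P(approve)

confidence = support_ab / p_a        # P(approve | White)
expected_if_independent = p_a * p_b  # joint frequency under independence
lift = support_ab / expected_if_independent

print(f"confidence = {confidence:.7f}")
print(f"expected joint frequency under independence = {expected_if_independent:.4f}")
print(f"lift = {lift:.6f}")
```

The observed joint frequency is only slightly above what independence would predict, which is why the lift sits so close to 1 despite the high support.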

Table 6.2: Top 15 Associations by Confidence
     lhs                                         rhs            support confidence   coverage     lift count
[1]  {interest_rate:>80}                      => {approve}   0.18578904          1 0.18578904 1.169203 37775
[2]  {aus:Loan Prospector/Product Advisor,                                                                  
      aus:Other}                              => {JP Morgan} 0.15580213          1 0.15580213 4.971320 31678
[3]  {interest_rate:>80,                                                                                    
      loan_to_value_ratio:61-80}              => {approve}   0.04241056          1 0.04241056 1.169203  8623
[4]  {interest_rate:>80,                                                                                    
      tract_minority_population_percent:0-20} => {approve}   0.04197283          1 0.04197283 1.169203  8534
[5]  {debt_to_income_ratio:61-80,                                                                           
      interest_rate:>80}                      => {approve}   0.04571074          1 0.04571074 1.169203  9294
[6]  {interest_rate:>80,                                                                                    
      tract_to_msa_income_percentage:21-40}   => {approve}   0.04036454          1 0.04036454 1.169203  8207
[7]  {debt_to_income_ratio:21-40,                                                                           
      interest_rate:>80}                      => {approve}   0.04220891          1 0.04220891 1.169203  8582
[8]  {Female,                                                                                               
      interest_rate:>80}                      => {approve}   0.04209579          1 0.04209579 1.169203  8559
[9]  {2.0,                                                                                                  
      interest_rate:>80}                      => {approve}   0.04509104          1 0.04509104 1.169203  9168
[10] {1.0,                                                                                                  
      interest_rate:>80}                      => {approve}   0.05827702          1 0.05827702 1.169203 11849
[11] {interest_rate:>80,                                                                                    
      loan_to_value_ratio:21-40}              => {approve}   0.05279802          1 0.05279802 1.169203 10735
[12] {interest_rate:>80,                                                                                    
      Joint}                                  => {approve}   0.06351993          1 0.06351993 1.169203 12915
[13] {interest_rate:>80,                                                                                    
      Male}                                   => {approve}   0.06214773          1 0.06214773 1.169203 12636
[14] {aus:Loan Prospector/Product Advisor,                                                                  
      interest_rate:>80}                      => {approve}   0.08140290          1 0.08140290 1.169203 16551
[15] {interest_rate:>80,                                                                                    
      Rocket Mortgage}                        => {approve}   0.10214832          1 0.10214832 1.169203 20769

In Table 6.2, we have a bit more of a mixed bag of results. The strongest rule is actually the association between the combined use of the Loan Prospector/Product Advisor and Other automated underwriting systems and JP Morgan: in the 2023 data, every application listing both systems was a JP Morgan application. This rule (rule #2) has a substantially high value for lift (close to 5) and a confidence of 1, whereas all other rules in this top-15 list share a lift of about 1.17.

All of these rules have greater lift than the rules mined in Table 6.1, and thus have more utility. However, most are still close to 1 and not especially strong, with the exception of rule #2. As such, many of these rules are frequent, but not necessarily meaningful beyond their relative frequency.
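The identical lift of 1.169203 on every rule in Table 6.2 whose consequent is approve follows directly from the definition of lift. A short worked calculation, taking \(P(\text{approve}) = 0.8552837\) from the coverage of the {approve} rules in Table 6.1 and assuming the printed values are exact:

\[
\text{lift}(A \Rightarrow \text{approve}) = \frac{P(\text{approve} \mid A)}{P(\text{approve})} = \frac{1}{0.8552837} \approx 1.169203
\]

Any rule with confidence 1 and the same consequent must therefore have the same lift, regardless of its antecedent. Read in the other direction, the lift of rule #2 implies the share of applications in the data that belong to JP Morgan:

\[
P(\text{JP Morgan}) = \frac{\text{confidence}}{\text{lift}} = \frac{1}{4.971320} \approx 0.2012
\]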

Table 6.3: Top 15 Associations by Lift
     lhs                                                rhs                                      support confidence   coverage     lift count
[1]  {applicant_ethnicity:Mexican}                   => {applicant_ethnicity:Hispanic/Latino} 0.04056128  0.9169446 0.04423525 8.207934  8247
[2]  {applicant_ethnicity:Hispanic/Latino}           => {applicant_ethnicity:Mexican}         0.04056128  0.3630800 0.11171442 8.207934  8247
[3]  {1 rooms,                                                                                                                               
      applicant_ethnicity:Information Not Provided,                                                                                          
      applicant_race:Information not provided,                                                                                               
      Rocket Mortgage}                               => {Sex Not Available}                   0.07532387  0.7362273 0.10231062 7.834365 15315
[4]  {applicant_ethnicity:Information Not Provided,                                                                                          
      applicant_race:Information not provided,                                                                                               
      Rocket Mortgage}                               => {Sex Not Available}                   0.07603703  0.7353151 0.10340740 7.824658 15460
[5]  {1 rooms,                                                                                                                               
      applicant_ethnicity:Information Not Provided,                                                                                          
      applicant_race:Information not provided,                                                                                               
      approve,                                                                                                                               
      Rocket Mortgage}                               => {Sex Not Available}                   0.06595941  0.7333224 0.08994600 7.803453 13411
[6]  {applicant_ethnicity:Information Not Provided,                                                                                          
      applicant_race:Information not provided,                                                                                               
      approve,                                                                                                                               
      Rocket Mortgage}                               => {Sex Not Available}                   0.06651026  0.7322395 0.09083129 7.791930 13523
[7]  {1 rooms,                                                                                                                               
      applicant_ethnicity:Information Not Provided,                                                                                          
      applicant_race:Information not provided,                                                                                               
      aus:Desktop Underwriter,                                                                                                               
      Rocket Mortgage}                               => {Sex Not Available}                   0.05100776  0.7264131 0.07021867 7.729930 10371
[8]  {applicant_ethnicity:Information Not Provided,                                                                                          
      applicant_race:Information not provided,                                                                                               
      aus:Desktop Underwriter,                                                                                                               
      Rocket Mortgage}                               => {Sex Not Available}                   0.05134712  0.7252518 0.07079903 7.717572 10440
[9]  {1 rooms,                                                                                                                               
      applicant_ethnicity:Information Not Provided,                                                                                          
      applicant_race:Information not provided,                                                                                               
      approve,                                                                                                                               
      aus:Desktop Underwriter,                                                                                                               
      Rocket Mortgage}                               => {Sex Not Available}                   0.04512547  0.7246663 0.06227068 7.711341  9175
[10] {applicant_ethnicity:Information Not Provided,                                                                                          
      applicant_race:Information not provided,                                                                                               
      approve,                                                                                                                               
      aus:Desktop Underwriter,                                                                                                               
      Rocket Mortgage}                               => {Sex Not Available}                   0.04538122  0.7233459 0.06273792 7.697291  9227
[11] {1 rooms,                                                                                                                               
      applicant_ethnicity:Information Not Provided,                                                                                          
      applicant_race:Information not provided,                                                                                               
      approve,                                                                                                                               
      aus:Desktop Underwriter}                       => {Sex Not Available}                   0.04939456  0.6718625 0.07351885 7.149444 10043
[12] {1 rooms,                                                                                                                               
      applicant_ethnicity:Information Not Provided,                                                                                          
      Rocket Mortgage}                               => {Sex Not Available}                   0.07602227  0.6709350 0.11330795 7.139574 15457
[13] {applicant_ethnicity:Information Not Provided,                                                                                          
      applicant_race:Information not provided,                                                                                               
      approve,                                                                                                                               
      aus:Desktop Underwriter}                       => {Sex Not Available}                   0.04969457  0.6703377 0.07413364 7.133218 10104
[14] {applicant_ethnicity:Information Not Provided,                                                                                          
      Rocket Mortgage}                               => {Sex Not Available}                   0.07674034  0.6700880 0.11452278 7.130562 15603
[15] {1 rooms,                                                                                                                               
      applicant_ethnicity:Information Not Provided,                                                                                          
      approve,                                                                                                                               
      Rocket Mortgage}                               => {Sex Not Available}                   0.06656437  0.6685768 0.09956129 7.114480 13534
Figure 6.1: Visualization of top 15 rules (by lift)

In Table 6.3 and Figure 6.1, one can begin to see some interesting patterns. Of particular interest are rules #11 and #13, which establish associations between approval and the absence of ethnic, racial, and gender information on a loan application. While other features are included (e.g. 1 bedroom home, specific underwriting systems), the association of these things together within the overarching dataset could be used to tell a compelling story.

Namely, it is possible that omitting demographic information is associated with loan approval. These rules should be read with care, however: approval sits in the antecedent, so their confidence of roughly 67% describes how often sex is unavailable when the other items (including approval) are present, not how often applications that omit demographic information are approved. If omission is in fact associated with approval, it may also be true that including personal demographic information on a loan application is associated with lower chances of approval. By further examining these rules, both of these claims may hold best when the loan has been processed by Rocket Mortgage.

From here, it’s of interest to examine association rules mined when the transactions are filtered to specific organizations. The reason for this is that, for a set threshold of support and confidence, certain interesting rules for a given organization may not be available for mining simply due to a lower volume of transactions processed via that organization. As such, by first filtering down transactions to a set organization and then exploring the rules mined for that organization at set common thresholds for support and confidence, more interesting information may arise.
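The effect of filtering first can be illustrated with a small sketch. The lender names, items, and threshold below are hypothetical and chosen only to show how a pattern can fall below the support threshold overall yet exceed it within one organization's transactions.

```python
def support(baskets, itemset):
    """Fraction of baskets that contain every item in itemset."""
    wanted = set(itemset)
    return sum(wanted <= set(b) for b in baskets) / len(baskets)

# Hypothetical data: 2 of 10 applications belong to "Lender X".
baskets = (
    [["Lender X", "deny", "debt_to_income_ratio:>80"]] * 2
    + [["Lender Y", "approve", "debt_to_income_ratio:21-40"]] * 8
)
min_support = 0.25
pattern = ["deny", "debt_to_income_ratio:>80"]

overall = support(baskets, pattern)  # measured against all 10 baskets
lender_x = [b for b in baskets if "Lender X" in b]
within = support(lender_x, pattern)  # measured against Lender X's 2 baskets

print(f"overall support = {overall}, meets threshold: {overall >= min_support}")
print(f"support within Lender X = {within}, meets threshold: {within >= min_support}")
```

Because support is a relative frequency, its denominator changes when the transactions are filtered, which is what allows lower-volume organizations to surface rules at the same thresholds.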

Figure 6.2: top 10 NFCU Denial Rules by Lift

NFCU has a debt-to-income ratio above the 80th percentile and a loan interest rate between the 41st and 60th percentiles as common features of all of its top 10 rules by lift. A total of 4 of the top 10 rules include race or ethnicity (non-Hispanic/Latino and Black/African American applicants).

Figure 6.3: top 10 JP Morgan Denial Rules by Lift

JP Morgan has the same common features, a debt-to-income ratio above the 80th percentile and a loan interest rate between the 41st and 60th percentiles, in its top 10 rules. JP Morgan has a single denial rule covering ethnicity (non-Hispanic/Latino applicants with the other common criteria).

Figure 6.4: top 10 Bank of America Denial Rules by Lift

Bank of America appears as an oddity here. The main common traits are a moderate interest rate (in the range of the 21st to 40th percentile) and the lack of use of an underwriting system (i.e. no automated underwriting system was used).

Surprisingly, there are strong rules for Bank of America for denial of White and non-Hispanic/Latino applicants.

Figure 6.5: top 10 Wells Fargo Denial Rules by Lift

Wells Fargo has no association rules tied to ethnicity, race, gender, or age in its top 10 rules. Its denials seem to be predominantly mapped to lower proposed interest rates combined with a high debt-to-income ratio.

Figure 6.6: top 10 Rocket Mortgage Denial Rules by Lift

Rocket Mortgage has 3 rules in its top 10 by lift tied to ethnicity (denial of non-Hispanic/Latino applicants with moderate interest rates).

6.6 Conclusions

These association rules take this research a degree beyond basic statistical analyses of the source variables. In particular, cases of lift being over 1 are of interest, as this suggests that the antecedents make the consequents more probable than they would otherwise be. Once again, this does not establish causation, but instead a probabilistic and associative connection between the variables.

Examining the results for the top 15 rules by lift, one sees an interesting occurrence. Namely, the following set of items is frequent and strongly associated (examining the 11th rule):

{1 rooms, applicant_ethnicity:Information Not Provided, applicant_race:Information not provided, approve, aus:Desktop Underwriter} => {Sex Not Available}

In other words, mortgage approval is strongly associated with protected class information (sex, ethnicity, race) not being available or listed on a mortgage application.

The rule has a lift exceeding 7 and a confidence of 66%. The 66% confidence means that when all items in the antecedent are present, the applicant's sex is also not listed or available in the application 66% of the time.
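The two figures quoted for this rule can be combined into a rough bound on how common the consequent is overall. This is a sketch that assumes the usual relation between lift and confidence and takes the rounded values of 0.66 and 7 from the text, so the result is approximate.

```latex
\mathrm{lift} = \frac{\mathrm{confidence}}{P(\text{Sex Not Available})}
\quad\Longrightarrow\quad
P(\text{Sex Not Available}) = \frac{\mathrm{confidence}}{\mathrm{lift}} < \frac{0.66}{7} \approx 0.094
```

Under those assumptions, sex is unavailable on fewer than roughly 9.4% of all applications, yet on about 66% of the applications matching the antecedent.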

This suggests, then, that it is quite likely to see applications where no demographic protected class information is provided or available for the applicant, and the application is not denied. This finding appears linked to those of Figure 2.9, Figure 2.10, and Figure 2.11. What is further interesting is that the findings for each individual chart in the initial exploration appear to merge together as rules within this association rule mining. While ARM does not establish or produce causal relationships, the further depth of the relationships between these protected class variables is intriguing.

However, some potentially concerning rules did arise for Rocket Mortgage (male, White, or non-Hispanic/Latino), Wells Fargo (non-Hispanic/Latino ethnicity), Bank of America (White or non-Hispanic/Latino), and JP Morgan (non-Hispanic/Latino). These rules also include other relevant financial or risk-based lending information (a high debt-to-income ratio, an insufficient interest rate on the loan, and similar financial indicators that the applicant's ability to repay may be at risk). Even so, they suggest that, at least on a frequentist basis, these lenders are more likely to deny loans to non-Hispanic/Latino applicants when they fall within these financial categories.

The high confidence and lift of NFCU denial rules #7 and #8 in Figure 6.2 are consistent with the findings of CNN's report from the end of 2023, when the organization is considered by itself and not compared to other institutions. To compare all institutions for such a rule of denial of Black applicants, ARM is run once more across the entirety of the dataset, with minimum support = 0.04, minimum confidence = 0.01, and loan denial fixed as the consequent.
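To make the idea of mining with a fixed consequent concrete, the following is a minimal brute-force sketch in Python. It is not the tooling used for the tables in this section; the function name `rules_for_consequent`, the `max_len` parameter, and the toy transactions are hypothetical and chosen only for illustration. It enumerates every antecedent up to a small size, so it is suitable for tiny examples only and does not perform the pruning that Apriori or FP-growth would.

```python
from itertools import combinations


def rules_for_consequent(transactions, consequent,
                         min_support=0.04, min_confidence=0.01, max_len=3):
    """Brute-force sketch: rules of the form {antecedent} => {consequent}.

    Hypothetical helper for illustration; it checks every antecedent of up
    to max_len items and keeps those meeting both thresholds.
    """
    tx = [frozenset(t) for t in transactions]
    n = len(tx)
    base_rate = sum(consequent in t for t in tx) / n  # P(consequent)
    if base_rate == 0:
        return []  # consequent never occurs, so lift is undefined

    # Candidate antecedent items: everything except the consequent itself.
    items = sorted({i for t in tx for i in t} - {consequent})
    rules = []
    for k in range(1, max_len + 1):
        for lhs in combinations(items, k):
            lhs_set = frozenset(lhs)
            covered = [t for t in tx if lhs_set <= t]
            if not covered:
                continue
            both = sum(consequent in t for t in covered)
            support = both / n                # P(lhs and consequent)
            coverage = len(covered) / n       # P(lhs)
            confidence = both / len(covered)  # P(consequent | lhs)
            if support >= min_support and confidence >= min_confidence:
                rules.append({
                    "lhs": lhs,
                    "support": support,
                    "confidence": confidence,
                    "coverage": coverage,
                    "lift": confidence / base_rate,
                    "count": both,
                })
    # Highest lift first, mirroring the "top rules by lift" presentation.
    return sorted(rules, key=lambda r: r["lift"], reverse=True)


if __name__ == "__main__":
    # Invented toy transactions, NOT taken from the mortgage dataset.
    toy = [
        {"rate:mid", "aus:DU", "deny"},
        {"rate:mid", "aus:none", "deny"},
        {"rate:mid", "aus:DU", "approve"},
        {"rate:high", "aus:DU", "approve"},
        {"rate:high", "aus:DU", "approve"},
        {"rate:high", "aus:none", "approve"},
    ]
    for rule in rules_for_consequent(toy, "deny", min_support=0.1):
        print(rule["lhs"], round(rule["confidence"], 3), round(rule["lift"], 3))
```

A very low minimum confidence, as used here, keeps rules with lift below 1 in the output, which is useful when the goal is to compare groups rather than to list only the strongest rules.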

    lhs                                           rhs       support confidence  coverage      lift count
[1] {applicant_race:White,                                                                              
     interest_rate:41-60}                      => {deny} 0.04213514  0.3407446 0.1236561 2.3546500  8567
[2] {1 rooms,                                                                                           
     applicant_race:White,                                                                              
     interest_rate:41-60}                      => {deny} 0.04138263  0.3381153 0.1223921 2.3364812  8414
[3] {applicant_race:White}                     => {deny} 0.07539273  0.1235273 0.6103324 0.8536119 15329
[4] {1 rooms,                                                                                           
     applicant_race:White}                     => {deny} 0.07414348  0.1224923 0.6052911 0.8464593 15075
[5] {applicant_race:White,                                                                              
     aus:Desktop Underwriter}                  => {deny} 0.04111213  0.1185674 0.3467406 0.8193371  8359
[6] {1 rooms,                                                                                           
     applicant_race:White,                                                                              
     aus:Desktop Underwriter}                  => {deny} 0.04052193  0.1176227 0.3445077 0.8128092  8239
[7] {applicant_ethnicity:Not Hispanic/Latino,                                                           
     applicant_race:White}                     => {deny} 0.05616707  0.1111652 0.5052577 0.7681857 11420
[8] {1 rooms,                                                                                           
     applicant_ethnicity:Not Hispanic/Latino,                                                           
     applicant_race:White}                     => {deny} 0.05537030  0.1103444 0.5017952 0.7625140 11258

The above table lists all rules from the full dataset that include race. Examining these rules, one can see that at the selected minimum confidence and support levels, there are no stand-outs in terms of specific organizations having high-lift, high-confidence associations between a particular protected class and denial of loan applications. One can also see, however, that with limited confidence and moderate lift in rules #1 and #2, White applicants whose interest rates would be in the 41st to 60th percentile of 2023 interest rates tended to be denied. The remainder of the rules are not useful, as they have lift values below 1, meaning their antecedents are negatively associated with denial.
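The columns of the denial table can be cross-checked against one another. The sketch below assumes the standard definitions (confidence = support / coverage, lift = confidence / P(deny), count = support × number of transactions) and uses values copied from rules [1] and [3]; the helper name `implied_metrics` is hypothetical.

```python
# Values copied from rules [1] and [3] of the denial table above.
table_rules = [
    {"support": 0.04213514, "confidence": 0.3407446,
     "coverage": 0.1236561, "lift": 2.3546500, "count": 8567},
    {"support": 0.07539273, "confidence": 0.1235273,
     "coverage": 0.6103324, "lift": 0.8536119, "count": 15329},
]


def implied_metrics(rule):
    """Derive quantities implied by a rule's reported metrics."""
    return {
        # Should reproduce the reported confidence.
        "confidence_check": rule["support"] / rule["coverage"],
        # Overall denial rate implied by confidence and lift.
        "base_rate": rule["confidence"] / rule["lift"],
        # Total number of transactions implied by count and support.
        "n_transactions": rule["count"] / rule["support"],
    }


for rule in table_rules:
    print(implied_metrics(rule))
```

Both rules imply an overall denial rate of roughly 0.145 and a dataset of roughly 203,000 transactions. Under these assumptions, a rule with denial as its consequent has lift above 1 only when its confidence exceeds that overall denial rate, which is why rules #3 through #8, with confidence between about 0.11 and 0.12, fall below 1.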

     lhs                                           rhs          support confidence   coverage     lift count
[1]  {applicant_race:White,                                                                                 
      interest_rate:>80}                        => {approve} 0.11876236          1 0.11876236 1.169203 24147
[2]  {applicant_race:White,                                                                                 
      interest_rate:>80,                                                                                    
      Joint}                                    => {approve} 0.04705836          1 0.04705836 1.169203  9568
[3]  {applicant_race:White,                                                                                 
      interest_rate:>80,                                                                                    
      Male}                                     => {approve} 0.04324667          1 0.04324667 1.169203  8793
[4]  {applicant_race:White,                                                                                 
      aus:Loan Prospector/Product Advisor,                                                                  
      interest_rate:>80}                        => {approve} 0.05488339          1 0.05488339 1.169203 11159
[5]  {applicant_race:White,                                                                                 
      interest_rate:>80,                                                                                    
      Rocket Mortgage}                          => {approve} 0.06145424          1 0.06145424 1.169203 12495
[6]  {applicant_race:White,                                                                                 
      aus:Desktop Underwriter,                                                                              
      interest_rate:>80}                        => {approve} 0.08051760          1 0.08051760 1.169203 16371
[7]  {applicant_ethnicity:Not Hispanic/Latino,                                                              
      applicant_race:White,                                                                                 
      interest_rate:>80}                        => {approve} 0.09885305          1 0.09885305 1.169203 20099
[8]  {1 rooms,                                                                                              
      applicant_race:White,                                                                                 
      interest_rate:>80}                        => {approve} 0.11780329          1 0.11780329 1.169203 23952
[9]  {applicant_race:White,                                                                                 
      interest_rate:0-20,                                                                                   
      Rocket Mortgage}                          => {approve} 0.05066840          1 0.05066840 1.169203 10302
[10] {1 rooms,                                                                                              
      applicant_race:White,                                                                                 
      interest_rate:>80,                                                                                    
      Joint}                                    => {approve} 0.04677802          1 0.04677802 1.169203  9511
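Every rule in the approval table above reports a confidence of 1 and the same lift. Assuming the standard definition of lift, this follows directly, because with a confidence of 1 the lift depends only on the overall approval rate:

```latex
\mathrm{lift} = \frac{\mathrm{confidence}}{P(\text{approve})} = \frac{1}{P(\text{approve})} = 1.169203
\quad\Longrightarrow\quad
P(\text{approve}) = \frac{1}{1.169203} \approx 0.8553
```

Identical lift values across these rules therefore reflect the shared confidence of 1 rather than ten independent measurements of association strength.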

Since confidence is a conditional probability and lift is the ratio of that conditional probability to a marginal one, naive Bayes and Bernoulli naive Bayes classification methods, which are built on the same kinds of conditional probabilities, could potentially be effective on this data in terms of accuracy, recall, and precision for cases when age, gender, or race are not listed for an application. What would still remain in question is the degree to which the predictive strength of those models is affected by the presence of specific protected classes (instead of their absence) in establishing a link to the approval or denial of the application.