laika222
1/16/2018 - 8:15 PM

Fundraising Analytics

-- Edit query
- right click table in query and Duplicate Table
- to add column, Edit Queries, Add Column tab, Custom Column, if statement (must be lowercase):
Column = if [creditee] = null then [name] else [creditee]

-- example of POWER BI FORMULA with multiple else if statements to be used in Edit Query
if [Cumulative Contribution] < 100 then "Low"
else if [Cumulative Contribution] < 1000 then "Medium"
else if [Cumulative Contribution] < 10000 then "High"
else "Super High"


-- example of DAX MEASURE - IF statement for levels using a MEASURE with aggregate SUM - this is responsive to changes in grouping
Giving Level = IF(SUM('Group by FY, Soft ID'[Cumulative Contribution]) < 100, "Very Low",
IF(SUM('Group by FY, Soft ID'[Cumulative Contribution]) < 1000, "Low",
IF(SUM('Group by FY, Soft ID'[Cumulative Contribution]) < 10000, "Medium",
IF(SUM('Group by FY, Soft ID'[Cumulative Contribution]) < 25000, "High",
IF(SUM('Group by FY, Soft ID'[Cumulative Contribution]) < 50000, "Very High",
IF(SUM('Group by FY, Soft ID'[Cumulative Contribution]) < 100000, "Incredibly High",
"Highest"
))))))

-- example of DAX COLUMN - IF statement calculated COLUMN for rankings (doesn't necessarily seem to work when you change grouping
Level = IF('Group by FY, Soft ID'[Cumulative Contribution] < 100, "Very Low",
IF('Group by FY, Soft ID'[Cumulative Contribution] < 1000, "Low",
IF('Group by FY, Soft ID'[Cumulative Contribution] < 10000, "Medium",
IF('Group by FY, Soft ID'[Cumulative Contribution] < 25000, "High",
IF('Group by FY, Soft ID'[Cumulative Contribution] < 50000, "Very High",
IF('Group by FY, Soft ID'[Cumulative Contribution] < 100000, "Incredibly High",
"Highest"
))))))

-- example of DAX MEASURE - Calculate, which aggregates values from a column and places it as repeated value in each row. Format is CALCULATE(expression being calculated, filter 1, filter 2, etc.). In this example, it's calculating the SUM of the Cumulative Contribution column and filtering it by Query Level value of "High"
Sum of High Level Contribs = 
CALCULATE ( SUM ( 'Group by FY, Soft ID'[Cumulative Contribution] ), 'Group by FY, Soft ID'[Query Level] = "High" )

/* Results, showing how the value calculated is repeated into each row:

Query Level   Cumulative Contribution   Sum of High Level Contribs
High          $500                      $500
Medium        $200                      $500
Low           $450                      $500
Super Low     $25                       $500
*/

-- example of DAX MEASURE - Calculate, using a nested OR statement to select the SUM of Contributions that are in the "High" or "Medium" levels
Sum of High or Medium Level Contribs = 
CALCULATE (SUM 
            ('Group by FY, Soft ID'[Cumulative Contribution]), 
            OR('Group by FY, Soft ID'[Query Level] = "High", 'Group by FY, Soft ID'[Query Level] = "Medium")
  )

/* Results, showing how the sum of High or Medium values:

Query Level   Cumulative Contribution   Sum of High or Medium Level Contribs
High          $500                      $700
Medium        $200                      $700
Low           $450                      $700
Super Low     $25                       $700
*/

-- Dax MEASURE - how to get total of entire column, specify entire table in ALL statement:
Sum of Cumulative Contrib Column = 
CALCULATE (SUM ('W Giving Levels All'[Cumulative Contribution] ), ALL('W Giving Levels All'))

/* Results, showing how the sum of High or Medium values:

Query Level   Cumulative Contribution   Sum of Cumulative Contrib Column
High          $500                      $1325
High          $200                      $1325
Low           $450                      $1325
Low           $150                      $1325
Super Low     $25                       $1325
*/

-- Dax MEASURE - how to get total for all sub-values in a column (kind of like a pane in Tableau or a partitioned window function in SQL), specify sub-category column in ALL statement:
Sum of Cumulative Contrib for Each Query Level = 
CALCULATE (SUM ('W Giving Levels All'[Cumulative Contribution] ), ALL('W Giving Levels All'[Query Level]))

/* Results, showing how the sum of High or Medium values:

Query Level   Cumulative Contribution   Sum of Cumulative Contrib for Each Query Level
High          $500                      $700
High          $200                      $700
Low           $450                      $600
Low           $150                      $600
Super Low     $25                       $25
*/

-- Dax MEASURE - how to get percentage of a subtotal (kind of like a percentage of a pane in Tableau or a partitioned window function in SQL), specify sub-category column in ALL statement:
% of Query Level = SUM('W Giving Levels All'[Cumulative Contribution]) / 
CALCULATE (SUM ('W Giving Levels All'[Cumulative Contribution] ), ALL('W Giving Levels All'[Query Level]))

/* Results, showing how the sum of High or Medium values:

Query Level   Cumulative Contribution   % of Query Level
High          $500                      71%
High          $200                      29%
Low           $450                      75%
Low           $150                      25%
Super Low     $25                       100%
*/



-- DAX MEASURE - showing how to calculate the sum of a column. Uses CALCULATE with ALL keyword in filter in order to clear all filters. If you don't use all, it'll sum for each row in the table, but if you use ALL to clear the filters it'll sum the entire column.
TOTAL = 
CALCULATE (SUM 
            ('Group by FY, Soft ID'[Cumulative Contribution] ), 
            ALL('Group by FY, Soft ID'[Query Level]))
            
/* Results, showing how Cumulative Contribution has been summed:

Query Level   Cumulative Contribution   TOTAL
High          $500                      $1,175
Medium        $200                      $1,175
Low           $450                      $1,175
Super Low     $25                       $1,175
*/

-- DAX MEASURE - how to show a percentage of total. This first takes the Cumulative Contribution value without using ALL (therefore pulling the value for just that line), and then divides by Cumulative Contribution using ALL (which has ALL clearing the filter and therefore pulls the sum of the entire column). Uses CALCULATE with ALL keyword in filter in order to clear all filters. 
Percent of Total = SUM ('Group by FY, Soft ID'[Cumulative Contribution]) /
CALCULATE (SUM 
            ('Group by FY, Soft ID'[Cumulative Contribution] ), ALL('Group by FY, Soft ID'[Query Level]))
            
/* Results, showing how Cumulative Contribution has been summed:

Query Level   Cumulative Contribution   TOTAL     Percentage of Total
High          $500                      $1,175    0.42
Medium        $200                      $1,175    0.17
Low           $450                      $1,175    0.38
Super Low     $25                       $1,175    0.02
*/
-- Cross Industry Standard Process for Data Mining (CRISP-DM)

Step 1: Business Understanding: defining the context of the analysis
Step 2: Data Understanding: aligning data elements to the context
Step 3: Data Preparation: gathering and priming the data for analysis
Step 4: Modeling: conducting the analysis
Step 5: Evaluation: determining whether the analysis supports the goal of the business understanding
Step 6: Deployment: implementing the analysis

-- three primary elementts defining the development business process:

1: Base Development: annual giving and membership programs
2: Prospecting: identifying and prioritizing prospects for major giving
3: Major Giving: cultivating and stewarding major gifts

-- annual and mid-level giving is generally predictable, but major donor and corporate level giving is generally more volatile.

-- Correlation

-- Regression Analysis

-- Analytical metrics:

- Capacity Rating - measure of patron's ultimate giving ability over a five year period.

- Capacity Yield - five previous year's giving vs. the Capacity Rating, or how much of their capacity is being realized as gifts. If Capacity Rating is $1M and five year giving $100,000, Capacity Yield is 10%. 11% is a good Capacity Yield. 

- Target Efficiency - how the target ask compares to the Capacity Rating. 

- Target Accuracy - how the actual ask compares to the target ask.

- Close Yield - gift amount compared to ask amount. If ask is $1M and gift is $900,000, Close Yield is 90%.

-- example of Arts Single-Point Modeling system:

- Attendee: 1 point
- Subscriber: 1 point
- Arts background: 1 point
- Other arts giving: 1 point
- Donor: 1 point
- Gift in last five years: 1 point
- Five or more gifts: 1 point
- Managed: 1 point
- Researched: 1 point
- Visited: 1 point
- Attended event: 1 point
- Trustee: 1 point

- example of Arts Single-Point Modeling system:

- Attendee: 5 points
- Subscriber: 20 points
- Donor: 5 points
- 10 plus gifts: 10 points
- 5 to 9 gifts: 5 points
- Managed: 20 points
- Constituency or committee membership: 5 points per group/committee
- Total contacts x 3: points
- Current campaign prospect: 30 points
- 0-2 years since last gift: 10 points
- 2-5 years since last gift: 5 points
- 5-10 years since last gift: 2 points
- Arts background: 5 points
- Other arts giving: 5 points

-- Solicitation life cycle

1: Identification: feeding the pipeline with suspects
2: Qualification: qualifying suspects with data and through interaction
3: Cultivation and Solicitation: developing strong portfolios, developing cultivation strategies to align case and personal values, moving prosepcts with financial capacity closer to the organization, and soliciting prosepcts at the appropriate amount
4: Stewardship: stragegy to move prospects back into cultivation

-- annual giving is high-volume/low-dollar, and major giving is low-volume/high-dollar. CSO skews more low-volume/high-dollar

-- first year renewal - if you don't renew a patron within the first year, likelihood of a renewal drops considerable. Goal should first be to just secure a renewal and then once the giving patter is established, then move goal toward increased giving. For this reason, many organizations don't ask for upgrades in the first or second year.

-- RFM ANALYSIS - transactional recency, frequency, and amount

-- LYBUNT, SYBUNT, New
- LYBUNT - gave last year but not this
- SYBUNT (lapsed) - gave between 2 and 3 years ago, but not last year or this
- Long lapsed - intermediate category, gave between 2 and 5 years ago but not last year or this
- New donor - a new donor or someone who last gave more than 5 years ago

-- DATA MINING INTRO
- separate individuals from organizations since giving tends to be so different

-------------------
-- DATA ANALYSIS --
-------------------

- activate DataPak, use Data > DataTools > Histogram to create a histogram that counts the frequency of certain values. You set up a column of bins that are the boundaries of each group. In Excel histogram, the Input Range will be your array of values, and the Bin Range will be the column of bins. The spreadsheet will be set up like this:

contribution    bins
20              0
11              1000
13.56           3500
13.42           10000
11              25000
20              50000
13.42           100000
...

The historgram will count each value as art of a bin and then give you the total of the bin counts:

0       Frequency
1000    20904
3500    1268
10000   717
25000   320
50000   132
100000  58
More    36

-- cross tabulation is comparing the values of two fields. For example, how many event attendees also have an email address?
--------------------------

-- CORRELATION IN EXCEL --

--------------------------



- highlight two or more columns with numeric data and it'll give you a cross tabulation of the correlation. If you're wanting to find a correlation between text data, you'll need to convert the text to numeric values (such as using an IF statement to assign a 1 to one category of values, a 2 to the next category, a 3 to the next, etc.

- correlation will check if there is a correlation between all of the columns. You'll get a result set which is a cross-tabulation showing the correlation between each variable, with a score given of -1 to 1. 1 means a perfect positive correlation (as one goes up, the other goes up in lockstep), and -1 means a perfect negative correlation (as one goes up, the other goes down in lockstep). 0 means there is no correlation. Any value in-between 0 and 1 shows a correlation of varying strength.

Most often you migh pay the most attention to correlation values above 0.1 with an acceptable significance. These are the strongest correlations. Correclation values above 0.05 are still noteworth, and mild correlations of 0.025 may be useful at times.

		contribution	Gift		Pledge		Received_amt
contribution	1			
Gift		0.267457358	1		
Pledge		0.961139925	-0.008940714	1	
Received_amt	0.967004052	0.314316732	0.913472687	1




- correlation won't give you a P-Value, so you get that by using (in ANOVA table, Regrssion Line under Significance F. If P-Value is less than 0.05, then the correlation is statistically significant (it's usable). If P-Value is is less than 0.05, then the measurement is significant in more than 95% of the population. If the value is less that 0.01, then the measurement is significant in more that 99% of the population. You can also get the correlation co-efficient from the Regression process if you are comparing two columns - this number will be located in the first line, Regression Statistics > Multiple R.

- a popular way to get a quick score is to do a Correlation Ranking. To do this:

1) run a correlation to find the five to ten variables that have the strongest relationships
2) recode your inverse relationsihps, so 1 becomes a 0 and 0 becomes a 1. If there's an inverse correlation, you don't want those people, so flip the numbers so that you look at the people who don't have the attribute who is invesrse. Example: subscribing to a certain series has a negative correlation with donating, flip the rankings so you focus on people who don't subscribe to that certain series (and due to the negative correlation, are therefore more likely to give)
3) add the numbers to get your scores. 
4) you can then run a cross tabulation between the donor variable and the correlation score. As the score increases, the precentage of donors increases by group.
5) you can then focus on people who have high scores but are not currently high level donors

Example: say you have five variables, and you score each 0 or 1. That means that each donor will end up with a combined score of 0 to 5 (5 being the most promising). If you cross tabulate these scores against who is actually a donor, you'll see how the scores actually behave. You can then try to focus on the people who have a high score but who are not giving.

                                            DONOR
                                            0 (not a donor)		1 (is a donor)		TOTAL

CORRELATION SCORE     0	  Count		        	389			120			509
                          % of Cor Score		76.4%			23.6%			100%	
  
                      1	  Count			10679			3673			14442
				% of Cor Score		74.6%			25.4%			100%
	
			2	Count			13641			12121			25762
				% of Cor Score		52.0%			47.0%			100%	

			3	Count			3270			4212			7482
				% of Cor Score		43.7%			56.3%			100%
	
			4	Count			286			1319			1605
				% of Cor Score		17.8%			82.2%			100%	

			5	Count			4			196			200
				% of Cor Score		2.0%			98.0%			100%



-------------------
-- RFM ANALYSIS --
-
-------------------

- example of scoring

**RECENCY, max score = 20
Gave in last year	20
Gave 1-2 years ago	15
Gave 2-3 years ago	10
Gave 3-4 years ago	5
Gave 4-5 years ago	2
Gave 5+ years ago	1
No giving		0

**FREQUENCY, max score = 30

FREQUENCY % SCORE
Gift count / Last gift year - first gift year

100%		10 (to be multiplied by years-giving multiplier below)
90-99%		8
80-89%		6
70-79%		4
60-69%		2
< 60%		1

YEARS-GIVING MULTIPLIER
20+ years giving	(Frequency % score) * 3.0 = overall frequency score
15-19 years giving	(Frequency % score) * 2.5 = overall frequency score
10-14 years giving	(Frequency % score) * 2.0 = overall frequency score
5-9 years giving	(Frequency % score) * 1.5 = overall frequency score
< 5 years giving	(Frequency % score) * 1.0 = overall frequency score
No giving		0 = overall frequency score

EXAMPLE

Donor has 20 gifts, first gift was 1995, last gift was 2018

Frequency % score = 20 / (2018-1995) =  0.86, or 86%, which is a Frequency % Score of 6

- donor has giving in last year, so that's a Year's-giving multiplier of 3.0.

- therefore FREQUENCY = 6 * 3.0 = 18

**MONETARY VALUE, max score = 50 (set gift amounts that make sense for your organization)

$5,000+ outright gift 		25
$2,000-$4,999 outright gift	15
$1,000-$1,999 outright gift	10
$500-$999 outright gift		5
<$500				1
$20,000+ cumulative giving	Additional 25
$10,000-$19,999 cumulative	Additional 15
$5,000-$9,999			Additional 5

RFM Score, max score = 100
Sum of Recency, Frequency, and Monetary values

-----------------------
-- ATTACHMENT SCORE --
-
-----------------------

Donor: 5 points
Gift count
	10 or more gifts: 10 points
	5 to 9 gifts: 5 points
Managed: 20 points
Any board or group membership: 5 points per board or group
Total contacts x 3 = points
Years since last gift
	Lowest through 2: 10 points
	...through 5: 5 points
	...through 10: 2 points


-------------------------
-- REGRESSION ANALYSIS --
-------------------------

- dependent variable is apparently the varable you're trying to model or predict. Examples of dependent variables:

Major giving prospect: Major Donors coded 1, all others 0
Planned giving prospect: existing PG donors 1, all other 0
Overall giving likelihood for never donor appeals: all donors 1, all others 0
Affinity to institution: constituents with survey responses indicating affinity 1, all others 0
Predicting donors likely to renew or upgrade gift: donors with consecutive increasing gifts 1, all others 0

-- types of models you can use that depend on the variables in your model.

NOMINAL - also called categorical or logistic, nominal variable values represent categories with no intrinsic ranking; for example, the department of the company at which an employee works. Examples of nominal variables include yes/no variables, channles, volunteers, gender, marital status, region, ZIP code, or religious affiliation.

ORDINAL - the values of ordinal variables represent categories with some intrinsic ranking; for example, levels of service satisfaction, from highly dissatisfied to highly satisfied. Examples of ordinal variables include attitude scores, preference ranking scores, "A", "B", "C" propensity rankings, and gift levels.

SCALE - a variable can be treated as a scale when its values are represented linerly, so that distance comparisons between values are appropriate. Examples of scale variables include total giving dollars, attachment scores, RFM scores, age in years, and capacity in dollars.

When analysts have binary nominal variables (1/0) as their dependent, they might choose to use decision trees or binary logistic regression. If they have many nominal categories in their dependent variable, they might use multinomial logistic regression or discriminant analysis. If analysts are trying to predict a ranking or a gift level, they might use ordinal regression. When they predict giving or large numeric scores, they would use linear regression. If they do not know the groups for a depent and want the software to discover it, they would use cluster analysis.

BINARY LOGISTIC REGRESSION
- tests a sample using a dependent variable and tests against the independent variables. In this case the dependent variable is binary, so it'll be a 1 or a 0. Then it calculates a probablility that the dependent variable is true based on each independent variable, and tests against members of the sample. Each tests succeeds (correctly identifies the dependent variable as either 1 or 0), or fails (incorrectly predicts the variable to be 0 when it is actually 1, or incorrectly predicts the variable is 1 when it is actually 0). You then remove the variables that fail in order to increase the overall accuracy of the model. Once the model is refined to be accurate, it can be applied to the rest of the population and you can make predictions.

To do this, you run the model for the population and convert the predicted probability percentage number into a 0-1000 score by multiplying it by 1000.

You can also create an oridinal score by breaking the groups into quintiles or deciles using the score you've just calculated.

You then cross tabulate your rankings against the dependent variable to see if it actually matches the population. For exampl,e you compare the scores against the donor indicator (1 for donor and 0 for non-donor) and see if there is an upward progression (the higher the score, the higher the percentage of donors in that group).

1) create your sample. For major giving models, try to make the sample half major donors and then half of a random population.

--------------------------

-- REGRESSION IN EXCEL --
-
--------------------------



- go to Data > Regression. It will ask you to fill in the Y Axis and the X Axis. The Y Axis is the dependent variable, and there can be only one column which must be numeric. The X Axis are the independent variables, and you can have more than one column here (multiple regression).

SUMMARY OUTPUT OF SINGLE REGRESSION								
								
Regression Statistics								
!!Multiple R!!	0.967004052							
R Square	0.935096837							
Adjusted R Square	0.935094067							
Standard Error	4244.018571							
Observations	23435							
								
ANOVA								
		df	SS		MS		F		!!Significance F!!		
Regression	1	6.08097E+12	6.08097E+12	337612.5792	0			
Residual	23433	4.22068E+11	18011693.63					
Total		23434	6.50304E+12						
								
		Coefficients	Standard Error	t Stat		P-value		Lower 95%	Upper 95%	Lower 95.0%	Upper 95.0%
Intercept	-4.322165597	27.86638385	-0.15510321	0.876741287	-58.94209556	50.29776437	-58.94209556	50.29776437
Received_amt	1.129077031	0.001943186	581.0443866	0		1.12526826	1.132885801	1.12526826	1.132885801


SUMMARY OUTPUT OF MULTIPLE REGRESSION

SUMMARY OUTPUT								
								
Regression Statistics								
Multiple R	1							
R Square	1							
Adjusted R Square	1							
Standard Error	4.66075E-09							
Observations	23435							
								
ANOVA								
		df	SS		MS		F		Significance F			
Regression	3	6.50304E+12	2.16768E+12	9.97894E+28	0			
Residual	23431	5.08981E-13	2.17226E-17					
Total		23434	6.50304E+12						
								
		Coefficients	Standard Error	t Stat		P-value		Lower 95%	Upper 95%	Lower 95.0%	Upper 95.0%
Intercept	-2.64853E-10	3.08759E-11	-8.577996747	1.02428E-17	-3.25372E-10	-2.04334E-10	-3.25372E-10	-2.04334E-10
Gift		1		1.08572E-14	9.21049E+13	0		1		1		1		1
Pledge		1		7.25712E-15	1.37796E+14	0		1		1		1		1
Received_amt	1.30996E-16	8.60064E-15	0.015231005	0.987848016	-1.67268E-14	1.69888E-14	-1.67268E-14	1.69888E-14

-------------------------------------------
-- EXAMPLE OF MULTIPLE LINEAR REGRESSION --
-------------------------------------------

- Y value is dependent variable
- X value is independent variable, can be more than one column.

- run regression, and look at P-value. Any value greater than 0.15 isn't predictive, and you should exclue. Any value less that 0.15 is predictive and you should keep. 
-re-run the regression excluding any variables that have P-value less than 0.15

- predicitive formula is Y = Constant + B1 *(X1) + B2*(X2) + BnXn

EXPLANATION - Constant is found in the Intercept Coefficient of the regression output in Excel. B1 is the value of the independent variable you're trying to predict times that variable's coefficient from the regression output. If you have 3 independent variables that passed the P-Value test (less that 0.15), you'll multiply the target value of variable 1 times the coefficient for that variable, variable 2 times the coefficient for that variable, and variable 3 times the coefficient for that variable, and it will predict the value of the dependent variable.

Y = 35475.3 + (800*5.320968) + (1000*5.417138)
-- figure out correlation
-- figure out regression analysis
-- figure out reliable grouping with calculated formulas include Switch with ranges
-- figure out binary logistic regression in Excel
-- high indicators but low conversion (donation) donors
-- regression analysis to find correlation
-- pipeline reporting to show movement up and down (based on consituency dates)?
-- BWF ranking basic stats
-- breakout by giving levels and look at lapsed/renewal/etc.
-- Plans number of contacts versus contribution amount
-- renewal rate of first year donors
-- renewal rate of lapsed and long lapsed donors
-- donations by time (cash flow and ability to predict when is best time to ask)
-- effectiveness of mailings by giving type (sometimes you net more if you don't mail to lower level donors that have a lower potential to actually repsond)
-- frequency disribution of rankings or wealth and whether or not these people are high livel donors or not