Email/Account Validity Stats
<!DOCTYPE html>
<meta charset="utf-8">
<style>
text {
font-family: "Helvetica Neue", Helvetica, sans-serif;
}
.name {
font-weight: bold;
font-size: small;
}
.about {
fill: #777;
font-size: smaller;
}
.link {
fill: none;
stroke: #000;
shape-rendering: crispEdges;
}
</style>
<body>
<script src="http://d3js.org/d3.v2.min.js?2.9.4"></script>
<script>
var integerFormat = d3.format(",d");
var margin = {top: 0, right: 320, bottom: 0, left: 0},
width = 960 - margin.left - margin.right,
height = 500 - margin.top - margin.bottom;
var tree = d3.layout.tree()
.separation(function(a, b) { return a.parent === b.parent ? 1 : .875; })
.children(function(d) { return d.children; })
.size([height, width]);
var svg = d3.select("body").append("svg")
.attr("width", width + margin.left + margin.right)
.attr("height", height + margin.top + margin.bottom)
.append("g")
.attr("transform", "translate(" + margin.left + "," + margin.top + ")");
d3.json("data.json", function(json) {
var nodes = tree.nodes(json);
var link = svg.selectAll(".link")
.data(tree.links(nodes))
.enter().append("path")
.attr("class", "link")
.attr("d", elbow);
var node = svg.selectAll(".node")
.data(nodes)
.enter().append("g")
.attr("class", "node")
.attr("transform", function(d) { return "translate(" + d.y + "," + d.x + ")"; })
node.append("text")
.attr("class", "name")
.attr("x", 8)
.attr("y", -6)
.text(function(d) { return d.name; });
node.append("text")
.attr("x", 8)
.attr("y", 8)
.attr("dy", ".71em")
.attr("class", "about lifespan")
.text(function(d) { return "# Email Addresses: " + integerFormat(d.email_addresses); });
node.append("text")
.attr("x", 8)
.attr("y", 8)
.attr("dy", "1.86em")
.attr("class", "about lifespan")
.text(function(d) { return "# Sent: " + integerFormat(d.sent); });
});
function elbow(d, i) {
return "M" + d.source.y + "," + d.source.x
+ "H" + d.target.y + "V" + d.target.x
+ (d.target.children ? "" : "h" + margin.right);
}
</script>
{
"name": "Non-Member-Club Accounts",
"email_addresses": 1447637,
"sent": 54903240,
"children": [
{
"name": "Known Member Accounts",
"email_addresses": 1404841,
"sent": 53163567,
"children": [
{
"name": "With Email Address In DW",
"email_addresses": 1166903,
"sent": 47691223,
"children": [
{
"name": "Belonging To The Account",
"email_addresses": 1163842,
"sent": 47515542
},
{
"name": "Belonging To Another Account",
"email_addresses": 20277,
"sent": 175681
}
]
},
{
"name": "With Email Address Not In DW",
"email_addresses": 237938,
"sent": 5472344
}
]
},
{
"name": "Unknown Accounts",
"email_addresses": 499086,
"sent": 1739673,
"children": [
{
"name": "With Email Address In DW",
"email_addresses": 434874,
"sent": 1479705
},
{
"name": "With Email Address Not In DW",
"email_addresses": 64212,
"sent": 259968
}
]
}
]
}
CheetahMail sends USGA Email Event Exports on a daily basis. Email Events are emailing-related events, sometimes called 'Click-Stream' or 'Conversions'. The default data format provided is a mixed-record format (the format of the data is different depending on the type of event) with all email events (sends, opens, transactions, etc) in one file and the meta-data about each mailing (issue) in a second file.
Each record in the Event file represents a single event in the click-stream. The initial event in the click-stream is always the 'Send' event. Each subsequent event in the click-stream is tied back to the send event. The send event record is the only event record that contains the USGA Account Id for the sent-to email address. Since email addresses are not unique across USGA Members/Accounts, it is ideal to have both the Account Id and the account's email address in order to tie the email events back to the Member/Account/Email Address.
The goal of this quick analysis is to determine the quality of the send event record. The Account Id/Email Address combination in the send event record should ideally match back to an Account Id/Email Address combination in the DW, since the email campaigns were generated from the same source as the DW. If the data provided in the send event record does not match back to the DW, then we must decide when and how the data should be integrated.
The first question is: What is the size and scope of the data set we're dealing with:
# Email Addresses | # Sent | Min Sent On | Max Sent On |
---|---|---|---|
1,447,710 | 55,139,910 | Mar 03, 2010 | Dec 04, 2012 |
Of these, how many were Member Club events:
# Email Addresses | # Sent |
---|---|
12,882 | 236,670 |
Leaving us with Non-Member-Club Account Send Events
# Email Addresses | # Sent |
---|---|
1,447,637 | 54,903,240 |
However, not all of the Account Id's match to known Member Account Id's in the DW. If we remove events linked to these Account Id's, we will remove:
# Email Addresses | # Sent |
---|---|
499,086 | 1,739,673 |
Which leaves us with Valid Member Account Send Events:
# Email Addresses | # Sent |
---|---|
1,404,841 | 53,163,567 |
However, not all of these Email Addresses exist in the DW. If we remove events linked to these Email Addresses, we will remove:
# Email Addresses | # Sent |
---|---|
237,938 | 5,472,344 |
Leaving us with Send Events for Valid Member Accounts with Known Email Addresses:
# Email Addresses | # Sent |
---|---|
1,166,903 | 47,691,223 |
Next, let's remove any send events where the Email Address in the send event is not a known Email Address for the Account Id in the send event (it is an Email Address for a different Member Account):
# Email Addresses | # Sent |
---|---|
20,277 | 175,681 |
Leaving us with Send Events for Valid Member Accounts with Matching Email Addresses:
# Email Addresses | # Sent |
---|---|
1,163,842 | 47,515,5420 |
This means that roughly 86.5% of the Non-Member-Club Account Send Events (47,515,542 / 54,903,240) match to the DW.
I can't come up with a good explanation for this...
Possibly, these Email Addresses used to exist in the DW, but they have been updated to different email addresses, rather than logically deleting the old addresses (setting their status to 'D') and adding new ones. If this is the case, this could be a long term problem re: incorporating the email event data, and we will have to decide whether this is a best practice that should be continued.