Btibert3
2/6/2016 - 4:34 PM

Cypher query help: Summary stats for email marketing

Cypher query help: Summary stats for email marketing

= Email Marketing Analytics App

:neo4j-version: 2.3.1
:author: Brock Tibert
:twitter: @brocktibert

== Problem Statement

I want to summarize the email engagement for a particular contact in my database.  The summary stats that I am looking for are:

* The count of emails that have been sent to that contact
* The timestamp for the first email sent, as well as the most recent (last) email  
* If it exists, the timestamp for the first and last open across all emails
* If it exists, the timestamp for the first and last click across all emails
* A count of how many distinct emails were opened  
* A count of how many distinct emails were clicked 

The last two bullets above are giving me some issues.

== Toy Database

//hide
//setup
//output
[source,cypher]
----
MERGE (c1:Contact {id:1})
MERGE (c2:Contact {id:2})
MERGE (c3:Contact {id:3})
MERGE (m1:Email {ts:1})
MERGE (m2:Email {ts:2})
MERGE (m3:Email {ts:3})
MERGE (m4:Email {ts:4})
MERGE (m5:Email {ts:5})
MERGE (m6:Email {ts:6})
MERGE (o1:Open {ts:11})
MERGE (o2:Open {ts:12})
MERGE (o3:Open {ts:13})
MERGE (o4:Open {ts:14})
MERGE (l1:Link {ts:20})
MERGE (l2:Link {ts:21})
MERGE (l3:Link {ts:22})
CREATE (c1)-[:SENT]->(m1)
CREATE (m1)-[:NEXT_EMAIL]->(m2)
CREATE (m2)-[:NEXT_EMAIL]->(m3)
CREATE (c2)-[:SENT]->(m4)
CREATE (m4)-[:NEXT_EMAIL]->(m5)
CREATE (m5)-[:NEXT_EMAIL]->(m6)
CREATE (m1)-[:WAS_OPENED]->(o1)
CREATE (m2)-[:WAS_OPENED]->(o2)
CREATE (m2)-[:WAS_OPENED]->(o3)
CREATE (m5)-[:WAS_OPENED]->(o4)
CREATE (m1)-[:WAS_CLICKED]->(l1)
CREATE (m2)-[:WAS_CLICKED]->(l2)
CREATE (m2)-[:WAS_CLICKED]->(l3);
----

//graph_result

Now that the database is setup, this query __almost__ gets me to the end result.

[source,cypher]
----
MATCH (c:Contact {id:1})-[*]->(e:Email)
OPTIONAL MATCH pl=(e)-[]->(click:Link)
OPTIONAL MATCH po=(e)-[]->(open:Open)
WITH c, 
     e, 
     click, 
     open
RETURN c.id as pidm,
       COUNT(DISTINCT e) as email_count,
       MIN(e.ts) as first_email, 
       MAX(e.ts) as last_email, 
       MIN(click.ts) as first_click,
       MAX(click.ts) as last_click,
       MIN(open.ts) as first_open,
       MAX(open.ts) as last_open,
       SUM(CASE WHEN EXISTS((e)-[]->(click:Link)) THEN 1 ELSE 0 END) as click_ind,
       SUM(CASE WHEN EXISTS((e)-[]->(open:Open)) THEN 1 ELSE 0 END) as open_ind;
----

//table

Because an individual email can be opened and clicked on multiple times, I just want to identify if it was ever clicked, or ever opened.  These stats allow me to get estimate email engagement; simply, the percentage of emails that were opened and clicked.  

_In the table above, given the toy dataset, `open_ind` and `click_ind` should each return the value of 2_

Moving beyond this, I will modify this query to look at the last `N` emails, which clearly is Neo4j's wheelhouse and why it's a great tool for these sort of analytical projects.

Any help will be greatly appreciated.