Btibert3
2/19/2016 - 6:02 PM

= Shortest Path Question

:neo4j-version: 2.3.1
:author: Brock Tibert
:twitter: @brocktibert

== Problem Statement

In my use case, I am modeling a marketing campaign in which I want to find the shortest path from a contact to a click. Put simply, I want the first outreach that generated a click-through.


== Toy Database

//hide
//setup
//output
[source,cypher]
----
MERGE (c1:Contact {id:1})
MERGE (c2:Contact {id:2})
MERGE (c3:Contact {id:3})
MERGE (m1:Email {ts:1})
MERGE (m2:Email {ts:2})
MERGE (m3:Email {ts:3})
MERGE (m4:Email {ts:4})
MERGE (m5:Email {ts:5})
MERGE (m6:Email {ts:6})
MERGE (o1:Open {ts:11})
MERGE (o2:Open {ts:12})
MERGE (o3:Open {ts:13})
MERGE (o4:Open {ts:14})
MERGE (k1:Click {ts:20})
MERGE (k2:Click {ts:21})
MERGE (k3:Click {ts:22})
MERGE (t1:Topic {name:"A"})
MERGE (t2:Topic {name:"B"})
MERGE (t3:Topic {name:"C"})
CREATE (m1)<-[:HAS_TOPIC]-(t1)
CREATE (m2)<-[:HAS_TOPIC]-(t2)
CREATE (m3)<-[:HAS_TOPIC]-(t3)
CREATE (m4)<-[:HAS_TOPIC]-(t1)
CREATE (m5)<-[:HAS_TOPIC]-(t2)
CREATE (m6)<-[:HAS_TOPIC]-(t3)
CREATE (c1)-[:SENT]->(m1)
CREATE (m1)-[:NEXT_EMAIL]->(m2)
CREATE (m2)-[:NEXT_EMAIL]->(m3)
CREATE (c2)-[:SENT]->(m4)
CREATE (m4)-[:NEXT_EMAIL]->(m5)
CREATE (m5)-[:NEXT_EMAIL]->(m6)
CREATE (m1)-[:WAS_OPENED]->(o1)
CREATE (m2)-[:WAS_OPENED]->(o2)
CREATE (m2)-[:WAS_OPENED]->(o3)
CREATE (m5)-[:WAS_OPENED]->(o4)
CREATE (m3)-[:WAS_CLICKED]->(k1)
CREATE (m5)-[:WAS_CLICKED]->(k2)
CREATE (m6)-[:WAS_CLICKED]->(k3);
----

//graph

What I want to do is find the topic of the first outreach that generated a click.  If
a contact has more than one click in their sequence, I only want to return the first.

[source,cypher]
----
MATCH (c:Contact)
WITH c
MATCH (c)-[*]->(e:Email)<--(t:Topic)
WHERE t.name <> 'A'
MATCH (e)-->(k:Click)
MATCH p = shortestPath((c)-[*]->(k))
RETURN p
----

//table

In the results above, Contact id = 1 has one path, as expected, because there is only one click in its sequence.  Contact id = 2,
on the other hand, has two paths returned, because two different emails in its stream had click events.
For Contact id = 2, I only want the path to the email that is tagged with Topic B.
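
To see why two paths come back for Contact id = 2, it can help to list every clicked email per contact in send order. The query below is only a diagnostic sketch. It assumes the `SENT` and `NEXT_EMAIL` chain from the toy database is the only route from a contact to an email, and the column aliases (`contact`, `emailTs`, `topic`, `clickTs`) are made up for illustration.

[source,cypher]
----
// Walk each contact's send chain (zero or more NEXT_EMAIL hops)
// and keep only the emails that have a click.
MATCH (c:Contact)-[:SENT]->(:Email)-[:NEXT_EMAIL*0..]->(e:Email)-[:WAS_CLICKED]->(k:Click)
MATCH (e)<-[:HAS_TOPIC]-(t:Topic)
RETURN c.id AS contact, e.ts AS emailTs, t.name AS topic, k.ts AS clickTs
ORDER BY contact, emailTs
----

On the toy data this should list one clicked email for Contact id = 1 (Topic C) and two for Contact id = 2 (Topic B, then Topic C), which is where the extra path comes from.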

Given the above, how can I isolate a single path for each contact if, and only if, there is a click event?  Simply put,
I don't want to match every email that had a click event for each contact; I only want the path to the first click.

Thanks in advance.


== Update 

I got the following suggestion from @michael.neo on the Slack channel.


[source,cypher]
----
MATCH (c:Contact)-[r*]->(e:Email)-->(k:Click),(e)<--(t:Topic)
WHERE t.name <> 'A'
WITH c,k,size(r) AS dist ORDER BY dist LIMIT 1 
MATCH p = shortestPath((c)-[*]->(k))
RETURN p
----

//table
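
One thing to watch in the query above: `LIMIT 1` applies to the whole result, not to each contact, so it keeps a single row overall. A possible per-contact variant, offered as a sketch rather than a verified fix, sorts by distance and then keeps the first click for each contact with `head(collect(...))`. The alias `firstClick` is hypothetical, and the approach relies on `collect` preserving the order set by the preceding `ORDER BY`.

[source,cypher]
----
MATCH (c:Contact)-[r*]->(e:Email)-->(k:Click),(e)<--(t:Topic)
WHERE t.name <> 'A'
// Sort all candidate clicks by how far they are from the contact.
WITH c, k, size(r) AS dist
ORDER BY dist
// Grouping by c keeps the nearest click for each contact.
WITH c, head(collect(k)) AS firstClick
MATCH p = shortestPath((c)-[*]->(firstClick))
RETURN p
----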


== Update 2   

A second suggestion, from @chris.graphaware.  This works as expected.

[source,cypher]
----
MATCH (c:Contact)
WITH c
MATCH (c)-[*]->(e:Email)<--(t:Topic)
WHERE t.name <> 'A'
MATCH (e)-->(k:Click)
WITH c, collect(k) AS clicks 
UNWIND clicks AS click
MATCH p = shortestPath((c)-[*]->(click))
WITH c, p, length(p) AS l
ORDER BY l ASC 
RETURN c, collect(p)[0]
----

//table
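
Since the stated goal is the topic of the first clicked outreach, a variation that returns the topic name directly, instead of the path, might look like the sketch below. It assumes emails are ordered by the `NEXT_EMAIL` chain starting at the `SENT` email, as in the toy database. The names `hops`, `dist`, and `firstClickedTopic` are illustrative only.

[source,cypher]
----
// Follow the send chain to every clicked email and count the hops to it.
MATCH (c:Contact)-[:SENT]->(:Email)-[hops:NEXT_EMAIL*0..]->(e:Email)-[:WAS_CLICKED]->(:Click)
MATCH (e)<-[:HAS_TOPIC]-(t:Topic)
WHERE t.name <> 'A'
WITH c, t, size(hops) AS dist
ORDER BY dist ASC
// The earliest clicked email per contact comes first after sorting.
RETURN c.id AS contact, head(collect(t.name)) AS firstClickedTopic
----

On the toy data this should give Topic C for Contact id = 1 and Topic B for Contact id = 2. Contact id = 3 has no emails and so does not appear.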