Why I Like Neo4j

Earlier this week, I gave a presentation comparing various SQL and NoSQL databases. I provided an overview of five different databases and described their particular use cases, but I noticed myself talking about one much more than the rest: Neo4j.

Part of this is because Neo4j is one of the least-known databases on a list that includes PostgreSQL, Redis, MongoDB, and CouchDB. But part of it is because it takes a really fascinating approach to data. While I’ve used MongoDB in the past for a number of applications, I’ve really enjoyed using Neo4j for a couple of recent projects, and I think I finally understand why: MongoDB prefers to internalize data attributes, whereas Neo4j seeks to externalize them. (Incidentally, Neo4j’s approach is loosely analogous to the way objects are handled in Lisp.)

Put another way, MongoDB is about data properties, whereas Neo4j is about data relationships.

In MongoDB, the idiomatic way to represent data is as an object, which contains properties. I might have the following:

  { "name" : "John",
    "state" : "NY"
  }

  { "name" : "Frank",
    "state" : "NY"
  }

If I want to find all New York residents, MongoDB’s selectors let me do this with a simple query: db.users.find({"state" : "NY"}). However, let’s be clear about what is actually happening: there is no inherent connection between the “state” field in John’s object and the “state” field in Frank’s object. It is almost as if the fact that both fields have the value “NY” is coincidental. Both happen to “own” the same value and reference this property by the same name. The primary way of interacting with data in MongoDB is by aggregating relationships into objects and then querying against those aggregate relationships.
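To make the "coincidence" concrete, here is a minimal sketch in plain Python of what a selector like this amounts to conceptually. It is not MongoDB's implementation (a real server can use indexes, for instance), and the `find` helper and `users` list are hypothetical stand-ins for illustration only.

```python
# Two independent documents, mirroring the John and Frank objects above.
# Each one carries its own "state" field; nothing links the two values.
users = [
    {"name": "John", "state": "NY"},
    {"name": "Frank", "state": "NY"},
]


def find(collection, selector):
    """Return the documents whose fields equal every key/value in selector."""
    return [
        doc
        for doc in collection
        if all(key in doc and doc[key] == value for key, value in selector.items())
    ]


ny_residents = find(users, {"state": "NY"})
names = [doc["name"] for doc in ny_residents]
```

The match happens purely by comparing values inside each document, one document at a time. John and Frank end up in the same result only because their separately owned fields happen to compare equal.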

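In Neo4j, I might instead model this as a graph: a John node and a Frank node, each connected by a directed “lives in” edge to a single NY node.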

This represents the exact same information, but the interface is very different, and one difference is particularly notable: John’s “NY” value and Frank’s “NY” value are both the same – not just by coincidence, but by the very structure of our database relationship. Furthermore, neither the John node nor the Frank node “owns” the NY node.

Finding all New York residents is therefore as easy as finding all directed edges of type “lives in” that terminate on the (single!) “NY” node.
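As a rough sketch of that lookup, the graph can be modeled in plain Python as a set of nodes plus a list of typed, directed edges. This is not Neo4j's API or query language; `Edge` and `incoming` are hypothetical names invented for the illustration.

```python
from collections import namedtuple

# A directed, typed edge: source --type--> target.
Edge = namedtuple("Edge", ["source", "type", "target"])

# Nodes are just labels here; note that "NY" appears exactly once.
nodes = {"John", "Frank", "NY"}

edges = [
    Edge("John", "lives in", "NY"),
    Edge("Frank", "lives in", "NY"),
]


def incoming(edges, edge_type, target):
    """Sources of all edges of the given type that terminate on target."""
    return [e.source for e in edges if e.type == edge_type and e.target == target]


ny_residents = incoming(edges, "lives in", "NY")
```

Scanning a flat edge list keeps the sketch short; a real graph database would typically keep each node's edges alongside the node so that the lookup starts from the NY node itself. Either way, the residents are found through their shared connection to one node rather than through matching copies of a value.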

Thus, the primary way of interacting with data in Neo4j is by disaggregating data objects into datapoints and then defining relationships between datapoints. It’s a slight stretch, but we can almost say that the relationships are more important than the data itself! Since Neo4j prefers not to internalize information within nodes, a single node is nowhere near as informative as the entire collection of related nodes.

As a fairly blunt analogy, it’s the difference between the following:

  { "_id" : ObjectId("50e21b4623e50ac61370"),
    "field-1" : "value1",
    "field-2" : "value2"
  }

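One way to picture the graph side of this analogy is to pull such a document apart, assuming each field name becomes an edge type and each value becomes its own node. The sketch below is plain Python with a hypothetical `disaggregate` helper, and the ObjectId is represented as a plain string for simplicity.

```python
def disaggregate(document, id_field="_id"):
    """Turn one document into (node, edge_type, value_node) triples."""
    node = document[id_field]
    return [
        (node, field, value)
        for field, value in document.items()
        if field != id_field  # the id names the node itself, so it gets no edge
    ]


doc = {
    "_id": "50e21b4623e50ac61370",
    "field-1": "value1",
    "field-2": "value2",
}

triples = disaggregate(doc)
```

After this step the central node holds nothing but its identity; everything it "knew" now lives in its edges to other nodes.
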
This is not to say that Neo4j doesn’t support properties – in Neo4j, nodes can have properties (and edges can too). However, aggregating over nodes by property value is likely suboptimal, and it robs you of the very power that a graph-based database is designed to provide.

This parallels the Lisp object system (CLOS) in a way that’s a bit subtle and probably worthy of its own post. For those of us who naturally prefer functional programming styles, however, graph-based databases like Neo4j may provide an interesting way of looking at our data.
