Introduction to Graph Database

Martel Richard
4 min read · Nov 27, 2018

Ask anybody what kind of database their application uses, and the majority will say a relational database. It has been the de facto standard that people reach for when they start building an application, and it usually suits their requirements. Relational databases have been around since the 1970s, and most of the world's top enterprise applications still use them today. They remain a strong choice for storing structured data in tables, where each row is uniquely identified by a primary key. We query data from a relational database using SQL. For example,

Table: Person

To query all male persons, use the query below:

SELECT * FROM PERSON WHERE GENDER = 'MALE';
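As a minimal sketch, the query above can be tried with Python's built-in sqlite3 module. The column names (id, name, gender) and the sample rows are assumptions made for illustration, since the contents of the Person table are not shown here.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, gender TEXT)")

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO person (id, name, gender) VALUES (?, ?, ?)",
    [(1, "Peter", "MALE"), (2, "Mary", "FEMALE"), (3, "John", "MALE")],
)

# Same filter as in the text, with the value passed as a bound parameter.
males = conn.execute(
    "SELECT * FROM person WHERE gender = ? ORDER BY id", ("MALE",)
).fetchall()
print(males)
```

Passing the value as a parameter rather than pasting it into the SQL string is a common habit that avoids quoting problems.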

As the application grows, a single table may no longer be sufficient to represent an entity: its attributes grow too, and in a normalized database the data is spread across several tables. This is where relationships between tables come in.

Say there is another table, Department.

Table: Department

To associate a person with a department, a foreign key dept_id is added to the Person table, referencing the primary key of the Department table. This is a many-to-one relationship: each person belongs to one department, while a department can have many persons.

Table: Person with foreign key
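The foreign key idea can be sketched with Python's built-in sqlite3 module. The table layout, column names, and sample rows here are assumptions for illustration; the point is only that each person row points at one department row, and the database can refuse a pointer to a department that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """CREATE TABLE person (
           id INTEGER PRIMARY KEY,
           name TEXT,
           gender TEXT,
           dept_id INTEGER REFERENCES department(id)
       )"""
)
conn.execute("INSERT INTO department VALUES (10, 'Public Relations')")
conn.execute("INSERT INTO person VALUES (1, 'Peter', 'MALE', 10)")

# A person row that references a missing department is rejected.
try:
    conn.execute("INSERT INTO person VALUES (2, 'Mary', 'FEMALE', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Follow the foreign key with a JOIN to get the department name.
rows = conn.execute(
    "SELECT person.name, department.name FROM person "
    "JOIN department ON person.dept_id = department.id"
).fetchall()
print(rejected, rows)
```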

When a person can be associated with multiple departments, the relationship becomes one-to-many or, more commonly, many-to-many. It is usually represented in a separate table that holds person_id and dept_id.

Table: Dept_Person

To retrieve data using SQL, a JOIN has to be used between the tables. For a many-to-many relationship, a query like the one below retrieves person details together with their department assignments.

SELECT * FROM PERSON INNER JOIN DEPT_PERSON ON PERSON.ID = DEPT_PERSON.PERSON_ID;
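To also get the department names, a second JOIN from the junction table to the department table is needed. The sketch below uses Python's sqlite3 module; the schema, the id column on the department table, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    -- Junction table: one row per (person, department) pair.
    CREATE TABLE dept_person (
        person_id INTEGER REFERENCES person(id),
        dept_id   INTEGER REFERENCES department(id),
        PRIMARY KEY (person_id, dept_id)
    );
    INSERT INTO person VALUES (1, 'Peter'), (2, 'Mary');
    INSERT INTO department VALUES (10, 'Public Relations'), (20, 'Urban Development');
    INSERT INTO dept_person VALUES (1, 10), (1, 20), (2, 10);
    """
)

# Two joins: person -> junction table -> department.
rows = conn.execute(
    """
    SELECT person.name, department.name
    FROM person
    INNER JOIN dept_person ON person.id = dept_person.person_id
    INNER JOIN department ON department.id = dept_person.dept_id
    ORDER BY person.id, department.id
    """
).fetchall()
print(rows)
```

Every additional kind of relationship tends to add another junction table and another JOIN to queries like this one.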

This is a simple example. Once we include tens of departments, sub-departments, different portfolios, and so on, merely representing the relationships between tables becomes a nightmare, and querying them through multiple JOINs can take a performance hit in a relational database. A graph database essentially addresses this drawback.

A graph is basically a set of vertices connected by edges. In a graph database, each real-world entity is stored as a vertex, and the relationships between entities are represented as edges.

When a person is involved in multiple departments, simply add one more edge from that person vertex to the other department and be done with it. There is no overhead of adding and maintaining another column or table.

Each vertex can have properties, or attributes, that describe it. Here each person vertex has Name and Gender properties. Each edge also has a label denoting the kind of relationship it represents. If there is more than one relationship between two entities, we can simply add one edge per relationship.
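The vertex and edge model described above can be sketched as two small Python data classes. This is only an illustration of the data model, not how any particular graph database stores data; the class names and the second edge label ("Manages") are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    label: str                                       # kind of entity, e.g. "Person"
    properties: dict = field(default_factory=dict)   # attributes describing it


@dataclass
class Edge:
    label: str       # kind of relationship
    source: Vertex
    target: Vertex


peter = Vertex("Person", {"name": "Peter", "gender": "Male"})
public_relations = Vertex("Department", {"name": "Public Relations"})

# Two relationships between the same pair of entities are simply two edges.
edges = [
    Edge("Involved in", peter, public_relations),
    Edge("Manages", peter, public_relations),  # hypothetical second relationship
]

labels = [
    e.label for e in edges
    if e.source is peter and e.target is public_relations
]
print(labels)
```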

Creating and querying data in a graph database looks something like the pseudocode below.

Graph g = GraphFactory.getInstance("myfirstgraphdb");

Vertex peter = g.addVertex(label: "Person", name: "Peter", gender: "Male");

Vertex publicRelations = g.addVertex(label: "Department", name: "Public Relations");

Vertex urbanDevelopment = g.addVertex(label: "Department", name: "Urban Development");

peter.addEdge("Involved in", publicRelations);

peter.addEdge("Involved in", urbanDevelopment);

To find Peter -> g.find(name: "Peter")

To filter with male gender -> g.has(gender: "Male")
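For readers who want something runnable, here is a minimal in-memory sketch of the same operations in Python. The Graph and Vertex classes and their methods (add_vertex, add_edge, has, out) are hypothetical helpers written for this example, not the API of Neo4j, TinkerPop, or any other product, and the extra vertex for Mary is made-up sample data.

```python
class Vertex:
    def __init__(self, label, **properties):
        self.label = label
        self.properties = properties
        self.out_edges = []  # list of (edge_label, target_vertex) pairs

    def add_edge(self, edge_label, target):
        self.out_edges.append((edge_label, target))

    def out(self, edge_label):
        """Follow outgoing edges that carry the given label."""
        return [target for lbl, target in self.out_edges if lbl == edge_label]


class Graph:
    def __init__(self):
        self.vertices = []

    def add_vertex(self, label, **properties):
        vertex = Vertex(label, **properties)
        self.vertices.append(vertex)
        return vertex

    def has(self, **criteria):
        """Return vertices whose properties match every key/value given."""
        return [
            v for v in self.vertices
            if all(v.properties.get(k) == val for k, val in criteria.items())
        ]


g = Graph()
peter = g.add_vertex("Person", name="Peter", gender="Male")
mary = g.add_vertex("Person", name="Mary", gender="Female")  # hypothetical extra row
public_relations = g.add_vertex("Department", name="Public Relations")
urban_development = g.add_vertex("Department", name="Urban Development")
peter.add_edge("Involved in", public_relations)
peter.add_edge("Involved in", urban_development)

# Filter by property, then traverse edges instead of joining tables.
males = [v.properties["name"] for v in g.has(gender="Male")]
found = g.has(name="Peter")[0]
departments = [d.properties["name"] for d in found.out("Involved in")]
print(males, departments)
```

Finding Peter's departments here is a walk along his outgoing edges; no junction table or JOIN is involved.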

These are just the basic operations of a graph database; much more advanced querying is available, with filtering, sorting, and so on. In essence, a graph database avoids the rigid structure of a relational database and answers queries by traversing edges. There are many graph database providers. Neo4j offers an enterprise-grade graph database with many salient features, along with a free version with a limited feature set. You can also try Apache TinkerPop, an open-source graph computing framework, to explore further.
