Post

OLAP Graph

I want to present a quick study I’ve done with my colleague João Raimundo about an experiment around OLAP1 operations using a Graph.

We call it: The “Graph” Kimball Presentation2.

Why Graphs?
We found a NoSQL Graph-based OLAP Analysis paper3 that really caught our eye and we wanted to see how this version of OLAP analysis compares to the usual tabular OLAP approach.

We then purpose how to correctly use a Graph for Cube Architecture and OLAP Operations.

I leave this table as an initial reference of how powerful graph-based databases can be, in particular Neo4J which is the same one that is used on this study. Do note that the example presented is a motivation for graph-based databases so the example is a finger-picked one.

efficiency of finding extended friends

Graph Architecture

We follow the NoSQL Graph-based OLAP Analysis paper’s architecture which is presented in the image below.

In here we can easily identify Dimension Nodes that are created in a snowflake architecture (should we compare it to the tabular version), and Fact Nodes where we can spot them out by seeing where all the connections intersect.

It is important to say as well that we can define levels of our Dimensions with new nodes instead of the tabular version where we would need to add a new column.

We can identify three Dimensions: Location, Product, and Time as well as the Fact nodes with grey color.

graph architecture

Practical Example

hypervending

To easily show you how powerful this Graph-based OLAP can be, we created a Hypervending Data Warehouse to use on the demonstrations that will follow.

We can notice that each Dimension is a node, and the same can be said for the Facts.

To make it even more interesting, we can define the levels of each dimension very easily and they can be identified almost without effort by following the archs between the nodes. These archs have to be unidirectional.

Facts and Measures

fact

A Fact Node is identified by the archs directed at it. There is a direct relation between the number of Dimensions and the centrality of the Node.

The Measures are indicated in the attributes of the Fact Nodes.

We can have different types of Fact Node in our OLAP Graph, utilizing the same dimensions. These can be seen as a different data-marts.

Degree and Dimensions

One thing that we can immediately understand from using a graph approach for the OLAP Analysis is that we can identify, without effort, the nodes with the most importance.

The nodes with more connections are the most “popular node”. As we can see from this example, we can identify which months (in grey) are the most popular months for sales

popular months

Slow-changing Dimensions

In our more tabular SQL-driven approach, slow changing dimensions require a special type of treatment, and it can easily be the reason for an over-growth of our dimension tables.

slowly changing dimension arch

With the graph approach, the only thing we need to do is save the property of the versioning in the arch that connects two dimensions.

The arch saves the last time that connection was true in a timestamp, allowing us to query it easily using cypher4, as follows:

1
2
3
MATCH (store:Store {name: "Super"})-[]->(sec:Section)-[arch:HAS]->(prod:Product {name: "A"})
WHERE arch.date >= 2019-05-21
RETURN store, sec, prod

Operations

OLAP allows us to execute some important operations such as Slice, Dice, Drill-Down, Roll-Up.

I will be showing how the same operations can be done using an OLAP Graph instead with the help of illustrative images and some Cypher code

Slice

Get Sales made in “Lisboa”.

1
2
MATCH (f:Fact)<-[]-(c:Customer {name: "Lisboa"})
RETURN f, c

slice

Get Sales made in “Lisboa” and Product line = “Junior”.

1
2
3
MATCH (f:Fact)<-[]-(c:Customer {name: "Lisboa"})
MATCH (f)<-[]-(l:Line {name: "Junior"})<-[]-(p)<-[]-(sub)<-[]-(cat)
RETURN f, c, l, p, sub, cat

Dice

slice

Get Sales made in “Lisboa” and Product line = “Junior” WHERE quantity is larger than 910.

1
2
3
4
MATCH (f:Fact)<-[]-(c:Customer {name: "Lisboa"})
MATCH (f)<-[]-(l:Line {name: "Junior"})<-[]-(p)<-[]-(sub)<-[]-(cat)
WHERE f.quantity >= 910
RETURN f, c, l, p, sub, cat

Roll-up and Drill-Down

Get Sales made in “Spain” with sub-category = “Cold” in the 2nd trimester.

1
2
3
4
MATCH (f:Fact)<-[]-(s)<-[]-(co:Country {name: "Spain"})
MATCH (f)<-[]-(l)<-[]-(prod)<-[]-(sub:SubCategory {name: "Cold"})
MATCH (f)<-[]-(d)<-[]-(m)<-[]-(tri:Trimester {_: 2})
RETURN f, s, co, l, prod, d, m, sub, tri

slice

Get Sales made in “Spain” with product = “Coffee” in the 5th month

1
2
3
4
MATCH (f:Fact)<-[]-(s)<-[]-(co:Country {name: "Spain"})
MATCH (f)<-[]-(l)<-[]-(prod:Product {name: "Coffee"})
MATCH (f)<-[]-(d)<-[]-(m:Month {_: 5})
RETURN f, s, co, l, prod, d, m

Exploration and Conclusion

slice

Given all that has been said, this is our final demonstration of what a Graph approach can give you. Sorry for bad resolution that Giphy allows me to have.

With a Graph OLAP “cube” and Neo4J we can do our exploratory analysis in a way that is unimaginable for the SQL approach.

Is it something that can be used to actually study the data? We say YES! but I’ll leave that answer up to you and your projects where you can hopefully get some inspiration from here :)




This post is licensed under CC BY 4.0 by the author.