Understanding Semantic and Property Graphs
What is a Graph Model?
A Graph is a data/information model that aims to return information to its natural, connected state. In the real world, everything is connected, and information has no predefined structure. In our own thinking, too, we more often explore information about related entities than search and list through similar blocks of information about many unrelated entities.
However, the restrictions of paper forms and catalogues, and then of the relational model that highly limited hardware made necessary, taught us to think in terms of tables and records.
The Graph model takes us back to the world of connected entities, which can be connected in various ways, and have whatever properties they please.
Graph Model as an Enterprise Data Storage
Relational Databases are not going anywhere. They will remain the only solution for working with massive quantities of pre-defined records. However, Graph Databases are increasingly employed for tasks that fall outside that formula. The Graph model is seen as a way to integrate and connect many diverse datasets.
Also, for data that emphasise connectivity and complexity, a Graph database can significantly reduce development time while increasing performance.
Graph Models from Executive Perspective
There are two different Graph models: the Semantic Graph and the Property Graph.
While they have similarities, and some products support both, it is important to understand the difference.
Semantic Graph
The Semantic Graph model is based on the “Semantic Web” ideas of Sir Timothy Berners-Lee, the inventor of the World Wide Web. Its foundation is the idea of world-wide (or enterprise-wide) integration of information. The key features are:
Property Graph
Property Graph databases emerged bottom-up, from projects that emphasised traversing links over filtering data. A Property Graph implementation will likely be an engineering undertaking, with many ad-hoc decisions on structure, properties, metadata, etc. Some features of a Property Graph implementation would be:
Overall, the use of Property Graph storage is not something to be exposed to end-users or stakeholders, who should instead evaluate the expected visual or data management functionality.
Unlike a Semantic Graph, a Property Graph implementation is destined to be a proprietary engineering feat, which brings the traditional problems of software engineering in a non-software enterprise: continuity, project monitoring, [lack of] documentation, etc.
At the same time, for a team of engineers, a Property Graph platform can have advantages over a Semantic Graph platform. Some of them are outlined below.
Semantic and Property Graphs from Engineering Perspective
What the Graph consists of
A Semantic Graph consists of Triples in the form <subject> <predicate> <object>. Both the Subject and the Predicate must be URIs, while the Object can be either a URI or a Literal. A Semantic Graph does not specifically define nodes/vertices: the only way a Node can be known is by being mentioned in a Triple.
A Property Graph consists of Vertices (Nodes) and Edges. Each Edge and each Vertex has an internal GUID and can have any number of Key-Value pairs associated with it. Thus a Property Graph can effortlessly associate values with Edges, which a Semantic Graph cannot do directly. While some of the values can be URIs, this is not required.
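The two structures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular graph platform stores data: the `http://example.org/` URIs, the `Vertex` and `Edge` classes and the integer ids standing in for GUIDs are all made up for the example.

```python
from dataclasses import dataclass, field
from itertools import count

# --- Semantic Graph: nothing but (subject, predicate, object) triples ---
EX = "http://example.org/"  # hypothetical namespace
triples = {
    (EX + "alice", EX + "worksFor", EX + "acme"),  # object is a URI
    (EX + "alice", EX + "name", "Alice"),          # object is a Literal
}

def nodes(triples):
    """Nodes are never declared; they are whatever the triples mention."""
    found = set()
    for s, _p, o in triples:
        found.add(s)
        if o.startswith(EX):  # crude URI test, good enough for this sketch
            found.add(o)
    return found

# --- Property Graph: vertices and edges, each with an id and key-values ---
_ids = count(1)  # integer ids stand in for internal GUIDs

@dataclass
class Vertex:
    properties: dict = field(default_factory=dict)
    id: int = field(default_factory=lambda: next(_ids))

@dataclass
class Edge:
    source: int
    target: int
    label: str
    properties: dict = field(default_factory=dict)
    id: int = field(default_factory=lambda: next(_ids))

alice = Vertex({"name": "Alice"})
acme = Vertex({"name": "Acme"})
# The edge itself carries a value, which a bare triple has no place for.
job = Edge(alice.id, acme.id, "worksFor", {"since": 2015})
```

Note that the Semantic side has no vertex objects at all, while the Property side can attach `since` to the edge without any extra modelling.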
Identity
For a Semantic Graph, the URI is the GUID of a Node or Predicate. Note that the URI of a Predicate identifies the type of the Predicate, not the individual Triple (the Edge, in Property Graph terms).
For a Property Graph, each Edge and each Vertex (Node) has an internal GUID. That enables multiple Edges between the same Vertices with the same properties (key-values).
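A minimal sketch of what this difference in identity means in practice, assuming a triple store behaves like a set of triples and a property graph assigns a fresh id to every edge. The `ex:` names and the `add_edge` helper are hypothetical.

```python
from itertools import count

# Semantic Graph: a triple is identified by its content, so stating it
# twice does not create a second edge.
triples = set()
triples.add(("ex:alice", "ex:paid", "ex:acme"))
triples.add(("ex:alice", "ex:paid", "ex:acme"))

# Property Graph: every edge gets its own id, so two edges with the same
# endpoints, label and properties remain two distinct edges.
edge_ids = count(1)  # integer ids stand in for internal GUIDs
edges = {}

def add_edge(source, label, target, **properties):
    edge_id = next(edge_ids)
    edges[edge_id] = (source, label, target, properties)
    return edge_id

first = add_edge("alice", "paid", "acme", amount=100)
second = add_edge("alice", "paid", "acme", amount=100)
```

After this runs, `triples` holds one entry while `edges` holds two, which is why repeated events such as payments are easier to record as Property Graph edges.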
Metadata
For a Semantic Graph, Metadata is part of the Graph, and uses Predicates and Nodes defined by OWL (Web Ontology Language). Metadata can be retrieved by a user with zero knowledge of the particular Graph implementation. Many Ontologies are widely and freely available, and we usually recommend that our clients either reuse an existing Ontology or develop one for reuse.
For a Property Graph, there is no universal standard for Metadata. Naturally, any project with a chance of success must hold some information about Vertex and Edge types and how they are represented and related, but that information would be project-specific and recorded in a project-specific way, if it is recorded at all.
Query
A Semantic Graph is queried using SPARQL, an SQL-like declarative query language. The author has successfully taught SPARQL to non-technical, non-IT enterprise staff.
A Property Graph is queried using the Gremlin query language, a Java-looking language (based on the Groovy programming language) that has both declarative and imperative capabilities. We teach Gremlin together with some elements of Groovy, and recommend that participants have at least some programming education or experience.
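The contrast between the two query styles can be illustrated in plain Python. This is neither SPARQL nor Gremlin, only a rough analogy: `match` mimics stating a triple pattern and letting the engine find the answers, while `walk_in` mimics starting at a vertex and stepping along edges. All names and data are made up for the example.

```python
triples = {
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:carol", "ex:worksFor", "ex:initech"),
}

def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a variable."""
    return sorted(
        t for t in triples
        if all(want is None or want == got for want, got in zip((s, p, o), t))
    )

# Declarative style: describe what a result looks like.
acme_staff = [s for s, _p, _o in match(triples, p="ex:worksFor", o="ex:acme")]

# Traversal style: build an index of incoming edges, then walk from a vertex.
incoming = {}
for s, p, o in triples:
    incoming.setdefault(o, []).append((p, s))

def walk_in(vertex, label):
    """Follow incoming edges with the given label back to their sources."""
    return sorted(s for p, s in incoming.get(vertex, []) if p == label)

same_staff = walk_in("ex:acme", "ex:worksFor")
```

Both approaches return the same two people here; the difference lies in whether the author of the query describes the result or the path to it.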
Reasoning
Most Semantic Graph platforms support classification and generalisation Reasoning. A query for a Person would return anyone defined as a Consultant, because the Consultant Class has been defined as a subclass of Person. A query for all Trainers would return anyone offering a Training Course, even if the person was not explicitly defined as a Trainer.
A Property Graph has no Reasoning defined at the platform level.
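The two inferences above can be sketched as a tiny forward-chaining loop. This is a toy under stated assumptions, not an OWL reasoner: the Trainer rule is hard-coded here, whereas in a real ontology it would be expressed as a class definition, and the `ex:` names are hypothetical.

```python
TYPE, SUBCLASS, OFFERS = "rdf:type", "rdfs:subClassOf", "ex:offers"

triples = {
    ("ex:Consultant", SUBCLASS, "ex:Person"),
    ("ex:alex", TYPE, "ex:Consultant"),
    ("ex:alex", OFFERS, "ex:owlCourse"),
    ("ex:owlCourse", TYPE, "ex:TrainingCourse"),
}

def infer(triples):
    """Apply two rules repeatedly until no new triples appear."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            # Generalisation: an instance of a subclass is also an
            # instance of the superclass.
            if p == TYPE:
                for cls, p2, parent in inferred:
                    if p2 == SUBCLASS and cls == o:
                        new.add((s, TYPE, parent))
            # Classification: whoever offers a TrainingCourse is a Trainer.
            if p == OFFERS and (o, TYPE, "ex:TrainingCourse") in inferred:
                new.add((s, TYPE, "ex:Trainer"))
        if new <= inferred:
            return inferred
        inferred |= new

def instances_of(cls, triples):
    """Query for instances of a class over the inferred graph."""
    return sorted(s for s, p, o in infer(triples) if p == TYPE and o == cls)
```

In this sketch `ex:alex` is never stated to be a Person or a Trainer, yet `instances_of` returns him for both classes.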
Scalability
Both types of Graph storage can be scaled massively, although the approaches differ. For a massive Property Graph, one has to choose a highly scalable Graph DB implemented on top of HBase or Cassandra. A Semantic Graph is scaled through a so-called Virtual Graph, where the data remain in Relational or NoSQL storage but are made available to SPARQL queries.
That difference is due to the higher complexity of Semantic storage, including reasoning. While there have been several attempts to build Semantic storage on top of Cassandra or HBase, none succeeded.
Analytics
One will find it rather awkward to implement graph algorithms, such as PageRank or Trust Propagation, on a Semantic Graph. Part of the problem is the inability to attach values to Triples.
It is possible to implement such algorithms on a Property Graph.
However, true performance at scale would require exporting the graph into a sparse matrix file (possibly an HDFS file), then using either Apache Spark GraphX or some GPU analytics package. The results must then be loaded back into the Graph.
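The export, compute and write-back cycle can be illustrated at toy scale. The sketch below is pure Python and only stands in for what a package such as GraphX would do on a cluster; the edge list, the `to_sparse` and `pagerank` helpers and the damping value of 0.85 are assumptions made for the example.

```python
def to_sparse(edges):
    """Map vertex ids to 0..n-1 and return (index, rows) as adjacency lists."""
    index = {}
    for src, dst in edges:
        index.setdefault(src, len(index))
        index.setdefault(dst, len(index))
    rows = [[] for _ in index]
    for src, dst in edges:
        rows[index[src]].append(index[dst])
    return index, rows

def pagerank(rows, damping=0.85, iterations=50):
    """Power iteration over the sparse adjacency lists."""
    n = len(rows)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Rank held by vertices with no outgoing edges is spread evenly.
        dangling = sum(rank[i] for i, out in enumerate(rows) if not out)
        nxt = [(1 - damping) / n + damping * dangling / n] * n
        for i, out in enumerate(rows):
            for j in out:
                nxt[j] += damping * rank[i] / len(out)
        rank = nxt
    return rank

# Step 1: export the edges of the graph (a made-up four-vertex example).
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
index, rows = to_sparse(edges)

# Step 2: run the algorithm outside the graph store.
rank = pagerank(rows)

# Step 3: map the scores back to vertex ids, ready to be written into the
# graph as a vertex property.
scores = {vertex: rank[i] for vertex, i in index.items()}
```

Writing `scores` back as a vertex property is straightforward in a Property Graph, since every Vertex already carries key-value pairs.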
Next Step
Introducing either a Semantic or a Property Graph is a serious paradigm shift for a traditional IT site. The critical step is training your core staff, the people you can trust, then running a small-scale project that delivers further learning of the technology. Business Abstraction can assist with our training and consulting:
Training for Semantic Graph technologies
Training for Property Graph
Contact us for more information.