Understanding Semantic and Property Graphs

What is a Graph Model?

A Graph is a data/information model that aims to return information to its natural, connected state. In the real world everything is connected, and there is no predefined structure for information. In our thinking, too, we more often explore information describing related entities than search and list through similar blocks of information about many unrelated entities.

However, the restrictions of paper forms and catalogues, and the limitations of the relational model, which were necessary because of severely constrained hardware, made us think in terms of tables and records.

The Graph model takes us back to the world of connected entities, which can be connected in various ways, and have whatever properties they please.

Graph Model as Enterprise Data Storage

Relational Databases are not going anywhere. They will remain the primary solution for working with massive quantities of records with a predefined structure. However, Graph Databases are increasingly employed for tasks that fall outside that formula.

The Graph model is seen as a natural way to integrate and connect many diverse datasets.

Also, for data that emphasize connectivity and complexity, a Graph database can significantly reduce development time while increasing performance.

Graph Models from an Executive Perspective

There are two different Graph models:

  • Semantic Graph

  • Property Graph

While they have similarities, and some products support both, it is important to understand the differences between them.

Semantic Graph

The Semantic Graph model is based on the “Semantic Web” ideas of Sir Timothy Berners-Lee, the inventor of the World Wide Web. Its foundation is the idea of world-wide (or enterprise-wide) integration of information. The key features are:

  1. Built-in data and metadata integration and extensibility. As a result, any local, tactical development can be extended or integrated into an enterprise-wide Knowledge Graph.

  2. Strong international standards for the representation of metadata, in the form of OWL Ontologies.

  3. Availability of expertly defined Ontologies covering the Finance Industry (FIBO), Earth Sciences (SWEET), and many other domains.

  4. Strong standards for data representation (RDF) and query (SPARQL), which, together with common Ontologies, enable the involvement of external resources and expertise.

  5. Reasoning and generalisation of Classes and Properties allow a more generic, business-friendly layer to coexist with detailed technical information.

  6. Reasoning, classification and integration naturally provide a platform for a decision-support system, even if one was not planned initially.

  7. Non-technical domain experts can be trained to evaluate Ontologies and write SPARQL queries.

  8. Data in a Semantic Graph is naturally discoverable. For example, after a 4-day training course, the trainees were discovering information in DBpedia without any prior knowledge of its structure.

  9. Another side of the Semantic Graph is the set of Linked Data publication standards.

Property Graph

Property Graph databases emerged bottom-up, from projects that emphasised traversing links over filtering data. A Property Graph implementation will likely be an engineering undertaking, with many ad-hoc decisions on structure, properties, metadata, etc. Some features of a Property Graph implementation would be:

  1. There is a de facto standard query language, Gremlin, but it originated as an extension of the Groovy programming language. Only engineers are expected to use it.

  2. Non-technical users will access the Graph via applications. Input from stakeholders, technical users, etc. would come through a traditional Business Analysis process.

  3. There are no standards for naming vertices (nodes) or for representing metadata.

  4. While introducing new Properties doesn’t require any additional effort, there is no established way of recording the properties and their meaning.

  5. Unlike with Semantic Graphs, integration of two Property Graphs will not happen automatically.

Overall, Property Graph storage is not something to expose to end-users or stakeholders; they will instead evaluate the visual or data-management functionality the application is expected to deliver.

Unlike a Semantic Graph, a Property Graph implementation is bound to be a proprietary engineering feat, which brings the traditional problems of software engineering in a non-software enterprise: continuity, project monitoring, [lack of] documentation, etc.

At the same time, for a team of engineers, a Property Graph platform can have advantages over a Semantic Graph platform; some of them are outlined below.

Semantic and Property Graphs from an Engineering Perspective

What the Graph consists of

A Semantic Graph consists of Triples in the form <subject> <predicate> <object>. The Subject and Predicate must be URIs (the Subject may also be a blank node), while an Object can be a URI, a blank node, or a Literal. A Semantic Graph doesn’t explicitly define nodes/vertices: the only way a Node can be known is by being mentioned in a Triple.

A Property Graph consists of Vertices (Nodes) and Edges. Each Edge and each Vertex has an internal GUID and can have any number of Key-Value pairs associated with it. Thus a Property Graph can effortlessly associate values with Edges, which a Semantic Graph cannot do directly. Some of the values can be URIs, but this is not required.
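To make the contrast concrete, here is a minimal Python sketch of both structures. The namespace `http://example.org/`, the `Literal`, `Vertex` and `Edge` classes and the `nodes` helper are hypothetical, made up for illustration; real systems use an RDF library or a graph database rather than plain Python collections.

```python
import uuid
from dataclasses import dataclass, field

EX = "http://example.org/"  # hypothetical namespace, for illustration only


# --- Semantic Graph: nothing but a set of (subject, predicate, object) triples ---
@dataclass(frozen=True)
class Literal:
    """A plain value (string, number, date...) that can only appear as an Object."""
    value: object


semantic_graph = {
    (EX + "alice", EX + "worksFor", EX + "acme"),
    (EX + "alice", EX + "name", Literal("Alice")),
}


def nodes(graph):
    """Nodes exist only by being mentioned: collect subjects and non-literal objects."""
    subjects = {s for s, _, _ in graph}
    objects = {o for _, _, o in graph if not isinstance(o, Literal)}
    return subjects | objects


# --- Property Graph: explicit vertices and edges, each with an id and key-values ---
@dataclass
class Vertex:
    label: str
    props: dict = field(default_factory=dict)
    id: str = field(default_factory=lambda: str(uuid.uuid4()))  # internal GUID


@dataclass
class Edge:
    label: str
    out_v: str  # id of the source vertex
    in_v: str   # id of the target vertex
    props: dict = field(default_factory=dict)
    id: str = field(default_factory=lambda: str(uuid.uuid4()))  # internal GUID


alice = Vertex("person", {"name": "Alice"})
acme = Vertex("company", {"name": "ACME"})
# The edge itself carries a value ("since"), which a bare triple cannot hold.
works_for = Edge("worksFor", alice.id, acme.id, {"since": 2015})
```

The `since` value sits on the edge itself; saying the same thing with triples requires an extra modelling step, such as turning the employment into a node of its own.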

Identity

For a Semantic Graph, the URI is the GUID of a Node or Predicate. Note that the URI of a Predicate identifies the type of relationship, not an individual Triple (an Edge, in Property Graph terms).

For a Property Graph, each Edge and each Vertex (node) has an internal GUID. This allows multiple Edges between the same Vertices, even with identical properties (key-values). In a Semantic Graph, by contrast, asserting the same Triple twice has no effect, since the graph is a set of Triples.
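The identity rules above can be illustrated with a small sketch, assuming triples are held in a Python `set` and property-graph edges in a list of dicts (both hypothetical stand-ins for real storage; the `new_edge` helper and vertex ids are invented). It also shows why two Semantic Graphs that share URIs integrate by simple union, whereas Property Graph identifiers are internal and mean nothing outside their own database.

```python
import uuid

EX = "http://example.org/"  # hypothetical namespace

# Semantic Graph: a set of triples. Asserting the same triple twice changes nothing,
# and two graphs that use the same URIs integrate by plain set union.
hr_graph = {(EX + "alice", EX + "worksFor", EX + "acme")}
crm_graph = {
    (EX + "alice", EX + "worksFor", EX + "acme"),  # same fact, from another source
    (EX + "acme", EX + "locatedIn", EX + "sydney"),
}
merged = hr_graph | crm_graph  # alice is now linked to sydney through acme


# Property Graph: every edge gets its own internal id, so two edges with identical
# endpoints, label and properties remain two distinct edges.
def new_edge(out_v, label, in_v, **props):
    """Hypothetical helper: create an edge record with a fresh internal GUID."""
    return {"id": str(uuid.uuid4()), "out": out_v, "label": label,
            "in": in_v, "props": props}


alice_id, acme_id = "v1", "v2"  # hypothetical internal vertex ids
edges = [
    new_edge(alice_id, "worksFor", acme_id, role="consultant"),
    new_edge(alice_id, "worksFor", acme_id, role="consultant"),
]
# Merging with another Property Graph would first require deciding which of its
# internal ids refer to the same real-world entities; nothing does that automatically.
```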

Metadata

For a Semantic Graph, Metadata is part of the Graph itself and uses Predicates and Nodes defined by OWL (the Web Ontology Language). Metadata can be retrieved by a user with zero knowledge of a particular Graph implementation. Many Ontologies are widely and freely available, and we usually recommend that our clients either reuse an existing Ontology or develop one for reuse.

For a Property Graph, there is no universal standard for Metadata. Naturally, any project with a chance of success must have some information about Vertex and Edge types and how they are represented and related, but that information would be project-specific and recorded in a project-specific way, if it is recorded at all.

Query

A Semantic Graph is queried using SPARQL, an SQL-like declarative query language. The author successfully taught SPARQL to non-technical, non-IT enterprise staff.
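What a SPARQL `SELECT` does with a basic graph pattern can be sketched in a few lines of Python: each triple pattern is matched against the graph, and variables (`?person`, `?company`, `?city`) must bind consistently across all patterns. This is a simplified illustration, not how a SPARQL engine is implemented; the functions are hypothetical, prefixed names such as `ex:worksFor` are treated as plain strings, and the data is invented.

```python
# The SPARQL query this sketch emulates (hypothetical ex: prefix):
#   SELECT ?person ?city WHERE {
#     ?person  ex:worksFor  ?company .
#     ?company ex:locatedIn ?city .
#   }


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`; None on failure."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None  # variable already bound to something else
            new[p] = t
        elif p != t:
            return None      # constant term does not match
    return new


def select(graph, patterns, variables):
    """Evaluate a basic graph pattern: all patterns must match under one binding."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in graph:
                b2 = match_pattern(pattern, t, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return [tuple(b[v] for v in variables) for b in bindings]


graph = {
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:globex"),
    ("ex:acme", "ex:locatedIn", "ex:sydney"),
}
rows = select(graph,
              [("?person", "ex:worksFor", "?company"),
               ("?company", "ex:locatedIn", "?city")],
              ["?person", "?city"])
print(rows)  # [('ex:alice', 'ex:sydney')]
```

The query states only *what* must hold, not how to find it, which is what makes the declarative style approachable for non-programmers.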

A Property Graph is typically queried with the Gremlin query language, a Java-like language (originally based on the Groovy programming language) that has both declarative and imperative capabilities. We teach Gremlin together with some elements of Groovy, and recommend that participants have at least some programming education or experience.
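To show why Gremlin feels like programming, the sketch below imitates its step chaining with ordinary Python method calls. The `Traversal` class and the `V`, `has`, `out` and `values` functions are a hypothetical toy, loosely modelled on the Gremlin steps of the same names; they are not the Apache TinkerPop API, and the graph data is invented.

```python
class Traversal:
    """Toy imitation of Gremlin-style step chaining (hypothetical, not TinkerPop)."""

    def __init__(self, graph, items):
        self.graph = graph  # {"vertices": {id: props}, "edges": [edge dicts]}
        self.items = items  # vertex ids currently flowing through the traversal

    def has(self, key, value):
        # Filter step: keep vertices whose property `key` equals `value`.
        vs = self.graph["vertices"]
        return Traversal(self.graph, [v for v in self.items if vs[v].get(key) == value])

    def out(self, label):
        # Navigation step: follow outgoing edges with the given label.
        return Traversal(self.graph, [e["in"] for v in self.items
                                      for e in self.graph["edges"]
                                      if e["out"] == v and e["label"] == label])

    def values(self, key):
        # Terminal step: extract one property value from each vertex.
        return [self.graph["vertices"][v][key] for v in self.items]


def V(graph):
    # Start step: every vertex in the graph.
    return Traversal(graph, list(graph["vertices"]))


graph = {
    "vertices": {1: {"name": "alice"}, 2: {"name": "bob"}, 3: {"name": "carol"}},
    "edges": [
        {"out": 1, "label": "knows", "in": 2},
        {"out": 1, "label": "knows", "in": 3},
        {"out": 2, "label": "knows", "in": 3},
    ],
}
# Gremlin equivalent: g.V().has('name', 'alice').out('knows').values('name')
names = V(graph).has("name", "alice").out("knows").values("name")
print(names)  # ['bob', 'carol']
```

Each step is just a function applied to the output of the previous one, so anything the host language offers (loops, conditionals, lambdas) can be mixed into a traversal. That flexibility is also why some programming background helps.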

Reasoning

Most Semantic Graph platforms support classification and generalisation Reasoning. A query for a Person would return anyone defined as a Consultant, because the Consultant Class has been defined as a subclass of Person. A query for all Trainers would return anyone offering a Training Course, even if the person was not explicitly defined as a Trainer, because the Ontology states that whoever offers a Training Course is a Trainer.

A Property Graph doesn’t have Reasoning defined at the platform level.
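The Person/Consultant and Trainer examples correspond to simple RDFS entailment rules: `rdfs:subClassOf` propagates class membership upwards, and `rdfs:domain` classifies anyone who uses a property. The forward-chaining sketch below applies three of these rules until nothing new can be derived. The `ex:` names, including the `ex:offersCourse` property standing in for "offering a Training Course", are hypothetical; real reasoners implement much more of RDFS and OWL, far more efficiently.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"


def infer(graph):
    """Forward-chain three RDFS rules until no new triples can be derived."""
    g = set(graph)
    while True:
        new = set()
        for s, p, o in g:
            if p == SUBCLASS_OF:
                # rdfs11: subClassOf is transitive.
                new |= {(s, SUBCLASS_OF, o2) for s2, p2, o2 in g
                        if p2 == SUBCLASS_OF and s2 == o}
                # rdfs9: members of a subclass are members of the superclass.
                new |= {(x, RDF_TYPE, o) for x, p2, c in g
                        if p2 == RDF_TYPE and c == s}
            elif p == DOMAIN:
                # rdfs2: whoever uses property s belongs to the domain class o.
                new |= {(x, RDF_TYPE, o) for x, p2, _ in g if p2 == s}
        if new <= g:  # fixpoint reached
            return g
        g |= new


def instances(g, cls):
    return {s for s, p, o in g if p == RDF_TYPE and o == cls}


ontology_and_data = {
    ("ex:Consultant", SUBCLASS_OF, "ex:Person"),
    ("ex:Trainer", SUBCLASS_OF, "ex:Person"),
    ("ex:offersCourse", DOMAIN, "ex:Trainer"),
    ("ex:alice", RDF_TYPE, "ex:Consultant"),
    ("ex:bob", "ex:offersCourse", "ex:sparql101"),  # bob is never typed explicitly
}
inferred = infer(ontology_and_data)
print(sorted(instances(inferred, "ex:Person")))   # ['ex:alice', 'ex:bob']
print(sorted(instances(inferred, "ex:Trainer")))  # ['ex:bob']
```

Note that the Ontology itself (subclass and domain statements) is stored as ordinary triples next to the data, which is what lets the same machinery query and reason over both.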

Scalability

Both types of Graph storage can be scaled massively, although the approaches differ. For a massive Property Graph, one has to choose a highly scalable Graph DB implemented on top of HBase or Cassandra. A Semantic Graph is scaled through a so-called Virtual Graph, where the data remain in Relational or NoSQL storage but are made available for SPARQL queries.

This difference is due to the higher complexity of Semantic storage, including reasoning. While there were several attempts to build Semantic storage on top of Cassandra or HBase, none succeeded.
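A Virtual Graph keeps data where it is and produces Triples only when a query needs them. The sketch below maps rows of a hypothetical `employee` table (in an in-memory SQLite database) to triples on demand and pushes a subject constraint down into SQL. Production systems typically declare such mappings (for example with the W3C R2RML mapping language) and rewrite SPARQL into SQL automatically; this hand-written generator only illustrates the idea, and the URI scheme is an assumption.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical URI base for the mapping

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Alice", "finance"), (2, "Bob", "it")])


def employee_triples(conn, subject=None):
    """Map employee rows to triples on demand; nothing is copied into a triple store."""
    sql, args = "SELECT id, name, dept FROM employee", ()
    if subject is not None:
        # Assumes subject looks like BASE + "employee/<id>"; the constraint is
        # pushed down to SQL instead of scanning every row.
        emp_id = int(subject.rsplit("/", 1)[1])
        sql, args = sql + " WHERE id = ?", (emp_id,)
    for emp_id, name, dept in conn.execute(sql, args):
        s = f"{BASE}employee/{emp_id}"
        yield (s, BASE + "name", name)
        yield (s, BASE + "department", f"{BASE}dept/{dept}")


for triple in employee_triples(conn, BASE + "employee/2"):
    print(triple)
```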

Analytics

Implementing graph algorithms such as PageRank or Trust Propagation on a Semantic Graph is rather awkward. Part of the problem is the inability to attach values directly to Triples.

It is possible to implement similar algorithms on a Property Graph.
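As an illustration of the kind of algorithm meant here, the sketch below runs PageRank by power iteration over an in-memory property graph and stores each score as a `pagerank` property on its vertex, which fits naturally into a Property Graph's key-value model. The graph layout (a dict of vertex properties plus a list of `(source, target)` edges), the damping factor of 0.85 and the fixed iteration count are assumptions chosen for brevity.

```python
def pagerank(vertices, edges, damping=0.85, iterations=50):
    """Power-iteration PageRank; writes each score back as a 'pagerank' property."""
    n = len(vertices)
    out_degree = {v: 0 for v in vertices}
    for src, _ in edges:
        out_degree[src] += 1
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by dangling vertices (no out-edges) is spread evenly over all.
        dangling = sum(rank[v] for v in vertices if out_degree[v] == 0)
        new = {v: (1 - damping) / n + damping * dangling / n for v in vertices}
        for src, dst in edges:
            new[dst] += damping * rank[src] / out_degree[src]
        rank = new
    for v, props in vertices.items():
        props["pagerank"] = rank[v]  # store the result on the vertex itself
    return rank


vertices = {"a": {}, "b": {}, "c": {}, "d": {}}
edges = [("a", "c"), ("b", "c"), ("d", "c"), ("c", "a")]
rank = pagerank(vertices, edges)
print(max(rank, key=rank.get))  # c
```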

However, true performance at scale would require exporting the graph into a sparse matrix file (possibly on HDFS), then processing it with Apache Spark GraphX or a GPU analytics package. The results must then be written back into the Graph.
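The export/import round trip described above might look like this in outline. The sketch writes the adjacency matrix in Matrix Market coordinate format, one common sparse-matrix text format with 1-based row and column indices, keeps the vertex-to-index mapping, and later writes a vector of scores back as vertex properties. The function names are hypothetical, the scores are placeholders standing in for the output of the external job, and the exact file format a given analytics package expects should be checked against its documentation.

```python
import io


def export_matrix_market(vertices, edges, out):
    """Write the adjacency matrix in Matrix Market coordinate format (1-based).
    Returns the vertex-id -> index mapping needed to load results back."""
    index = {v: i for i, v in enumerate(sorted(vertices), start=1)}
    out.write("%%MatrixMarket matrix coordinate real general\n")
    out.write(f"{len(index)} {len(index)} {len(edges)}\n")  # rows, cols, non-zeros
    for src, dst in edges:
        out.write(f"{index[src]} {index[dst]} 1.0\n")
    return index


def import_results(vertices, index, scores, key):
    """Write a per-row score vector (0-based list) back as vertex properties."""
    for v, i in index.items():
        vertices[v][key] = scores[i - 1]


vertices = {"a": {}, "b": {}, "c": {}}
edges = [("a", "b"), ("b", "c"), ("c", "a")]
buf = io.StringIO()  # stands in for a file on local disk or HDFS
index = export_matrix_market(vertices, edges, buf)
# ... buf.getvalue() would be handed to the external analytics job here ...
scores = [0.2, 0.3, 0.5]  # placeholder values, not real results
import_results(vertices, index, scores, "score")
```

Keeping the index mapping alongside the exported file is the key design point: the external tool only sees integer rows and columns, so without it the results could not be attached back to the right vertices.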