By: Philip Howard, Research Director - Data Management, Bloor Research
Published: 19th February 2014
Copyright Bloor Research © 2014
SPARQL (SPARQL Protocol and RDF Query Language) is the query language most widely supported by graph database vendors (which is not the same as most widely used, which it may well not be). This holds regardless of how we define "database" in this context, that is, irrespective of whether data is actually stored as triples, as a property graph or using some other storage mechanism.
As you might infer from its linguistic similarity to SQL, SPARQL is a declarative language. That is, you don't have to know where the data is in order to create and run queries. However, just as with SQL and relational databases, this means that query performance depends on the database and, in particular, on its optimiser. Unfortunately, while relational databases have sophisticated optimisers, graph databases typically do not (Neo4j is an exception). The same, it has to be said, applies to NoSQL databases in general: you may be able to run SQL (or HiveQL) against Hadoop, for example, but without an optimiser performance is still going to suffer.
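To make the optimiser point concrete, here is a minimal Python sketch, not a description of how any particular product works. It assumes a toy in-memory list of triples, a query written declaratively as a list of patterns (terms starting with "?" are variables), and a deliberately crude "optimiser" that evaluates the pattern with the fewest matches first. The function names and the match-count heuristic are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    # In this toy syntax, query variables start with "?", as in SPARQL.
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.get(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
            extended[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def estimate(pattern: Triple, store: List[Triple]) -> int:
    """Crude cardinality estimate: how many triples match the pattern on its own."""
    return sum(1 for t in store if match(pattern, t, {}) is not None)


def evaluate(patterns: List[Triple], store: List[Triple], optimise: bool = True):
    """Join the patterns, optionally evaluating the most selective one first.

    Returns (solutions, number of intermediate bindings produced)."""
    order = sorted(patterns, key=lambda p: estimate(p, store)) if optimise else list(patterns)
    bindings: List[Binding] = [{}]
    produced = 0
    for pattern in order:
        bindings = [
            b2
            for b in bindings
            for t in store
            if (b2 := match(pattern, t, b)) is not None
        ]
        produced += len(bindings)
    return bindings, produced


# Toy data: a chain of "knows" links plus a single name.
store: List[Triple] = [(f"person{i}", "knows", f"person{i + 1}") for i in range(100)]
store.append(("person7", "name", "Alice"))

# Declarative query: whom does the person named Alice know?
query = [("?p", "knows", "?q"), ("?p", "name", "Alice")]

naive, naive_work = evaluate(query, store, optimise=False)
planned, planned_work = evaluate(query, store, optimise=True)
print(planned, naive_work, planned_work)
```

Both runs return the same answer, because the query only says *what* is wanted; what changes is the work done. Taking the patterns in written order builds 100 partial results before filtering, whereas the reordered plan starts from the single name match and produces just two.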
The second issue with SPARQL is that, as the "R" implies, it was designed for RDF (resource description framework), which is the basis of the semantic web. It wasn’t designed for business intelligence and analytics. Moreover, while RDF stores may have their place in supporting Web 3.0, for most commercial applications of graph technology there is a clear shift towards property graphs.
The difference between a property graph and a triple store is that in a property graph the edges and nodes of the graph may have values associated with them. As a result, property graphs are much more practical for general-purpose business uses: they are much more compact and nodes do not grow like Topsy every time you add a new attribute (or value).
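The compactness argument can be illustrated with a small Python sketch. It is an assumption-laden toy, not a model of any specific product: the `Node` and `Edge` classes stand in for a property graph, and a plain list of tuples stands in for a triple store. Attaching data to one particular edge is shown using the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`), which is one option among several; the attribute names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    props: Dict[str, object] = field(default_factory=dict)


@dataclass
class Edge:
    src: str
    label: str
    dst: str
    props: Dict[str, object] = field(default_factory=dict)


# Property graph: attributes are stored inside the node and edge objects.
chair = Node("chair1", {"colour": "red"})
room = Node("rm253", {"floor": 2})
located_in = Edge("chair1", "locatedIn", "rm253", {"since": "2012-10-12"})
chair.props["weight_kg"] = 7  # new attribute: same object, nothing else added

graph_objects = [chair, room, located_in]

# Triples: every attribute value is a separate statement.
triples: List[Tuple[str, str, str]] = [
    ("chair1", "colour", "red"),
    ("rm253", "floor", "2"),
    ("chair1", "locatedIn", "rm253"),
]
triples.append(("chair1", "weight_kg", "7"))  # new attribute: new triple

# Data about this particular locatedIn link: with RDF reification the statement
# itself becomes a resource (a blank node here) described by further triples.
stmt = "_:s1"
triples += [
    (stmt, "rdf:type", "rdf:Statement"),
    (stmt, "rdf:subject", "chair1"),
    (stmt, "rdf:predicate", "locatedIn"),
    (stmt, "rdf:object", "rm253"),
    (stmt, "since", "2012-10-12"),
]

print(len(graph_objects), len(triples))
```

The same facts occupy three objects in the property graph and nine statements in the triple list, which is the kind of growth the paragraph above refers to.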
So, property graphs are becoming the popular option. But that means that SPARQL, developed to support RDF or triple stores, is not particularly well suited to property graphs: so what language do you use?
Generally speaking the answer is to use a procedural language such as Gremlin (which is a scripting language based on Groovy). However, this has all the drawbacks of being procedural and there are also portability issues associated with Groovy and Gremlin. As far as I know the only company that has a declarative language is Neo Technology, which has developed Cypher alongside its database optimiser.
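To show what "procedural" means here, the following Python sketch mimics the step-by-step style of a traversal chain such as `out('knows').out('knows')`. The `Traversal` class is hypothetical and is not the Gremlin API; it assumes a simple in-memory adjacency map keyed by vertex and edge label.

```python
from typing import Dict, List, Set


class Traversal:
    """Hypothetical fluent traversal in a Gremlin-like style (not the Gremlin API)."""

    def __init__(self, adjacency: Dict[str, Dict[str, List[str]]], start: Set[str]):
        self.adj = adjacency
        self.current = set(start)

    def out(self, label: str) -> "Traversal":
        # Follow outgoing edges with the given label from every current vertex.
        nxt: Set[str] = set()
        for v in self.current:
            nxt.update(self.adj.get(v, {}).get(label, []))
        return Traversal(self.adj, nxt)

    def to_set(self) -> Set[str]:
        return set(self.current)


adjacency = {
    "alice": {"knows": ["bob", "carol"]},
    "bob": {"knows": ["dave"]},
    "carol": {"knows": ["dave", "erin"]},
}

# Friends of friends of alice: the programmer fixes the order of the steps.
friends_of_friends = Traversal(adjacency, {"alice"}).out("knows").out("knows").to_set()
print(friends_of_friends)
```

The query author decides where to start and in which order to walk; in a declarative language the same question would be stated as a pattern, leaving the evaluation order to an optimiser.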
The problem, from my point of view, is that Cypher is proprietary. Neo Technology is considering making it open, and assures me that it re-evaluates this on a regular basis, but that's not going to happen anytime in the immediate future. While being the only vendor in this position may suit Neo, my opinion (and I know that Neo disagrees with me on this: its view is that it doesn't want the language bogged down in standards discussions at this stage) is that it would serve the market well if Cypher were made open and more widely available sooner rather than later. Neo4j would still have the advantage of a database optimiser, but I think that the general availability of a declarative language would help to drive the market.
Posted: 20th February 2014 | By Bob DuCharme :
Nice job, but this is misleading:
>The difference between a property graph and a triple store
>is that in a property graph the edges and nodes of the graph
>may have values associated with them.
In RDF (and therefore in triplestores) properties are resources represented by URIs, just like the things that they connect, so you can associate all the data you want with them, just like with property graphs. For example, in the following triples (assume that prefixes have been properly declared) I have associated both standard and custom properties with the x:locatedIn property, including one that will let me infer what building x:chair1 is in even though there is no specific statement about that:
x:chair1 x:locatedIn x:rm253 .
x:rm253 x:locatedIn x:352MainSt .
x:352MainSt rdf:type x:Building .
x:locatedIn rdfs:label "located in" .
x:locatedIn x:startDate "2012-10-12"^^xsd:date .
x:locatedIn rdf:type owl:TransitiveProperty .
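The inference mentioned in this comment can be sketched in Python with naive forward chaining of the single owl:TransitiveProperty rule over the triples above. This is an illustrative assumption, not how any particular reasoner works; real stores would use an OWL reasoner or, at query time, a SPARQL 1.1 property path such as `x:locatedIn+`. The typed date literal is simplified to a plain string.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

triples: Set[Triple] = {
    ("x:chair1", "x:locatedIn", "x:rm253"),
    ("x:rm253", "x:locatedIn", "x:352MainSt"),
    ("x:352MainSt", "rdf:type", "x:Building"),
    ("x:locatedIn", "rdfs:label", "located in"),
    ("x:locatedIn", "x:startDate", "2012-10-12"),  # xsd:date typing omitted
    ("x:locatedIn", "rdf:type", "owl:TransitiveProperty"),
}


def transitive_closure(facts: Set[Triple]) -> Set[Triple]:
    """Apply 'p transitive, (a p b), (b p c) => (a p c)' until nothing new appears."""
    inferred = set(facts)
    transitive = {s for (s, p, o) in inferred
                  if p == "rdf:type" and o == "owl:TransitiveProperty"}
    changed = True
    while changed:
        changed = False
        for (a, p, b) in list(inferred):
            if p not in transitive:
                continue
            for (b2, p2, c) in list(inferred):
                if p2 == p and b2 == b and (a, p, c) not in inferred:
                    inferred.add((a, p, c))
                    changed = True
    return inferred


closed = transitive_closure(triples)

# Which building is x:chair1 in? Nothing states it directly.
buildings = {o for (s, p, o) in closed
             if s == "x:chair1" and p == "x:locatedIn"
             and (o, "rdf:type", "x:Building") in closed}
print(buildings)
```

The data describing x:locatedIn itself (its label, start date and transitivity) is stored with exactly the same subject-predicate-object shape as the data it describes, which is the point being made about RDF.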
Posted: 20th February 2014 | By Bernard Angerer :
Thanks for the nice post. Have a look at VALID (www.sbsvalid.info), an in-memory solution which allows contextual, natural-language-like queries on Linked Data.
Posted: 25th February 2014 | By Philip Howard :
I have been doing some research on this following Bob's comment, and I think the following is a clearer statement of the difference between property graphs and triple stores: in a property graph the edges and nodes of the graph can each have properties, which from a developer's perspective are considered to be part of each object; whereas in RDF, values are resources and can be associated with other resources via a triple, using largely the same subject-predicate-object approach that one uses to associate disparate objects. The net effect of this is that property graphs are rather more intuitive (I know this is subjective) for non-specialists to understand.
Published by: IT Analysis Communications Ltd.