Tags

Design

Introduction

The first thing we need to do is understand what graph processing is and how it relates to SQL Server. In the second part of this series, we will talk about Entity Framework Core and how we can use it with SQL Server graphing. If you don’t know or care about EF Core though, don’t worry, that’s the last time we’ll mention it in here!

Here however, we will focus entirely on the database.

You may be thinking about bar and pie charts? Or now maybe just pies… Either way, you’d be wrong or maybe just really hungry. In SQL Server 2017 (which of course means also in Azure SQL by default) Microsoft introduced graphing to the database engine, but what is it? From the Microsoft documentation:

A graph database is a collection of nodes (or vertices) and edges (or relationships).

I like the term “relationships” in this context as it describes quite succinctly what is going on. For example, let’s say you have a database table containing People, the relationship here could be “is a friend of” that connects 2 people in the table.

There’s nothing this feature brings that SQL Server couldn’t do before, but it does bring some important benefits:

  1. Traversing the graph makes queries much easier and readable. Previously you would need to laboriously JOIN tables together.
  2. Relationships don’t require primary keys and foreign keys — all this is handle for you.
  3. Graph queries can perform better because the database engine can apply certain optimisations.

The Tables

There’s a few minor differences with regards to the tables. Other than these, the tables basically behave exactly as you would expect and you can run all your normal CRUD operations on them.

Creating Tables

First, we need to tell SQL Server that the tables are nodes or edges, and that is as simple as tagging “AS NODE” or “AS EDGE” on the end. So instead of this:

CREATE TABLE FooNode 
(
    Id INT NOT NULL IDENTITY PRIMARY KEY, 
    Bar NVARCHAR(50)
)

We have:

CREATE TABLE FooNode 
(
    Id INT NOT NULL IDENTITY PRIMARY KEY, 
    Bar NVARCHAR(50)
) AS NODE

A relationship (or edge) table is similar:

CREATE TABLE FooEdge 
(
    Bar NVARCHAR(50)
) AS EDGE

And in fact, an edge table doesn’t even need to have user-defined columns:

CREATE TABLE FooEdge AS EDGE

Relationships and Keys

The second difference is that we don’t have explicit relationships between nodes and edges; there are no foreign keys! You can link any two nodes together with any edge you like. So if you have node tables for people and locations and edge tables for likes and owns, it is now trivial to say “people like locations”, “people like people”, “people own locations” and even “people own people” (though I highly recommend you stay well away from that last one!)

So how does SQL Server know how to relate rows together? Well for that we have some psuedocolumns

Psuedo-what-now?

Despite my editor wanting to auto-correct psuedocolumn to pseudocelli (which apparently is a pore or an eye on some sort of primitive insect… definitely not one for the trypophobics out there) it is a real thing. These are additional columns (a bit like calculated columns) that are created for you and that you cannot edit. For example, taking our FooNode table from earlier, let’s run a SELECT * across it:

Notice we have a new $node_id column, the dollar sign prefix is what identifies it as a psuedocolumn. You can effectively forget about the seemingly random set of characters that follows the column name. You can select that column manually with:

SELECT $node_id FROM FooNode

As for the edge tables, there’s a couple more psuedocolumns. In addition to $node_id we also have $from_id and $to_id:

These should start to give you an idea how SQL Server is constructing the graph and the relations between nodes.

Creating Relationships

This is simply a matter of inserting the $node_id of two nodes into an edge table. For example:

INSERT INTO FooEdge
($from_id, $to_id, Bar)
VALUES 
(
    '{"type":"node","schema":"dbo","table":"FooNode","id":0}', 
    '{"type":"node","schema":"dbo","table":"FooNode","id":1}',
    'Baz'
)

In a real situation you wouldn’t have the $node_id values specified this way, you would almost certainly do something like this:

INSERT INTO FooEdge 
($from_id, $to_id, Bar) 
VALUES 
(
    (SELECT $node_id FROM FooNode WHERE Id = 1),
    (SELECT $node_id FROM FooNode WHERE Id = 2),
    'Baz'
)

And that’s it! Pick any nodes you like and relate them together.

Querying the Graph

The real power of SQL Server graphing is getting data out of the system. Imagine you were doing this in a classic SQL relational way with foreign keys and joins. I’m sure you’ve all felt the same pain as me when trying to remember the names of the columns (“Is it ‘ID’ or ‘ProductID’?”)

Well now we have the MATCH clause. Given our tables above we can query it like this:

SELECT
    SourceFoo.Id, 
    DetinationFoo.Id 
FROM 
    FooNode AS SourceFoo, 
    FooEdge, 
    FooNode AS DestinationFoo 
WHERE MATCH(SourceFoo-(FooEdge)->DestinationFoo)

That’s it. It might look a bit daunting at first, but when you look closer, all you’re doing is specifying 3 things. A node, a relationship and another node.

You can go one step further too and traverse the graph over a second hop:

SELECT 
    SourceFoo.Id,
    DetinationFoo.Id 
FROM 
    FooNode AS SourceFoo, 
    FooEdge AS FirstEdge, 
    FooNode AS IntermediateFoo, 
    FooEdge AS SecondEdge, 
    FooNode AS DetinationFoo 
WHERE MATCH(SourceFoo-(FirstEdge)->DetinationFoo-(SecondEdge)->DetinationFoo)

Now imagine how that would look as a series of JOINs, oh… the horror!

Worked Example

The Setup

OK, so the above is all a bit dry, how about we do something a little more substantial and tangible. Let’s say we want to create a social media platform that lets people make friends and rate locations around your city. First things first… creating the nodes and for this all we need are people and locations.

CREATE TABLE People 
(
    Id INT NOT NULL IDENTITY PRIMARY KEY, 
    FirstName NVARCHAR(50) NOT NULL, 
    LastName NVARCHAR(50) NOT NULL
) AS NODE

CREATE TABLE Locations
(
    Id INT NOT NULL IDENTITY PRIMARY KEY,
    [Name] NVARCHAR(100) NOT NULL
) AS NODE

Now the edges, our relationships are simply friends and ratings.

CREATE TABLE Ratings
(
    Score DECIMAL NOT NULL
) AS EDGE

CREATE TABLE Friends
(
    ConnectedOn DATETIME NOT NULL
) AS EDGE

And since empty tables are useless without data, let us seed some in there.

INSERT INTO People 
(FirstName, LastName) 
VALUES
    ('Alice', 'Armstrong'),
    ('Bob', 'Barnett'),
    ('Charlene', 'Cox'),
    ('Doreen', 'Davies')
    
INSERT INTO Locations 
([Name]) 
VALUES
    ('Tyne Bar'),
    ('Tilleys'),
    ('Gotham Town')

Now since we’re being friendly, we should make sure our fake people are also friendly. To do this we are going to add in some relationships by inserting rows into our edge tables. In our case, Alice is friends with everyone we will link here with everyone else using the $node_id values from the node tables, these are effectively our keys.

INSERT INTO Friends VALUES
    ((SELECT $node_id FROM People WHERE FirstName = 'Alice'),
     (SELECT $node_id FROM People WHERE FirstName = 'Charlene'),
     GETDATE()),
    ((SELECT $node_id FROM People WHERE FirstName = 'Charlene'), --Charlene is also friends with Alice!
     (SELECT $node_id FROM People WHERE FirstName = 'Alice'),
     GETDATE()),
    ((SELECT $node_id FROM People WHERE FirstName = 'Alice'),
     (SELECT $node_id FROM People WHERE FirstName = 'Bob'),
     GETDATE()),
    ((SELECT $node_id FROM People WHERE FirstName = 'Charlene'),
     (SELECT $node_id FROM People WHERE FirstName = 'Doreen'),
     GETDATE())

Since everyone loves the Tyne Bar, everyone has visited and given it a rating of 5 out of 5

INSERT INTO Ratings VALUES
    ((SELECT $node_id FROM People WHERE FirstName = 'Alice'),
     (SELECT $node_id FROM Locations WHERE [Name] = 'Tyne Bar'),
     5),
    ((SELECT $node_id FROM People WHERE FirstName = 'Bob'),
     (SELECT $node_id FROM Locations WHERE [Name] = 'Tyne Bar'),
     5),
    ((SELECT $node_id FROM People WHERE FirstName = 'Charlene'),
     (SELECT $node_id FROM Locations WHERE [Name] = 'Tyne Bar'),
     5)

Most people are sensible, so they don’t particularly like Gotham Town though. Well, apart from Bob who, let’s face it, is a bit of an animal.

INSERT INTO Ratings VALUES
    ((SELECT $node_id FROM People WHERE FirstName = 'Alice'),
     (SELECT $node_id FROM Locations WHERE [Name] = 'Gotham Town'),
     1),
    ((SELECT $node_id FROM People WHERE FirstName = 'Bob'),
     (SELECT $node_id FROM Locations WHERE [Name] = 'Gotham Town'),
     4),
    ((SELECT $node_id FROM People WHERE FirstName = 'Charlene'),
     (SELECT $node_id FROM Locations WHERE [Name] = 'Gotham Town'),
     1)

Oh, and Charlene is a big drinker who also love Tilleys.

INSERT INTO Ratings VALUES
    ((SELECT $node_id FROM People WHERE FirstName = 'Charlene'),
     (SELECT $node_id FROM Locations WHERE [Name] = 'Tilleys'),
     5)

Right, now what to do with all this lovely data? All the above is just boring data that we could have created with primary keys and foreign keys, the real power of SQL Server graphing is in the querying. Let’s ask some interesting questions of our database.

The Queries

First, we will start with something simple, get the average rating for all locations and order it best to worse

SELECT
    Locations.[Name],
    COUNT(*) AS [Number Of Ratings],
    AVG(Ratings.Score) AS [Average Rating]
FROM
    People, Ratings, Locations
WHERE 
    MATCH(People-(Ratings)->Locations)
GROUP BY 
    Locations.[Name]
ORDER BY 
    AVG(Ratings.Score) DESC,
    COUNT(*) DESC

Which gives us a nice list of places we might like to visit.

OK, something a little more complex, how about we get a list of all locations that were rated 5 stars by Alice’s friends?

SELECT
    Locations.[Name]
FROM 
    People AS People1, 
    Friends, 
    People AS People2, 
    Ratings, 
    Locations
WHERE 
    MATCH(People1-(Friends)->People2-(Ratings)->Locations)
    AND People1.FirstName = 'Alice'
    AND Ratings.Score = 5
GROUP BY
    Locations.[Name]

Note the double hop over the graph here from people to people to locations. Another point is that we need to alias the People table as we are using it twice. This gives us two locations

It’s important to note that relationships are directional. So in this context, think of the friendship relationship more like the way Twitter works rather than Facebook, in that you can follow someone but they don’t necessarily follow you back. So if we naively run this query to get all the people who rated Tilleys, it will return an empty result set.

SELECT 
    People.FirstName
FROM 
    Locations, Ratings, People
WHERE
    MATCH(Locations-(Ratings)->People)
    AND Locations.[Name] = 'Tilleys'

So, how would we change the friendship to be more Facebook style? There’s a couple of ways.

  1. Add the reciprocal relationship, so Alice is friends with Bob and Bob is friends with Alice.
  2. When querying, make sure you always check both directions.

For the second option, a query to find out all of Alice’s friends that are also friends with her might look something like this…

SELECT
    People2.FirstName
FROM
    People AS People1,
    Friends AS Friends1,
    People AS People2,
    Friends AS Friends2,
    People AS People3
WHERE 
    MATCH(People1-(Friends1)->People2-(Friends2)->People3)
    AND People1.FirstName = 'Alice'
    AND People3.FirstName = 'Alice'

Or another variant:

SELECT
    People1.FirstName
FROM
    People AS People1,
    Friends AS Friends1,
    People AS People2,
    Friends AS Friends2,
    People AS People3
WHERE
    MATCH(People1-(Friends1)->People2-(Friends2)->People3)
    AND People1.Id = People3.Id
    AND People2.FirstName = 'Alice'

Other Things to Think About

It’s not all fluffy bunnies though, there are some downsides.

Keys and Indexes

We can (and should) still be creating primary keys and indexes on both NODEand EDGE tables and you can also include the psuedocolumns. Remember that they’re still just tables and there’s no reason people can’t treat them that way.

Weird Relationships

With our example above, it’s possible to create a relationship that says a location can be friends with a person. That really doesn’t make sense unless we’re talking about sentient buildings, and without super advanced AI or haunted houses, we don’t really want that to be allowed. So how would you stop it? As far as I can tell, there isn’t a good way to do this, which leaves us two options:

  1. Create INSERT triggers on the edge tables to validate your from and totables. I’m not a big fan of these generally, there’s usually a better way.
  2. Just allow the relationships to be created. If you’re not going to write queries to retrieve these values, then there’s an argument to have that they don’t really matter.

Which one you choose depends on your business case, and maybe how much you trust your developers to not create silly relationships.

Coming Next?

As you might imagine Microsoft are not sitting on their hands and are working on improvements to this feature. There’s a few key changes in SQL Server 2019 but the main one is almost certainly the ability to add edge constraints. This feature will fix the issue above and prevent queries being able to create those weird relationships.

Conclusion

I hope this has served as a handy introduction into SQL Server graph processing and encourages you to use it in your application. If you do then please get in touch and let me know, I’m keen on learning how people are using this in a production capacity.

My reasons for writing this were two-fold. Firstly, I wanted to learn this myself, solidify my own knowledge and pass that on to others. I don’t think there can be too many tutorials online.

Secondly, this originally came from a question posted on Stack Overflow titled Querying SQL Server 2017 graph tables from Entity Framework. My initial reaction to that was “Hell no, stay away from combining those two things” but after thinking about it a little deeper, I wondered if I could make a pull request to the Entity Framework Core source code to support this feature. Since then I have submitted a pull request but it will be a while before it gets reviewed by the EF team as they are currently planning for v2.2 and after that comes v3. I don’t know if my code will get merged but I wanted to tell the story of how I went through the open source journey and hopefully encourage others to try it out too. After all, isn’t that one of the best things about open source?

Thanks to David Glass for his contribution to our blog this week!

Who Are Ronald James?

We are a leading niche digital & tech recruitment specialist for the North East of England. We Specialise in the acquisition of high-performing technology talent across a variety of IT sectors including Digital & Technology Software Development.

Our ultimate goal is to make a positive impact on every client and candidate we serve - from the initial call and introduction, right up to the final delivery, we want our clients and candidates to feel they have had a beneficial and productive experience.

Contact our Team

If you’re looking to start your journey in sourcing talent or find your dream job, you’ll need a passionate, motivated team of experts to guide you. Check out our Jobs page for open vacancies. If interested, contact us or call 0191 620 0123 for a quick chat with our team.

Let's be Friends!

Follow us on our blog, FacebookLinkedInTwitter or Instagram to follow industry news, events, success stories and new blogs releases.

 

 

Back to Blog

</Follow Us>