Local Government Shared Services – wrestling with data, PostGIS and stuff

This making maps business is certainly not as easy as it seems! I have just burnt an inordinate amount of time trying to make a half decent map and publish it to the web, but I am learning stuff on the way even if I did disappear up some blind alleys.

2 years ago, I wrote a piece “Could you make a better map than this?” which questioned the way in which the Local Government Association were presenting shared services partnerships on a map. I challenged people to make a better map with the offer of a small prize but no one took up my offer. A few weeks ago, the LGA published an updated version of the map and also provided a link to download the data, well the map is pretty similar to the original one that I criticised, have a look:

Note (2/7/17): The map was updated in June 2017 with new data, the problems are still there. Apparently no one at the LGA read my post 🙁 The remainder of this post is about wrestling with the 2016 data not the most current data, I can’t think of a reason to redo the exercise.

For me this map is cluttered with too many markers and colours and numbers on the markers that are confusing. So I thought this time I will try to make something, hopefully better. I am not sure that I have succeeded as representing this data was more difficult than I imagined and I think I have stretched the limits of my technical skills and of the QGIS2Web plugin. So here is a summary of what I tried, maybe it will help someone else.

Wrestling with the data

The data is in a csv file with almost 1700 rows, 1 row per organisation per service/partnership (so if 20 organisations are in a partnership there are 20 rows with pretty much identical data). Some simple metadata about the columns and their significance would have been helpful (e.g. there was a count field that didn’t seem to make any sense to me).

Lesson 1 when you are cleaning up data don’t delete stuff that you think you don’t need too early, you might discover why it was useful later on and have to restart!

I spent a while looking at the data and decided that it might be more effective to represent each service as a multi-polygon covering the Local Authorities. So the first task was to give each partnership a unique ID, how do you do that in Excel? A bit of googling and I cam up with this simple answer: Insert a column (A in this case) for the Service ID then sort the data by service name (column B), set A1=1 then use this formula =if(B2=B1, A1, A1+1) in A2 and copy down the column. Now you have a series of Service IDs, copy and paste as values so that they are fixed and don’t recalculate as you resort or delete some of the data.

SQL is hard

Teaching yourself SQL involves a heck of a lot of trial and error, if I had’t had a lot of help from my friend Aileen Heal (a database guru IMHO) I would never have got a result. Here is the sequence of stuff that I did:

I imported the data into PostGIS and filtered out the services that had shut down or where there was no information to show that it was live to create a table of live services (ca 1100 rows)
The data had an ONS Code for almost every row, so I grabbed the latest version of BoundaryLine from the OS OpenData site and loaded the Counties and Districts/Boroughs/Unitaries into PostGIS
Then I appended the Counties data to the Districts so that I had one table with all of the boundaries and ONS codes
Then I added a geometry column to the table of live services and updated the column with the geometry from the combined boundaries table using the ONS Code (setting the coordinate system and adding a spatial index)
Then I created a table that created a multi polygon for each row within a group of a service ID. I wouldn’t have had a clue how to do this and couldn’t find an example on line so it was good that I could “phone a friend”.
Then I could join the attribute data from the live table to the union table that had the multi polygons to create a view with 256 live services displayed as multi polygons.
The final step was to work out how to split these 256 rows into a manageable number of categories to portray on the map. The original data had 27 different categories which was way too much (see the massive rainbow legend at the bottom of the LGA map above). I worked out how to run a Select, Count, Group and Order query to list all of the categories and the number of rows in each, pretty pleased with that (google is fantastic for working this stuff out). I ended up creating 9 higher level groupings that made some sense to me (e.g combining “Shared Management” with “Shared Leadership”) using select queries.

Job done! Well not first time, there were several iterations before I got it all right 🙁

Lesson 2 save your SQL queries so that you can reuse them if you have to start again

The QGIS bit

This bit should have been really easy but of course it took at leat 6 or 8 tries before I got it all to work.

For some reason my views weren’t opening in the load PostGIS table dialogue (now fixed needed to select a feature ID) so I used the DB Manager tool and “Add to Canvas” which worked fine.

The next step was to do a bit of styling of the layers in QGIS, get them in a sensible order, set some transparency so that you could partially see the layers where they overlap, it would be nice if you could set transparency on a group of layers in one go in QGIS but if there is a way I couldn’t find it. I added a project title (which would appear in my web map) and saved the whole thing as a QGIS project so that I could get back to it.

QGIS2Web

Then it was over to QGIS2Web to build my web map. I have done this before so I sort of knew what I was doing but the volume of data made the whole process rather slow and got me wondering whether this was such a good idea. QGIS2Web has a feature to thin down the geometry and I set that to the highest level, played around with the zoom ranges (there’s no point in being able to zoom down to street level when the data is at local authority level), added a legend and a search box and I was almost there. Last task was to edit the javascript to limit the size of the popup and add a scroll bar if the attribute data is too long for the window. After a quick test and tweak or too I was ready to publish the project to my web server, I zipped it up and uploaded it – aargh it’s nearly 50MB! The moment of truth came loading from the interwebby place – it loaded about as fast as a tortoise doing a 100m sprint.

Realisation, I need to simplify the geometry in PostGIS and then go through the process again 🙁

Back to PostGIS

Another call out to my friend Aileen who suggested a neat way to solve this little problem.

Back up the table with the union geometries just in case
Add a column for simplified geometry and create an index
Run the simplify geometry function to update this column with a simplified geometry I used a setting of 10 (no idea what level of simplification it implies but I had used it before on another data set with good results)
Recreate the master view to use the simplified geometry but aliasing it as “geom” so that all of the dependent views would still work (clever stuff from Aileen)
Rerun a load of queries that fortunately I had saved and “Bob’s your uncle”

Lesson 3 if you are using the QGIS2Web plugin thin your data down unless you really need high levels of precision. PostGIS does that better than QGIS or QGIS2Web

The finish line

Reload the project and zippo you can see how much faster the layers load. Into QGIS2Web and a bit more tweakery (switch off the simplification here as it has already been done in PostGIS). Test and re-upload to my web server – now it is only 9MB, that should speed my tortoise up to harelike standards.

And here is the result

It’s not perfect by any means

The QGIS2Web plugin doesn’t provide the ability to drill down through layers or on overlapping polygons, I need to do some fiddling to see if I can come up with a fix for that.
It looks quite pretty and is less visually cluttered than the original
The polygons provide a clearer view of the extent of a service/partnership particularly if you switch off some of the layers. Maybe it would be better to start with the layers switched off?

Verdict – could be better but I have probably gone as far as I can with the tools I have available. The alternatives would either be to use one of the services like ArcGIS Online (I know, but it is pretty damn slick and easy) or CartoDB or think about installing GeoServer on my web server and learning how to use that which is just a leap to far for me at the moment.

Think you can do better? I’d love to see what an “expert” could do.

Massive thanks to Aileen for all her help.