Eating Humble Pie



Last week I wrote a post about making a map of the locations of US steel plants and the electoral results in their local communities. I concluded with this brief summary:

“it’s a first draft and needs a good bit of work, but you can certainly see a pattern – most of the steel plants (and all of the large ones) are in Republican Districts – hardly a big surprise but there you go. I guess big tariffs will play well with hardcore Trump supporters.”

Unfortunately I didn’t follow the advice that I gave to other mappers in my “FAKEMAPS, very dishonest” talk, namely:

  1. Dig deep into your data, is there really a spatial pattern in it? What is the best way to represent it?
  2. Try different settings and styles (and even projections) to see the difference in output before starting to refine your map
  3. Is your map showing the message that you want to get over? If so be very cautious that your own bias combined with settings isn’t producing misleading results
  4. Less is more, don’t try to map too many variables
  5. Pay attention to detail, it makes an enormous difference. Don’t make your map at the last minute
  6. Get some feedback from one or two colleagues (even an expert) before you publish

I messed up badly on this map, particularly with regard to points 3 and 6 🙁

Fortunately Jonathan picked me up on my lack of rigour and commented:

“Are they? When I zoom in to many the plants appear to be in small, blue areas – Democratic; e.g. Chicago, Detroit, Cleveland, Pittsburgh. I attribute that to (an assumed) correlation of steel manufacturing and urban areas: urban => generally more Democratic; rural => more Republican.

Perhaps you have generalized from the initial, small scale, appearance? If you made your input data (lat/lon of steel plant; district, party of district) your conclusion might be justified but I’m not getting that from the map.

One might draw some conclusions about those very large capacity steel plants in Mexico and Canada; but that would have a different basis than political favoritism.”

So after acknowledging my error I decided to go back to the raw data and see if I could do a better job second time round.

First up you might want to go back to the original map to remind yourself what it looked like.

I decided to get rid of the Congressional District boundaries and the electoral results choropleth, as they made the map visually cluttered and confusing, particularly at small scales. Instead, I decided to identify the electoral result of the congressional district containing each steel plant in the US and then map the plants as points, using graduated symbols (sized by annual capacity) coloured by political party. This would give me a relatively clean and uncluttered map of steel plants, with a low-impact base map to provide some context. Here are the main steps that I worked through (without all of the dead ends, false starts and variations in settings):

  • I used the “Vector | Data Management Tools | Join Attributes by Location” tool to pick up the data from the congressional boundaries and election results and add the attributes to the US Steel Plants points. Remember to remove all of the data summary options (Sum, Count etc) from the dialogue or you will get loads of stats rather than attributes. I couldn’t find a way to choose which attributes to combine with the point data so I ended up with them all and filtered them out in a subsequent step.
  • I saved the result as a GeoPackage rather than a shapefile because, well, just because.
  • Because I only had city-level location data in my raw data, I knew that I had several cities with overlapping points representing multiple plants in the same location. I wanted to distribute these points around the central geocode to aid visibility, labelling etc. Not so easy. There is a rendering style in QGIS called “Point Displacement” which does just what it says on the tin (after a lot of tinkering with the settings), but it only affects the visualisation in QGIS and is lost when you export the data to a file format or to a web map (either Carto or QGIS2Web), so it wasn’t much use as I wanted to share the results on the web.
  • The only solution that I could find (after asking a few people at FOSS4GUK last week) was to manually move the coincident points. Fortunately I only had about 160 points in total and only 20-25 needed manually moving; however, it would be great to find a way to do this for larger datasets. Tip: work on a copy of your data so that you can go back to the original if you don’t like the results after you have moved points around.
  • At this point you should use the “Layer Properties | Fields” settings to set the fields you don’t want to display in the info tool to “Hidden”. If you wait until you have split the data set into 3 subsets you will have to run through this 3 times 🙁
  • Next up I split the data set into 3 subsets – Republican, Democratic and NAFTA (Canada and Mexico). This would make it easier to apply colour styling and graduated symbol sizes than if they were in one data set (no doubt a QGIS superstar would be able to do the whole task in one go). You can do this by writing a simple select expression within the Attribute Table panel, something along the lines of "2016 Elect" = 'Democratic'. I then saved my selection as a new GeoPackage file and repeated for the other 2 subsets.
  • In the “Layer Style” panel I then used the “Graduated” style on the annual capacity field, with the “Size” method, to get graduated symbols. Use “Pretty breaks” and tweak them to give you a nice classification that creates a legend with 4 or 5 classes; you can also manually adjust the size of the symbols for each class if you wish. Finally, choose a colour for the symbols (you can do this for all of the symbols in a layer by changing the colour for the “Simple marker” at the top of the dialogue rather than editing each sized symbol in the classification).
  • You should now have one of your 3 subsets nicely coloured with sized symbols in your QGIS legend. Right click on the layer name and select “Styles | Copy”. You can now select each of the other two subsets and go “Styles | Paste” to paste the size (and colour) settings to the other layers; you then just need to change the colour to finish the job off.
  • Zoom to the extent of the data, save your project and you are ready to publish your data set to the web.
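
For larger datasets, the manual step of moving coincident points could probably be scripted. What follows is only a rough sketch in plain Python, not what was done for this map (those points were moved by hand). It assumes the plants are available as simple (name, lon, lat) tuples; the function name `spread_coincident_points`, the `radius` parameter and the sample plants are all made up for illustration. Points sharing exactly the same coordinates are spread evenly around a small ring centred on the shared geocode.

```python
import math
from collections import defaultdict


def spread_coincident_points(points, radius=0.05):
    """Return a copy of `points` with coincident points moved onto a ring.

    `points` is a list of (name, lon, lat) tuples. Points that share exactly
    the same (lon, lat) are placed evenly on a circle of `radius` degrees
    around the shared location; unique points are returned unchanged.
    """
    # Group point indices by their exact coordinate pair.
    groups = defaultdict(list)
    for index, (_, lon, lat) in enumerate(points):
        groups[(lon, lat)].append(index)

    result = list(points)
    for (lon, lat), indices in groups.items():
        if len(indices) < 2:
            continue  # nothing overlaps here, leave the point alone
        for position, index in enumerate(indices):
            angle = 2 * math.pi * position / len(indices)
            name = points[index][0]
            result[index] = (
                name,
                lon + radius * math.cos(angle),
                lat + radius * math.sin(angle),
            )
    return result


# Hypothetical plants (illustrative names and coordinates, not the real data):
# two share one city-level geocode, one stands alone.
plants = [
    ("Plant A", -87.63, 41.88),
    ("Plant B", -87.63, 41.88),
    ("Plant C", -83.05, 42.33),
]
moved = spread_coincident_points(plants, radius=0.05)
for name, lon, lat in moved:
    print(f"{name}: {lon:.3f}, {lat:.3f}")
```

The offset is applied in raw degrees, which is a simplification: a degree of longitude covers less ground the further you are from the equator, so the ring will look slightly squashed on the map. As with the manual approach, run something like this on a copy of the data so the original geocodes are preserved.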

I bumped into Tom Chadwin at FOSS4GUK last week and we chatted briefly about which QGIS styling options are supported in QGIS2Web, so I thought that I would try publishing the QGIS project using the QGIS2Web plugin rather than uploading the 3 new subsets of data to Carto and then having to recreate all of the styling manually within the Carto interface (which is very powerful but quite time consuming). It took me a few iterations to get the map as close to how I wanted it as possible, but overall it took less than half an hour to get the web map done. I can’t praise this plugin highly enough: it makes simple web publishing a doddle. You don’t need a map server, a database, an account with Carto or ArcGIS or anything else – just have a file server on the web and you are up and running. You can see the results here:

I’m not sure about the overlapping symbols which still look messy even though I moved the coincident points and tweaked transparency and blending. Despite the messiness, the overlapping points do enable you to get a quick sense of both size and party. I also tried using the clustering option to simplify the data when zoomed out:

You need to zoom in a bit and pan around to get the picture. It’s more elegant at first glance, but at country level a lot of the trends in the data are hidden. Overall I think I prefer the unclustered version.

I also discovered the statistics tool in QGIS “View | Panels | Statistics” which produces a great array of statistics for a layer or selection. The summary of those stats is:

[table id=1 /]

Not quite the results that I had originally expected! Capacity is roughly evenly split between Democratic and Republican districts, with the larger plants being in the Democratic districts. As far as imports from Canada and Mexico go, the picture is also quite balanced: most of the northern plants (presumably impacted most by Canadian imports) are in Democratic districts, while most of the southern plants (presumably impacted most by Mexican imports) are in Republican districts. Mea culpa, the imposition of steel tariffs does NOT appear skewed towards Donald Trump’s core supporters.
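
If you would rather check this kind of summary outside QGIS, a per-party breakdown of capacity can be sketched in a few lines of plain Python. This is only an illustration: the function name `summarise_by_party`, the record layout (`party`, `capacity`) and the sample values are placeholders invented for the example, not the real plant data or the figures from the Statistics panel.

```python
from collections import defaultdict


def summarise_by_party(plants):
    """Return {party: {"count", "total", "mean", "max"}} for plant capacity."""
    # Collect the capacity values for each party label.
    capacities = defaultdict(list)
    for plant in plants:
        capacities[plant["party"]].append(plant["capacity"])

    return {
        party: {
            "count": len(values),
            "total": sum(values),
            "mean": sum(values) / len(values),
            "max": max(values),
        }
        for party, values in capacities.items()
    }


# Placeholder records, NOT the real data: names, parties and capacities
# are invented purely to show the shape of the calculation.
sample = [
    {"name": "Plant A", "party": "Democratic", "capacity": 3.0},
    {"name": "Plant B", "party": "Democratic", "capacity": 1.0},
    {"name": "Plant C", "party": "Republican", "capacity": 2.0},
    {"name": "Plant D", "party": "Republican", "capacity": 2.0},
]
summary = summarise_by_party(sample)
for party, stats in sorted(summary.items()):
    print(party, stats)
```

Comparing the total and the maximum side by side is the useful part: the total shows how capacity is shared between the groups, while the maximum shows where the largest plants sit, which is exactly the distinction the eyeballed map got wrong.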

I’ll take more notice of my own advice to others in the future particularly:

  • Is your map showing the message that you want to get over? If so be very cautious that your own bias combined with settings isn’t producing misleading results
  • Get some feedback from one or two colleagues (even an expert) before you publish

Hey, better you try stuff and learn than just sit on the sidelines.