On Friday the ODI hosted an Open Addresses Symposium in London (thanks to the nice people at Arup for their great facilities). The event attracted a pretty large audience of people interested in Open Addresses (remember that idea?), including the ‘usual faces’ and veterans of the Address Wars as well as a wide range of potential users. As one friend commented near the beginning, it is something of a disgrace that, after over 15 years, we are still arguing about multiple address databases with different owners and content, and that a single complete and authoritative national dataset doesn’t exist, let alone an open one!
Jeni Tennison, the technical director of the ODI, must be complimented for initiating this event and driving the Open Address agenda forward. The ODI have secured government funding for the first prototype phase of an Open Address service (I do hope they call it an Open Address File, just so that we have a snappy acronym).
I got to talk to quite a few people as I was chairing one of the ‘requirements’ groups in the morning session. The question I asked people was “How good is good enough?”, and the responses were illuminating, to say the least. Some people continue to question the need for an OAF: surely the answer is to fix the existing products? It is difficult to disagree, except that it hasn’t happened in a decade or more. Some think that a half-decent OAF will act as the spur to get OS and RM to work out a solution with the Cabinet Office; I am not sure. Outside of the predictable “why are you bothering?” folk, there was a large section of the audience who had varied and interesting use cases for an OAF and who came up with some surprising answers to “How good is good enough?”
- Those in the public sector who have access to ‘official’ address data through a central purchasing agreement, but who still struggle to publish their own data as open data because of derived-data constraints, are keen to have an open dataset; some linkage to the closed datasets would be needed, though (an open UPRN?).
- Many who can’t afford to license the existing datasets, or who are simply flummoxed/stymied by the licensing restrictions, said that anything approaching 90% complete would be ‘good enough’.
- Not everyone needed coordinates in a first release; a good address list (say 90% complete) on its own would be useful to them.
- One person who wanted addresses for research said that even as few as 1m addresses, free of cost and restriction, would be useful.
- Someone from the demographics community said that anything less than the current level of completeness would be unusable for them, as it might introduce sampling bias. I am not sure how that works: if you are picking a few thousand addresses at random from 25m+, would it really matter if you selected from 23m instead?
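As an aside on that last point, a toy simulation (entirely my own illustration, nothing presented at the symposium) suggests the answer depends on *which* addresses are missing rather than how many. If the missing 2m are spread at random, a few-thousand-address sample is fine; if they are concentrated in, say, rural areas, the sample skews:

```python
# Toy simulation: sampling from an incomplete address list is harmless if
# addresses are missing at random, but biased if the missing ones are
# concentrated in one group (here, 'rural'). All numbers are illustrative.
import random

random.seed(42)

# Stand-in population: 25,000 addresses (in the spirit of 25m), 20% rural.
population = ["rural" if random.random() < 0.2 else "urban" for _ in range(25_000)]

def rural_share(addresses):
    return sum(1 for a in addresses if a == "rural") / len(addresses)

# Case 1: ~8% of addresses missing completely at random (25m -> 23m in spirit).
missing_at_random = [a for a in population if random.random() > 0.08]

# Case 2: rural addresses are 50% likely to be missing from the list.
skewed_missing = [a for a in population if not (a == "rural" and random.random() < 0.5)]

print(round(rural_share(random.sample(missing_at_random, 2_000)), 2))  # stays near 0.20
print(round(rural_share(random.sample(skewed_missing, 2_000)), 2))     # drops well below 0.20
```

So both positions hold: my intuition is right for random gaps, and the demographics concern is right if an OAF's gaps are systematic.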
There was quite a lot of disagreement about what an ‘address’ was: one person wanted the details of outhouses and barns to be included, but most people agreed that an address was probably the generally understood description of the places where we live, work, shop and visit. I expect the ODI team will need to come back to this before going much further with the project.
In the afternoon there was an excellent presentation on the arcane legal issues in building an OAF from apparently open data, in case that data had in some way been validated against PAF. One possible opinion is that RM doesn’t have a ‘database right’ in postcodes because it created them rather than collected them. In the pub afterwards there was an interesting discussion (surprising, that) about sourcing street names and numbers from their originators in local government to circumvent any legal concerns. I know the ODI are exploring the legal position of an OAF very thoroughly; probably more sensible than the disruptive JFDI approach that I would encourage, but maybe not as much fun.
We then got into the nitty-gritty of open sources of address data, the techniques to build a dataset, and how we might crowdsource addresses using OpenStreetMap. If the core datasets, such as Land Registry ‘price paid’ data and Companies House data, are deemed free of IP contamination, and they can be combined with an open postcode file and some of the OS OpenData (streets, boundaries and gazetteer), we would be well on the way to a base dataset. Once the framework is in place, other open datasets can be sourced and munged to add a bit more detail, and then we will be down to the last 10 or 15% (my guess; there is no science behind it). That will be the difficult part, and it is also the bit where the crowd will be able to help.

Some very determined people may actually be able to manually survey and complete the missing parts, but I think it will be smart data people working with passive crowd usage of OAF who will gradually fill in the gaps. For example, if you search for an address that isn’t in OAF, that can be the trigger to create a candidate record; if that happens 3 times, you could be pretty sure that the record exists. If we get to the ‘good enough’ stage and then get widespread usage, I don’t think it will take all that long to get to ‘pretty damn good’.
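That passive-crowdsourcing idea can be sketched in a few lines. This is purely my own illustration, not anything the ODI has proposed: the class and the promotion threshold are invented names, and real address matching would need normalisation and de-duplication far beyond a lowercase comparison:

```python
# Sketch of passive crowdsourcing for an OAF: a failed lookup records a
# candidate address, and after 3 independent misses (the figure from the
# idea above) the candidate is promoted into the open file.
from collections import Counter

PROMOTION_THRESHOLD = 3  # "if that happens 3 times..."

class OpenAddressFile:
    def __init__(self, known_addresses):
        # Crude normalisation only; a real OAF would need much more.
        self.known = {a.lower() for a in known_addresses}
        self.candidates = Counter()

    def lookup(self, address):
        key = address.lower()
        if key in self.known:
            return True
        # Miss: count the candidate sighting; promote once seen often enough.
        self.candidates[key] += 1
        if self.candidates[key] >= PROMOTION_THRESHOLD:
            self.known.add(key)
            del self.candidates[key]
        return False

oaf = OpenAddressFile(["1 high street, anytown"])
for _ in range(3):
    oaf.lookup("2 High Street, Anytown")    # three misses...
print(oaf.lookup("2 High Street, Anytown"))  # ...and now the record exists: True
```

The threshold is doing the 'pretty sure' work: one miss might be a typo, but three independent searches for the same string are reasonable evidence that the address is real.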
So it looks like we will have a crowd of address products: the two incumbents and the shiny new open version. Do we need three? No; maybe we only need one, but that one needs to be open. Surely it is time to have a complete (well, as complete as possible) address dataset that is freely available for government, business, research and any other purpose that people can think of?
Maybe we should be saying ‘Two were companies, three will be the Crowd’. Watch out, there is an OAF on the way.