Hi, I am looking to compile a list of UK street names for an infographic project. Would it be possible to scrape the OSM for every street name, and output that data into a spreadsheet? If anyone out there can help with this I'd be very grateful. Thanks, Chris
asked 18 Apr '12, 11:26 chrishall
We reserve the word "scraping" for people who, to our dismay, write clumsy scripts that make tons of individual requests against our API or web site. Don't do that - we're an open data project and we make our data available for download! Grab a data extract for the UK e.g. from the Geofabrik download server, then use a program like Osmosis to filter out only highways:
From the resulting XML file, extract all names - easiest on Linux with something like
and you have your list. (If you prefer DBF files to XML, you could probably download the shp.zip file from the download server and simply open the roads.dbf file.) Caveats:
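For anyone who would rather do the filter-and-extract steps in one go and end up with something a spreadsheet can open, here is a minimal Python sketch. It assumes an uncompressed OSM XML extract, treats any way carrying a `highway` tag and a non-empty `name` tag as a street, and writes the distinct names to a one-column CSV. The function names and file names are made up for illustration, and this is not the command pipeline described above.

```python
import csv
import xml.etree.ElementTree as ET


def street_names(osm_xml_path):
    """Return the sorted, distinct names of all ways tagged highway=*."""
    names = set()
    # Stream the file instead of loading it whole; extracts can be large.
    context = ET.iterparse(osm_xml_path, events=("start", "end"))
    _event, root = next(context)  # the enclosing <osm> element
    for event, elem in context:
        # Only act once a top-level element has been read completely.
        if event != "end" or elem.tag not in ("node", "way", "relation"):
            continue
        if elem.tag == "way":
            tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
            # Assumption: "street" means a way with a highway tag and a name.
            if "highway" in tags and tags.get("name"):
                names.add(tags["name"])
        root.clear()  # drop finished elements so memory use stays modest
    return sorted(names)


def write_csv(names, csv_path):
    """Write one name per row, with a header, to a spreadsheet-friendly CSV."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name"])
        writer.writerows([n] for n in names)
```

Something like `write_csv(street_names("highways.osm"), "street_names.csv")` would then produce the list (both file names are placeholders). Note that collecting names into a set merges identically named streets from different towns into a single row, which may or may not be what an infographic needs.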
answered 18 Apr '12, 11:41 Frederik Ramm ♦
Frederik, Thanks for coming back to me. Apologies re. 'scraping', I'm not looking to inconvenience anyone! The second point you make is probably the most relevant - and thanks for bringing it to my attention. I'm not sure I know how to solve this myself - can you help, or recommend anyone who can? If it's time-consuming work I'm willing to pay for the research/make an appropriate donation. Many thanks, Chris
(18 Apr '12, 11:54)
chrishall
It seems like the data set mentioned by Richard and Ed would conveniently circumvent this problem!
(18 Apr '12, 11:59)
Frederik Ramm ♦
It might be better to start with something like OS OpenData, particularly the Locator dataset I think.
http://www.ordnancesurvey.co.uk/oswebsite/products/os-locator/index.html
As yet, OpenStreetMap does not have as comprehensive a coverage as the OS data.
answered 18 Apr '12, 11:44 EdLoach ♦
OpenStreetMap is arguably not the best data source for your application. You would be better served by using OS Locator, from the Ordnance Survey OpenData release, which has a better licence, a simpler file format, more consistent data, and is more complete.
answered 18 Apr '12, 11:44 Richard ♦