Hi, I am trying to load a planet file into Postgres on a very fast university-networked VM, and relation processing is extremely slow. In comparison, node and way processing seems very fast (compared to the OSM2PGSQL benchmarks). My current estimate for relation processing to finish is 20 days or more! I have tried about 20 different loads with different settings, but the relations are always slow.

Processing: Node(4512083k 5086.9k/s) Way(494141k 137.53k/s) Relation(40080 2.16/s)

Can anyone point to an obvious mistake or error, or provide some advice? I'm using the latest OSM2PGSQL build, 0.96.0 64-bit (I have tried 0.94, but this was even slower).

HARDWARE
COMMAND

osm2pgsql -v -l --unlogged --create --slim -C 30000 --number-processes 8 --flat-nodes /var/data_2/planet.nodes -S /usr/local/share/osm2pgsql/default.style --tablespace-slim-index index_space --tablespace-main-index index_space --hstore -d osm_planet -U osm planet-latest.osm.pbf

PostgreSQL Settings - Version 9.6.8 / PostGIS 2.3.3
asked 25 May '18, 09:52 mike_de_funk
a) Relation processing is slow because any associated geometries need to be built from the constituent relation members (just as way processing was slower than node processing).

b) Processing speeds up as it goes: the older relations tend to include more of the large ones, and the fast-to-process ones (turn restrictions etc.) tended to become popular later on (you should have seen a similar effect during way processing).

c) I wouldn't set --number-processes to 8 if you have 8 cores; you will have an additional database process per osm2pgsql thread, and you may be overwhelming available memory.

d) Check that you are not swapping/paging (being swapped out once is OK, being swapped back in is not) and that you are not limited by the SAN; these days fast imports tend to run directly to local SSDs.

answered 25 May '18, 10:02 SimonPoole ♦

Hi Simon, thanks for the reply.

a) I knew that relations would be slower, just not this slow. I did a full planet import over a year ago on much less powerful infrastructure and was getting ~100/s, which led to a successful import in just under 3 days.

c) The machine actually has 16 cores, but I will try an import with maybe 6 processes to see if it makes any difference.

d) I created a 20GB swap partition on the VM prior to the load, and the system never uses any swap memory.

Do you have any comments on the Postgres settings? Anything there that might be limiting?
(25 May '18, 12:31)
mike_de_funk
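The swap check from point d) can be sketched in a few lines of Python. This is only a rough illustration, assuming a Linux VM where `/proc/vmstat` exposes the cumulative `pswpin` and `pswpout` counters; the function names and the 60-second sampling interval are hypothetical choices, not anything from osm2pgsql itself.

```python
import time


def parse_swap_counters(vmstat_text):
    """Return (pages swapped in, pages swapped out) from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def swap_delta(before_text, after_text):
    """Pages swapped in and out between two /proc/vmstat snapshots."""
    in_before, out_before = parse_swap_counters(before_text)
    in_after, out_after = parse_swap_counters(after_text)
    return in_after - in_before, out_after - out_before


def watch_swap(interval_s=60, path="/proc/vmstat"):
    """Sample the counters twice, interval_s apart (Linux only)."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval_s)  # the interval is an arbitrary assumption
    with open(path) as f:
        after = f.read()
    return swap_delta(before, after)
```

Calling `watch_swap()` while the import is running returns the number of pages swapped in and out during the interval. Following the advice above, a non-zero swapped-in count is the value to worry about.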
Can you check that the --flat-nodes file is actually being populated (and IMHO that definitely should be on local storage)? Though you would have noticed a problem there while processing ways too. Modern versions of osm2pgsql are far better at caching, so the -C 30000 can be a lot smaller, but as you said you are not swapping, so that is unlikely to have an effect. In general I would consider anything over two days too long for a full planet import on a machine that is halfway up to the job, so I agree with you there. But I haven't seen anything in your config that would cause such problems, so I would continue to suspect the SAN.
(25 May '18, 13:39)
SimonPoole ♦
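One rough way to test the suspicion about the storage is to time small synchronous writes on the volume that holds the database or the flat-nodes file. The Python sketch below is an assumption-laden probe, not a proper benchmark: the function name, the number of writes, and the 8 kB block size (chosen to match PostgreSQL's default page size) are all illustrative.

```python
import os
import statistics
import tempfile
import time


def fsync_latency_ms(directory, writes=50, block_size=8192):
    """Median time in milliseconds for one write followed by an fsync."""
    block = os.urandom(block_size)
    timings = []
    # The scratch file is created inside the directory under test.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # force the block down to the storage layer
            timings.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return statistics.median(timings)
```

Running `fsync_latency_ms("/var/data_2")` on the SAN-backed path and again on a local disk gives two numbers to compare. A large gap between them would point at the storage rather than at the osm2pgsql or Postgres settings.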
Hi Simon, I spoke to our infrastructure team at the University. They spun up another VM, but this time with "direct access" to the SAN. I'm not an infrastructure specialist, but they said it's using Ceph rather than iSCSI. I kicked off a new planet load and the difference was huge: full planet import in 16 hours :)

Processing: Node(4512083k 3689.4k/s) Way(494141k 114.12k/s) Relation(5827810 456.22/s)

Thanks for your help, but it just looks like it was a hardware issue.
(28 May '18, 11:11)
mike_de_funk
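To put the two relation rates quoted in this thread side by side, here is a small Python sketch that pulls the relation counter out of a progress line and converts a rate into a duration. It assumes the rate stays constant (which, as noted in the answer, it does not) and that both planet files held roughly the 5,827,810 relations reported by the final run. The regular expression and function names are illustrative only, and the pattern does not handle a `k` suffix on the relation counter.

```python
import re

# Matches e.g. "Relation(40080 2.16/s)" in an osm2pgsql progress line.
PROGRESS_RE = re.compile(r"Relation\((\d+)\s+([\d.]+)/s\)")


def relation_progress(line):
    """Return (relations processed, relations per second) from a progress line."""
    match = PROGRESS_RE.search(line)
    if match is None:
        raise ValueError("no Relation(...) counter found")
    return int(match.group(1)), float(match.group(2))


def days_for_all_relations(total, rate_per_s):
    """Days needed to process `total` relations at a constant rate."""
    return total / rate_per_s / 86400.0


slow_line = "Processing: Node(4512083k 5086.9k/s) Way(494141k 137.53k/s) Relation(40080 2.16/s)"
fast_line = "Processing: Node(4512083k 3689.4k/s) Way(494141k 114.12k/s) Relation(5827810 456.22/s)"

total_relations, fast_rate = relation_progress(fast_line)
_, slow_rate = relation_progress(slow_line)

slow_days = days_for_all_relations(total_relations, slow_rate)
fast_days = days_for_all_relations(total_relations, fast_rate)
print(f"slow run: {slow_days:.1f} days, fast run: {fast_days * 24:.1f} hours")
```

At a constant 2.16/s the relation stage alone would take about a month, while at 456.22/s it takes about three and a half hours.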