
Hi all,

Unfortunately, my osm2pgsql import of planet.osm.pbf crashed because it ran out of disk space (1 TB available).

What I don't understand is that I used the --slim --drop options and, after the osm2pgsql crash, only 30% of the disk is used (259 GB). So I guess the unnecessary tables had already been dropped when osm2pgsql ran out of disk space. I wouldn't expect osm2pgsql to need more than 700 GB, even temporarily, to build the indexes.

I would like to relaunch the import. Do you have any advice for using less disk space? Maybe:

  • use the --flat-nodes osm2pgsql option? If so: do you think 1 TB will be sufficient for the node file plus the database, or should I try to host the node file on another 500 GB disk? Can you confirm that the --drop osm2pgsql option will drop the node file at the end of the process? (A sketch of the command with this option follows this list.)
  • turn on autovacuum, as suggested in osm2pgsql import - disk space running out during index creation?
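
For reference, here is a rough sketch of what the relaunched command could look like with --flat-nodes added to my current command (the /mnt/extra/flatnodes.bin path is just a placeholder for wherever the node file would end up):

# same options as before, plus --flat-nodes pointing at a file on another disk (path is a placeholder)
osm2pgsql --username $DBPG_USER --database $DBPG_DB \
  --hstore \
  --style $WORKING_DIR/openstreetmap-carto/openstreetmap-carto.style \
  --tag-transform-script $WORKING_DIR/openstreetmap-carto/openstreetmap-carto.lua \
  --slim --drop \
  --flat-nodes /mnt/extra/flatnodes.bin \
  --cache 18800 \
  --number-processes 4 \
  --multi-geometry 20200102_planet.osm.pbf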

In fact I have 916 GB available, minus the planet.osm.pbf file (50 GB), so about 865 GB. If it is confirmed that this is not sufficient for a full planet import, I'll add that information to this post (Tile server hardware requirements).

Below is more information. Thanks in advance for your help!

Augustin

osm2pgsql command

osm2pgsql --username $DBPG_USER --database $DBPG_DB \
  --hstore \
  --style $WORKING_DIR/openstreetmap-carto/openstreetmap-carto.style \
  --tag-transform-script $WORKING_DIR/openstreetmap-carto/openstreetmap-carto.lua \
  --slim --drop \
  --cache 18800 \
  --number-processes 4 \
  --multi-geometry 20200102_planet.osm.pbf

system

OS: Debian 9 Stretch
Architecture: x86_64
Processor: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz, 4 cores (1 thread per core)
RAM: 32 GB (plus 32 GB swap)
Disk space: 1 TB SSD

postgresql.conf extract (PostgreSQL version: 10)

shared_buffers = 8GB
work_mem = 50MB
maintenance_work_mem = 1GB
min_wal_size = 1GB
max_wal_size = 2GB
effective_cache_size = 20GB
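
Regarding the autovacuum question above: as far as I know autovacuum is already enabled by default in PostgreSQL 10, so "turning it on" should just mean making sure it has not been disabled in postgresql.conf, e.g.:

# defaults in PostgreSQL 10, shown here only to make them explicit
autovacuum = on
track_counts = on   # autovacuum relies on the statistics collector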

osm2pgsql log extracts vs. disk space usage

On January 9th at 6 pm, osm2pgsql was still processing relations:

Processing: Node(5682827k 291.6k/s) Way(630406k 5.06k/s) Relation(3798620 120.78/s)

and 78% of disk space was used:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       916G  670G  200G  78% /var/lib/postgresql/data

On January 10th at 8 pm, osm2pgsql crashed because of lack of disk space (logs below), whereas only 30% of the disk space was used:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       916G  259G  612G  30% /data

Here are the osm2pgsql logs showing the crash:

Using PBF parser.
Processing: Node(5682827k 291.6k/s) Way(630406k 5.06k/s) Relation(5590480 140.30/s)
Standard exception processing relation id=8346358: TopologyException: side location conflict at 721638.68999999994 1016915.5699999999
Processing: Node(5682827k 291.6k/s) Way(630406k 5.06k/s) Relation(7379160 155.86/s)  parse time: 191305s
Node stats: total(5682827154), max(7094416299) in 19487s
Way stats: total(630406201), max(759474335) in 124474s
Relation stats: total(7379167), max(10509928) in 47344s
Committing transaction for planet_osm_point
Committing transaction for planet_osm_line
Committing transaction for planet_osm_polygon
Committing transaction for planet_osm_roads
Setting up table: planet_osm_nodes
Setting up table: planet_osm_ways
Setting up table: planet_osm_rels
Using lua based tag processing pipeline with script /docker_mounted_volumes/working_dir/openstreetmap-carto/openstreetmap-carto.lua
Setting up table: planet_osm_nodes
Setting up table: planet_osm_ways
Setting up table: planet_osm_rels
Using lua based tag processing pipeline with script /docker_mounted_volumes/working_dir/openstreetmap-carto/openstreetmap-carto.lua
Setting up table: planet_osm_nodes
Setting up table: planet_osm_ways
Setting up table: planet_osm_rels
Using lua based tag processing pipeline with script /docker_mounted_volumes/working_dir/openstreetmap-carto/openstreetmap-carto.lua
Setting up table: planet_osm_nodes
Setting up table: planet_osm_ways
Setting up table: planet_osm_rels
Using lua based tag processing pipeline with script /docker_mounted_volumes/working_dir/openstreetmap-carto/openstreetmap-carto.lua
Going over pending ways...
434307049 ways are pending
Using 4 helper-processes
Finished processing 434307049 ways in 71654 s
434307049 Pending ways took 71654s at a rate of 6061.17/s
Committing transaction for planet_osm_point
Committing transaction for planet_osm_line
Committing transaction for planet_osm_polygon
Committing transaction for planet_osm_roads
Committing transaction for planet_osm_point
Committing transaction for planet_osm_line
Committing transaction for planet_osm_polygon
Committing transaction for planet_osm_roads
Committing transaction for planet_osm_point
Committing transaction for planet_osm_line
Committing transaction for planet_osm_polygon
Committing transaction for planet_osm_roads
Committing transaction for planet_osm_point
Committing transaction for planet_osm_line
Committing transaction for planet_osm_polygon
Committing transaction for planet_osm_roads
Going over pending relations...
0 relations are pending
Using 4 helper-processes
Finished processing 0 relations in 0 s
Committing transaction for planet_osm_point
WARNING: there is no transaction in progress
Committing transaction for planet_osm_line
WARNING: there is no transaction in progress
Committing transaction for planet_osm_polygon
WARNING: there is no transaction in progress
Committing transaction for planet_osm_roads
WARNING: there is no transaction in progress
Committing transaction for planet_osm_point
WARNING: there is no transaction in progress
Committing transaction for planet_osm_line
WARNING: there is no transaction in progress
Committing transaction for planet_osm_polygon
WARNING: there is no transaction in progress
Committing transaction for planet_osm_roads
WARNING: there is no transaction in progress
Committing transaction for planet_osm_point
WARNING: there is no transaction in progress
Committing transaction for planet_osm_line
WARNING: there is no transaction in progress
Committing transaction for planet_osm_polygon
WARNING: there is no transaction in progress
Committing transaction for planet_osm_roads
WARNING: there is no transaction in progress
Committing transaction for planet_osm_point
WARNING: there is no transaction in progress
Committing transaction for planet_osm_line
WARNING: there is no transaction in progress
Committing transaction for planet_osm_polygon
WARNING: there is no transaction in progress
Committing transaction for planet_osm_roads
WARNING: there is no transaction in progress
Sorting data and creating indexes for planet_osm_point
Sorting data and creating indexes for planet_osm_line
Sorting data and creating indexes for planet_osm_polygon
Stopping table: planet_osm_nodes
Sorting data and creating indexes for planet_osm_roads
Stopping table: planet_osm_ways
Stopping table: planet_osm_rels
Stopped table: planet_osm_rels in 0s
Stopped table: planet_osm_nodes in 1s
Stopped table: planet_osm_ways in 1s
Copying planet_osm_roads to cluster by geometry finished
Creating geometry index on planet_osm_roads
Creating indexes on planet_osm_roads finished
All indexes on planet_osm_roads created in 2661s
Completed planet_osm_roads
node cache: stored: 2266194517(39.88%), storage efficiency: 91.97% (dense blocks: 275092, sparse nodes: 105299977), hit rate: 38.95%
Osm2pgsql failed due to ERROR: CREATE TABLE planet_osm_point_tmp AS SELECT * FROM planet_osm_point ORDER BY ST_GeoHash(ST_Transform(ST_Envelope(way),4326),10) COLLATE "C" failed: ERROR: could not extend file "base/16385/4462032.9": wrote only 4096 of 8192 bytes at block 1292910
HINT: Check free disk space.

asked 13 Jan, 12:14

augustind


4 Answers:

865 GB is IMHO not enough anymore, but I'm not using --drop. I would strongly advise keeping the planet.osm.pbf somewhere else, as well as using --flat-nodes (the node file is about 50 GB in size) and keeping the flat-nodes file somewhere else too (an extra drive).

I recently did an import with 1.1 TB of free space (used for the PostgreSQL db + flat-nodes file; the planet.osm.pbf was on another drive) and that worked (and probably won't work anymore in 2021).

You could also look into the filesystem you are using. ZFS offers compression, which would help.
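
If you go that route, you can check afterwards how much the compression actually saves with something like the following (assuming the pool is called tank; adjust to whatever name you pick):

# compression setting, achieved compression ratio and space used for the pool
zfs get compression,compressratio,used tank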


answered 13 Jan, 12:24

Spiekerooger

edited 13 Jan, 12:30

Thanks a lot Spiekerooger for your quick answer. I'll relaunch the import following your advice:

  • use --flat-nodes and host the node file on the system disk (500GB)
  • host the planet.osm.pbf on the system disk also
  • keep hosting the database on the extra disk (1TB)

I'll come back here to report how it went!

(13 Jan, 12:53) augustind

I just had a look at my statistics: with a ZFS filesystem and ZFS compression turned on, I maxed out at 692 GB, including the flat-nodes file, during a recent import (during index building, I think; without the --drop option and with autovacuum on).
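
If you want to capture that peak yourself on the next attempt, a simple sketch is to log disk usage every few minutes while the import runs (the log path is arbitrary):

# run alongside the import and check the log afterwards for the maximum "Used" value
while true; do
  { date; df -h /data; } >> /var/log/import_df.log
  sleep 300
done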

Right now the db plus flat-nodes file takes up about 530GB after the import and after some --append updates.

With ext4 and about ~900 GB of free space for the db + flat-nodes file, I ran into the same problems you have on a first import (but again, I'm not using the --drop option).

(13 Jan, 14:24) Spiekerooger

Thanks for these benchmarks.

So let's go for ZFS. Following the recommendations in Paul’s Blog - ZFS Settings for Osm2pgsql, I would like to convert my 1 TB disk (currently ext4, mounted on /data) to ZFS with lz4 compression enabled and an 8k recordsize.

As I'm not very familiar with filesystems and don't know ZFS, could somebody confirm that the approach below is approximately correct (source: https://wiki.debian.org/ZFS)?

  • Unmount the current ext4 disk

umount /data
rm -r /data

  • Debian ZFS installation

apt update
apt install linux-headers-`uname -r`
apt install -t buster-backports dkms spl-dkms
apt install -t buster-backports zfs-dkms zfsutils-linux

  • Creating a single disk stripe pool

zpool create ashift=8 tank /dev/sda

  • Provisioning file systems or volume

mkdir -p /data
zfs create -o mountpoint=/data tank/data

  • Enable lz4 compression

zfs set compression=lz4 tank

(14 Jan, 11:37) augustind

If /dev/sda (are you sure about that, especially the sda?) is the extra 1 TB drive where you plan to keep the db, this sounds about OK. You may have to format the drive before using zpool. Personally, I wouldn't even create an extra filesystem under the ZFS pool but would just create the pool directly as "data".

so I would just:

  • zpool create ashift=8 data /dev/sda
  • zfs set compression=lz4 data

(without the extra zfs create in between).

That way, the pool should be mounted under /data directly.
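
You can then verify the result with something like this (assuming the pool is simply named data):

zpool status data                      # pool health and member device
zfs get mountpoint,compression data    # dataset properties
df -hT /data                           # should show a zfs filesystem mounted there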

But I don't know what else you are planning for that drive, so you may go ahead with your plan.

(14 Jan, 12:34) Spiekerooger

In the end, the ZFS setup below is what worked for me. For the overall conclusion, see the answer at the bottom of this thread. Thanks.

Be careful: the ashift option mentioned above is not the right one for setting the recordsize.

sudo su -
cat <<'EOF' > /etc/apt/sources.list.d/backports.list
# Backports repository
deb http://deb.debian.org/debian stretch-backports main contrib non-free # available since the release of buster
EOF
apt update
apt install linux-headers-`uname -r`
apt install -t stretch-backports dkms spl-dkms
apt install -t stretch-backports zfs-dkms zfsutils-linux

  • Erase the first 100 MB of the disk (use lsblk to find the device name)

dd if=/dev/zero of=/dev/sda bs=1M count=100

  • Then

zpool create data $disk_id
# using legacy mount mode is not required
zfs set mountpoint=legacy data
mount -t zfs /dev/sda1 /data/
df -khT /data/
# check
zfs list
# set compression and recordsize
zfs set compression=lz4 data
zfs set recordsize=8k data
# check
zfs get recordsize /data
zfs get compression /data

(22 Jan, 14:03) augustind

Oops, yes, actually I'm using recordsize as well, not ashift.

Glad you got it running.

(22 Jan, 14:07) Spiekerooger

Run your import with --disable-parallel-indexing to save disk space.

Why? You ran out of disk space in the final stage of osm2pgsql, where tables are sorted and indexes are created. In order to sort a table, PostgreSQL needs to make a complete copy of it, so you temporarily need twice the size of the table on disk. To speed things up, osm2pgsql runs parallel threads that do the sorting and indexing. In the worst case it therefore sorts all tables in parallel, so you need twice the disk space you would actually need in the end. --disable-parallel-indexing ensures that sorting and index creation are done sequentially. It will take longer, and you still need some additional space for the sorting, but your current disk should be enough.
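
You can see this in the error message you posted: the clustering step copies the whole table, roughly like the following (reconstructed from the failing statement in your log; the drop/rename at the end is how osm2pgsql normally finishes this step):

-- both the original table and its _tmp copy exist on disk at the same time
CREATE TABLE planet_osm_point_tmp AS
  SELECT * FROM planet_osm_point
  ORDER BY ST_GeoHash(ST_Transform(ST_Envelope(way), 4326), 10) COLLATE "C";
DROP TABLE planet_osm_point;
ALTER TABLE planet_osm_point_tmp RENAME TO planet_osm_point;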

Regarding the --drop option: when enabled, tables that are only needed for updates get dropped early and some indexes are not even created in the first place. Therefore a lot less space is needed. Note though that this is only true for newer versions of osm2pgsql. The 0.92 version that ships with stretch is still badly optimised in this regard.


answered 13 Jan, 19:09

lonvia

Thanks @lonvia for your answer! It's pretty interesting. I've upgraded osm2pgsql from 0.92 to 1.2.0. As for --disable-parallel-indexing, I'll keep it in mind as a fallback solution: I'd rather not give up on speed.

(14 Jan, 11:30) augustind

Following up on @Spiekerooger's answer, it would be nice if somebody could provide some explanation of how the --drop option impacts disk usage over the course of the osm2pgsql import. In particular:

  • does it reduce the peak disk space usage?
  • or does it only reduce the final disk space usage?

answered 13 Jan, 12:56

augustind


Hi,

Thanks everybody for your help, the import was successful :) Here is what changed for this second import:

  • use --flat-nodes and host the nodes.cache file on another disk
  • move planet.osm.pbf to another disk too
  • upgrade from osm2pgsql 0.92 to 1.2.0
  • use a ZFS filesystem with lz4 compression and recordsize=8k

Benchmark details here: Wiki OSM / Osm2pgsql Benchmarks / Desktop Debian 9, 4 cores i5-6500 CPU @ 3.20GHz/32GB RAM, 1TB+500GB SSD (hstore slim drop flat-nodes and ZFS filesystem)

The PostgreSQL/PostGIS OSM database (on the ZFS filesystem) peaked at 460 GB during the import and ended at 185 GB.
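
In case it helps, this is roughly how such sizes can be checked (a sketch, not the exact commands I used):

# logical size of the rendering database as seen by PostgreSQL
psql -U "$DBPG_USER" -d "$DBPG_DB" -c "SELECT pg_size_pretty(pg_database_size(current_database()));"
# physical, compressed size of the ZFS dataset holding the cluster
zfs list data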

The source code of the related project has just been released on GitHub: osmtilemaker (generates your OSM tiles in one shot, no updates).


answered 22 Jan, 13:58

augustind
