Vector

Author:HostGIS
Revision:$Revision$
Date:$Date$
Last Updated:2008/08/08

Splitting your data

If you find yourself making several layers, all of them using the same dataset but filtering to only use some of the records, you could probably do it better. If the criteria are static, one approach is to pre-split the data.

The ogr2ogr utility can select on certain features from a datasource, and save them to a new data source. Thus, you can split your dataset into several smaller ones that are already effectively filtered, and remove the FILTER statement.

Shapefiles

Use shptree to generate a spatial index on your shapefile. This is quick and easy (“shptree foo.shp”) and generates a .qix file. MapServer will automagically detect an index and use it.

MapServer also comes with the sortshp utility. This reorganizes a shapefile, sorting it according to the values in one of its columns. If you’re commonly filtering by criteria and it’s almost always by a specific column, this can make the process slightly more efficient.

Although shapefiles are a very fast data format, PostGIS is pretty speedy as well, especially if you use indexes well and have memory to throw at caching.

PostGIS

The single biggest boost to performance is indexing. Make sure that there’s a GIST index on the geometry column, and each record should also have an indexed primary key. If you used shp2pgsql, then these statements should create the necessary indexes:

ALTER TABLE table ADD PRIMARY KEY (gid);
CREATE INDEX table_the_geom ON table (the_geom) USING GIST;

PostgreSQL also supports reorganizing the data in a table, such that it’s physically sorted by the index. This allows PostgreSQL to be much more efficient in reading the indexed data. Use the CLUSTER command, e.g.

CLUSTER the_geom ON table;

Then there are numerous optimizations one can perform on the database server itself, aside from the geospatial component. The easiest is to increase max_buffers in the postgresql.conf file, which allows PostgreSQL to use more memory for caching. More information can be found at the PostgreSQL website.

Databases in General (PostGIS, Oracle, MySQL)

By default, MapServer opens and closes a new database connection for each database-driven layer in the mapfile. If you have several layers reading from the same database, this doesn’t make a lot of sense. And with some databases (Oracle) establishing connections takes enough time that it can become significant.

Try adding this line to your database layers:

PROCESSING "CLOSE_CONNECTION=DEFER"

This causes MapServer to not close the database connection for each layer until after it has finished processing the mapfile and this may shave a few seconds off of map generation times.