jmckenna at gatewaygeomatics.com
- Original Author
- Last Updated
Table of Contents
The best/optimal vector data source for MapServer in terms of speed-of-display is .shp.
For databases, MapServer displays PostGIS layers very fast, and some custom tricks are included in the MapServer source specifically for PostGIS + MapServer draw speed. PostGIS is therefore recommended.
SpatiaLite is recommended for those who require a portable format, and works very well with MapServer.
This is a critical step, as MapServer relies on GDAL (actually the OGR part of the project, which handles vectors) for vector data access. Each driver (OGR format) has its own set of abilities and switches. Find your vector format and review its options here.
For data management in MapServer, this should always be one of your first steps. Sometimes desktop GIS programs will display a format or its attributes in a certain way (such as in uppercase) but your data might not display in MapServer; checking how OGR/GDAL reads your data file or database table, will help you manage the data. The OGR Vector Layers Through MapServer document has excellent examples (for ogrinfo and other commands) to connect to your data. Here is an example connecting to a PostGIS database, and returning a list of spatial tables:
ogrinfo -ro PG:"host=127.0.0.1 user=postgres password=postgres port=5432 dbname=mydb" INFO: Open of `PG:host=127.0.0.1 user=postgres password=postgres port=5432 dbname=mydb' using driver `PostgreSQL' successful. 1: popplace (Point) 2: province (Multi Polygon)
Then get a summary of the 'popplace' table through ogrinfo:
ogrinfo -ro PG:"host=127.0.0.1 user=postgres password=postgres port=5433 dbname=mydb" popplace -summary INFO: Open of `PG:host=127.0.0.1 user=postgres password=postgres port=5433 dbname=mydb' using driver `PostgreSQL' successful. Layer name: popplace Geometry: Point Feature Count: 497 Extent: (-2303861.750000, -681502.875000) - (2961766.250000, 3798856.750000) Layer SRS WKT: PROJCS["NAD83 / Canada Atlas Lambert", GEOGCS["NAD83", DATUM["North_American_Datum_1983", SPHEROID["GRS 1980",6378137,298.257222101, AUTHORITY["EPSG","7019"]], TOWGS84[0,0,0,0,0,0,0], AUTHORITY["EPSG","6269"]], PRIMEM["Greenwich",0, AUTHORITY["EPSG","8901"]], UNIT["degree",0.0174532925199433, AUTHORITY["EPSG","9122"]], AUTHORITY["EPSG","4269"]], PROJECTION["Lambert_Conformal_Conic_2SP"], PARAMETER["standard_parallel_1",49], PARAMETER["standard_parallel_2",77], PARAMETER["latitude_of_origin",49], PARAMETER["central_meridian",-95], PARAMETER["false_easting",0], PARAMETER["false_northing",0], UNIT["metre",1, AUTHORITY["EPSG","9001"]], AXIS["Easting",EAST], AXIS["Northing",NORTH], AUTHORITY["EPSG","3978"]] FID Column = gid Geometry Column = geom area: Real (0.0) perimeter: Real (0.0) popplace_: Real (0.0) popplace_i: Real (0.0) unique_key: String (5.0) name: String (25.0) name_e: String (20.0) name_f: String (20.0) reg_code: Real (0.0) nts50: String (7.0) lat: String (7.0) long: String (7.0) sgc_code: Real (0.0) capital: Real (0.0) pop_range: Real (0.0)
You can use the extent values returned from ogrinfo, to paste into your mapfile's EXTENT parameter. You can also notice in that summary the PROJCS/AUTHORITY line, which states that this data is currently in the EPSG:3978 projection.
OGR commandline utilities are very useful, making it easy to manipulate your source vector file or database, before display in MapServer. For example, you might want to import a spatial file into your existing PostGIS database, which you can do easily through ogr2ogr as the following (that takes a shapefile, imports it into an existing PostGIS database, and renames the new table to 'roadways') :
ogr2ogr PG:"host=127.0.0.1 user=postgres password=postgres port=5432 dbname=mydb" road.shp -nln roadways
where ogr2ogr syntax is actually: ogr2ogr destination source
You might also import a spatial file, and reproject it to another EPSG projection, as the following (that takes a shapefile, reprojects it into the web mercator EPSG:3857, imports it into an existing PostGIS database, and renames the new table to 'roadways3857') :
ogr2ogr -t_srs EPSG:3857 -s_srs EPSG:3978 PG:"host=127.0.0.1 user=postgres password=postgres port=5432 dbname=mydb" road.shp -nln roadways3857
where ogr2ogr syntax is actually: ogr2ogr -t_srs (ouput projection) -s_srs (source projection) destination source
Review all of the available OGR vector utilities here.
This of course is an important step for any vector layer, for fast display in MapServer. See Tile Indexes for more detailed information.
If you find yourself making several layers, all of them using the same dataset but filtering to only use some of the records, you could probably do it better. If the criteria are static, one approach is to pre-split the data.
The ogr2ogr utility can select on certain features from a datasource, and save them to a new data source. Thus, you can split your dataset into several smaller ones that are already effectively filtered, and remove the FILTER statement.
If you are using shapefiles, the shp2tile utility is a wonderful commandline tool for this.
For Windows users, MS4W includes the shp2tile utility, and all utilities mentioned here.
Use shptree to generate a spatial index on your shapefile. This is quick and easy ("shptree foo.shp") and generates a .qix file. MapServer will automagically detect an index and use it.
Tileindexes can also be indexed with shptree.
MapServer also comes with the sortshp utility. This reorganizes a shapefile, sorting it according to the values in one of its columns. If you're commonly filtering by criteria and it's almost always by a specific column, this can make the process slightly more efficient.
Although shapefiles are a very fast data format, PostGIS is pretty speedy as well, especially if you use indexes well and have memory to throw at caching.
The single biggest boost to performance is indexing. Make sure that there's a GIST index on the geometry column, and each record should also have an indexed primary key. If you used shp2pgsql, then these statements should create the necessary indexes:
ALTER TABLE table ADD PRIMARY KEY (gid); CREATE INDEX table_the_geom ON table (the_geom) USING GIST;
PostgreSQL also supports reorganizing the data in a table, such that it's physically sorted by the index. This allows PostgreSQL to be much more efficient in reading the indexed data. Use the CLUSTER command, e.g.
CLUSTER the_geom ON table;
Then there are numerous optimizations one can perform on the database server itself, aside from the geospatial component. The easiest is to increase shared_buffers in the postgresql.conf file, which allows PostgreSQL to use more memory for caching. It is worth the time to investigate the various options in the Resource Consumption section of the PostgreSQL documentation.
To handle queries, MapServer requires a unique ID column, as part of your PostgreSQL table. MapServer will try to guess the unique ID column, but that is costly as several more database queries have to be made; instead, you should always specify a unique ID column in your DATA statement of the layer, with the using unique syntax, such as:
DATA "geom FROM mydata USING UNIQUE myid USING SRID=3857"
If your table does not have a unique ID column, you can add one such as:
ALTER TABLE mytable ADD COLUMN unique_id SERIAL PRIMARY KEY;
In older PostgreSQL versions, a (dirty) trick was to use the existing OID as the unique ID, however OIDs were removed from PostgreSQL as of the PostgreSQL 12.0 release. So always specify an actual unique ID column instead, in your layer's DATA statement, of your mapfile.
You may face a situation where your PostGIS table is drawing slowly in MapServer. The following steps will help you examine the issue (using a WFS case) :
Always start by getting the draw time for your layer, with a shp2img command:
shp2img -m postgis-wfs.map -o ttt.png -map_debug 3 msDrawMap(): Layer 0 (provinces), 1.587s msDrawMap(): Drawing Label Cache, 0.000s msDrawMap() total time: 1.589s msSaveImage(ttt.png) total time: 0.005s freeLayer(): freeing layer at 010E0EC0. msPostGISLayerIsOpen called. msConnPoolClose(host=127.0.0.1 user=postgres password=postgres port=5432 dbname=gmap,01197840)
To diagnose if the problem is with your PostGIS table configuration, or with MapServer, execute the exact request sent by MapServer, at the psql.exe commandline, by doing the following steps:
add into your MAP-level of your mapfile:
CONFIG "CPL_DEBUG" "ON" CONFIG "MS_ERRORFILE" "/ms4w/tmp/ms_error.txt" DEBUG 5
add into your PostGIS LAYER of your mapfile:
now use a WFS client such as QGIS and add your WFS PostGIS layer
open "/ms4w/tmp/ms_error.txt" in Notepad++
search for "msPostGISLayerWhichShapes query:"
that line should list the long exact query sent from MapServer to the PostreSQL instance, it may look like:
[Mon Apr 12 11:27:34 2021].207000 msPostGISLayerWhichShapes query: SELECT "gid"::text,"area"::text,"perimeter"::text,"province_"::text,"province_i"::text,"status"::text,"name"::text,"name_e"::text,"name_f"::text,"reg_code"::text,"poly_featu"::text,"island"::text,"island_e"::text,"island_f"::text,ST_AsBinary(("geom"),'NDR') as geom,"gid"::text FROM province WHERE "geom" && ST_GeomFromText('POLYGON((-5814679.36987815 -1504714.04276694,-5814679.36987815 4439806.52253364,5943763.33635122 4439806.52253364,5943763.33635122 -1504714.04276694,-5814679.36987815 -1504714.04276694))',3978) LIMIT 2 OFFSET 0
For one QGIS action, there could be several queries sent to MapServer, so also look for other "msPostGISLayerWhichShapes query:" instances in that error log.
now connect to that database through psql.exe
psql -U postgres -p 5432 -d mydb
using that error file line, grab everything from "SELECT", and inside your database prompt, start the command with "EXPLAIN ANALYZE" and then paste your full query, such as:
mydb=# EXPLAIN ANALYZE SELECT "gid"::text,"area"::text,"perimeter"::text,"province_"::text,"province_i"::text,"status"::text,"name"::text,"name_e"::text,"name_f"::text,"reg_code"::text,"poly_featu"::text,"island"::text,"island_e"::text,"island_f"::text,ST_AsBinary(("geom"),'NDR') as geom,"gid"::text FROM province WHERE "geom" && ST_GeomFromText('POLYGON((-5814679.36987815 -1504714.04276694,-5814679.36987815 4439806.52253364,5943763.33635122 4439806.52253364,5943763.33635122 -1504714.04276694,-5814679.36987815 -1504714.04276694))',3978) LIMIT 2 OFFSET 0;
the response will tell you how long that query took, such as:
Execution time: 0.293 ms
if the query takes a long time to execute at the psql commandline, then you know to focus your efforts on improving the indexing/settings of your PostGIS table.
By default, MapServer opens and closes a new database connection for each database-driven layer in the mapfile. If you have several layers reading from the same database, this doesn't make a lot of sense. And with some databases (such as Oracle) establishing connections takes enough time that it can become significant.
Try adding this line to your database layers:
This causes MapServer to not close the database connection for each layer until after it has finished processing the mapfile and this may shave a few seconds off of map generation times.
Also for performance, each of your LAYERs with a database connection should have EXTENT set at the layer level, such as:
/* my database layer */ LAYER NAME "provinces" TYPE POLYGON STATUS ON CONNECTIONTYPE postgis CONNECTION "host=127.0.0.1 user=postgres password=postgres port=5432 dbname=gmap" DATA "geom FROM province USING unique gid using srid=3978" EXTENT -2340603.75 -719746.0625 3009430.5 3836605.25 #this helps for performance PROJECTION "init=epsg:3978" END # projection ... END # layer
PostGIS users can use the ST_Extent() spatial function to get the bounding box of a table, such as
SELECT ST_Extent(geom) as table_extent FROM province; table_extent ---------------------------------------------------- BOX(-2340603.75 -719746.0625,3009430.5 3836605.25) (1 row)