Running Geospatial Queries with GeoTools

Running Geospatial Queries with GeoTools

4331986510_bb69fd7a3c

Geospatial information analysis normally requires pretty complex calculations and transformations between different representation types.  The Google Maps APIs are a great tool because they hide all the complexity of these operations. However when the geospatial information that you need to analyse is not from Google Maps, things get more complicated. Operations like finding polygons representing geographical places, or finding polygons that intercept other polygons, require a lot of time and intricate code to have any acceptable  solution. At Shine we are working on a solution for a big telecommunication customer that requires being able to query large geospatial databases without degrading performance.

Shapefiles

As part of our project we needed to be able to query shapefiles provided by the Australian Bureau of Statistics (ABS) with some associated statistical information. So what exactly is a shapefile? Vertices and polygons representing geographic information usually have associated data. This data can be stored in a database where you can run queries in a language very similar to SQL. The shapefile database format is a digital vector format for storing geometric location and associated attribute information. This attribute information can be about anything – for example, average temperatures, city names or average population in a specific geographical area. However, shapefiles are pretty low-level – we’d rather query this information through a high level language that abstracts-out all the details and returns a human-readable result.

Introducing GeoTools

GeoTools is an open source toolkit that provides an implementation of the Open Geospatial Consortium (OGC) specifications. Amongst other things, GeoTools includes a tool that allows you to open files in shapefile format and execute CQL (Contextual Query Language) queries. CQL allows you to write filter expressions for working with geospatial information. With CQL we can filter things like “find all the polygons where state is Victoria” or “find the polygons that contain the point X,Y” GeoTools also provides normalisation across different types of geometries. Thanks to this normalisation, the coordinates used when specifying the filters doesn’t necessarily need to use the same coordinate system as the geospatial data.

Querying a Geospatial Database

Below are some coding examples that demonstrate how to access shapefiles using GeoTools. The shapefile in the examples is from the ABS. It’s publicly available and can  be downloaded from here. To have the example compiling you will need to add the following repository and dependencies in your build.gradle file:

repositories {
    ...
    maven {
        url "http://download.osgeo.org/webdav/geotools"
    }

}
...
dependencies {
   compile "org.geotools:gt-cql:12.0"
   compile "org.geotools:gt-shapefile:12.0"
}
...

Opening a Shapefile

To open the shapefile we need to get the file path as a ‘URL’ using the ClassLoader.getResource() method:

...
ClassLoader classLoader = GeoQuery.class.getClassLoader();
URL fileUrl = 
    classLoader.getResource("Victoria ABS SHP/MB_2011_VIC.shp");
Map<String, Serializable> map = new HashMap<>();
map.put("url", fileUrl);
DataStore dataStore = DataStoreFinder.getDataStore(map);
...

Writing the Query

Now we can write a query:

Filter filter = ECQL.toFilter("SA2_NAME11 = 'St Kilda'");

This will return all the polygons where the field “SA2_NAME11” is ‘St Kilda’. The metadata that is associated with the polygons is called a ‘feature’. In order to execute the query you will need to get the feature source using the type name. Once we have the feature source we can use it to execute the query:

...
String typeName = dataStore.getTypeNames()[0];
SimpleFeatureSource simpleFeatureSource = 
    dataStore.getFeatureSource(typeName);
SimpleFeatureCollection result = 
    simpleFeatureSource.getFeatures(filter);
    ...

 Extracting the Results

The feature source returns a FeatureCollection. This result set provides an iterator that can be used to access to the Features matching with the query criteria.

....
while (featureIterator.hasNext()) {
    SimpleFeature feature = featureIterator.next();

    System.out.printf(
        “Polygon found  SA1_MAIN11=%s, SA2_NAME11=%s, SA3_NAME11=%s, STE_NAME11=%s, ALBERS_SQM=%s\n",
        feature.getAttribute("SA1_MAIN11"), 
        feature.getAttribute("SA2_NAME11"), 
        feature.getAttribute("SA3_NAME11"),
        feature.getAttribute("STE_NAME11"), 
        feature.getAttribute("ALBERS_SQM")
    );
}
....

The columns where each piece of information is stored are called ‘Attributes’. The attributes can easily be accessed using the method getAttribute(String attributeName). The actual polygon representing the geometry is always stored in a special attribute named the_geom. This attribute can be used to access to the underlaying geometry and opens the door to performing more complex operations. We get an output similar to this:

...
Polygon found  SA1_MAIN11=20605113351, SA2_NAME11=St Kilda, SA3_NAME11=Port Phillip, STE_NAME11=Victoria
Polygon found  SA1_MAIN11=20605113351, SA2_NAME11=St Kilda, SA3_NAME11=Port Phillip, STE_NAME11=Victoria
Polygon found  SA1_MAIN11=20605113349, SA2_NAME11=St Kilda, SA3_NAME11=Port Phillip, STE_NAME11=Victoria
Polygon found  SA1_MAIN11=20605113321, SA2_NAME11=St Kilda, SA3_NAME11=Port Phillip, STE_NAME11=Victoria
Polygon found  SA1_MAIN11=20605113349, SA2_NAME11=St Kilda, SA3_NAME11=Port Phillip, STE_NAME11=Victoria
...

Other Query Examples

Here’s a query that finds all the features where the area is smaller than 100 square meters:

Filter filter = ECQL.toFilter("ALBERS_SQM < 100”);

and produces the result:

...
Polygon found  SA1_MAIN11=29999949999, SA2_NAME11=No usual address (Vic.), SA3_NAME11=Special Purpose Codes SA3 (Vic.), STE_NAME11=Victoria, ALBERS_SQM=0.0
Polygon found  SA1_MAIN11=29797979991, SA2_NAME11=Migratory - Offshore - Shipping (Vic.), SA3_NAME11=Migratory - Offshore - Shipping (Vic.), STE_NAME11=Victoria, ALBERS_SQM=0.0
Polygon found  SA1_MAIN11=29797979992, SA2_NAME11=Migratory - Offshore - Shipping (Vic.), SA3_NAME11=Migratory - Offshore - Shipping (Vic.), STE_NAME11=Victoria, ALBERS_SQM=0.0
...

Here’s one that finds the feature that contains a coordinate with a particular longitude and latitude: Point with lat lon

Filter filter = ECQL.toFilter(
    " CONTAINS (the_geom, POINT(145.004940 -37.907817))”
);

The result being:

Polygon found  SA1_MAIN11=20801116914, SA2_NAME11=Brighton (Vic.), SA3_NAME11=Bayside, STE_NAME11=Victoria, ALBERS_SQM=24134.2536224367

Finally, here’s a query that find the features that intersect a specific area. We can use the BBOX function to define the bounding rectangle by passing the points with the minimum and maximum longitude latitude, as shown in the following map: Geospatial Area In this case, the minimum coordinate is 145.044937, -37.903511 and the maximum coordinate is 145.079613, -37.938448. This leads us to the query:

Filter filter = ECQL.toFilter(
    "BBOX(the_geom, 145.044937, -37.903511, 145.079613, -37.938448)”
);

which results in:

...
Polygon found  SA1_MAIN11=20802118220, SA2_NAME11=Ormond - Glen Huntly, SA3_NAME11=Glen Eira, STE_NAME11=Victoria, ALBERS_SQM=28853.4952009239
Polygon found  SA1_MAIN11=20802118106, SA2_NAME11=Murrumbeena, SA3_NAME11=Glen Eira, STE_NAME11=Victoria, ALBERS_SQM=36484.530884945
Polygon found  SA1_MAIN11=20802118101, SA2_NAME11=Murrumbeena, SA3_NAME11=Glen Eira, STE_NAME11=Victoria, ALBERS_SQM=29131.116371965
Polygon found  SA1_MAIN11=20802118220, SA2_NAME11=Ormond - Glen Huntly, SA3_NAME11=Glen Eira, STE_NAME11=Victoria, ALBERS_SQM=24569.1521100066
Polygon found  SA1_MAIN11=20802118101, SA2_NAME11=Murrumbeena, SA3_NAME11=Glen Eira, STE_NAME11=Victoria, ALBERS_SQM=27852.3109884579
...

You can find more details about writing CQL queries here and more examples in here.

Conclusion

Accessing and analysing geospatial information can be very challenging technically. GeoTools provides a comprehensive and easy-to-use toolkit that allows you to read different formats of geospatial databases. This, combined with the power of CQL, can make your life so much easier when you are trying to work with geospatial databases.

No Comments

Leave a Reply