After setting up Tokyo Cabinet and Ruby, it's time to use them. As in my post about MongoDB, I'm going to load 500,000 POIs into a database and query them with a bounding box query. I will use Tokyo Cabinet's table database because it offers the most query options. With a table database you can run exact-match and range queries on numbers, and for strings you can do full matching, forward matching, regular expression matching, and more.
To load the data I need to read my POI shapefile with Ruby and write the attributes to a new database. First we create the database with the following code.
require 'tokyocabinet'
include TokyoCabinet
# create the object
tdb = TDB::new
# open or create the database
if !tdb.open("poi_db.tct", TDB::OWRITER | TDB::OCREAT)
  STDERR.printf("open error: %s\n", tdb.errmsg(tdb.ecode))
end
To read the features in my shapefile I am going to use the Ruby bindings for GDAL/OGR. Because I installed Tokyo Cabinet on GISVM, FWTools was already installed, but I still needed to install the Ruby bindings for it. I did this with the following command.
sudo apt-get install libgdal-ruby
Now we are going to read a shapefile with 500,000 point features and write the records to the database. First we open the shapefile and get the layer. Then we loop over the features, create a new record for each one, and fill it with the x and y coordinates and any other fields that aren't empty. If a feature has a non-empty ID field, its value is used as the primary key; otherwise a generated unique ID is used. The values need to be converted to strings, otherwise the record can't be saved. Finally we put the record in the database.
require 'gdal/ogr'
# open my shapefile
dataset = Gdal::Ogr.open("poi_500000.shp")
layer = dataset.get_layer(0)
feature_defn = layer.get_layer_defn
layer.get_feature_count.times do |i|
  record = Hash.new # create new record
  feature = layer.get_feature(i)
  geom = feature.get_geometry_ref()
  record['x'] = geom.get_x(0).to_s()
  record['y'] = geom.get_y(0).to_s()
  pkey = tdb.genuid # init primary key
  # j is the field index; a separate name avoids shadowing the feature index i
  feature_defn.get_field_count.times do |j|
    field_defn = feature_defn.get_field_defn(j)
    fieldname = field_defn.get_name_ref
    value = feature.get_field_as_string(j)
    if not value.nil? and value != ""
      if fieldname == "ID"
        pkey = value # the ID field becomes the primary key
      else
        record[fieldname] = value.to_s()
      end
    end
  end
  # store the record in Tokyo Cabinet
  tdb.put(pkey, record)
end
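The rule for building a record (convert everything to strings, skip empty values, use the ID field as the primary key) can also be tried out without a shapefile. The following is only a minimal sketch in plain Ruby: `build_record`, its parameters and the sample attribute names are hypothetical and are not part of the Tokyo Cabinet or GDAL/OGR bindings.

```ruby
# Hypothetical helper (not part of any library): turn coordinates and a hash
# of attributes into a [primary_key, record] pair whose values are all strings.
def build_record(x, y, attributes, fallback_key)
  record = { 'x' => x.to_s, 'y' => y.to_s }
  pkey = fallback_key
  attributes.each do |name, value|
    next if value.nil? || value.to_s.empty? # skip empty fields
    if name == 'ID'
      pkey = value.to_s                     # the ID field becomes the primary key
    else
      record[name] = value.to_s
    end
  end
  [pkey, record]
end

# Made-up sample attributes; the empty TYPE field is left out of the record.
pkey, record = build_record(4.7, 50.9, { 'ID' => 42, 'NAME' => 'Station', 'TYPE' => '' }, 'generated-1')
p pkey
p record
```

In the real loop the fallback key would come from `tdb.genuid`, as in the code above.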
To add indexes on the x and y fields we run the following code. This creates two supplementary files called poi_db.tct.idx.x.dec and poi_db.tct.idx.y.dec.
# add index on x and y
tdb.setindex('x', TDB::ITDECIMAL)
tdb.setindex('y', TDB::ITDECIMAL)
To query the POIs in the database, I wrote a function that returns the POIs within a given bounding box, and then I benchmarked it. I used the same bounding box as in my previous posts about MongoDB, Rtree, Pythonnet and PostGIS.
# query POIs by bounding box
def query(tdb, minx, maxx, miny, maxy)
  qry = TDBQRY::new(tdb)
  qry.addcond("x", TDBQRY::QCNUMGE, minx.to_s())
  qry.addcond("x", TDBQRY::QCNUMLE, maxx.to_s())
  qry.addcond("y", TDBQRY::QCNUMGE, miny.to_s())
  qry.addcond("y", TDBQRY::QCNUMLE, maxy.to_s())
  qry.setorder("x", TDBQRY::QONUMASC)
  res = qry.search
  puts res.length # number of results found
  return res
end
require 'benchmark'
puts Benchmark.measure { query(tdb, 4.5, 5.0, 50.5, 51.0) }
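To check what the four conditions select, it can help to have a brute-force version of the same bounding box test in plain Ruby. This is a sketch for cross-checking on a small sample, not a replacement for the indexed query. `in_bbox` and the sample records are hypothetical, and the sketch assumes that a record must satisfy all four conditions at once and that both edges of the box are inclusive.

```ruby
# Plain-Ruby reference for the bounding box test. `records` maps a primary key
# to a hash of string columns, like the hashes stored in the table database.
def in_bbox(records, minx, maxx, miny, maxy)
  hits = records.select do |_key, cols|
    x = cols['x'].to_f
    y = cols['y'].to_f
    x >= minx && x <= maxx && y >= miny && y <= maxy
  end
  # Order by x ascending, mirroring setorder("x", TDBQRY::QONUMASC).
  hits.sort_by { |_key, cols| cols['x'].to_f }.map { |key, _cols| key }
end

# Made-up sample data: 'c' lies east of the box, 'd' sits exactly on its corner.
records = {
  'a' => { 'x' => '4.9', 'y' => '50.7' },
  'b' => { 'x' => '4.6', 'y' => '50.9' },
  'c' => { 'x' => '5.2', 'y' => '50.7' },
  'd' => { 'x' => '4.5', 'y' => '51.0' }
}
p in_bbox(records, 4.5, 5.0, 50.5, 51.0)
```

Because the comparisons use greater-or-equal and less-or-equal, points that lie exactly on the edge of the box are included.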
The query returned 98000 POIs. I ran the benchmark 12 times and these were the results:
1.620000 0.190000 1.810000 ( 1.866339)
1.570000 0.030000 1.600000 ( 1.625303)
1.640000 0.030000 1.670000 ( 1.668573)
1.650000 0.000000 1.650000 ( 1.664806)
1.650000 0.020000 1.670000 ( 1.708228)
1.730000 0.010000 1.740000 ( 1.744645)
1.410000 0.310000 1.720000 ( 1.749268)
1.620000 0.050000 1.670000 ( 1.724199)
1.610000 0.010000 1.620000 ( 1.657794)
1.660000 0.020000 1.680000 ( 1.680383)
1.710000 0.020000 1.730000 ( 1.767141)
1.720000 0.010000 1.730000 ( 1.809114)
According to the Ruby documentation, the benchmark outputs the user CPU time, the system CPU time, the sum of the user and system CPU times, and the elapsed real time (in parentheses), all in seconds. So the query took between 1.63 and 1.87 seconds, about 1.72 seconds on average, to return a list of 98000 POIs within the given bounding box. This is a nice indication of the speed of Tokyo Cabinet.
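Instead of running the script a dozen times by hand, the repeated measurement can be scripted. The sketch below uses only Ruby's standard `benchmark` library. `real_times` is a hypothetical helper, and the summed range is just a stand-in workload that would be replaced by the call to `query`.

```ruby
require 'benchmark'

# Hypothetical helper: run the block `runs` times and collect the elapsed
# real time (in seconds) of each run.
def real_times(runs, &block)
  Array.new(runs) { Benchmark.measure(&block).real }
end

# Stand-in workload; in the post this would be query(tdb, 4.5, 5.0, 50.5, 51.0).
times = real_times(12) { (1..10_000).reduce(:+) }
puts format('min %.6f s, max %.6f s, mean %.6f s',
            times.min, times.max, times.sum / times.length)
```

Reporting the minimum, maximum and mean of the real times gives a more compact summary than the raw list of runs.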
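To demonstrate how you can access the attributes I created the following code. It loops over the first 100 POIs found and prints the ID and the x- and y-coordinates. Because the ID field was used as the primary key when loading, the ID is the key itself and not a column of the record.
res = query(tdb, 4.5, 5.0, 50.5, 51.0)
# print the first hundred found POIs
i = 0
res.each do |rkey|
  rcols = tdb.get(rkey)
  # the ID was stored as the primary key, so print rkey instead of a column
  puts rkey.to_s() + " " + rcols['x'].to_s() + " " + rcols['y'].to_s()
  i += 1
  if i >= 100 # stop after exactly 100 records
    break
  end
end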
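The counter in the loop can be avoided with `Array#first`, which returns at most the requested number of elements. The sketch below assumes the query result is a plain Ruby array of primary keys. To keep it runnable without a database, `keys` and `store` are made-up stand-ins for the search result and for `tdb`.

```ruby
# Stand-ins (assumptions for illustration): `keys` plays the role of the array
# of primary keys returned by the search, `store` the role of the database.
keys  = (1..250).map(&:to_s)
store = keys.each_with_object({}) { |k, h| h[k] = { 'x' => '4.7', 'y' => '50.8' } }

# Array#first(n) returns at most n elements, so no manual counter is needed.
lines = keys.first(100).map do |key|
  cols = store.fetch(key)
  "#{key} #{cols['x']} #{cols['y']}"
end
puts lines
```

With the real database, `store.fetch(key)` would become `tdb.get(key)`.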
Now we are ready to close the database. I hope you enjoyed this post and as always I welcome any comments.
# close the database
if !tdb.close
  ecode = tdb.ecode
  STDERR.printf("close error: %s\n", tdb.errmsg(ecode))
end
Related Posts
Installing Tokyo Cabinet and Ruby on Ubuntu
Populating a MongoDb with POIs
Spatial indexing a MongoDb with Rtree
PostGIS : Loading and querying data