The Pressing Need
We often take Google for granted, especially as everyday consumers – that is, we consume things on the front end, through apps and other services. Having a smartphone and the internet access to go with it seems enough for us to overlook Google’s involvement, along with a certain sense of freedom that comes from the ‘free’ tag attached to most of their services. Sadly, the same cannot be said of the developer, or of any business involved in logistics.
Google Maps, one of their flagship services, is treated as THE go-to service for almost all mapping needs. This is certainly the case in India, as it is in the United States. If you happen to be a company in the logistics and transportation business, you can end up paying Google significant sums. Say you’re in the furniture delivery business: you will need to pay to automate your logistics workflow on the map, since the addresses fall within Google’s map domain – even though those addresses belong to your own loyal customers and the tradesmen involved. And you need to do this repeatedly.
The reason these costs arise is simple – few governments invest in a public map dataset, or in mapping solutions of their own initiative, which makes Google the only viable option. Making businesses pay for data they effectively already own seems a trifle unfair, especially in the wake of negligent governments. Sweden, Denmark, France and Germany are exceptions to this rule – they have taken significant steps toward placing map data and related information in the public domain. That these governments are proactive and progressive is reflected in their relative economic prosperity and living standards – they have not maintained those by simply reacting to crises. Perhaps we can take an example from them, perhaps not.
Regardless, we need a remedy to this situation. Simply put, we need a mapping solution – something easier said than done. People with significant mapping costs are shifting to various low-cost providers such as Mapbox, Mapzen, MapQuest and so on. You can also develop your own solution. The task is not altogether impossible – however, it is hard to build an adaptive mapping service, given that OSM (OpenStreetMap) receives regular, torrential updates. Another problem has to do with digital infrastructure: a solution may work with significant accuracy in the West, but the same cannot be claimed for all countries.
The Necessary Measures
There are three significant steps in developing your own mapping solution:
- Routing
- Geocoding
- Reverse geocoding
The order of implementation here is not important, and varies according to the needs of the business at hand. Let’s assume your company specializes in solving vehicle routing problems. It is then obvious that owning your routing systems adds that much more value to the service!
The task is quite enormous, if one crunches the figures. Finding the best ordering for a single route through 10 cities involves 10! permutations – that is the factorial, 10 × 9 × 8 × … × 1, which comes to roughly 3.63 million orderings that have to be crunched. Figures in this region are simply incomprehensible – thus, finding optimal routes over several countries, with many vehicles and delivery points (not to mention toll booths and a thousand other intricate constraints and variables!) becomes a herculean task, and one that incurs heavy costs as far as Google is concerned.
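The factorial blow-up is easy to verify for yourself. A minimal sketch in Python (the function name is ours, purely illustrative):

```python
import math

def route_permutations(n: int) -> int:
    """Count the possible visiting orders for n stops on a single route."""
    return math.factorial(n)

# The count explodes factorially: 10 stops already give ~3.63 million orderings,
# and 15 stops give over a trillion.
for n in (5, 10, 15):
    print(f"{n} stops -> {route_permutations(n):,} possible orders")
```

This is why brute-force enumeration stops being an option almost immediately, and why dedicated routing engines rely on graph algorithms and heuristics instead.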
So what is the best solution? We recommend OSRM (the Open Source Routing Machine) for this task. We can credit the OSM project (mentioned above) for this gem – and as it is written in C++, it also happens to be speedy.
Now, we do not advocate attempting this at a planetary scale immediately, as that would also demand a heavy monetary investment. The idea is to define the need of the hour and build a suitable, small-scale version at the outset. Later, we can scale it as desired.
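Once a small OSRM instance is running, querying it is a matter of simple HTTP requests. A hedged sketch: the host below (`localhost:5000`) and the Bangalore coordinates are assumptions for illustration, while the `/route/v1/{profile}/{coordinates}` URL shape follows OSRM's documented HTTP API (note that OSRM expects longitude,latitude order):

```python
# Hypothetical: an OSRM instance assumed to be running locally on port 5000.
OSRM_HOST = "http://localhost:5000"

def osrm_route_url(coords, profile="driving"):
    """Build a request URL for OSRM's /route service.

    coords: list of (longitude, latitude) pairs -- OSRM wants lon,lat order.
    """
    pairs = ";".join(f"{lon},{lat}" for lon, lat in coords)
    return f"{OSRM_HOST}/route/v1/{profile}/{pairs}?overview=false"

url = osrm_route_url([(77.5946, 12.9716), (77.6413, 12.9698)])
print(url)
# An actual call would then be, e.g.: requests.get(url).json()
```

The JSON response contains the route distance and duration, which is exactly what a delivery workflow needs to price and schedule trips.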
That covers routing. If you’re involved with mapping, then geocoding and reverse geocoding are necessary. Geocoding is the act of crunching a set of data fragments about a certain location – descriptions, coordinates, addresses and so on – and translating them into an actual position on Earth; reverse geocoding runs the other way, turning coordinates back into an address. While the process sounds simple, it is the most challenging aspect of developing your own systems.
Geocoding and Challenges
The very first problem is that of normalization and address parsing when adding to your datasets. For example, many places share the same address – a Mayflower Avenue exists in many cities, which creates ambiguity in the data. Yet other problems arise. Anybody with basic literacy can differentiate between ‘Strand’ and ‘Stanford’. However, extensions come into play – what if I’m referring to Stanford Church and not the University?
Letters and other extensions create bigger problems. ‘36W 26 St. Fl#7’ can be read as 36W, 26th Street – and the ‘Fl’ can be anything from Floor, to Flat, to even Florida. What gets parsed will change your geocoder’s outcome drastically. Different spellings of the same name are common too – W St John’s St, W Saint John’s St, W St John’s Street and West Saint John’s Street are all equivalent.
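To see why these variants matter, consider a toy normalizer – purely illustrative, and far cruder than a real address parser – that expands a couple of common abbreviations so the four spellings above collapse to one canonical form:

```python
import re

def normalize(addr: str) -> str:
    """Naive address normalizer (illustration only, not production-grade)."""
    # Expand a leading directional abbreviation.
    addr = re.sub(r"^W\b", "West", addr)
    # "St" followed by a capitalised name -> "Saint" (a crude heuristic).
    addr = re.sub(r"\bSt\b(?=\s+[A-Z])", "Saint", addr)
    # A trailing "St" -> "Street".
    addr = re.sub(r"\bSt\b\.?$", "Street", addr)
    return addr

variants = [
    "W St John's St",
    "W Saint John's St",
    "W St John's Street",
    "West Saint John's Street",
]
canonical = {normalize(v) for v in variants}
print(canonical)  # all four variants collapse to a single form
```

Even this tiny example shows the trap: the same token (‘St’) must expand differently depending on position and context, which is precisely what makes hand-rolled rules brittle and machine-learned parsers attractive.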
International addresses make it altogether more tedious. French utilises a mixture of the decimal (base 10) and vigesimal (base 20) systems: quatre-vingt-douze translates to ninety-two, where quatre-vingts is four twenties (4 × 20 = 80), with douze (12) added, making 92. Address schemas likewise vary greatly from country to country.
There is light at the end of this seemingly tedious, data-infested tunnel: a lot of the hard work has already been done by open source enthusiasts. ‘Libpostal’ is a multilingual, international address parser used by many open source databases. It is a machine-learned solution with a high degree of accuracy. ‘Usaddress’ is another option, but it only works for the United States. As open source flexibility goes, you can define your own schema and customize these for your needs.
There are various low-cost providers for geocoding: MapQuest Open Initiative, PickPoint, OpenCage Geocoder, LocationIQ, Mapzen and so on. If we were compelled to pick, OpenCage would be the best option amongst these, as they provide an accuracy score based on the bounding-box size.
All of these run ‘Nominatim’ (the geocoder provided by OSM) behind their instances. If you are after a fully open source solution, you can also host Nominatim yourself.
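Whether you self-host or use the public instance, talking to Nominatim is again a plain HTTP request. A small sketch – the query string is an assumption for illustration, while the `/search?q=…&format=json` parameters follow Nominatim's documented search API (the public instance also requires an identifying User-Agent header):

```python
from urllib.parse import urlencode

# Point this at your own instance if you self-host Nominatim.
NOMINATIM_HOST = "https://nominatim.openstreetmap.org"

def nominatim_search_url(query: str, limit: int = 5) -> str:
    """Build a forward-geocoding request against Nominatim's search API."""
    params = urlencode({"q": query, "format": "json", "limit": limit})
    return f"{NOMINATIM_HOST}/search?{params}"

url = nominatim_search_url("221B Baker Street, London")
print(url)
# An actual call: requests.get(url, headers={"User-Agent": "my-app"}).json()
# The JSON result carries, among other fields, "lat", "lon" and "display_name".
```

Self-hosting removes rate limits and keeps your customers' addresses on your own infrastructure, at the cost of maintaining the OSM data imports yourself.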
Most of this work is a translation of data, as has been emphasized. If you want to transform a string ‘a’ into a string ‘b’, the minimum number of single-character insertions, deletions and substitutions required is called the Levenshtein distance. With prudent address parsing, the right geocode to pick is the candidate with the lowest Levenshtein distance to the address actually provided. Guaranteeing high accuracy in this case requires a bit of reverse engineering.
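The distance itself is a classic dynamic-programming exercise. A minimal sketch, with a helper (our own naming) that picks the geocoder candidate closest to what the user typed:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (classic DP, two rows of memory)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                 # delete ca
                curr[j - 1] + 1,             # insert cb
                prev[j - 1] + (ca != cb),    # substitute ca -> cb
            ))
        prev = curr
    return prev[-1]

def best_candidate(query: str, candidates: list[str]) -> str:
    """Pick the geocoder result closest to the address the user provided."""
    return min(candidates, key=lambda c: levenshtein(query.lower(), c.lower()))

print(levenshtein("kitten", "sitting"))  # the textbook example: 3
print(best_candidate("Mayflower Ave", ["Mayfair Avenue", "Mayflower Avenue"]))
```

In practice you would normalize both strings first (as discussed above) so that trivial abbreviation differences do not inflate the distance.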
Mapzen-supported Pelias is another solution with which you can create your own dataset and schema. It uses Elasticsearch as the database in its architecture. While it is easy to install – a Vagrant installation is available on GitHub – its accuracy is not that good. Flexibility comes at an obvious cost.
So there you have it: a rough idea of how to develop a mapping solution for your enterprise in a manner that optimizes costs without compromising too much on the quality of the mapping system involved. Fingers crossed, one can only hope for developments on that front in the positive direction.