A free, open-source who-what-where-when semantic tagger for catalogue data. The tagger is based on the open-source AnnoCultor and on four vocabularies of tags and categories, all licensed under Creative Commons.
Imagine a catalogue record: a product in a webshop, an offer in an online auction, a museum object, a news article or a blog post; it does not matter which.
Suppose it is a Dutch record describing a horse, with the field
what:paard (Dutch for horse).
A search for 'horse' would not return it.
This tagging tool would match 'paard' to the corresponding term from a vocabulary
and pull its multilingual names: kôň ; cavallo ; hest ; horse ; cheval ; ganado equino ; кон ; hevonen ; pferd ; häst ; konj ; kůň ; hobune ; paard ; arklys ; ίππος/άλογο ; cavalos ; ló ; cal ; лошадь ; koń.
This horse can now be found by your international customers searching in different languages.
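The matching step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the AnnoCultor API: the vocabulary layout, the concept id `concept:horse`, and the names `build_index`, `tag_record`, `what_concept` and `what_labels` are all hypothetical, and the vocabulary holds just a handful of the labels listed above.

```python
# Toy vocabulary (hypothetical layout): one concept with labels in several
# languages, taken from the list of horse names above.
VOCABULARY = {
    "concept:horse": {
        "nl": "paard",
        "en": "horse",
        "fr": "cheval",
        "de": "pferd",
        "it": "cavallo",
    },
}


def build_index(vocabulary):
    """Map every case-folded label to the id of the concept it belongs to."""
    index = {}
    for concept_id, labels in vocabulary.items():
        for label in labels.values():
            index[label.casefold()] = concept_id
    return index


def tag_record(record, vocabulary, index):
    """Return a copy of the record, enriched if its 'what' field is in the vocabulary."""
    enriched = dict(record)
    term = record.get("what", "").strip().casefold()
    concept_id = index.get(term)
    if concept_id is not None:
        enriched["what_concept"] = concept_id
        enriched["what_labels"] = sorted(vocabulary[concept_id].values())
    return enriched


INDEX = build_index(VOCABULARY)
tagged = tag_record({"what": "paard"}, VOCABULARY, INDEX)
print(tagged["what_labels"])
# ['cavallo', 'cheval', 'horse', 'paard', 'pferd']
```

A term that is missing from the vocabulary leaves the record unchanged, which mirrors the limitation that the tagger can only find what the vocabularies list.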
The same is done for places, time periods, and people. In addition, places get coordinates and population figures, so that they can be placed on a map; fuzzy time periods, such as 'late 13th century', get exact begin and end dates, so that they can be placed on a timeline. The tool performs data cleaning, term retrieval, and disambiguation. It works no miracles, however: it can only find things that are listed in the vocabularies and cannot go beyond them. This technology was successfully used to tag nearly 20 million records in Europeana.
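As an illustration of how a fuzzy time period could be turned into exact dates, here is a minimal Python sketch. The convention is an assumption made for this example and is not taken from the tagger: the N-th century is taken to run from year (N-1)*100+1 to N*100, and 'early', 'mid' and 'late' select roughly the first, middle and last third. The function name `period_to_years` is hypothetical.

```python
import re

# Accepts phrases such as "13th century", "late 13th century" or "late 13-th century".
PATTERN = re.compile(r"^(?:(early|mid|late)\s+)?(\d{1,2})-?(?:st|nd|rd|th)\s+century$")

# Assumed convention: offsets (in years) from the first year of the century.
OFFSETS = {
    None: (0, 99),      # the whole century
    "early": (0, 32),   # first third
    "mid": (33, 65),    # middle third
    "late": (66, 99),   # last third
}


def period_to_years(text):
    """Return (begin_year, end_year) for a century phrase, or None if not recognised."""
    match = PATTERN.match(text.strip().lower())
    if match is None:
        return None
    qualifier = match.group(1)
    century = int(match.group(2))
    first_year = (century - 1) * 100 + 1
    start, end = OFFSETS[qualifier]
    return first_year + start, first_year + end


print(period_to_years("late 13th century"))
# (1267, 1300)
```

Phrases that do not fit the pattern return `None`, so the caller can leave such records untagged rather than guess.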
The tagger is deployed on the Google App Engine platform. After a few minutes of inactivity the tagger hibernates and then needs up to a minute to start up again.