Advanced mining tools

Machine learning and language technology tools are used to derive information from documents. Machine learning software mines the taxonomic relations in documents, and statistical tools suggest classification terms for a document.
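As an illustration of how a statistical tool might suggest classification terms, the sketch below ranks the words of a document by TF-IDF against a small background collection. TF-IDF is used here only as a familiar stand-in; the text above does not say which statistic the tools apply. The names `tokenize` and `suggest_terms` and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def suggest_terms(document, corpus, top_n=3):
    """Rank the words of `document` by TF-IDF against `corpus`.

    `corpus` is a list of background texts and should include `document`.
    This is an assumed scoring scheme, not the product's actual method.
    """
    doc_counts = Counter(tokenize(document))
    n_docs = len(corpus)
    # Document frequency: in how many corpus texts does each word occur?
    doc_freq = Counter()
    for text in corpus:
        doc_freq.update(set(tokenize(text)))
    scores = {}
    for word, count in doc_counts.items():
        # Smoothed IDF: words that occur in every text score zero.
        idf = math.log((1 + n_docs) / (1 + doc_freq[word]))
        scores[word] = count * idf
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]


corpus = [
    "the falcon is a bird of prey and the falcon hunts small birds",
    "the oak is a tree and the tree grows slowly",
    "the river is a stream of water and the water flows to the sea",
]
suggested = suggest_terms(corpus[0], corpus)
print(suggested)
```

Words that appear in every text, such as "the", receive a score of zero and drop out, while words that are frequent in one document but rare elsewhere rise to the top.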

Topics we work on in this product group: 

  • Structure mining (see below)
  • Significant term classification 
  • Semantic relations tools 
  • Concept extractor


Structure mining

Structure mining analyses and maps out the substructures present in dictionaries and encyclopaedias. This information can then be used to make the content easier to search.

The substructures are almost always separate 'islands', each consisting of a cluster of related concepts. If another reference structure is available that is purely hierarchical and contains general concepts related to the dictionary, this reference structure can be used, again with the help of language technology, to combine the separate dictionary substructures into a coherent structure.
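The idea of islands and a reference structure can be sketched roughly as follows. Islands are modelled as the connected components of a graph of cross-references between dictionary entries, and each island is attached to the reference node that most of its concepts map to. The mapping from concept to reference node is given here as a plain lookup table (`reference_parent`), which is an assumption: in practice that mapping is the part that would come from language technology. All function names and sample data are hypothetical.

```python
from collections import defaultdict


def find_islands(links):
    """Group concepts into islands: connected components of the link graph.

    `links` is a list of (concept, related_concept) pairs, treated as
    undirected cross-references.
    """
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = set()
    islands = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        stack, island = [start], set()
        while stack:
            node = stack.pop()
            if node in island:
                continue
            island.add(node)
            stack.extend(neighbours[node] - island)
        seen |= island
        islands.append(island)
    return islands


def attach_islands(islands, reference_parent):
    """Attach each island to the reference node most of its concepts map to.

    `reference_parent` maps a concept to a node of the reference hierarchy
    (assumed to be known already). Islands without any mapped concept are
    collected under None.
    """
    attached = defaultdict(list)
    for island in islands:
        votes = defaultdict(int)
        for concept in island:
            if concept in reference_parent:
                votes[reference_parent[concept]] += 1
        # Majority vote; ties are broken alphabetically for determinism.
        parent = min(votes, key=lambda p: (-votes[p], p)) if votes else None
        attached[parent].append(sorted(island))
    return dict(attached)


links = [("falcon", "hawk"), ("hawk", "eagle"), ("oak", "beech")]
reference_parent = {"falcon": "birds", "eagle": "birds", "oak": "trees"}
islands = find_islands(links)
structure = attach_islands(islands, reference_parent)
print(structure)
```

In this toy example the two islands end up under the reference nodes "birds" and "trees", so that previously unconnected clusters become part of one hierarchy.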

Language technology can be used to link large amounts of existing, non-classified content (i.e. content lacking metadata) to reference structures. For this purpose, training material must be created first. When the content changes later, the same training material can be reused to classify the smaller amounts of new or changed content.
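A minimal sketch of this workflow, assuming a simple naive Bayes text classifier as the learning method (the text does not name one): training material consists of texts already linked to a node of the reference structure, and the resulting model is then applied to unclassified content. The function names, category labels and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Build a model from training material: (text, category) pairs."""
    word_counts = {}
    doc_counts = Counter()
    for text, category in examples:
        word_counts.setdefault(category, Counter()).update(tokenize(text))
        doc_counts[category] += 1
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary


def classify(text, model):
    """Return the most probable category for `text` (multinomial naive Bayes)."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in sorted(doc_counts):
        counts = word_counts[category]
        total_words = sum(counts.values())
        # Log prior: share of training texts in this category.
        score = math.log(doc_counts[category] / total_docs)
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Laplace smoothing avoids zero probabilities.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


training_material = [
    ("falcons and hawks are birds of prey", "birds"),
    ("the eagle is a large bird", "birds"),
    ("the oak is a deciduous tree", "trees"),
    ("beech trees grow in forests", "trees"),
]
model = train(training_material)

# Initial bulk run and a later, smaller batch reuse the same model.
first_label = classify("a hawk is a bird that hunts", model)
later_label = classify("an old oak tree in the forest", model)
print(first_label, later_label)
```

Because the model is built once from the training material, it can be kept and applied again whenever a smaller batch of new or changed content arrives.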