dannet

io.github.kuhumcst/dannet

DanNet - Danish WordNet with rich lexical relationships and SPARQL access.

Documentation

DanNet is a WordNet for the Danish language. DanNet uses RDF as its native representation at the database level, in the application space, and as its primary serialisation format.

Dataset Formats

DanNet is available in multiple formats to maximise compatibility:

  • RDF (Turtle): native representation. Load into any RDF graph database (such as Apache Jena) and query with SPARQL.
  • CSV: published with column metadata as CSVW.
  • WN-LMF: XML format compatible with Python libraries like wn.

Example: Using DanNet with Python

import wn

# Register the DanNet WN-LMF export with the wn library (only needed once).
wn.add("dannet-wn-lmf.xml.gz")

# Look up synsets for the Danish word 'kage' (cake) and print each synset's
# lexicographer file and definition, falling back to "?" when missing.
for synset in wn.synsets('kage'):
    print((synset.lexfile() or "?") + ": " + (synset.definition() or "?"))
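
Example: Querying the RDF with SPARQL (Clojure)

The RDF export can likewise be loaded into Apache Jena and queried with SPARQL. The snippet below is a minimal sketch using plain Jena interop from Clojure; the file name dannet.ttl and the use of rdfs:label on synsets are illustrative assumptions.

(import '[org.apache.jena.rdf.model ModelFactory]
        '[org.apache.jena.query QueryExecutionFactory])

;; Load the Turtle export into an in-memory Jena model (file name is illustrative).
(def model
  (.read (ModelFactory/createDefaultModel) "dannet.ttl" "TTL"))

;; List a handful of synsets (ontolex:LexicalConcept) together with their labels.
(def sparql
  "PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
   PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
   SELECT ?synset ?label
   WHERE { ?synset a ontolex:LexicalConcept ;
                   rdfs:label ?label }
   LIMIT 10")

(with-open [qe (QueryExecutionFactory/create sparql model)]
  (doseq [solution (iterator-seq (.execSelect qe))]
    (println (.get solution "synset") (.get solution "label"))))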

Differences Between Formats

While every format includes all synsets/senses/words, the CSV and WN-LMF variants do not include every data point:

  • CSV: Some data is lost when converting from an open graph to fixed tables.
  • WN-LMF: Only official GWA relations are included per the standard (proprietary DanNet relations from the DanNet schema are excluded).

For the complete dataset, use the RDF format or browse at wordnet.dk.

Companion Datasets

Several companion datasets expand the RDF graph with additional data:

  • COR: links DanNet resources to IDs from the COR project.
  • DDS: adds sentiment data to DanNet resources.
  • OEWN extension: provides DanNet-style labels for the Open English WordNet to facilitate browsing connections between the two datasets.

Inferred Data

Additional data is implicitly inferred from the base dataset, companion datasets, and ontological metadata. These inferences can be browsed at wordnet.dk. Releases containing fully inferred graphs are specifically marked as such.
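
As a rough illustration of the mechanism (not the project's actual bootstrap code), Jena can wrap a base model in an RDFS inference model so that entailed triples become queryable alongside the asserted ones; the file names below are illustrative.

(import '[org.apache.jena.rdf.model ModelFactory])

;; Base data and a schema, loaded as plain models (file names are illustrative).
(def data   (.read (ModelFactory/createDefaultModel) "dannet.ttl" "TTL"))
(def schema (.read (ModelFactory/createDefaultModel) "dannet-schema.ttl" "TTL"))

;; Queries against `inferred` also see triples entailed by the schema,
;; e.g. memberships of superclasses declared via rdfs:subClassOf.
(def inferred (ModelFactory/createRDFSModel schema data))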

Standards

DanNet is based on the Ontolex-lemon standard combined with relations defined by the Global Wordnet Association as used in the official GWA RDF standard.

  • ontolex:LexicalConcept represents synsets
  • ontolex:LexicalSense represents word senses
  • ontolex:LexicalEntry represents words
  • ontolex:Form represents forms

Ontolex-lemon representation

URI Prefixes

  • dn (https://wordnet.dk/dannet/data/): dataset instances
  • dnc (https://wordnet.dk/dannet/concepts/): ontological type members
  • dns (https://wordnet.dk/dannet/schema/): schema definitions

All DanNet URIs resolve to HTTP resources. Accessing one of these URIs via a GET request returns the data for that resource.
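
For example, a resource's data can be fetched with a plain GET request. The sketch below uses Java's built-in HTTP client from Clojure; the synset ID and the text/turtle Accept header are illustrative assumptions.

(import '[java.net URI]
        '[java.net.http HttpClient HttpRequest HttpResponse$BodyHandlers])

;; Hypothetical dn: resource URI; substitute a real synset ID.
(def resource-uri "https://wordnet.dk/dannet/data/synset-999")

(def request
  (-> (HttpRequest/newBuilder (URI/create resource-uri))
      (.header "Accept" "text/turtle") ; ask for an RDF representation
      (.GET)
      (.build)))

(-> (HttpClient/newHttpClient)
    (.send request (HttpResponse$BodyHandlers/ofString))
    (.body)
    (println))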

Schemas

DanNet has proprietary relations defined in the DanNet schema in an Ontolex-compatible way. There is also a schema for EuroWordNet concepts. Both schemas follow the RDF conventions listed by Philippe Martin.

LLM Integration

DanNet can be connected to AI tools like Claude via MCP (Model Context Protocol).

  • MCP server URL: https://wordnet.dk/mcp
  • Registry ID: io.github.kuhumcst/dannet

To connect from e.g. Claude Desktop: go to Settings > Connectors > Browse Connectors, click "add a custom one", then enter a name (e.g. "DanNet") and the MCP server URL.

Claude Desktop setup

Once connected, you can query DanNet's semantic relations directly through Claude.

Implementation

The database backend is Apache Jena, a mature RDF triplestore with OWL inference support. When represented in Jena, DanNet's relations form a queryable knowledge graph. DanNet is developed in Clojure, using libraries like Aristotle to interact with Jena.
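
A rough sketch of what this can look like, assuming Aristotle's graph/read/query API and the prefixes from the URI Prefixes section; the file name is illustrative and the exact forms may differ, so consult queries.md and the Aristotle documentation.

(require '[arachne.aristotle :as aristotle]
         '[arachne.aristotle.registry :as reg]
         '[arachne.aristotle.query :as q])

;; Register prefixes so keywords like :ontolex/LexicalConcept expand to full URIs.
(reg/prefix 'ontolex "http://www.w3.org/ns/lemon/ontolex#")
(reg/prefix 'dn "https://wordnet.dk/dannet/data/")

;; Load the Turtle export into an in-memory graph (file name is illustrative).
(def g (aristotle/read (aristotle/graph :simple) "dannet.ttl"))

;; Find resources typed as synsets using Aristotle's pattern-based query DSL.
(q/run g '[?synset] '[:bgp [?synset :rdf/type :ontolex/LexicalConcept]])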

See rationale.md for more on the design decisions.

Full Production Setup

The production deployment at wordnet.dk consists of three services managed via Docker Compose:

  • DanNet — the Clojure/ClojureScript web application
  • MCP server — a Python-based MCP server providing LLM access to DanNet
  • Caddy — reverse proxy handling HTTPS and routing

Clojure Support

DanNet can be queried in various ways from Clojure (see queries.md). Apache Jena's built-in transaction support enables persistence via the TDB 2 layer.
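
A minimal sketch of the underlying Jena API (not DanNet's own wrapper code): connecting to a TDB 2 dataset on disk and reading inside a transaction. The location string is illustrative.

(import '[org.apache.jena.tdb2 TDB2Factory]
        '[org.apache.jena.query ReadWrite])

;; Connect to (or create) a persistent TDB 2 dataset at the given location.
(def dataset (TDB2Factory/connectDataset "./db/tdb2"))

;; All TDB 2 access happens inside a transaction.
(.begin dataset ReadWrite/READ)
(try
  (println "Triples in the default graph:" (.size (.getDefaultModel dataset)))
  (finally
    (.end dataset)))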

Web Application

The frontend is written in ClojureScript using Rum, served by Pedestal. The app works both as a single-page application (with JavaScript) and as a regular HTML website (without). Content negotiation serves different representations (HTML, RDF, Transit+JSON) based on the request.

See doc/web.md for details.

Bootstrap Process

New releases are bootstrapped from the preceding release. The process (in dk.cst.dannet.db.bootstrap):

  1. Load and clean the previous version's RDF data
  2. Convert to triples using the current schema
  3. Import into Apache Jena graphs and apply release changes
  4. Infer additional triples via OWL/RDFS schemas
  5. Export the final RDF dataset (see Database Release Workflow)

Bootstrap data should be located in ./bootstrap relative to the execution directory.

Setup

DanNet requires Java and Clojure's official CLI tools. Dependencies are specified in deps.edn.

Development

  1. Start the web service using (restart) in dk.cst.dannet.web.service — available at localhost:3456
  2. Run the frontend with shadow-cljs:
    npx shadow-cljs watch app
    

Testing a Release Build

Using Docker (requires Docker daemon running):

# From the docker/ directory
docker compose up --build

Or manually:

shadow-cljs --aliases :frontend release app
clojure -T:build org.corfield.build/uber :lib dk.cst/dannet :main dk.cst.dannet.web.service :uber-file "\"dannet.jar\""
java -jar -Xmx4g dannet.jar

Memory Requirements

The system uses ~1.5 GB when idle and ~3 GB when rebuilding the database. A server should have at least 4 GB of available RAM.

Validating WN-LMF

python3 -m venv examples/venv
source examples/venv/bin/activate
python3 -m pip install wn
python -m wn validate --output-file examples/wn-lmf-validation.json export/wn-lmf/dannet-wn-lmf.xml

Deployment

The production server at wordnet.dk runs as a systemd service delegating to Docker.

Service Setup

cp system/dannet.service /etc/systemd/system/dannet.service
systemctl enable dannet
systemctl start dannet

Updating the Web Service

To update the web service software without changing the database:

# From the docker/ directory
docker compose up -d dannet --build

Database Release Workflow

When releasing a new version of the database:

  1. Build the database locally via REPL in dk.cst.dannet.web.service:

    (restart)
    ;; Then in dk.cst.dannet.db:
    (export-rdf! @dk.cst.dannet.web.resources/db)
    (export-csv! @dk.cst.dannet.web.resources/db)
    (export-wn-lmf! "export/wn-lmf/")
    
  2. Stop the service on production:

    docker compose stop dannet
    
  3. Transfer database and export files via SFTP, then:

    unzip -o tdb2.zip -d /dannet/db/
    mv cor.zip dannet.zip dds.zip oewn-extension.zip /dannet/export/rdf/
    mv dannet-csv.zip /dannet/export/csv/
    mv dannet-wn-lmf.xml.gz /dannet/export/wn-lmf/
    
  4. Restart:

    docker compose up -d dannet --build