KnowWhereGraph's GraphDB deployment configuration
KnowWhereGraph uses a single-node GraphDB Enterprise instance to store data and serve data requests.
There are six docker-compose files here. The two main flavors are
- Preloading: These compose files are used for the first upload of data. There are three (local/stage/prod)
- Running: These compose files are used when running GraphDB to serve content. There are three (local/stage/prod)
Data is persisted on the host machine, not in the container. This is achieved by a volume mount, set in the docker-compose file, between the host and GraphDB's home directory. GraphDB stores its repository, configuration, and logging data under /opt/graphdb/home, so mounting this path to the local system persists all three. When a new container is launched, it references the persisted data and loads it.
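One way to confirm that the mount described above is actually in place is to ask Docker which host paths back a running or stopped container. The following is a minimal sketch, not part of the project's tooling: the function names are made up for illustration, and the container name must be supplied by the caller because it depends on the compose file in use.

```shell
# Print "host_path container_path" for every mount of a container.
# The container name is passed in by the caller; it is not taken from
# the project's compose files.
show_mounts() {
    docker inspect --format '{{range .Mounts}}{{println .Source .Destination}}{{end}}' "$1"
}

# Succeed only if something is mounted at (or below) /opt/graphdb/home.
has_home_mount() {
    show_mounts "$1" | grep -q ' /opt/graphdb/home'
}
```

For example, `has_home_mount <container_name> || echo "GraphDB home is not mounted"` would flag a container whose data lives only inside the container.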
GraphDB's initial database is constructed using the importrdf tool from Ontotext. This runs with GraphDB offline and offers much faster data loading than other options. In this process, GraphDB creates a new repository and inserts data into it. To account for this, separate docker-compose files are needed to manage the offline instances.
In order to properly load data,
- The repository configuration must be supplied in graphdb-data/home/data/repositories/<your-repo-name-here>/config.ttl
- The data being imported must be placed in graphdb-data/import-data
- If the repository name is anything other than KWG, modify the Makefile to account for this change
make start-<env>-preload should be called from the project root, where env={local/stage/prod}
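Because a load can run for a long time, it may be worth checking the prerequisites listed above before starting it. The sketch below is an illustration only: the function name check_preload_layout is hypothetical, and it assumes the directory layout given in the list, with the repository name defaulting to KWG.

```shell
# Check that the files the preload step expects are present.
# Usage: check_preload_layout <project-root> [repo-name]
# The repository name defaults to KWG.
check_preload_layout() {
    root=$1
    repo=${2:-KWG}
    config="$root/graphdb-data/home/data/repositories/$repo/config.ttl"
    import_dir="$root/graphdb-data/import-data"
    status=0

    if [ ! -f "$config" ]; then
        echo "missing repository config: $config" >&2
        status=1
    fi
    # A missing or empty import directory means there is nothing to load.
    if [ ! -d "$import_dir" ] || [ -z "$(ls -A "$import_dir" 2>/dev/null)" ]; then
        echo "no import data found in: $import_dir" >&2
        status=1
    fi
    return "$status"
}
```

From the project root, `check_preload_layout . && make start-local-preload` would then start the local preload only when both checks pass.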
Loading KnowWhereGraph's data can take days. Once this is complete,
- The docker container will exit
- Confirm the success by checking the logs in the logs folder here, or by getting the docker logs
- The Data Serving deployment can be initiated (see below)
In the case that something goes wrong, the docker container will most likely exit.
- Get the logs of the stopped container with docker ps --all and docker logs <container_id>
- Also check the mounted logs folder in graphdb-data/logs
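The two docker commands above can be narrowed so that only exited containers are listed and only the end of the log is printed. This is a small sketch with hypothetical helper names; the default of 200 lines is an arbitrary choice, not a project setting.

```shell
# List containers that have exited, one "id name status" line each.
list_exited() {
    docker ps --all --filter status=exited --format '{{.ID}} {{.Names}} {{.Status}}'
}

# Show the last N log lines (default 200) of a container.
# Usage: tail_container_logs <container_id> [lines]
tail_container_logs() {
    # Merge stderr into stdout so the output can be piped as one stream.
    docker logs --tail "${2:-200}" "$1" 2>&1
}
```

Running `list_exited` gives the container id to pass to `tail_container_logs <container_id>`.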
When data doesn't need to be loaded and GraphDB is meant to be started as a service that functions as a normal database,
- Use make start-<env>, where env={local/stage/prod}, to start the service with the rest of the stack
If the stack is running, stop the stack and start it back up with the command above.
GraphDB has several rolling log files that are in the GraphDB home directory, making it difficult to use docker logs <container_name>. Instead, the logs are mounted to the local volume through the docker-compose file.
Updating GraphDB can be achieved by bumping the version in the docker-compose file. Data should be persisted through the mounted graphdb-data folder. The service should then be redeployed by bringing the stack down, and then back up (kludgy, with downtime).
The Knowledge Explorer webapp relies on integrating GraphDB with Elasticsearch. Elasticsearch indexes are created on a per-repository basis: the Manhattan repository has its own Elasticsearch index, the Vienna repository has its own, and so on. Creating these is currently a manual process. The indexes are created through SPARQL, and the SPARQL queries for them are found in the scripts/ folder.
To integrate with Elasticsearch, run the SPARQL queries in the scripts/ folder against the target repository.
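If the queries are to be run from the command line rather than from the Workbench, one option is to post them to the repository's update endpoint. The sketch below rests on several assumptions: that the files in scripts/ are SPARQL updates, that GraphDB is reachable at http://localhost:7200 (its default port), and that no authentication is required. The function name run_sparql_update is hypothetical.

```shell
# Send one SPARQL update file to a repository's update endpoint.
# Usage: run_sparql_update <query-file> <repo-name> [base-url]
run_sparql_update() {
    query_file=$1
    repo=$2
    base_url=${3:-http://localhost:7200}   # assumed default; adjust per environment
    curl --fail --silent --show-error \
        -X POST \
        -H 'Content-Type: application/sparql-update' \
        --data-binary "@$query_file" \
        "$base_url/repositories/$repo/statements"
}
```

A call would look like `run_sparql_update scripts/<query-file> KWG`, repeated once per repository that needs an index.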
If GraphDB is not reachable,
- Make sure the container is running
- Make sure nginx is running
- Check the nginx error logs
- Check the graphdb logs
- Restart the service
If GraphDB is running but is unresponsive,
- Check if there's a data load process happening (can slow the service down)
- Check the GraphDB logs
- Restart the pod
- Run docker ps --all and get the killed container id
- Run docker logs <container_id>