<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://ontogen.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://ontogen.io/" rel="alternate" type="text/html" /><updated>2026-03-19T22:30:03+00:00</updated><id>https://ontogen.io/feed.xml</id><title type="html">Ontogen</title><subtitle>&quot;A Version Control System for RDF datasets&quot;</subtitle><author><name>Marcel Otto</name></author><entry><title type="html">DCAT-R, Gno, and RDF.ex 3.0</title><link href="https://ontogen.io/blog/2026/03/19/dcat-r-gno-and-rdf-ex-3.html" rel="alternate" type="text/html" title="DCAT-R, Gno, and RDF.ex 3.0" /><published>2026-03-19T10:00:00+00:00</published><updated>2026-03-19T10:00:00+00:00</updated><id>https://ontogen.io/blog/2026/03/19/dcat-r-gno-and-rdf-ex-3</id><content type="html" xml:base="https://ontogen.io/blog/2026/03/19/dcat-r-gno-and-rdf-ex-3.html"><![CDATA[<p>It has been almost a year since the last project update, and a lot has happened behind the scenes. The most substantial part of the current development phase is not finished yet - but in the course of this work, several sub-projects have emerged that are independently useful and ready for release today.</p>

<p>As described in the <a href="/roadmap-update-1">previous roadmap update</a>, the development required extracting and generalizing several foundational components. Today, I am pleased to announce three of these:</p>

<ol>
  <li><strong>Gno</strong> - a library for managing RDF datasets in SPARQL triple stores</li>
  <li><strong>DCAT-R</strong> - a specification, vocabulary, and Elixir implementation for describing RDF repositories</li>
  <li><strong>RDF.ex 3.0</strong> - with the new <code class="language-plaintext highlighter-rouge">RDF.Data.Source</code> protocol for polymorphic RDF data access</li>
</ol>

<p>Each of these grew out of Ontogen’s internals but has been designed to stand on its own. In the following sections, I will introduce each project, explain where it comes from, and highlight what is new.</p>

<h2 id="gno">Gno</h2>

<p><a href="https://hex.pm/packages/gno">Gno</a> is a library for managing RDF datasets in SPARQL triple stores. The name “Gno” comes from the Greek root for “knowledge” (as in <em>gnosis</em>). It provides a unified API that abstracts the differences between storage backends, so you can work with your data the same way regardless of the underlying store. Built-in adapters are available for Apache Jena Fuseki, Oxigraph, QLever, and Ontotext GraphDB, and any other SPARQL 1.1-compatible store can be configured with explicit endpoint URLs. Gno also normalizes behavioral differences between stores - for instance, it transparently handles the divergent default graph semantics (isolated vs. union) across backends.</p>

<p>Readers of the <a href="/introduction/part-3">Repository and Service Model</a> article will recognize the store-related parts of Gno. That article introduced the <code class="language-plaintext highlighter-rouge">og:Store</code> concept and the overall service architecture that connects a repository with a storage backend. Gno is, in essence, an extraction of this store adapter system and the surrounding data management operations into an independent library. What was previously only available as part of Ontogen - the store adapter abstraction, the SPARQL operation API, the changeset and configuration system - is now usable on its own, without any dependency on Ontogen’s versioning machinery.</p>

<p>Gno covers all standard SPARQL operations: SELECT, ASK, CONSTRUCT, and DESCRIBE queries, INSERT and DELETE updates, and the graph management operations CREATE, DROP, CLEAR, COPY, ADD, and MOVE. Beyond raw SPARQL, it provides two higher-level systems: a changeset system, described next, and the commit system covered in the following section.</p>

<p>The <strong>changeset system</strong> expresses structured changes through four actions: <em>add</em> (insert new statements), <em>update</em> (property-level overwrite), <em>replace</em> (subject-level overwrite), and <em>remove</em> (delete statements). Before changes are applied, a changeset can be converted to an <em>effective changeset</em>, which queries the current state and computes only the minimal changes actually needed: statements that already exist are not added again, and statements that do not exist are not removed.</p>
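
<p>A small worked example may help to illustrate the effective changeset. It is a hand-written sketch in Turtle with hypothetical <code class="language-plaintext highlighter-rouge">ex:</code> resources, not output produced by Gno. The changesets are written as comments, since their concrete representation is not covered in this post:</p>

<pre><code class="language-turtle">@prefix ex: &lt;http://example.com/&gt; .   # hypothetical namespace

# Current state of the graph in the store:
ex:alice ex:name "Alice" ;
         ex:age  41 .

# Requested changeset:
#   add:    ex:alice ex:name  "Alice" .              (already present)
#           ex:alice ex:email "alice@example.com" .
#   remove: ex:alice ex:age   41 .
#           ex:alice ex:phone "555-0100" .           (not present)

# Effective changeset after comparing with the current state:
#   add:    ex:alice ex:email "alice@example.com" .
#   remove: ex:alice ex:age   41 .
</code></pre>

<p>Only the two statements that would actually change the stored graph remain, so applying the effective changeset has the same result as applying the requested one.</p>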

<h3 id="the-commit-system">The commit system</h3>

<p>The main addition that Gno brings beyond what existed in Ontogen is an extensible commit system. While the store operations and changeset system were extracted largely unchanged, the commit system is a new layer designed from the start to support middleware-based extensibility.</p>

<p>The commit processor implements a state machine that orchestrates the application of changes through well-defined phases:</p>

<pre><code class="language-mermaid">graph LR
    A[init] --&gt; B[preparing]
    B --&gt; C[prepared]
    C --&gt; D[starting\ntransaction]
    D --&gt; E[applying\nchanges]
    E --&gt; F[changes\napplied]
    F --&gt; G[ending\ntransaction]
    G --&gt; H[finalizing]
    H --&gt; I[completed]

    B -.-&gt;|error| J[rollback]
    C -.-&gt;|error| J
    D -.-&gt;|error| J
    E -.-&gt;|error| J
    F -.-&gt;|error| J
    G -.-&gt;|error| J

    style A fill:#e8e8e8,stroke:#333,stroke-width:2px
    style I fill:#c2f0c2,stroke:#333,stroke-width:2px
    style J fill:#f0c2c2,stroke:#333,stroke-width:2px
</code></pre>

<p>At each state transition, the configured middleware pipeline is invoked. Middleware components can participate in every phase of the commit lifecycle - they can validate changes before they are applied, enrich the commit with additional metadata, add supplementary changes to other graphs, or perform cleanup after completion. If an error occurs at any point during the transactional phases, the processor automatically rolls back all changes.</p>

<p>Middleware is configured declaratively in the service manifest. For example, to enable commit logging:</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">@prefix</span><span class="w"> </span><span class="nn">gno:</span><span class="w"> </span><span class="nl">&lt;https://w3id.org/gno#&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">

</span><span class="nl">&lt;CommitOperation&gt;</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">gno:</span><span class="n">CommitOperation
</span><span class="w">    </span><span class="p">;</span><span class="w"> </span><span class="nn">gno:</span><span class="n">commitMiddleware</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="nl">&lt;Logger&gt;</span><span class="w"> </span><span class="p">)</span><span class="w">
</span><span class="p">.</span><span class="w">

</span><span class="nl">&lt;Logger&gt;</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">gno:</span><span class="n">CommitLogger
</span><span class="w">    </span><span class="p">;</span><span class="w"> </span><span class="nn">gno:</span><span class="n">commitLogLevel</span><span class="w"> </span><span class="s">"debug"</span><span class="w">
    </span><span class="p">;</span><span class="w"> </span><span class="nn">gno:</span><span class="n">commitLogChanges</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span><span class="p">.</span><span class="w">
</span></code></pre></div></div>

<p>This middleware architecture is the primary extension point through which higher-level systems build on Gno. Ontogen, for instance, implements its entire versioning logic - creating commit objects, updating the history graph, advancing the repository HEAD - as Gno commit middleware. This means Ontogen’s versioning is not a separate mechanism layered on top; it participates directly in Gno’s transactional commit lifecycle, with full rollback support.</p>

<p>For more details, see the <a href="https://rdf-elixir.dev/gno-guide/">Gno User Guide</a> or its <a href="https://hexdocs.pm/gno">API documentation</a>.</p>

<h2 id="dcat-r">DCAT-R</h2>

<p>The <a href="/introduction/part-3">Repository and Service Model</a> article introduced the idea of modeling Ontogen repositories as DCAT catalogs and Ontogen instances as DCAT services. The <code class="language-plaintext highlighter-rouge">og:Repository</code> was defined as a DCAT catalog containing the user dataset and the history graph; the <code class="language-plaintext highlighter-rouge">og:Service</code> combined a repository with a store backend.</p>

<p>During the subsequent development, this pattern kept recurring. Gno, the store management library introduced above, needed the same kind of structure. So did other projects in the pipeline. In each case, the application was an RDF infrastructure service - providing generic capabilities like store access, versioning, or identity management over RDF datasets - and in each case, the same organizational questions arose: How are graphs organized? Which are user data, which are configuration, which are operational infrastructure?</p>

<p>What these applications share is that they leverage RDF’s universality not just for the user data they manage, but also for their own configuration and metadata. The repository description, the service settings, the graph organization - it is all RDF, stored as named graphs alongside the user data. When application structure and user data coexist in the same dataset, the need for principled organization naturally arises.</p>

<p>DCAT-R (Data Catalog Vocabulary for RDF Repositories) addresses this need. It is a <a href="https://w3id.org/dcatr">language-independent specification</a> of a vocabulary extending the W3C’s <a href="https://www.w3.org/TR/vocab-dcat-3/">Data Catalog Vocabulary (DCAT) 3</a>, alongside an Elixir implementation (<a href="https://hex.pm/packages/dcatr">DCAT-R.ex</a>).</p>

<p>Where DCAT focuses on an external perspective - cataloging datasets for discovery, describing service endpoints for consumers - DCAT-R adds an <strong>intra-service perspective</strong>: vocabulary for how a service organizes its data internally. It models this internal structure using DCAT’s own concepts: a repository is a <code class="language-plaintext highlighter-rouge">dcat:Catalog</code>, each graph is a <code class="language-plaintext highlighter-rouge">dcat:Dataset</code>, the service remains a <code class="language-plaintext highlighter-rouge">dcat:DataService</code>. This means existing DCAT tooling can process DCAT-R descriptions without any knowledge of the DCAT-R vocabulary - it simply sees catalogs containing datasets served by data services.</p>

<p>DCAT-R works on two levels. At its simplest, it provides vocabulary for describing RDF datasets at the graph level - classifying graphs by purpose, organizing them into directories, attaching metadata. But it is also designed as a foundation for application frameworks: applications extend DCAT-R by subclassing <code class="language-plaintext highlighter-rouge">dcatr:Service</code> with their own operations, adding application-specific <code class="language-plaintext highlighter-rouge">dcatr:SystemGraph</code> subclasses for operational data, and extending the manifest with application-specific configuration. DCAT-R provides the organizational skeleton; applications fill it with their operations.</p>

<h3 id="the-four-level-hierarchy">The four-level hierarchy</h3>

<p>DCAT-R models RDF repositories through a four-level hierarchy, each level refining a DCAT 3 concept:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Service         (what you can do)
 └── Repository (what you have - distributable)
      └── Dataset   (the user data)
           └── Graph     (individual RDF graphs)
</code></pre></div></div>

<ul>
  <li><strong>Service</strong> (<code class="language-plaintext highlighter-rouge">dcatr:Service</code>, extends <code class="language-plaintext highlighter-rouge">dcat:DataService</code>): The operations layer. A service provides access to a repository and defines what operations are available.</li>
  <li><strong>Repository</strong> (<code class="language-plaintext highlighter-rouge">dcatr:Repository</code>, extends <code class="language-plaintext highlighter-rouge">dcat:Catalog</code>): A managed collection that bundles an RDF dataset with operational infrastructure and catalog metadata. Analogous to a software repository that combines content with build scripts, configuration, and metadata.</li>
  <li><strong>Dataset</strong> (<code class="language-plaintext highlighter-rouge">dcatr:Dataset</code>, extends <code class="language-plaintext highlighter-rouge">dcat:Catalog</code>): The actual RDF 1.1 dataset - the user data that the repository manages, modeled as a catalog of its constituent data graphs.</li>
  <li><strong>Graph</strong> (<code class="language-plaintext highlighter-rouge">dcatr:Graph</code>, extends <code class="language-plaintext highlighter-rouge">dcat:Dataset</code>): An individual RDF graph carrying its own metadata.</li>
</ul>
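
<p>As a minimal sketch, the hierarchy could be written down in Turtle as follows. The namespace IRI <code class="language-plaintext highlighter-rouge">https://w3id.org/dcatr#</code>, the reading of “extends” as <code class="language-plaintext highlighter-rouge">rdfs:subClassOf</code>, and all <code class="language-plaintext highlighter-rouge">ex:</code> resources are assumptions made for illustration; the specification remains the authoritative source:</p>

<pre><code class="language-turtle">@prefix rdfs:  &lt;http://www.w3.org/2000/01/rdf-schema#&gt; .
@prefix dcat:  &lt;http://www.w3.org/ns/dcat#&gt; .
@prefix dcatr: &lt;https://w3id.org/dcatr#&gt; .   # assumed namespace IRI
@prefix ex:    &lt;http://example.com/&gt; .       # hypothetical resources

# Each level refines a DCAT 3 concept ("extends" read as rdfs:subClassOf).
dcatr:Service    rdfs:subClassOf dcat:DataService .
dcatr:Repository rdfs:subClassOf dcat:Catalog .
dcatr:Dataset    rdfs:subClassOf dcat:Catalog .
dcatr:Graph      rdfs:subClassOf dcat:Dataset .

# A hypothetical instance chain through the four levels.
ex:service    a dcatr:Service ;    dcatr:serviceRepository ex:repository .
ex:repository a dcatr:Repository ; dcatr:repositoryDataset ex:dataset .
ex:dataset    a dcatr:Dataset ;    dcatr:dataGraph ex:people .
ex:people     a dcatr:DataGraph .
</code></pre>

<p>With RDFS inference applied, a tool that only knows DCAT would see <code class="language-plaintext highlighter-rouge">ex:service</code> as a <code class="language-plaintext highlighter-rouge">dcat:DataService</code> and <code class="language-plaintext highlighter-rouge">ex:repository</code> as a <code class="language-plaintext highlighter-rouge">dcat:Catalog</code>.</p>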

<h3 id="multi-graph-support">Multi-graph support</h3>

<p>The original Ontogen model only supported a single graph. DCAT-R supports two patterns for organizing data graphs within a repository:</p>

<ul>
  <li><strong>Multi-graph pattern</strong>: <code class="language-plaintext highlighter-rouge">dcatr:repositoryDataset</code> links to a <code class="language-plaintext highlighter-rouge">dcatr:Dataset</code> catalog containing multiple data graphs. An optional <code class="language-plaintext highlighter-rouge">dcatr:repositoryPrimaryGraph</code> designates one graph as the main entry point.</li>
  <li><strong>Single-graph shortcut</strong>: <code class="language-plaintext highlighter-rouge">dcatr:repositoryDataGraph</code> links directly to a single data graph, which also serves as the primary graph.</li>
</ul>
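
<p>The two patterns might look like this in a repository description. This is a sketch under the assumption that the namespace IRI is <code class="language-plaintext highlighter-rouge">https://w3id.org/dcatr#</code>; the <code class="language-plaintext highlighter-rouge">ex:</code> resources are hypothetical:</p>

<pre><code class="language-turtle">@prefix dcatr: &lt;https://w3id.org/dcatr#&gt; .   # assumed namespace IRI
@prefix ex:    &lt;http://example.com/&gt; .       # hypothetical resources

# Multi-graph pattern: the repository links to a dataset catalog.
ex:repoA a dcatr:Repository ;
    dcatr:repositoryDataset      ex:datasetA ;
    dcatr:repositoryPrimaryGraph ex:main .    # optional entry point

ex:datasetA a dcatr:Dataset ;
    dcatr:dataGraph ex:main, ex:archive .

# Single-graph shortcut: the repository links directly to its one
# data graph, which also serves as the primary graph.
ex:repoB a dcatr:Repository ;
    dcatr:repositoryDataGraph ex:only .
</code></pre>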

<p>This lays the groundwork for multi-graph support in Ontogen.</p>

<h3 id="distribution-boundary">Distribution boundary</h3>

<p>A key architectural addition is the clear separation between <strong>distributed</strong> data and <strong>local</strong> data:</p>

<pre><code class="language-mermaid">graph TD
    A[dcatr:Service] --&gt;|dcatr:serviceRepository| B(dcatr:Repository)
    A --&gt;|dcatr:serviceLocalData| C(dcatr:ServiceData)

    B --&gt;|dcatr:repositoryDataset| D[dcatr:Dataset]
    B --&gt;|dcatr:repositoryManifestGraph| E[dcatr:RepositoryManifestGraph]
    B --&gt;|dcatr:repositorySystemGraph| F[dcatr:SystemGraph\n- distributed -]

    D --&gt;|dcatr:dataGraph| G[dcatr:DataGraph 1]
    D --&gt;|dcatr:dataGraph| H[dcatr:DataGraph n]

    C --&gt;|dcatr:serviceManifestGraph| I[dcatr:ServiceManifestGraph]
    C --&gt;|dcatr:serviceWorkingGraph| J[dcatr:WorkingGraph]
    C --&gt;|dcatr:serviceSystemGraph| K[dcatr:SystemGraph\n- local -]

    style A fill:#ccd1e0,stroke:#333,stroke-width:4px
    style B fill:#d1c2f0,stroke:#333,stroke-width:2px
    style C fill:#f0e6c2,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
    style D fill:#f0e6ff,stroke:#333,stroke-width:2px
    style E fill:#f0e6ff,stroke:#333,stroke-width:2px
    style F fill:#f0e6ff,stroke:#333,stroke-width:2px
    style G fill:#fff,stroke:#333,stroke-width:1px
    style H fill:#fff,stroke:#333,stroke-width:1px
    style I fill:#fff5e0,stroke:#333,stroke-width:1px
    style J fill:#fff5e0,stroke:#333,stroke-width:1px
    style K fill:#fff5e0,stroke:#333,stroke-width:1px
</code></pre>

<p>The <strong>Repository</strong> contains everything that is part of the distribution: the dataset with its data graphs, the repository manifest graph with DCAT catalog metadata, and distributed system graphs (e.g., version history, provenance). When the repository is replicated or shared, all of this travels together.</p>

<p><strong>ServiceData</strong> contains everything local to a particular service instance: the service manifest graph with instance-specific configuration, working graphs for temporary data, and local system graphs (caches, logs). Service data is never distributed.</p>

<p>This separation enables multi-instance deployments where different service instances serve the same repository with different configurations or storage backends.</p>
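
<p>Such a multi-instance deployment could be sketched as two services sharing one repository while keeping their own service data. The properties are those from the diagram above; the namespace IRI and all <code class="language-plaintext highlighter-rouge">ex:</code> resources are assumptions for illustration:</p>

<pre><code class="language-turtle">@prefix dcatr: &lt;https://w3id.org/dcatr#&gt; .   # assumed namespace IRI
@prefix ex:    &lt;http://example.com/&gt; .       # hypothetical resources

# Distributed: travels with the repository when it is replicated.
ex:repository a dcatr:Repository ;
    dcatr:repositoryDataset       ex:dataset ;
    dcatr:repositoryManifestGraph ex:repositoryManifest ;
    dcatr:repositorySystemGraph   ex:history .

# Local: each service instance keeps its own service data.
ex:serviceA a dcatr:Service ;
    dcatr:serviceRepository ex:repository ;
    dcatr:serviceLocalData  ex:serviceDataA .

ex:serviceDataA a dcatr:ServiceData ;
    dcatr:serviceManifestGraph ex:serviceManifestA ;
    dcatr:serviceWorkingGraph  ex:stagingA .

ex:serviceB a dcatr:Service ;
    dcatr:serviceRepository ex:repository ;
    dcatr:serviceLocalData  ex:serviceDataB .

ex:serviceDataB a dcatr:ServiceData ;
    dcatr:serviceManifestGraph ex:serviceManifestB ;
    dcatr:serviceSystemGraph   ex:cacheB .
</code></pre>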

<h3 id="graph-naming">Graph naming</h3>

<p>In an RDF dataset, graph names are dataset-local identifiers - they are not inherently suited as global identifiers in a distributed context. When a repository is replicated or shared, this becomes a problem: which names are stable, globally meaningful identities, and which are just local conventions of a particular service instance?</p>

<p>DCAT-R addresses this by consistently separating a graph’s <strong>graph ID</strong> - its RDF resource URI, serving as a globally stable identifier - from the <strong>local graph name</strong> under which it appears in a particular service’s RDF dataset. Following the distribution boundary principle, graph IDs belong to the repository (distributed), while local graph names belong to the service configuration (local). By default, DCAT-R uses the graph ID as the graph name. When a different local name is needed, the <code class="language-plaintext highlighter-rouge">dcatr:localGraphName</code> property allows defining one in the service manifest.</p>

<p>The same principle underlies the distinction between <strong>primary graph</strong> (a repository-level concept: the graph that operations target by default) and <strong>default graph</strong> (a service-level concept: the unnamed graph in the RDF dataset). The <code class="language-plaintext highlighter-rouge">dcatr:usePrimaryAsDefault</code> property controls the relationship between these two.</p>
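
<p>The separation of graph ID and local graph name could be sketched as follows. Which resource carries <code class="language-plaintext highlighter-rouge">dcatr:localGraphName</code> and what form its value takes are assumptions here, as are the namespace IRI and the example IRIs:</p>

<pre><code class="language-turtle">@prefix dcatr: &lt;https://w3id.org/dcatr#&gt; .   # assumed namespace IRI
@prefix ex:    &lt;http://example.com/&gt; .       # hypothetical resources

# Repository side (distributed): the graph ID is the resource IRI.
ex:people a dcatr:DataGraph .

# Service manifest (local): this instance stores the same graph
# under a different name in its RDF dataset.
# Assumption: the property is stated on the graph with an IRI value.
ex:people dcatr:localGraphName &lt;http://localhost/graphs/people&gt; .
</code></pre>

<p>Without the second statement, the graph would simply be stored under its graph ID, <code class="language-plaintext highlighter-rouge">http://example.com/people</code>.</p>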

<h3 id="graph-type-taxonomy">Graph type taxonomy</h3>

<p>Every graph in DCAT-R belongs to exactly one of four disjoint types:</p>

<ul>
  <li><strong>DataGraph</strong>: User data forming the dataset content</li>
  <li><strong>ManifestGraph</strong>: DCAT-R configuration and catalog metadata (with subtypes <code class="language-plaintext highlighter-rouge">RepositoryManifestGraph</code> and <code class="language-plaintext highlighter-rouge">ServiceManifestGraph</code>)</li>
  <li><strong>SystemGraph</strong>: Application-specific operational data (e.g., version history, indexes, provenance records)</li>
  <li><strong>WorkingGraph</strong>: Temporary, service-local graphs for drafts, staging, or caches</li>
</ul>

<p>These four types are defined as pairwise disjoint OWL classes whose union equals <code class="language-plaintext highlighter-rouge">dcatr:Graph</code>, ensuring that every graph has an unambiguous classification. This enables applications to reliably distinguish user data from infrastructure without relying on naming conventions.</p>
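
<p>One standard way to state such a partition in OWL is shown below. It is a sketch of the idea, assuming the namespace IRI <code class="language-plaintext highlighter-rouge">https://w3id.org/dcatr#</code>; the actual axioms in the vocabulary may be written differently, for example with <code class="language-plaintext highlighter-rouge">owl:disjointUnionOf</code>:</p>

<pre><code class="language-turtle">@prefix owl:   &lt;http://www.w3.org/2002/07/owl#&gt; .
@prefix dcatr: &lt;https://w3id.org/dcatr#&gt; .   # assumed namespace IRI

# Every graph is a member of one of the four types ...
dcatr:Graph owl:equivalentClass [
    a owl:Class ;
    owl:unionOf ( dcatr:DataGraph dcatr:ManifestGraph
                  dcatr:SystemGraph dcatr:WorkingGraph )
] .

# ... and no graph can be a member of two of them.
[] a owl:AllDisjointClasses ;
    owl:members ( dcatr:DataGraph dcatr:ManifestGraph
                  dcatr:SystemGraph dcatr:WorkingGraph ) .
</code></pre>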

<h3 id="manifest-system">Manifest system</h3>

<p>Building on Ontogen’s original <code class="language-plaintext highlighter-rouge">Ontogen.Config</code>, the configuration system has been formalized as a two-graph manifest system reflecting the distribution boundary: the <strong>repository manifest graph</strong> carries distributed catalog metadata, while the <strong>service manifest graph</strong> carries instance-local configuration. Additionally, DCAT-R introduces <strong>Manifest Graph Expansion</strong> (MGE), a mechanism for automatically including referenced resources from a shared pool into the appropriate manifest graphs. This provides a DRY pattern for shared resources (such as agent descriptions) across multiple manifest files.</p>

<h3 id="directory-support">Directory support</h3>

<p>Real-world RDF repositories can contain dozens or hundreds of named graphs. DCAT-R introduces <code class="language-plaintext highlighter-rouge">dcatr:Directory</code> as a hierarchical containment mechanism for organizing graphs into named collections, much like a filesystem organizes files into directories. Directories can be nested to arbitrary depth, and each graph belongs to at most one directory. When graph URIs follow a hierarchical naming scheme, directories can make this structure explicit and navigable.</p>

<h3 id="dcat-rex">DCAT-R.ex</h3>

<p><a href="https://hex.pm/packages/dcatr">DCAT-R.ex</a> is the Elixir implementation of the DCAT-R specification. It provides <a href="https://hex.pm/packages/grax">Grax</a>-based schemas for all DCAT-R classes and a manifest loading pipeline that resolves environment-specific configurations from Turtle files.</p>

<p>The key design principle of DCAT-R.ex is extensibility through behaviors. Applications define specialized types by implementing:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">DCATR.Service.Type</code> - to define a service with custom operations and configuration</li>
  <li><code class="language-plaintext highlighter-rouge">DCATR.Repository.Type</code> - to add distributed system graphs</li>
  <li><code class="language-plaintext highlighter-rouge">DCATR.ServiceData.Type</code> - to add local system graphs or working graphs</li>
  <li><code class="language-plaintext highlighter-rouge">DCATR.Manifest.Type</code> - to register the specialized service type and optionally integrate custom configuration logic (such as the <a href="/introduction/part-4">Bog</a>-based interpretation used in Ontogen)</li>
</ul>

<h3 id="gno-as-a-dcat-r-service">Gno as a DCAT-R service</h3>

<p>Gno itself is a concrete example of this extension pattern. A <code class="language-plaintext highlighter-rouge">gno:Service</code> is a subclass of <code class="language-plaintext highlighter-rouge">dcatr:Service</code> that adds two elements:</p>

<pre><code class="language-mermaid">graph TD
    A[gno:Service] --&gt;|dcatr:serviceRepository| B(dcatr:Repository)
    A --&gt;|gno:serviceStore| C(gno:Store)
    A --&gt;|gno:serviceCommitOperation| D(gno:CommitOperation)

    B --&gt;|dcatr:repositoryDataset| E[dcatr:Dataset]
    B --&gt;|dcatr:repositoryManifestGraph| F[dcatr:RepositoryManifestGraph]

    C --&gt;|rdf:type| G[gnoa:Fuseki / gnoa:Oxigraph / ...]

    D --&gt;|gno:commitMiddleware| H["( Middleware 1, Middleware 2, ... )"]

    style A fill:#ccd1e0,stroke:#333,stroke-width:4px
    style B fill:#d1c2f0,stroke:#333,stroke-width:2px
    style C fill:#c2f0d1,stroke:#333,stroke-width:2px
    style D fill:#c2f0d1,stroke:#333,stroke-width:2px
    style E fill:#f0e6ff,stroke:#333,stroke-width:2px
    style F fill:#f0e6ff,stroke:#333,stroke-width:2px
</code></pre>

<ol>
  <li>A <strong>Store</strong> (<code class="language-plaintext highlighter-rouge">gno:Store</code>) representing the SPARQL triple store backend, with vendor-specific subclasses that know how to construct the correct endpoint URLs (e.g., <code class="language-plaintext highlighter-rouge">gnoa:Fuseki</code> constructs Fuseki’s <code class="language-plaintext highlighter-rouge">/{dataset}/sparql</code>, <code class="language-plaintext highlighter-rouge">/{dataset}/update</code>, etc. from a dataset name).</li>
  <li>A <strong>CommitOperation</strong> (<code class="language-plaintext highlighter-rouge">gno:CommitOperation</code>) carrying the middleware pipeline configuration for the commit system.</li>
</ol>

<p>This means Gno does not introduce its own repository model - it reuses DCAT-R’s repository, dataset, and graph structure and adds only the store-related and commit-related configuration on top. Systems that build on Gno (like Ontogen) can in turn extend the Gno service type further, adding their own system graphs (a history graph in this case), commit middleware, and application-specific configuration - all within the DCAT-R framework.</p>
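
<p>Put together, a service manifest for a Gno service might contain statements like the following. This is a sketch based on the diagram above and the logging example from the commit system section; the <code class="language-plaintext highlighter-rouge">dcatr:</code> namespace IRI is an assumption, and the store is typed only as <code class="language-plaintext highlighter-rouge">gno:Store</code> because the <code class="language-plaintext highlighter-rouge">gnoa:</code> namespace and the store-specific properties are not given in this post:</p>

<pre><code class="language-turtle">@prefix gno:   &lt;https://w3id.org/gno#&gt; .
@prefix dcatr: &lt;https://w3id.org/dcatr#&gt; .   # assumed namespace IRI

&lt;Service&gt; a gno:Service ;
    dcatr:serviceRepository    &lt;Repository&gt; ;
    gno:serviceStore           &lt;Store&gt; ;
    gno:serviceCommitOperation &lt;CommitOperation&gt; .

# In practice a vendor-specific subclass such as gnoa:Fuseki would be
# used here, together with its connection settings.
&lt;Store&gt; a gno:Store .

&lt;CommitOperation&gt; a gno:CommitOperation ;
    gno:commitMiddleware ( &lt;Logger&gt; ) .

&lt;Logger&gt; a gno:CommitLogger .
</code></pre>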

<h2 id="rdfex-30">RDF.ex 3.0</h2>

<p>Alongside these higher-level frameworks, <a href="https://hex.pm/packages/rdf">RDF.ex 3.0</a> brings a significant redesign of the <code class="language-plaintext highlighter-rouge">RDF.Data</code> API, among other improvements. The previous <code class="language-plaintext highlighter-rouge">RDF.Data</code> protocol is now structured in two parts, following Elixir’s <code class="language-plaintext highlighter-rouge">Enumerable</code>/<code class="language-plaintext highlighter-rouge">Enum</code> pattern:</p>

<ul>
  <li>The <code class="language-plaintext highlighter-rouge">RDF.Data.Source</code> protocol defines a minimal set of primitives that RDF data structures implement.</li>
  <li>The <code class="language-plaintext highlighter-rouge">RDF.Data</code> module builds a rich, user-friendly API on top of these primitives, providing functions for iteration, transformation, navigation, aggregation, and conversion.</li>
</ul>

<p>Just as implementing <code class="language-plaintext highlighter-rouge">Enumerable</code> for a custom data structure gives access to all of <code class="language-plaintext highlighter-rouge">Enum</code>’s functions, implementing <code class="language-plaintext highlighter-rouge">RDF.Data.Source</code> gives access to the entire <code class="language-plaintext highlighter-rouge">RDF.Data</code> API. This enables uniform processing of RDF data regardless of whether it comes from an <code class="language-plaintext highlighter-rouge">RDF.Description</code>, an <code class="language-plaintext highlighter-rouge">RDF.Graph</code>, an <code class="language-plaintext highlighter-rouge">RDF.Dataset</code>, or a custom implementation.</p>

<p>For a comprehensive overview, see the new <a href="https://rdf-elixir.dev/rdf-ex/rdf-data">RDF.Data section in the user guide</a> or the <a href="https://hexdocs.pm/rdf/RDF.Data.html">API documentation</a>.</p>

<h2 id="ontogen-status">Ontogen status</h2>

<p>Ontogen has already been fully migrated to DCAT-R and Gno. The architecture now forms a clean three-layer stack:</p>

<ul>
  <li><strong>DCAT-R</strong> provides the structural vocabulary</li>
  <li><strong>Gno</strong> adds store operations</li>
  <li><strong>Ontogen</strong> adds versioning semantics</li>
</ul>

<p>In concrete terms: Ontogen services are now realized as DCAT-R services (via Gno), and Ontogen’s versioning logic - creating commit objects, writing to the history graph, updating the repository HEAD - is implemented as Gno commit middleware.</p>

<p>However, completing Ontogen’s next version also depends on two other projects that are not release-ready yet. The new version of Ontogen is planned for release this summer, together with the release of these other projects.</p>

<p>One item from the original roadmap that will not be realized is the planned DID integration by Patrick, whose other commitments did not leave him enough time to pursue this work.</p>

<p>As always, I would like to express my sincere gratitude to the <a href="https://nlnet.nl">NLnet Foundation</a> for their continued support through the <a href="https://nlnet.nl/core">NGI Zero Core</a> fund, which makes all of this work possible.</p>]]></content><author><name>Marcel Otto</name></author><category term="blog" /><summary type="html"><![CDATA[It has been almost a year since the last project update, and a lot has happened behind the scenes. The most substantial part of the current development phase is not finished yet - but in the course of this work, several sub-projects have emerged that are independently useful and ready for release today.]]></summary></entry><entry><title type="html">Project Update: New Roadmap for 2025 and release announcements</title><link href="https://ontogen.io/blog/2025/04/09/roadmap-update-1.html" rel="alternate" type="text/html" title="Project Update: New Roadmap for 2025 and release announcements" /><published>2025-04-09T10:00:00+00:00</published><updated>2025-04-09T10:00:00+00:00</updated><id>https://ontogen.io/blog/2025/04/09/roadmap-update-1</id><content type="html" xml:base="https://ontogen.io/blog/2025/04/09/roadmap-update-1.html"><![CDATA[<p>The work on Mud, the extracted and enhanced version of Bog, had been progressing well towards solving the problem of static UUID graph names in Ontogen repositories. However, when it came to extending automatic URI generation and management to the resources of the dataset itself, it became increasingly clear that Bog’s strategy alone was insufficient. So, after going back to the drawing board, I’ve developed a new approach. While requiring significantly more effort than initially planned, it promises a comprehensive solution to the URI generation problem which, as described in the <a href="/introduction/part-4">Bog article</a>, I consider one of the fundamental challenges of the Semantic Web.</p>

<p>During the design of this new solution, it also became clear that the project must be split up further, which calls for one more extraction: the DCAT-based Repository and Service model (described in <a href="/introduction/part-3">this article</a>). This separation will make the dataset management capabilities available independently while allowing Ontogen to focus exclusively on its core versioning functionality.</p>

<p>The project’s evolution, particularly additional requirements that emerged for the planned DID implementation, has also led to several updates to the RDF on Elixir libraries, some of which are being released today:</p>

<ul>
  <li>JSON-LD.ex v1.0 with JSON-LD 1.1 support</li>
  <li>RDF.ex v2.1 with <code class="language-plaintext highlighter-rouge">rdf:JSON</code> literal support</li>
  <li>Grax v0.6 with <code class="language-plaintext highlighter-rouge">rdf:JSON</code> support and ordered lists based on <code class="language-plaintext highlighter-rouge">rdf:List</code>s</li>
</ul>

<p>Please refer to the respective CHANGELOGs for a comprehensive list of the changes.</p>

<p>Further planned library updates include:</p>

<ul>
  <li>Support for the upcoming RDF 1.2 specification in RDF.ex</li>
  <li>Support for JSON-LD framing</li>
  <li>Two patterns used repeatedly to solve problems in the new design will be implemented as Grax extensions:
    <ul>
      <li>Managing temporal values in RDF, tracking how values change over time while maintaining their history</li>
      <li>Handling multiple language-tagged string values as a single value, enabling functional property behavior for localized strings</li>
    </ul>
  </li>
</ul>

<p>Some previously announced features, at least Sagas, will need to be deferred to a future development phase to accommodate this more fundamental work.</p>

<p>While this represents a shift from the original roadmap, I strongly believe that the benefits of the planned comprehensive URI management outweigh the delay of other features.</p>

<p>I’d like to express my sincere gratitude to the NLnet Foundation for their continued support, making all of this possible.</p>]]></content><author><name>Marcel Otto</name></author><category term="blog" /><summary type="html"><![CDATA[The work on Mud, the extracted and enhanced version of Bog, had been progressing well towards solving the problem of static UUID graph names in Ontogen repositories. However, when it came to extending automatic URI generation and management to the resources of the dataset itself, it became increasingly clear that Bog’s strategy alone was insufficient. So, after going back to the drawing board, I’ve developed a new approach. While requiring significantly more effort than initially planned, it promises a comprehensive solution to the URI generation problem which, as described in the Bog article, I consider one of the fundamental challenges of the Semantic Web.]]></summary></entry><entry><title type="html">Roadmap for next NLnet funding</title><link href="https://ontogen.io/blog/2024/10/22/roadmap.html" rel="alternate" type="text/html" title="Roadmap for next NLnet funding" /><published>2024-10-22T10:00:00+00:00</published><updated>2024-10-22T10:00:00+00:00</updated><id>https://ontogen.io/blog/2024/10/22/roadmap</id><content type="html" xml:base="https://ontogen.io/blog/2024/10/22/roadmap.html"><![CDATA[<p>I’m pleased to announce that the NLnet Foundation has approved follow-up funding for Ontogen from the <a href="https://nlnet.nl/commonsfund/">NGI0 Commons Fund</a>. I’m grateful for NLnet’s trust in this project and their general commitment to advancing open source technologies.</p>

<p>Unfortunately, the future of such funding through the EU is uncertain. There are plans to redirect these funds in questionable directions, potentially affecting this and many other open source projects. For more information on this concerning development, see this <a href="https://fsfe.org/news/2024/news-20240719-01.en.html">article by the Free Software Foundation</a>.</p>

<p>Fortunately, the funding for Ontogen’s development is secured for the coming year. With this support, the plan is to overcome the current limitations mentioned in the introductory articles (and now also clearly stated in the project READMEs) and enhance Ontogen’s functionality. Here’s an overview of the planned work.</p>

<h2 id="roadmap">Roadmap</h2>

<h3 id="mud">Mud</h3>

<p><a href="https://ontogen.io/introduction/part-4">Bog, Ontogen’s configuration language</a> and corresponding precompiler for automatic ID management in RDF, will be extracted from Ontogen and realized as its own project, as its functions can be usefully applied outside the versioning context. In the process, it will be renamed to Mud.</p>

<p>The problem of cryptic graph names will be solved by extending the mechanism for generating URIs for resources to include the ability to later rename them to custom URIs and track the past URIs. The possibility of generating URIs, previously limited to the resources of an Ontogen repository, will be extended to the resources of the version-controlled dataset.</p>

<p>Furthermore, the range of functions will be expanded to include additional typical and extensible operations that one might want to perform when working with RDF datasets, such as URI normalizations, RDF smushing, etc.</p>

<h3 id="saga-synchronization-protocol">Saga Synchronization Protocol</h3>

<p>Sagas introduce a synchronization protocol that keeps different storage locations in sync by using Ontogen’s version history itself. Just as a VCS can be viewed as a synchronization solution for copies of a repo, Sagas can be viewed as “clones” within an Ontogen instance that are kept in sync via the shared version history.</p>

<p>This allows us to turn necessity into virtue and actually duplicate and synchronize not just the repository configuration in the file system, but the entire content of the dataset. This duplication opens up new possibilities. By having a complete copy of the repository in the file system, we enable the use of the many file-based tools for working with the data. For example, we can use ordinary text editors to edit the data, which is kept in sync with the triple store.</p>

<p>This significantly enhances the flexibility and accessibility of the data, allowing users to leverage their preferred tools and workflows. Furthermore, this approach allows for integration with Git. By leveraging Git’s versioning capabilities alongside Ontogen’s specialized RDF versioning, we can achieve better collaboration, history backups and migrations, and even more flexible workflows in RDF dataset management, especially until Ontogen is more mature and can offer similar functionality itself. Ontogen will never offer Git’s syntactic versioning, however, so for users who want or need that, this integration remains useful in the long term.</p>

<h3 id="multi-graph-dataset-versioning">Multi-Graph Dataset Versioning</h3>

<p>Supporting versioning of datasets with multiple graphs is a larger undertaking, unfortunately, as RDF-star, which forms the basis of Ontogen’s versioning model, only allows annotation of triples and not quads. Therefore, a rather complex tracking of the assignment of individual triples to graphs must be realized at the level of the RDF-star annotations of RTC compounds.</p>

<h3 id="did-integration">DID Integration</h3>

<p>In addition to these core developments, <a href="https://www.patrickoscity.de/">Patrick Oscity</a> will contribute by developing an Elixir implementation of W3C Decentralized Identifiers (DIDs). This implementation will be integrated into Mud and Ontogen to support the automatic generation of DIDs, providing a standardized, interoperable format for persistent identifiers with enhanced privacy features, for datasets where this is needed or useful.</p>

<h2 id="scalability-problem">Scalability problem</h2>

<p>One limitation that has not been mentioned so far: Ontogen, in its current form, is not suitable for large datasets. While I work on the planned developments, I’m also exploring solutions to this scalability issue.</p>

<p>One issue that caused timeouts with larger queries has at least been fixed in the recently released version 0.1.3. If you’ve already installed Ontogen, you can upgrade using:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>brew upgrade ontogen
</code></pre></div></div>]]></content><author><name>Marcel Otto</name></author><category term="blog" /><summary type="html"><![CDATA[I’m pleased to announce that the NLnet foundation has approved follow-up funding for Ontogen from the NGI0 Commons Fund. I’m grateful for NLnet’s trust in this project and their general commitment to advancing open source technologies.]]></summary></entry><entry><title type="html">Ontogen Configuration with Bog</title><link href="https://ontogen.io/introduction/part-4" rel="alternate" type="text/html" title="Ontogen Configuration with Bog" /><published>2024-08-14T07:00:00+00:00</published><updated>2024-08-14T07:00:00+00:00</updated><id>https://ontogen.io/introduction/ontogen-configuration-with-bog</id><content type="html" xml:base="https://ontogen.io/introduction/part-4"><![CDATA[<p>This is the fourth and last of a series of four blog posts introducing the different parts of the Ontogen version control system and the ideas behind it:</p>

<ol>
  <li><a href="/introduction/part-1">Introducing Ontogen</a></li>
  <li><a href="/introduction/part-2">Ontogen’s Versioning Model</a></li>
  <li><a href="/introduction/part-3">Ontogen’s Repository and Service Model</a></li>
  <li><strong><a href="/introduction/part-4">Ontogen Configuration with Bog</a></strong></li>
</ol>

<hr />

<p>In the <a href="/introduction/part-3">previous article</a>, we explored the interpretation of <code class="language-plaintext highlighter-rouge">dcat:Catalog</code> and <code class="language-plaintext highlighter-rouge">dcat:DataService</code> concepts within the framework of Ontogen as a Data Control Management (DCM) system. Now, let’s turn our attention to another crucial aspect: the configuration of Ontogen components.</p>

<p>We’ve seen that an Ontogen service represents a concrete instantiation of a repository on a user’s machine, consisting of the repository itself and its associated triple store. Configuring these components presents us with several challenges, particularly when it comes to naming and identifying resources.</p>

<p>In this article, we’ll delve into Ontogen’s configuration system. We’ll explore Bog, a specialized configuration language developed for Ontogen. Bog addresses some of the fundamental challenges in configuring RDF-based systems and offers innovative solutions for resource naming and identification.</p>

<h2 id="bog">Bog</h2>

<p>“Naming is hard.” This is especially true in the RDF world with its URIs, and even more so under Linked Data rules, where the preference for HTTP URIs further complicates the issue by introducing DNS as a social authority, thus bringing societal and political aspects into play.</p>

<p>To be clear: URIs as a central component of the RDF model are one of the reasons that make it so powerful. Nevertheless, the additional complexity this creates for the naming problem is, in my opinion, one of the reasons why it still causes reservations compared to other data models. Particularly when combined with the lack of automated solutions for this problem, it makes traditional, relational data model proponents shake their heads and happily continue to generate records (or models of their ORMs) with automatically generated primary keys.</p>

<p>Bog aims to automate the problem of resource naming (aka URI minting) for a specific class of resources, initially and specifically for the resources of an Ontogen service that need to be minted as part of the configuration.</p>

<p>To this end, Bog introduces some special RDF properties and resources with particular semantics that are processed by a Bog interpreter. The most fundamental of these properties, to which all others are ultimately reduced, is <code class="language-plaintext highlighter-rouge">bog:ref</code>. With <code class="language-plaintext highlighter-rouge">bog:ref</code>, a resource initially identified by a blank node can be given a locally valid name.</p>

<p>Example:</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># we omit this prefix in the following code snippets</span><span class="w">
</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">:</span><span class="w"> </span><span class="nl">&lt;https://w3id.org/bog#&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">

</span><span class="p">[</span><span class="w"> </span><span class="nn">:</span><span class="n">ref</span><span class="w"> </span><span class="s">"this-service-instance"</span><span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">og:</span><span class="n">Service</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="p">.</span><span class="w">
</span></code></pre></div></div>

<p>When the Bog interpreter first processes this, it mints a new, locally named resource: a random salt is stored in a file with the specified name, and the blank node is replaced with a UUIDv5 URI generated from that salt. On each subsequent interpretation, loading the salt reproduces exactly the same UUIDv5 URI, so the configuration is consistently translated to the same graph.</p>

<p>Note: the name itself is not part of this hash. The locally used <code class="language-plaintext highlighter-rouge">:ref</code> name can therefore be changed at any time by renaming the salt file and all occurrences of the name in the local configuration files, while still yielding the same UUID URIs.</p>

<p>To prevent accidental changes, Bog throws an error if the salt file is not present; a missing salt file is only allowed during initial minting.</p>
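
<p>To make the effect of the interpretation more concrete, here is a sketch of what the example above might look like after the Bog interpreter has run. The <code class="language-plaintext highlighter-rouge">urn:uuid:</code> form of the URI and the UUID value itself are assumptions made up for illustration; the actual value depends on the random salt:</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@prefix og: &lt;https://w3id.org/ontogen#&gt; .

# Bog source (before interpretation):
#   [ :ref "this-service-instance" ; a og:Service ] .

# Interpreted result: the blank node is replaced by a UUIDv5 URI.
# The UUID below is a made-up placeholder, not real Bog output.
&lt;urn:uuid:6f3b8c1e-2a4d-5e7f-9b0c-1d2e3f4a5b6c&gt; a og:Service .
</code></pre></div></div>

<p>As long as the salt file is kept, every later interpretation would yield this same URI.</p>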

<p>Bog offers another solution to avoid name changes: the indexical <code class="language-plaintext highlighter-rouge">bog:this</code> property. The concept of <a href="https://en.wikipedia.org/wiki/Indexicality">indexicality</a> comes from the philosophy of language and refers to linguistic expressions whose meaning depends on the context of their utterance. In Bog, this concept is applied to the referencing of resources. It allows referencing individual instances of classes that represent a unique individual relative to this instance in the context of the executing instance. Consequently, these quasi-relative singletons can also be referenced directly via their class in the context of the executing instance.</p>

<p>Example: In the context of execution on an Ontogen instance, there is exactly one distinguished individual of the class <code class="language-plaintext highlighter-rouge">og:Service</code> that represents this very instance as an <code class="language-plaintext highlighter-rouge">og:Service</code>. Thus, in Bog, it can also be referenced using the <code class="language-plaintext highlighter-rouge">bog:this</code> property as follows:</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="w"> </span><span class="nn">:</span><span class="n">this</span><span class="w"> </span><span class="nn">og:</span><span class="n">Service</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="p">.</span><span class="w">
</span></code></pre></div></div>

<p>This is interpreted by the Bog interpreter as follows:</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="w"> </span><span class="nn">:</span><span class="n">ref</span><span class="w"> </span><span class="s">"service"</span><span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">og:</span><span class="n">Service</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="p">.</span><span class="w">
</span></code></pre></div></div>

<p>In fact, almost all elements of an <code class="language-plaintext highlighter-rouge">og:Service</code> are unique singletons relative to this instance and thus indexically referenceable via <code class="language-plaintext highlighter-rouge">bog:this</code> and the corresponding class name.</p>

<p>Additionally, Bog provides the indexical <code class="language-plaintext highlighter-rouge">bog:I</code> resource, which can be used to reference the user of the system:</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">:</span><span class="n">I</span><span class="w"> </span><span class="nn">ex:</span><span class="n">p</span><span class="w"> </span><span class="nn">ex:</span><span class="n">O</span><span class="w"> </span><span class="p">.</span><span class="w">
</span></code></pre></div></div>

<p>This is interpreted as:</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="w"> </span><span class="nn">:</span><span class="n">this</span><span class="w"> </span><span class="nn">og:</span><span class="n">Agent</span><span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="nn">ex:</span><span class="n">p</span><span class="w"> </span><span class="nn">ex:</span><span class="n">O</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="p">.</span><span class="w">
</span></code></pre></div></div>
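
<p>Putting the two rewriting steps together, a statement about <code class="language-plaintext highlighter-rouge">:I</code> would ultimately reduce to a plain <code class="language-plaintext highlighter-rouge">:ref</code> form. The following sketch assumes that the local name derived for <code class="language-plaintext highlighter-rouge">og:Agent</code> is <code class="language-plaintext highlighter-rouge">"agent"</code>, by analogy with <code class="language-plaintext highlighter-rouge">"service"</code> above; the <code class="language-plaintext highlighter-rouge">ex:</code> namespace is a placeholder:</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@prefix : &lt;https://w3id.org/bog#&gt; .
@prefix og: &lt;https://w3id.org/ontogen#&gt; .
@prefix ex: &lt;http://example.org/&gt; .

# Step 0, as written by the user:
#   :I ex:p ex:O .
# Step 1, after resolving the indexical resource :I:
#   [ :this og:Agent ; ex:p ex:O ] .
# Step 2, after resolving :this (the name "agent" is an assumption):
[ :ref "agent" ; a og:Agent ; ex:p ex:O ] .
</code></pre></div></div>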

<p>Through local interpretation, Bog thus allows us to reference the various components of an Ontogen service without having to name them, but instead getting automatically generated, stable URIs for these resources.</p>

<p>It is planned to spin off Bog as its own project with expanded functionality in the next version of Ontogen. In particular, it should be possible to give the resources managed with Bog proper Linked Data URIs at a later point in time if needed.</p>

<p>In future versions, there are plans to apply the Bog-based minting process to resources of the version-controlled dataset itself, and to include this minting process as a speech act within the version history. The secret salts possessed by the creator of such resources would then give them cryptographic control over the further use and modification possibilities of these resources.</p>

<p>These developments aim to enhance Bog’s capabilities and integrate it more deeply with Ontogen’s versioning system, potentially offering new levels of security and control over resource management.</p>

<h2 id="bog-based-configuration-of-ontogen-services-and-repositories">Bog-based configuration of Ontogen services and repositories</h2>

<p>With this mechanism, we have a method for generating persistent and reproducible UUID URIs. Now, let’s look at how the concrete specification of Ontogen services and their components (store, repository, etc.) works in practice with the Bog interpreter.</p>

<p>For a project created with the Ontogen CLI, the directory structure typically looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>my_dataset/  
├── .ontogen  
│    ├── .salts  
│    │   ├── agent.salt  
│    │   ├── dataset.salt  
│    │   ├── fuseki.salt  
│    │   ├── history.salt  
│    │   ├── oxigraph.salt  
│    │   ├── repository.salt  
│    │   ├── service.salt  
│    │   └── store.salt  
│    └── config  
│        ├── agent.bog.ttl  
│        ├── dataset.bog.ttl  
│        ├── fuseki.bog.ttl  
│        ├── oxigraph.bog.ttl  
│        ├── repository.bog.ttl  
│        ├── service.bog.ttl  
│        └── store.bog.ttl  
│── ...
</code></pre></div></div>

<p>Each of the <code class="language-plaintext highlighter-rouge">.ontogen/config/*.bog.ttl</code> files configures exactly one singleton instance of the respective class of an Ontogen service instance, to which the <code class="language-plaintext highlighter-rouge">bog:this</code> in the respective Bog Turtle file refers. The salt for generating the URI is persisted in the respective <code class="language-plaintext highlighter-rouge">.ontogen/.salts/*.salt</code> file of the same name.</p>

<p>Rather than using a single large configuration file, a modular approach is followed here, where each component is configured in its own Bog Turtle file. This approach is particularly advisable because the component descriptions should contain not only the configuration of the respective component (for which there are few options in this first version) but also the general description of the respective resource, i.e., the description of the <code class="language-plaintext highlighter-rouge">og:Service</code> as a <code class="language-plaintext highlighter-rouge">dcat:DataService</code>, the description of the <code class="language-plaintext highlighter-rouge">og:Dataset</code> as a <code class="language-plaintext highlighter-rouge">dcat:Catalog</code> or the <code class="language-plaintext highlighter-rouge">og:Agent</code> as a <code class="language-plaintext highlighter-rouge">foaf:Agent</code>, etc. A complete description of all these resources in one file would quickly become very large and confusing.</p>

<p>Here’s an example of the configuration of the <code class="language-plaintext highlighter-rouge">og:Dataset</code> with a complete DCAT description:</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">@prefix</span><span class="w"> </span><span class="nn">:</span><span class="w"> </span><span class="nl">&lt;https://w3id.org/bog#&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">og:</span><span class="w"> </span><span class="nl">&lt;https://w3id.org/ontogen#&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">dcat:</span><span class="w"> </span><span class="nl">&lt;http://www.w3.org/ns/dcat#&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">dcterms:</span><span class="w"> </span><span class="nl">&lt;http://purl.org/dc/terms/&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">foaf:</span><span class="w"> </span><span class="nl">&lt;http://xmlns.com/foaf/0.1/&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">xsd:</span><span class="w"> </span><span class="nl">&lt;http://www.w3.org/2001/XMLSchema#&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">

</span><span class="p">[</span><span class="w"> </span><span class="nn">:</span><span class="n">this</span><span class="w"> </span><span class="nn">og:</span><span class="n">Dataset
</span><span class="w">  </span><span class="p">;</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">dcat:</span><span class="n">Dataset
</span><span class="w">  </span><span class="p">;</span><span class="w"> </span><span class="nn">dcterms:</span><span class="n">title</span><span class="w"> </span><span class="s">"My Dataset"</span><span class="na">@en</span><span class="w">
  </span><span class="p">;</span><span class="w"> </span><span class="nn">dcterms:</span><span class="n">description</span><span class="w"> </span><span class="s">"An example dataset"</span><span class="na">@en</span><span class="w">
  </span><span class="p">;</span><span class="w"> </span><span class="nn">dcterms:</span><span class="n">created</span><span class="w"> </span><span class="s">"2023-08-13"</span><span class="p">^^</span><span class="nn">xsd:</span><span class="n">date
</span><span class="w">  </span><span class="p">;</span><span class="w"> </span><span class="nn">dcterms:</span><span class="n">creator</span><span class="w"> </span><span class="nn">:</span><span class="n">I
</span><span class="w">  </span><span class="p">;</span><span class="w"> </span><span class="nn">dcterms:</span><span class="n">publisher</span><span class="w"> </span><span class="nn">:</span><span class="n">I
</span><span class="w">  </span><span class="p">;</span><span class="w"> </span><span class="nn">dcat:</span><span class="n">contactPoint</span><span class="w"> </span><span class="nn">:</span><span class="n">I
</span><span class="w">  </span><span class="p">;</span><span class="w"> </span><span class="nn">dcat:</span><span class="n">keyword</span><span class="w"> </span><span class="s">"RDF"</span><span class="na">@en</span><span class="p">,</span><span class="w"> </span><span class="s">"Ontology"</span><span class="na">@en</span><span class="p">,</span><span class="w"> </span><span class="s">"Versioning"</span><span class="na">@en</span><span class="w">
  </span><span class="p">;</span><span class="w"> </span><span class="nn">dcat:</span><span class="n">theme</span><span class="w"> </span><span class="nl">&lt;http://example.org/themes/semantic-web&gt;</span><span class="w">
  </span><span class="p">;</span><span class="w"> </span><span class="nn">dcterms:</span><span class="n">license</span><span class="w"> </span><span class="nl">&lt;http://creativecommons.org/licenses/by/4.0/&gt;</span><span class="w">
</span><span class="p">]</span><span class="w"> </span><span class="p">.</span><span class="w">
</span></code></pre></div></div>

<p>(You can find more examples of the configuration in the <a href="/docs/user-guide/#setting-up-a-repository-in-a-triple-store">User Guide</a>.)</p>

<p>Additionally, similar to Git, global Bog Turtle files can specify more general default values for certain configurations and metadata, for example about the user agent in general:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">/etc/ontogen.conf.bog.ttl</code></li>
  <li><code class="language-plaintext highlighter-rouge">~/.ontogen.conf.bog.ttl</code></li>
</ul>

<p>The structure and interpretation of the description of an Ontogen service and its components thus looks as follows:</p>

<ol>
  <li>
    <p>When Ontogen starts, all configuration files are loaded into a graph, starting with the global files. During this incremental construction of the graph, values for properties of the same resource are overwritten by files loaded later. This ensures that the values from the global configuration files only serve as default values and the repository-specific configuration files can completely overwrite these values if necessary.</p>
  </li>
  <li>The following Bog Turtle fragment is then added to this graph to ensure the basic static linking of the resources of the <code class="language-plaintext highlighter-rouge">og:Service</code> aggregate:
    <div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="nn">:</span><span class="n">this</span><span class="w"> </span><span class="nn">og:</span><span class="n">Service</span><span class="w">  
     </span><span class="p">;</span><span class="w"> </span><span class="nn">og:</span><span class="n">serviceRepository</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="nn">:</span><span class="n">this</span><span class="w"> </span><span class="nn">og:</span><span class="n">Repository</span><span class="w">  
         </span><span class="p">;</span><span class="w"> </span><span class="nn">og:</span><span class="n">repositoryDataset</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="nn">:</span><span class="n">this</span><span class="w"> </span><span class="nn">og:</span><span class="n">Dataset</span><span class="w"> </span><span class="p">]</span><span class="w">  
         </span><span class="p">;</span><span class="w"> </span><span class="nn">og:</span><span class="n">repositoryHistory</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="nn">:</span><span class="n">this</span><span class="w"> </span><span class="nn">og:</span><span class="n">History</span><span class="w"> </span><span class="p">]</span><span class="w">  
     </span><span class="p">]</span><span class="w">  
      
     </span><span class="p">;</span><span class="w"> </span><span class="nn">og:</span><span class="n">serviceOperator</span><span class="w"> </span><span class="nn">:</span><span class="n">I</span><span class="w">  
 </span><span class="p">]</span><span class="w"> </span><span class="p">.</span><span class="w">
</span></code></pre></div>    </div>
  </li>
  <li>
    <p>Then the Bog precompiler is applied to this graph, resolving blank nodes to URIs according to the existing salts or minting new ones.</p>
  </li>
  <li>Finally, the resulting graph is loaded using <a href="https://github.com/rdf-elixir/grax">Grax</a> (the RDF data mapper for Elixir used by Ontogen) into a deeply nested, native Elixir structure for the Ontogen service and all its components, which then serves as the state of the <code class="language-plaintext highlighter-rouge">Ontogen</code> singleton GenServer.</li>
</ol>
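
<p>The override behavior of the first step can be illustrated with a small sketch. The file contents, the name and the mailbox are hypothetical, and the use of <code class="language-plaintext highlighter-rouge">foaf:name</code> and <code class="language-plaintext highlighter-rouge">foaf:mbox</code> for the agent description is an assumption made for this example:</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@prefix : &lt;https://w3id.org/bog#&gt; .
@prefix og: &lt;https://w3id.org/ontogen#&gt; .
@prefix foaf: &lt;http://xmlns.com/foaf/0.1/&gt; .

# Global defaults, e.g. in ~/.ontogen.conf.bog.ttl (loaded first):
[ :this og:Agent
  ; foaf:name "Jane Doe"
  ; foaf:mbox &lt;mailto:jane@example.org&gt;
] .

# Repository-specific .ontogen/config/agent.bog.ttl (loaded later):
[ :this og:Agent
  ; foaf:name "Jane Doe (Project X)"
] .

# Expected effect, given the override rule described above:
# foaf:name becomes "Jane Doe (Project X)", while foaf:mbox
# keeps the value from the global defaults.
</code></pre></div></div>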

<p>An issue that needs to be addressed soon is that the repository description is copied to the repository metadata graph in the store when the repository is set up (with <code class="language-plaintext highlighter-rouge">og setup</code>). Currently, there’s no way to update this metadata when the configuration has been changed.</p>

<p>This issue, along with the previously mentioned lack of ability to introduce custom URIs for resources, is the reason why users currently have to work with cryptic graph names. Specifically, the Bog-generated URI of the <code class="language-plaintext highlighter-rouge">og:Dataset</code> is used as the name of the graph for the version-controlled dataset, the Bog-generated URI of the <code class="language-plaintext highlighter-rouge">og:History</code> serves as the name for the history graph, and the Bog-generated URI of the <code class="language-plaintext highlighter-rouge">og:Repository</code> is employed as the name of the repository metadata graph.</p>

<p>I consider this a significant drawback of the current implementation. Addressing these two issues - allowing metadata updates and introducing custom URI support - is therefore of the highest priority in the upcoming development work. The goal is to provide users with more intuitive and manageable graph naming conventions.</p>

<h2 id="store-adapters">Store adapters</h2>

<p>A careful reader may have noticed that the basic structure of an <code class="language-plaintext highlighter-rouge">og:Service</code> described earlier lacks the link to the <code class="language-plaintext highlighter-rouge">og:Store</code>. The reason for this is that the configuration of the <code class="language-plaintext highlighter-rouge">og:Store</code> differs from that of other components, as different triple stores, although all SPARQL-based, require different configurations. Ontogen addresses this challenge through the implementation of store adapters.</p>

<p>In Ontogen, triple store adapters are implemented as subclasses of <code class="language-plaintext highlighter-rouge">og:Store</code>. This solution is not only conceptually very simple but also provides a comprehensive basis for solving this problem thanks to Grax and its support for polymorphic links. (In particular, we overcome the <a href="https://marcelotto.medium.com/the-walled-gardens-within-elixir-d0507a568015">“Walled Gardens within Elixir”</a>, which potentially allows store adapters to be developed and versioned as separate Hex packages if their complexity should increase over time, which might happen quickly when triple store-specific extensions are implemented in a store adapter.)</p>

<p>This should now also make clear why the statically added basic structure does not define a link to the <code class="language-plaintext highlighter-rouge">og:Store</code>: It is the user’s decision which triple store their Ontogen service instance should run with. The user makes this decision by specifying an instance of the corresponding <code class="language-plaintext highlighter-rouge">og:Store</code> subclass in the service configuration. Consequently, the configuration files generated during the initialization of a repository look like this:</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># all triple store adapters have URIs in a dedicated namespace</span><span class="w">
</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">oga:</span><span class="w"> </span><span class="nl">&lt;https://w3id.org/ontogen/store/adapter/&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">

</span><span class="p">[</span><span class="w"> </span><span class="nn">:</span><span class="n">this</span><span class="w"> </span><span class="nn">og:</span><span class="n">Service

</span><span class="w">    </span><span class="c1">#################################################</span><span class="w">
    </span><span class="c1"># store selection</span><span class="w">

    </span><span class="c1"># Select the adapter of your choice and feel free to remove the unused</span><span class="w">
    </span><span class="c1"># (incl. their respective config files)</span><span class="w">
    </span><span class="p">;</span><span class="w"> </span><span class="nn">og:</span><span class="n">serviceStore</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="nn">:</span><span class="n">this</span><span class="w"> </span><span class="nn">og:</span><span class="n">Store</span><span class="w"> </span><span class="p">]</span><span class="w">
    </span><span class="c1"># ; og:serviceStore [ :this oga:Oxigraph ]</span><span class="w">
    </span><span class="c1"># ; og:serviceStore [ :this oga:Fuseki ]</span><span class="w">

    </span><span class="c1">#################################################</span><span class="w">
    </span><span class="c1"># metadata</span><span class="w">

    </span><span class="p">;</span><span class="w"> </span><span class="nn">dcterms:</span><span class="n">title</span><span class="w"> </span><span class="s">"Your service name"</span><span class="w">
    </span><span class="p">;</span><span class="w"> </span><span class="nn">dcterms:</span><span class="n">creator</span><span class="w"> </span><span class="nn">:</span><span class="n">I
</span><span class="w">    </span><span class="p">;</span><span class="w"> </span><span class="nn">dcterms:</span><span class="n">publisher</span><span class="w"> </span><span class="nn">:</span><span class="n">I

</span><span class="p">]</span><span class="w"> </span><span class="p">.</span><span class="w">
</span></code></pre></div></div>

<p>When initializing a repository, configuration files are generated for all available adapters. However, only the adapter specified here in the <code class="language-plaintext highlighter-rouge">og:Service</code> is actually used during operation.</p>

<p>If, as in the above case, the store is a direct instance of <code class="language-plaintext highlighter-rouge">og:Store</code> rather than of one of its subclasses, the <code class="language-plaintext highlighter-rouge">og:Store</code> configuration from <code class="language-plaintext highlighter-rouge">.ontogen/config/store.bog.ttl</code> is used, which implements a generic adapter. In this case, the URLs of the various endpoints of the triple store must be specified:</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="w"> </span><span class="nn">:</span><span class="n">this</span><span class="w"> </span><span class="nn">og:</span><span class="n">Store
</span><span class="w">    </span><span class="p">;</span><span class="w"> </span><span class="nn">og:</span><span class="n">storeQueryEndpoint</span><span class="w"> </span><span class="nl">&lt;http://localhost:7878/query&gt;</span><span class="w">
    </span><span class="p">;</span><span class="w"> </span><span class="nn">og:</span><span class="n">storeUpdateEndpoint</span><span class="w"> </span><span class="nl">&lt;http://localhost:7878/update&gt;</span><span class="w">
    </span><span class="p">;</span><span class="w"> </span><span class="nn">og:</span><span class="n">storeGraphStoreEndpoint</span><span class="w"> </span><span class="nl">&lt;http://localhost:7878/store&gt;</span><span class="w">
</span><span class="p">]</span><span class="w"> </span><span class="p">.</span><span class="w">
</span></code></pre></div></div>

<p>The adapters, on the other hand, implement the logic for generating the endpoint URLs themselves, so their configurations only need to specify the components of these URLs. Since each adapter also knows the default values of these components for a standard installation of its triple store and assumes them when they are not specified, an adapter configuration could theoretically be empty and still functional:</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="w"> </span><span class="nn">:</span><span class="n">this</span><span class="w"> </span><span class="nn">oga:</span><span class="n">Oxigraph</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="p">.</span><span class="w">
</span></code></pre></div></div>

<p>However, during initialization, complete configurations with all properties for the components (using the defaults) are generated to make them explicit and easily adjustable if necessary:</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="w"> </span><span class="nn">:</span><span class="n">this</span><span class="w"> </span><span class="nn">oga:</span><span class="n">Oxigraph
</span><span class="w">    </span><span class="p">;</span><span class="w"> </span><span class="nn">og:</span><span class="n">storeEndpointScheme</span><span class="w"> </span><span class="s">"http"</span><span class="w">
    </span><span class="p">;</span><span class="w"> </span><span class="nn">og:</span><span class="n">storeEndpointHost</span><span class="w"> </span><span class="s">"localhost"</span><span class="w">
    </span><span class="p">;</span><span class="w"> </span><span class="nn">og:</span><span class="n">storeEndpointPort</span><span class="w"> </span><span class="mi">7878</span><span class="w">
</span><span class="p">]</span><span class="w"> </span><span class="p">.</span><span class="w">
</span></code></pre></div></div>

<p>In some adapters, there are also components for which no sensible default exists and which must therefore be configured explicitly. In the case of Fuseki, this is the name of the dataset in which the repository should be stored (and which the user must have created in Fuseki beforehand):</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="w"> </span><span class="nn">:</span><span class="n">this</span><span class="w"> </span><span class="nn">oga:</span><span class="n">Fuseki
</span><span class="w">    </span><span class="p">;</span><span class="w"> </span><span class="nn">og:</span><span class="n">storeEndpointScheme</span><span class="w"> </span><span class="s">"http"</span><span class="w">
    </span><span class="p">;</span><span class="w"> </span><span class="nn">og:</span><span class="n">storeEndpointHost</span><span class="w"> </span><span class="s">"localhost"</span><span class="w">
    </span><span class="p">;</span><span class="w"> </span><span class="nn">og:</span><span class="n">storeEndpointPort</span><span class="w"> </span><span class="mi">3030</span><span class="w">
    </span><span class="p">;</span><span class="w"> </span><span class="nn">og:</span><span class="n">storeEndpointDataset</span><span class="w"> </span><span class="s">"name-of-the-dataset"</span><span class="w">
</span><span class="p">]</span><span class="w"> </span><span class="p">.</span><span class="w">
</span></code></pre></div></div>
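
<p>To make the relationship between the component-based adapter configurations and the explicit endpoint URLs of the generic <code class="language-plaintext highlighter-rouge">og:Store</code> more concrete, here is a minimal sketch in Python. It is not Ontogen’s actual implementation: the function name <code class="language-plaintext highlighter-rouge">endpoint_urls</code>, the defaults table, and the Fuseki path segments (<code class="language-plaintext highlighter-rouge">query</code>, <code class="language-plaintext highlighter-rouge">update</code>, <code class="language-plaintext highlighter-rouge">data</code>) are assumptions made for illustration, while the Oxigraph paths and the ports are taken from the examples above.</p>

<pre><code class="language-python"># Hypothetical sketch: derive endpoint URLs from adapter components.
# The defaults mirror the generated configurations shown above.
DEFAULTS = {
    "oxigraph": {"scheme": "http", "host": "localhost", "port": 7878},
    "fuseki": {"scheme": "http", "host": "localhost", "port": 3030},
}


def endpoint_urls(adapter, scheme=None, host=None, port=None, dataset=None):
    """Return the query, update and graph store URLs for an adapter."""
    defaults = DEFAULTS[adapter]
    scheme = scheme or defaults["scheme"]
    host = host or defaults["host"]
    port = port or defaults["port"]
    base = f"{scheme}://{host}:{port}"

    if adapter == "oxigraph":
        # Same paths as in the generic og:Store example above.
        return {
            "query": f"{base}/query",
            "update": f"{base}/update",
            "graph_store": f"{base}/store",
        }

    # Fuseki has no sensible default for the dataset name.
    if not dataset:
        raise ValueError("Fuseki requires a dataset name")
    # Assumed service names of a standard Fuseki dataset.
    return {
        "query": f"{base}/{dataset}/query",
        "update": f"{base}/{dataset}/update",
        "graph_store": f"{base}/{dataset}/data",
    }
</code></pre>

<p>Under these assumptions, <code class="language-plaintext highlighter-rouge">endpoint_urls("oxigraph")</code> yields the same three URLs as the generic <code class="language-plaintext highlighter-rouge">og:Store</code> configuration shown earlier, whereas <code class="language-plaintext highlighter-rouge">endpoint_urls("fuseki")</code> without a dataset name raises an error.</p>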

<p>At this point, operation via an adapter does not yet differ from that with an analogous configuration of the generic <code class="language-plaintext highlighter-rouge">og:Store</code>. However, the implementation of the adapters is structured in such a way that the HTTP requests can be flexibly wrapped and very easily provided with additional logic if needed, for example, to support triple store-specific optimizations or extensions.</p>

<p>In the future, further generic configuration options should be specifiable in the generic <code class="language-plaintext highlighter-rouge">og:Store</code>, for example, for the different SPARQL HTTP request forms, so that further adaptation and optimization options exist for triple stores without their own adapter implementation in Ontogen.</p>

<p>Furthermore, adapters for other popular triple stores will of course be offered in the future.</p>

<h2 id="conclusion">Conclusion</h2>

<p>With this exploration of Ontogen’s configuration system and the introduction to Bog, the introductory series on Ontogen comes to an end. The aim of these articles has been to present the ideas behind Ontogen, from its fundamental concepts to its practical implementation and configuration.</p>

<p>Ontogen aims to address some of the challenges in managing and versioning RDF datasets. While it’s still in its early stages, it offers a new approach that combines established semantic web standards with novel ideas in versioning and configuration.</p>

<p>As an open-source project, Ontogen’s future development will greatly benefit from community feedback and contributions. The project can be found on <a href="https://github.com/ontogen/ontogen">GitHub</a>, where stars, issues, and pull requests are always appreciated.</p>

<p>For those interested in following the project’s progress or reading future articles, updates are available via <a href="https://ontogen.io/feed.xml">RSS</a>, <a href="https://mastodon.social/@marcelotto">Mastodon</a>, or <a href="https://www.linkedin.com/in/marcel-otto-a68245b8/">LinkedIn</a>.</p>

<p>Thank you for your interest in Ontogen. I look forward to seeing how it might be used and improved in the future.</p>]]></content><author><name>Marcel Otto</name></author><category term="introduction" /><category term="blog" /><summary type="html"><![CDATA[This is the fourth and last of a series of four blog posts introducing the different parts of the Ontogen version control system and the ideas behind it:]]></summary></entry><entry><title type="html">Ontogen’s Repository and Service Model</title><link href="https://ontogen.io/introduction/part-3" rel="alternate" type="text/html" title="Ontogen’s Repository and Service Model" /><published>2024-08-12T07:00:00+00:00</published><updated>2024-08-12T07:00:00+00:00</updated><id>https://ontogen.io/introduction/ontogens-repository-and-service-model</id><content type="html" xml:base="https://ontogen.io/introduction/part-3"><![CDATA[<p>This is the third in a series of four blog posts introducing the different parts of the Ontogen version control system and the ideas behind it:</p>

<ol>
  <li><a href="/introduction/part-1">Introducing Ontogen</a></li>
  <li><a href="/introduction/part-2">Ontogen’s Versioning Model</a></li>
  <li><strong><a href="/introduction/part-3">Ontogen’s Repository and Service Model</a></strong></li>
  <li><a href="/introduction/part-4">Ontogen Configuration with Bog</a></li>
</ol>

<hr />

<p>After examining the PROV-based <code class="language-plaintext highlighter-rouge">og:SpeechAct</code> and <code class="language-plaintext highlighter-rouge">og:Commit</code> model in the <a href="/introduction/part-2">previous article</a>, we will now focus on the structure of an Ontogen repository.</p>

<h2 id="isolated-history-graph">Isolated history graph</h2>

<p>As we saw in the last article, Ontogen’s version history is based on so-called <code class="language-plaintext highlighter-rouge">og:Proposition</code>s, which are implemented as RDF Triple Compounds. These propositions form the foundation for the Ontogen speech acts and commits, which represent the actual versioning information. A particular challenge in implementing a version control system for RDF is the question of how to store this versioning information in relation to the actual data.</p>

<p>A distinctive feature of storing versioning information in Ontogen is the strict separation between the actual data and the versioning artifacts. Similar to file-based version control systems like Git, where the version history is encapsulated in a hidden <code class="language-plaintext highlighter-rouge">.git</code> directory, Ontogen stores all versioning information in a separate graph, the so-called <code class="language-plaintext highlighter-rouge">og:History</code> graph.</p>

<p>This history graph stores the proposition compounds with the RDF-star statements and the assertions of all higher-level resources such as speech acts and commits. This approach ensures that the actual data of the RDF dataset remains completely free of version control artifacts.</p>

<p>This is implemented with RDF-star and the RTC vocabulary. We use <code class="language-plaintext highlighter-rouge">rtc:elements</code>, the inverse of the <code class="language-plaintext highlighter-rouge">rtc:elementOf</code> property, and store the RDF-star statements as “unasserted” in the history graph. This means that the actual statements are not restated in the history graph, but only annotated.</p>

<p>Another advantage of this approach is that we save at least a little storage space: the statements are asserted only once in the graph with the version-controlled user data, while they appear in the history graph merely as quoted triples. This prevents additional duplication that would occur with “asserted” RDF-star annotations, where the statements would need to be stored again as regular RDF triples in the history graph. (Because propositions with identical statements coincide, some propositions can additionally be saved when identical statements are repeated in speech acts.)</p>
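
<p>The difference between asserted and merely quoted statements can be illustrated with a small in-memory model in Python. This is only a sketch: the blank node label <code class="language-plaintext highlighter-rouge">_:proposition</code>, the <code class="language-plaintext highlighter-rouge">ex:</code> and <code class="language-plaintext highlighter-rouge">foaf:</code> terms, and the two helper functions are made up for illustration, and a quoted triple is simply modelled as a nested tuple.</p>

<pre><code class="language-python"># Hypothetical in-memory model; triple stores with RDF-star support
# handle quoted triples natively.
statement = ("ex:sarah", "foaf:familyName", "Johnson")

# The data graph asserts the statement as a regular triple.
data_graph = {statement}

# The history graph only quotes it: the statement is the object of an
# rtc:elements triple, never a top-level (asserted) triple itself.
history_graph = {("_:proposition", "rtc:elements", statement)}


def is_asserted(triple, graph):
    """A triple is asserted if it is a top-level member of the graph."""
    return triple in graph


def is_quoted(triple, graph):
    """A triple is quoted if it occurs as subject or object of a member."""
    return any(triple in (s, o) for s, _p, o in graph)
</code></pre>

<p>In this model the statement is asserted in the data graph only, while the history graph refers to it without asserting it again.</p>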

<p>It should be noted that the current version of Ontogen only supports versioning of individual RDF graphs. Managing changes across different graphs of an RDF dataset is planned for future versions.</p>

<h2 id="ontogen-repositories-as-dcat-catalogs">Ontogen repositories as DCAT catalogs</h2>

<p>Regardless of the versioning issue, a general question arises: how do we actually manage the datasets that are versioned but free from the versioning history? Of course, we don’t want to impose any obligations on the user here. Rather, it’s about how exactly we can describe the graphs of our RDF dataset and make their metadata accessible to consumers of our dataset. Moreover, if our dataset consists of a larger number of graphs (when we support this in the future), this can easily become unwieldy and makes further structuring, if not necessary, at least advisable.</p>

<p>According to the standard, RDF datasets are just a default graph and a set of named graphs; beyond that, the standard makes no further recommendations. There is the <a href="https://www.w3.org/TR/sparql12-service-description/">SPARQL Service Description Vocabulary</a> that reflects the graph structure of a SPARQL store, but this is primarily designed for technical details of a SPARQL endpoint and offers little room for rich semantic descriptions of the data itself and nested or hierarchical relationships between datasets or graphs.</p>

<p>In fact, there is a general W3C standard for this problem, the <a href="https://www.w3.org/TR/vocab-dcat-3/">Data Catalog (DCAT) Vocabulary</a>, which is suitable for our purposes in many ways:</p>

<ol>
  <li>Flexibility and extensibility: DCAT provides a basic vocabulary that can be easily adapted and extended to specific needs.</li>
  <li>Hierarchical structuring: DCAT allows the modeling of nested catalog structures, which is ideal for organizing complex RDF datasets.</li>
  <li>Comprehensive metadata: DCAT offers a rich set of properties for describing datasets, including license information, access rights, and temporal aspects, some of which can even be provided automatically in the context of Ontogen, as these can be derived from the speech act and commit history (authors, creation period, data sources, etc.).</li>
  <li>Support for versioning and integration with PROV: The soon-to-be-completed version 3 of DCAT introduces a comprehensive <a href="https://www.w3.org/TR/vocab-dcat-3/#dataset-versions">versioning concept</a> that is particularly relevant for Ontogen’s use case. This extension also seamlessly integrates the PROV vocabulary, which forms the basis for Ontogen’s RDF speech act and commit history. This integration enables the direct derivation and modeling of provenance information and version metadata from the version history. For example, authors, creation dates, and other relevant metadata for specific revisions of a dataset can be automatically generated and presented in a standardized form. This close intertwining of versioning and provenance tracking makes DCAT 3 an ideal vocabulary for metadata description in Ontogen, as it can precisely map the complex temporal and authorial aspects of versioned RDF datasets.</li>
  <li>Standardization and interoperability: As a W3C standard, DCAT enjoys wide acceptance and support in the data management community. This promotes Ontogen’s compatibility with other systems and tools in the field of data management. For example, DCAT has gained great importance in the European Union, where it serves as the basis for DCAT-AP (DCAT Application Profile for data portals in Europe) and is supported in prominent data catalog platforms. The use of DCAT in Ontogen thus allows seamless integration into existing data ecosystems and facilitates the exchange of metadata with a variety of platforms and services.</li>
</ol>

<p>These properties make DCAT an ideal basis for modeling and describing Ontogen repositories, as it covers both the technical and organizational aspects of RDF data management.</p>

<p>So how do we organize our Ontogen repository with DCAT?</p>

<p>First of all, it should be noted that a dataset in the sense of DCAT is a more general, much broader class than an RDF dataset:</p>

<blockquote>
  <p><em><code class="language-plaintext highlighter-rouge">dcat:Dataset</code> := “A collection of data, published or curated by a single agent, and available for access or download in one or more representations.”</em></p>

  <p>– <a href="https://www.w3.org/TR/vocab-dcat-3/#Class:Dataset">https://www.w3.org/TR/vocab-dcat-3/#Class:Dataset</a></p>
</blockquote>

<p>In Ontogen, however, we are dealing with RDF datasets as the subject of versioning, so with a specific subclass. Therefore, we define an <code class="language-plaintext highlighter-rouge">og:Dataset</code> as a subclass of <code class="language-plaintext highlighter-rouge">dcat:Dataset</code>. In fact, we can define it even more specifically as a subclass of <code class="language-plaintext highlighter-rouge">dcat:Catalog</code>, because according to the broader definition above, a single graph of an RDF dataset can itself be considered a <code class="language-plaintext highlighter-rouge">dcat:Dataset</code>, so that an <code class="language-plaintext highlighter-rouge">og:Dataset</code> can be defined as a collection of these.</p>

<p>While <code class="language-plaintext highlighter-rouge">og:Dataset</code>s are now DCAT catalogs of the pure user-defined graphs of an RDF dataset versioned with Ontogen, we define an <code class="language-plaintext highlighter-rouge">og:Repository</code> as a DCAT catalog around such an <code class="language-plaintext highlighter-rouge">og:Dataset</code>, supplementing it with two additional entries, so that an <code class="language-plaintext highlighter-rouge">og:Repository</code> as a DCAT catalog consists of exactly two explicit DCAT dataset entries and one implicit graph:</p>

<pre><code class="language-mermaid">graph TD
    A[og:Repository] --&gt;|og:repositoryDataset| B(og:Dataset)
    A --&gt;|og:repositoryHistory| C(og:History)
    A --&gt;|implicit dcat:dataset| D(Repository Metadata Graph)
    
    B --&gt;|dcat:dataset| E[Graph 1]
    B --&gt;|dcat:dataset| F[Graph 2]
    B --&gt;|dcat:dataset| G[Graph n]
    
    C --&gt;|contains| H[SpeechActs]
    C --&gt;|contains| I[Commits]
    C --&gt;|contains| J[PROV Entities]
    C --&gt;|contains| K[PROV Agents]
    
    D --&gt;|describes| A
    D --&gt;|describes| B
    D --&gt;|describes| C
    
    style A fill:#d1c2f0,stroke:#333,stroke-width:4px
    style B fill:#f0e6ff,stroke:#333,stroke-width:2px
    style C fill:#f0e6ff,stroke:#333,stroke-width:2px
    style D fill:#f0e6ff,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
	
</code></pre>

<ol>
  <li>One entry for the <code class="language-plaintext highlighter-rouge">og:Dataset</code> DCAT catalog with the pure user-defined graphs (which is defined using the <code class="language-plaintext highlighter-rouge">og:repositoryDataset</code> property, a sub-property of <code class="language-plaintext highlighter-rouge">dcat:dataset</code>).</li>
  <li>One entry for the <code class="language-plaintext highlighter-rouge">og:History</code> graph with the provenance history of the speech acts and commits as PROV activities, including the linked PROV entities and PROV agents (which is defined using the <code class="language-plaintext highlighter-rouge">og:repositoryHistory</code> property, a sub-property of <code class="language-plaintext highlighter-rouge">dcat:dataset</code>).</li>
  <li>A repository graph that contains the DCAT metadata description of the <code class="language-plaintext highlighter-rouge">og:Repository</code> itself, including the DCAT metadata description of the <code class="language-plaintext highlighter-rouge">og:Dataset</code> catalog. This graph poses a particular challenge as it is both part of the repository and its description. This self-referencing leads to a conceptual ambiguity: on the one hand, the graph is a <code class="language-plaintext highlighter-rouge">dcat:Dataset</code>, on the other hand, it contains the description of the entire repository including itself. In the current DCAT specification, there is no clear solution for this problem of self-description as an explicit part of a <code class="language-plaintext highlighter-rouge">dcat:Catalog</code>. Therefore, in the current version of Ontogen, this graph is treated implicitly as part of the definition of an <code class="language-plaintext highlighter-rouge">og:Repository</code>, without an explicit <code class="language-plaintext highlighter-rouge">dcat:dataset</code> entry for it in the catalog. This solution is pragmatic, but ultimately not really satisfactory. Better suggestions for solving this problem would be very welcome.</li>
</ol>

<h2 id="ontogen-instances-as-dcat-services">Ontogen instances as DCAT services</h2>

<p>Ontogen follows the DCAT model beyond the catalog structure in the implementation of an Ontogen instance. A locally running instance is implemented as a <code class="language-plaintext highlighter-rouge">dcat:DataService</code>.</p>

<blockquote>
  <p><em><code class="language-plaintext highlighter-rouge">dcat:DataService</code> := “A collection of operations that provides access to one or more datasets or data processing functions.”</em></p>

  <p>– <a href="https://www.w3.org/TR/vocab-dcat-3/#Class:Data_Service">https://www.w3.org/TR/vocab-dcat-3/#Class:Data_Service</a></p>
</blockquote>

<p>An <code class="language-plaintext highlighter-rouge">og:Service</code>, which is defined as a subclass of <code class="language-plaintext highlighter-rouge">dcat:DataService</code>, is a resource that structurally consists of two elements:</p>

<pre><code class="language-mermaid">graph TD
    A[og:Service] --&gt;|og:serviceRepository| B(og:Repository)
    A --&gt;|og:serviceStore| C(og:Store)
    
    B --&gt;|og:repositoryDataset| D[og:Dataset]
    B --&gt;|og:repositoryHistory| E[og:History]
    B --&gt;|implicit| F[Repository Metadata Graph]
    
    C --&gt;|rdf:type| G[Specific Triple Store Implementation]
    
    style A fill:#ccd1e0,stroke:#333,stroke-width:4px
    style B fill:#d1c2f0,stroke:#333,stroke-width:2px
    style C fill:#c2f0d1,stroke:#333,stroke-width:2px
</code></pre>

<ol>
  <li>The <code class="language-plaintext highlighter-rouge">og:Repository</code> linked via the <code class="language-plaintext highlighter-rouge">dcat:servesDataset</code> sub-property <code class="language-plaintext highlighter-rouge">og:serviceRepository</code></li>
  <li>An <code class="language-plaintext highlighter-rouge">og:Store</code> linked via the property <code class="language-plaintext highlighter-rouge">og:serviceStore</code>, which represents the locally running SPARQL triple store in which the repository is stored</li>
</ol>

<p>While the same Ontogen repository can exist on different computers, the various Ontogen instances on these computers operate as different Ontogen services with different stores but the same repository.</p>
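
<p>The relationship between services, repositories, and stores can be sketched with a few plain Python data classes. This is an illustration only, not Ontogen’s actual data model: the class and field names are hypothetical stand-ins for the corresponding <code class="language-plaintext highlighter-rouge">og:</code> classes and properties, and all IRIs and URLs are made up.</p>

<pre><code class="language-python">from dataclasses import dataclass


@dataclass(frozen=True)
class Repository:
    """Stands in for an og:Repository: dataset catalog plus history graph."""
    dataset: str   # IRI of the og:Dataset (og:repositoryDataset)
    history: str   # IRI of the og:History graph (og:repositoryHistory)


@dataclass(frozen=True)
class Store:
    """Stands in for an og:Store: a locally running SPARQL triple store."""
    query_endpoint: str


@dataclass(frozen=True)
class Service:
    """Stands in for an og:Service linking one repository to one store."""
    repository: Repository  # og:serviceRepository
    store: Store            # og:serviceStore


# All IRIs and URLs below are made up for illustration.
repo = Repository("http://example.com/dataset", "http://example.com/history")
laptop = Service(repo, Store("http://localhost:7878/query"))
server = Service(repo, Store("http://localhost:3030/example/query"))
</code></pre>

<p>Here <code class="language-plaintext highlighter-rouge">laptop</code> and <code class="language-plaintext highlighter-rouge">server</code> are two different services with different stores, but both refer to the same repository.</p>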

<p>The main module <code class="language-plaintext highlighter-rouge">Ontogen</code> is a <a href="https://hexdocs.pm/elixir/GenServer.html">GenServer</a> over such an <code class="language-plaintext highlighter-rouge">og:Service</code> as state, which executes the Ontogen operations on the repository specified therein, in the triple store specified therein.</p>

<p>How exactly such an <code class="language-plaintext highlighter-rouge">og:Service</code> looks and is configured, especially its <code class="language-plaintext highlighter-rouge">og:Store</code> using triple store vendor-specific subclasses, will be the subject of the next and, for the time being, <a href="/introduction/part-4">last article</a> in this series. This configuration is written in a language created specifically for Ontogen, which needs to be introduced first.</p>

<h2 id="future-developments">Future developments</h2>

<p>In upcoming versions of Ontogen, it is planned to expand the use of the DCAT integration. The interpretation of <code class="language-plaintext highlighter-rouge">og:Dataset</code> as <code class="language-plaintext highlighter-rouge">dcat:Catalog</code> is intended to form the basis for supporting versioning of datasets with multiple graphs, where DCAT’s capabilities for structuring complex datasets should be utilized.</p>

<p>The implementation of automatic generation of DCAT metadata from the PROV history is also envisioned, which will require developing a concept for dataset revisions in Ontogen based on the versioning concepts in DCAT 3.</p>

<p>These developments aim to enhance Ontogen’s compatibility with DCAT standards and provide more comprehensive dataset management features.</p>]]></content><author><name>Marcel Otto</name></author><category term="introduction" /><category term="blog" /><summary type="html"><![CDATA[This is the third in a series of four blog posts introducing the different parts of the Ontogen version control system and the ideas behind it:]]></summary></entry><entry><title type="html">Ontogen’s Versioning Model</title><link href="https://ontogen.io/introduction/part-2" rel="alternate" type="text/html" title="Ontogen’s Versioning Model" /><published>2024-08-08T09:00:00+00:00</published><updated>2024-08-08T09:00:00+00:00</updated><id>https://ontogen.io/introduction/ontogens-versioning-model</id><content type="html" xml:base="https://ontogen.io/introduction/part-2"><![CDATA[<p>This is the second in a series of four blog posts introducing the different parts of the Ontogen version control system and the ideas behind it:</p>

<ol>
  <li><a href="/introduction/part-1">Introducing Ontogen</a></li>
  <li><strong><a href="/introduction/part-2">Ontogen’s Versioning Model</a></strong></li>
  <li><a href="/introduction/part-3">Ontogen’s Repository and Service Model</a></li>
  <li><a href="/introduction/part-4">Ontogen Configuration with Bog</a></li>
</ol>

<hr />

<p>In the <a href="/introduction/part-1">previous article</a> of our introduction series to Ontogen, we introduced RDF triple compounds (RTC). These triple compounds now serve as the foundation for Ontogen as a Data Control Management (DCM) system for RDF datasets. So, how exactly do the triple compounds used in Ontogen for versioning RDF datasets look?</p>

<p>Ultimately, our goal is to annotate sets of statements with metadata to automatically organize them in a version history. However, the challenge with triple compounds is that the changes we want to make and commit atomically are not necessarily a simple set with a single semantic meaning. Instead, they are sets with different change semantics that can potentially occur simultaneously in any combination: sets of statements to be added to a dataset, updated, or deleted.</p>

<p>Let’s consider the update of personal data after a marriage involving a name change. This complex but atomically related process requires different types of changes that should be encapsulated in a single, atomic entity. Imagine Sarah Miller gets married and takes her partner’s last name. Her new name is now Sarah Johnson. This scenario could involve various sets of statements with different change semantics. While we want to update the family name and marital status, we might simply want to add a new email address or wedding date in this context without overwriting old values.</p>

<p>Therefore, we need a potentially complex entity that encompasses these sets of changes.</p>

<h2 id="rdf-speechacts">RDF SpeechActs</h2>

<p>In the Ontogen vocabulary (prefixed in the following with <code class="language-plaintext highlighter-rouge">og:</code>), these entities are called <code class="language-plaintext highlighter-rouge">og:SpeechAct</code>s. We draw on the concept of speech acts from philosophy, which might seem unusual in this context at first glance. However, upon closer examination and when limited to RDF statements as the subject of speech acts, it perfectly models the situation in our versioning problem for RDF datasets.</p>

<p>A <a href="https://en.wikipedia.org/wiki/Speech_act">speech act</a>, a term coined by J.L. Austin and further developed by John Searle, refers in linguistics and philosophy of language to an utterance that not only conveys information but also performs an action. Central to speech act theory is also the consideration of the context in which an utterance takes place. By including context, speech act theory extends the analysis of utterances beyond the purely semantic level to the pragmatic dimension.</p>

<p>To bridge the gap between the general concept of speech acts and their specific application in Ontogen, it’s important to understand how we’ve adapted this philosophical idea to our technical context. An <code class="language-plaintext highlighter-rouge">og:SpeechAct</code> represents a very specific form of speech act: the utterance or modification of RDF statements. This adaptation allows us to capture not just the content of RDF data, but also the act of asserting or changing that data, along with all the contextual information that surrounds that act.</p>

<p>In the context of Ontogen and RDF data, we apply this concept by treating each utterance of RDF statements as an <code class="language-plaintext highlighter-rouge">og:SpeechAct</code> - an action that does not represent the actual addition to or modification of the dataset (we will continue to call this action a <em>commit</em>, following the usual versioning terminology). Instead, it represents the act of the original utterance of the statements in this dataset or subsequent acts that supplement, revise, or confirm the original statements. This is because the central questions of provenance, i.e., the origin of the data, revolve around these acts. Some examples:</p>

<ul>
  <li>Who or what uttered, derived, or generated the data and when?</li>
  <li>In what context was the data collected or modified?</li>
  <li>With what intention or for what purpose was the data created or modified?</li>
  <li>How was the data validated or verified?</li>
</ul>

<p>With <code class="language-plaintext highlighter-rouge">og:SpeechAct</code>s, our aim is to provide a model that allows us to capture information related to all possible questions about these utterances and record them as metadata.</p>

<p>Fortunately, we don’t need to develop a new ontology from scratch to model these metadata. There is already an excellent and standardized basis: the <a href="https://www.w3.org/TR/prov-overview/">W3C PROV vocabulary</a>. The W3C defines Provenance as follows:</p>

<blockquote>
  <p><em>“Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. The PROV Family of Documents defines a model, corresponding serializations and other supporting definitions to enable the inter-operable interchange of provenance information in heterogeneous environments […].”</em></p>

  <p>– <a href="https://www.w3.org/TR/prov-overview/">https://www.w3.org/TR/prov-overview/</a></p>
</blockquote>

<p>This definition fits perfectly with our approach of using speech acts as a basis for capturing provenance information in RDF datasets. In Ontogen, we therefore define our <code class="language-plaintext highlighter-rouge">og:SpeechAct</code>s as a special manifestation, i.e., a subclass of <code class="language-plaintext highlighter-rouge">prov:Activity</code>.</p>

<blockquote>
  <p><em>“An activity is something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, modifying, relocating, using, or generating entities.”</em></p>

  <p>– <a href="https://www.w3.org/TR/prov-o/#Activity">https://www.w3.org/TR/prov-o/#Activity</a></p>
</blockquote>

<p>This allows us to leverage the extensive semantics of the PROV vocabulary while modelling our specific concept of speech acts for RDF data.</p>

<h2 id="propositions">Propositions</h2>

<p>What characterizes <code class="language-plaintext highlighter-rouge">og:SpeechAct</code>s as special <code class="language-plaintext highlighter-rouge">prov:Activity</code>s, i.e., how exactly are they defined? To answer this, we return to the issue mentioned at the beginning: how can we express potentially complex sets of statements with different “change semantics” or, as we can now more precisely say, with different pragmatics using triple compounds? The <code class="language-plaintext highlighter-rouge">og:SpeechAct</code>s we just introduced take on the role of the carrier resource with which the triple compounds are associated. An <code class="language-plaintext highlighter-rouge">og:SpeechAct</code> is thus a <code class="language-plaintext highlighter-rouge">prov:Activity</code> that can be associated with triple compounds with different pragmatics via four properties, the so-called <em>action properties</em> (which are all defined as sub-properties of <code class="language-plaintext highlighter-rouge">prov:used</code>):</p>

<ol>
  <li><code class="language-plaintext highlighter-rouge">og:add</code>: A triple compound, i.e., a set of statements that is simply asserted without any further intentions. When persisted to a dataset in a triple store, these statements should be added.</li>
  <li><code class="language-plaintext highlighter-rouge">og:update</code>: A triple compound, i.e., a set of statements that is asserted with the intention to overwrite all previous statements with the same subject and predicate. When persisted to a dataset in a triple store, these statements should be added and the existing statements with the same subject and predicate should be overwritten.</li>
  <li><code class="language-plaintext highlighter-rouge">og:replace</code>: A triple compound, i.e., a set of statements that is asserted with the intention to overwrite all previous statements with the same subject. When persisted to a dataset in a triple store, these statements should be added and the existing statements with the same subject should be overwritten.</li>
  <li><code class="language-plaintext highlighter-rouge">og:remove</code>: A triple compound, i.e., a set of statements that should be retracted (“negated”). When persisted to a dataset in a triple store, these statements should be removed.</li>
</ol>
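
<p>The effect of the four action properties on a dataset can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Ontogen’s implementation: triples are modelled as plain tuples, the function name <code class="language-plaintext highlighter-rouge">apply_speech_act</code> is made up for this example, the prior values <code class="language-plaintext highlighter-rouge">"Miller"</code> and <code class="language-plaintext highlighter-rouge">:Single</code> are assumed, and the order in which the actions are combined is an assumption as well.</p>

```python
# Triples are modelled as plain (subject, predicate, object) tuples.
# Hypothetical helper, not part of any Ontogen API.
def apply_speech_act(dataset, add=(), update=(), replace=(), remove=()):
    """Return the set of triples that results from applying the four actions."""
    result = set(dataset)

    # og:update: overwrite existing statements with the same subject AND predicate.
    update_keys = {(s, p) for s, p, _ in update}
    result = {t for t in result if (t[0], t[1]) not in update_keys}

    # og:replace: overwrite existing statements with the same subject.
    replace_subjects = {s for s, _, _ in replace}
    result = {t for t in result if t[0] not in replace_subjects}

    # og:remove: retract exactly these statements.
    result -= set(remove)

    # og:add, og:update and og:replace all assert their statements.
    result |= set(add) | set(update) | set(replace)
    return result


# Assumed prior state of the dataset (not taken from the article).
before = {
    (":employee39", ":givenName", "Sarah"),
    (":employee39", ":familyName", "Miller"),
    (":employee39", ":maritalStatus", ":Single"),
}

after = apply_speech_act(
    before,
    add=[(":employee39", ":emailAddress", "sarah.johnson@example.com")],
    update=[
        (":employee39", ":familyName", "Johnson"),
        (":employee39", ":maritalStatus", ":Married"),
    ],
)

for triple in sorted(after):
    print(triple)
```

<p>With <code class="language-plaintext highlighter-rouge">og:update</code>, the old family name and marital status disappear while the given name is untouched. Had the same statements been passed as <code class="language-plaintext highlighter-rouge">replace</code>, the given name would have been overwritten too, since it shares the subject.</p>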

<p>Let’s first look at how our initial example of a name change would look as an <code class="language-plaintext highlighter-rouge">og:SpeechAct</code> with normal triple compounds, if we initially use simple blank nodes as identifiers for the <code class="language-plaintext highlighter-rouge">og:SpeechAct</code> and its triple compounds:</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">@prefix</span><span class="w"> </span><span class="nn">:</span><span class="w"> </span><span class="nl">&lt;http://example.com/&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">og:</span><span class="w"> </span><span class="nl">&lt;https://w3id.org/ontogen#&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">rtc:</span><span class="w"> </span><span class="nl">&lt;https://w3id.org/rtc#&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">xsd:</span><span class="w"> </span><span class="nl">&lt;http://www.w3.org/2001/XMLSchema#&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">

</span><span class="nn">_:</span><span class="n">SarahMillerMarriageUpdate
</span><span class="w">    </span><span class="k">a</span><span class="w"> </span><span class="nn">og:</span><span class="n">SpeechAct</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">og:</span><span class="n">update</span><span class="w"> </span><span class="p">[</span><span class="w">
        </span><span class="k">a</span><span class="w"> </span><span class="nn">og:</span><span class="n">Proposition</span><span class="w"> </span><span class="p">;</span><span class="w">
        </span><span class="nn">rtc:</span><span class="n">elements</span><span class="w"> 
            </span><span class="nl">&lt;&lt; :employee39 :familyName "Johnson" &gt;</span><span class="err">&gt;</span><span class="p">,</span><span class="w">
            </span><span class="nl">&lt;&lt; :employee39 :maritalStatus :Married &gt;</span><span class="err">&gt;</span><span class="w">
    </span><span class="p">]</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">og:</span><span class="n">add</span><span class="w"> </span><span class="p">[</span><span class="w">
        </span><span class="k">a</span><span class="w"> </span><span class="nn">og:</span><span class="n">Proposition</span><span class="w"> </span><span class="p">;</span><span class="w">
        </span><span class="nn">rtc:</span><span class="n">elements</span><span class="w"> 
            </span><span class="nl">&lt;&lt; :employee39 :emailAddress "sarah.johnson@example.com" &gt;</span><span class="err">&gt;</span><span class="w"> </span><span class="p">,</span><span class="w">
            </span><span class="nl">&lt;&lt; :employee39 :marriageDate "2023-07-25"^^xsd:date &gt;</span><span class="err">&gt;</span><span class="w">
    </span><span class="p">]</span><span class="w"> </span><span class="p">.</span><span class="w">
</span></code></pre></div></div>

<p>Regarding these action properties, note that it is the property, not the triple compound itself, that defines the pragmatics of the compound. All four properties share the same <code class="language-plaintext highlighter-rouge">rdfs:range</code>, i.e., they all point to the same type of triple compound.</p>

<p>However, the action properties do not associate the <code class="language-plaintext highlighter-rouge">og:SpeechAct</code> with triple compounds in general, but with <code class="language-plaintext highlighter-rouge">og:Proposition</code>s, a subclass of <code class="language-plaintext highlighter-rouge">rtc:Compound</code>, i.e., a special kind of triple compound that should exhibit some particular properties in this versioning context. We achieve this by using URI-encoded SHA256 hashes of the statement set for the URIs of the <code class="language-plaintext highlighter-rouge">og:Proposition</code>s.</p>

<p>This approach allows us to achieve some important properties for our use case. On the one hand, we have a method to automatically generate the URIs of the <code class="language-plaintext highlighter-rouge">og:Proposition</code>s. On the other hand, the authenticity of the statement set of the <code class="language-plaintext highlighter-rouge">og:Proposition</code> is verifiable, as we can detect whether the statement set is unchanged. Each <code class="language-plaintext highlighter-rouge">og:Proposition</code> can be verified by recalculating the hash of its canonicalized statement set. This makes the integrity of the data verifiable throughout the entire version history. Changes or manipulations to the data would lead to a different hash and thus an inconsistent URI, which would be easily detectable.</p>

<p>Furthermore, the <code class="language-plaintext highlighter-rouge">og:Proposition</code>s exhibit an interesting identity property: propositions with the same set of statements receive the same URI. This means that if the same statement set appears in different <code class="language-plaintext highlighter-rouge">og:SpeechAct</code>s, for example, because it was <code class="language-plaintext highlighter-rouge">og:add</code>ed once and <code class="language-plaintext highlighter-rouge">og:remove</code>d later, it is not duplicated in the <code class="language-plaintext highlighter-rouge">rtc:elements</code> of separate <code class="language-plaintext highlighter-rouge">og:Proposition</code> compounds. Instead, all of these <code class="language-plaintext highlighter-rouge">og:SpeechAct</code>s reference the same <code class="language-plaintext highlighter-rouge">og:Proposition</code>.</p>

<p>Thus, <code class="language-plaintext highlighter-rouge">og:Proposition</code>s represent immutable, abstract sets of statements that can be used in different, independent contexts and always have the same URI. However, this is only true to a limited extent without further measures, as there is still one problem: if two <code class="language-plaintext highlighter-rouge">og:Proposition</code> statement sets contain blank nodes and differ only in the local names used for the blank nodes, but are otherwise isomorphic, they are actually the same abstract statement set, which should therefore lead to the same URI. This can be achieved by bringing the RDF dataset into a canonical form before hashing, using the W3C-standardized <a href="https://www.w3.org/TR/rdf-canon/">RDF Dataset Canonicalization Algorithm</a>, in which isomorphic graphs with different blank nodes receive the same blank node identifiers.</p>
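
<p>The hashing scheme can be sketched as follows in Python. This is a deliberately simplified illustration: it assumes a statement set <em>without</em> blank nodes, given as N-Triples lines, so that sorting the lines is enough to obtain a stable form. It does not implement the RDF Dataset Canonicalization Algorithm, and Ontogen’s exact serialization may differ, so the resulting hashes are not expected to match the URIs shown in the examples of this article. The function names <code class="language-plaintext highlighter-rouge">proposition_uri</code> and <code class="language-plaintext highlighter-rouge">verify</code> are made up for this sketch.</p>

```python
import hashlib


def proposition_uri(ntriples_lines):
    """Derive a hash URI from a blank-node-free statement set.

    Simplification: sorting the de-duplicated N-Triples lines stands in
    for a real canonicalization step.
    """
    unique_lines = {line.strip() for line in ntriples_lines}
    canonical = "".join(line + "\n" for line in sorted(unique_lines))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "urn:hash::sha256:" + digest


def verify(uri, ntriples_lines):
    """Check that a statement set still matches the URI it is stored under."""
    return proposition_uri(ntriples_lines) == uri


statements = [
    '<http://example.com/employee39> <http://example.com/familyName> "Johnson" .',
    '<http://example.com/employee39> <http://example.com/maritalStatus> <http://example.com/Married> .',
]

uri = proposition_uri(statements)

# The same set in a different order yields the same URI ...
same_set_other_order = list(reversed(statements))
print(proposition_uri(same_set_other_order) == uri)

# ... while a manipulated statement set no longer matches it.
tampered = [statements[0].replace("Johnson", "Jonson"), statements[1]]
print(verify(uri, tampered))
```

<p>Because the URI depends only on the statement set, two speech acts that assert or retract the same statements end up pointing to the same resource without any coordination.</p>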

<p>So, <code class="language-plaintext highlighter-rouge">og:Proposition</code>s are an abstraction over concrete statement sets: the same statement set in different triple stores, with potentially different blank nodes, is always identified by the same URI. We identify the resulting equivalence class with a unique hash, so to speak. These <code class="language-plaintext highlighter-rouge">og:Proposition</code> compounds are thus, like the propositions of logic, abstract entities that are not bound to any utterance. Only through the utterance within an <code class="language-plaintext highlighter-rouge">og:SpeechAct</code> do they become time- and context-bound. As abstract entities, unlike RTC compounds in general, the <code class="language-plaintext highlighter-rouge">og:Proposition</code> compounds usually do not contain metadata, as the interesting metadata is utterance-related and therefore belongs to the <code class="language-plaintext highlighter-rouge">og:SpeechAct</code>.</p>

<p>However, we are still missing the URI of the <code class="language-plaintext highlighter-rouge">og:SpeechAct</code> itself. In Ontogen, this is the URI-encoded SHA256 hash of:</p>

<ul>
  <li>all <code class="language-plaintext highlighter-rouge">og:Proposition</code>s linked through its action properties (i.e., their SHA256 URIs),</li>
  <li>the <code class="language-plaintext highlighter-rouge">prov:endedAtTime</code> timestamp,</li>
  <li>and the <code class="language-plaintext highlighter-rouge">og:speaker</code> (a subproperty of <code class="language-plaintext highlighter-rouge">prov:wasAssociatedWith</code>) of the <code class="language-plaintext highlighter-rouge">og:SpeechAct</code>, or the <code class="language-plaintext highlighter-rouge">og:dataSource</code> (a subproperty of <code class="language-plaintext highlighter-rouge">prov:used</code>) if the speaker is unknown.</li>
</ul>

<p>Now we can provide the actual Ontogen form of our above example <code class="language-plaintext highlighter-rouge">og:SpeechAct</code> of a name change by adding the previously missing SHA256 URIs of the <code class="language-plaintext highlighter-rouge">og:Proposition</code>s and the <code class="language-plaintext highlighter-rouge">og:SpeechAct</code>:</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">@prefix</span><span class="w"> </span><span class="nn">:</span><span class="w"> </span><span class="nl">&lt;http://example.com/&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">og:</span><span class="w"> </span><span class="nl">&lt;https://w3id.org/ontogen#&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">rtc:</span><span class="w"> </span><span class="nl">&lt;https://w3id.org/rtc#&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">xsd:</span><span class="w"> </span><span class="nl">&lt;http://www.w3.org/2001/XMLSchema#&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">prov:</span><span class="w"> </span><span class="nl">&lt;http://www.w3.org/ns/prov#&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">
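</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">dc:</span><span class="w"> </span><span class="nl">&lt;http://purl.org/dc/terms/&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">foaf:</span><span class="w"> </span><span class="nl">&lt;http://xmlns.com/foaf/0.1/&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">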

</span><span class="nl">&lt;urn:hash::sha256:b1f9fb63d4cbfcc48c9da35c0526a1aeb394dce9f6fc10368d5ec3d248a8f070&gt;</span><span class="w">
    </span><span class="k">a</span><span class="w"> </span><span class="nn">og:</span><span class="n">SpeechAct</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">dc:</span><span class="n">description</span><span class="w"> </span><span class="s">"Update of Sarah Miller's personal information following her marriage"</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">og:</span><span class="n">add</span><span class="w"> </span><span class="nl">&lt;urn:hash::sha256:3e9347bcd3f39fba5afc5a4b2f132bb2468cf399a4dfa6e34f52aea85e755a6e&gt;</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">og:</span><span class="n">update</span><span class="w"> </span><span class="nl">&lt;urn:hash::sha256:4d33b032442d54f94052b55643240adf79dbea7ae635c5fb167a124dc3d6444a&gt;</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">og:</span><span class="n">speaker</span><span class="w"> </span><span class="nn">:</span><span class="n">JaneSmith</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">prov:</span><span class="n">startedAtTime</span><span class="w"> </span><span class="s">"2023-07-26T09:30:00Z"</span><span class="p">^^</span><span class="nn">xsd:</span><span class="n">dateTime</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">prov:</span><span class="n">endedAtTime</span><span class="w"> </span><span class="s">"2023-07-26T09:31:15Z"</span><span class="p">^^</span><span class="nn">xsd:</span><span class="n">dateTime</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">prov:</span><span class="n">wasInformedBy</span><span class="w"> </span><span class="nn">:</span><span class="n">MarriageCertificateSubmission</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">prov:</span><span class="n">used</span><span class="w"> </span><span class="nn">:</span><span class="n">MarriageCertificate20230725</span><span class="w">  </span><span class="p">.</span><span class="w">

</span><span class="nl">&lt;urn:hash::sha256:3e9347bcd3f39fba5afc5a4b2f132bb2468cf399a4dfa6e34f52aea85e755a6e&gt;</span><span class="w">
    </span><span class="nn">rtc:</span><span class="n">elements</span><span class="w"> 
        </span><span class="nl">&lt;&lt; :employee39 :emailAddress "sarah.johnson@example.com" &gt;</span><span class="err">&gt;</span><span class="p">,</span><span class="w"> 
        </span><span class="nl">&lt;&lt; :employee39 :marriageDate "2023-07-25"^^xsd:date &gt;</span><span class="err">&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">

</span><span class="nl">&lt;urn:hash::sha256:4d33b032442d54f94052b55643240adf79dbea7ae635c5fb167a124dc3d6444a&gt;</span><span class="w">
    </span><span class="nn">rtc:</span><span class="n">elements</span><span class="w"> 
        </span><span class="nl">&lt;&lt; :employee39 :familyName "Johnson" &gt;</span><span class="err">&gt;</span><span class="p">,</span><span class="w"> 
        </span><span class="nl">&lt;&lt; :employee39 :maritalStatus :Married &gt;</span><span class="err">&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">

</span><span class="nn">:</span><span class="n">JaneSmith</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">prov:</span><span class="n">Person</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">foaf:</span><span class="n">name</span><span class="w"> </span><span class="s">"Jane Smith"</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">foaf:</span><span class="n">mbox</span><span class="w"> </span><span class="nl">&lt;mailto:jane.smith@example.com&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">

</span><span class="nn">:</span><span class="n">MarriageCertificate20230725</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">prov:</span><span class="n">Entity</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">dc:</span><span class="n">title</span><span class="w"> </span><span class="s">"Marriage Certificate for Sarah Miller and Michael Johnson"</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">prov:</span><span class="n">generatedAtTime</span><span class="w"> </span><span class="s">"2023-07-25"</span><span class="p">^^</span><span class="nn">xsd:</span><span class="n">date</span><span class="w"> </span><span class="p">.</span><span class="w">

</span><span class="nn">:</span><span class="n">MarriageCertificateSubmission</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">prov:</span><span class="n">Activity</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">prov:</span><span class="n">wasAssociatedWith</span><span class="w"> </span><span class="nn">:</span><span class="n">employee39</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">prov:</span><span class="n">generated</span><span class="w"> </span><span class="nn">:</span><span class="n">MarriageCertificate20230725</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">prov:</span><span class="n">endedAtTime</span><span class="w"> </span><span class="s">"2023-07-26T09:00:00Z"</span><span class="p">^^</span><span class="nn">xsd:</span><span class="n">dateTime</span><span class="w"> </span><span class="p">.</span><span class="w">
</span></code></pre></div></div>

<h2 id="commits">Commits</h2>

<p>In Ontogen, commits represent the actual changes made to a repository, resulting from applying an <code class="language-plaintext highlighter-rouge">og:SpeechAct</code> to this repository with its existing data. Like an <code class="language-plaintext highlighter-rouge">og:SpeechAct</code>, an <code class="language-plaintext highlighter-rouge">og:Commit</code> is a <code class="language-plaintext highlighter-rouge">prov:Activity</code>. However, it represents the act of adding or modifying data in a dataset within a triple store in a specific state, rather than the act of uttering these statements, which may have occurred at a much earlier time and by a different speaker.</p>

<p>Like an <code class="language-plaintext highlighter-rouge">og:SpeechAct</code>, an <code class="language-plaintext highlighter-rouge">og:Commit</code> is a structure composed of <code class="language-plaintext highlighter-rouge">og:Proposition</code>s linked through various action properties. However, since the pragmatics are slightly different here and the properties have a different <code class="language-plaintext highlighter-rouge">rdfs:domain</code>, a separate set of properties, including one additional property, is used for this purpose. These properties encode repository-relative changes, i.e., they express the minimal changes relative to the current state of the dataset. This is crucial to ensure that <code class="language-plaintext highlighter-rouge">og:Commit</code>s are revertible, as otherwise ambiguities in the history would arise:</p>

<ul>
  <li>Can we simply remove every statement added by a commit from the triple store during a revert? If we cannot assume minimal changesets, it cannot be automatically decided whether this deletion can be performed. If the statement didn’t already exist, it must be removed to reproduce the old state. If a statement already existed before, it must not be removed.</li>
  <li>The same applies to removals of statements that do not actually exist in the current dataset and therefore should not be restored during a revert.</li>
  <li>Additionally, the specific statements implicitly deleted by <code class="language-plaintext highlighter-rouge">og:update</code>s and <code class="language-plaintext highlighter-rouge">og:replace</code>s must be explicitly recorded in an additional proposition so that they can be restored during a revert.</li>
</ul>

<p>To ensure the reversibility of commits, the internally so-called “effective changeset” must be determined. From this, corresponding propositions are then generated and linked to the <code class="language-plaintext highlighter-rouge">og:Commit</code> via the following action properties, along with the <code class="language-plaintext highlighter-rouge">og:SpeechAct</code> that this commit reproduces on the repository:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">og:committedAdd</code>, <code class="language-plaintext highlighter-rouge">og:committedRemove</code>, <code class="language-plaintext highlighter-rouge">og:committedUpdate</code>, <code class="language-plaintext highlighter-rouge">og:committedReplace</code>: These action properties represent the minimal sets of statements as propositions necessary to change the state of the repository to match the corresponding <code class="language-plaintext highlighter-rouge">og:Proposition</code>s of the <code class="language-plaintext highlighter-rouge">og:SpeechAct</code>. They contain only the actually required changes.</li>
  <li><code class="language-plaintext highlighter-rouge">og:committedOverwrite</code>: This property contains a set of statements as a proposition that represents the statements to be implicitly deleted due to updates or replacements.</li>
</ul>
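
<p>How such an effective changeset could be derived, and why it makes a commit revertible, can be sketched in Python. This is an illustrative sketch under assumptions, not Ontogen’s actual algorithm: triples are plain tuples, the dictionary keys merely mirror the property names above, and the functions <code class="language-plaintext highlighter-rouge">effective_changeset</code>, <code class="language-plaintext highlighter-rouge">apply_changes</code> and <code class="language-plaintext highlighter-rouge">revert</code> are hypothetical. The prior family name <code class="language-plaintext highlighter-rouge">"Miller"</code> is assumed.</p>

```python
def effective_changeset(dataset, add=(), update=(), replace=(), remove=()):
    """Reduce the actions of a speech act to the changes actually needed."""
    dataset = set(dataset)
    add, update, replace, remove = set(add), set(update), set(replace), set(remove)

    asserted = add | update | replace
    update_keys = {(s, p) for s, p, _ in update}
    replace_subjects = {s for s, _, _ in replace}

    # Existing statements that an update or replace implicitly deletes.
    overwrite = {
        (s, p, o)
        for s, p, o in dataset
        if ((s, p) in update_keys or s in replace_subjects)
        and (s, p, o) not in asserted
    }
    return {
        "committedAdd": add - dataset,          # skip statements already present
        "committedUpdate": update - dataset,
        "committedReplace": replace - dataset,
        "committedRemove": remove & dataset,    # skip statements that do not exist
        "committedOverwrite": overwrite,
    }


def _inserted(changes):
    return changes["committedAdd"] | changes["committedUpdate"] | changes["committedReplace"]


def _deleted(changes):
    return changes["committedRemove"] | changes["committedOverwrite"]


def apply_changes(dataset, changes):
    return (set(dataset) - _deleted(changes)) | _inserted(changes)


def revert(dataset, changes):
    # Because the changeset is minimal, a revert is a plain inversion.
    return (set(dataset) - _inserted(changes)) | _deleted(changes)


E = ":employee39"
before = {(E, ":familyName", "Miller"), (E, ":maritalStatus", ":Married")}

changes = effective_changeset(
    before,
    add=[(E, ":emailAddress", "sarah.johnson@example.com"),
         (E, ":marriageDate", "2023-07-25")],
    update=[(E, ":familyName", "Johnson"), (E, ":maritalStatus", ":Married")],
)
after = apply_changes(before, changes)
restored = revert(after, changes)

print(changes["committedUpdate"])
print(changes["committedOverwrite"])
print(restored == before)
```

<p>The already existing <code class="language-plaintext highlighter-rouge">:maritalStatus</code> statement is left out of the committed update, so a revert does not delete it, while the overwritten family name is recorded and can be restored.</p>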

<p>If none of the changes conflict with existing triples (or, in the case of removals, with non-existing triples) in the triple store, the <code class="language-plaintext highlighter-rouge">og:Proposition</code>s of the <code class="language-plaintext highlighter-rouge">og:Commit</code> are the same as the <code class="language-plaintext highlighter-rouge">og:Proposition</code>s in the <code class="language-plaintext highlighter-rouge">og:SpeechAct</code>, so no additional space is needed for dedicated, modified <code class="language-plaintext highlighter-rouge">og:Proposition</code>s. Theoretically, two simple sets for additions and deletions would suffice to capture the effective changes. However, to increase the chance of reusing <code class="language-plaintext highlighter-rouge">og:Proposition</code>s, we continue to use the same actions for commits, so that a separate, dedicated <code class="language-plaintext highlighter-rouge">og:Proposition</code> for the commit must exist only in case of overlap with existing data. (To further optimize this case, support for <code class="language-plaintext highlighter-rouge">og:SpeechAct</code>-relative commits is planned for the future, where only <code class="language-plaintext highlighter-rouge">og:Proposition</code>s of the overlap are necessary, which is much more efficient in many cases.)</p>

<p>Beyond this composition of <code class="language-plaintext highlighter-rouge">og:Proposition</code>s, commits form, as in other version control systems, a singly linked list: each commit points to its predecessor commit. The predecessor commit is defined in Ontogen via the <code class="language-plaintext highlighter-rouge">og:parentCommit</code> property.</p>

<p>The automatically generatable identifiers of an <code class="language-plaintext highlighter-rouge">og:Commit</code> are, like those of <code class="language-plaintext highlighter-rouge">og:SpeechAct</code>s and <code class="language-plaintext highlighter-rouge">og:Proposition</code>s, again URI-encoded SHA256 hashes, in this case of:</p>

<ul>
  <li>the hash URI of the parent commit</li>
  <li>the hash URIs of the propositions of the commit</li>
  <li>the URI of the committer</li>
  <li>the timestamp of the commit</li>
  <li>and the commit message</li>
</ul>
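
<p>One consequence of including the parent’s hash URI is that the commit identifiers form a hash chain: changing anything in an earlier commit changes the identifiers of all its descendants. The following Python sketch illustrates this. How Ontogen actually serializes these five components before hashing is not described here, so the newline-joined layout, the function name <code class="language-plaintext highlighter-rouge">commit_uri</code>, and the first commit with its shortened placeholder proposition URI are purely assumptions for this illustration.</p>

```python
import hashlib


def commit_uri(parent, proposition_uris, committer, timestamp, message):
    """Derive a commit identifier from the five components listed above.

    Hypothetical serialization: one component per line, proposition URIs
    sorted so that their order does not influence the result.
    """
    fields = [parent or "", *sorted(proposition_uris), committer, timestamp, message]
    digest = hashlib.sha256("\n".join(fields).encode("utf-8")).hexdigest()
    return "urn:hash::sha256:" + digest


# Illustrative root commit; the proposition URI is a made-up placeholder.
root = commit_uri(
    None,
    ["urn:hash::sha256:aaaa"],
    "http://example.com/JohnDoe",
    "2023-07-20T08:00:00Z",
    "Initial import",
)

child_args = (
    ["urn:hash::sha256:3e9347bcd3f39fba5afc5a4b2f132bb2468cf399a4dfa6e34f52aea85e755a6e"],
    "http://example.com/JohnDoe",
    "2023-07-27T09:31:15Z",
    "Update of Sarah Miller's personal information following her marriage",
)
child = commit_uri(root, *child_args)

# Rewriting the root commit yields a different identifier for the child as well.
rewritten_root = commit_uri(
    None,
    ["urn:hash::sha256:aaaa"],
    "http://example.com/JohnDoe",
    "2023-07-20T08:00:00Z",
    "Initial import (edited)",
)
rewritten_child = commit_uri(rewritten_root, *child_args)

print(child != rewritten_child)
```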

<p>Finally, let’s look at an <code class="language-plaintext highlighter-rouge">og:Commit</code> for our above example <code class="language-plaintext highlighter-rouge">og:SpeechAct</code> of the name change. Let’s assume that Sarah was married once before, but the divorce was not recorded in our data, so the <code class="language-plaintext highlighter-rouge">:maritalStatus</code> is already set to <code class="language-plaintext highlighter-rouge">:Married</code> and the corresponding change effectively does not need to be made.</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">@prefix</span><span class="w"> </span><span class="nn">:</span><span class="w"> </span><span class="nl">&lt;http://example.com/&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">og:</span><span class="w"> </span><span class="nl">&lt;https://w3id.org/ontogen#&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">rtc:</span><span class="w"> </span><span class="nl">&lt;https://w3id.org/rtc#&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">xsd:</span><span class="w"> </span><span class="nl">&lt;http://www.w3.org/2001/XMLSchema#&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">prov:</span><span class="w"> </span><span class="nl">&lt;http://www.w3.org/ns/prov#&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">
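</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">dc:</span><span class="w"> </span><span class="nl">&lt;http://purl.org/dc/terms/&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">foaf:</span><span class="w"> </span><span class="nl">&lt;http://xmlns.com/foaf/0.1/&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">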

</span><span class="nl">&lt;urn:hash::sha256:a4f8c480f406140af4259783154811982d2bdd45eb39560c461a3f0601564357&gt;</span><span class="w">
    </span><span class="k">a</span><span class="w"> </span><span class="nn">og:</span><span class="n">Commit</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">og:</span><span class="n">committedAdd</span><span class="w"> </span><span class="nl">&lt;urn:hash::sha256:3e9347bcd3f39fba5afc5a4b2f132bb2468cf399a4dfa6e34f52aea85e755a6e&gt;</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">og:</span><span class="n">committedUpdate</span><span class="w"> </span><span class="nl">&lt;urn:hash::sha256:0ca0d5b6b453830221787836bd8f85083fe571db2b340fbee106a04a7d206720&gt;</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">og:</span><span class="n">committer</span><span class="w"> </span><span class="nn">:</span><span class="n">JohnDoe</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">prov:</span><span class="n">endedAtTime</span><span class="w"> </span><span class="s">"2023-07-27T09:31:15Z"</span><span class="p">^^</span><span class="nn">xsd:</span><span class="n">dateTime</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">og:</span><span class="n">commitMessage</span><span class="w"> </span><span class="s">"Update of Sarah Miller's personal information following her marriage"</span><span class="w"> </span><span class="p">.</span><span class="w">

</span><span class="nl">&lt;urn:hash::sha256:0ca0d5b6b453830221787836bd8f85083fe571db2b340fbee106a04a7d206720&gt;</span><span class="w">
    </span><span class="nn">rtc:</span><span class="n">elements</span><span class="w"> 
        </span><span class="nl">&lt;&lt; :employee39 :familyName "Johnson" &gt;</span><span class="err">&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">


</span><span class="nl">&lt;urn:hash::sha256:b1f9fb63d4cbfcc48c9da35c0526a1aeb394dce9f6fc10368d5ec3d248a8f070&gt;</span><span class="w">
    </span><span class="k">a</span><span class="w"> </span><span class="nn">og:</span><span class="n">SpeechAct</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">dc:</span><span class="n">description</span><span class="w"> </span><span class="s">"Update of Sarah Miller's personal information following her marriage"</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">og:</span><span class="n">add</span><span class="w"> </span><span class="nl">&lt;urn:hash::sha256:3e9347bcd3f39fba5afc5a4b2f132bb2468cf399a4dfa6e34f52aea85e755a6e&gt;</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">og:</span><span class="n">update</span><span class="w"> </span><span class="nl">&lt;urn:hash::sha256:4d33b032442d54f94052b55643240adf79dbea7ae635c5fb167a124dc3d6444a&gt;</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">og:</span><span class="n">speaker</span><span class="w"> </span><span class="nn">:</span><span class="n">JaneSmith</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">prov:</span><span class="n">startedAtTime</span><span class="w"> </span><span class="s">"2023-07-26T09:30:00Z"</span><span class="p">^^</span><span class="nn">xsd:</span><span class="n">dateTime</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">prov:</span><span class="n">endedAtTime</span><span class="w"> </span><span class="s">"2023-07-26T09:31:15Z"</span><span class="p">^^</span><span class="nn">xsd:</span><span class="n">dateTime</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">prov:</span><span class="n">wasInformedBy</span><span class="w"> </span><span class="nn">:</span><span class="n">MarriageCertificateSubmission</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">prov:</span><span class="n">used</span><span class="w"> </span><span class="nn">:</span><span class="n">MarriageCertificate20230725</span><span class="w">  </span><span class="p">.</span><span class="w">

</span><span class="nl">&lt;urn:hash::sha256:3e9347bcd3f39fba5afc5a4b2f132bb2468cf399a4dfa6e34f52aea85e755a6e&gt;</span><span class="w">
    </span><span class="nn">rtc:</span><span class="n">elements</span><span class="w"> 
        </span><span class="nl">&lt;&lt; :employee39 :emailAddress "sarah.johnson@example.com" &gt;</span><span class="err">&gt;</span><span class="p">,</span><span class="w"> 
        </span><span class="nl">&lt;&lt; :employee39 :marriageDate "2023-07-25"^^xsd:date &gt;</span><span class="err">&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">

  </span><span class="nl">&lt;urn:hash::sha256:4d33b032442d54f94052b55643240adf79dbea7ae635c5fb167a124dc3d6444a&gt;</span><span class="w">
    </span><span class="nn">rtc:</span><span class="n">elements</span><span class="w"> 
        </span><span class="nl">&lt;&lt; :employee39 :familyName "Johnson" &gt;</span><span class="err">&gt;</span><span class="p">,</span><span class="w"> 
        </span><span class="nl">&lt;&lt; :employee39 :maritalStatus :Married &gt;</span><span class="err">&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">

</span><span class="nn">:</span><span class="n">JaneSmith</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">prov:</span><span class="n">Person</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">foaf:</span><span class="n">name</span><span class="w"> </span><span class="s">"Jane Smith"</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">foaf:</span><span class="n">mbox</span><span class="w"> </span><span class="nl">&lt;mailto:jane.smith@example.com&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">

</span><span class="nn">:</span><span class="n">MarriageCertificate20230725</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">prov:</span><span class="n">Entity</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">dc:</span><span class="n">title</span><span class="w"> </span><span class="s">"Marriage Certificate for Sarah Miller and Michael Johnson"</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">prov:</span><span class="n">generatedAtTime</span><span class="w"> </span><span class="s">"2023-07-25"</span><span class="p">^^</span><span class="nn">xsd:</span><span class="n">date</span><span class="w"> </span><span class="p">.</span><span class="w">

</span><span class="nn">:</span><span class="n">MarriageCertificateSubmission</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">prov:</span><span class="n">Activity</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">prov:</span><span class="n">wasAssociatedWith</span><span class="w"> </span><span class="nn">:</span><span class="n">employee39</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">prov:</span><span class="n">generated</span><span class="w"> </span><span class="nn">:</span><span class="n">MarriageCertificate2023-07-25</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">prov:</span><span class="n">endedAtTime</span><span class="w"> </span><span class="s">"2023-07-26T09:00:00Z"</span><span class="p">^^</span><span class="nn">xsd:</span><span class="n">dateTime</span><span class="w"> </span><span class="p">.</span><span class="w">
</span></code></pre></div></div>

<p>In conclusion, one interesting possibility should be mentioned that is not yet used in these early versions of Ontogen but is planned for future ones. Because a speech act records when a change was stated, independently of when it was committed, the speech act-based commit model allows validity checks of changes that are not possible in conventional version control systems. For example, a deletion stated at an earlier point in time can be recognized as obsolete when an insert of the same statement that was stated later has already been committed. Analogously, an insert stated at an earlier point in time is obsolete when a deletion of the same statement that was stated later has already been committed, even if that deletion had no effect at the time because the statement did not exist yet. Imagine, for example, that we had imported the very latest version of a dataset X and later import a more comprehensive dataset Y that includes X, but in an older version. From the more recent deletion of a statement in the already imported dataset, we can recognize that its addition in the older version now being imported is already obsolete. Similarly, conflicts can be detected that arise, for example, when updating a dataset on whose old version we have performed some data cleansing.</p>
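
<p>To make the idea more concrete, here is a minimal sketch in plain Elixir. It is not part of Ontogen and does not reflect how a future implementation will work. The module name ObsoleteCheck, the map keys action, statement and stated_at, and the example statements are all assumptions made for illustration only. A change counts as obsolete when an opposite change of the same statement with a later statement time has already been committed.</p>

```elixir
defmodule ObsoleteCheck do
  # Hypothetical helper, not an Ontogen API.
  # `history` is the list of already committed changes. Every change is a map
  # with :action (:insert or :delete), :statement (a triple as a tuple) and
  # :stated_at (the DateTime at which the change was stated).
  def obsolete?(history, %{action: action, statement: statement, stated_at: stated_at}) do
    Enum.any?(history, fn committed ->
      committed.statement == statement and
        committed.action == opposite(action) and
        DateTime.compare(committed.stated_at, stated_at) == :gt
    end)
  end

  # Splits incoming changes into those still worth applying and the obsolete ones.
  def split(history, changes) do
    Enum.split_with(changes, fn change -> not obsolete?(history, change) end)
  end

  defp opposite(:insert), do: :delete
  defp opposite(:delete), do: :insert
end

# Already committed: a deletion that was stated in 2023 (made-up data).
history = [
  %{action: :delete, statement: {:employee1, :jobTitle, "Intern"}, stated_at: ~U[2023-07-26 09:30:00Z]}
]

# Imported afterwards: an older dataset version that was stated in 2021.
incoming = [
  %{action: :insert, statement: {:employee1, :jobTitle, "Intern"}, stated_at: ~U[2021-01-01 00:00:00Z]},
  %{action: :insert, statement: {:employee1, :firstName, "Alex"}, stated_at: ~U[2021-01-01 00:00:00Z]}
]

{effective, obsolete} = ObsoleteCheck.split(history, incoming)

IO.inspect(Enum.map(effective, fn change -> change.statement end), label: "effective")
IO.inspect(Enum.map(obsolete, fn change -> change.statement end), label: "obsolete")
```

<p>In this sketch, the older insert of the job title ends up in the obsolete list because its deletion was stated later and is already part of the history, while the first name insert remains effective. A real check would also have to consider the order of commits and changes that had no effect when they were committed, which this sketch ignores.</p>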

<h2 id="summary-and-outlook">Summary and Outlook</h2>

<p>In this article, we have delved into the core of Ontogen’s versioning model, exploring how it leverages RDF Triple Compounds (RTC) to create a robust and flexible system for managing changes in RDF datasets. We introduced the concept of <code class="language-plaintext highlighter-rouge">og:SpeechAct</code>s, a novel approach that applies the philosophical notion of speech acts to the domain of RDF data versioning. These <code class="language-plaintext highlighter-rouge">og:SpeechAct</code>s, implemented as subclasses of <code class="language-plaintext highlighter-rouge">prov:Activity</code>, allow us to capture not just the content of changes, but also the context and intention behind them.</p>

<p>We then examined <code class="language-plaintext highlighter-rouge">og:Proposition</code>s, which serve as immutable, abstract sets of statements within our model. By using SHA256 hashes for their URIs, we ensure the integrity and verifiability of our data throughout the versioning process. This approach also allows for efficient storage and referencing of repeated statement sets across different <code class="language-plaintext highlighter-rouge">og:SpeechAct</code>s.</p>

<p>Finally, we discussed how Ontogen implements commits through <code class="language-plaintext highlighter-rouge">og:Commit</code>s, which represent the actual application of <code class="language-plaintext highlighter-rouge">og:SpeechAct</code>s to the repository. We explored the challenges of ensuring reversibility in commits and how Ontogen addresses these through careful management of change sets.</p>

<p>In the <a href="/introduction/part-3">next article</a>, we will expand our focus from the internal versioning mechanisms to the broader architecture of Ontogen. We’ll explore how Ontogen organizes and manages repositories as DCAT catalogs and implements Ontogen instances as DCAT services.</p>]]></content><author><name>Marcel Otto</name></author><category term="introduction" /><category term="blog" /><summary type="html"><![CDATA[This is the second in a series of four blog posts introducing the different parts of the Ontogen version control system and the ideas behind it:]]></summary></entry><entry><title type="html">Introducing Ontogen</title><link href="https://ontogen.io/introduction/part-1" rel="alternate" type="text/html" title="Introducing Ontogen" /><published>2024-08-08T08:00:00+00:00</published><updated>2026-02-25T00:00:00+00:00</updated><id>https://ontogen.io/introduction/introducing-ontogen</id><content type="html" xml:base="https://ontogen.io/introduction/part-1"><![CDATA[<p>After a year of intensive development, I am pleased to introduce Ontogen - a version control system for RDF datasets. My sincere thanks go to the <a href="https://nlnet.nl">NLnet Foundation</a> for their support through the <a href="https://nlnet.nl/assure">NGI Assure</a> fund, which enabled me to dedicate myself full-time to this extensive project.</p>

<p>It’s important to emphasize that Ontogen is still in an early stage of development. Although the system is equipped with a comprehensive test suite and is regularly tested with two different triple stores (Fuseki and Oxigraph), it lacks extensive real-world testing. Therefore, I cannot yet recommend using it in production for critical data. In particular, some important extensions are still pending:</p>

<ol>
  <li>Full RDF dataset support (currently only versioning of individual graphs is supported)</li>
  <li>Branching support</li>
</ol>

<p>These and other extensions will require fundamental changes that will likely invalidate existing version histories.</p>

<p>In the coming weeks, I plan to publish the project’s future roadmap. An application for another round of funding from the NLnet Foundation for at least one year is currently in progress.</p>

<p>Ontogen offers some novel approaches to versioning RDF data (at least to my knowledge). To adequately explain these complex concepts, I have published a series of four blog posts introducing the different parts of the system and the ideas behind them:</p>

<ol>
  <li><strong><a href="/introduction/part-1">Introducing Ontogen</a></strong></li>
  <li><a href="/introduction/part-2">Ontogen’s Versioning Model</a></li>
  <li><a href="/introduction/part-3">Ontogen’s Repository and Service Model</a></li>
  <li><a href="/introduction/part-4">Ontogen Configuration with Bog</a></li>
</ol>

<p>In this first post, I’d like to introduce the technical foundations of Ontogen’s approach to versioning RDF datasets.</p>

<h2 id="source-control-management-vs-data-control-management">Source Control Management vs. Data Control Management</h2>

<p>First, however, I’d like to take a step back and discuss the versioning problem with a particular focus on datasets. This distinct perspective is, to my knowledge, rarely taken, and in practice, software versioning solutions, i.e., SCMs like Git, are too often used for versioning datasets.</p>

<p>Datasets, however, are a different type of versioning subject compared to software. While both datasets and software may ultimately always be text, no one would dispute the claim that data is not the same as software. But why then is there no popular versioning solution for datasets of structured data, especially in the age of “data as the new oil,” where data management and analysis play central roles in almost all industries?</p>

<p>Some examples illustrate the inadequacies of an SCM system as a Data Control Management (DCM) system:</p>

<ul>
  <li>Roles: In an SCM, the committer is the crucial role. While SCMs recognize the difference between author and committer, in practice, this is usually of little importance. For a dataset, however, authorship, i.e., the exact source of datasets, is of greater importance, and many other roles are relevant and should be differentiable, such as data processors (people or systems that transform, clean, or enrich raw data), data curators (experts who organize, categorize, and enrich data with metadata), data protection officers, etc.</li>
  <li>Lack of metadata: Datasets often require extensive metadata (e.g., origin, license, timestamps) that are not natively supported in SCMs.</li>
  <li>Granularity of changes: SCMs often work at the file level, while for datasets, individual records or fields may be relevant.</li>
  <li>Database integration: DCMs should ideally be able to interact directly with database systems, which is not provided for in SCMs.</li>
</ul>

<p>Although increasing attention has been focused on the dataset versioning problem in recent years, it must be noted that no mainstream solution has yet emerged. Instead, SCMs are still too often resorted to for dataset versioning (if they are versioned at all).</p>

<p>The situation is particularly precarious in the Knowledge Graph community. Here, there is a lack of mature, specialized solutions for versioning.</p>

<h2 id="problems-with-previous-versioning-systems-for-rdf">Problems with Previous Versioning Systems for RDF</h2>

<p>Attempts to develop solutions for versioning RDF data have existed for a long time. Unfortunately, all solutions developed so far are either academic proofs of concept or approaches that have not found broader acceptance in the community for various reasons. To better understand why, I want to briefly survey the most significant approaches and the common limitations they share.</p>

<h3 id="named-graph-based-approaches">Named-graph-based approaches</h3>

<p>The majority of previous RDF versioning systems relied on named graphs as their primary mechanism for organizing versioned data. <a href="https://github.com/plt-tud/r43ples">R43ples</a> (Graube et al., 2014) acts as a SPARQL proxy that stores addition and deletion sets as separate named graphs for each revision. <a href="https://aksw.org/Projects/Quit">Quit Store</a> (Arndt et al., 2018) maps each named graph to a file in a Git repository, delegating the actual versioning mechanics (history, branching, merging) to Git. <a href="https://ceur-ws.org/Vol-996/papers/ldow2013-paper-01.pdf">R&amp;Wbase</a> (Vander Sande et al., 2013) stores deltas as quads, again using named graphs to organize different changesets. Stardog offered a VCS feature using named graphs for its internal history database, but <a href="https://community.stardog.com/t/missing-versioning-documentation/1455">removed it entirely in version 7.0</a>.</p>

<p>The main problem all of these approaches share is that parts of a graph simply cannot be addressed directly within the RDF model. This forces the creation of a separate named graph for every small group of triples that needs to be versioned or annotated with change metadata, which can quickly lead to a flood of graphs and thus an unwieldy RDF dataset. This becomes particularly problematic when named graphs are also used for content purposes, which then become difficult to distinguish among this flood. Even Quit Store, which elegantly avoids named graphs <em>as the versioning mechanism</em> by delegating to Git, still depends on them <em>as the unit of granularity</em> - one cannot version parts of a graph independently without splitting them into separate named graphs first.</p>

<h3 id="alternative-approaches">Alternative approaches</h3>

<p>Other approaches tried to avoid the named graph problem through different mechanisms, but each came with its own significant trade-offs.</p>

<p>Early systems like <a href="https://www.researchgate.net/publication/228716298_Semversion_A_versioning_system_for_RDF_and_ontologies">SemVersion</a> (Völkel et al., 2005) and the <a href="https://vocab.org/changeset/">ChangeSet vocabulary</a> (Tunnicliffe &amp; Davis, 2005) used standard RDF reification to make individual triples addressable for change tracking. This solves the addressability problem but at severe cost: each described triple requires at least four additional triples, causing significant storage overhead and complex SPARQL queries with multiple joins per matched statement.</p>
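
<p>The overhead can be illustrated with a small counting sketch in plain Elixir. It does not use RDF.ex, and triples are represented as simple tuples of strings. The module name ReificationSketch, the property ex:partOf and the resource ex:changeset1 are assumptions made up for this illustration. The sketch compares how many extra triples are needed to assign three statements to a changeset-like resource with standard reification and with a single RDF-star annotation per statement.</p>

```elixir
defmodule ReificationSketch do
  @rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

  # Describes a triple with the four triples of standard RDF reification,
  # using `stmt_id` as the identifier of the rdf:Statement resource.
  def reify({s, p, o}, stmt_id) do
    [
      {stmt_id, "#{@rdf}type", "#{@rdf}Statement"},
      {stmt_id, "#{@rdf}subject", s},
      {stmt_id, "#{@rdf}predicate", p},
      {stmt_id, "#{@rdf}object", o}
    ]
  end
end

triples = [
  {"ex:employee38", "ex:firstName", "John"},
  {"ex:employee38", "ex:familyName", "Smith"},
  {"ex:employee38", "ex:jobTitle", "Assistant Designer"}
]

# Reification: four describing triples plus one annotation triple per statement.
reified =
  triples
  |> Enum.with_index()
  |> Enum.flat_map(fn {triple, index} ->
    stmt_id = "_:stmt#{index}"
    ReificationSketch.reify(triple, stmt_id) ++ [{stmt_id, "ex:partOf", "ex:changeset1"}]
  end)

# RDF-star: the triple itself is the subject of the single annotation triple.
rdf_star = Enum.map(triples, fn triple -> {triple, "ex:partOf", "ex:changeset1"} end)

IO.puts("reification: #{length(reified)} extra triples")
IO.puts("RDF-star: #{length(rdf_star)} extra triples")
```

<p>For the three statements, the reification variant needs 15 additional triples, while the RDF-star variant needs 3. A query over the reified form also has to join the subject, predicate and object triples of every statement before it reaches the annotation.</p>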

<p>Delta and patch-based approaches like <a href="https://jena.apache.org/documentation/rdf-patch/">RDF Patch</a> (part of Apache Jena) provide compact change formats, but focus on change <em>logging</em> and <em>replication</em> rather than version <em>querying</em> - there is no built-in way to query the state at version N or compute diffs between arbitrary versions.</p>

<p>Archiving-focused systems like <a href="https://rdfostrich.github.io/article-jws2018-ostrich/">OSTRICH</a> (Taelman et al., 2018) use sophisticated hybrid storage strategies for efficient versioned queries over large RDF archives. However, their linear storage model is optimized for archival query scenarios rather than the collaborative workflows (branching, merging, provenance tracking) that a Data Control Management system requires.</p>

<p>At the HTTP/resource level, the <a href="https://www.rfc-editor.org/rfc/rfc7089">Memento</a> protocol (RFC 7089) and <a href="https://dl.acm.org/doi/10.1145/2814864.2814875">TailR</a> (Meinhardt et al., 2015) provide temporal access to Linked Data resources, but operate at the document level with no triple-level awareness.</p>

<h3 id="rdf-star-as-a-new-foundation">RDF-star as a new foundation</h3>

<p>With RDF-star, which is currently being standardized as RDF 1.2, a new tool is now available that addresses the fundamental addressability problem without the overhead of reification or the constraints of named graphs. RDF-star extends the RDF data model so that triples can themselves be used as subjects or objects of other triples, which allows RDF triples to be annotated directly. This simplifies the representation of metadata about statements and enables a more natural modeling of complex relationships.</p>

<p>The possibility of making RDF-star meta-statements with other statements as the subject opens up significant new possibilities. In particular, it is now trivial to define virtual, URI-identifiable sets of statements, i.e., partial graphs within a graph, by assigning the statements to a common resource using a property. This makes RDF-star an ideal foundation for versioning RDF graphs.</p>

<h2 id="rtc-as-a-foundation-for-rdf-data-versioning">RTC as a Foundation for RDF Data Versioning</h2>

<p>To provide a well-known property with this assignment semantics for partial graphs and thus create a foundation for tools that exploit this semantics, I published the <a href="https://w3id.org/rtc">RDF Triple Compounds (RTC) vocabulary</a> last year. It consists of only one RDFS class <code class="language-plaintext highlighter-rouge">rtc:Compound</code> and a few properties, in particular the <code class="language-plaintext highlighter-rouge">rtc:elementOf</code> property with the aforementioned semantics and its inverse <code class="language-plaintext highlighter-rouge">rtc:elements</code>.</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">rtc:</span><span class="n">Compound</span><span class="w"> 
	</span><span class="k">a</span><span class="w"> </span><span class="nn">rdfs:</span><span class="n">Class,</span><span class="w"> </span><span class="nn">owl:</span><span class="n">Class</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">rdfs:</span><span class="n">label</span><span class="w"> </span><span class="s">"Compound"</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">rdfs:</span><span class="n">comment</span><span class="w"> </span><span class="s">"A compound is a set of triples as an RDF resource."</span><span class="w"> </span><span class="p">.</span><span class="w">  
  
</span><span class="nn">rtc:</span><span class="n">elementOf</span><span class="w"> 
	</span><span class="k">a</span><span class="w"> </span><span class="nn">rdf:</span><span class="n">Property,</span><span class="w"> </span><span class="nn">owl:</span><span class="n">ObjectProperty</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">rdfs:</span><span class="n">label</span><span class="w"> </span><span class="s">"element of"</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">rdfs:</span><span class="n">comment</span><span class="w"> </span><span class="s">"Assigns a triple to a compound as an element. The subject must be a RDF triple."</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">rdfs:</span><span class="n">range</span><span class="w"> </span><span class="nn">rtc:</span><span class="n">Compound</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">owl:</span><span class="n">inverseOf</span><span class="w"> </span><span class="nn">rtc:</span><span class="n">elements</span><span class="w"> </span><span class="p">.</span><span class="w">
  
</span><span class="nn">rtc:</span><span class="n">elements</span><span class="w"> 
	</span><span class="k">a</span><span class="w"> </span><span class="nn">rdf:</span><span class="n">Property</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">rdfs:</span><span class="n">label</span><span class="w"> </span><span class="s">"elements"</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">rdfs:</span><span class="n">comment</span><span class="w"> </span><span class="s">"The set of all triples of a compound. The objects must be RDF triples."</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">rdfs:</span><span class="n">domain</span><span class="w"> </span><span class="nn">rtc:</span><span class="n">Compound</span><span class="w"> </span><span class="p">,</span><span class="w">
    </span><span class="nn">owl:</span><span class="n">inverseOf</span><span class="w"> </span><span class="nn">rtc:</span><span class="n">elementOf</span><span class="w"> </span><span class="p">.</span><span class="w">
</span></code></pre></div></div>

<p>A <em>triple compound</em> of this vocabulary is thus a set of triples assigned to a common resource. The triples are assigned to a compound with an RDF-star statement using the <code class="language-plaintext highlighter-rouge">rtc:elementOf</code> property.</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">PREFIX</span><span class="w"> </span><span class="nn">:</span><span class="w"> </span><span class="nl">&lt;http://www.example.org/&gt;</span><span class="w">
</span><span class="kd">PREFIX</span><span class="w"> </span><span class="nn">rtc:</span><span class="w"> </span><span class="nl">&lt;https://w3id.org/rtc#&gt;</span><span class="w">
 
</span><span class="nn">:</span><span class="n">employee38</span><span class="w"> 
    </span><span class="nn">:</span><span class="n">firstName</span><span class="w"> </span><span class="s">"John"</span><span class="w"> </span><span class="p">{</span><span class="n">|</span><span class="w"> </span><span class="nn">rtc:</span><span class="n">elementOf</span><span class="w"> </span><span class="nn">:</span><span class="n">compound1</span><span class="w"> </span><span class="n">|</span><span class="p">}</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">:</span><span class="n">familyName</span><span class="w"> </span><span class="s">"Smith"</span><span class="w"> </span><span class="p">{</span><span class="n">|</span><span class="w"> </span><span class="nn">rtc:</span><span class="n">elementOf</span><span class="w"> </span><span class="nn">:</span><span class="n">compound1</span><span class="w"> </span><span class="n">|</span><span class="p">}</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">:</span><span class="n">jobTitle</span><span class="w"> </span><span class="s">"Assistant Designer"</span><span class="w"> </span><span class="p">{</span><span class="n">|</span><span class="w"> </span><span class="nn">rtc:</span><span class="n">elementOf</span><span class="w"> </span><span class="nn">:</span><span class="n">compound1</span><span class="w"> </span><span class="n">|</span><span class="p">}</span><span class="w"> </span><span class="p">.</span><span class="w">

</span><span class="nn">:</span><span class="n">compound1</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">rtc:</span><span class="n">Compound</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">:</span><span class="n">statedBy</span><span class="w"> </span><span class="nn">:</span><span class="n">bob</span><span class="w"> </span><span class="p">;</span><span class="w"> 
    </span><span class="nn">:</span><span class="n">statedAt</span><span class="w"> </span><span class="s">"2022-02-16"</span><span class="w"> </span><span class="p">.</span><span class="w">
</span></code></pre></div></div>

<p>Alternatively, the <code class="language-plaintext highlighter-rouge">rtc:elements</code> property can be used as the inverse of <code class="language-plaintext highlighter-rouge">rtc:elementOf</code>.</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">PREFIX</span><span class="w"> </span><span class="nn">:</span><span class="w"> </span><span class="nl">&lt;http://www.example.org/&gt;</span><span class="w">
</span><span class="kd">PREFIX</span><span class="w"> </span><span class="nn">rtc:</span><span class="w"> </span><span class="nl">&lt;https://w3id.org/rtc#&gt;</span><span class="w">
 
</span><span class="nn">:</span><span class="n">employee38</span><span class="w"> 
    </span><span class="nn">:</span><span class="n">firstName</span><span class="w"> </span><span class="s">"John"</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">:</span><span class="n">familyName</span><span class="w"> </span><span class="s">"Smith"</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">:</span><span class="n">jobTitle</span><span class="w"> </span><span class="s">"Assistant Designer"</span><span class="w"> </span><span class="p">.</span><span class="w">

</span><span class="nn">:</span><span class="n">compound1</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">rtc:</span><span class="n">Compound</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">:</span><span class="n">statedBy</span><span class="w"> </span><span class="nn">:</span><span class="n">bob</span><span class="w"> </span><span class="p">;</span><span class="w"> 
    </span><span class="nn">:</span><span class="n">statedAt</span><span class="w"> </span><span class="s">"2022-02-16"</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">rtc:</span><span class="n">elements</span><span class="w">        
       </span><span class="nl">&lt;&lt; :employee38 :firstName "John" &gt;</span><span class="err">&gt;</span><span class="w"> </span><span class="p">,</span><span class="w">
       </span><span class="nl">&lt;&lt; :employee38 :familyName "Smith" &gt;</span><span class="err">&gt;</span><span class="w"> </span><span class="p">,</span><span class="w">
       </span><span class="nl">&lt;&lt; :employee38 :jobTitle "Assistant Designer" &gt;</span><span class="err">&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">
</span></code></pre></div></div>

<p>Simultaneously with the vocabulary, an Elixir implementation <a href="https://github.com/rtc-org/rtc-ex">RTC.ex</a> was also provided, which, based on this vocabulary, provides a structure that allows working with these virtual graphs in a way that is largely API-compatible with real graphs, as implemented in the <code class="language-plaintext highlighter-rouge">RDF.Graph</code> structure of RDF.ex.</p>

<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># create a new compound with a couple triples</span>
<span class="n">virtual_graph</span> <span class="o">=</span>  
  <span class="p">[</span>  
    <span class="p">{</span><span class="no">EX</span><span class="o">.</span><span class="no">Employee38</span><span class="p">,</span> <span class="no">EX</span><span class="o">.</span><span class="n">firstName</span><span class="p">(),</span> <span class="s2">"John"</span><span class="p">},</span>  
    <span class="p">{</span><span class="no">EX</span><span class="o">.</span><span class="no">Employee38</span><span class="p">,</span> <span class="no">EX</span><span class="o">.</span><span class="n">familyName</span><span class="p">(),</span> <span class="s2">"Smith"</span><span class="p">},</span>  
    <span class="p">{</span><span class="no">EX</span><span class="o">.</span><span class="no">Employee38</span><span class="p">,</span> <span class="no">EX</span><span class="o">.</span><span class="n">jobTitle</span><span class="p">(),</span> <span class="s2">"Assistant Designer"</span><span class="p">},</span>  
  <span class="p">]</span> 
  <span class="o">|&gt;</span> <span class="no">RTC</span><span class="o">.</span><span class="no">Compound</span><span class="o">.</span><span class="n">new</span><span class="p">(</span><span class="no">EX</span><span class="o">.</span><span class="no">Compound</span><span class="p">,</span> <span class="ss">prefixes:</span> <span class="p">[</span><span class="ss">ex:</span> <span class="no">EX</span><span class="p">])</span>  

<span class="c1"># add some triples to the compound</span>
<span class="n">virtual_graph</span> <span class="o">=</span>  
  <span class="no">RTC</span><span class="o">.</span><span class="no">Compound</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">virtual_graph</span><span class="p">,</span>   
    <span class="no">EX</span><span class="o">.</span><span class="no">Employee39</span>  
    <span class="o">|&gt;</span> <span class="no">EX</span><span class="o">.</span><span class="n">firstName</span><span class="p">(</span><span class="s2">"Jane"</span><span class="p">)</span>  
    <span class="o">|&gt;</span> <span class="no">EX</span><span class="o">.</span><span class="n">familyName</span><span class="p">(</span><span class="s2">"Doe"</span><span class="p">)</span>  
    <span class="o">|&gt;</span> <span class="no">EX</span><span class="o">.</span><span class="n">jobTitle</span><span class="p">(</span><span class="s2">"HR Manager"</span><span class="p">)</span>  
  <span class="p">)</span>  

<span class="c1"># add some statements about the compound itself</span>
<span class="n">virtual_graph</span> <span class="o">=</span>  
  <span class="no">RTC</span><span class="o">.</span><span class="no">Compound</span><span class="o">.</span><span class="n">add_annotations</span><span class="p">(</span><span class="n">virtual_graph</span><span class="p">,</span>  
    <span class="p">%{</span><span class="no">EX</span><span class="o">.</span><span class="n">dataSource</span><span class="p">()</span> <span class="o">=&gt;</span> <span class="no">EX</span><span class="o">.</span><span class="no">DataSource</span><span class="p">}</span>  
  <span class="p">)</span>
</code></pre></div></div>

<p>With the <code class="language-plaintext highlighter-rouge">RTC.Compound.graph(virtual_graph)</code> function, the pure set of statements can be produced as an <code class="language-plaintext highlighter-rouge">RDF.Graph</code> at any time, which in this case generates this graph:</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">@prefix</span><span class="w"> </span><span class="nn">ex:</span><span class="w"> </span><span class="nl">&lt;http://example.com/&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">

</span><span class="nn">ex:</span><span class="n">Employee38
</span><span class="w">    </span><span class="nn">ex:</span><span class="n">familyName</span><span class="w"> </span><span class="s">"Smith"</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">ex:</span><span class="n">firstName</span><span class="w"> </span><span class="s">"John"</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">ex:</span><span class="n">jobTitle</span><span class="w"> </span><span class="s">"Assistant Designer"</span><span class="w"> </span><span class="p">.</span><span class="w">

</span><span class="nn">ex:</span><span class="n">Employee39
</span><span class="w">    </span><span class="nn">ex:</span><span class="n">familyName</span><span class="w"> </span><span class="s">"Doe"</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">ex:</span><span class="n">firstName</span><span class="w"> </span><span class="s">"Jane"</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">ex:</span><span class="n">jobTitle</span><span class="w"> </span><span class="s">"HR Manager"</span><span class="w"> </span><span class="p">.</span><span class="w">
</span></code></pre></div></div>

<p>In contrast, <code class="language-plaintext highlighter-rouge">RTC.Compound.to_rdf(virtual_graph)</code> provides the complete RDF-star graph with the RTC annotations for the compounds.</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">@prefix</span><span class="w"> </span><span class="nn">ex:</span><span class="w"> </span><span class="nl">&lt;http://example.com/&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="kd">@prefix</span><span class="w"> </span><span class="nn">rtc:</span><span class="w"> </span><span class="nl">&lt;https://w3id.org/rtc#&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">

</span><span class="nn">ex:</span><span class="n">Compound
</span><span class="w">    </span><span class="nn">ex:</span><span class="n">dataSource</span><span class="w"> </span><span class="nn">ex:</span><span class="n">DataSource</span><span class="w"> </span><span class="p">.</span><span class="w">

</span><span class="nn">ex:</span><span class="n">Employee38
</span><span class="w">    </span><span class="nn">ex:</span><span class="n">familyName</span><span class="w"> </span><span class="s">"Smith"</span><span class="w"> </span><span class="p">{</span><span class="n">|</span><span class="w"> </span><span class="nn">rtc:</span><span class="n">elementOf</span><span class="w"> </span><span class="nn">ex:</span><span class="n">Compound</span><span class="w"> </span><span class="n">|</span><span class="p">}</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">ex:</span><span class="n">firstName</span><span class="w"> </span><span class="s">"John"</span><span class="w"> </span><span class="p">{</span><span class="n">|</span><span class="w"> </span><span class="nn">rtc:</span><span class="n">elementOf</span><span class="w"> </span><span class="nn">ex:</span><span class="n">Compound</span><span class="w"> </span><span class="n">|</span><span class="p">}</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">ex:</span><span class="n">jobTitle</span><span class="w"> </span><span class="s">"Assistant Designer"</span><span class="w"> </span><span class="p">{</span><span class="n">|</span><span class="w"> </span><span class="nn">rtc:</span><span class="n">elementOf</span><span class="w"> </span><span class="nn">ex:</span><span class="n">Compound</span><span class="w"> </span><span class="n">|</span><span class="p">}</span><span class="w"> </span><span class="p">.</span><span class="w">

</span><span class="nn">ex:</span><span class="n">Employee39
</span><span class="w">    </span><span class="nn">ex:</span><span class="n">familyName</span><span class="w"> </span><span class="s">"Doe"</span><span class="w"> </span><span class="p">{</span><span class="n">|</span><span class="w"> </span><span class="nn">rtc:</span><span class="n">elementOf</span><span class="w"> </span><span class="nn">ex:</span><span class="n">Compound</span><span class="w"> </span><span class="n">|</span><span class="p">}</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">ex:</span><span class="n">firstName</span><span class="w"> </span><span class="s">"Jane"</span><span class="w"> </span><span class="p">{</span><span class="n">|</span><span class="w"> </span><span class="nn">rtc:</span><span class="n">elementOf</span><span class="w"> </span><span class="nn">ex:</span><span class="n">Compound</span><span class="w"> </span><span class="n">|</span><span class="p">}</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">ex:</span><span class="n">jobTitle</span><span class="w"> </span><span class="s">"HR Manager"</span><span class="w"> </span><span class="p">{</span><span class="n">|</span><span class="w"> </span><span class="nn">rtc:</span><span class="n">elementOf</span><span class="w"> </span><span class="nn">ex:</span><span class="n">Compound</span><span class="w"> </span><span class="n">|</span><span class="p">}</span><span class="w"> </span><span class="p">.</span><span class="w">
</span></code></pre></div></div>

<p>The RTC property used for these annotations can be configured either globally via the application configuration or per call as an option of the <code class="language-plaintext highlighter-rouge">RTC.Compound.to_rdf/2</code> function.</p>
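
<p>To illustrate what a different choice of property could look like: the RTC vocabulary also defines <code class="language-plaintext highlighter-rouge">rtc:elements</code>, the inverse of <code class="language-plaintext highlighter-rouge">rtc:elementOf</code>. Assuming this property is the one selected, the triples of the first employee might be assigned to the compound roughly as follows. This is a hand-written sketch, not actual library output, and the serialization produced by the library may differ.</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@prefix ex: &lt;http://example.com/&gt; .
@prefix rtc: &lt;https://w3id.org/rtc#&gt; .

# Sketch only: the compound lists its elements as quoted triples.
ex:Compound
    ex:dataSource ex:DataSource ;
    rtc:elements
        &lt;&lt; ex:Employee38 ex:familyName "Smith" &gt;&gt; ,
        &lt;&lt; ex:Employee38 ex:firstName "John" &gt;&gt; ,
        &lt;&lt; ex:Employee38 ex:jobTitle "Assistant Designer" &gt;&gt; .

# The triples themselves remain asserted in the graph.
# The triples of ex:Employee39 would follow the same pattern.
ex:Employee38
    ex:familyName "Smith" ;
    ex:firstName "John" ;
    ex:jobTitle "Assistant Designer" .
</code></pre></div></div>

<p>Both forms express the same membership of triples in the compound. The first attaches the annotation to each triple, while this one collects all elements on the compound resource.</p>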

<h2 id="summary-and-outlook">Summary and Outlook</h2>

<p>In this article, we have laid the foundations for Ontogen’s approach to versioning RDF datasets. We have discussed the inadequacies of existing solutions and introduced RDF Triple Compounds (RTC) as the technical basis of Ontogen. RTC utilizes the capabilities of RDF-star to enable flexible and efficient groupings of RDF triples without having to accept the disadvantages of named graphs.</p>

<p>In the <a href="/introduction/part-2">next article</a>, we will take a closer look at the application of RTC in Ontogen. We will show how RTC compounds serve as building blocks for a versioning system that takes into account the various roles and aspects of data management. We will explain how Ontogen uses RTC to enable fine-grained version control while maintaining the clarity of the dataset.</p>

<hr />
<blockquote>
  <p>Updated February 2026: Added a survey of specific previous RDF versioning approaches in the “<a href="#problems-with-previous-versioning-systems-for-rdf">Problems with Previous Versioning Systems for RDF</a>” section.</p>
</blockquote>]]></content><author><name>Marcel Otto</name></author><category term="introduction" /><category term="blog" /><summary type="html"><![CDATA[After a year of intensive development, I am pleased to introduce Ontogen - a version control system for RDF datasets. My sincere thanks go to the NLnet Foundation for their support through the NGI Assure fund, which enabled me to dedicate myself full-time to this extensive project.]]></summary></entry></feed>