---
jupytext:
  formats: md:myst
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.16.4
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---

# DCMI OpenWEMI

This text: 2024-12-18

* first version: 2024-06-25
* a little update 2024-07-02 at the very end of the text

Wonderful news (2023-11-29): *DCMI goes WEMI!*

*
* Coyle, Karen. Works, Expressions, Manifestations, Items: An Ontology. Code4lib Journal, Issue 53, 2022-05-09. {cite}`coyle2022-wemiOntology`

```{figure} ../../images/github.com_dcmi_openwemi_2024-06-21.png

Source:
```

+++

## Discussion

### Multiple rdfs:range e.g. for openwemi:instantiates?

The semantics of a single open arrow is specified in normal language with *domain* and *range*, and made concrete in the Turtle file with `rdfs:domain` and `rdfs:range`. This is not a problem as long as no more than one domain or range is specified per property.

Question: Is it wise to specify *several* arrows for a property such as `openwemi:instantiates` in the informal mapping? What is the concrete semantics of OR in the normal language explanation, and how is this implemented in RDF(S)?

Let's take a look at the ttl code.

* Backlink:
* File: [openWEMI.ttl](https://raw.githubusercontent.com/dcmi/openwemi/main/docs/ns/openWEMI.ttl)

+++

```
openwemi:instantiates a rdf:Property ;
    rdfs:label "instantiates"@en ;
    rdfs:comment "An Endeavor that instantiates a Manifestation, an Expression or a Work."@en ;
    rdfs:isDefinedBy openwemi: ;
    rdfs:subPropertyOf dct:relation ;
    rdfs:domain openwemi:Item ;
    dct:description "A relationship asserted from an Item to a Manifestation, an Expression, or a Work."@en ;
    rdfs:range [ a owl:Class ;
        owl:unionOf ( openwemi:Work openwemi:Expression openwemi:Manifestation ) ] .
```

+++

The `rdfs:comment` for the property `openwemi:instantiates` explains that an `openwemi:Item` openwemi-instantiates "a manifestation, an expression or a work".
(A layman would read XOR here, but as logicians we of course read a non-exclusive OR.)

Together with the semantics of `rdfs:range`, the `rdfs:comment` indicates that *one or more classes from* `openwemi:Work`, `openwemi:Expression` or `openwemi:Manifestation` are openwemi-instantiated. Explicitly: "one or more classes from ..." here means, more exactly, "one or more elements of the class that consists of the classes ...".

However, something else is actually specified in the ttl file: the `rdfs:range` of `openwemi:instantiates` is an anonymous class *consisting of the union of* `openwemi:Work`, `openwemi:Expression` and `openwemi:Manifestation`. Explicitly: "consisting of the union of ..." here means "all elements that are contained in at least one of the classes ...".

+++

The consequence: `rdfs:range` can *no longer* be used to derive one specific class out of `openwemi:Work`, `openwemi:Expression` and `openwemi:Manifestation`. Say we have the triple `:myItem_123 openwemi:instantiates :myManifestation_123`:

* Using the above *domain* information, it is in fact possible to infer that `:myItem_123 a openwemi:Item`;
* however, using the above *range* information, it is *not* possible to infer that `:myManifestation_123 a openwemi:Manifestation`.

+++

## NEW 2024-12-18

The above explanation is not clear enough, so we'll try again with a minimalistic sandbox example; see {doc}`rdfs-range-bunny-carrot`.

+++

### Possible solutions

(1) good solution, mentioned in : Do it like FaBiO ... and discuss the term "disadvantage":

> The obvious disadvantages are: \[...\] people need to think about when creating their metadata ("is this an item of a manifestation or an item of an expression?" ()

IMHO this is an advantage (philosophical discussion: IMHO the WEMI classes are strongly disjoint, from an ontological point of view. Thus an item may only be connected to a manifestation, but not to an expression or even a work.
*item instantiates work* fits neither the original FRBR WEMI nor RDA nor CIDOC. But that's not the point of discussion here.)

+++

(2)

> what if the object is a combination work/expression à la Bibframe?

If we go to WEMI, we need to disambiguate work/expression combinations into two different nodes anyhow. Long discussion: , (in DE, but machine translation may help?).

+++

(3) Another solution: Create OpenWEMI as a radical prune of CIDOC. RDA was merged with CIDOC, see

> ; there see Figure: [4.2. Overview of the Model: Illustration 1, page 6](../../images/LRMoo_V1.0_illustration1.png) in [LRMoo_V1.0.pdf](https://www.cidoc-crm.org/frbroo/sites/default/files/LRMoo_V1.0.pdf)

Wouldn't it be nice to align OpenWEMI with CIDOC LRMoo, at least in principle? In any case, one should avoid introducing axioms that are not compatible with CIDOC. Some (including me) consider CIDOC and LRMoo to be "rather fat": if OpenWEMI is intended to provide a lightweight alternative, could it be created as a minimal lightweight version of the respective CIDOC classes?

+++

(4) IMHO best solution: Do not model domain and range at all, at least not with rdfs:domain and rdfs:range. This is because these language elements from RDFS (and OWL) have semantics that are misunderstood by many people.

NEW 2024-07-02: for some explanations see below, {ref}`rdfs-entailment-not-validation-or-constraint`.

+++

## Experiment with rdflib and owlrl

We show the domain and range inferencing with a minimalistic example, based on the well-known Python libraries

*
*

```{code-cell} ipython3
import rdflib
import owlrl
```

Example from above, plus one triple of example instances:

```{code-cell} ipython3
wemi_ttl = """
@prefix : .
@prefix rdf: .
@prefix rdfs: .
@prefix owl: .
@prefix dct: .
@prefix openwemi: .
openwemi:instantiates a rdf:Property ;
    rdfs:domain openwemi:Item ;
    rdfs:range [ a owl:Class ;
        owl:unionOf ( openwemi:Work openwemi:Expression openwemi:Manifestation ) ] .

:myItem_123 a .
a owl:Class .

:myManifestation_123 a .
a owl:Class .

:myItem_123 openwemi:instantiates :myManifestation_123 .
"""
```

Graph `g1` is the graph before inferencing:

```{code-cell} ipython3
# Parse the Turtle string; g1 stays untouched as the "before" reference.
g1 = rdflib.Graph().parse(data=wemi_ttl, format="turtle")
print(f"Initially g1 has {len(g1)} triples")
```

```{code-cell} ipython3
print(g1.serialize())
```

We also allocate `g2`. It will be modified by `owlrl.DeductiveClosure()`.

```{code-cell} ipython3
g2 = rdflib.Graph().parse(data=wemi_ttl, format="turtle")
print(f"Initially g2 has {len(g2)} triples")
```

```{code-cell} ipython3
# Expand g2 in place with the OWL 2 RL rule set (no axiomatic triples).
owlrl.DeductiveClosure(owlrl.OWLRL_Semantics, axiomatic_triples=False).expand(g2)
print(f"After inferencing g2 has {len(g2)} triples")
```

```{code-cell} ipython3
g2_ttl = g2.serialize(format="ttl")
# print(g2_ttl)
```

### What's the domain of our exemplar?

```{code-cell} ipython3
q_domain = """
PREFIX :
PREFIX openwemi:
PREFIX owl:
PREFIX rdf:
PREFIX rdfs:
PREFIX xsd:

SELECT ?S ?SClass
WHERE {
    ?S a ?SClass .
    ?S openwemi:instantiates ?O .
}
"""
```

Graph `g1` reflects the situation before inferencing:

```{code-cell} ipython3
for row in g1.query(q_domain):
    print(row)
```

Graph `g2` reflects the situation *after* inferencing:

```{code-cell} ipython3
for row in g2.query(q_domain):
    print(row)
```

As we can see, our example item `:myItem_123` is now an instance of `openwemi:Item`.

+++

### What's the range of our exemplar?

```{code-cell} ipython3
q_range = """
PREFIX :
PREFIX openwemi:
PREFIX owl:
PREFIX rdf:
PREFIX rdfs:
PREFIX xsd:

SELECT ?O ?OClass
WHERE {
    ?O a ?OClass .
    ?S openwemi:instantiates ?O .
}
"""
```

Graph `g1` reflects the situation before inferencing:

```{code-cell} ipython3
for row in g1.query(q_range):
    print(row)
```

Graph `g2` reflects the situation *after* inferencing:

```{code-cell} ipython3
for row in g2.query(q_range):
    print(row)
```

As we can see, our example manifestation `:myManifestation_123` has NOT become an instance of `openwemi:Manifestation`. Instead it is an instance of a blank node (BNODE) with an internal ID, namely the anonymous union class.

The full set of triples of `g2` after inferencing can be seen here:

```{code-cell} ipython3
print(g2.serialize())
```

(rdfs-entailment-not-validation-or-constraint)=
## Semantics of rdfs: Entailment, not validation or constraint

update 2024-07-02

https://github.com/dcmi/openwemi/issues/94 states:

> We want to make that a validation point ...

> it should be possible to detect that as an inconsistency ...

(1) There is a misunderstanding about RDF(S) that is as common as it is profound: rdfs:domain and rdfs:range definitely do *not* help you to validate an RDF graph or to detect inconsistencies in an RDF graph.

https://lists.w3.org/Archives/Public/semantic-web//2006May/0118.html cites a DCMI text:

> > 6. Using domains and ranges: RDF supports using "domain" and "range" constraints on RDF properties, for limiting the kinds of resources that a property apply to, ...
>
> I would think that the above paragraph reveals a deep misunderstanding about the nature of rdfs:range and rdfs:domain ... correct?

Correct! The semantics of rdfs:domain (and range) are given in https://www.w3.org/TR/rdf11-mt/#rdfs-entailment, rules rdfs2 and rdfs3. These rules only let you *entail* additional `rdf:type` triples: from a domain declaration for the subject of a statement (rdfs2), and from a range declaration for its object (rdfs3). If such a triple is not already in the graph, it is added; nothing is ever rejected. (for a more detailed explanation cf.
)

(2) However, there seems to be a backdoor for using rdfs:domain and rdfs:range as constraints, see https://www.w3.org/TR/rdf-schema/#ch_domainrange :

> RDF Schema provides a mechanism for describing this information, but does not say whether or how an application should use it. For example, while an RDF vocabulary can assert that an author property is used to indicate resources that are instances of the class Person, it does not say whether or how an application should act in processing that range information. Different applications will use this information in different ways. For example, data checking tools might use this to help discover errors in some data set, an interactive editor might suggest appropriate values, and a reasoning application might use it to infer additional information from instance data.

I don't think that a ttl model should use this backdoor. If you want to tell a validator (one that deliberately does not apply RDFS semantics) which domains and ranges are permitted, then https://schema.org/rangeIncludes and https://schema.org/domainIncludes would be more appropriate. Or one could use SHACL, but that's also a rather complex technology.
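
The difference between entailment and validation can be sketched without any RDF library. The following is only a toy illustration under stated assumptions: triples are plain Python tuples, and the `ex:` identifiers as well as the helper names `entail_types` and `check_range_includes` are hypothetical, i.e. they are not part of RDFS, schema.org, rdflib or owlrl.

```python
# Sketch only: triples are plain (subject, predicate, object) tuples.
# All names below (ex:..., entail_types, check_range_includes) are
# hypothetical illustrations, not part of any library or vocabulary.

RDF_TYPE = "rdf:type"

data = {
    ("ex:item1", "ex:instantiates", "ex:thing1"),
    ("ex:thing1", RDF_TYPE, "ex:Carrot"),
}

# RDFS-style schema: one domain and one range per property.
domain = {"ex:instantiates": "ex:Item"}
range_ = {"ex:instantiates": "ex:Manifestation"}


def entail_types(triples, domain, range_):
    """Apply the RDFS rules rdfs2 (domain) and rdfs3 (range) once.

    The rules only ADD rdf:type triples; they never reject anything.
    """
    inferred = set(triples)
    for s, p, o in triples:
        if p in domain:
            inferred.add((s, RDF_TYPE, domain[p]))  # rdfs2
        if p in range_:
            inferred.add((o, RDF_TYPE, range_[p]))  # rdfs3
    return inferred


def check_range_includes(triples, range_includes):
    """Validation in the spirit of schema:rangeIncludes.

    Reports statements whose object has no asserted type among the
    expected classes. Nothing is inferred.
    """
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    problems = []
    for s, p, o in sorted(triples):
        expected = range_includes.get(p)
        if expected is not None and not (types.get(o, set()) & expected):
            problems.append((s, p, o))
    return problems


closure = entail_types(data, domain, range_)
violations = check_range_includes(
    data,
    {"ex:instantiates": {"ex:Manifestation", "ex:Expression", "ex:Work"}},
)
print(("ex:thing1", RDF_TYPE, "ex:Manifestation") in closure)
print(violations)
```

In this sketch the entailment step silently makes `ex:thing1` a manifestation in addition to being a carrot, whereas the separate check flags the very same statement as suspicious. A real validator would more likely be expressed in SHACL than hand-written like this.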