sheet 2 RDF

GenDifS was designed for graphically modelling a taxonomy, which can then be exported as an OWL T-Box and as a SKOS thesaurus.

Conceptually, however, GenDifS is not intended for modelling an A-Box, i.e. a network of instances (sometimes also called a knowledge graph). If a taxonomy modelled in GenDifS is also to be enriched with instances, this enrichment has to happen some other way.

In practice, data is usually kept in tables (mostly Excel, in the LOD community preferably CSV). Each row is interpreted as a "record"; the columns are the attributes of the record; and the cells are the attribute values.
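The row-as-record reading can be illustrated with the standard library alone. The following is a minimal sketch, not part of GenDifS: the inline CSV text mirrors the countries example used further down, and `read_records` is a hypothetical helper name.

```python
import csv
import io

# Inline CSV so the sketch is self-contained; the values mirror the
# countries example used later in this notebook.
CSV_TEXT = """countryCode,latitude,longitude,name
AD,42.5,1.6,Andorra
AE,23.4,53.8,United Arab Emirates
AF,33.9,67.7,Afghanistan
"""


def read_records(text):
    """Return one dict per row: column name -> cell value (all strings)."""
    return list(csv.DictReader(io.StringIO(text)))


records = read_records(CSV_TEXT)
for record in records:
    # Each row is one record; each key is an attribute, each value a cell.
    print(record["countryCode"], record)
```

Note that `csv.DictReader` keeps every cell as a string, which matches the untyped string literals that show up in the generated A-Box below.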

The question is how a CSV table can be connected to a GenDifS model. What follows are a few exploratory steps.

import sys
sys.path.append('../py/')
from gd06 import GenDifS_Map

The following is an example from https://w3c.github.io/csvw/csv2rdf/#examples

# File name for the mindmap, sheet etc.
example_3 = "countries"

backup_mindmap = False
update_mindmap = True

# Read the mindmap and the sheet
o = GenDifS_Map(f"../mm/{example_3}.mm", 
                sheet = f"../sheets/{example_3}.csv",
                pattern= ["owl", "a-box"],
                # use_rdflib = False,
                # remove_attributes= False, # default: true
                verbose = 2)

GenDifS 0.63 (2023-06-19)
cwd into input_dir: /home/dsci/a/l/LA_2023_ss/gendifs/mm
reading /home/dsci/a/l/LA_2023_ss/gendifs/mm/countries.mm: 6 nodes
read /home/dsci/a/l/LA_2023_ss/gendifs/sheets/countries.csv, Index(['index', 'countryCode', 'latitude', 'longitude', 'name', 'index_str'], dtype='object')

# Lexical analysis, parsing, OWL code generation
o.compile()

lexing mm with 6 nodes
Lexer: found #1 start nodes: TAXONOMY
XXX 331 myText='TAXONOMY', myText_dict={'TAXONOMY': ''}
XXX 331 myText='DP latitude COL', myText_dict={'DP': 'latitude', 'COL': ''}
XXX 331 myText='DP longitude COL', myText_dict={'DP': 'longitude', 'COL': ''}
XXX 331 myText='DP name', myText_dict={'DP': 'name'}
parsing mm with 6 nodes
parser, self.mentiones: {'topConcept': {'461_604_367', '482_860_992', '808_282_471', '1_904_934_412'}, 'countryCode': {'461_604_367', '482_860_992', '808_282_471', '1_904_934_412'}}
calling codegen() with #1 start nodes
code_differentia, myTag: TAXONOMY, TAXON: topConcept
1954: COL='countryCode'
code_differentia, myTag: DP, TAXON: countryCode
838: DP_COL='latitude'
860: TAXON_COL='countryCode'
GP: topConcept , TAXON: countryCode, DP: latitude

803 TAXON_COL: 0    AD
1    AE
2    AF
Name: countryCode, dtype: object

866 sheet:AD :latitude "42.5" .
sheet:AE :latitude "23.4" .
sheet:AF :latitude "33.9" .
code_differentia, myTag: DP, TAXON: countryCode
838: DP_COL='longitude'
860: TAXON_COL='countryCode'
GP: topConcept , TAXON: countryCode, DP: longitude

803 TAXON_COL: 0    AD
1    AE
2    AF
Name: countryCode, dtype: object

866 sheet:AD :longitude "1.6" .
sheet:AE :longitude "53.8" .
sheet:AF :longitude "67.7" .
code_differentia, myTag: DP, TAXON: countryCode
838: DP_COL=None
860: TAXON_COL='countryCode'
GP: topConcept , TAXON: countryCode, DP: name
codegen: generated 17 entries in `t_and_a_box_records`
Collected 9 code lines, pattern: ['owl', 'a-box', 'ALL']
rdflib: 18 triples.

This is what the CSV file looks like once it has been read in. We can see that it is a "table", or more precisely, a relation written down row by row:

o.sheet_df

	index	countryCode	latitude	longitude	name	index_str
0	0	AD	42.5	1.6	Andorra	0
1	1	AE	23.4	53.8	United Arab Emirates	1
2	2	AF	33.9	67.7	Afghanistan	2

Of course this table can also be serialized as JSON:

o.sheet_df.to_json()

'{"index":{"0":0,"1":1,"2":2},"countryCode":{"0":"AD","1":"AE","2":"AF"},"latitude":{"0":"42.5","1":"23.4","2":"33.9"},"longitude":{"0":"1.6","1":"53.8","2":"67.7"},"name":{"0":"Andorra","1":"United Arab Emirates","2":"Afghanistan"},"index_str":{"0":"0","1":"1","2":"2"}}'

o.sheet_df.to_json(orient="index")

'{"0":{"index":0,"countryCode":"AD","latitude":"42.5","longitude":"1.6","name":"Andorra","index_str":"0"},"1":{"index":1,"countryCode":"AE","latitude":"23.4","longitude":"53.8","name":"United Arab Emirates","index_str":"1"},"2":{"index":2,"countryCode":"AF","latitude":"33.9","longitude":"67.7","name":"Afghanistan","index_str":"2"}}'
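The two outputs above differ only in orientation: the default layout of `to_json` is keyed by column first, while `orient="index"` is keyed by row first. The relationship between the two can be sketched without pandas; `columns_to_index` is a hypothetical helper, and only two of the columns are reproduced to keep the example short.

```python
import json

# Column-oriented layout: column -> {row label -> value}
by_column = {
    "countryCode": {"0": "AD", "1": "AE", "2": "AF"},
    "latitude": {"0": "42.5", "1": "23.4", "2": "33.9"},
}


def columns_to_index(columns):
    """Pivot column -> row -> value into row -> column -> value."""
    by_index = {}
    for column, cells in columns.items():
        for row_label, value in cells.items():
            by_index.setdefault(row_label, {})[column] = value
    return by_index


by_index = columns_to_index(by_column)
print(json.dumps(by_index, indent=2))
```

The row-first form is the one that corresponds to the "one row, one record" reading, and therefore to one RDF subject per row.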

This is what our mindmap looks like:

o.display_markdown()

TAXONOMY
- countryCode COL
  - DP latitude COL
  - DP longitude COL
  - DP name

This is the graph (the T-Box and the A-Box) in TTL format, as generated by GenDifS:

print(o.ttl_code)

# __init__
# ALL: 
@prefix ex: <http://example.net/namespace/ex#> .
@prefix cpt: <http://example.net/namespace/cpt#> .
@prefix sheet: <http://example.net/namespace/sheet#> .
@prefix : <http://example.net/namespace/default#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix gendifs: <http://jbusse.de/gendifs#> .


# __init__
# ALL: 
[ rdf:type owl:Ontology ] .



# SUBTAXON.TAXONOMY.a
# owl: declare class *countryCode* being a subclass of topConcept
:countryCode
   a owl:Class ;
   rdfs:subClassOf :topConcept .


# A-Box.1059.TAXONOMY.d
# a-box: 
sheet:AD a :countryCode .
sheet:AE a :countryCode .
sheet:AF a :countryCode .


# DP.a
# owl: declare *latitude* as a data property.
:latitude
   rdf:type owl:DatatypeProperty ;
   rdf:label "latitude" .


# A-Box.DP.c
# a-box: 
sheet:AD :latitude "42.5" .
sheet:AE :latitude "23.4" .
sheet:AF :latitude "33.9" .


# DP.a
# owl: declare *longitude* as a data property.
:longitude
   rdf:type owl:DatatypeProperty ;
   rdf:label "longitude" .


# A-Box.DP.c
# a-box: 
sheet:AD :longitude "1.6" .
sheet:AE :longitude "53.8" .
sheet:AF :longitude "67.7" .


# DP.a
# owl: declare *name* as a data property.
:name
   rdf:type owl:DatatypeProperty ;
   rdf:label "None" .
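To make the A-Box part of this output easier to follow, here is a minimal sketch of how such statements could be derived from sheet rows. It is not GenDifS's actual code generator: `rows_to_turtle` and its parameters are hypothetical, the `sheet:` and `:` prefixes are assumed to be declared as in the listing above, and literals are not escaped.

```python
# Rows as they appear in the sheet (cell values kept as strings).
rows = [
    {"countryCode": "AD", "latitude": "42.5", "longitude": "1.6"},
    {"countryCode": "AE", "latitude": "23.4", "longitude": "53.8"},
    {"countryCode": "AF", "latitude": "33.9", "longitude": "67.7"},
]


def rows_to_turtle(rows, key_column, class_name, property_columns):
    """Build Turtle lines: one type statement per row, then one
    statement per row and property column.

    Assumption: cell values contain no quotes or backslashes,
    so they can be written as plain string literals.
    """
    lines = []
    for row in rows:
        lines.append(f"sheet:{row[key_column]} a :{class_name} .")
    for column in property_columns:
        for row in rows:
            lines.append(f'sheet:{row[key_column]} :{column} "{row[column]}" .')
    return lines


turtle_lines = rows_to_turtle(
    rows, "countryCode", "countryCode", ["latitude", "longitude"]
)
print("\n".join(turtle_lines))
```

The key column plays two roles here, as in the mindmap: its header names the class, and its cell values become the local names of the instances in the `sheet:` namespace.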

with open(f"../ttl/{example_3}_mm.ttl", "w") as file:
    file.write(o.ttl_code)

if update_mindmap:
    update_mindmap_filename = f"../mm/{example_3}.mm"
    o.mindmap.write(update_mindmap_filename,pretty_print=True)
    print(f"updated mindmap {update_mindmap_filename}")

updated mindmap ../mm/countries.mm

Naturally, this graph can also be exported as JSON, here in the JSON-LD serialization:

print(o.rdflib_graph.serialize(format="json-ld"))

[
  {
    "@id": "http://example.net/namespace/sheet#AD",
    "@type": [
      "http://example.net/namespace/default#countryCode"
    ],
    "http://example.net/namespace/default#latitude": [
      {
        "@value": "42.5"
      }
    ],
    "http://example.net/namespace/default#longitude": [
      {
        "@value": "1.6"
      }
    ]
  },
  {
    "@id": "http://example.net/namespace/default#latitude",
    "@type": [
      "http://www.w3.org/2002/07/owl#DatatypeProperty"
    ],
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#label": [
      {
        "@value": "latitude"
      }
    ]
  },
  {
    "@id": "_:na10c3a347ee14d879389ce4df527d830b1",
    "@type": [
      "http://www.w3.org/2002/07/owl#Ontology"
    ]
  },
  {
    "@id": "http://example.net/namespace/sheet#AE",
    "@type": [
      "http://example.net/namespace/default#countryCode"
    ],
    "http://example.net/namespace/default#latitude": [
      {
        "@value": "23.4"
      }
    ],
    "http://example.net/namespace/default#longitude": [
      {
        "@value": "53.8"
      }
    ]
  },
  {
    "@id": "http://example.net/namespace/sheet#AF",
    "@type": [
      "http://example.net/namespace/default#countryCode"
    ],
    "http://example.net/namespace/default#latitude": [
      {
        "@value": "33.9"
      }
    ],
    "http://example.net/namespace/default#longitude": [
      {
        "@value": "67.7"
      }
    ]
  },
  {
    "@id": "http://example.net/namespace/default#longitude",
    "@type": [
      "http://www.w3.org/2002/07/owl#DatatypeProperty"
    ],
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#label": [
      {
        "@value": "longitude"
      }
    ]
  },
  {
    "@id": "http://example.net/namespace/default#name",
    "@type": [
      "http://www.w3.org/2002/07/owl#DatatypeProperty"
    ],
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#label": [
      {
        "@value": "None"
      }
    ]
  },
  {
    "@id": "http://example.net/namespace/default#countryCode",
    "@type": [
      "http://www.w3.org/2002/07/owl#Class"
    ],
    "http://www.w3.org/2000/01/rdf-schema#subClassOf": [
      {
        "@id": "http://example.net/namespace/default#topConcept"
      }
    ]
  }
]
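Since the expanded JSON-LD above is plain JSON, the instances can be read back into row-like records with the standard library. The following is a sketch under assumptions: `JSONLD_TEXT` is a shortened excerpt of the output above, `instances_of` is a hypothetical helper, and local names are taken to be whatever follows the `#` in an IRI.

```python
import json

CLASS_IRI = "http://example.net/namespace/default#countryCode"

# Shortened excerpt of the JSON-LD output: one instance, one property declaration.
JSONLD_TEXT = """[
  {"@id": "http://example.net/namespace/sheet#AD",
   "@type": ["http://example.net/namespace/default#countryCode"],
   "http://example.net/namespace/default#latitude": [{"@value": "42.5"}],
   "http://example.net/namespace/default#longitude": [{"@value": "1.6"}]},
  {"@id": "http://example.net/namespace/default#latitude",
   "@type": ["http://www.w3.org/2002/07/owl#DatatypeProperty"]}
]"""


def local_name(iri):
    """Return the part of an IRI after the last '#'."""
    return iri.rsplit("#", 1)[-1]


def instances_of(nodes, class_iri):
    """Collect literal property values of all nodes typed with class_iri."""
    result = {}
    for node in nodes:
        if class_iri not in node.get("@type", []):
            continue
        properties = {}
        for key, values in node.items():
            if key.startswith("@"):  # skip @id and @type
                continue
            properties[local_name(key)] = [v["@value"] for v in values if "@value" in v]
        result[local_name(node["@id"])] = properties
    return result


instances = instances_of(json.loads(JSONLD_TEXT), CLASS_IRI)
print(instances)
```

Property values are kept as lists because an RDF subject may carry several values for the same property, whereas a table cell holds exactly one.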