Abstract

We discuss an implementation of Donald Knuth's idea of literate programming within the field of ontology engineering. Using freemind as the authoring frontend, the script semAuth (semauth.sourceforge.net) extracts a formal ontology from a semantically annotated mindmap. As opposed to well-known ontology engineering workbenches like Protégé or OntoStudio, our approach allows for a seamless transition from the early, informal phases of ontology engineering to the late phase, where classes, instances etc. are formally defined. Having a lightweight and simple-to-use tool at hand which can be used in both early and late phases of a semantic project also allows for more agile ontology engineering methods, including a closer involvement of end users.

Starting point

A first-class method to nail down the results of cognitive work is to write them down, to externalize them into written text. This is not only true for traditional learning and studying, but also for doing research, documenting discussions or taking notes about new research findings. Using the method "write down your findings early and often" typically involves taking notes, making core concepts explicit, carving out relationships between core concepts and generating an overall structure over a complex field of different relationships. Depending on which of these tasks is in focus, there are several methods (and tools) available, e.g. concept mapping, concept clustering, laddering or mindmapping. All this is especially true for ontology engineering projects.

Authors who start such cognitive processes with prose text typically write down their content in standard document processing tools ("Office", "writer"). While much of such authors' content is communicated informally, following the logic of natural language communication, it can be observed that authors already make extensive and creative use of the formatting capabilities of modern WYSIWYG text processors in order to structure their text. Why do they add so much structure to simple prose text? When asked, they answered: "Some of our content has the character of nearly formalized stuff. We want to express things like references to known entities ('issue 4711'), typical formal text structures ('summary', 'example', 'finding') or typical structures known from ontology modelling ('issue 4711 is caused by weakness 08/15'). Pure content without structuring layout is like a flabby jelly poured on the floor." This answer suggests that an author can be helped to express the more formal (in our case: "semantic") aspects of her ideas by giving her a text structuring and annotation tool.

Other authors already have sophisticated conceptual structures in mind. They have already outlined knowledge structures informally and refined them step by step. Now they feel the need to author a well-defined conceptualization of their domain of interest. If they have no special training, these authors typically use common editing environments like spreadsheets, mindmaps or graph software. If they are trained ontology engineers, they probably use ontology engineering workbenches like Protégé, Swoop, OntoStudio, NeonToolkit or TopBraid Composer. These tools share the characteristic that they are form-based workbenches which give much control over crafting syntactically correct ontologies. Authors who craft formal systems typically feel the need to comment on the results, be it because they simply want to help readers understand the ontology, be it because they want to express details which go beyond a formal language's expressiveness. Why do they add so much prose text to already highly structured content? When asked, these authors typically answer: "Our formal system needs explanation in order to be communicable and comprehensible. A formal system which lacks explanatory pointers into reality is like a bare skeleton without flesh." This answer suggests that an ontology engineer can be helped to communicate her work by allowing her to attach rich text comments to every single object of the ontology.

Authors who use a form-based ontology engineering workbench additionally face some other difficulties: Changing the structure or the meta-type (class, instance, relation) of elements of a given ontology normally calls for complex and error-prone refactoring procedures. Restructuring a conceptualization can only be done by directly changing the ontology, which may easily result in an incoherent or otherwise ill-formed ontology. Transforming e.g. a class hierarchy into a hierarchy of instances or vice versa requires the use of complex (and error-prone) refactoring functions. If a workbench makes use of specific and highly optimized visualization and interaction paradigms for each single ontology engineering task, like creating taxonomies, defining class axioms, seeding the ontology with facts, defining rules or running queries, things become even more complex: In order to turn e.g. a class tree into an instance tree the user has to "tunnel" the structure from one view to the other without having control over the refactoring process itself. An ontology engineer should not use a workbench like Protégé or OntoStudio in ontology engineering projects before she has developed a full understanding of a conceptualization.

What happens here? The two groups of authors start from opposite positions and move towards the middle.

  • In the first case authors start from natural language and ordinary text processors, adding more and more structure to the text. They annotate weakly structured text with structure in order to express formal meaning. To "annotate" such text means to add references to formal structures which cannot be expressed in the prose text itself. Because the original text is already understandable by a human reader, the addressee of the annotation is a software system.
  • In the second case authors start with highly structured representations of knowledge. Then they add more and more natural language text to them. To "annotate" such structures means to explain them in an informal, verbose manner. Because the original formalism is already 'understandable' by a software system, the addressee of the annotation is the human reader.

These observations led to the idea of an authoring tool which starts in the middle. We support the author in writing very highly and very explicitly structured text. Semantics within the text is expressed by annotating both (a) single text chunks and (b) the structure of the text. Text structure and the terminological structure of the described domain are synchronized in order to allow for extracting both a traditional office document and a formal ontology automatically from the text.

The basic idea of the semAuth approach

Our approach revives Donald Knuth's concept of literate programming (cf. http://www.literateprogramming.com/index.html). Instead of authoring an ontology first and documenting it afterwards, the idea of semantic authoring swaps these two steps: An author first writes a properly annotated documentation of a conceptualization of a domain, such that a tool can gather the respective ontology automatically from it. We call this authoring method semantic authoring.

In order to provide such a semantic authoring tool we have to ask for a concession: After some (pretty expensive and frustrating) trials we learned that we cannot build on a standard WYSIWYG office tool (like MS Office or OpenOffice.org) as the basic cognitive and technical platform for our semantic authoring approach. While plain text editors or traditional WYSIWYG office tools are fine for plain (and only very weakly structured) text, they are poorly suited for very highly structured conceptual modelling. We have to interpret the concept of text in a slightly wider sense: 'Text' is not identical with 'office writer'. There are other ways to represent and author text.

Analyzing the major ontology engineering workbenches and other basic text structure visualization techniques finally led to the decision to take the cognitive technology "mindmapping" as a basis for the new tool. Working with mindmaps is one of the most established brainstorming and information structuring methodologies we have in knowledge management. The structure "tree" forms one of the most basic and widely used structuring paradigms we know.

Consequently we accept not only office documents, but also tree-structured collections of text chunks - in our case freemind maps - as "text". This fits well with a wider concept of "text" in the social sciences, where a picture or even a performance is sometimes called "text".

The simple approach of maintaining content within a tree of elements (technically with a mindmap tool like freemind) serves as the fundamental overall interaction paradigm. It is the tree itself which mirrors the structure of the text, while we additionally use node tags which formally define the semantics of certain text regions. We developed a mindmap tagging system which allows for representing unstructured text content, text structures, schema, fact base and rules completely within one single generic authoring environment. The idea is to have a double-layered text annotation notation which allows for an isomorphism between text structure and semantics.

In fact this mindmap annotation methodology defines a new, sound, powerful and simple interaction paradigm. To the best of our knowledge this is a genuinely new modelling technique which is covered neither by any current major ontology engineering workbench nor by the major well-known office tools.

Having decided to use a mindmap as the overall representation paradigm for text and semantics, we decided to fathom how far we could go purely with that approach. While the mindmapping user interface of semAuth supports the author with an editable tree visualization, the HTML export serves as reader-friendly documentation. Could we, in practice, integrate visualization, structuring and authoring of text and its semantics such that authoring the schema and the facts of an ontology is driven by the same interaction paradigm?

(Note: semAuth makes use of annotated text, but doesn't provide a user interface for text annotations by itself. This has advantages and disadvantages from a technical and a UI point of view, but doesn't affect our argumentation in detail. However, the prototypes built in X-Media are implemented with freemind.sourceforge.net as the user interface and use a subset of freemind's (version 0.9) XML format as the underlying text representation. In the following we take "semAuth" to mean the combination of an arbitrary (free or commercial) mindmap tool plus the script semAuth.xsl. Our example screenshots show freemind 0.9.)
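To make the underlying text representation concrete, the following is a minimal sketch of how a freemind map can be read as a tree of text chunks. It is not taken from semAuth.xsl. The sample map is invented for illustration, and the element and attribute names (node, TEXT, icon, BUILTIN) are assumed to match freemind's file format; which subset of that format semAuth actually evaluates is not specified here.

```python
import xml.etree.ElementTree as ET

# Invented sample map; node/TEXT/icon/BUILTIN are assumed to follow
# the freemind file format.
SAMPLE_MAP = """<map version="0.9.0">
  <node TEXT="Vehicles">
    <node TEXT="Car">
      <icon BUILTIN="idea"/>
      <node TEXT="my old Beetle"/>
    </node>
    <node TEXT="Note: bicycles still to be discussed"/>
  </node>
</map>"""


def outline(node, depth=0):
    """Yield (depth, text, icons) for a node and all its descendants."""
    icons = [icon.get("BUILTIN") for icon in node.findall("icon")]
    yield depth, node.get("TEXT", ""), icons
    for child in node.findall("node"):
        yield from outline(child, depth + 1)


root = ET.fromstring(SAMPLE_MAP).find("node")
rows = list(outline(root))
for depth, text, icons in rows:
    # Indentation mirrors the nesting of the mindmap.
    print("  " * depth + text, icons)
```

The point of the sketch is only that the text, its tree structure and the icon tags are separately accessible in one simple, open file.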

Background: Some ontology engineering requirements from X-Media

Analyzing the X-Media requirements w.r.t. ontology engineering and ontology population led to several requirements for an integrated authoring and ontology engineering tool.

The requirements for a text annotation tool suggest that there is no clear methodological borderline between annotation as ontology population (i.e. adding facts to the A-Box) and annotation as terminology refinement (i.e. adding new concepts to the T-Box). Ideally, no change of tool should be required between T-Box definition and A-Box population. Consequently, developing a terminology (with a focus on the schema part) and populating an ontology (with a focus on the A-Box part) do not necessarily call for distinct tools. Similar structures in A-Box and T-Box - i.e. terminology trees - should be manageable under the same user interaction paradigm.

Terminological knowledge should be outlined, fully formulated and refactored within the same tool. As opposed to working with Protégé or OntoStudio, an author can grow her understanding of a conceptualization step by step over time. Our new annotation approach exploits the fact that a mindmap is able to represent text and text structures independently of edge tags or semantic icon tags. The ease of playing around with semi-formal and more formal aspects of a conceptualization suggests using mindmaps both in early (brainstorming) phases and in late (formalization) phases of terminology projects - completely within one single tool, without having to change work and tool contexts while proceeding in a project. While the need for changing tools formerly often resulted in an inflexible and expensive waterfall approach to ontology engineering, our new approach is suited for cyclic or agile ontology engineering methods. In effect our approach results in a simple and powerful solution for tying together early and late, informal and formal, explorative and productive phases of ontology engineering seamlessly.

Ordinary users should be empowered to suggest ontology refinements without having to change the ontology. In X-Media the use case ontologies are already developed jointly by ontology engineering experts from research institutes, technology providers and domain experts of the end user partners. Users should be empowered to suggest refinements to an ontology and to communicate these refinements to the ontology engineers without having to commit them immediately. While in traditional ontology engineering workbenches an ontology can only be refined by directly changing the ontology (which may easily result in an incoherent or otherwise ill-formed ontology), the new annotation approach allows for communicating refinement suggestions in a much less invasive way: users suggest how to extend the ontology simply by adding text content to the mindmap without tagging it semantically, i.e. by annotating the ontology with tree-structured text. This works fine with semAuth because we use freemind solely as a text editor. The author is free to maintain semantically "ill" intermediate states in order to simplify informal knowledge management and rich re-engineering at all times.
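The idea of untagged text as a refinement suggestion can be sketched as follows. This is an illustrative assumption, not semAuth's implementation: we simply treat every node without an icon as free text and list it together with its position in the tree, so that an ontology engineer can review it later. The sample map and the function name untagged_suggestions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample: two tagged nodes plus two untagged user remarks.
SAMPLE_MAP = """<map version="0.9.0">
  <node TEXT="Issue"><icon BUILTIN="idea"/>
    <node TEXT="Corrosion issue"><icon BUILTIN="idea"/>
      <node TEXT="Should we split this by material?"/>
    </node>
    <node TEXT="Maybe add a class for vibration issues"/>
  </node>
</map>"""


def untagged_suggestions(node, path=()):
    """Yield (path, text) for every node that carries no icon.

    Assumption for this sketch: 'no icon' means 'not semantically
    tagged', so the node is a comment or suggestion only.
    """
    text = node.get("TEXT", "")
    if node.find("icon") is None:
        yield " / ".join(path), text
    for child in node.findall("node"):
        yield from untagged_suggestions(child, path + (text,))


root = ET.fromstring(SAMPLE_MAP).find("node")
suggestions = list(untagged_suggestions(root))
for where, text in suggestions:
    print(f"[{where}] {text}")
```

Because the suggestions are ordinary nodes, they stay attached to the place in the tree they refer to and do not alter the tagged part of the map.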

The ease of refining and communicating understandings also holds from a technical point of view. The tool mm2flo interprets the native format of the lightweight and very common open source tool freemind.sourceforge.net. With an open and simple format at hand, an ontology may be visualized and documented through several channels. While the HTML export of semAuth generates a richly linked text document, the mindmap version acts as an editable bird's eye visualization. Joint editing and visualization of ontologies and their documentation can be performed even between organizations with very heterogeneous technology stacks and software usage policies. The freemind map is editable without having the script semAuth.xsl installed; even a minimally configured, offline computer can be used to edit an ontology.

  • TBD:
    • The user should be supported in outlining RDFS, OWL and F-logic ontologies without having to care for specific language choices in early phases of ontology development.
      • one tool for many languages
      • model semantic graphs
      • allow for expressing basic patterns without semantics / with semantics to be added ex post

Specifics of ontology engineering with semAuth

The most important differences compared to other ontology engineering workbenches stem from the semantic authoring approach (originally introduced by Donald Knuth as literate programming): semAuth analyses text annotations in order to extract an ontology from text. semAuth distinguishes between (1) text, (2) text structure and (3) semantic annotations of (3.1) text and (3.2) text structure. Authoring T-Box, A-Box and rules is done within one single "cognitive" technique, which is well known, commonly used and widely established in personal knowledge management.

Adding structure to the mindmap and thus carving out the semantics of the various ordering principles used is performed in two ways in semAuth: Inserting annotated edges defines text structure like sections or paragraphs, nested lists, links to other text segments and even textual and-or trees (like this one). And annotating text nodes with icons defines semantics like classes and instances, relationships or rules.
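How icon annotations and tree nesting together can yield ontology statements may be illustrated by a small sketch. It covers only the icon side (classes and instances), not annotated edges, relationships or rules. The icon vocabulary in ROLE_BY_ICON, the predicate names and the nesting rules are assumptions made for this example and are not semAuth's actual tagging system.

```python
import xml.etree.ElementTree as ET

# Hypothetical icon vocabulary, chosen for this sketch only.
ROLE_BY_ICON = {"idea": "class", "flag": "instance"}

SAMPLE_MAP = """<map version="0.9.0">
  <node TEXT="Vehicle"><icon BUILTIN="idea"/>
    <node TEXT="Car"><icon BUILTIN="idea"/>
      <node TEXT="myBeetle"><icon BUILTIN="flag"/></node>
      <node TEXT="Cars are the most common example."/>
    </node>
  </node>
</map>"""


def role_of(node):
    """Return the semantic role given by the node's first known icon."""
    for icon in node.findall("icon"):
        role = ROLE_BY_ICON.get(icon.get("BUILTIN"))
        if role:
            return role
    return None  # untagged node: plain text, no semantics


def extract(node, parent_class=None):
    """Yield (subject, predicate, object) triples from tags and nesting."""
    role = role_of(node)
    name = node.get("TEXT", "")
    if role == "class":
        if parent_class:
            yield (name, "subClassOf", parent_class)
        else:
            yield (name, "type", "Class")
        parent_class = name
    elif role == "instance" and parent_class:
        yield (name, "instanceOf", parent_class)
    for child in node.findall("node"):
        yield from extract(child, parent_class)


root = ET.fromstring(SAMPLE_MAP).find("node")
triples = list(extract(root))
for triple in triples:
    print(triple)
```

In this sketch the untagged sentence below "Car" produces no statement; it remains documentation that travels with the class it explains.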

Ontology engineering with semAuth basically means working with a mindmap in several modes: In all phases of a knowledge modelling project the need arises to "dump your brain" fast and effectively. Doing this with a mindmap is an effective and structured method which focuses on making explicit terms and some simple relationships between them. Such relationships can be expressed in a mindmap easily and independently of a specific ontology language.

An important characteristic of the mindmapping technique is that the topology of the map matters. While the proximity of nodes is important to express similarities, different subtrees may at the same time be clustered or ordered by very different characteristics. Because maintaining trees and subtrees is the one and only overall ordering principle, this ordering principle may represent many different structures at the same time.

Again: The ability to maintain the text structure and the semantic annotations of the mindmap independently of each other - and independently of a specific ontology modelling language - is perhaps the most important feature of the semAuth approach: You may change the text structure (i.e. trees represented as nested lists) without having to change the icons (i.e. the semantic declarations) of text nodes. And you may change the icons of the text nodes without having to touch the structure of the text or the text itself. In effect you can represent "pre-semantic" structures without having to decide with which ontology design pattern (or even ontology language) you will finally implement them. As long as you outline a terminology you don't have to care about object, data or annotation properties, classes and instances.
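The independence of text structure and semantic tags can be illustrated with a short sketch in which a node is re-tagged while the tree stays exactly as it was. The icon names and the helper functions shape, icons and retag are hypothetical and serve only to make the two layers visible; they are not part of semAuth.

```python
import xml.etree.ElementTree as ET

SAMPLE_MAP = """<map version="0.9.0">
  <node TEXT="Car"><icon BUILTIN="idea"/>
    <node TEXT="Beetle"><icon BUILTIN="idea"/></node>
  </node>
</map>"""


def shape(node):
    """Text structure only: nested (text, children) tuples, icons ignored."""
    return (node.get("TEXT"), tuple(shape(c) for c in node.findall("node")))


def icons(node):
    """Semantic tags only: {text: [icon names]}, nesting ignored."""
    return {n.get("TEXT"): [i.get("BUILTIN") for i in n.findall("icon")]
            for n in node.iter("node")}


def retag(node, text, new_icon):
    """Replace the icons of the node labelled `text`; nesting is untouched."""
    for n in list(node.iter("node")):
        if n.get("TEXT") == text:
            for old in n.findall("icon"):
                n.remove(old)
            ET.SubElement(n, "icon", BUILTIN=new_icon)


root = ET.fromstring(SAMPLE_MAP).find("node")
before = shape(root)
# Assumed meaning for this sketch: "idea" marks a class, "flag" an instance.
retag(root, "Beetle", "flag")
print(shape(root) == before)  # the text structure did not change
print(icons(root))
```

Turning "Beetle" from a class into an instance here is a change of one tag; no node has to be moved, copied or retyped.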

Embedding ontology modelling into a knowledge management project

In order to discuss the specifics of "ontology engineering with semAuth" we need a well-defined coordinate system. This is necessary because ontology engineering is always embedded in a surrounding knowledge management process. One of the most developed and sophisticated methodologies for knowledge engineering and management is CommonKADS. While following CommonKADS in detail would require considerable resources in practice, it is well suited as the methodological background for positioning ontologies within a larger context. (A specific interpretation of CommonKADS w.r.t. the engineering of single (stand-alone) ontologies is given with the method OntoKnowledge. A more recent application to networked ontology engineering is elaborated in the Neon project; TBD: link to the Neon methodology.) With a sophisticated knowledge modelling method like CommonKADS as such a context, we have a sound background for pointing out by example how specific semAuth characteristics relate to certain details of a knowledge management method, here CommonKADS.

CommonKADS speaks of no fewer than six models within a knowledge management project, modelling (a) project contexts like the organization, its tasks and its agents, (b) the design of the overall knowledge system, and (c) conceptual knowledge, i.e. the knowledge model and the communication model.

The CommonKADS methodology insists on decoupling data and function very clearly. As a result the typical CommonKADS template knowledge models are notated using UML flow diagrams for expressing inference structures and UML object diagrams for expressing conceptual knowledge (or using other notations which are adequate for expressing dynamic and static knowledge structures respectively). According to CommonKADS, ontologies (schema, facts, rules) should be used mainly to model static knowledge, as opposed to earlier knowledge engineering approaches (i.e. the expert systems of the 1980s), which also tried to model the dynamics of the inferencing processes themselves with rules. We consider this a wise recommendation, for it reduces the complexity and enhances the modularity of knowledge models.

We show a sandbox example of how to craft a small ontology in section pdkSandboxOntologies and relate it to aspects of the CommonKADS methodology.

Summary

The need to document and annotate specific modelling choices, which occurs often and in all phases of a project, can easily be met by simply adding free text nodes to the mindmap which are not interpreted as semantic elements. Having free text documentation of conceptualizations within the same tool and according to the same authoring paradigm as "real" conceptualizations flattens the often rather arbitrary layering between data and metadata (i.e. OWL data or object properties and annotation properties): Text which was originally intended as an explanation only can easily be tagged semantically, and thus be drawn from the region of informal (but already explicit) background knowledge into the light of formally well-defined, semantically modelled knowledge.

It is important to understand that structuring, semantic tagging and providing explanatory text can be done iteratively, step by step and within the same tool, without having to change tools/GUIs and/or knowledge representation paradigms. In effect semAuth is both a cross-section tool (suited for several user profiles) and a longitudinal-section tool (to be used in several phases of a process) within ontology engineering projects.

In contrast to tools like Protégé, semAuth is suited both for end users and for experienced ontology engineers. While the two user groups of course have different needs and show different types of use, crafting an ontology with semAuth allows end users to be involved earlier and in a more self-organized way in ontology engineering. This is because the same technical frontend and the same knowledge representation method are used in all phases of an ontology engineering project: In early phases the mindmapping approach is a first-class choice for documenting the results of brainstorming, outlining and pre-structuring processes. The mindmapping approach also supports the knowledge elicitation method of laddering well. In later phases the semantic tagging approach is a first-class way to make technical, ontology-language-specific choices explicit without having to change the modelling tool.

Acknowledgement

Part of this work was developed in the project x-media-project.org which is partly sponsored by the European Commission under contract no. FP6-26978.