Practical ontology

Igor' Arkhipov
May 28, 2020

Updated: May 15

A tool for making sense of new domains and knowledge areas

As analysts, we often find ourselves delving deep into new subject matters, domains and businesses. Quite often we are expected to become the subject matter experts, supporting our teams in navigating through a new project environment and being able to answer any questions (or at least know where to find the answers).

Structuring knowledge about a new domain becomes a critical task. When done wrong, it may cause heaps of issues later in the project. Done well, it makes life a lot easier!

* * *

Intro to ontology

In information science, an ontology is a representation, formal naming and definition of the categories, properties and relations between the concepts within a domain [1].

More simply, an ontology is a way of showing the properties of a subject area and how they are related, by defining a set of concepts and categories that represent the subject.

This way, you end up with a formal model of the subject matter that can be used as a reference point and an input for many other activities. When an ontology models a specific area of life, be it an industry or a business, we call it a “domain ontology”.

A simple example of ontology modelling

Imagine you are building an app to automate a business process of delivering packages. To do so, you need to understand how the process flows. And to understand that, you need to have a clear picture or snapshot of the domain — what are the key entities that constitute it and how they are related.

First, you start with identifying and defining the entities. In this example, we can start with two very basic ones: you have a package that you need to deliver and a courier who will be delivering it. On a simple ontology model it will appear like this:

Diagram showing a package labeled "Package" connected by an arrow to "Courier" with the text "delivered by" in between.

Notice how we have added an arrow between the entities and a label to it. This makes it easy to understand how the entities are related. The connection reads like a sentence. You start from the entity that is at the start of the arrow and read the entity name, then the label on the arrow, then the target entity name: “Package delivered by Courier”.

Making sure these connections read well will make it easier for stakeholders to use the map.
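To make the reading rule concrete, here is a minimal sketch that stores one arrow as a (source, label, target) triple and reads it back as a sentence. The function name is my own choice for illustration; the entity and label names come from the diagram.

```python
# Each arrow on the map is a (source, label, target) triple.
relations = [
    ("Package", "delivered by", "Courier"),
]


def read_aloud(relation):
    """Read an arrow the way the map is read: source, label, target."""
    source, label, target = relation
    return f"{source} {label} {target}"


sentences = [read_aloud(relation) for relation in relations]
print(sentences[0])  # Package delivered by Courier
```

If a sentence produced this way sounds odd when read aloud, the arrow probably points the wrong way or the label needs rewording.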

Next, you keep plotting new entities as you come across them, such as the address the package will be delivered to, the addressee the package is intended for, and the price list that the courier may have.

Flowchart with boxes: Package, Courier, Price list, Address, Addressee. Arrows show relationships: delivered by, has a, sent to, intended for.

Some entities may have multiple connections as well. Imagine we add an Order form: a document listing the package and the price of delivery, which is signed by the addressee.

Flowchart showing a courier system with elements: Package, Courier, Price list, Order form, Address, Addressee. Arrows denote connections.

So, you plot these entities on a canvas and draw meaningful relationships between them; and you keep doing it until you’ve had enough. Or at least, just enough to start understanding who is who in the zoo.

However, this is not enough to finish the job. Sometimes, you need to capture more specific features of your entities. For example, the price from the list may depend on the package weight, so you need to capture weight as an important attribute of a package. The business may also be tracking the package status, which makes status another important attribute.

It is also important that one package can be sent to one and only one address, but it may list multiple addressees who can accept it — we capture this as cardinality on the lines.

Flowchart diagram featuring rectangles labeled Package, Courier, Price list, Order form, Address, and Addressee, connected by arrows.

Cardinality is captured as a number on the arrow. The asterisk (*) stands for “many”, allowing any number of connections. A number means a specific number of connections. An ellipsis means “any number in between”. So “1 … *” next to Addressee means a package can be intended for one or more people (0 is excluded: you cannot send a package without at least one addressee). This way you can add restrictions and constraints to the model.
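The cardinality labels can also be checked mechanically. The sketch below parses a label into a minimum and a maximum and tests a count against it. The function names are hypothetical, and treating a bare asterisk as "zero or more" is an assumption borrowed from the UML convention.

```python
import re


def parse_cardinality(text):
    """Turn a label such as "1", "*" or "1 … *" into (minimum, maximum).

    A maximum of None means "many", i.e. no upper limit.
    """
    # Accept the single-character ellipsis as well as two or three dots.
    parts = [part.strip() for part in re.split(r"…|\.{2,3}", text)]
    if len(parts) == 1:
        if parts[0] == "*":
            return (0, None)  # assumption: a bare asterisk allows zero
        exact = int(parts[0])
        return (exact, exact)
    low, high = parts
    return (int(low), None if high == "*" else int(high))


def allows(cardinality, count):
    """Check whether a number of connections satisfies the cardinality."""
    minimum, maximum = cardinality
    return count >= minimum and (maximum is None or count <= maximum)


addressees = parse_cardinality("1 … *")  # one or more addressees
address = parse_cardinality("1")         # one and only one address

print(allows(addressees, 0))  # False
print(allows(addressees, 3))  # True
print(allows(address, 2))     # False
```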

You can also enrich the diagram with time-bound items, such as processes or events. We could include the Delivery event to show what triggers the process to sign the delivery form.

Flowchart diagram with labeled boxes: Delivery, Package, Address, Addressee, Courier, Price list, and Order form, depicting package delivery process.

Lastly, items may represent two different things:

individual objects — e.g. Price list in the example above stands for one individual price list that is applied to all deliveries.
classes of items — e.g. Package on the diagram represents a class of objects. Each individual package being delivered by our company will have the same set of attributes, relationships, and restrictions as defined.

Classes may assume a hierarchy, e.g. we may include items such as Letter and Box as more specific examples (child elements) of a Package. Classes may also belong to bigger classes of objects (meta classes), effectively inheriting their attributes. For simpler ontologies in the business context I find simple colour coding works really well:

Flowchart illustrates delivery process: package linked to courier, address, addressee. Includes documents like order form, price list.
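If you think in code, the difference between a class of items, a child class and an individual object can be sketched roughly as follows. The attribute names, the `fragile` flag and the price value are invented for illustration and are not part of the model above.

```python
from dataclasses import dataclass


@dataclass
class Package:
    """A class of items: every package has a weight and a status."""
    weight_kg: float
    status: str


@dataclass
class Letter(Package):
    """A more specific kind of Package; it inherits weight and status."""


@dataclass
class Box(Package):
    """Another child of Package with one attribute of its own (hypothetical)."""
    fragile: bool = False


@dataclass
class PriceList:
    """Modelled as a class only so that one individual can be created."""
    price_per_kg: float


# An individual object: the one price list applied to all deliveries.
PRICE_LIST = PriceList(price_per_kg=2.50)

letter = Letter(weight_kg=0.1, status="in transit")
box = Box(weight_kg=4.2, status="delivered", fragile=True)

print(isinstance(letter, Package))  # True
print(isinstance(letter, Box))      # False
```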

A model like this will serve as a great starting point to uncover a new project domain, get on-boarded into a new industry, or start more detailed analysis.

* * *

Isn’t it just data modelling?

If you are a data analyst, you may rightly wonder how this differs from data modelling and ER diagrams. True enough, there are similarities between this process and the first steps of building a data model.

Data modelling typically consists of three layers, each built on the insights from the previous one [2]:

Conceptual: understand what the system contains
Logical: understand how the system should be implemented, agnostic of a specific technical platform
Physical: understand how the system will be implemented in the context of a specific platform
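As a rough illustration of the three layers, here is the same Package and Courier relationship expressed at each level. The column names and types are my assumptions, and SQLite is used only as an example of a specific platform.

```python
import sqlite3

# Conceptual: what the system contains.
conceptual = [("Package", "delivered by", "Courier")]

# Logical: attributes and keys, still independent of any platform.
# The attribute names are assumptions made for this sketch.
logical = {
    "Courier": {"id": "identifier", "name": "text"},
    "Package": {
        "id": "identifier",
        "weight": "decimal number",
        "status": "text",
        "courier": "reference to Courier",
    },
}

# Physical: the same structure written for one specific platform (SQLite).
physical = """
CREATE TABLE courier (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE package (
    id         INTEGER PRIMARY KEY,
    weight_kg  REAL NOT NULL,
    status     TEXT NOT NULL,
    courier_id INTEGER NOT NULL REFERENCES courier(id)
);
"""

connection = sqlite3.connect(":memory:")
connection.executescript(physical)
tables = sorted(
    row[0]
    for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
)
print(tables)  # ['courier', 'package']
```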

The ontology model as described above can easily serve as an input into the conceptual data modelling or replace it.

It can also serve as an input into business process modelling. Or organisational chart modelling.

Or content architecture.

Or anything else really, because when done properly it captures the knowledge about the domain, which can be used and repurposed for any need going forward.

In the example above I’ve used a very simple notation of boxes and arrows based on UML class diagram notation [3], which I’m sure you have noticed ;)

It is very easy for any stakeholder to pick up, read, or even contribute to.

Boxes and arrows are cool, but we are analysts here, so… is there a formal notation?

Ontology modelling when done properly may consume a serious amount of time. Originally created to capture shared understanding about subject matter, it is widely used in Information Technology as well as in academia to retain and reuse knowledge.

The Web Ontology Language (OWL) was created specifically for our interconnected world [4].

OWL is a Semantic Web language designed to represent rich and complex knowledge about things, groups of things, and relations between things. OWL is a computational logic-based language such that knowledge expressed in OWL can be exploited by computer programs, e.g. to verify the consistency of that knowledge or to make implicit knowledge explicit.

OWL sees the world as a collection of Classes. Each class encapsulates the attributes and descriptions that all the objects within this class (called Individuals) share. Classes may have sub-classes and super-classes. A sub-class is a child of a super-class, inheriting all the attributes from the parent and adding its own. If you are familiar with object-oriented programming, it is the same concept. OWL, however, also allows multiple inheritance through individuals: an individual may belong to multiple classes at the same time, even if those classes belong to different super-classes.

To build a model, the first thing you do is identify your classes and the hierarchy between them. In our example, it may look something like this:

Categories: People (Addressee, Courier), Documents (Forms, Order_forms, Lists), Objects (Address, Package) with yellow dots. — A simple hierarchy of classes, as modelled in Protege.

For the modelling environment, I’ve used Protégé here, a tool developed and made open-source by Stanford University [5].
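The idea of a class hierarchy with individuals that belong to several classes can be sketched in a few lines of plain Python. This is not OWL and not how Protégé stores anything; it only mimics the behaviour described above, and “Alex” is a hypothetical individual.

```python
# Part of the class hierarchy from the example: child -> parent.
SUBCLASS_OF = {
    "Addressee": "People",
    "Courier": "People",
    "Address": "Objects",
    "Package": "Objects",
}

# An individual may be placed into several classes at once.
MEMBER_OF = {"Alex": {"Courier", "Addressee"}}


def ancestors(cls):
    """Return every super-class of cls, nearest first."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def classes_of(individual):
    """Return the stated classes plus everything inherited from parents."""
    direct = MEMBER_OF[individual]
    inherited = {parent for cls in direct for parent in ancestors(cls)}
    return direct | inherited


print(sorted(classes_of("Alex")))  # ['Addressee', 'Courier', 'People']
```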

Each class can be associated with attributes belonging to this class, e.g. People may have a name, while Couriers may also have individual IDs in addition to names. At the same time, OWL supports more advanced structures. For example, it may specify which classes are equivalent (synonyms), so that an object belonging to one automatically belongs to the other, and which classes are disjoint, meaning an object belonging to one of them cannot belong to another.
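A minimal sketch of what synonym and disjointness declarations buy you: membership spreads across synonym classes, and a membership that breaks a disjointness rule can be flagged. The “Driver” class and the choice of People and Documents as disjoint are assumptions for illustration; a real OWL reasoner does considerably more than this.

```python
# Groups of classes declared to be synonyms of each other.
# "Driver" is a hypothetical synonym for Courier.
EQUIVALENT = [{"Courier", "Driver"}]

# Groups of classes that cannot share a member (assumed for this sketch).
DISJOINT = [{"People", "Documents"}]


def expand(classes):
    """Add every class declared a synonym of one already present."""
    result = set(classes)
    for group in EQUIVALENT:
        if result & group:
            result |= group
    return result


def violations(classes):
    """Return the disjoint groups that this set of memberships breaks."""
    present = expand(classes)
    return [group for group in DISJOINT if len(present & group) > 1]


print(expand({"Courier"}) == {"Courier", "Driver"})  # True
print(len(violations({"People", "Documents"})))      # 1
print(len(violations({"People"})))                   # 0
```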

When the hierarchy of classes is defined, you can start exploring relationships between the classes. In OWL those are modelled as object properties. Each relationship may have a scope (its domain and range), meaning you can specify the classes that can appear on either side of the relationship. You can also define whether a relationship is symmetric and, if it is not, what the inverse relationship is called. In our example, if a Package is delivered_by a Courier, the same Courier is_assigned_courier for that Package.

Ontology interface showing a "delivered_by" property with Asymmetric characteristic checked. Includes Package and Courier domains. — Defining a relationship in Protege
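Here is a small sketch of a relationship with a scope and a named inverse. The class and function names are my own, and the individuals package_1 and courier_7 are hypothetical. Note one simplification: the sketch rejects a fact that falls outside the scope, whereas an OWL reasoner would use the scope to infer which classes the individuals belong to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectProperty:
    name: str
    domain: str   # class allowed at the start of the arrow
    range_: str   # class allowed at the end of the arrow
    inverse: str  # name of the same link read in the opposite direction


DELIVERED_BY = ObjectProperty(
    name="delivered_by",
    domain="Package",
    range_="Courier",
    inverse="is_assigned_courier",
)


def state(prop, subject, subject_class, target, target_class):
    """Check the scope, then return the stated fact and its inverse."""
    if (subject_class, target_class) != (prop.domain, prop.range_):
        raise ValueError(
            f"{prop.name} links {prop.domain} to {prop.range_}, "
            f"not {subject_class} to {target_class}"
        )
    return [(subject, prop.name, target), (target, prop.inverse, subject)]


facts = state(DELIVERED_BY, "package_1", "Package", "courier_7", "Courier")
for fact in facts:
    print(fact)
# ('package_1', 'delivered_by', 'courier_7')
# ('courier_7', 'is_assigned_courier', 'package_1')
```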

Finally, you can assign restrictions on relationships between the classes, such as an expected cardinality.

Yellow header with "Description: Package" text. Below, bullets: "intended_for exactly 1 Addressee" in black and magenta text. — Cardinality modelling

Once the structure is set up, you can start populating it with Individuals if you need to classify the real life objects. It is somewhat similar to populating the database with real data once the schema is defined.

OWL is an extremely powerful and advanced notation. If you get into it, you will see the beauty of how it helps organise data into structured knowledge.

Do not, however, expect your business stakeholders to appreciate it!

That is, unless you work in the knowledge management industry or academia. Which brings us to the next question:

When do I use a formal ontology spec instead of a casual concept map?

There are just a few use cases that require a formal ontology model [6]:

To share common understanding of the structure of information between people and software agents
To enable reuse of domain knowledge between dispersed non-related teams of researchers working on it
To automate a knowledge driven process, e.g. when you develop an ontology of components and characteristics to apply an algorithm to automatically configure sensible combinations of those components, such as configuring a PC or performing a web-search or translation
To analyse domain knowledge when this is the sole goal of your work

These four are probably the only cases when you are likely to face a monster notation like OWL. In all other cases, a formal ontology is most likely overkill. A simple concept map (like the one described at the start of this article), however, has a much broader list of use cases:

You start a project and try to understand the new industry domain
You start working with a new employer/client and need to make sense of their organisation and processes
You replace a legacy software system and map the current landscape of dependencies
You automate a business process and map the organisation context around it
You have a big group of stakeholders and you need to get them to speak the same language
You need to design data storage in a sensible way

Basically, every time you need people to speak a common language, an ontology map can be a viable first step.

* * *

How do I get started?

There is no one correct way to model a domain — there are always viable alternatives. The best solution almost always depends on the application that you have in mind and the extensions that you anticipate [6].

It is important to understand that ontology modelling is always an iterative process, and in my experience it works best in a collaborative fashion.

Concepts in the ontology are typically close to objects (physical or logical) and relationships in your domain of interest. These are most likely to be nouns (objects) or verbs (relationships) in sentences that describe your domain.

The first step I usually take is to prepare for a workshop with subject matter experts. In preparation, I read through existing project materials and list out all the nouns that are likely to be meaningful in this context.

I also prepare a persona for the workshop — a fictional character smart enough to comprehend the subject matter but coming from a completely different background. Here is an example of a persona that I used in a real session to map out the structure of a university’s course catalogue:

Man operates saw outdoors, wearing ear protection and visor, smiling. Text reads "Meet Jeff." — Jeff is a mechanical engineer with 15 years of experience maintaining and upgrading farming equipment for his family business. He is highly literate and technically savvy, but has no idea about how universities manage their course catalogue data. Also, he is completely fictional.

In the session, I show the terms from my list to the stakeholders one by one and ask them to explain those in a way that our fictional persona may understand. This approach helps eliminate the bias of previous knowledge and also avoids unpleasant concerns about my own or my company’s (if I come from an agency) competency. As the terms are discussed, we plot them on a whiteboard together, drawing lines and labelling squares to land on a conceptual map like the one described above.

After the session, I would normally digitise this map and also prepare a glossary style list of definitions — and take it from there to the next project steps.
