
13 August 2025

Protégé

Protégé has long been the gold standard for creating and editing ontologies, the foundational building blocks of the Semantic Web. Its robust feature set and adherence to standards like OWL have made it an indispensable tool for researchers and developers. However, in an era defined by user-centric design and rapid development, Protégé's traditional approach is beginning to show its age. The editor, while powerful, presents a steep learning curve and a workflow that can be cumbersome for those without a deep background in knowledge representation. The time is ripe for a new evolution, one that integrates the power of Generative AI (GenAI) to unlock a more intuitive and efficient ontology creation process.

The core challenge with Protégé, and indeed many traditional ontology editors, is that they are built for experts. The interface, a maze of tabs, views, and axiom builders, is an accurate reflection of the complexity of the underlying OWL language. While this fidelity is a strength for experienced ontologists, it becomes a significant barrier to entry for a wider audience, including domain experts who understand the content but not the formalisms. The process of manually defining classes, properties, and complex axioms is meticulous and prone to human error. Even with reasoners, tracking down inconsistencies can be a time-consuming and frustrating debugging exercise.

This is where GenAI can be a game-changer. Imagine a Protégé editor where a user could describe a new concept in natural language. Instead of manually creating a class, adding properties, and building complex logical expressions, a user could simply type, "Define a 'MedicalCondition' class that is a subclass of 'Disease' and has a 'hasSymptom' property with a range of 'Symptom' and a 'hasTreatment' property." A GenAI feature could then instantly generate the corresponding OWL axioms, complete with logical constraints and relationships. This would drastically reduce the cognitive load and accelerate the initial stages of ontology development.
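For illustration, here is a minimal sketch of the axioms such a feature might emit for that prompt, assuming the rdflib library; the namespace and exact modelling choices are assumptions for illustration, not actual Protégé or GenAI output.

```python
# A minimal sketch (assuming the rdflib library) of the OWL axioms such a
# feature might emit for the prompt above; the namespace is illustrative.
from rdflib import OWL, RDF, RDFS, Graph, Namespace

EX = Namespace("http://example.org/med#")
g = Graph()
g.bind("ex", EX)
g.bind("owl", OWL)

# "Define a 'MedicalCondition' class that is a subclass of 'Disease' ..."
g.add((EX.MedicalCondition, RDF.type, OWL.Class))
g.add((EX.MedicalCondition, RDFS.subClassOf, EX.Disease))

# "... has a 'hasSymptom' property with a range of 'Symptom' ..."
g.add((EX.hasSymptom, RDF.type, OWL.ObjectProperty))
g.add((EX.hasSymptom, RDFS.domain, EX.MedicalCondition))
g.add((EX.hasSymptom, RDFS.range, EX.Symptom))

# "... and a 'hasTreatment' property."
g.add((EX.hasTreatment, RDF.type, OWL.ObjectProperty))
g.add((EX.hasTreatment, RDFS.domain, EX.MedicalCondition))

print(g.serialize(format="turtle"))
```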

Furthermore, GenAI could revolutionize the process of data annotation and instance creation. Ontologies are only as useful as the data they describe. Populating an ontology with individuals is often a manual, tedious process. GenAI could be used to analyze unstructured text, such as a medical journal article, and automatically identify and suggest new instances, properties, and relationships. It could even propose new classes and axioms based on patterns it identifies in the text, effectively acting as an intelligent partner in the knowledge acquisition process.
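A speculative sketch of what that pipeline could look like, assuming rdflib; extract_facts is a hypothetical stand-in for the language-model extraction step, and the vocabulary is illustrative.

```python
# A speculative sketch of GenAI-assisted instance population; extract_facts
# is a hypothetical stand-in for an LLM/NLP extraction pipeline, and the
# ex: vocabulary is illustrative.
from rdflib import RDF, Graph, Namespace

EX = Namespace("http://example.org/med#")

def extract_facts(text: str) -> list[tuple[str, str, str]]:
    """Hypothetical extraction step: return (subject, predicate, object)
    suggestions mined from unstructured text by a language model."""
    return [("Migraine", "hasSymptom", "Nausea")]  # stubbed suggestion

g = Graph()
for s, p, o in extract_facts("Patients with migraine often report nausea."):
    # Suggested triples would be reviewed by an ontologist before assertion.
    g.add((EX[s], RDF.type, EX.MedicalCondition))
    g.add((EX[s], EX[p], EX[o]))

print(g.serialize(format="turtle"))
```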

While the existing Protégé community has built a rich ecosystem of plugins and extensions, a native GenAI integration would represent a fundamental shift. It would move the tool from a passive editor to an active assistant, providing intelligent suggestions, automated axiom generation, and a more natural, conversational interface. This would not only make the tool more accessible to a broader user base but also empower seasoned ontologists to work more efficiently and focus on the high-level modeling challenges rather than the low-level syntax. By embracing GenAI, Protégé could solidify its position at the forefront of the semantic web, not just as a tool for experts, but as a catalyst for a more inclusive and productive knowledge-driven future.

16 July 2025

Simplicity of SKOS

In the vast and interconnected landscape of the Semantic Web, the Simple Knowledge Organization System (SKOS) stands out as a remarkably widespread and effective standard for representing knowledge organization systems. Developed by the World Wide Web Consortium (W3C), SKOS provides a common model for sharing and linking thesauri, classification schemes, subject heading lists, taxonomies, and other similar controlled vocabularies. Its pervasive adoption isn't accidental; it stems from a design philosophy that prioritizes simplicity, interoperability, and practical utility.

SKOS's widespread use can be attributed primarily to its intuitive and lightweight nature. Unlike more complex ontological languages, SKOS doesn't demand a deep grounding in formal logic or advanced semantic reasoning. It offers a straightforward vocabulary for describing concepts and their relationships (e.g., skos:broader, skos:narrower, skos:related), making it accessible to librarians, information architects, and domain experts who might not be trained ontologists. This low barrier to entry has enabled countless organizations to publish their existing vocabularies as Linked Data, significantly enhancing their discoverability and reusability across the web. Its alignment with RDF (Resource Description Framework) principles also means SKOS vocabularies can be easily integrated with other datasets, fostering a more interconnected web of knowledge.
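The core SKOS relations can be expressed in a few lines; a minimal sketch assuming the rdflib library, with illustrative concepts:

```python
# A minimal sketch of the SKOS pattern just described, using rdflib's
# built-in SKOS namespace; the concepts themselves are illustrative.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/vocab#")
g = Graph()
g.bind("skos", SKOS)

g.add((EX.Dog, RDF.type, SKOS.Concept))
g.add((EX.Dog, SKOS.prefLabel, Literal("Dog", lang="en")))
g.add((EX.GoldenRetriever, RDF.type, SKOS.Concept))
# Note the direction: the narrower concept points to its broader one.
g.add((EX.GoldenRetriever, SKOS.broader, EX.Dog))
g.add((EX.Dog, SKOS.related, EX.Wolf))

print(g.serialize(format="turtle"))
```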

Despite its strengths, SKOS is not without its shortcomings. Its very simplicity, while a major advantage, also represents its primary limitation. SKOS is designed for "simple" knowledge organization, meaning it lacks the expressive power for complex ontological modeling. It cannot define new properties, nor does it support intricate logical axioms or sophisticated reasoning capabilities. For instance, while SKOS can state that "Golden Retriever" has "Dog" as a skos:broader concept, it cannot formally infer that all Golden Retrievers are animals, nor can it define the properties that distinguish a Golden Retriever from other breeds. Furthermore, its relationships are largely informal; skos:broader implies a hierarchical relationship but doesn't specify the exact nature of that hierarchy (e.g., part-of, type-of, etc.). This lack of formal semantics means that complex inferences or consistency checking, common in more robust ontologies, are beyond SKOS's native capabilities.

Given these limitations, there are clear scenarios when SKOS is not the appropriate choice. If your goal involves defining complex domain models, establishing precise relationships between entities, performing automated reasoning (e.g., inferring new facts from existing ones), or ensuring logical consistency across a highly structured knowledge base, then SKOS will fall short. It's not suitable for building a full-fledged ontology that captures the intricate nuances of a domain, including property characteristics, restrictions, or complex class definitions.

In such cases, other approaches offer the necessary expressivity:

  • RDF Schema (RDFS): For modeling that is slightly more expressive than plain RDF while remaining lightweight, RDFS allows you to define classes and properties, establish class hierarchies (rdfs:subClassOf), and property hierarchies (rdfs:subPropertyOf). It's a good step up from SKOS when you need to define your own basic vocabulary but don't require formal reasoning. For example, you could define ex:Person as rdfs:subClassOf ex:Agent.

  • Web Ontology Language (OWL): This is the go-to standard for building rich, complex ontologies. OWL provides powerful constructs for defining classes, properties, individuals, and complex relationships with formal semantics. It supports logical reasoning, allowing systems to infer new knowledge, check for inconsistencies, and classify instances automatically. For example, in OWL, you could define that "A person can only have one biological mother" or "If X is the parent of Y, and Y is the parent of Z, then X is the grandparent of Z." This level of expressivity is crucial for AI applications, expert systems, and complex data integration. Both this and the RDFS pattern above are sketched below.
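A minimal sketch of both patterns, assuming the rdflib library; the ex: vocabulary is illustrative:

```python
# A minimal sketch of the constructs named in both bullets (assuming the
# rdflib library); the ex: vocabulary is illustrative.
from rdflib import OWL, RDF, RDFS, BNode, Graph, Namespace
from rdflib.collection import Collection

EX = Namespace("http://example.org/")
g = Graph()
g.bind("ex", EX)

# RDFS: a basic class hierarchy.
g.add((EX.Person, RDFS.subClassOf, EX.Agent))

# OWL: "a person can only have one biological mother" via a functional property.
g.add((EX.hasBiologicalMother, RDF.type, OWL.FunctionalProperty))

# OWL 2: grandparent inferred from a property chain (parentOf o parentOf).
g.add((EX.grandparentOf, RDF.type, OWL.ObjectProperty))
chain = BNode()
Collection(g, chain, [EX.parentOf, EX.parentOf])
g.add((EX.grandparentOf, OWL.propertyChainAxiom, chain))

print(g.serialize(format="turtle"))
```

Serializing the graph shows the owl:propertyChainAxiom rendered as an RDF list, which is how OWL 2 encodes property chains in Turtle.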

SKOS is a widely adopted and invaluable tool for publishing and linking lightweight knowledge organization systems like thesauri and taxonomies. Its strength lies in its simplicity and accessibility, acting as a crucial bridge for making controlled vocabularies available as Linked Data. However, for tasks demanding sophisticated domain modeling, formal reasoning, or complex logical inferences, more expressive languages like RDFS or, more commonly, OWL, are indispensable. Choosing the right tool depends on the specific requirements of the knowledge representation task at hand.

29 June 2025

The Ontologists' Odyssey: A Quest for Being

Three neurodivergent ontologists, Dr. Alistair Finch (whose special interest was the nature of abstract concepts), Professor Beatrice "Bea" Hawthorne (a connoisseur of mereology and the problem of universals), and young Elara Vance (an enthusiastic, if sometimes literal, scholar of identity and change), walked into "The Gastronomic Void," a trendy new restaurant notorious for its minimalist decor and inscrutable menu.

Alistair immediately began to categorize the patrons. "Observe," he muttered, adjusting his spectacles, "the inherent 'treeness' of the table, yet its particular manifestation as 'this specific table.' Is the universal 'table' instantiated here, or is this merely a collection of particles organized as if it were a table?" He pulled out a small notebook.

Bea, already deep in thought, tapped her chin. "And what of the menu, Alistair? It purports to offer 'artisanal simplicity.' Is simplicity itself an artisanable quality, or is it an absence of complexity? And if the latter, can an absence be crafted?" She frowned at a dish simply labeled "Existence."

Elara, meanwhile, was meticulously arranging her cutlery into a perfect linear sequence, forks descending in size, then spoons, then knives. "But if this fork is the fork, and then I use it to eat, does it cease to be the fork and become a 'fork-in-use'? Does its identity shift with its function?" She looked earnestly at a passing waiter, who wisely avoided eye contact.

The waiter, a harried young man named Kevin, finally approached. "Good evening," he said, trying for a cheerful tone. "May I take your order?"

Alistair looked up, startled. "Order? Ah, yes. The imposition of structure upon a chaotic reality. Before we address the 'what,' Kevin, perhaps we should address the 'how.' What is the ontological status of a menu item before it is ordered? Is it merely potentiality, or does it possess a latent being?"

Kevin blinked. "It's, uh, just food, sir. We have specials."

Bea leaned forward. "Kevin, let's consider the 'special.' Is its 'specialness' an intrinsic property, or is it relational, contingent upon its deviation from the 'non-special'? And if all items are 'special' in their unique particularity, does the term then lose its meaning, thus collapsing the distinction?"

Elara had finished arranging her cutlery and now began to re-arrange it into concentric circles. "If I order the 'Soup of the Day,' and tomorrow it's a different soup, is it still the same 'Soup of the Day' conceptually, or has it become a new 'Soup of the Day' entirely, despite the shared designation?"

Kevin sweated. "Look, folks, do you want to, like, eat?"

Alistair nodded gravely. "Indeed. The act of consumption, a transformation of being. But is the 'burger' I consume still a 'burger' qua burger after it enters my digestive system, or does it become 'digested food,' or even 'nutrients'? At what precise point does its 'burger-ness' cease to be?"

Bea sighed contentedly. "Ah, the Ship of Theseus applied to a patty! Exquisite!"

"I'll have the 'Existence'," Elara declared suddenly, pointing to the menu. "But only if it's truly there."

Kevin stared at the menu. "'Existence' is just, like, a plain bun with nothing on it. It's ironic."

Alistair beamed. "A profound statement on essence and void! I'll take the 'Unmanifested Potential' – hold the manifestation, of course."

Bea, ever practical, pointed to another item. "And I shall have the 'Phenomenological Fry Platter.' I wish to observe the inherent 'fry-ness' firsthand, before it dissolves into the realm of the consumed."

Kevin, utterly defeated, scribbled their orders. As he walked away, he heard Alistair muse, "And what of Kevin's 'being'? Is he primarily 'waiter,' 'individual,' or 'a series of transient states performing a service'?"

Bea chuckled. "Perhaps he is simply 'a very patient man in a terrible situation'."

Elara, having finished her cutlery arrangements, began to stack the salt and pepper shakers into a precarious tower. "But if the tower falls, does its 'tower-ness' cease, or does it merely transform into a pile of shakers with a history of being a tower?"

Kevin returned with their "food": a plain bun for Elara, an empty plate for Alistair, and a single, perfectly golden fry for Bea. The ontologists, however, were too engrossed in their philosophical debate to notice the lack of actual sustenance. They had found their meaning not in the meal, but in the delicious, infinite permutations of its being.

24 June 2025

Thing vs Concept

The distinction between a "thing" and a "concept" lies at the heart of how we understand and categorize the world. A "thing" typically refers to a concrete, tangible entity that exists in reality, possessing specific properties and occupying space and time. A tree, a car, a human being – these are things. A "concept," on the other hand, is an abstract idea, a mental construct, or a generalization derived from observed things. "Forest," "transportation," "humanity" – these are concepts. The philosophy underpinning this difference is crucial when designing taxonomies and ontologies, which are structured systems for organizing knowledge.

In the realm of knowledge representation, particularly in domains like data science, artificial intelligence, and information management, deciding when to represent something as a concrete "thing" versus an abstract "concept" is not merely an academic exercise; it has profound practical implications. Taxonomies, which are hierarchical classifications, often start with concrete things and group them under broader concepts. For instance, a "Golden Retriever" (a thing, a specific breed) is classified under "Dog" (a more general concept), which falls under "Canine" (an even broader concept).

Ontologies, which provide a richer representation of knowledge by defining classes, properties, and relationships, demand an even more nuanced approach. Here, the interplay between "things" and "concepts" becomes vital. When constructing an ontology, one must determine whether an entity should be modeled as an individual instance (a "thing") or a class/category (a "concept"). For example, "my car" is a specific instance of a "Car," which is a class. The class "Car" is a concept, while "my car" is a thing.
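A minimal sketch of this distinction, assuming the rdflib library; the identifiers are illustrative:

```python
# A minimal sketch of the class/instance distinction (assuming rdflib);
# the ex: identifiers are illustrative.
from rdflib import OWL, RDF, RDFS, Graph, Literal, Namespace

EX = Namespace("http://example.org/auto#")
g = Graph()

# "Car" is a concept, modelled as a class.
g.add((EX.Car, RDF.type, OWL.Class))
# "my car" is a thing, modelled as an individual instance of that class.
g.add((EX.myCar, RDF.type, EX.Car))
g.add((EX.myCar, RDFS.label, Literal("my car")))
```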

It makes sense to use abstractions (concepts) when:

  1. Generalization is needed: To group similar things, allowing for easier reasoning and querying across diverse instances. For example, treating "Sedan," "SUV," and "Hatchback" as specific types under the abstract concept of "Car."
  2. Focus is on properties and relationships common to a group: If you want to define that all "Books" have "Authors" and "Titles," you define these properties on the concept "Book," not on every individual book.
  3. Scalability is a concern: Storing properties for every individual thing can be inefficient. Abstractions allow for a more compact and manageable knowledge base.
  4. Semantic clarity is paramount: Concepts provide the vocabulary and framework for understanding a domain, ensuring consistency in meaning.

Conversely, it is right to use concrete "things" (instances) when:

  1. Specificity is essential: When you need to refer to a particular entity with unique attributes, like "the Eiffel Tower" or "the specific transaction ID 12345."
  2. Tracking individual states or histories: If "my car" needs to track its mileage, service history, or current location, it must be represented as a distinct thing.
  3. Events or actions involving specific entities: "John bought a book" involves a specific individual ("John," an instance of "Person") and a specific item ("a book," an instance of the concept "Book"); see the sketch after this list.
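A minimal sketch of these three instance-level cases, assuming the rdflib library; all identifiers are illustrative:

```python
# A minimal sketch of instance-level modelling for the three cases above
# (assuming rdflib); all identifiers are illustrative.
from rdflib import RDF, XSD, Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()

# 1. A particular entity with unique attributes.
g.add((EX.transaction12345, RDF.type, EX.Transaction))

# 2. Individual state that only makes sense on an instance.
g.add((EX.myCar, RDF.type, EX.Car))
g.add((EX.myCar, EX.mileage, Literal(84000, datatype=XSD.integer)))

# 3. An event relating specific individuals: "John bought a book".
g.add((EX.john, RDF.type, EX.Person))
g.add((EX.book42, RDF.type, EX.Book))
g.add((EX.john, EX.bought, EX.book42))
```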

The "rightness" of using an abstraction versus a concrete instance depends on the granularity required by the system and the questions it needs to answer. Over-abstracting can lead to a loss of valuable detail, making it impossible to query specific instances. Under-abstracting can lead to a bloated, unmanageable knowledge base that struggles with generalization. The challenge in taxonomy and ontology is to find the optimal balance, building robust models that allow for both generalized reasoning and detailed instance tracking, ensuring the structured knowledge reflects the complex interplay between the abstract and the tangible in our world.

18 May 2022

Knowledge Graph Libraries

  • PyKEEN
  • LibKGE
  • OpenKE
  • GraphVite
  • pykg2vec
  • DGL
  • DGL-KE
  • DeepGraph
  • PBG
  • AmpliGraph
  • Graphscore
  • KarateClub
  • Scikit-KGE
  • StellarGraph
  • KGTK
  • MyDig
  • LIMES
  • Dedupe
  • Karma
  • Silk

28 May 2021

Drawbacks of SHACL

SHACL is a constraint language for validating RDF graphs against a set of conditions. It operates under a closed-world assumption, even though the end goal of the Semantic Web stack it validates is an open-world assumption. Validation is defined only against the explicitly stated constraints, so cases that only arise at the point of inference can be missed, and the entire search space cannot be tested against constraints defined over a fixed set of conditions. The approach can thus be seen as rather opposite to the intended goal: what follows from a SHACL validation can amount to a form of reverse engineering that imposes partially closed-world criteria. It may be better to introduce SHACL earlier in the process rather than later, so as to avoid conflicting outcomes. SHACL validation tests can also quickly get out of hand when acceptance tests against requirements defined in question/answer form become a form of integration validation tests whose constraints have overlapping dependencies; one can see how quickly such suites turn into unmaintainable targets and opaque constraints derived from a set of conditions. In all fairness, one may simply want to validate the graph at the point when it is in a closed-world state, and not after it has been generalized to an open-world assumption.
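To make the closed-world behaviour concrete, here is a minimal sketch assuming the rdflib and pyshacl libraries; the data and shape are illustrative.

```python
# A minimal sketch of SHACL's closed-world flavour, assuming the rdflib and
# pyshacl libraries; the ex: vocabulary and shape are illustrative.
from pyshacl import validate
from rdflib import Graph

data = Graph().parse(data="""
    @prefix ex: <http://example.org/> .
    ex:alice a ex:Person .          # no ex:name asserted
""", format="turtle")

shapes = Graph().parse(data="""
    @prefix sh: <http://www.w3.org/ns/shacl#> .
    @prefix ex: <http://example.org/> .
    ex:PersonShape a sh:NodeShape ;
        sh:targetClass ex:Person ;
        sh:property [ sh:path ex:name ; sh:minCount 1 ] .
""", format="turtle")

# Under the open-world assumption a missing ex:name is merely unknown;
# SHACL reports it as a violation because it closes the world over the data.
conforms, _, report = validate(data, shacl_graph=shapes)
print(conforms)  # False
print(report)
```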

17 March 2021

TDD and BDD for Ontology Modelling

The end goal of most ontologies is a semantic representation that generalizes over a specific context under the open-world assumption. Such models can, however, be built in different ways. One approach is to apply test-driven development and behavior-driven development techniques to building domain-level ontologies, with constraint-based testing applied as part of the process. The process steps are elaborated below.

  • Create a series of high-level question/answer requirements, defined in the form of specification by example.
  • Create SHACL/ShEx tests granular to the individual specification examples in context. Each SHACL/ShEx validation essentially tests a 'ForSome' case of predicate logic per defined question, where a subset of domains/ranges can be tested (see the sketch after this list).
  • Create BDD-based acceptance tests and programmatic unit tests that can exercise the logical constraints.
  • At this stage all tests fail. To make them pass, implement the 'ForSome' closed-world assumption defined in the SHACL/ShEx validation, i.e. implement the representation so that a SPARQL query can answer the given contextual question for the subset cases. Then make the tests pass.
  • Keep repeating the test-implement-refactor cycle until all tests pass for the given set of constraints, incrementally refactoring the representation ontology. The refactoring is mostly about building working generalizations that transform the closed-world assumption of asserted facts into the partial open-world assumption of unknowns for the entire set.
  • Finally, when all tests pass, refactor the entire ontology solution so it conforms to the open-world assumption for the entire set, i.e. 'ForAll, there exists', which can be further tested using SPARQL against the subsumption hypothesis.
  • If the ontology needs to be integrated with other ontologies, build a set of specifications by example for the integration and implement integration tests in a similar manner.
  • Furthermore, in any given question/answer case, identify topical keywords that provide bounded constraints for a separate ontology initiative; it may be helpful here to apply natural language processing techniques to exploit entity linkage for reuse.
  • All tests and implementations can be engineered to follow best practices for maintainability, extensibility, and readability. The tests can be wired into a continuous integration and maintainable living documentation process.
  • Expose the ontology as a SPARQL API endpoint.
  • Apply a release and versioning process to your ontologies that complies with the W3C process.
  • It is easier to go from a set of abstractions under a closed-world assumption to an open-world assumption than from an open-world assumption to a closed-world one; a similar metaphor is going from relational to graph versus graph to relational.
  • Focus on making ontologies accessible to users.
  • OWA is all about incomplete information and the ability to infer new information; constraint-based testing may not be exhaustive over the search space, but one can still test against a subsumption hypothesis.
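As a concrete illustration of the SHACL-test step and the SPARQL-answerable competency question, here is a minimal sketch assuming the rdflib and pyshacl libraries; the vocabulary, shape, and question are illustrative, not prescribed by the process above.

```python
# A minimal sketch of one test-implement-refactor step, assuming rdflib and
# pyshacl; the ex: vocabulary and competency question are illustrative.
from pyshacl import validate
from rdflib import Graph

ONTOLOGY = """
@prefix ex:   <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:GoldenRetriever rdfs:subClassOf ex:Dog .
ex:rex a ex:GoldenRetriever ; ex:hasOwner ex:john .
"""

SHAPES = """
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/> .
# A 'ForSome' style constraint: every GoldenRetriever has an owner.
ex:DogShape a sh:NodeShape ;
    sh:targetClass ex:GoldenRetriever ;
    sh:property [ sh:path ex:hasOwner ; sh:minCount 1 ] .
"""

def test_constraints_hold():
    g = Graph().parse(data=ONTOLOGY, format="turtle")
    shapes = Graph().parse(data=SHAPES, format="turtle")
    conforms, _, report = validate(g, shacl_graph=shapes, inference="rdfs")
    assert conforms, report

def test_competency_question():
    # Competency question: "Who owns a dog?", answered over the subclass
    # closure via a SPARQL property path.
    g = Graph().parse(data=ONTOLOGY, format="turtle")
    q = """
        PREFIX ex:   <http://example.org/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?owner WHERE {
            ?dog a/rdfs:subClassOf* ex:Dog ; ex:hasOwner ?owner .
        }
    """
    owners = {str(row.owner) for row in g.query(q)}
    assert "http://example.org/john" in owners
```

Both tests start out failing on an empty ontology; asserting the facts and the subclass axiom is the "implement" step that makes them pass, after which the representation can be refactored toward the open-world generalization.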