26 April 2021
Five Phases of an AI Project
- Definition and Hypothesis (frame the business problem and identify value targets)
- Data Acquisition and Exploration
- Model Building, Pipeline, and Evaluation
- Interpretation and Communication
- Automation and Deployment Operations
8 April 2021
1 April 2021
31 March 2021
Three Approaches to Word Similarity Measures
- Geometric/spatial: evaluates the relative positions of two words in a semantic space defined by their context vectors
- Set-based: relies on analysis of the overlap between the sets of contexts in which the words occur
- Probabilistic: uses probabilistic models and measures as proposed in information theory
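The three families can be illustrated with a toy sketch (the words, context terms, counts, and probabilities below are all hypothetical): cosine similarity for the geometric view, Jaccard overlap for the set-based view, and pointwise mutual information for the probabilistic view.

```python
import math

# Hypothetical context-count vectors for two words over shared context terms.
contexts = ["drink", "hot", "cup", "bean", "leaf"]
coffee = {"drink": 5, "hot": 4, "cup": 6, "bean": 3}
tea    = {"drink": 4, "hot": 5, "cup": 5, "leaf": 2}

# 1. Geometric/spatial: cosine of the angle between the context vectors.
def cosine(a, b):
    dot = sum(a.get(c, 0) * b.get(c, 0) for c in contexts)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

# 2. Set-based: Jaccard overlap of the sets of contexts each word occurs in.
def jaccard(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

# 3. Probabilistic: pointwise mutual information, an information-theoretic
#    measure of how much more often x and y co-occur than chance predicts.
def pmi(p_xy, p_x, p_y):
    return math.log2(p_xy / (p_x * p_y))

print(cosine(coffee, tea))
print(jaccard(coffee, tea))
print(pmi(0.10, 0.25, 0.30))  # hypothetical probabilities
```

In practice the context vectors come from a corpus co-occurrence matrix rather than hand-written counts, but the three measures operate on it in exactly these three ways.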
23 March 2021
17 March 2021
TDD and BDD for Ontology Modelling
The end goal of most ontologies is a semantic representation, at a specific level of generalization, that follows the open-world assumption. Such models, however, can be built in different ways. One approach is to apply test-driven development (TDD) and behavior-driven development (BDD) techniques to building domain-level ontologies, with constraint-based testing applied as part of the process. The process steps are elaborated below.
- Create a series of high-level question-answering requirements, defined in the form of specification by example
- Create SHACL/ShEx tests, granular to individual specification examples in context. Each SHACL/ShEx validation essentially tests a 'ForSome' case in predicate logic for the given question, where a subset of the domains and ranges can be tested.
- Create BDD-based acceptance tests and programmatic unit tests that exercise the logical constraints
- At this stage, all tests fail. To make them pass, implement the 'ForSome' closed-world assumption defined in the SHACL/ShEx validation, i.e. implement the representation so that a SPARQL query can answer the given contextual question for the subset cases. Then make the tests pass.
- Keep repeating the test-implement-refactor cycle until all tests pass against the given set of constraints, incrementally refactoring the ontology representation. The refactoring is mostly about building working generalizations that transform the closed-world assumption over asserted facts into the partial open-world assumption of unknowns over the entire set.
- Finally, when all tests pass, refactor the entire ontology so that it conforms to the open-world assumption for the entire set, i.e. 'ForAll, there exists', which can be further tested with SPARQL against a subsumption hypothesis.
- If the ontology needs to be integrated with other ontologies, build a set of specifications by example for the integration and implement a set of integration tests in the same manner.
- Furthermore, in any given question/answer case, identify topical keywords that provide bounded constraints for a separate ontology initiative; it may be helpful here to apply natural language processing techniques to exploit entity linking for reuse.
- Engineer all tests and implementations to follow best practices for maintainability, extensibility, and readability. The tests can be wired into continuous integration and a maintainable living-documentation process.
- Expose the ontology as a SPARQL API endpoint
- Apply a release and versioning process to your ontologies that complies with the W3C process
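The SHACL step above can be sketched with a toy shape. Everything here is hypothetical (the `ex:` namespace, the Employee/Organization example, and the specification it encodes); it shows how one specification by example becomes a closed-world 'ForSome' test over the asserted subset.

```turtle
# Hypothetical specification by example:
#   "Every employee works for exactly one organization."
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/ontology#> .

# SHACL node shape acting as the granular 'ForSome' test:
# for the asserted Employee instances, the ex:worksFor property
# must point to exactly one ex:Organization.
ex:EmployeeShape
    a sh:NodeShape ;
    sh:targetClass ex:Employee ;
    sh:property [
        sh:path ex:worksFor ;
        sh:class ex:Organization ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] .
```

Running a SHACL validator over the data graph with this shape graph initially reports violations; implementing the representation until the report is clean is what makes the test pass.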
- It is easier to go from a set of abstractions under a closed-world assumption to an open-world assumption than the reverse; a similar metaphor is moving from relational to graph versus graph to relational.
- Focus on making ontologies accessible to users
- OWA is all about incomplete information and the ability to infer new information; constraint-based testing may not exhaustively cover the search space, but one can still test against a subsumption hypothesis
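Testing a subsumption hypothesis with SPARQL can be sketched as follows, again with the hypothetical `ex:` example: if the hypothesis is that `ex:Employee` is subsumed by `ex:Person`, a query can look for counterexamples in the (inference-enabled) graph.

```sparql
# Hypothetical subsumption test: is every ex:Employee also an ex:Person?
PREFIX ex: <http://example.org/ontology#>

ASK {
    ?employee a ex:Employee .
    FILTER NOT EXISTS { ?employee a ex:Person . }
}
```

If the subsumption holds under the reasoner, the ASK finds no counterexample and returns false; a true result pinpoints where the open-world generalization still breaks down.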
Labels: big data, data science, linked data, machine learning, natural language processing, ontology, semantic web