
5 March 2018

Types of RDF Storage

Native
  • Main Memory-based
  • Disk-based
Non-native
  • RDBMS
    • Schema-based
      • Vertical partitioning
      • Hierarchical property table
      • Property table
    • Schema-free
      • Triple table (see the SQL sketch below)
  • NoSQL
    • Key-value
    • Column Family
    • Document store
    • Graph database
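
To make the schema-based versus schema-free distinction concrete, here is a minimal sketch in Python using the standard sqlite3 module; the table layouts and the ex:/foaf: names are illustrative, not tied to any particular RDF store.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Schema-free "triple table": one generic table holds every statement.
con.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
con.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:alice", "foaf:name", "Alice"),
        ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ],
)

# Schema-based "property table": one column per frequent predicate,
# so fetching a subject's properties avoids self-joins.
con.execute("CREATE TABLE person (subject TEXT PRIMARY KEY, name TEXT, mbox TEXT)")
con.execute(
    "INSERT INTO person VALUES (?, ?, ?)",
    ("ex:alice", "Alice", "mailto:alice@example.org"),
)

# The same question in both layouts: the triple table needs a self-join.
rows = con.execute("""
    SELECT t1.object, t2.object
    FROM triples t1 JOIN triples t2 ON t1.subject = t2.subject
    WHERE t1.predicate = 'foaf:name' AND t2.predicate = 'foaf:mbox'
""").fetchall()
print(rows)

print(con.execute("SELECT name, mbox FROM person").fetchall())
```

The trade-off shown here is the usual one: the triple table accepts any shape of data, while the property table answers common queries without joins but needs a schema decided in advance.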

20 April 2017

Linked Data Patterns

Semantic Web Meetup Course

semantic web london
metadataconsulting

  • Introduction to Semantic Web standards and Linked data technologies 
  • Resource Description Framework 
  • Graph-based data model representation and core concepts 
  • Terse RDF Triple Language 
  • Advanced RDF features 
  • Best practices on publishing RDF data 
  • RDF Schema (RDFS) 
  • Discussion of the added value of a schema driven by examples 
  • Syntax of the core features: classes, properties and their characteristics 
  • Relationships between RDFS vocabulary elements 
  • Computing answers to typical queries over RDFS datasets 
  • Using Protege for modeling and querying RDFS datasets 
  • Limitations of RDFS 
  • Querying Semantic Web with SPARQL 
  • Core concepts 
  • Basic graph patterns 
  • Querying datasets with the SPARQL engine StarDog 
  • Filters and SPARQL expressions 
  • Property path expressions 
  • Complex graph patterns with advanced features such as optional parts, aggregation and ordering (see the query sketch after this list)
  • Other query types
  • Updating with SPARQL 
  • OWL Web Ontology Language  
  • Core concepts and differences to RDFS 
  • Overview of OWL modeling constructs
  • Modeling and assessing the benefits of alternative models in a particular application context
  • Substitutability of modeling constructs
  • Discussion of the trade-off between the expressivity of modeling languages and the computational efficiency of querying 
  • OWL profiles 
  • Limitations of the expressive power of OWL 
  • Applications of Semantic Technologies in Practice
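
As a taste of the query material above, here is a self-contained sketch using the Python rdflib library rather than Stardog itself; the dataset, prefixes, and names are invented, but the query shows a basic graph pattern, a FILTER expression, and a property path that any SPARQL 1.1 engine should accept.

```python
from rdflib import Graph

# A tiny invented dataset in Turtle (Terse RDF Triple Language).
data = """
@prefix ex:   <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

ex:alice foaf:name "Alice" ; foaf:knows ex:bob .
ex:bob   foaf:name "Bob"   ; foaf:knows ex:carol .
ex:carol foaf:name "Carol" .
"""

g = Graph()
g.parse(data=data, format="turtle")

# A basic graph pattern combining a property path (one or more
# foaf:knows hops from Alice) with a FILTER expression.
query = """
PREFIX ex:   <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
    ex:alice foaf:knows+ ?person .
    ?person foaf:name ?name .
    FILTER (?name != "Bob")
}
"""

for row in g.query(query):
    print(row.name)  # Carol: reachable transitively, not filtered out
```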

22 February 2017

Outstanding Ontologies

Ontologies come in different types, ranging from knowledge representation and top-level ontologies to linguistic and domain ontologies. A selection of examples from each type is provided below.

Knowledge Representation Ontologies:
Frame Ontology
OKBC

Top-Level Ontologies: 
Cyc
SOWA
Standard Upper Ontology

Linguistic Ontologies: 
Wordnet
Generalized Upper Model
Sensus
Eurowordnet
Mikrokosmos

Ecommerce Ontologies (Domain Ontology): 
United Nations Standard Products and Services Code (UNSPSC)
North American Industry Classification System
Standard Classification of Transported Goods
eCl@ss
RosettaNet

Medical Ontologies (Domain Ontology):
GALEN
UMLS
ON9

Engineering Ontologies (Domain Ontology):
EngMath
PhysSys

Enterprise Ontologies (Domain Ontology):
Enterprise Ontology
TOVE

Chemistry Ontologies (Domain Ontology):
Chemicals
Ions
Environmental Pollutants

Knowledge Management Ontologies (Domain Ontology):
KA Ontology - Project, Organization, Person, Publication, Event, Research-Topic, Research-Product

Nature.com Subjects Ontologies

5 September 2016

SKOS

SKOS is a widely used data model for representing knowledge in the form of thesauri and controlled vocabularies, which can be interlinked into knowledge graphs as a form of linked data. SKOS itself is a lightweight, flexible OWL ontology and can be serialized in any RDF syntax, whereas OWL is a full ontology language; it is possible to convert SKOS models to richer OWL ontologies and even to combine the two. The links below list related tools and libraries for working with SKOS models.
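
As a minimal illustration of the SKOS data model, the following sketch builds a tiny two-concept scheme with Python's rdflib, which ships a SKOS namespace; the ex: vocabulary and its labels are invented.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/vocab/")

g = Graph()
g.bind("skos", SKOS)
g.bind("ex", EX)

# A scheme with a broader/narrower pair, the core thesaurus relations in SKOS.
g.add((EX.scheme, RDF.type, SKOS.ConceptScheme))
g.add((EX.animals, RDF.type, SKOS.Concept))
g.add((EX.animals, SKOS.prefLabel, Literal("Animals", lang="en")))
g.add((EX.animals, SKOS.inScheme, EX.scheme))
g.add((EX.cats, RDF.type, SKOS.Concept))
g.add((EX.cats, SKOS.prefLabel, Literal("Cats", lang="en")))
g.add((EX.cats, SKOS.broader, EX.animals))
g.add((EX.animals, SKOS.narrower, EX.cats))

print(g.serialize(format="turtle"))
```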

JSKOS
SKOSAPI
OWLAPI
SKOSEd
OpenSKOS
TemaTres
THManager
PoolParty
TopBraid
Thesaurus Master
Lexaurus
Fluent Editor
Intelligent Topic Manager
SKOS2OWL
Protege
SKOSIFY
Poolparty Consistency Checker
KEA
SKOSMOS
SILK

W3C SKOS
SKOS: A Guide for Information Professionals
SKOS Taxonomy
The Accidental Taxonomist
Knowledge Engineering with Semantic Web Technologies
LinkedData Engineering
PoolParty Academy
Gate
Ontotext
Knowledge Extraction
Taxonomy Warehouse
Synaptica

17 May 2016

Graph Comparison

Analytical

Type | Backend | Supported Frameworks | Context of Use
Giraph | Hadoop/HDFS | Spark/Hadoop | Data processing for analytics
GraphX | Titan, Neo4J, HDFS | Spark | Data processing for analytics (in-memory)
GraphLab | Hadoop/HDFS | Spark/Hadoop | Data processing for analytics, using the PowerGraph and GAS models

Operational

Type | Backend | Supported Frameworks | Context of Use
Cayley | MongoDB or LevelDB | Custom implementation in Go | Knowledge graphs
Titan | Cassandra, HBase, HDFS | TinkerPop, RDF, SPARQL | Massive knowledge graphs, OLAP/OLTP (now part of DataStax)
Neo4J | Custom | TinkerPop | Data visualization, web browsing, portfolio analytics, gene sequencing, mobile social applications
OrientDB | Custom | TinkerPop, RDF, SPARQL | Embedded and standalone, knowledge graphs, multi-model (document + graph)

Semantic

Type | Backend | Supported Frameworks | Context of Use
Blazegraph and MapGraph | Custom | Sesame, RDF, SPARQL, TinkerPop | Massive knowledge graphs on GPU; supports the W3C Semantic Web standards (used by Wikidata, a Wikimedia project)
Stardog | Custom | RDF, SPARQL | Semantic data use cases in the cloud (third party)
OntoText GraphDB | Custom | Sesame, Jena, RDF, SPARQL | Optimized as a semantic graph database based on the W3C Semantic Web standards (used by the BBC, Euromoney, the Financial Times, etc.)
Virtuoso | Custom/hybrid | Sesame, Jena, RDF, SPARQL | Optimized as a semantic graph database based on the W3C Semantic Web standards (used by DBpedia)
AllegroGraph | Custom | Sesame, RDF, SPARQL | Optimized as a semantic graph database based on the W3C Semantic Web standards
OpenCog | Custom | Semantic knowledge | Massive artificial general intelligence graph knowledge base

OLTP/Graph Databases
OLTP/Analytical Databases
Graph Database as a Service
Native Semantic Graph Databases
Graph Query / Interfaces
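
Several of the stores above expose public SPARQL endpoints. As a small illustration, here is a sketch querying DBpedia's endpoint (served by Virtuoso, as noted above) with the Python SPARQLWrapper library; endpoint availability and result shapes can vary, so treat it as illustrative rather than definitive.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# DBpedia's public endpoint, backed by Virtuoso as noted above.
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?label WHERE {
        <http://dbpedia.org/resource/Semantic_Web> rdfs:label ?label .
        FILTER (lang(?label) = "en")
    }
""")

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["label"]["value"])
```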

8 November 2014

Semantic Representation

Representing semantic data is a computationally expensive process, with a lot of embedded metadata needed to build semantically contextual graphs, and that representation comes at a storage and processing cost. XML has long been the more complete representation option, on the basis of which other standards have been developed, but the introduction of JSON-LD provides further flexibility. Unfortunately, the flexibility of semantic data processing can come at the cost of fidelity, which may be lost during content negotiation and format conversion. JSON-LD may be a plausible option for exchange, but storing RDF in its native, XML-compatible form remains preferable. RDF is, however, quite a memory-intensive representation format with its own processing requirements, and even viewing RDF from a property graph perspective may not be sufficient. Triple stores, and even quad stores, have remained the best storage option to date, though such options still present vendor lock-in issues at times.

Although RDF and the semantic web have come a long way, much remains to be done both in terms of standardization and in terms of better distributed semantic graph storage. Semantic integration, a core aspect of Linked Data requirements, likewise needs more standardization and advancement. JSON-LD appears to be a useful option for lightweight front-end client processing, yet it has some fundamental limitations in comparison to RDF, which raises the question of why the W3C gave up on the idea of RDF/JSON standardization. Ultimately this is a question of what matters more to the semantic web community and to a given application context: machine-readable or human-readable representation. Nonetheless, the core representation format for semantic web storage, in most domain contexts, should really be maintained in the native form of RDF/XML and its derivatives, for obvious reasons.
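
To see the trade-off concretely, this sketch serializes one small invented graph both ways with Python's rdflib; note that the JSON-LD serializer ships with rdflib 6+, while older versions need a separate plugin.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, RDF

EX = Namespace("http://example.org/")

g = Graph()
g.add((EX.alice, RDF.type, FOAF.Person))
g.add((EX.alice, FOAF.name, Literal("Alice")))

# Same triples, two wire formats: the native RDF/XML form and the
# lighter JSON-LD form often preferred by front-end clients.
print(g.serialize(format="xml"))
print(g.serialize(format="json-ld"))
```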

6 November 2014

Metadata Standards

Library and book publishing metadata standards have come a long way, yet they remain in a state of flux and evolution as cataloging and publishing take on emerging new forms that call for further standardization and universal interpretation. Data science and data mining are also providing new ways of harnessing information and knowledge about the classification of both data and content. Metadata, however, remains the key to differentiating and exposing data through all its transformations. XML is often seen as the mainstream format for most metadata standards, but JSON and RDF have also emerged as strong contenders for developing more flexible and universal standard formats. Metadata falls into three fundamental types: administrative, descriptive, and structural. The following handbook provides further details on book publishing structures and evolving metadata trends.

5 October 2014

Semantic Certifications

Getting certified in a particular technology is a debatable topic. For some employers it is a point in a candidate's favor; for others it holds very little value, and certifications can also become outdated quickly. Perhaps getting certified in the concepts matters more than a technology certification tied to a specific version. Semantic Web technologies move slowly, as they go through an extensive specification-driven process, and they are rarely taught formally at university, let alone for certification. Yet the Semantic Web is growing in popularity as industry sees remarkable benefits in contextualizing data and information on the web across a wide variety of use cases. The Semsphere certifications are one unique starting point, providing solid grounding in the area with a rigorous exam. The certifications of interest to most developers come at two levels, Specialist and Professional, with a third level aimed primarily at trainers. The two levels broadly cover the core areas of interest and technologies in the Semantic Web.

23 June 2014

EAV vs SPO

Knowledge representation has been around for a long time, within database architecture but also within abstractions of domain logic, especially in analytics. Various formalisms have been defined to represent such knowledge and derive a more structured logic for learning and reasoning from data. Databases have historically provided the persistence layer for many applications and usually hold the key to unlocking much of today's data, but good data representation is what allows for versatility, performance, and the unlocking of hidden knowledge. Entity-Attribute-Value (EAV) and Subject-Predicate-Object (SPO) are similar modelling approaches: in both cases one works with relationships as 3-tuples, though EAV can be viewed as a subset of SPO. Often, people who are unfamiliar with SPO and more comfortable with relational model design end up using EAV as an extension of the relational model. In most cases, however, EAV is seen as an anti-pattern, leading to longer development times, poor utilization of data, and undesirably complex queries. In SPO, everything is treated as a resource and extended into a graph representation, which also allows better reasoning capabilities for inference. SPO also lends itself very well to the architecture of the web, utilizing URI schemes in a linked context as an extension of the RESTful approach of standard HTTP methods. Where EAV tries to build richer metadata semantics through a schema taxonomy bolted onto the relational model, SPO is closer in spirit to ontologies, taking the form of an extensible linked data schema for a domain context that is not only web friendly but also machine readable. SPO is also linguistically inspired by natural language.
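
The contrast is easiest to see side by side. Here is a small sketch, with an invented schema and data, that records the same fact first as an EAV row in SQLite and then as an SPO triple with Python's rdflib.

```python
import sqlite3
from rdflib import Graph, Literal, Namespace

# EAV: one generic row per attribute, bolted onto the relational model.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT)")
con.execute("INSERT INTO eav VALUES ('patient:42', 'blood_type', 'O+')")
print(con.execute("SELECT * FROM eav").fetchall())

# SPO: the same fact as a triple, where the entity and the attribute
# are themselves URI-addressable web resources.
EX = Namespace("http://example.org/")
g = Graph()
g.add((EX["patient/42"], EX.bloodType, Literal("O+")))
print(g.serialize(format="nt"))
```

The EAV row is just strings in a table; the SPO triple makes the entity and attribute dereferenceable resources, which is what enables linking and inference.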

eav and spo
eav/cr model
rdf
SPO
considerations for eav
considerations for modelling eav for biomedical databases
magento eav
eav talk
understanding linked data via eav model

14 May 2014

Open Annotations

The Open Annotations Community is an interesting collaborative group defining standards and specifications for interoperable, extensible annotations that can be shared across multiple application, device, and service domains. The open approach aims to maximize accessibility with unfettered access and to allow the addition of new techniques, while remaining compatible with standard publish/subscribe models, although it does not define a specific protocol for such interactions. The effort is designed to work with the simplicity of the distributed architecture of the web. As a semantic web standard, an annotation is approached from the viewpoint of an RDF graph serialization. The design stipulates a body with one or more targets, each of which can be defined as a URI resource, although some annotations may omit the body. Each resource then carries its own distinct metadata and provenance information, with any relevant media type that can be dereferenced, and additional representations can be defined or resolved via content negotiation as resources change. Extensive use cases are available for open annotations, and the community, which is very active, has a draft specification in place for the data model.
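
As a rough sketch of that body-plus-target shape, the following Python/rdflib snippet uses the http://www.w3.org/ns/oa# vocabulary from the draft data model; the annotation, body, and target URIs are invented.

```python
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF

OA = Namespace("http://www.w3.org/ns/oa#")
EX = Namespace("http://example.org/")

g = Graph()
g.bind("oa", OA)

# An annotation with one body (a comment page) and one target (the
# resource being annotated); both are dereferenceable URI resources.
anno = EX["anno/1"]
g.add((anno, RDF.type, OA.Annotation))
g.add((anno, OA.hasBody, URIRef("http://example.org/comments/7")))
g.add((anno, OA.hasTarget, URIRef("http://example.org/page.html")))

print(g.serialize(format="turtle"))
```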

Clerezza

Clerezza is an OSGi service approach to building semantically driven web applications. It comes with a rich set of integration points and features that make it well suited to building modular services, and there is even a conscious effort toward security management through WebID, something often lacking in other frameworks. As most aspects of semantic web processing are layered through a workflow, building as bundles is often more efficient and allows seamless integration of components. Such bundles can provide features such as RDF/JSON formats for building semantic applications, using standard open technologies such as Jersey, Felix, Jena, Jetty, and even jQuery, and the approach can even serve as a platform with specific compile-time and runtime requirements. Content management systems have many moving parts for working with content; the semantic web not only makes content more accessible, but using Clerezza can also ease the implementation.

There are two aspects to the Clerezza project: semantic web application development, and RDF storage and manipulation. The core of Clerezza is engineered in Scala and provides renderlets, defined as part of ScalaServerPages, for creating various representations. The approach follows the W3C RDF specification, and triples are stored using smart content binding, a versatile, technology-agnostic layer providing both access and modification. Smart content binding also makes use of named graphs to facilitate operations on the data model, with options to access multiple domain graphs, and various adaptors are available for processing RDF graphs. Lastly, it provides serialization and parsing services for various conversions and representations. Although the project aims for a very seamless approach, one of its core drawbacks has been the lack of documentation, which makes both the stack and its implementation use cases difficult to understand. Some efforts have been made toward improvement in this area, and the project is under active development. One very interesting integration convention is between UIMA and Clerezza for textual annotations using the Annotations Ontology; one can refer further to the Domeo Annotations Toolkit paper or the slideshare.

26 March 2014

Semantic Annotations

Semantic annotation is a broad and complex area, often requiring a mixture of natural language processing and knowledge representation. One of the major inherent requirements in an application is word sense disambiguation. There are also more lightweight approaches that generalize over the semantics alone, in the form of ontologies, especially for maintaining publications and cataloging. Such semantics can cater for text as well as multimedia. What this enables is that semantic labels can be constructed in context, supporting findability, better visualization, reasoning over a set of web resources, and the conversion from syntactic structures to knowledge structures.

One can approach this manually or in an automated fashion. The manual route typically transforms syntactic resources into interlinked knowledge using third-party tools, without accounting for much in the way of multiple perspectives across data sources. There are also semi-automated annotation approaches, though they still require human intervention at various phases of the process; GATE is one such semi-automated tool for extracting entity sets. Automated approaches usually require tuning and re-tuning after training: they can draw their knowledge from the web and apply it to content in a context-driven manner for automatic extraction and annotation. Wrappers are created that identify and recognize patterns in text for annotation, at times human assisted, and they may use various classifiers as a supervised way of learning patterns.

For multimedia, annotation often takes the form of rich metadata; alternatively, it can address content semantics or go granular into the media itself. Annotations can be global, collaborative, or even local, and one can provide rich annotations using custom metadata defined through controlled vocabularies, taxonomies, ontologies, topic maps, and thesauri for different contexts. There is even a W3C effort for open annotations, as well as the LRMI learning resources initiative based on schema.org. One could also build a pipeline through the various workflow stages of content filtering using UIMA, or take a CMS approach similar to Apache Stanbol. Standard tools like Tika, Solr, OpenNLP, and KEA can also be useful. Implementations and rich textual semantics often use languages like Java, Groovy, Python, XML, RDF, and OWL, though tools are increasingly emerging on Scala as well.
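
As a toy illustration of the wrapper idea, and not a stand-in for any of the tools named above, here is a sketch of a pattern-based annotator in Python; the gazetteer and labels are invented, and real systems would generalize this with trained classifiers.

```python
import re

# A toy gazetteer-plus-pattern "wrapper": real pipelines such as GATE or
# UIMA generalize this idea with learned extraction rules.
GAZETTEER = {"London": "Location", "GATE": "Tool"}
YEAR = re.compile(r"\b(19|20)\d{2}\b")

def annotate(text):
    annotations = []
    for token, label in GAZETTEER.items():
        for match in re.finditer(re.escape(token), text):
            annotations.append((match.start(), match.end(), label))
    for match in YEAR.finditer(text):
        annotations.append((match.start(), match.end(), "Year"))
    return sorted(annotations)

text = "GATE was developed in Sheffield and demonstrated in London in 2014."
for start, end, label in annotate(text):
    print(f"{label}: {text[start:end]} [{start}:{end}]")
```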