17 February 2016
Brand Ontologies
DBPedia is a massive pool of semantic knowledge derived from Wikipedia. A learning agent can use DBPedia as a knowledge source for understanding the open world, and entity extraction can leverage entity linking via URIs mapped to DBPedia. However, more context is necessary beyond the open world as we humans understand it. GoodRelations is an ecommerce web vocabulary, fairly generic and customizable, with support from schema.org. Furthermore, it seems necessary to extend the open world with brand ontologies that provide a more focused semantic, conceptual understanding of products and services. A GoodBrands ontology or vocabulary could extend this scope further, and might be automated via web scraping of brand sites to expose customizable brand-related schemas. An obvious approach is to use the site map to identify a brand's products and services. These schemas could be formalized and then reference-linked to DBPedia as a root source for particular contextual concepts, avoiding any ambiguity of scope during disambiguation. The semantic aspects of natural language processing could be handled via the Lemon framework. One could also extend the brand ontology to distinguish durable from non-durable, consumable from non-consumable, even product from service and thing from concept. With metadata understanding of products and services at this more granular level, an agent could draw more insightful inferences and perform better extraction with domain adaptation over brands and their related semantics in the open world.
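As a minimal sketch of the site-map idea, the snippet below parses a sitemap and filters candidate product pages. The sitemap content and the "/products/" URL convention are assumptions for illustration; real brand sites vary, and a production scraper would fetch and crawl the sitemap index instead of a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap for a fictional brand site.
SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://brand.example.com/about</loc></url>
  <url><loc>https://brand.example.com/products/widget-pro</loc></url>
  <url><loc>https://brand.example.com/products/gadget-mini</loc></url>
</urlset>"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def product_urls(sitemap_xml):
    """Extract candidate product/service URLs from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    locs = [e.text for e in root.findall(".//sm:loc", NS)]
    # Assumed convention: product pages live under a /products/ path.
    return [u for u in locs if "/products/" in u]
```

Each extracted URL could then be scraped for schema markup and linked back to a DBPedia concept URI for disambiguation.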
Labels: big data, dbpedia, ecommerce, intelligent web, linked data, metadata, natural language processing, semantic web, text analytics
16 February 2016
AI for Data Validation and Verification
It is predicted that robots will replace many jobs within the next 30 years. However, one of the first critical roles they need to take over is the verification and validation of data input against human error. This is one of the most common problems in business, where human error can lead to fraud, or cause someone's application for a mortgage, loan, security clearance, or job to be declined. Furthermore, a business requires data entry in supply chains, ledger accounting, and more; the role of data entry is critical across multiple business sectors. Oftentimes the manual task of keying a form into a system needs to be replaced through automation, with critical verification and validation checks in place to ensure the data is correct, meet compliance, and mitigate risk. Data validation checks that the data entered is sensible and reasonable; it does not check the accuracy of that data. Types of validation include checks on digits, format, length, acceptable-value lookups, presence of a field entry, range, and spelling. Data verification checks that the data entered matches the source, typically in one of two ways: double entry and proofreading. Most of these, if not all, can be automated as part of an intelligent agent role that can semantically understand the context of the data for validation while also verifying the data entry. These days forms are scanned or copied rather than manually entered; however, even such processes require the ability to read handwriting. The intelligent agent needs to understand different styles of handwriting to deduce the characters of a language and semantically grasp the meaning without diluting the context of the form or the data.
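The validation types and double-entry verification described above can be sketched as a small rule set. The fields, formats, and accepted values below are hypothetical examples, not a real compliance schema:

```python
import re

# Validation checks: each returns None on success or an error message.
def check_presence(value):
    return None if value.strip() else "presence: value required"

def check_digits(value, length=8):
    # Format check: exactly `length` digits (e.g. an account number).
    return None if re.fullmatch(rf"\d{{{length}}}", value) else f"format: {length} digits expected"

def check_range(value, lo=18, hi=120):
    return None if value.isdigit() and lo <= int(value) <= hi else f"range: must be {lo}-{hi}"

def check_lookup(value, allowed=frozenset({"UK", "US", "DE"})):
    return None if value in allowed else "lookup: value not in accepted list"

# Hypothetical form schema mapping fields to their checks.
RULES = {
    "account_id": [check_presence, check_digits],
    "age": [check_presence, check_range],
    "country": [check_presence, check_lookup],
}

def validate(record):
    """Apply each field's checks; collect (field, error) pairs."""
    return [(field, err)
            for field, checks in RULES.items()
            for err in (c(record.get(field, "")) for c in checks)
            if err]

def verify_double_entry(first, second):
    """Verification by double entry: flag fields where two entries disagree."""
    return [f for f in first if first[f] != second.get(f)]
```

A clean record yields an empty error list; an agent could route any flagged fields back for proofreading, the second verification route mentioned above.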
In the process, an intelligent agent needs to process vast quantities of data at speeds beyond human capability, i.e. batch processing. Big data pipelines have made significant inroads towards automation in data mining and retrieval, with options for stream processing of information. Web forms are another common avenue of data entry into backend databases, and they surely need more intelligent means of validation and verification. Even the role of the call-center agent can be replaced; additionally, the knowledgeable intelligent agent will need speech recognition, the ability for text-speech analysis, and an affective understanding of human emotions as part of customer service. At the same time, the intelligent agent will need to maintain a knowledgeable understanding of the domain context while processing new information at the data entry step. Multitasking is something computers have long been better at than most humans while avoiding error, but for specialized agents and robots learning becomes more complex as tasks diversify. Looking forward, we are likely to place increasing trust in artificial intelligence for everyday things while making our lives more complex in other areas, especially human relationships. In the process, data drifts everywhere around us, and we adapt to ubiquitous technology as part of a new lifestyle.
15 December 2015
Automatic Summarization
Automatic Summarization is a valuable aspect of Information Extraction in Natural Language Processing. It is applied within Information Retrieval, news summaries, building research abstracts, and various knowledge exploration contexts. It can be applied over either single or multiple documents, and there is even the aspect of building extractions over simple versus rich textual documents. The following outlines the various aspects of Automatic Summarization that are under active research and utilized in development across textual domain contexts.
Summarization Types:
- extractive
- abstractive
- single document
- multi-document
- indicative
- informative
- keyword
- headline
- generic
- query-focused
- update
- main point
- key point
- outline
- descriptive
Summary Sentence Approaches:
- revision
- ordering
- fusion
- compression
- sentence selection vs summary selection
Unsupervised Methods:
- word frequency
- word probability
- tf*idf weighting
- log-likelihood ratio for topic signatures
- sentence clustering
- graph-based methods for sentence ranks
Semantics and Discourse:
- lexical chaining
- latent semantic analysis
- coreference
- rhetorical structure
- discourse-driven graph representation
Summary Generation Methods:
- compression
- rule-based compression
- statistical compression
- headline
- fusion
- context-dependent revision
- ordering of information
Various Genres and Domains:
- medical
- journal article
- news
- web
- speech
- conversation log
- financial data
- book
- social media
- legal
Evaluation:
- precision
- recall
- utility
- manual
- automatic
- pyramid
- linguistic quality
- accuracy
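The unsupervised, frequency-based line of methods above can be sketched as a minimal extractive summarizer: score each sentence by the average corpus frequency of its words and keep the top-ranked sentences in source order. The tiny stopword list is an illustrative assumption; a real system would use tf*idf or topic signatures.

```python
import re
from collections import Counter

# Minimal illustrative stopword list (an assumption, not a standard set).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "it", "that", "for", "on", "as"}

def summarize(text, n=2):
    """Extractive summary: rank sentences by mean word frequency, keep top n."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if w not in STOPWORDS)

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    ranked = sorted(sentences, key=score, reverse=True)[:n]
    # Preserve source order in the final summary.
    return [s for s in sentences if s in ranked]
```

This corresponds to the single-document, generic, extractive cell of the taxonomy; abstractive and query-focused variants require generation and relevance models on top.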
14 December 2015
Question/Answering Approaches In Perspective
Question/Answering has become a hot topic in recent years, as it can be applied in a variety of domain contexts for data mining, knowledge discovery, and as an application of natural language processing. However, one of its core underpinnings has always been matching a question to an answer, along with reformulation. In simple terms, one could apply a decision-tree-style approach or formalize keyphrase matching over a set of rules. In recent years there has been much growth in probabilistic techniques over rules-based systems, and a hybrid approach to artificial intelligence has proven optimal in many contexts. Including semantic constructs through ontologies allows an agent to understand and reason over domain knowledge through inference and deduction. Furthermore, one can take such an intelligent metaphor of understanding a step further into the BDI context of multi-agent systems, with mediation for argumentation and game theory. Deep Learning has also provided some robust alternatives. Below is a listing of proposed ideas for how effective question/answering strategies could be achieved for open/closed-domain understanding. In every case, a semantic ontological understanding becomes important as a somewhat guided way of reasoning about the open world. One can view question/answering almost as a data funnel or pipeline of question-to-answer matching through a series of filtration steps: Sentiment Analysis, Sentence Comprehension as chains of thought or tokens, Machine Learning for Classification and Clustering, as well as semantic domain concepts. In such respects, one can formulate a knowledge graph from a generalized view of the open world and gradually layer on specialized, curated domain ontologies to provide Commonsense Reasoning analogous to a human's. DBPedia is one starting point for the open world; the entire web is another.
A separate lexical store could also be used, such as WordNet, SentiWordNet, or Wiktionary. Alternative resources to further build on the knowledge base include YAGO-SUMO, UMBEL, SenticNet, OMCS, and ConceptNet. One could even build a graph over the various curated FAQ sites for a connected knowledge source. One day the Web of Data may itself provide a gigantic linked data graph of queryable knowledge via metadata; today such options come in the form of Schema.org and others. As research evolves, cognitive agents will become more self-aware of their world, with more granular and efficient ways of understanding that require less guidance. Another practical note here is the desirability of a feedback loop between short-term and long-term retention of knowledge cues, to avoid excessive repeated backtracking for inference on similar question patterns in context.
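The simplest strategy mentioned above, keyphrase matching over a set of rules, can be sketched in a few lines. The FAQ entries and keyphrase sets here are hypothetical; in practice they could be mined from curated FAQ sites or derived from a domain ontology:

```python
import re

# Hypothetical rule base: keyphrase sets mapped to canned answers.
FAQ = {
    frozenset({"reset", "password"}): "Use the 'Forgot password' link on the sign-in page.",
    frozenset({"delete", "account"}): "Account deletion is available under privacy settings.",
    frozenset({"contact", "support"}): "Email support@example.com or use the in-app chat.",
}

def answer(question):
    """Match a question to the FAQ entry sharing the most keyphrase terms."""
    terms = set(re.findall(r"[a-z]+", question.lower()))
    best, overlap = None, 0
    for keys, response in FAQ.items():
        hits = len(keys & terms)
        if hits > overlap:
            best, overlap = response, hits
    return best  # None when no rule fires
```

The probabilistic and semantic approaches in the table that follows replace this brittle term overlap with learned or ontology-driven similarity, but the funnel shape, question in, filtered candidate answers out, stays the same.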
Description Steps | Agent Belief
QA Semantic Domain Ontologies/NLP + BDI Multiagent Ensemble Classifiers (potential for Deep Learning) | Multiple BDI
QA Semantic Domain Ontologies/NLP + BDI Multiagent Belief Networks using Radial Basis Functions (Autoencoders vs Argumentation) | Multiple BDI
QA Semantic Domain Ontologies/NLP + BDI Multiagent Reinforcement Learning/Q-Learning | Multiple BDI
QA Semantic Domain Ontologies/NLP + Predicate Calculus for Deductive Inference | Single
QA Semantic Domain Ontologies/NLP + Basic Commonsense Reasoning | Single
QA Semantic Domain Ontologies/NLP + Deep Learning (DBN/Autoencoders) | Single
QA Semantic Domain Ontologies/NLP + LDA/LSA/Search Driven | Single
QA Semantic Domain Ontologies/NLP + Predicate Calculus for Deductive Inference + Commonsense Reasoning | Single
QA Semantic Domain Ontologies/NLP + Groovy/Prolog Rules | Single
QA Semantic Domain Ontologies/NLP + Bayesian Networks | Single
QA TopicMap/NLP + Deep Learning (Recursive Neural Tensor Network) | Single
QA Semantic Domain Ontologies/NLP + QA TopicMap + Self-Organizing Map | Single
QA Semantic Domain Ontologies/NLP + Connected Memory/Neuroscience (Associative Memory/Hebbian Learning) | Single
QA Semantic Domain Ontologies/NLP + Machine Learning/Clustering in a Data Grid like GridGain | Single
29 November 2015
Applied Design Patterns
Design Patterns have proven quite useful in software engineering practice. Used appropriately, they have many benefits and have become an indispensable design approach for architects. A design pattern is essentially a reusable approach to a repeatable problem in context. They benefit not only architects and software engineers/developers but also other role players in an agile process, including project sponsors, project managers, testers, and users. There are many design patterns, and the field is always changing as new anti-patterns are found that disprove existing approaches, making way for new patterns. Patterns may also be elaborated for use at object and class scope levels. The Gang of Four patterns have become a fundamental aspect of object-oriented theory and design; however, not everyone is at home with utilizing such approaches in practice. Having patterns baked into the language is often seen as a good thing, and perhaps this is a flaw in languages like Java, which formally expect software engineers to bring a design-pattern style of thinking to software development and object-oriented design in particular, whereas functional programming languages take a different route and are simpler. Considering the wide array of design patterns available, most with relevant domain contexts, it seems plausible to have an intelligent template solution as a refactoring tool/library/plugin. This could be one extension to an intelligent agent model within the software engineering development process. The pragmatic agent would need to interpret the code on both a logical and a more contextual basis, reason about where it is appropriate to apply the right design pattern, and even identify an anti-pattern.
This context of software development automation could be extended to other uses within the refactoring process, spanning functional and service/object-oriented programming, data/object modelling, and the various pattern catalogues listed below.
23 Gang Of Four Design Patterns
Behavioral: manage relationships, algorithms, responsibilities between objects
- Chain of Responsibility (Object Scope)
- Command (Object Scope)
- Interpreter (Class Scope)
- Iterator (Object Scope)
- Mediator (Object Scope)
- Memento (Object Scope)
- Observer (Object Scope)
- State (Object Scope)
- Strategy (Object Scope)
- Template Method (Class Scope)
- Visitor (Object Scope)
Structural: build large object structures from disparate objects
- Composite (Object Scope)
- Decorator (Object Scope)
- Facade (Object Scope)
- Flyweight (Object Scope)
- Proxy (Object Scope)
- Adapter (Class/Object Scope)
- Bridge (Object Scope)
Creational: construct objects able to be decoupled from implementation
- AbstractFactory (Object Scope)
- Factory Method (Object Scope)
- Builder (Object Scope)
- Prototype (Object Scope)
- Singleton (Object Scope)
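To make one of the behavioral patterns above concrete, here is a minimal Strategy sketch: interchangeable algorithms behind one interface, selected at object scope. The shipping-cost policies are hypothetical examples, not from any real pricing system:

```python
from dataclasses import dataclass
from typing import Callable

# Two interchangeable pricing algorithms (hypothetical policies).
def flat_rate(weight_kg: float) -> float:
    return 5.0

def per_kg(weight_kg: float) -> float:
    return 1.2 * weight_kg

@dataclass
class ShippingQuote:
    """Context object: delegates the cost calculation to a pluggable strategy."""
    strategy: Callable[[float], float]

    def cost(self, weight_kg: float) -> float:
        return round(self.strategy(weight_kg), 2)
```

Swapping `flat_rate` for `per_kg` changes behavior without touching `ShippingQuote`, which is exactly the kind of structural regularity a refactoring agent could be trained to recognize and introduce.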
Software Design Pattern
Anti-Patterns
SOA Patterns
Data Science Design Patterns
Big Data WorkLoad Design Patterns
Architectural Patterns
Concurrency Patterns
Interactive Design Patterns
Big Data Architectural Patterns
Microservices Patterns
Microservices Architecture Patterns
Service Design Sheet
Linked Data Design Patterns
Ontology Design Patterns
SourceMaking
Enterprise Architecture Patterns
Enterprise Integration Patterns
Cloud Design Patterns
Labels: big data, data science, distributed systems, interaction design, linked data, microservices, programming, software engineering
18 October 2015
Enterprise Architecture
Enterprise Architecture is a formidable terrain for large organizations steeped in system complexity and poor business alignment. Hence, various formal frameworks and methods have been defined to manage the architecture of such deliverables. Oftentimes the technology architecture mimics the dynamics of a business culture or organizational functions. Below is a list of the four key methodologies used for enterprise architecture, with a further comparison.
- Zachman Framework (Taxonomy)
- TOGAF (Process)
- Federal Enterprise Architecture (Methodology)
- Gartner Methodology (EA Practice)
16 October 2015
Startup Stacks
It is always interesting to see what types of technology stacks are being used by startups, especially the successful ones. Compared to enterprises, startups often have minimal legacy code and are open to trying out new approaches with bleeding-edge technology. The link below sheds some interesting light on the technology stacks used by various startups in the industry and gives an idea of the trends across different tools and services.
Labels: big data, Cloud, intelligent web, nosql, open source, programming, rest, software engineering, startups
7 October 2015
Creative Work Licenses for Software
Original work should always be licensed in some way, whether for the open source community or for full disclosure of protection rights. In a competitive world, everyone is looking for the shiny new artifact that could take a digital community by storm, so it seems only plausible to protect one's hard work, whether for sharing or otherwise. However, the available license terms are broad and varied, and one has to be fully mindful and aware of them. Below are some helpful links for making an informed decision on the appropriate license that best suits an artifact's or a project's requirements.