15 December 2015

Automatic Summarization

Automatic Summarization is a valuable aspect of Information Extraction in Natural Language Processing. It is applied within Information Retrieval, news summaries, the generation of research abstracts, and various knowledge exploration contexts. Summarization can be applied over a single document or across multiple documents, and extractions can be built over anything from simple to richly structured text. The following outlines the various aspects of Automatic Summarization that are under active research and in use across different textual domains.

Summarization Types:
extractive
abstractive
single document
multi-document
indicative
informative
keyword
headline
generic
query-focused
update
main point
key point
outline
descriptive

Summary Sentence Approaches:
revision
ordering
fusion
compression
sentence selection vs summary selection

Unsupervised Methods:
word frequency
word probability
tf*idf weighting
log-likelihood ratio for topic signatures
sentence clustering
graph based methods for sentence ranks
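The word-probability approach above can be sketched in a few lines of Python. The stopword list, the regex sentence splitter, and the scoring details here are simplifying assumptions for illustration, not the definition of any particular system:

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "it", "for"}

def summarize(text, n_sentences=2):
    """Score each sentence by the mean corpus probability of its content words,
    then return the top sentences in their original document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)
    total = sum(freq.values())

    def score(sentence):
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower())
                  if w not in STOPWORDS]
        if not tokens:
            return 0.0
        return sum(freq[w] / total for w in tokens) / len(tokens)

    ranked = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Preserve original document order in the output summary.
    return [s for s in sentences if s in ranked]
```

Replacing the raw frequency with tf*idf weights or a log-likelihood topic-signature test changes only the `score` function; the selection loop stays the same.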

Semantics and Discourse:
lexical chaining
latent semantic analysis
coreference
rhetorical structure
discourse-driven graph representation

Summary Generation Methods:
compression
rule-based compression
statistical compression
headline
fusion
context dependent revision
ordering of information
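Rule-based compression can be sketched with a few surface rewrite rules. The rules below (dropping parentheticals and -ly adverbs) are deliberately crude assumptions for illustration; a real compressor operates over parse trees rather than raw strings:

```python
import re

def compress(sentence):
    """Toy rule-based sentence compression: strip parentheticals, crude -ly
    adverbs, and redundant whitespace. Real systems prune syntactic
    constituents instead of matching surface patterns."""
    s = re.sub(r"\([^)]*\)", "", sentence)    # drop parenthetical asides
    s = re.sub(r"\b\w+ly\b,?\s*", "", s)      # drop -ly adverbs (very crude)
    s = re.sub(r"\s+", " ", s).strip()        # collapse leftover whitespace
    s = re.sub(r"\s+([,.!?])", r"\1", s)      # tidy spacing before punctuation
    return s
```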

Various Genre and Domains:
medical
journal article
news
email
web
speech
conversation log
financial data
book
social media
legal

Evaluation:
precision
recall
utility
manual
automatic
pyramid
linguistic quality
accuracy
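For extractive output, precision and recall reduce to overlap counts between the system summary and a reference summary. This sketch assumes exact sentence matching against a single reference; real evaluations such as ROUGE compare n-grams and average over multiple references:

```python
def summary_precision_recall(system_sents, reference_sents):
    """Sentence-level precision/recall of a system summary against one
    reference: precision = overlap / system size, recall = overlap /
    reference size."""
    system = set(system_sents)
    reference = set(reference_sents)
    overlap = len(system & reference)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall
```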

14 December 2015

Question/Answering Approaches In Perspective

Question/Answering has become a hot topic in recent years, as it can be applied across a variety of domains for data mining, knowledge discovery, and as an application of natural language processing. One of the core underpinnings has always been matching a question to an answer, possibly via reformulation. In simple terms, one could apply a decision-tree style approach or formalize keyphrase matching over a set of rules. In recent years there has been much growth toward probabilistic techniques over rule-based systems, and a hybrid approach has proven optimal in many contexts. Including semantic constructs through ontologies allows an agent to understand and reason over domain knowledge through inference and deduction. Furthermore, one can take such an intelligent metaphor of understanding a step further into the BDI context of multi-agent systems, with mediation for argumentation and game theory. Deep Learning has also provided some robust alternatives.

Below is a listing of proposed ideas for how effective question/answering strategies could be achieved for open/closed-domain understanding. In every case, a semantic, ontological understanding becomes important as a guided way of reasoning about the open world. One can view question/answering as a data funnel, a pipeline of question-to-answer matching through a series of filtration steps: Sentiment Analysis, Sentence Comprehension as chains of thought or tokens, Machine Learning for Classification and Clustering, and semantic domain concepts. In this respect, one can formulate a knowledge graph from a generalized view of the open world and gradually layer specialized, curated domain ontologies on top to provide Commonsense Reasoning analogous to a human's. DBpedia is one starting point to the open world; the entire web is another.
A separate lexical store could also be used, such as WordNet, SentiWordNet, or Wiktionary. Alternative resources to further build out the knowledge base include YAGO-SUMO, UMBEL, SenticNet, OMCS, and ConceptNet. One could even build a graph of the various curated FAQ sites as a connected knowledge source. One day the Web of Data may itself provide a gigantic linked data graph of queryable knowledge via metadata; today such options take the form of Schema.org and others. As research evolves, cognitive agents will become more self-aware of their world, with more granular and efficient ways of understanding that require less guidance. Another practical note is the desirability of a feedback loop between short-term and long-term retention of knowledge cues, to avoid excessive repeated backtracking for inference on similar question patterns in context.

Description Steps | Agent Belief
QA Semantic Domain Ontologies/NLP + BDI Multiagent Ensemble Classifiers (potential for Deep Learning) | Multiple BDI
QA Semantic Domain Ontologies/NLP + BDI Multiagent Belief Networks using Radial Basis Functions (Autoencoders vs Argumentation) | Multiple BDI
QA Semantic Domain Ontologies/NLP + BDI Multiagent Reinforcement Learning/Q-Learning | Multiple BDI
QA Semantic Domain Ontologies/NLP + Predicate Calculus for Deductive Inference | Single
QA Semantic Domain Ontologies/NLP + Basic Commonsense Reasoning | Single
QA Semantic Domain Ontologies/NLP + Deep Learning (DBN/Autoencoders) | Single
QA Semantic Domain Ontologies/NLP + LDA/LSA/Search Driven | Single
QA Semantic Domain Ontologies/NLP + Predicate Calculus for Deductive Inference + Commonsense Reasoning | Single
QA Semantic Domain Ontologies/NLP + Groovy/Prolog Rules | Single
QA Semantic Domain Ontologies/NLP + Bayesian Networks | Single
QA TopicMap/NLP + Deep Learning (Recursive Neural Tensor Network) | Single
QA Semantic Domain Ontologies/NLP + QA TopicMap + Self-Organizing Map | Single
QA Semantic Domain Ontologies/NLP + Connected Memory/Neuroscience (Associative Memory/Hebbian Learning) | Single
QA Semantic Domain Ontologies/NLP + Machine Learning/Clustering in a Data Grid like GridGain | Single
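The question-to-answer funnel described above can be sketched at its simplest as a bag-of-words ranking stage followed by a threshold filter. The toy FAQ data, the cosine scorer, and the threshold value are all assumptions for illustration, standing in for the richer semantic and ontological layers:

```python
import re
from collections import Counter
from math import sqrt

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def answer(question, faq, threshold=0.2):
    """Funnel a question through tokenize -> vectorize -> rank candidates ->
    threshold filter; None means no candidate survived the filter."""
    q_vec = Counter(tokens(question))
    scored = [(cosine(q_vec, Counter(tokens(q))), a) for q, a in faq.items()]
    best_score, best_answer = max(scored)
    return best_answer if best_score >= threshold else None
```

Each extra filtration step from the pipeline (sentiment, classification, ontology lookup) would slot in as another stage between ranking and the final threshold.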

29 November 2015

Applied Design Patterns

Design Patterns have proven quite useful in software engineering practice. Used appropriately, they carry many benefits and have become an indispensable design approach for architects. A design pattern is essentially a reusable approach to a recurring problem in context. Patterns benefit not only architects and software engineers/developers but also other role players in an agile process, including project sponsors, project managers, testers, and users. There are many design patterns, and the field keeps changing as anti-patterns expose weaknesses in existing approaches and make way for new patterns. These patterns may also be elaborated for use at object and class scope. The Gang of Four patterns have become a fundamental part of object-oriented theory and design, yet not everyone is at home with applying them in practice. Having patterns baked into the language is often seen as a good thing, and perhaps this is a flaw in languages like Java, which formally expect software engineers to adopt a design-pattern style of thinking toward software development, and object-oriented design in particular, whereas functional programming languages take a different, simpler route. Considering the wide array of design patterns available, most with their own relevant domain contexts, it seems plausible to build an intelligent template solution as a refactoring tool/library/plugin. This could be one extension of an intelligent agent model within the software engineering development process. The pragmatic agent would need to interpret code on both a logical and a more contextual basis, reason about where it is appropriate to apply the right design pattern, and even identify an anti-pattern.
This context of software development automation could be extended to other uses within the refactoring process for functional and service/object-oriented programming, data/object modelling, and the various pattern families listed below.

23 Gang Of Four Design Patterns

Behavioral: manage relationships, algorithms, responsibilities between objects

  • Chain of Responsibility (Object Scope) 
  • Command (Object Scope) 
  • Interpreter (Class Scope) 
  • Iterator (Object Scope) 
  • Mediator (Object Scope) 
  • Memento (Object Scope) 
  • Observer (Object Scope) 
  • State (Object Scope)
  • Strategy (Object Scope) 
  • Template Method (Class Scope) 
  • Visitor (Object Scope) 

Structural: build large object structures from disparate objects

  • Composite (Object Scope) 
  • Decorator (Object Scope) 
  • Facade (Object Scope) 
  • Flyweight (Object Scope) 
  • Proxy (Object Scope) 
  • Adapter (Class/Object Scope) 
  • Bridge (Object Scope) 

Creational: construct objects able to be decoupled from implementation

  • AbstractFactory (Object Scope) 
  • Factory Method (Object Scope) 
  • Builder (Object Scope) 
  • Prototype (Object Scope) 
  • Singleton (Object Scope)
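As a concrete instance, here is a minimal sketch of the Strategy pattern (object scope) in Python. The sorting strategies and the `Report` context are illustrative stand-ins for any family of interchangeable algorithms:

```python
from abc import ABC, abstractmethod

class SortStrategy(ABC):
    """Strategy interface: one contract behind interchangeable algorithms."""
    @abstractmethod
    def sort(self, items):
        ...

class AscendingSort(SortStrategy):
    def sort(self, items):
        return sorted(items)

class DescendingSort(SortStrategy):
    def sort(self, items):
        return sorted(items, reverse=True)

class Report:
    """Context object: delegates to whichever strategy it is configured
    with, so the algorithm can be swapped at runtime without changing
    the context's code."""
    def __init__(self, strategy):
        self.strategy = strategy

    def render(self, scores):
        return self.strategy.sort(scores)
```

Swapping `AscendingSort()` for `DescendingSort()` changes behavior without touching `Report`, which is exactly the decoupling the pattern is after.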

Related Pattern Catalogs and Resources:
Software Design Patterns
Anti-Patterns
SOA Patterns
Data Science Design Patterns
Big Data WorkLoad Design Patterns
Architectural Patterns
Concurrency Patterns
Interactive Design Patterns
Big Data Architectural Patterns
Microservices Patterns
Microservices Architecture Patterns
Service Design Sheet
Linked Data Design Patterns
Ontology Design Patterns
SourceMaking
Enterprise Architecture Patterns
Enterprise Integration Patterns
Cloud Design Patterns

18 October 2015

Enterprise Architecture

Enterprise Architecture is a formidable terrain for large organizations steeped in system complexity and poor business alignment. Hence, various formal frameworks and methods have been defined to manage the architecture of such deliverables. Oftentimes the technology architecture mimics the dynamics of the business culture or organizational functions. Below is a list of the four key methodologies used for enterprise architecture, and a further comparison.

16 October 2015

Startup Stacks

It is always interesting to see what technology stacks are being used by startups, especially the successful ones. Compared to enterprises, startups often have minimal legacy code and are open to trying out new approaches with bleeding-edge technology. The link below offers an interesting view of the technology stacks used by various startups in the industry, and gives an idea of the trends across different tools and services.

7 October 2015

Creative Work Licenses for Software

Original work should always be licensed in some way, whether for the open source community or for full disclosure of protection rights. In a competitive world, everyone is looking for the shiny new artifact that could take a digital community by storm, so it seems only plausible that one protect one's hard work, whether for sharing or otherwise. However, the available license terms are broad and varied, and one has to be fully mindful and aware of them. Below are some helpful links for making an informed decision and selecting the license that best suits an artifact's or project's requirements.

3 October 2015

Microservices Monitoring

Breaking down a system into more granular services guided by the single responsibility principle does have the many benefits of bounded contexts. However, it can also add a degree of complexity that requires more extensive monitoring. Multiple services interacting in a distributed systems context imply multiple log files, a need to aggregate them, and multiple places for network latency issues to arise. One simple approach is to monitor everything in the entire workflow of the services, as well as the system as a whole, while trying to form the bigger picture through an aggregation process. Also, add structure to the logs by utilizing correlation IDs, which then provide a guided trail. Responsiveness can also be important, so real-time alerting may be needed to avoid cascading issues. One can abstract the service away from the system for a monitoring strategy. The current trend is to monitor holistically, getting the full picture of the entire system, all its sub-systems, and all the service interactions within it. A breakdown of the types of things that can be monitored, with example tools, is given below.

Service-Level Tracking:
  • check inbound response times, error rates, and application metrics
  • check downstream response health, response times of calls, error rates (Hystrix)
  • standardize metrics collection process and pipelines
  • standardize on logging formats so aggregation is easier
  • check system processes for the OS in order to plan for capacity

System-Level Tracking:
  • check host metrics like CPU
  • check system logs and aggregate them so it is possible to filter on individual hosts
  • standardize on single query option for searching through logs
  • standardize on correlation IDs
  • standardize on an action plan and alert levels
  • unify aggregation (Riemann or Suro)

Logstash and Graphite/collectd/StatsD are also often used in conjunction for the collection and aggregation of logs and metrics, and one can apply the ELK stack as well. The Java Metrics library can be utilized to gain insights into code in production. Other tools, like Skyline and Oculus, are available for anomaly detection and correlation.
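The correlation-ID idea above can be sketched as structured JSON log lines that an aggregator can later stitch into one request trail. The service names and fields here are hypothetical examples, not part of any of the tools mentioned:

```python
import json
import logging
import uuid

logging.basicConfig(level=logging.INFO)

def new_correlation_id():
    """Mint one ID when a request enters the system; every downstream
    service call reuses it."""
    return uuid.uuid4().hex

def log_event(service, correlation_id, message, **fields):
    """Emit one structured log line; the shared correlation ID lets a log
    aggregator reassemble a single request's path across services."""
    record = {"service": service, "correlation_id": correlation_id,
              "message": message, **fields}
    line = json.dumps(record, sort_keys=True)
    logging.getLogger(service).info(line)
    return line

# Hypothetical request flow: the gateway mints the ID, downstream reuses it.
cid = new_correlation_id()
log_event("gateway", cid, "request received", path="/orders")
log_event("orders", cid, "order looked up", latency_ms=12)
```

Because every line is machine-parseable JSON with the same fields, filtering an aggregated store by one `correlation_id` yields the full guided trail for that request.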

30 September 2015

Open Data and Knowledge

Open Data is all about making data freely available to all without restrictions, mirroring other open source initiatives; flagship examples include Data.gov and Data.gov.uk. To get involved with Open Knowledge, one can check out Open Knowledge Labs. Open Knowledge working group areas and data-process tools are listed below.

Lobbying Transparency
Open Access
Open Bibliography
Open Definition
Open Design & Hardware
Open Development
Open Economics
Open Education
OpenGLAM
Open Government Data
Open Humanities
Open Linguistics
Open Product Data
Open Science
OpenSpending
Open Sustainability
Open Transport
Personal Data and Privacy
Public Domain

Extracting:

Cleaning:
Nomenklatura

Analyzing:
R

Presenting:

Sharing:

Further details can be found on School of Data.

Open Data Institute