15 December 2015

Automatic Summarization

Automatic Summarization is a valuable aspect of Information Extraction in Natural Language Processing. It is applied within Information Retrieval, news summaries, the generation of research abstracts, and various knowledge exploration contexts. Summarization can be applied over a single document or across multiple documents, and extractions can be built over anything from simple to richly structured text. The following outlines the various aspects of Automatic Summarization that are under active research and in use across different textual domains.

Summarization Types:
extractive
abstractive
single document
multi-document
indicative
informative
keyword
headline
generic
query-focused
update
main point
key point
outline
descriptive

Summary Sentence Approaches:
revision
ordering
fusion
compression
sentence selection vs summary selection

Unsupervised Methods:
word frequency
word probability
tf*idf weighting
log-likelihood ratio for topic signatures
sentence clustering
graph based methods for sentence ranks
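The word-probability approach above can be sketched in a few lines of Python. The stopword list, the regex sentence splitter, and the scoring details here are simplifying assumptions for illustration, not the definition of any particular system:

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "it", "for"}

def summarize(text, n_sentences=2):
    """Score each sentence by the mean corpus probability of its content words,
    then return the top sentences in their original document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)
    total = sum(freq.values())

    def score(sentence):
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower())
                  if w not in STOPWORDS]
        if not tokens:
            return 0.0
        return sum(freq[w] / total for w in tokens) / len(tokens)

    ranked = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Preserve original document order in the output summary.
    return [s for s in sentences if s in ranked]
```

Replacing the raw frequency with tf*idf weights or a log-likelihood topic-signature test changes only the `score` function; the selection loop stays the same.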

Semantics and Discourse:
lexical chaining
latent semantic analysis
coreference
rhetorical structure
discourse-driven graph representation

Summary Generation Methods:
compression
rule-based compression
statistical compression
headline
fusion
context dependent revision
ordering of information
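Rule-based compression can be sketched with a few surface rewrite rules. The rules below (dropping parentheticals and -ly adverbs) are deliberately crude assumptions for illustration; a real compressor operates over parse trees rather than raw strings:

```python
import re

def compress(sentence):
    """Toy rule-based sentence compression: strip parentheticals, crude -ly
    adverbs, and redundant whitespace. Real systems prune syntactic
    constituents instead of matching surface patterns."""
    s = re.sub(r"\([^)]*\)", "", sentence)    # drop parenthetical asides
    s = re.sub(r"\b\w+ly\b,?\s*", "", s)      # drop -ly adverbs (very crude)
    s = re.sub(r"\s+", " ", s).strip()        # collapse leftover whitespace
    s = re.sub(r"\s+([,.!?])", r"\1", s)      # tidy spacing before punctuation
    return s
```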

Various Genre and Domains:
medical
journal article
news
email
web
speech
conversation log
financial data
book
social media
legal

Evaluation:
precision
recall
utility
manual
automatic
pyramid
linguistic quality
accuracy
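For extractive output, precision and recall reduce to overlap counts between the system summary and a reference summary. This sketch assumes exact sentence matching against a single reference; real evaluations such as ROUGE compare n-grams and average over multiple references:

```python
def summary_precision_recall(system_sents, reference_sents):
    """Sentence-level precision/recall of a system summary against one
    reference: precision = overlap / system size, recall = overlap /
    reference size."""
    system = set(system_sents)
    reference = set(reference_sents)
    overlap = len(system & reference)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall
```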

14 December 2015

Question/Answering Approaches In Perspective

Question/Answering has become a hot topic in recent years, as it can be applied across a variety of domains for data mining, knowledge discovery, and as an application of natural language processing. One of the core underpinnings has always been matching a question to an answer, possibly via reformulation. In simple terms, one could apply a decision-tree style approach or formalize keyphrase matching over a set of rules. In recent years there has been much growth toward probabilistic techniques over rule-based systems, and a hybrid approach has proven optimal in many contexts. Including semantic constructs through ontologies allows an agent to understand and reason over domain knowledge through inference and deduction. Furthermore, one can take such an intelligent metaphor of understanding a step further into the BDI context of multi-agent systems, with mediation for argumentation and game theory. Deep Learning has also provided some robust alternatives.

Below is a listing of proposed ideas for how effective question/answering strategies could be achieved for open/closed-domain understanding. In every case, a semantic, ontological understanding becomes important as a guided way of reasoning about the open world. One can view question/answering as a data funnel, a pipeline of question-to-answer matching through a series of filtration steps: Sentiment Analysis, Sentence Comprehension as chains of thought or tokens, Machine Learning for Classification and Clustering, and semantic domain concepts. In this respect, one can formulate a knowledge graph from a generalized view of the open world and gradually layer specialized, curated domain ontologies on top to provide Commonsense Reasoning analogous to a human's. DBpedia is one starting point to the open world; the entire web is another.
A separate lexical store could also be used, such as WordNet, SentiWordNet, or Wiktionary. Alternative resources to further build out the knowledge base include YAGO-SUMO, UMBEL, SenticNet, OMCS, and ConceptNet. One could even build a graph of the various curated FAQ sites as a connected knowledge source. One day the Web of Data may itself provide a gigantic linked data graph of queryable knowledge via metadata; today such options take the form of Schema.org and others. As research evolves, cognitive agents will become more self-aware of their world, with more granular and efficient ways of understanding that require less guidance. Another practical note is the desirability of a feedback loop between short-term and long-term retention of knowledge cues, to avoid excessive repeated backtracking for inference on similar question patterns in context.

Description Steps | Agent Belief
QA Semantic Domain Ontologies/NLP + BDI Multiagent Ensemble Classifiers (potential for Deep Learning) | Multiple BDI
QA Semantic Domain Ontologies/NLP + BDI Multiagent Belief Networks using Radial Basis Functions (Autoencoders vs Argumentation) | Multiple BDI
QA Semantic Domain Ontologies/NLP + BDI Multiagent Reinforcement Learning/Q-Learning | Multiple BDI
QA Semantic Domain Ontologies/NLP + Predicate Calculus for Deductive Inference | Single
QA Semantic Domain Ontologies/NLP + Basic Commonsense Reasoning | Single
QA Semantic Domain Ontologies/NLP + Deep Learning (DBN/Autoencoders) | Single
QA Semantic Domain Ontologies/NLP + LDA/LSA/Search Driven | Single
QA Semantic Domain Ontologies/NLP + Predicate Calculus for Deductive Inference + Commonsense Reasoning | Single
QA Semantic Domain Ontologies/NLP + Groovy/Prolog Rules | Single
QA Semantic Domain Ontologies/NLP + Bayesian Networks | Single
QA TopicMap/NLP + Deep Learning (Recursive Neural Tensor Network) | Single
QA Semantic Domain Ontologies/NLP + QA TopicMap + Self-Organizing Map | Single
QA Semantic Domain Ontologies/NLP + Connected Memory/Neuroscience (Associative Memory/Hebbian Learning) | Single
QA Semantic Domain Ontologies/NLP + Machine Learning/Clustering in a Data Grid like GridGain | Single
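The question-to-answer funnel described above can be sketched at its simplest as a bag-of-words ranking stage followed by a threshold filter. The toy FAQ data, the cosine scorer, and the threshold value are all assumptions for illustration, standing in for the richer semantic and ontological layers:

```python
import re
from collections import Counter
from math import sqrt

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def answer(question, faq, threshold=0.2):
    """Funnel a question through tokenize -> vectorize -> rank candidates ->
    threshold filter; None means no candidate survived the filter."""
    q_vec = Counter(tokens(question))
    scored = [(cosine(q_vec, Counter(tokens(q))), a) for q, a in faq.items()]
    best_score, best_answer = max(scored)
    return best_answer if best_score >= threshold else None
```

Each extra filtration step from the pipeline (sentiment, classification, ontology lookup) would slot in as another stage between ranking and the final threshold.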

29 November 2015

Applied Design Patterns

Design Patterns have proven quite useful in software engineering practice. Used appropriately, they carry many benefits and have become an indispensable design approach for architects. A design pattern is essentially a reusable approach to a recurring problem in context. Patterns benefit not only architects and software engineers/developers but also other role players in an agile process, including project sponsors, project managers, testers, and users. There are many design patterns, and the field keeps changing as anti-patterns expose weaknesses in existing approaches and make way for new patterns. These patterns may also be elaborated for use at object and class scope. The Gang of Four patterns have become a fundamental part of object-oriented theory and design, yet not everyone is at home with applying them in practice. Having patterns baked into the language is often seen as a good thing, and perhaps this is a flaw in languages like Java, which formally expect software engineers to adopt a design-pattern style of thinking toward software development, and object-oriented design in particular, whereas functional programming languages take a different, simpler route. Considering the wide array of design patterns available, most with their own relevant domain contexts, it seems plausible to build an intelligent template solution as a refactoring tool/library/plugin. This could be one extension of an intelligent agent model within the software engineering development process. The pragmatic agent would need to interpret code on both a logical and a more contextual basis, reason about where it is appropriate to apply the right design pattern, and even identify an anti-pattern.
This context of software development automation could be extended to other uses within the refactoring process for functional and service/object-oriented programming, data/object modelling, and the various pattern families listed below.

23 Gang Of Four Design Patterns

Behavioral: manage relationships, algorithms, responsibilities between objects

  • Chain of Responsibility (Object Scope) 
  • Command (Object Scope) 
  • Interpreter (Class Scope) 
  • Iterator (Object Scope) 
  • Mediator (Object Scope) 
  • Memento (Object Scope) 
  • Observer (Object Scope) 
  • State (Object Scope)
  • Strategy (Object Scope) 
  • Template Method (Class Scope) 
  • Visitor (Object Scope) 

Structural: build large object structures from disparate objects

  • Composite (Object Scope) 
  • Decorator (Object Scope) 
  • Facade (Object Scope) 
  • Flyweight (Object Scope) 
  • Proxy (Object Scope) 
  • Adapter (Class/Object Scope) 
  • Bridge (Object Scope) 

Creational: construct objects able to be decoupled from implementation

  • AbstractFactory (Object Scope) 
  • Factory Method (Object Scope) 
  • Builder (Object Scope) 
  • Prototype (Object Scope) 
  • Singleton (Object Scope)
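As a concrete instance, here is a minimal sketch of the Strategy pattern (object scope) in Python. The sorting strategies and the `Report` context are illustrative stand-ins for any family of interchangeable algorithms:

```python
from abc import ABC, abstractmethod

class SortStrategy(ABC):
    """Strategy interface: one contract behind interchangeable algorithms."""
    @abstractmethod
    def sort(self, items):
        ...

class AscendingSort(SortStrategy):
    def sort(self, items):
        return sorted(items)

class DescendingSort(SortStrategy):
    def sort(self, items):
        return sorted(items, reverse=True)

class Report:
    """Context object: delegates to whichever strategy it is configured
    with, so the algorithm can be swapped at runtime without changing
    the context's code."""
    def __init__(self, strategy):
        self.strategy = strategy

    def render(self, scores):
        return self.strategy.sort(scores)
```

Swapping `AscendingSort()` for `DescendingSort()` changes behavior without touching `Report`, which is exactly the decoupling the pattern is after.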

Related Pattern Catalogs and Resources:
Software Design Patterns
Anti-Patterns
SOA Patterns
Data Science Design Patterns
Big Data WorkLoad Design Patterns
Architectural Patterns
Concurrency Patterns
Interactive Design Patterns
Big Data Architectural Patterns
Microservices Patterns
Microservices Architecture Patterns
Service Design Sheet
Linked Data Design Patterns
Ontology Design Patterns
SourceMaking
Enterprise Architecture Patterns
Enterprise Integration Patterns
Cloud Design Patterns

18 October 2015

Enterprise Architecture

Enterprise Architecture is a formidable terrain for large organizations steeped in system complexity and poor business alignment. Hence, various formal frameworks and methods have been defined to manage the architecture of such deliverables. Oftentimes the technology architecture mimics the dynamics of the business culture or organizational functions. Below is a list of the four key methodologies used for enterprise architecture, and a further comparison.

16 October 2015

Startup Stacks

It is always interesting to see what technology stacks are being used by startups, especially the successful ones. Compared to enterprises, startups often have minimal legacy code and are open to trying out new approaches with bleeding-edge technology. The link below offers an interesting view of the technology stacks used by various startups in the industry, and gives an idea of the trends across different tools and services.

7 October 2015

Creative Work Licenses for Software

Original work should always be licensed in some way, whether for the open source community or for full disclosure of protection rights. In a competitive world, everyone is looking for the shiny new artifact that could take a digital community by storm, so it seems only plausible that one protect one's hard work, whether for sharing or otherwise. However, the available license terms are broad and varied, and one has to be fully mindful and aware of them. Below are some helpful links for making an informed decision and selecting the license that best suits an artifact's or project's requirements.

3 October 2015

Microservices Monitoring

Breaking down a system into more granular services guided by the single responsibility principle does have the many benefits of bounded contexts. However, it can also add a degree of complexity that requires more extensive monitoring. Multiple services interacting in a distributed systems context imply multiple log files, a need to aggregate them, and multiple places for network latency issues to arise. One simple approach is to monitor everything in the entire workflow of the services, as well as the system as a whole, while trying to form the bigger picture through an aggregation process. Also, add structure to the logs by utilizing correlation IDs, which then provide a guided trail. Responsiveness can also be important, so real-time alerting may be needed to avoid cascading issues. One can abstract the service away from the system for a monitoring strategy. The current trend is to monitor holistically, getting the full picture of the entire system, all its sub-systems, and all the service interactions within it. A breakdown of the types of things that can be monitored, with example tools, is given below.

Service-Level Tracking:
  • check inbound response times, error rates, and application metrics
  • check downstream response health, response times of calls, error rates (Hystrix)
  • standardize metrics collection process and pipelines
  • standardize on logging formats so aggregation is easier
  • check system processes for the OS in order to plan for capacity

System-Level Tracking:
  • check host metrics like CPU
  • check system logs and aggregate them so it is possible to filter on individual hosts
  • standardize on single query option for searching through logs
  • standardize on correlation IDs
  • standardize on an action plan and alert levels
  • unify aggregation (Riemann or Suro)

Logstash and Graphite/collectd/StatsD are also often used in conjunction for the collection and aggregation of logs and metrics, and one can apply the ELK stack as well. The Java Metrics library can be utilized to gain insights into code in production. Other tools, like Skyline and Oculus, are available for anomaly detection and correlation.
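The correlation-ID idea above can be sketched as structured JSON log lines that an aggregator can later stitch into one request trail. The service names and fields here are hypothetical examples, not part of any of the tools mentioned:

```python
import json
import logging
import uuid

logging.basicConfig(level=logging.INFO)

def new_correlation_id():
    """Mint one ID when a request enters the system; every downstream
    service call reuses it."""
    return uuid.uuid4().hex

def log_event(service, correlation_id, message, **fields):
    """Emit one structured log line; the shared correlation ID lets a log
    aggregator reassemble a single request's path across services."""
    record = {"service": service, "correlation_id": correlation_id,
              "message": message, **fields}
    line = json.dumps(record, sort_keys=True)
    logging.getLogger(service).info(line)
    return line

# Hypothetical request flow: the gateway mints the ID, downstream reuses it.
cid = new_correlation_id()
log_event("gateway", cid, "request received", path="/orders")
log_event("orders", cid, "order looked up", latency_ms=12)
```

Because every line is machine-parseable JSON with the same fields, filtering an aggregated store by one `correlation_id` yields the full guided trail for that request.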

30 September 2015

Open Data and Knowledge

Open Data is all about making data freely available to all without restrictions, mirroring other open source initiatives; flagship examples include Data.gov and Data.gov.uk. To get involved with Open Knowledge, one can check out Open Knowledge Labs. Open Knowledge working group areas and data-process tools are listed below.

Lobbying Transparency
Open Access
Open Bibliography
Open Definition
Open Design & Hardware
Open Development
Open Economics
Open Education
OpenGLAM
Open Government Data
Open Humanities
Open Linguistics
Open Product Data
Open Science
OpenSpending
Open Sustainability
Open Transport
Personal Data and Privacy
Public Domain

Extracting:

Cleaning:
Nomenklatura

Analyzing:
R

Presenting:

Sharing:

Further details can be found on School of Data.

Open Data Institute