3 October 2015

Microservices Monitoring

Breaking a system down into more granular services, guided by the single responsibility principle, has the benefits of bounded contexts. However, it also adds a degree of complexity that requires more extensive monitoring. Multiple services interacting in a distributed system means multiple log files that need to be aggregated, and multiple places for network latency issues to arise. One simple approach is to monitor everything in the workflow of each service as well as the system as a whole, while building the bigger picture through an aggregation process. Adding structure to the logs with correlation IDs then provides a guided trail through each request. Responsiveness matters too, so real-time alerting may be needed in order to avoid cascading issues. For a monitoring strategy, one can abstract the individual service away from the system. The current trend is to monitor holistically, capturing the full picture of the entire system, its sub-systems, and all the service interactions within it. A breakdown of the types of things that can be monitored, with example tools, is given below.

Service-Level Tracking:
  • check inbound response times, error rates, and application metrics
  • check downstream response health, response times of calls, error rates (Hystrix)
  • standardize metrics collection process and pipelines
  • standardize on logging formats so aggregation is easier
  • check system processes for the OS in order to plan for capacity

System-Level Tracking:
  • check host metrics like CPU
  • check system logs and aggregate them so it is possible to filter on individual hosts
  • standardize on single query option for searching through logs
  • standardize on correlation IDs
  • standardize on an action plan and alert levels
  • unify aggregation (Riemann or Suro)
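Standardizing on correlation IDs, as suggested above, can be as simple as tagging every structured log line with an ID minted at the inbound request. A minimal Python sketch (the `correlation_id` field name and the JSON line format are illustrative choices, not a prescribed standard):

```python
import json
import uuid

def format_event(correlation_id, message, **fields):
    """Render one structured JSON log line tagged with a correlation ID."""
    record = {"correlation_id": correlation_id, "message": message}
    record.update(fields)
    return json.dumps(record, sort_keys=True)

# A new inbound request mints an ID; every downstream call logs with it,
# so an aggregator can later filter on correlation_id to recover the trail.
cid = str(uuid.uuid4())
print(format_event(cid, "order received", service="orders"))
print(format_event(cid, "payment requested", service="payments"))
```

Because every line is self-describing JSON with the same ID, the aggregation layer can stitch together a cross-service trail with a single filter, regardless of which host emitted each line.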

Logstash and Graphite/Collectd/StatsD are also often used in conjunction for the collection and aggregation of logs and metrics. One can also apply the ELK stack. The Java Metrics library can be used to gain insight into code behaviour in production. Other tools are available too, such as Skyline and Oculus for anomaly detection and correlation.
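To make the metrics pipeline concrete: StatsD accepts plain-text UDP datagrams of the form `name:value|type` (e.g. `|ms` for timings, `|c` for counters). A small sketch, assuming a StatsD daemon on the default local port; the metric names are made up for illustration:

```python
import socket

def statsd_packet(name, value, metric_type):
    """Build a StatsD datagram, e.g. b'api.latency:320|ms'."""
    return f"{name}:{value}|{metric_type}".encode("ascii")

def send_metric(name, value, metric_type, host="127.0.0.1", port=8125):
    """Fire-and-forget a metric at a StatsD daemon over UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(statsd_packet(name, value, metric_type), (host, port))
    finally:
        sock.close()

# A timing for an inbound call and a counter for a downstream error:
send_metric("orders.inbound.response_time", 320, "ms")
send_metric("orders.downstream.errors", 1, "c")
```

Because the send is UDP, instrumentation stays cheap and never blocks the service even if the collector is down, which is what makes this style of per-request metric emission practical.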

30 September 2015

Open Data and Knowledge

Open Data is about making data freely available to all without restrictions, and it mirrors other open source initiatives. Prominent examples include Data.gov and Data.gov.uk. To get involved with Open Knowledge, one can check out Open Knowledge Labs. Open Knowledge working group areas and data-process tools are listed below.

Lobbying Transparency
Open Access
Open Bibliography
Open Definition
Open Design & Hardware
Open Development
Open Economics
Open Education
OpenGLAM
Open Government Data
Open Humanities
Open Linguistics
Open Product Data
Open Science
OpenSpending
Open Sustainability
Open Transport
Personal Data and Privacy
Public Domain

Extracting:

Cleaning:
Nomenklatura

Analyzing:
R

Presenting:

Sharing:

Further details can be found at the School of Data.

Open Data Institute

15 September 2015

Computational Linguistics and NLP Conferences

The link below provides the full calendar of computational linguistics and natural language processing conferences taking place globally this year, as well as an archive of past dates.

18 July 2015

ICML 2015

This year the International Conference on Machine Learning took place in Lille, France. It was a fantastic event, bringing together research from diverse areas of Machine Learning in a collaborative setting. The conference went down really well, and an immense amount of research was shared within the community. There was also a noticeable increase in the number of attendees this year. The schedule was broken down into conference sessions, workshops, and tutorials, with an open question-and-discussion period after each session. The banquet itself was a joyful experience, although the food, both there and around Lille, left much to be desired. Cheese was on display in all forms, showing up on every French menu; for vegetarians, Lille offers cheese, french fries, and salad. The most popular areas of research covered included Deep Learning, Topic Modelling, Structured Prediction, Networks and Graphs, Natural Language Processing, Reinforcement Learning, and Transfer Learning, with Deep Learning, Reinforcement Learning, and Word2Vec drawing the largest crowds. Many of the presented papers can also be found on arXiv. The conference showed how far Machine Learning has come and how much popularity it has garnered over the years. Machine Learning is proving invaluable across a multitude of domains, with profound effects for business and society as a whole. Yet one theme reverberated throughout the conference: there is still a great deal to be discovered before Artificial Intelligence can truly match the abilities of a human being.

15 July 2015

London Shopping Centres

Shopping in the UK is not comparable, in quality or sheer scale, to the malls of the US. Its department stores are also relatively unmatched apart from Harrods, Selfridges, and John Lewis. Not only is shopping in the UK far more expensive than in the US, but there is also less variety and less competition for bargains. Things are slowly changing, though, especially in London, where there are plenty of options because the capital receives such a huge influx of tourists year round. Although not exhaustive, the following list includes some popular shopping arcades in London, along with a link to a few popular areas around the UK and a Lonely Planet guide to shopping in London.

Awesome Big Data

Big Data has taken off in leaps and bounds for distributed systems as well as machine learning. The following link provides a useful curated, categorized list of Big Data frameworks, libraries, resources, and related technologies. No doubt this will change, as the domain has proven to be very dynamic.

3 May 2015

Common Crawl

Common Crawl provides an archived snapshot dataset of the web which can be used for a massive array of applications. It is based on the Heritrix archival crawler, making it quite reusable and extensible for open-ended solutions, whether that means building a search engine against years of web page data, extracting specific data from web documents, or training machine learning algorithms. Common Crawl is available via the AWS public data repository and accessible through the AWS S3 blob store. There are plenty of MapReduce examples available in both Python and Java, making it approachable for developers. Having years of data at one's disposal saves a developer from setting up such crawler processes manually.
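The crawl data is stored in the WARC format: each record is a `WARC/1.0` version line, `Name: value` headers, a blank line, then the payload. Real pipelines typically use a dedicated WARC library, but a toy parser shows the shape of what those MapReduce examples iterate over (the sample record below is invented for illustration):

```python
def parse_warc_record(record_text):
    """Split one WARC record into (version, headers dict, payload)."""
    header_block, _, payload = record_text.partition("\r\n\r\n")
    lines = header_block.split("\r\n")
    version = lines[0]                      # e.g. "WARC/1.0"
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return version, headers, payload

# A toy record in the shape Common Crawl's WARC files use:
sample = (
    "WARC/1.0\r\n"
    "WARC-Type: response\r\n"
    "WARC-Target-URI: http://example.com/\r\n"
    "Content-Length: 13\r\n"
    "\r\n"
    "Hello, crawl!"
)
version, headers, payload = parse_warc_record(sample)
```

A MapReduce job over Common Crawl is essentially this loop at scale: the mapper receives records like the one above and keys its output on fields such as `WARC-Target-URI`.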