Mabble Rabble

29 October 2016

Big Data Stream Processing

Spark
Flink
Dataflow/Beam
Streamsets

awesome streaming

26 October 2016

Machine Learning Taxonomy

Machine learning is about designing algorithms that give a computer the means to learn, often by finding patterns in data. The list below outlines the key areas of the machine learning taxonomy.

Supervised Learning
Semi-Supervised Learning
Unsupervised Learning
Reinforcement Learning
Transduction
Learning to Learn
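
To make the first split in the taxonomy concrete, here is a minimal sketch in plain Python contrasting supervised learning (labels are given) with unsupervised learning (only raw values are given). The function names, the toy data, and the choice of nearest neighbor and two-means are illustrative assumptions, not methods taken from the list above.

```python
def nearest_neighbor_predict(train, query):
    """Supervised: return the label of the closest labeled example."""
    closest = min(train, key=lambda pair: abs(pair[0] - query))
    return closest[1]


def two_means(values, iterations=10):
    """Unsupervised: find two cluster centers in unlabeled 1-D values."""
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        # Assign each value to the nearer of the two current centers.
        left = [v for v in values if abs(v - lo) <= abs(v - hi)]
        right = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not left or not right:
            break  # all values collapsed into one cluster
        # Move each center to the mean of its assigned values.
        lo = sum(left) / len(left)
        hi = sum(right) / len(right)
    return lo, hi


# Hypothetical toy data: sizes with labels, then the same sizes without.
labeled = [(1.0, "small"), (2.0, "small"), (9.0, "large"), (10.0, "large")]
unlabeled = [size for size, _ in labeled]

prediction = nearest_neighbor_predict(labeled, 8.5)
centers = two_means(unlabeled)
```

The supervised function needs the labels to answer at all, while the unsupervised one recovers the two groups from the values alone.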

Scala Data Tools

Below is a list of the general mathematics and machine learning data tools that have emerged in Scala, along with Scala APIs for Hadoop and for databases.

Algebird: Twitter’s API for abstract algebra that can be used with almost any Big Data API.
Factorie: A toolkit for deployable probabilistic modeling, with a succinct language for creating relational factor graphs, estimating parameters, and performing inference.
Figaro: A toolkit for probabilistic programming.
H2O: A high-performance, in-memory distributed compute engine for data analytics. Written in Java with Scala and R APIs.
Relate: A thin database access layer focused on performance.
ScalaNLP: A suite of Machine Learning and numerical computing libraries. It is an umbrella project for several libraries, including Breeze, for machine learning and numerical computing, and Epic, for statistical parsing and structured prediction.
ScalaStorm: A Scala API for Storm.
Scalding: Twitter’s Scala API around Cascading that popularized Scala as a language for Hadoop programming.
Scoobi: A Scala abstraction layer on top of MapReduce with an API that’s similar to Scalding’s and Spark’s.
Slick: A database access layer developed by Typesafe.
Spark: The emerging standard for distributed computation in Hadoop environments, as well as in Mesos clusters and on single machines (“local” mode).
Spire: A numerics library that is intended to be generic, fast, and precise.
Summingbird: Twitter’s API that abstracts computation over Scalding (batch mode) and Storm (event streaming).
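
Several of the tools above, Algebird in particular, lean on abstract algebra for aggregation. The sketch below shows the underlying idea of a monoid in plain Python. It is not Algebird's API: the `Monoid` class, `sum_all`, `merge_counts`, and the sample data are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Monoid(Generic[T]):
    """An identity element plus an associative combine function."""
    zero: T
    plus: Callable[[T, T], T]


def sum_all(monoid: Monoid[T], items: Iterable[T]) -> T:
    """Fold items with the monoid, starting from its identity element."""
    return reduce(monoid.plus, items, monoid.zero)


def merge_counts(a: dict, b: dict) -> dict:
    """Combine two word-count maps without mutating either input."""
    merged = dict(a)
    for word, count in b.items():
        merged[word] = merged.get(word, 0) + count
    return merged


int_sum = Monoid(0, lambda a, b: a + b)
word_counts = Monoid({}, merge_counts)

# Associativity lets each partition be reduced on its own and merged later.
partitions = [[{"spark": 1}, {"flink": 2}], [{"spark": 3}]]
partials = [sum_all(word_counts, part) for part in partitions]
total = sum_all(word_counts, partials)
```

Because the combine step is associative, the per-partition results can be computed in any grouping, which is why this shape suits both batch and streaming aggregation.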

25 October 2016

Reactive Manifesto

The Reactive Manifesto is an effort to define what a reactive system should look like, in terms of four characteristics:

Message or Event-driven: As a baseline, the system needs to respond to messages or events.

Elastically Scalable: The system needs to meet scale-out demands (horizontal scaling across processes, cores, and nodes).

Resilient: The system needs to be able to recover gracefully from failures.

Responsive: The system stays available for service requests, even if that means degrading gracefully when components fail or traffic is high.

Reactive Extensions
Functional Reactive Programming
Akka (Actor Model)
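
As a rough illustration of three of the characteristics above, here is a single-threaded, actor-style sketch in plain Python. It is not Akka and does not use any reactive library: the `Actor` class, its `tell` and `run` methods, and the `parse_reading` handler are hypothetical. Elastic scaling is deliberately left out, since it needs multiple workers or nodes.

```python
from collections import deque


class Actor:
    """Processes messages from a bounded mailbox, one at a time."""

    def __init__(self, handler, mailbox_limit=100):
        self.handler = handler
        self.mailbox = deque()
        self.mailbox_limit = mailbox_limit
        self.processed = []
        self.failures = []
        self.rejected = 0

    def tell(self, message):
        """Message-driven: callers only enqueue, they never call the handler."""
        if len(self.mailbox) >= self.mailbox_limit:
            # Responsive: shed load instead of growing without bound.
            self.rejected += 1
            return False
        self.mailbox.append(message)
        return True

    def run(self):
        """Drain the mailbox, isolating failures to the offending message."""
        while self.mailbox:
            message = self.mailbox.popleft()
            try:
                self.processed.append(self.handler(message))
            except Exception as error:
                # Resilient: record the failure and keep serving.
                self.failures.append((message, error))


def parse_reading(message):
    """Hypothetical handler: turn a text reading into a number."""
    return float(message)


actor = Actor(parse_reading, mailbox_limit=3)
for reading in ["1.5", "oops", "2.5", "4.0"]:
    actor.tell(reading)
actor.run()
```

The malformed message ends up in `failures` without stopping the loop, and the fourth message is rejected because the mailbox is capped at three.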

21 October 2016

Alternatives to Kafka

Kinesis
RabbitMQ
ZeroMQ
Kudu
Storm
Samza
SQS
Redis
Aeron
MapR Streams

Kafka for Beginners
Confluent

Note that Storm and Samza can in fact be used alongside Kafka in a data pipeline. The choice depends on how one plans to use a platform, which is invariably dictated by the constraints of the problem at hand, such as whether the data arrives in batches or as real-time streams.

18 October 2016

Beer Slang

Homebrew, the Mac package manager, is built around a beer analogy. Beer is also a staple of social gatherings in the data science field, and it has become an essential element of society. Over the years it has picked up a diverse set of regional slang terms, alongside a variety of flavors from around the world. One could even build an ontology for beer, treating it both as a concept and as a product with a set of ingredients, categories, and tastes. Such an ontology could help people explore, and could feed a recommendation graph tied to their evolving tastes, their merry meetups, and their choice of food accompaniment.

beer slang
thrillist
beerslanging
15 brewtastic ways to say beer
craftbeer
alldownunder
irishdrinking
1800s beer slang

13 October 2016

Frozen Yogurts in London

Frozen yogurt is an interesting example of applying machine learning, or data science more specifically, to understanding the customer through their choice of scoops and flavors. Analytics has paved the way for self-service frozen yogurt, which puts the choice of flavors in the hands of the user and in the process improves the customer experience. This marks a shift in value towards the user and the data associated with them. It also shows that huge investments are not needed to shift a business model: self-service actually reduces labor costs. All of this is part of using analytics to maximize revenue. Shifting control to the user gives the customer greater satisfaction and a sense of assurance that they are getting their money's worth. The list below provides a few interesting frozen yogurt places in the dynamic society of London.

Pinkberry
Snog
Itsu
Frae
Moosh
Moto Yogo
Yoomoo
Yogland

12 October 2016

Sentiment Ontologies

SenticNet
Marl
Onyx
EmotionML
EmotionML Vocabularies
Lemon
FOAF

Sentiment Analysis in Social Networks