Mabble Rabble: scala

Three languages have become essential parts of the data scientist's arsenal: R, Python, and Scala. A major ecosystem of accessible libraries for statistical computing and machine learning is critical, especially at scale. Scala is still a stumbling block for data scientists, as the language can be quite complex, and data scientists often use R and Python without venturing beyond them. However, there are significant computational and data-intensive gains to be made by using languages like Julia and Scala. That said, in certain microbenchmarks even Julia's performance can come into question, as can the maturity of the language itself.
For a graduate just starting out in data science, Python is the best choice. For a research scholar, R, Python, Scala, and even Julia become the languages of choice. For an employee, the usual options are again Python and R, and even Scala, especially with Spark. For those willing to take the plunge, Julia is emerging as a useful contender for Big Data and is likely to play a stronger role in the future if the language takes shape within the open source community.
In general, if one needs the flexibility to work with data across a multitude of different algorithms, the choice is often R. If that flexibility must extend to data structures and integration with external applications, Python seems the better alternative, helped by the optimizations available through low-level C implementations. But for building massively scalable components on batch and streaming data pipelines, one can't beat the Big Data ecosystem around Java/Scala and Python. Julia still has a long way to go to catch up with the likes of Python.
A few areas where Julia still requires improvement are performance, syntax, interoperability with other languages, text formatting, accessibility of the native API, and testing, where current issues make it difficult to write robust code with defensive programming. Julia also remains a very research-led language that is fairly limited in accessibility for the wider open source community wanting to contribute libraries and frameworks.
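To make the point about Python's optimizations from low-level C implementations concrete, here is a minimal sketch, assuming NumPy is installed. It computes the same sum of squares with a pure-Python loop and with NumPy's C-backed `np.dot`. The function names are hypothetical and chosen for illustration. Exact timings depend on the machine, but the vectorized version is typically much faster because its loop runs in compiled code rather than in the interpreter.

```python
import timeit

import numpy as np


def sum_of_squares_loop(values):
    """Pure-Python loop: every iteration is executed by the interpreter."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_numpy(values):
    """Vectorized: the loop runs inside NumPy's compiled C code."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))


data = np.arange(1_000_000, dtype=np.float64)
data_list = data.tolist()  # plain Python floats for a fair pure-Python baseline

# Both versions compute the same value, up to floating-point rounding.
loop_result = sum_of_squares_loop(data_list)
numpy_result = sum_of_squares_numpy(data)
assert np.isclose(loop_result, numpy_result)

# Timings vary by machine; no particular speed-up is claimed here.
loop_time = timeit.timeit(lambda: sum_of_squares_loop(data_list), number=3)
numpy_time = timeit.timeit(lambda: sum_of_squares_numpy(data), number=3)
print(f"loop: {loop_time:.3f}s  numpy: {numpy_time:.3f}s")
```

The same trade-off underlies most of Python's numerical stack: data structures and glue code stay in Python, while the heavy inner loops are delegated to C.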

9 February 2017

Big Data Watch

Airflow
Apex
Arrow
Beam
BlinkDB
Cascading
DL4J
Drill
Druid
Flink
Flume
Gearpump
GlusterFS
H2O
Hadoop
Heron
Ignite
Impala
Kafka
Kudu
Mahout
NiFi
Oryx
Phoenix
PrestoDB
Samza
Scalding
Spark
Storm
StreamSets
ZooKeeper

Hadoop Ecosystem Table

Deep Learning for Various Languages

There are different kinds of deep learning architectures: generative, discriminative, and hybrid. Generative architectures are unsupervised and extract features from data. Discriminative architectures are supervised and classify inputs into classes. Hybrid architectures combine the two, with a generative network feeding into a discriminative network. The following lists deep learning libraries in various programming languages; it is not exhaustive.

Python

Java/Scala

Javascript

Various

MXNet

awesome-deep-learning
awesome-deeplearning-resources
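To illustrate the hybrid idea of an unsupervised stage whose output feeds a supervised stage, here is a minimal sketch, assuming NumPy. It is deliberately not a deep network: PCA computed via SVD is swapped in as a simple linear stand-in for the generative feature extractor (it never sees the labels), and logistic regression trained by gradient descent plays the discriminative part. The synthetic data, function names, and hyperparameters are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical labelled data: two Gaussian blobs in 5 dimensions.
n = 200
X = np.vstack([rng.normal(-2.0, 1.0, size=(n, 5)),
               rng.normal(2.0, 1.0, size=(n, 5))])
y = np.concatenate([np.zeros(n), np.ones(n)])


# Stage 1 (unsupervised stand-in): PCA learns a 2-D feature space without labels.
def fit_pca(X, k):
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def transform(X, mean, components):
    return (X - mean) @ components.T


# Stage 2 (supervised): logistic regression on the learned features.
def fit_logistic(F, y, lr=0.1, steps=500):
    w = np.zeros(F.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(F @ w + b)))  # predicted probabilities
        grad = p - y                             # gradient of the log-loss w.r.t. the logits
        w -= lr * F.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


mean, comps = fit_pca(X, k=2)      # the unsupervised stage never uses y
F = transform(X, mean, comps)      # its output feeds the supervised stage
w, b = fit_logistic(F, y)
pred = (F @ w + b) > 0
accuracy = (pred == y.astype(bool)).mean()
print(f"training accuracy: {accuracy:.3f}")
```

In a real hybrid deep architecture the first stage would be a generative network such as an autoencoder or a deep belief network, but the wiring is the same: features learned without labels become the input to a supervised classifier.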

8 February 2017

Data Visualization Tools

Below are a few useful data visualization tools; the list is not exhaustive.

Cytoscape
NetworkX
Gephi
Lumify
D3
Seaborn
Bokeh
Tableau
Fusion Tables
Qlik
Zeppelin
Datawrapper
Raw
InfoViz
Sigma.js
igraph

Apache Projects Directory

8 January 2017

SMACK Stack

S : Scala and Spark (The Engine)
M : Mesos (The Hardware Scheduler)
A : Akka (The Actor Model)
C : Cassandra (The Storage)
K : Kafka (The Message Broker)
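The data flow through the SMACK layers can be sketched in a few lines. This is a minimal, single-process sketch under heavy assumptions: every class and function below is a hypothetical in-memory stand-in, not the API of Kafka, Akka, Spark, or Cassandra, and Mesos scheduling is not modelled at all. Events are published to a broker topic, an actor consumes them one message at a time and writes rows into a key-value store, and a batch job aggregates the stored rows.

```python
from collections import Counter, defaultdict
from queue import Queue


class InMemoryBroker:
    """Hypothetical stand-in for Kafka: named topics backed by FIFO queues."""

    def __init__(self):
        self.topics = defaultdict(Queue)

    def publish(self, topic, message):
        self.topics[topic].put(message)

    def poll(self, topic):
        """Return the next message on the topic, or None if it is empty."""
        q = self.topics[topic]
        return None if q.empty() else q.get()


class IngestActor:
    """Hypothetical stand-in for an Akka actor: handles mailbox messages one at a time."""

    def __init__(self, store):
        self.mailbox = Queue()
        self.store = store  # stand-in for a Cassandra table: key -> list of rows

    def tell(self, message):
        self.mailbox.put(message)

    def drain(self):
        while not self.mailbox.empty():
            user, event = self.mailbox.get()
            self.store.setdefault(user, []).append(event)


def count_events(store):
    """Stand-in for a Spark batch job: aggregate event counts across all keys."""
    return Counter(event for rows in store.values() for event in rows)


broker = InMemoryBroker()
for message in [("alice", "click"), ("bob", "view"), ("alice", "view")]:
    broker.publish("events", message)

store = {}
actor = IngestActor(store)
while (message := broker.poll("events")) is not None:
    actor.tell(message)
actor.drain()

print(store)                # {'alice': ['click', 'view'], 'bob': ['view']}
print(count_events(store))  # Counter({'view': 2, 'click': 1})
```

In an actual deployment each stand-in becomes a separate distributed service, which is what lets the stack scale; the sketch only shows the order in which data moves between the layers.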

A Brief History of SMACK
SMACK Hands-On
SMACK Made Simple
SMACK Guide
Why Is the SMACK Stack All the Rage Lately
SMACK SlideShare
SMACK Personalization

Alternatives for Stream Analytics:
Gearpump
Flink

16 December 2016

Twitter Open Source Projects

Scalding
Algebird
Finagle
FlockDB
Finatra
Ambrose
Parquet
Summingbird
Bootstrap
Bower
Flight
Twemcache
Heron
Cassovary

Mabble Rabble

20 April 2020

Beam Capability Matrix

28 June 2019

Ethereum

25 November 2017

Database Migrations

4 September 2017

Reactive Streams

22 April 2017

Comparing Deep Learning Frameworks

16 April 2017

MXNet

Scala vs Go Concurrency

25 March 2017

Crunch & Scrunch

16 March 2017

Clerezza & UIMA Integration

5 March 2017

Akka Streams

CQRS

Clever Cloud

21 February 2017

Awesome-Vertx

17 February 2017

R, Python, Scala, and Julia