Showing posts with label scala.

16 April 2017

MXNet

Scala vs Go Concurrency

Scala:
  • Immutable and persistent data structures
  • First-Class Functions and Closures
  • Concurrency and remoting with the actor model
  • Software Transactional Memory


Go:
  • Expressive, lightweight language compiled to machine code
  • Goroutines and Unix pipe-like channels
  • Isolated mutability abstractions for concurrency
  • High-speed compilation

17 February 2017

R, Python, Scala, and Julia

Three languages have become critical parts of the data scientist's arsenal: R, Python, and Scala. A major ecosystem of accessible libraries for statistical computing and machine learning is critical, especially at scale. Scala remains a stumbling block for data scientists because the language can be quite complex, and data scientists often use R and Python without venturing beyond them. However, there are significant gains to be made on computationally intensive and data-intensive work by using languages like Julia and Scala, although in certain microbenchmarks even Julia's performance can come into question, as can the state of the language itself.
If one is a graduate just starting out in data science, Python is the best choice. For a research scholar, R, Python, Scala, and even Julia become the languages of choice. For an employee, the usual alternatives are again Python and R, and even Scala, especially with Spark. However, for those willing to take the plunge, Julia is emerging as a useful contender for Big Data and is likely to play a stronger role in the future if the language takes shape within the open source community.
In general, if one needs to be flexible and work with data across a multitude of different algorithms, the choice is often R. If that flexibility needs to extend to data structures and integration with external applications, Python seems the better alternative, given the optimizations that can be gained from low-level C implementations. But for building massively scalable components with batch and streaming data pipelines, one can't beat the Big Data ecosystem around Java/Scala and Python.
Julia still has a long way to go to catch up with the likes of Python. A few areas still require improvement: performance, syntax, interoperability with other languages, text formatting, testing issues that make it difficult to write robust code with defensive programming, and accessibility of the native API. It is also still a very research-led language, which limits how accessible it is for the larger open source community to contribute libraries and frameworks.

9 February 2017

Big Data Watch

Airflow
Apex
Arrow
Beam
BlinkDB
Cascading
DL4J
Drill
Druid
Flink
Flume
Gearpump
GlusterFS
H2O
Hadoop
Heron
Ignite
Impala
Kafka
Kudu
Mahout
NiFi
Phoenix
Prestodb
Samza
Scalding
Spark
Storm
StreamSets
ZooKeeper
Oryx

Hadoop ecosystem table

Deep Learning for Various Languages

There are different kinds of deep learning architectures: generative, discriminative, and hybrid. Generative architectures are unsupervised and extract features from data. Discriminative architectures are supervised and classify inputs into classes. Hybrid architectures combine the two, with a generative network feeding into a discriminative network. The following links list deep learning libraries for various programming languages; the list is not exhaustive.

Python
Java/Scala
JavaScript
Various

8 January 2017

SMACK Stack

S : Scala and Spark (The Engine)
M : Mesos (The Hardware Scheduler)
A : Akka (The Actor Model)
C : Cassandra (The Storage)
K : Kafka (The Message Broker)

A Brief History of Smack
Smack Hands-On
Smack Made Simple
Smack Guide
Why is the SMACK stack all the rage lately
Smack Slideshare
Smack Personalization

Alternatives for Stream Analytics:
Gearpump
Flink