
11 December 2020

Should You Use Flink?

Flink is currently a very unstable platform. The team has reinstated FlinkML, which is itself unstable. They have reworked the graph option and introduced the Table API. For now, any stable work really depends on Spark. The Flink team needs to make up its mind about stream processing and about the abstracted features it wants to provide to the stack. The Python option, in particular, is riddled with bugs. Waiting a while might make the entire platform more stable, but that depends on the team's goals in the near future. Even the documentation is going slightly pear-shaped. When a core aspect of a platform changes, it is best to fork it into a completely separate project. This fundamental shift, however, is what has made the Flink platform so unstable and the documentation so hard to track. Maybe in the near future something better will come along to replace Spark and Flink and be ready for commercial use. So far, though, Spark seems to be the only real contender in the market: slightly unstable in its own right, but providing a sufficient amount of flexibility without the added frustration.

4 April 2018

Feature Structure Goals in Spark

Classification & Regression
End Goal:
  • Column of type Double to represent the label
  • Column of type Vector (Sparse or Dense) to represent the features
Recommendations
End Goal:
  • Column of Users
  • Column of Items
  • Column of Ratings
Unsupervised Learning
End Goal:
  • Column of type Vector (Sparse or Dense)
Graph Analytics
End Goal:
  • DataFrame of Vertices
  • DataFrame of Edges
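
The classification, regression and unsupervised goals above all end in a vector column, so it may help to see what "Sparse or Dense" means in practice. The following is a minimal plain-Python sketch, not Spark's own API: `SparseVector`, `LabeledRow` and `to_labeled_row` are hypothetical names invented for illustration. It assumes a sparse vector is stored as a size plus the indices and values of its non-zero entries, and that the label is always coerced to a double.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SparseVector:
    """Hypothetical stand-in for a sparse feature vector.

    Only the non-zero entries are stored, as parallel index/value lists.
    """
    size: int
    indices: Sequence[int]
    values: Sequence[float]

    def to_dense(self) -> List[float]:
        # Expand to the dense form: one slot per feature, zeros filled in.
        dense = [0.0] * self.size
        for i, v in zip(self.indices, self.values):
            dense[i] = v
        return dense


@dataclass(frozen=True)
class LabeledRow:
    """One row of the target shape: a double label plus a feature vector."""
    label: float
    features: SparseVector


def to_labeled_row(label, raw_features: Sequence[float]) -> LabeledRow:
    # Keep only the non-zero features and remember where they were.
    indices = [i for i, x in enumerate(raw_features) if x != 0]
    values = [float(raw_features[i]) for i in indices]
    return LabeledRow(float(label), SparseVector(len(raw_features), indices, values))
```

For unsupervised learning the same vector would be used without the label; the sparse form only pays off when most features are zero.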

8 January 2017

SMACK Stack

S : Scala and Spark (The Engine)
M : Mesos (The Hardware Scheduler)
A : Akka (The Actor Model)
C : Cassandra (The Storage)
K : Kafka (The Message Broker)

A Brief History of Smack
Smack Hands-On
Smack Made Simple
Smack Guide
Why Is SMACK Stack All the Rage Lately
Smack Slideshare
Smack Personalization

Alternatives for Stream Analytics:
GearPump
Flink

17 May 2016

Engine Paradigms & Systems

| Paradigm | System | Explanation |
|---|---|---|
| MapReduce | Hadoop | Small recoverable code tasks, sequential tasks inside map and reduce functions |
| Dryad/Nephele | Tez | Extends the MapReduce model to a DAG model, backtracking-based recovery |
| PACTs | Flink | Embedded query processing runtime in a DAG engine, extends DAGs to cyclic graphs, incremental construction of graphs |
| RDDs | Spark | Functional implementation of Dryad recovery (RDDs), restricted to coarse-grained transformations, direct execution of API |

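
The RDD entry is easier to picture with a toy model. What follows is a single-machine sketch under stated assumptions, not Spark's implementation: `MiniRDD` and its methods are hypothetical names, partitions are plain Python lists, and "losing" a partition is simulated by evicting a cached result. The point it tries to show is that each dataset remembers only its parent and one coarse-grained transformation, so a lost partition can be rebuilt by replaying that lineage.

```python
from typing import Callable, Dict, List, Optional


class MiniRDD:
    """Toy stand-in for an RDD: partitions plus the lineage to rebuild them."""

    def __init__(self,
                 partitions: Optional[List[list]] = None,
                 parent: Optional["MiniRDD"] = None,
                 fn: Optional[Callable[[list], list]] = None):
        self._source = partitions      # set only on the root dataset
        self.parent = parent           # lineage: where the data came from
        self.fn = fn                   # lineage: how a partition was derived
        self._cache: Dict[int, list] = {}

    @property
    def num_partitions(self) -> int:
        if self.parent is None:
            return len(self._source)
        return self.parent.num_partitions

    # Coarse-grained transformations: one function applied to a whole partition.
    def map(self, f: Callable) -> "MiniRDD":
        return MiniRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred: Callable) -> "MiniRDD":
        return MiniRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def compute(self, i: int) -> list:
        # Serve from cache if present, otherwise rebuild from the parent.
        if i in self._cache:
            return self._cache[i]
        if self.parent is None:
            result = list(self._source[i])
        else:
            result = self.fn(self.parent.compute(i))
        self._cache[i] = result
        return result

    def lose(self, i: int) -> None:
        # Simulate a failure that drops one computed partition.
        self._cache.pop(i, None)

    def collect(self) -> list:
        return [x for i in range(self.num_partitions) for x in self.compute(i)]
```

Calling `lose(1)` and then `compute(1)` on a derived dataset recomputes only that partition from its parent, which is the recovery idea the table row describes; no per-record log is needed because the transformations are coarse-grained.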
Engine Comparison
| | Hadoop | Tez | Spark | Flink |
|---|---|---|---|---|
| API | mapreduce on k/v pairs | k/v pairs readers/writers | transformation on k/v pair collections | iterative transformation on collections |
| Paradigm | mapreduce | DAG | RDD | Cyclic Dataflows |
| Optimization | none | none | optimization of SQL queries | Optimization in all APIs |
| Execution | batch sorting | batch sorting and partitioning | batch with memory pinning | stream with out-of-core algorithms |

Courtesy of Apache Flink
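
As a concrete reading of the "mapreduce on k/v pairs" API entry for Hadoop, here is a small in-memory word count. It is only a sketch of the programming model, assuming everything fits on one machine; `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical names and none of this is Hadoop's actual API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit one (word, 1) pair per word in the input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all values that share a key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: fold the values for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    # Keys are processed in sorted order, as a stand-in for the sort step.
    return dict(reduce_phase(k, vs) for k, vs in sorted(grouped.items()))
```

Each map or reduce call is small and independent, which is what makes the tasks individually recoverable: a failed task can simply be run again on the same input.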