Flink is currently a very unstable platform. The team has reinstated FlinkML, which is itself unstable, reworked the graph API and introduced the Table API. For now, any stable work really depends on Spark. The Flink team needs to make up its mind about stream processing and about which abstractions it wants to offer across the stack. In fact, the Python API is riddled with bugs. Waiting a while might make the platform more stable, but that depends on the team's goals for the near future. Even the documentation is starting to go slightly pear-shaped. When a core aspect of a platform changes, it is best to fork it into a completely separate project. Instead, this fundamental shift happened in place, and that is what has made Flink so unstable and its documentation so hard to track. Perhaps something better will eventually come along to replace both Spark and Flink and be ready for commercial use. So far, though, Spark seems to be the only real contender in the market. It is slightly unstable in its own right, but it provides enough flexibility without the added frustration.
Showing posts with label spark. Show all posts
11 December 2020
9 April 2018
Deep Learning Pipelines with Spark
- BigDL (CPU optimized)
- Deeplearning4j (JVM)
- Deep Learning Pipelines (integration)
- MLlib Multilayer Perceptron (integration)
- TensorFlowOnSpark (integration)
- TensorFrames (integration)
Labels: artificial intelligence, big data, data science, deep learning, distributed systems, machine learning, spark
4 April 2018
Feature Structure Goals in Spark
Classification & Regression
End Goal:
- A column of type Double holding the label
- A column of type Vector (sparse or dense) holding the features
Collaborative Filtering (Recommendation)
End Goal:
- A column of user IDs
- A column of item IDs
- A column of ratings
Clustering
End Goal:
- A column of type Vector (sparse or dense) holding the features
Graph Processing
End Goal:
- A DataFrame of vertices
- A DataFrame of edges
31 March 2018
22 March 2018
5 March 2018
Beam Capability Matrix
Labels: big data, data science, distributed systems, event-driven, flink, Java, message-driven, python, spark
13 February 2018
Py4J
Labels: big data, data science, distributed systems, Java, machine learning, programming, python, spark
22 April 2017
25 March 2017
5 March 2017
21 February 2017
Data Science & Big Data Salary Surveys
- Big data salary
- Big data salaries top BI data warehousing
- The new tech job paying up to 500K on Wall Street
- Harnham Salary Guide 2015
- 2016 Data Science Salary Survey
- 2016 Data Science Salary Survey
- 2015 Data Science Salary Survey
Labels: big data, Cloud, data science, flink, hadoop, linked data, machine learning, nosql, semantic web, spark
20 February 2017
16 February 2017
8 January 2017
SMACK Stack
S : Scala and Spark (The Engine)
M : Mesos (The Hardware Scheduler)
A : Akka (The Actor Model)
C : Cassandra (The Storage)
K : Kafka (The Message Broker)
- A Brief History of SMACK
- SMACK Hands-On
- SMACK Made Simple
- SMACK Guide
- Why Is the SMACK Stack All the Rage Lately?
- SMACK SlideShare
- SMACK Personalization
Alternatives for Stream Analytics:
- GearPump
- Flink
Labels: akka, big data, cassandra, data science, distributed systems, hadoop, kafka, machine learning, nosql, reactive, scala, spark
29 October 2016
24 September 2016
Spark vs Flink
Labels: big data, data science, distributed systems, event-driven, flink, hadoop, Java, machine learning, scala, spark
17 May 2016
Engine Paradigms & Systems
Paradigm | System | Explanation |
---|---|---|
MapReduce | Hadoop | Small, recoverable tasks; sequential code inside the map and reduce functions |
Dryad/Nephele | Tez | Extends the MapReduce model to a DAG model; backtracking-based recovery |
PACTs | Flink | Embedded query-processing runtime in a DAG engine; extends DAGs to cyclic graphs; incremental construction of graphs |
RDDs | Spark | Functional implementation of Dryad-style recovery (RDDs); restricted to coarse-grained transformations; direct execution of the API |
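The MapReduce row describes the simplest paradigm: small, independent tasks, each running sequential code inside a map or a reduce function. The following plain-Python sketch simulates the three phases (map, shuffle, reduce) of a word count in a single process. It only illustrates the model. It is not Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all values by key (Hadoop does this by sorting map output)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce: combine all values that share one key."""
    return key, sum(values)


def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog"]))
    # {'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

Each map call and each reduce call depends only on its own input. A failed task can therefore be rerun in isolation, which is what makes these tasks "recoverable".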
Engine Comparison
Aspect | Hadoop | Tez | Spark | Flink |
---|---|---|---|---|
API | MapReduce on key/value pairs | key/value pair readers/writers | transformations on key/value pair collections | iterative transformations on collections |
Paradigm | MapReduce | DAG | RDD | Cyclic dataflows |
Optimization | none | none | optimization of SQL queries | optimization in all APIs |
Execution | batch sorting | batch sorting and partitioning | batch with memory pinning | stream with out-of-core algorithms |
Courtesy of Apache Flink
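Spark's RDD paradigm rests on two ideas in the tables. First, datasets are derived from their parents by coarse-grained transformations (lineage), so a lost partition can be recomputed instead of replicated. Second, computed partitions can be pinned in memory. The toy class below models both ideas in plain Python. `ToyRDD` and its methods are hypothetical illustrations, not Spark's API.

```python
class ToyRDD:
    """Hypothetical, in-process model of an RDD: partitions plus lineage (not Spark's API)."""

    def __init__(self, compute, num_partitions, parent=None):
        self._compute = compute              # partition index -> list of records
        self.num_partitions = num_partitions
        self.parent = parent                 # lineage: the dataset this one derives from
        self._cache = None                   # pinned partitions, once cache() is called
        self.computations = 0                # how many partitions were (re)computed

    @classmethod
    def parallelize(cls, data, num_partitions):
        chunks = [list(data[i::num_partitions]) for i in range(num_partitions)]
        return cls(lambda i: list(chunks[i]), num_partitions)

    def map(self, f):
        # Coarse-grained: the same function applies to every record of every partition.
        return ToyRDD(lambda i: [f(x) for x in self.partition(i)],
                      self.num_partitions, parent=self)

    def cache(self):
        # "Memory pinning": keep computed partitions instead of discarding them.
        self._cache = {}
        return self

    def partition(self, i):
        if self._cache is not None and i in self._cache:
            return self._cache[i]
        self.computations += 1
        records = self._compute(i)           # recompute from the parent via lineage
        if self._cache is not None:
            self._cache[i] = records
        return records

    def lose_partition(self, i):
        """Simulate losing a pinned partition, e.g. when an executor dies."""
        if self._cache is not None:
            self._cache.pop(i, None)

    def collect(self):
        return [x for i in range(self.num_partitions) for x in self.partition(i)]


if __name__ == "__main__":
    base = ToyRDD.parallelize(list(range(6)), 2)
    squares = base.map(lambda x: x * x).cache()
    print(squares.collect())
    squares.lose_partition(0)
    print(squares.collect())  # partition 0 is rebuilt from base via the recorded map
```

Recovery here costs one recomputation of the lost partition only. This is the trade-off behind restricting RDDs to coarse-grained transformations: the lineage stays small enough to replay.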