Google Cloud Dataflow Benchmark
dataflow tops spark benchmark test
Google Cloud Dataflow
ApacheFlink-DataFlow (use Beam Runner)
ApacheSpark-DataFlow (use Beam Runner)
Beam vs Spark Comparison
26 August 2016
24 August 2016
NER Projects
Named Entity Recognition (NER) is a form of information extraction that locates named entities in text and classifies them into predefined categories, optionally linking them to entries in a knowledge base (entity linking). Annotation is fundamental to this classification. Quality is usually measured with precision, recall, and F1 score (the harmonic mean of precision and recall). Systems are also often evaluated against a gold standard: a benchmark that is available under reasonable conditions, or the most accurate test possible without restrictions, which is treated as the ground truth. Listed below are a few open source and commercial NER projects. One can also use semantic web technology, in the form of a thesaurus server, to apply SKOS schemes for classifying or annotating terms with embedded URIs. Applications of PoolParty or Apache Stanbol provide further examples.
OpeNER
Other Libraries for custom NER:
OpenNLP
UIMA
CoreNLP
spaCy
NLTK
SyntaxNet & TensorFlow
DL4J
Apache Lucene
KEA
FastText
SpeedRead
Knowledge Population
Benchmarking
NER Survey
Google NLP API
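As a rough illustration of the quality measures mentioned above, the sketch below scores predicted entities against a gold standard using exact matching. The function name `ner_scores` and the `(start, end, label)` tuple layout are assumptions made for this example; they are not taken from any of the listed libraries.

```python
def ner_scores(gold, predicted):
    """Return (precision, recall, f1) for exact-match entity annotations.

    Each entity is a hypothetical (start_token, end_token, label) tuple.
    An entity counts as correct only if span and label both match.
    """
    gold_set = set(gold)
    pred_set = set(predicted)
    true_positives = len(gold_set & pred_set)

    # Precision: share of predicted entities that are correct.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    # Recall: share of gold entities that were found.
    recall = true_positives / len(gold_set) if gold_set else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up annotations for illustration only.
gold = [(0, 2, "PERSON"), (5, 6, "ORG"), (9, 10, "LOC")]
predicted = [(0, 2, "PERSON"), (5, 6, "LOC"), (9, 10, "LOC"), (12, 13, "ORG")]

precision, recall, f1 = ner_scores(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Exact matching is the strictest choice; some evaluations also give credit for partial span overlap, which this sketch does not attempt.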
Labels: data science, linked data, metadata, natural language processing, semantic web, text analytics
20 August 2016
Words and Vectors
Clustering has become an active research area, driven by deep learning techniques that derive vector representations of meaning in Natural Language Processing. Word2Vec is a widely used technique for learning word embeddings, which can then be clustered. Its input is a text corpus and its output is a set of feature vectors for words. Many libraries provide implementations of word embeddings, including Gensim, DL4J, Spark, and others. Several variations on the basic Word2Vec approach exist.
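To make the corpus-in, vectors-out idea a little more concrete, here is a minimal sketch of two pieces around Word2Vec: generating the (center, context) training pairs used by the skip-gram variant, and comparing two word vectors with cosine similarity. It does not train a model. The names `skipgram_pairs` and `cosine_similarity` and the example vectors are assumptions for illustration, not the API of Gensim, DL4J, or Spark.

```python
import math


def skipgram_pairs(tokens, window=2):
    """List (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# A toy corpus; a real one would contain many sentences.
corpus = "the quick brown fox".split()
pairs = skipgram_pairs(corpus, window=1)
print(pairs)

# Hypothetical 2-dimensional vectors, as a trained model might output.
same = cosine_similarity([1.0, 0.0], [1.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same, orthogonal)
```

A trained model adjusts the vectors so that words sharing many contexts end up with a high cosine similarity, which is what makes the vectors usable for clustering.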
16 August 2016
Open Semantic Search
This appears to be a new open source project in semantic search, and the range of features it is aiming for looks quite useful. It is still a very new project with much left to be implemented, but it is worth tracking.
Popular BigData and Machine Learning Libraries
Machine Learning libraries and frameworks are constantly evolving, yet no single tool fits all solutions. As more libraries appear, the range of choices may grow to the point where many are abandoned or refactored towards the cloud in order to meet the data processing requirements of scale-out. Certain libraries already have a massive following in industry; some examples are listed below. Languages like Python, Java, Scala, and C++ are best suited to this kind of work, although languages like Go are not far behind. Most of these libraries track the progress of academic research in the area, which gives an indication of which new approaches can be used now and what may be possible in the future.
TensorFlow
DL4J
DataFlow
Flink
Spark
Theano
scikit-learn
GraphLab
Mahout
SpringXD
14 June 2016
Game Theory
Game Theory is in many respects the bedrock of advertising and algorithmic trading markets. Combine it with Complex Networks and formal Machine Learning and one has a decisive strategy model. Its mathematical models are also applied in Multiagent Systems for studying Argumentation Theory and communication between agents. The movie A Beautiful Mind perhaps made Game Theory and the Nash Equilibrium more mainstream concepts. The field has endless applications, and a few reference sources for further study are provided below.
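For readers new to the Nash Equilibrium, the following sketch finds the pure-strategy equilibria of a two-player game by brute force: a cell is an equilibrium when neither player can do better by changing strategy alone. The function name `pure_nash_equilibria` and the payoff numbers (a standard Prisoner's Dilemma) are chosen for illustration; mixed-strategy equilibria are outside its scope.

```python
def pure_nash_equilibria(row_payoffs, col_payoffs):
    """Return all (row, col) cells where neither player gains by deviating alone.

    row_payoffs[r][c] is the row player's payoff and col_payoffs[r][c] the
    column player's payoff when strategies r and c are played.
    """
    n_rows = len(row_payoffs)
    n_cols = len(row_payoffs[0])
    equilibria = []
    for r in range(n_rows):
        for c in range(n_cols):
            # Row player cannot improve by switching rows while c stays fixed.
            row_best = all(
                row_payoffs[r][c] >= row_payoffs[other][c]
                for other in range(n_rows)
            )
            # Column player cannot improve by switching columns while r stays fixed.
            col_best = all(
                col_payoffs[r][c] >= col_payoffs[r][other]
                for other in range(n_cols)
            )
            if row_best and col_best:
                equilibria.append((r, c))
    return equilibria


# Prisoner's Dilemma: strategy 0 = cooperate, 1 = defect.
# Payoffs are negative years in prison (illustrative numbers).
row_payoffs = [[-1, -3],
               [0, -2]]
col_payoffs = [[-1, 0],
               [-3, -2]]

equilibria = pure_nash_equilibria(row_payoffs, col_payoffs)
print(equilibria)  # [(1, 1)]: both players defect
```

Mutual defection is the only equilibrium here even though mutual cooperation would leave both players better off, which is the tension that makes the game a classic teaching example.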
Labels: big data, contextual ads, data science, ecommerce, economics, finance, intelligent web, machine learning, multiagents, society
Complex Networks
The rising scale of data and the need for information gain have created a greater need to understand patterns and form knowledgeable insights. In many cases, such patterns can be derived through machine learning and data mining, but also by studying the complex networks that form within contextual data. The links below provide useful sources of study in the science of complex networks.
Labels: big data, data science, infuencegraph, intelligent web, machine learning, nosql, social media, social networks, socialgraph
12 June 2016
Daily Free Packt EBooks
Labels: big data, data science, Java, nodejs, nosql, programming, python, scala, software engineering