Big Data has grown by leaps and bounds, both in distributed systems and in machine learning. The following links provide a useful curated and categorized list of Big Data frameworks, libraries, resources, and related technologies. No doubt the list will change, as the domain has proven to be very dynamic.
15 July 2015
16 May 2015
London Tech Accelerators and Incubators
Tech accelerators and incubators can boost a startup's investment prospects and also provide shared networking opportunities, on-premise support, and help with business transformation. The following are some interesting accelerators and incubators in London.
Entrepreneur First (*Best to avoid this one)
3 May 2015
Common Crawl
Common Crawl provides an archived snapshot dataset of the web that can be used for a massive array of applications. The crawl data is published in the standard WARC web archive format, the same format written by the Heritrix archival crawler, which makes it reusable and extensible for open-ended solutions, whether that is building a search engine against years of web page data, extracting specific data from web page documents, or training machine learning algorithms. Common Crawl is also available through the AWS public datasets program and is accessible from Amazon S3. There are plenty of MapReduce examples available in both Python and Java to make it approachable for developers. Having years of data at their disposal saves developers from setting up and running such crawler processes themselves.
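To give a feel for the MapReduce style of processing mentioned above, here is a minimal sketch in plain Python that counts crawled pages per host. It is not taken from the Common Crawl examples: the record parser is a deliberately simplified, hypothetical reader for uncompressed WARC-style records, the helper names (`iter_warc_records`, `map_record`, `reduce_counts`, `make_record`, `count_hosts`) are made up for illustration, and the sample data is built in memory rather than read from S3. A real job would more likely use a dedicated WARC library and a cluster framework.

```python
import io
from collections import Counter
from urllib.parse import urlparse


def iter_warc_records(stream):
    """Yield (headers, payload) pairs from an uncompressed WARC byte stream.

    Simplified assumption: every record is well formed and carries a
    Content-Length header.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line)
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


def map_record(headers, payload):
    """Map step: emit (host, 1) for every response record."""
    if headers.get("WARC-Type") == "response":
        yield urlparse(headers["WARC-Target-URI"]).netloc, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def make_record(warc_type, uri, body):
    """Build one in-memory sample record (illustration only)."""
    body_bytes = body.encode("utf-8")
    head = (
        "WARC/1.0\r\n"
        f"WARC-Type: {warc_type}\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(body_bytes)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return head + body_bytes + b"\r\n\r\n"


def count_hosts(data):
    """Run the map and reduce steps over a block of WARC bytes."""
    records = iter_warc_records(io.BytesIO(data))
    pairs = (pair for headers, payload in records
             for pair in map_record(headers, payload))
    return reduce_counts(pairs)


SAMPLE = b"".join([
    make_record("response", "http://example.com/a", "<html>a</html>"),
    make_record("request", "http://example.com/a", "GET /a"),
    make_record("response", "http://example.com/b", "<html>b</html>"),
    make_record("response", "http://example.org/", "<html>home</html>"),
])

if __name__ == "__main__":
    print(count_hosts(SAMPLE))
```

The map step only looks at one record at a time and the reduce step only sees key and value pairs, which is what lets the same logic be spread across many crawl files in parallel.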
Labels: big data, data science, intelligent web, linked data, nosql, semantic web, text analytics, webcrawler, webscraper
30 April 2015
London Clubs, Hubs, and Co-Workspaces
London has a vibrant and growing entrepreneurial, startup, and freelance culture, and there is a growing number of places where such people go to network and to get work done. The list below provides a glimpse of a few of the places available in London.
TechHub
CentralWorking
Club Workspace
TheCube
One Alfred Place
The Clubhouse
The Trampery
Impact Hub
90 Mainyard
ClubRooms
Co-Work
Le Bureau
TechSpace
Rainmaking Loft
Google Campus
TechStars
Level39
Microsoft Ventures
Wayra
Hoxton Mix
Collider
Whitebear Yard
Biosciences Innovation Centre
The Bakery
TechCity
Small Business Centre
Seedcamp
Oxygen Accelerator
London City Incubator
Innovation Warehouse
Imperial Innovations
IDEA London
Healthbox London
Emerge Education
EF
EdTech Incubator
Eden Ventures
Digital Greenwich
Climate-KIC
Box UK
Bethnal Green Ventures
Beta Foundry
Bestport Ventures
BBC Worldwide Labs
Bathtub 2 Boardroom
Accelerator Academy
Wework
26 March 2015
Deep Learning for Java
Deep learning has become the next big thing in the realization of Artificial Intelligence. However, many libraries and frameworks are still very much experimental and intended for research purposes. For deep learning to be a viable option in realistic business applications, it has to scale over Big Data. Cloud environments and massive parallelization have made it possible to meet such scalability requirements for Machine Learning. DL4J is an open source library that is much needed in the Deep Learning community. It offers an interesting option with an array of developer-friendly neural network implementations. Whatever the domain requirements of a business, DL4J provides a viable and accessible route to delivering a working, production-ready implementation.
Deep Learning: A Practitioner's Approach
Fundamentals of Deep Learning: Designing Next-Generation Machine Intelligence Algorithms
24 March 2015
Natural Language Processing
Natural Language Processing has come a long way from the earlier era of rule-driven approaches to making greater use of Machine Learning techniques, paving the way to even more advanced hybrid methods. The area is also quite diverse and constantly growing, with active research in the community. Natural Language Processing also appears as an applied discipline in almost all web and document extraction problems. However, there is still room for more scalable libraries and frameworks, as the existing ones tend to emerge mainly from research and at times come with restrictive licenses. Natural Language Processing applications are usually designed as a pipeline architecture. They can also draw on rich domain semantics from Linked Data ontologies, vocabularies, thesauri, or even commonsense knowledge bases. Increasingly, they also use deep learning methods. There are formal frameworks supported by industry collaborations, such as UIMA, for building entire pipelines, as well as frameworks like GATE that provide a variety of pluggable libraries for different domain cases and tasks in the pipeline. The following are some interesting libraries that could be applied in Natural Language Processing applications.
solr | java |
elasticsearch | java |
coreNLP | java |
gate | java |
jgibbLDA | java |
kea/maui | java |
lingPipe | java |
minorthird | java |
openNLP | java |
sphinx | java |
spotlight | java |
weka | java |
tika | java |
carrot2 | java |
UIMA | java |
wordfreak | java |
cleartk | java |
dkpro | java |
balie | java |
simpleNLG | java |
openCCG | java |
glossary | javascript |
lingo | javascript |
natural | javascript |
nlp-node | javascript |
pos-js | javascript |
reds | javascript |
tfidf | javascript |
conceptnet | python |
gensim | python |
nltk | python |
pattern | python |
textblob | python |
rake | python |
spacy | python |
scalaNLP | scala |
linkgrammar | c |
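The pipeline architecture mentioned above can be sketched in a few lines of Python. This is only a toy illustration using the standard library, not the API of any of the libraries listed: the stage names (`tokenize`, `remove_stopwords`, `count_terms`), the tiny stopword set, and the `run_pipeline` helper are all assumptions made for the example.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real pipelines use much larger ones.
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "is", "in", "to"})


def tokenize(text):
    """Stage 1: lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stopwords(tokens):
    """Stage 2: drop very common words that carry little meaning."""
    return [token for token in tokens if token not in STOPWORDS]


def count_terms(tokens):
    """Stage 3: count how often each remaining term occurs."""
    return Counter(tokens)


def run_pipeline(text, stages):
    """Feed the output of each stage into the next one, in order."""
    result = text
    for stage in stages:
        result = stage(result)
    return result


PIPELINE = [tokenize, remove_stopwords, count_terms]

if __name__ == "__main__":
    print(run_pipeline("The cat and the hat. The cat is in the hat!", PIPELINE))
```

Because every stage is just a function from one representation to the next, a stage can be swapped for a library-backed one (a proper tokenizer, a part-of-speech tagger, an entity recognizer) without touching the rest of the pipeline.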
Labels: big data, information retrieval, intelligent web, Java, JavaScript, linked data, natural language processing, python, scala, text analytics
16 March 2015
Mind Mapping
Mind mapping and brainstorming tools come in handy for visually working out relationships between ideas and concepts. They often help clarify our thoughts and steer them towards more plausible and realistic outcomes. Mind mapping tools are also useful to information architects for structuring information flows around the concepts of a domain. Brainstorming exercises are often the best way of working out all the corner cases of a knowledge representation over data. The whole process can also be a way of discovering newly connected ideas and of storyboarding before formalizing them into an implementation strategy. Below are links to a few mind mapping tools.
Labels: big data, data science, information retrieval, intelligent web, linked data, semantic web, visualization
4 March 2015
Online CI Providers
Hosted Continuous Integration is a hot area but also a very competitive one. While some teams choose to have it hosted in the cloud, others prefer more corporate autonomy and run tools such as Jenkins and TeamCity themselves. Continuous Integration is an agile workflow practice in which developers integrate their code in shared repositories and use automated tests to verify build quality, so that teams can check for issues early and often, on a daily basis. A step beyond Continuous Integration is Continuous Delivery, which is the hardest part to fully achieve on a large, complex architecture and may even prove to be foolhardy. Although CI has been around for years, the choice really boils down to team dynamics and whether one has the time to manually set up and monitor builds rather than use a hosted option. In some corporate environments, teams may even have a dedicated team member for build and configuration management. The following is a list of a few hosted Continuous Integration providers that cover different use cases in an agile software engineering process.
TravisCI
DroneIO
CircleCI
BuildHive/CloudBees
SemaphoreApp
ShiningPanda
Hosted-CI
Bamboo
CodeShipIO
MagnumCI
SnapCI
SolanoLabs
ShipIO
Shippable
Wercker
Appveyor
ZeroCI
dployIO
Comparison of continuous integration software
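At its core, what any of these services automate is simple: run a fixed sequence of build and test steps on every integration and stop at the first failure. The following is a minimal sketch of that idea in Python, not the configuration format or API of any provider listed above. The `run_build` function and the `STEPS` list are hypothetical, and the steps are trivial stand-in commands run with the current Python interpreter.

```python
import subprocess
import sys


def run_build(steps):
    """Run each (name, command) step in order and stop at the first failure.

    Returns a list of (name, passed) tuples for the steps that were run.
    """
    results = []
    for name, command in steps:
        completed = subprocess.run(command, capture_output=True, text=True)
        passed = completed.returncode == 0
        results.append((name, passed))
        if not passed:
            break  # fail fast: later steps depend on earlier ones
    return results


# Stand-in steps; a real build would call a compiler, a test runner, etc.
STEPS = [
    ("compile", [sys.executable, "-c", "print('compiled')"]),
    ("unit tests", [sys.executable, "-c", "assert 1 + 1 == 2"]),
]

if __name__ == "__main__":
    for name, passed in run_build(STEPS):
        print(name, "passed" if passed else "FAILED")
```

A hosted provider adds the parts that are tedious to maintain by hand: triggering the run on every push, provisioning clean build machines, and reporting the result back to the team.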
Labels: agile, Cloud, CSS, Go, groovy, html, Java, JavaScript, mobile, nodejs, programming, python, scala, scrum, software engineering, webservices