26 March 2015

Deep Learning for Java

Deep learning has become the next big thing in the realization of Artificial Intelligence. However, many libraries and frameworks are still largely experimental and intended for research. For deep learning to be a viable option in real business applications, it has to scale over Big Data. Cloud environments and massive parallelization have made that kind of scalability possible for Machine Learning. DL4J is an open source library that the Deep Learning community has much needed. It offers an interesting option with an array of developer-friendly neural network implementations. Whatever the domain requirements of a business, DL4J provides a viable and accessible route towards a working, production-ready implementation.

Deep Learning: A Practitioner's Approach
Fundamentals of Deep Learning: Designing Next-Generation Machine Intelligence Algorithms

24 March 2015

Natural Language Processing

Natural Language Processing has come a long way from the rules-driven approaches of past eras to the wider use of Machine Learning techniques, paving the way for even more advanced hybrid methods. The area is diverse and constantly growing, with active research in the community. Natural Language Processing is also an applied discipline for almost all web and document extraction problems. However, there is still room for more scalable libraries and frameworks, since most of them emerge from research and some carry restrictive licenses. Natural Language Processing applications are usually designed as a pipeline architecture. They can draw on rich domain semantics from Linked Data ontologies, vocabularies, thesauri, or even commonsense knowledge bases, and increasingly they use deep learning methods. There are also formal frameworks backed by industry collaborations, such as UIMA, for building entire pipelines, and frameworks like Gate that provide a variety of pluggable libraries for different domain cases and tasks in the pipeline. The following are some interesting libraries that could be applied in Natural Language Processing applications.

solr (java)
elasticsearch (java)
coreNLP (java)
gate (java)
jgibbLDA (java)
kea/maui (java)
lingPipe (java)
minorthird (java)
openNLP (java)
sphinx (java)
spotlight (java)
weka (java)
tika (java)
carrot2 (java)
UIMA (java)
wordfreak (java)
cleartk (java)
dkpro (java)
balie (java)
simpleNLG (java)
openCCG (java)
glossary (javascript)
lingo (javascript)
natural (javascript)
nlp-node (javascript)
pos-js (javascript)
reds (javascript)
tfidf (javascript)
conceptnet (python)
gensim (python)
nltk (python)
pattern (python)
textblob (python)
rake (python)
spacy (python)
scalaNLP (scala)
linkgrammar (c)
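
To make the pipeline architecture mentioned above a little more concrete, here is a minimal sketch in plain Python. It is not taken from UIMA, Gate, or any library in the list; the stage names, the shared document dictionary, and the tiny stop-word list are all assumptions made for illustration. Each stage reads the annotations left by earlier stages and adds its own.

```python
import re
from collections import Counter
from typing import Callable, Dict, List

# A stage takes the shared document dict and returns it with new annotations.
Stage = Callable[[Dict], Dict]

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "of", "and", "is", "in", "to"}


def tokenize(doc: Dict) -> Dict:
    # Lowercase the text and keep runs of letters, digits, or apostrophes.
    doc["tokens"] = re.findall(r"[a-z0-9']+", doc["text"].lower())
    return doc


def remove_stop_words(doc: Dict) -> Dict:
    # Filter the tokens produced by the previous stage.
    doc["tokens"] = [t for t in doc["tokens"] if t not in STOP_WORDS]
    return doc


def count_terms(doc: Dict) -> Dict:
    # Add a term-frequency annotation alongside the tokens.
    doc["term_counts"] = Counter(doc["tokens"])
    return doc


def run_pipeline(text: str, stages: List[Stage]) -> Dict:
    # Pass one document through every stage, in order.
    doc: Dict = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline(
    "The pipeline is a chain of stages, and the stages share a document.",
    [tokenize, remove_stop_words, count_terms],
)
print(result["tokens"])
print(result["term_counts"].most_common(1))
```

Because every stage has the same signature, a stage can be swapped for a call into one of the libraries above (a real tokenizer or tagger, say) without touching the rest of the chain.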

16 March 2015

Mind Mapping

Mind mapping and brainstorming tools come in handy for visually working out relationships between ideas and concepts. They often help steer our thoughts towards more plausible and realistic outcomes. Mind mapping tools are also useful to information architects for structuring information flows around the concepts of a domain. Brainstorming exercises are often the best way to work out all the corner cases of a knowledge representation over data. The whole process can also be a way of discovering new connected ideas and storyboarding them before formalizing an implementation strategy. Below are links to a few mind mapping tools.

4 March 2015

Online CI Providers

Hosted Continuous Integration is a hot area but also a very competitive domain. While some choose to have it hosted in the cloud, others prefer more corporate autonomy and run tools such as Jenkins and TeamCity themselves. Continuous Integration is an agile workflow practice in which developers integrate their code in shared repositories and rely on automated tests to verify build quality, so that teams can check for issues early and often, on a daily basis. A step further in the Continuous Integration process is Continuous Delivery, which is the hardest part to fully achieve on a large, complex architecture and may even prove foolhardy. Although CI has been around for years, the choice really boils down to team dynamics and whether one has the time to manually set up and monitor builds rather than use a hosted option. In some corporate environments, teams may even have a dedicated team member for build and configuration management. The following is a list of a few hosted Continuous Integration providers and the different use cases they cover in an agile software engineering process.

TravisCI
DroneIO
CircleCI
BuildHive/CloudBees
SemaphoreApp
ShiningPanda
Hosted-CI
Bamboo
CodeShipIO
MagnumCI
SnapCI
SolanoLabs
ShipIO
Shippable
Wercker
Appveyor
ZeroCI
dployIO

Comparison of continuous integration software

28 February 2015

Alternatives To OpenRefine

OpenRefine, which used to be a Google project, has become an almost irreplaceable tool for data cleansing and transformation, an activity generally known as data wrangling. One can clean messy data, transform data into various normalized or denormalized forms, parse data from websites, merge data from different sources, and reconcile it with Freebase (now discontinued, with work continuing on Wikidata). However, the tool does have its quirks and limitations. Quite a few alternatives are available, most of which stem from research and end up as commercial products in their own right. Unfortunately, other open source options remain experimental and are then slowly withdrawn from public use. A few interesting free alternatives are listed below.

DataWrangler (commercialized into Trifacta)
Karma
Potluck
Exhibit
FusionTables
Many Eyes (discontinued)
DataCleaner

School of Data Online Resources
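
As a small illustration of the kind of cleanup described above, the sketch below groups values that differ only in case, punctuation, accents, or word order. It is written in the spirit of key-collision clustering as offered by tools such as OpenRefine, but it is not OpenRefine's own code; the function names and the sample values are assumptions made for this example.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List


def fingerprint(value: str) -> str:
    # Decompose accented characters and drop the non-ASCII combining marks.
    ascii_value = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Trim, lowercase, and strip punctuation.
    cleaned = re.sub(r"[^\w\s]", "", ascii_value.strip().lower())
    # Sort and de-duplicate tokens so word order no longer matters.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values: Iterable[str]) -> List[List[str]]:
    # Group values sharing a fingerprint; keep groups with more than one member.
    groups: Dict[str, List[str]] = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical messy column values.
messy = [
    "Acme Inc.",
    "acme inc",
    "Inc. ACME",
    "Globex Corp",
    "Café Müller",
    "cafe muller",
]
clusters = cluster(messy)
for group in clusters:
    print(group)
```

Each printed group is a candidate for merging into one canonical value. A person still has to confirm the merge, since values with the same fingerprint are not always the same entity.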

Alternatives To Zookeeper

Over the years, Zookeeper has become the basis for many open source distributed service projects. When choosing location and coordination services, the important thing is to understand the discovery architecture as well as the operational requirements. In general, the key concerns are load balancing, monitoring, integration, runtime dependencies, and availability needs. As the number of disparate services grows for scalability, it becomes paramount to have dynamic service registries and discovery that track their changing locations and deployments in order to minimize failure and interruption. In many respects, Zookeeper can be viewed as a relatively old implementation that provides few out-of-the-box service discovery options compared to newer alternatives. Consul, for example, goes somewhat further than Zookeeper in certain features and capabilities. There are also other interesting options like Eureka, Etcd, and Serf. The intention behind many dynamic service registries is to resolve the downsides of using standard DNS to find nodes in highly dynamic environments. The following list provides some alternatives to Zookeeper, covering both general-purpose and single-purpose registries and coordination services.

Consul
Doozer
Etcd
Serf
Accord
OpenReplica
Eureka
SmartStack
NSQ
SkyDNS
SpotifyDNS
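
The idea of a dynamic service registry described above can be sketched in a few lines of Python. This is an in-memory toy, not how Zookeeper, Consul, Etcd, or any other tool in the list works internally. The class name, method names, addresses, and the TTL-based expiry rule are all assumptions made for illustration. Instances register themselves, renew with heartbeats, and drop out of lookups once they fall silent.

```python
import time
from typing import Callable, Dict, List, Tuple


class ServiceRegistry:
    """Toy registry: an instance stays listed while its heartbeat is fresh."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be demonstrated
        # (service name, address) -> time of the last heartbeat
        self._last_seen: Dict[Tuple[str, str], float] = {}

    def register(self, service: str, address: str) -> None:
        # Record the instance with the current time as its first heartbeat.
        self._last_seen[(service, address)] = self._clock()

    def heartbeat(self, service: str, address: str) -> None:
        # Renewing is the same as registering again.
        self.register(service, address)

    def lookup(self, service: str) -> List[str]:
        # Discard instances whose last heartbeat is older than the TTL.
        now = self._clock()
        self._last_seen = {
            key: seen for key, seen in self._last_seen.items()
            if now - seen <= self._ttl
        }
        return sorted(addr for (name, addr) in self._last_seen
                      if name == service)


# Usage with a fake clock so the expiry is deterministic.
now = [0.0]
registry = ServiceRegistry(ttl_seconds=10, clock=lambda: now[0])
registry.register("search", "10.0.0.1:8080")
registry.register("search", "10.0.0.2:8080")

now[0] = 8.0
registry.heartbeat("search", "10.0.0.2:8080")  # only the second node renews

now[0] = 15.0
live = registry.lookup("search")
print(live)
```

In this sketch the node that stopped sending heartbeats disappears from lookups without any manual cleanup, which is the property that makes such registries attractive compared to static DNS records.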

JavaScript Map Libraries

The JavaScript ecosystem provides an abundance of options for mapping and for building complete GIS applications. There is a huge array of libraries, plugins, and APIs to choose from to harness, process, and customize the visualization of data. GIS is a hot domain that is advancing at a fast pace, especially as public service initiatives unlock key data for developers to explore and use in creative applications. Although not exhaustive, the list below provides some interesting JavaScript mapping tools.

Leaflet.js
OpenLayers
MapBox
GoogleMaps
ModestMaps
PolyMaps
D3.js
DataMaps
Raphaeljs
jVectorMap
JQVMap
GeoChart
NokiaMaps
MapQuest Maps
Bing Maps
AmMap
GeoPrism
GeoExt
Mapstraction
gMaps
Kartograph
ArcGIS
ViaMichelin
Geo5
Cesium
WebGL Globe
OSMbuildings
Stately
Clickable Maps
jHere
Jump
jQueryGEO
jQuery Mapael
jQuery Birdseye
MaPlace
jMapping
GoMap
GMap3
MapEscape
GeoComplete
MapMarker
QuikMaps
CrowdMaps
AniMap
MapBuilder
Click2Map
ZeeMaps
deCarta
Cloudmade
ESRI
UMapper
ClustrMaps
ReadyMaps
TimeMap
Cartographer
Processing

A simple map making tutorial