9 September 2014

API Design

API design has evolved toward formal data model approaches. These define an API in Markdown or JSON, describing both its interfaces and its models. A formal definition facilitates communication, provides a way to mock API designs and test a specification, and yields maintainable documentation. In a semantic web scenario, JSON-LD may even provide a more formal approach for understanding linked resources in a flexible manner. A few popular API design approaches are listed below.
  • API Blueprint - life-cycle documentation of APIs with plenty of tooling
  • Swagger - a specification framework for producing, describing, consuming, and visualizing services with active documentation
  • RAML - a modeling language for describing RESTful APIs in a formal way for reuse, discovery, and sharing, while utilizing extensible best practices
  • HAL - an easy way to hyperlink between resources in APIs using hypermedia
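
As a rough illustration of the hypermedia idea behind HAL, the sketch below builds a resource whose relations live under the reserved `_links` key, which is how HAL documents expose hyperlinks. The helper names (`hal_resource`, `follow`), the order resource, and all URLs are hypothetical and exist only for this example; this is not an official HAL library.

```python
import json


def hal_resource(self_href, properties, links=None, embedded=None):
    """Build a HAL-style resource as a plain dict (hypothetical helper)."""
    resource = dict(properties)
    # HAL keeps hyperlinks under the reserved "_links" key, keyed by relation.
    resource["_links"] = {"self": {"href": self_href}}
    for rel, href in (links or {}).items():
        resource["_links"][rel] = {"href": href}
    # Nested resources, if any, go under the reserved "_embedded" key.
    if embedded:
        resource["_embedded"] = embedded
    return resource


def follow(resource, rel):
    """Return the href a client would follow for a given link relation."""
    return resource["_links"][rel]["href"]


# Hypothetical order resource; the URLs are made up for illustration.
order = hal_resource(
    "/orders/123",
    {"total": 30.0, "currency": "USD", "status": "shipped"},
    links={"customer": "/customers/7", "next": "/orders/124"},
)

print(json.dumps(order, indent=2))
print(follow(order, "customer"))
```

A client written against such a document navigates by relation name (`customer`, `next`) rather than by hard-coded URL, which is the main appeal of the hypermedia style.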

7 September 2014

Big Data Graph Processing

The web, with its many hyperlinked documents, is a massive graph of interlinks. Processing such links brings big data complexities. Graph processing is essential to many use cases, from contextual ads to social network analysis to linked data. Processing such graphs at scale remains a challenge, whatever form the data takes. However, standard graph theory and network science have provided many advances for Big Data, and functional programming approaches have facilitated more robust solutions. OLTP is about low-latency workloads that access small portions of a graph. OLAP is about batch workloads that access large portions of a graph. A graph can be stored in a dedicated graph database or in a column store such as Accumulo or Cassandra; it can even be stored on HDFS. Real-time processing of graphs is also a challenge. In general, standard NoSQL stores can cope with limited lookups and a small number of traversals at scale. Complex traversals over the Web of Data require alternative, and even combined, approaches for scalable, distributed batch processing. Some framework options for big data graph processing are listed below.

  • Giraph
  • Cassovary
  • Drill
  • Impala
  • JUNG
  • SNAP
  • Shark
  • Hama
  • GraphX
  • Titan / Faunus
  • GraphLab / GraphChi
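
To make the OLTP versus OLAP distinction in this post more concrete, here is a minimal single-machine sketch in plain Python. The toy graph and the function names are assumptions made for illustration, and none of the frameworks listed above is used. The bounded traversal touches only a small neighborhood (OLTP-style), while the PageRank loop scans every edge on every iteration (OLAP-style).

```python
from collections import deque

# Toy directed graph as an adjacency list; the node names are invented.
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}


def neighbors_within(graph, start, max_hops):
    """OLTP-style lookup: bounded breadth-first traversal from one node."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    seen.discard(start)
    return seen


def pagerank(graph, damping=0.85, iterations=50):
    """OLAP-style batch job: each iteration visits every node and edge."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node without out-links spreads its rank evenly.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


print(neighbors_within(graph, "d", 2))
ranks = pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))
```

The first function stays cheap however large the graph grows, as long as the neighborhood is small. The second has to see the whole graph repeatedly, which is the kind of workload the distributed batch frameworks are aimed at.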

6 September 2014

RethinkDB

RethinkDB is an alternative distributed document store that is slowly emerging in the mainstream NoSQL environment. However, it is still very much in its infancy and not yet viable as a stable database for production use. The general trend in NoSQL is towards an amalgamation of features and tools, big data integration, and management simplicity for scalability requirements. Restrictive licenses also often pose a hurdle for businesses that are looking to scale out of a relational database schema. The door into NoSQL implementation seems to be getting wider and wider, with a growing plethora of options, features, and language bindings. But with such a vast range of options, selecting the right database becomes all the more critical. RethinkDB is an attempt to bind all the good things from Cassandra and CouchDB into one database implementation. It is questionable what one really classes as the positives of CouchDB. One reason Couchbase grew out of CouchDB was to harness a more stable approach built on memcached. More often than not, businesses will opt for MongoDB over CouchDB. In time, RethinkDB could offer an alternative to MongoDB. RethinkDB still appears to need a lot, particularly native Java support, before it can be endorsed for production grade deployments. More examples of production deployments would also provide more community and industry driven insights. RethinkDB, as the name suggests, is a rethink of the document store design philosophy of today and of what it can be in the future, without compromising on the good parts.

  • A comparison of MongoDB and RethinkDB with patent data
  • Comparing MongoDB and RethinkDB bulk insert performance
  • RethinkDB vs MongoDB
  • RethinkDB, a qualitative review
  • FindTheBest

1 September 2014

Rubbish And Senseless People

Some people have strange ways. Why is it that, when there is literally a bin close by, some feel the need to leave their rubbish right next to it rather than putting it in the bin? It is alarming how some people display their lack of sense. Even in public, people litter on the ground when there are plenty of litter bins around. One would imagine that in a western society people would have more sense. And yet in residential accommodation there is always someone who takes their rubbish and leaves it on the ground right next to the bin, as if to invite rodents. On other occasions we find a total lack of sense in the types of things people throw away in rubbish bags. One can find the odd student in the city disposing of ice in a rubbish bag, and it is at times like those that one feels education truly does not work on some individuals. Perhaps it is time that we started enforcing stricter penalties for littering in public. There are plenty of public cameras about. Why not do it like Singapore and raise the bar with a penalty fine for public littering? On weekends one can find empty bottles everywhere. Is it any wonder that rodents love being around people and living in cities, when there is just so much rubbish? Even the collection of rubbish is so inefficient that collection day is usually when almost everyone displays their immense refuse outside their house. Maybe councils and districts should demand that houses start having their own disposal methods. It is a strange effect that towns breed close-knit neighborhoods while cities breed distance and individualism among people. There needs to be better waste management in our communities that gives us a way to reuse our refuse for fuel. There is also so much rubbish that could be given to shelters and charities, where people are struggling to make ends meet and do not even have the luxury of throwing things away.
Cities oftentimes breed selfish, uncaring human instincts, which is a direct result of capitalist ideals. Science is failing to turn waste into a renewable resource.

30 August 2014

When Not To Use Hadoop

Hadoop has become a necessity for almost all analytical applications that have huge data processing requirements. It also offers open source flexibility as well as a range of subprojects to facilitate the processing, ingestion, and downstreaming of inputs and outputs. However, Hadoop is not appropriate for all business applications. A first litmus test when deciding whether to use Hadoop should be to answer a few specific questions about the loading and processing of data. If one can load the data into a standard database without much trouble, then Hadoop is surely not the way to go. Is a dataset of a few hundred MB a business case for Hadoop? What about a few hundred GB? Hadoop is also not a replacement for standard databases. In general, Hadoop has problems dealing with small files, so a large number of small files will be suboptimal compared to a large number of large files. This is primarily why the platform works off a MapReduce approach and why the underlying layer is HDFS: standard approaches are just unable to handle such large data processing efficiently, albeit at a cost. Working with XML/RDF data will also pose problems and require pre-processing to deserialize it into other processing formats such as SequenceFiles, Avro, Protocol Buffers, and Thrift. Hadoop is also not appropriate for direct real-time processing needs, although stream processing has become available. It is most appropriate as a flexible data warehouse where generally static data is stored for analysis, rather than a rapidly changing dataset. It is useful for merging and unlocking large amounts of corporate and even web data from various data sources, and for providing analytical processing for useful insights and filtering to other systems. Hadoop in the cloud can save much headache for operations management.
However, it still requires a careful strategy for managing an appropriate cluster and for capacity planning around namenodes. Otherwise, costs in the cloud can get out of hand very quickly due to the high computational requirements of Big Data. The term Big Data also needs some clarity. Datasets in the order of terabytes and petabytes at web scale are aptly classed as Big Data: not only is one working with unstructured data, but the data is also so large that it could not sensibly fit into a standard data architecture for continuous processing. Here Hadoop could work wonderfully with HBase as a storage layer for the unstructured data, filtering more structured data downstream to other, more appropriate systems. Increasingly, NoSQL approaches have also started to provide their own equivalent support for MapReduce. For example, MongoDB provides MapReduce functionality and, with its varying use cases, is also widely used for real-time advertising. However, MapReduce on MongoDB may not compare to the level of processing that can be done on Hadoop at scale. One obviously needs to understand first their data, and second what they plan to do with it. The links below provide further views on why Hadoop may not be the right approach for solving particular business problems.
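
For readers unfamiliar with the MapReduce model this post keeps referring to, the following is a minimal single-process sketch in plain Python. It is not the Hadoop API; the function names and the sample input are assumptions made for illustration. It only shows the three conceptual steps: map each record to key/value pairs, group the pairs by key, and reduce each group.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Invented sample input standing in for lines of a large file.
lines = ["big data is big", "hadoop processes big data"]
counts = run_job(lines)
print(counts)
```

On a dataset this small the whole job fits in one process, which echoes the litmus test above: the distributed machinery only pays for itself once the input is too large for a standard approach.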

Mule In Perspective

Service Oriented Architectures are a big step towards the integration of disparate systems. Over time, however, Web Services have branched out from SOAP to REST. Many integration approaches have also emerged, from components to mediators to a full enterprise service bus. Almost every software engineering area has a significant set of design patterns with which to approach large scale solutions. Mule has over the years become a strong contender in the enterprise service bus area. It provides a very open and holistic approach to integration, facilitated by connectors as well as a visual flow mechanism. However, the platform has many quirks and drawbacks that leave one wondering whether quality assurance was compromised for the sake of releases. The visual flow mechanism is also buggy and limiting for a developer who may want to use XML directly to gain flexibility. Even their training course instructors admit to significant buggy areas of the platform, especially within Mule Studio. One really has to get one's head around the whole idea of visual flows and how to wire them in the most optimized and efficient way. Using Mule will most likely also lead to vendor lock-in, as well as complexities when upgrading versions, where backward compatibility of flow components remains questionable. These days one rarely needs such a heavyweight enterprise service bus within an enterprise architecture. Often, using mediators and the like can be sufficient. Loose coupling is paramount for service oriented delivery of business applications. However, with Mule one could question whether loose coupling comes at the cost of excessive XML and rigid methods of implementation. These days even integration services provide multiple forms of functionality, up to full Big Data support for ETL.
Although Mule does support batch processing, one could argue that such implementations should really be kept separate from the ESB. Alternatives that can provide a more flexible option for integration include Camel, even if strictly speaking the two cater to different functional domains. Utilizing Mule in new projects and within large teams could require an investment in time. But one is always left wondering whether using such a technology is just over-engineering a problem that could be better solved through more loosely coupled approaches and a wide range of open source libraries.

23 August 2014

Cheerleaders

What is the point of a cheerleader? Well, essentially, as the title says, they are supposed to lead the crowd into a cheer for their team during a sporting game. However, cheerleading has turned into an almost gratuitous and sexualized activity, as well as a pretty much sexist affair, at certain sporting events. One would wonder whether, in a modern society where women are looking for equal rights, they should really be taking on such professions to begin with. One would also wonder why male cheerleaders get frowned upon and are quite uncommon as a result. The same can be asked of why so many women choose to go into such professions only to later lay claim to feminist ideals of equality. Cheerleading is not a high paying profession, so why do so many women find it interesting compared to modelling, where they could command comparatively higher pay? Are they just looking to be discovered? Is cheerleading a way for them to head into more seedy professions? Are there no real professions available for women in our society? Are such women just craving attention and popularity? Or can this be seen more as an animal instinct, where women try to attract the most able of men? It appears to be about equality when it suits them. Should men still be expected to hold doors for women, as gentlemen were expected to do in the past? Should women be expected more and more to look after themselves? We still find the gold digger analogy, where women have no real ambitions other than to find a wealthy man who can provide for them. In what way does this describe equality for women? Perhaps such ideals of some women taint the bigger picture of what most women actually want out of society. Obviously, it would be unfair to generalize. It is an undeniable fact that cheerleading makes sporting events interesting and entertaining. Models in adverts are also often used to entice consumers.
Models are also used in fashion to showcase new designs. Many do feel that the female body is an art form that should be celebrated. However, where does one draw the line between what is equality and what is hypocrisy?

Semantic Pricing

For many businesses it is critical to sell their products and services at an accurate price. Price also gives them a measure of profitability and growth, as well as an indicator of how well pricing balances supply and demand. One needs to understand competitors in the market, measure consumer demand, and then calculate the optimal price. As a result, companies often use complicated pricing analytics, as consumer markets can change on a daily basis. Ecommerce is a major mover in pricing analytics, and there is plenty of specialized software that provides such services for decision makers. However, it seems one could benefit even more from semantic pricing of goods and services in markets. Furthermore, the Semantic Web with Linked Data could provide a more connected form of real-time pricing that can impact the business in a positive way on a daily basis. Semantics add more context, which is often needed for business strategy and forecasting. Semantic pricing could also come into effect within the locales and regions of consumer markets. It can also add more granularity to seasonal and holiday variances, as well as to variations in promotions and deals.
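
One possible way to picture semantic pricing is a product description in JSON-LD using the schema.org vocabulary, where each offer carries its own region and validity window. The sketch below is only an assumption about how such data might look: the product, prices, regions, and dates are invented, and `offers_for` is a hypothetical helper, not part of any pricing system.

```python
import json
from datetime import date

# Hypothetical product described with schema.org terms in JSON-LD form.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example winter jacket",
    "offers": [
        {
            "@type": "Offer",
            "price": "120.00",
            "priceCurrency": "GBP",
            "eligibleRegion": "GB",
            "validFrom": "2014-09-01",
            "validThrough": "2014-11-30",
        },
        {
            # Invented holiday promotion for the same region.
            "@type": "Offer",
            "price": "90.00",
            "priceCurrency": "GBP",
            "eligibleRegion": "GB",
            "validFrom": "2014-12-01",
            "validThrough": "2014-12-31",
        },
        {
            "@type": "Offer",
            "price": "150.00",
            "priceCurrency": "USD",
            "eligibleRegion": "US",
            "validFrom": "2014-09-01",
            "validThrough": "2014-12-31",
        },
    ],
}


def offers_for(product, region, on_day):
    """Return the offers valid in a region on a given day (hypothetical helper)."""
    matches = []
    for offer in product["offers"]:
        start = date.fromisoformat(offer["validFrom"])
        end = date.fromisoformat(offer["validThrough"])
        if offer["eligibleRegion"] == region and start <= on_day <= end:
            matches.append(offer)
    return matches


print(json.dumps(offers_for(product, "GB", date(2014, 12, 25)), indent=2))
```

Because region and validity are part of the data itself, the same description can answer both the regional and the seasonal pricing questions raised above without separate lookup tables.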