30 December 2013

Year 2013

As 2013 draws to a close, we can look back on the many events around the world that made it an eventful year. As they say, it is always best to look on the brighter side of life as we move gradually into 2014. It seems that the more things change, the more they stay the same: life moves towards greater modernity at a very slow pace while time skips ahead. We wonder what new experiences the new year will bring for the steady beat of the heart and the soul-searching mind destined for greatness. Below are a few of the major events that took place in 2013:

China Moon Rover Lands on Moon.(Dec-10)
Nelson Mandela Dies at 95.(Dec-04)
Iran Nuclear Deal.(Nov-22)
Super Typhoon Haiyan Devastates Philippines.(Nov-07)
US government shutdown.(Oct-01)
Kenya Mall Attack.(Sep-21)
Washington DC Navy Yard Shooting.(Sep-17)
Tokyo Japan wins to host 2020 Olympics.(Sep-07)
Microsoft buys Nokia mobile business at $7.2bn.(Sep-02)
NASDAQ in error for three hours during trading time.(Aug-21)
Syria Chemical Attack Allegation.(Aug-21)
Spain train crash.(Jul-25)
Detroit files for bankruptcy.(Jul-18)
Boeing 777 Crashes at San Francisco Airport.(Jul-06)
Egypt Army Ousts President Mursi.(Jul-03)
US NSA Prism Program.(Jun-06)
Baltimore cruise ship fire in Caribbean.(May-27)
London shocked after brutal machete attack.(May-22)
Huge tornado hits Oklahoma.(May-20)
Scientists successfully cloned human stem cells.(May-17)
Yahoo buys Tumblr for $1.1bn.(May-17)
Bangladesh Factory Collapse, Over 700 Dead.(May-05)
Deadly earthquake hits China Sichuan.(Apr-18)
Boston Marathon Blasts.(Apr-16)
Margaret Thatcher Dies at 87.(Apr-07)
The rise of bitcoin.(Mar-20)
Cyprus plans to tax bank deposits.(Mar-17)
Meteorites injured hundreds in Russia.(Feb-14)
Pope Benedict XVI resigned.(Feb-10)
About 230 Dead During Brazil Nightclub Fire.(Jan-27)
Algeria Hostage Crisis.(Jan-16)
Stampede During New Year Fireworks in Ivory Coast.(Jan-01)

27 December 2013

Mobile Frameworks

Mobile applications are a hot commercial market for software developers. However, the limitations of the mobile user interface bring a unique set of complexities and functional constraints. Responsive applications become all the more important on mobile devices, and they affect the feel of the web browser as well. There is a lot of competition in the market, both among mobile devices and among applications. JavaScript and HTML frameworks are likely to play an even bigger role in mobile application development as they move closer to native development. Traditionally, however, C/C++/Objective-C, Java, and perhaps even Erlang have had a strong hold on native applications. One major issue in mobile development is building applications that work seamlessly across multiple devices. In the process, one is forced to give up native access to the platform in certain respects. Android and iOS have been the most popular platforms in the last few years. The following framework options have been quite workable and popular on mobile platforms.


PhoneGap and Cordova are essentially the same: PhoneGap is a distribution of Cordova and uses it under the covers.

Linked Data Platform

The Linked Data Platform (LDP) is a new W3C specification that aims to bring more flexibility and less complexity to linked data. It marks a paradigm shift from heavyweight RDF/SPARQL towards a more flexible and natural approach in which resources are viewed simply in the manner of REST design and implementation. In doing so, it moves closer to the web of data, which essentially aspires to become the global shared graph database on the web, easing the path towards more semantically rich linked documents and navigation. LDP tries to provide a standard set of principles and patterns for interacting with linked data via HTTP verbs. The benefits become apparent in remote web queries for URI-based resources and in traversing the web as one big connected graph of information. Documents then become more meaningful because they can hold more semantic information, in the form of concepts, as metadata describing what they are specifically about, which could be anything from people, places, events, and products to more. The use of JSON-LD also makes semantic data more accessible.
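
To make the idea of interacting with linked data through plain HTTP verbs a little more concrete, here is a minimal Python sketch. It builds a small JSON-LD document and prepares, without sending, a POST request that asks an LDP container to create a resource from it. The container URL, the person, and the helper names are made up for illustration, and the sketch assumes the server accepts JSON-LD as a representation (Turtle is the more commonly supported one).

```python
import json
from urllib.request import Request

# Type URI used in the Link header to say "treat this as an LDP resource".
LDP_RESOURCE = "http://www.w3.org/ns/ldp#Resource"


def person_as_jsonld(name: str, homepage: str) -> dict:
    """Describe a person as a JSON-LD document using schema.org terms."""
    return {
        "@context": "http://schema.org/",
        "@type": "Person",
        "name": name,
        "url": homepage,
    }


def build_create_request(container_url: str, slug: str, doc: dict) -> Request:
    """Build (but do not send) a POST asking a container to create a resource."""
    body = json.dumps(doc).encode("utf-8")
    return Request(
        container_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/ld+json",
            "Slug": slug,  # a hint for the new resource's URI
            "Link": f'<{LDP_RESOURCE}>; rel="type"',
        },
    )


# Hypothetical container and person, for illustration only.
doc = person_as_jsonld("Alice Example", "http://example.org/alice")
request = build_create_request("http://example.org/people/", "alice", doc)
print(request.get_method(), request.full_url)
```

Reading the resource back would be an ordinary GET on the URI the server returns, and removing it an ordinary DELETE, which is what makes the approach feel like everyday REST rather than a separate query layer.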

A few interesting projects coming out include: 
Stanbol - semantic data for content management 
Marmotta - a linked data platform 
Callimachus - a data-driven semantic web framework
Lyo - an Eclipse based SDK for Linked Lifecycle Data
Tabulator - semantic data browser and editor adds LDP support
RWW-Play - Linked Data Profile
Node_ldp - linked data platform for Node.js
MyProfile - WebId based authentication/authorization API
ELDA  - an implementation of the Linked Data API

OpenOrg

26 December 2013

Britain Largest Economy

If Britain were ever to become the largest economy in the EU in the distant future, it would mean that the country's banking industry gains more freedom to do what it wants at the expense of taxpayers. What it really means is that the income divide will widen. The rich will get richer. More corporations will set up outsourcing to save costs, which means a loss of jobs for British people in the long run. There may even be a shortage of housing as more foreigners take ownership of companies and houses. It means more investment into the country at the expense of the taxpayer. If China were ever to overtake America, it could only be at the expense of the many Chinese people who still struggle to live in the country. Being the top economy does not mean great things for everyone. Higher taxes and higher interest rates are inevitable. Inflation will be higher, but salaries will never keep pace with it. The cost of living rises in a top economy, and yet the value and quality of life decrease for many. If America, which is close to a purely capitalist economy, lags behind, then this can only mean that Britain and China will move ahead at the expense of long-term sustainability. However, the rise of the EU and the failure of many regions within it mean that Britain has a chance of surviving. The EU has been good neither for the struggling countries nor for the more stable ones. The EU has also instigated many unnecessary trade restrictions, which do not make economic growth any easier. As the EU struggles, the effects spread into Britain, Asia, and the Americas. Even the emerging economies rely on the EU for much more than just trade. Globalization often means there will be volatility and fluctuations in economic returns from one region to the next. Immigration of unskilled workers is often detrimental to most economies, as it draws on public services and adds little to no productivity to the economy. One of the fastest ways towards sustainable economic growth is consumer spending.
The power of the consumer and taxpayer is key to almost any economy, even more so than businesses. It is ultimately the consumer who drives the market economy and who determines the supply of and demand for the products and services that any business provides. And it is that very cycle that provides for positive job growth. These days the consumer and taxpayer are ignored while policies and bureaucratic decisions are made under the cover of corporate greed and shareholder value to facilitate more control and less progress for the taxpayer. A balance of power between taxpayers, governments, and businesses is ultimately what shifts an economy towards progress for all.

Text Extraction From Image Files

It is possible to extract text from document files, from web pages, and even from PDF files. However, one can run into erratic results when it comes to image-based files. For example, how do you extract floor plan details from an image file? For image files, one really requires OCR (Optical Character Recognition). There are a few libraries available for this very purpose, but as they can often be quite memory intensive, a custom solution is probably best for large-scale developments. There are likely to be more such libraries available for Python than for Java.


Google Transit

Google has built up an extensive amount of mapping and layered data. They have even devised a specification for transit data feeds (GTFS) that allows one to obtain this information. Transit feed data of this kind should really be available in every city of the world to help commuters travel around with easy access to information. London specifically also has a Transport for London feeds service covering a whole range of information. Unfortunately, as there is no real agreed global standard, aggregating transit data becomes more complex across the various cities of the world, each of which seems to have its own approach. The majority of the mapping work on Google Maps is done in JavaScript, but there are also alternatives available such as Leaflet.js, OpenLayers, Yahoo Maps and more. Unfortunately, one major drawback of relying on Google APIs, or for that matter any commercial vendor-specific APIs, for data is that they offer no guarantee of long-term availability, and a service on which one has built a web application might change over time. It would be useful if providers supported versioned APIs, so that backward compatibility could save the many who rely on their services. Travel information can also potentially be enriched semantically using linked data from GeoNames.
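
As a rough illustration of how approachable GTFS data is, here is a minimal Python sketch that reads stop records in the shape of a GTFS stops.txt file and picks out the stops inside a bounding box. A GTFS feed is a set of CSV text files, and stop_id, stop_name, stop_lat and stop_lon are standard stops.txt fields. The sample rows, the helper names, and the bounding box are made up for illustration; a real feed would be read from the files published by the transit agency.

```python
import csv
import io

# Made-up sample in the shape of a GTFS stops.txt file.
SAMPLE_STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Central Station,51.5000,-0.1200
S2,Riverside,51.5100,-0.1000
S3,Airport,51.4700,-0.4500
"""


def load_stops(text: str) -> dict:
    """Parse stops.txt content into {stop_id: (name, lat, lon)}."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["stop_id"]: (
            row["stop_name"],
            float(row["stop_lat"]),
            float(row["stop_lon"]),
        )
        for row in reader
    }


def stops_in_box(stops: dict, min_lat: float, max_lat: float,
                 min_lon: float, max_lon: float) -> list:
    """Return the sorted ids of stops inside a latitude/longitude box."""
    return sorted(
        stop_id
        for stop_id, (_, lat, lon) in stops.items()
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
    )


stops = load_stops(SAMPLE_STOPS_TXT)
print(stops_in_box(stops, 51.49, 51.52, -0.2, 0.0))
```

Because every agency that publishes GTFS uses the same column names, the same few lines work across cities. Feeds in other formats would each need their own parsing step before any aggregation.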

Pair Programming

One of the lamest approaches to come out of agile software engineering is pair programming. It does not work on all occasions and is really a very unperceptive way to be productive in development. Most developers want a sense of autonomy, freedom, personal space, and an exploratory approach to their development work, not to be constrained by process and by someone constantly over their shoulder feeding them nonsense. Pair programming can work when it is flexible within co-located desks. But if two people are physically sitting next to each other, it just becomes one really long and unproductive exercise. "Why do you want to do it this way why not that way" - if a senior developer keeps hearing that throughout their work day, it is bound to drive them insane. Pair programming can work wonders on a complex piece of legacy code where there are virtually no tests and one needs to do a major service migration. In that respect, it means pairing with someone who either knows the legacy code or can help in the process of identifying problematic story cases. It may also help productivity, as one person writes a test case while the other does the implementation or documentation. It may even help when training a junior developer. But, really, pair programming as a daily routine can be a very droning and frustrating process. For developers who like to research and investigate new approaches as they go through their everyday development, pair programming is not a good fit. One cannot test everything before deploying to production. There are always unknowns in a production environment, and any number of things can go wrong. The agile approach is meant to work precisely by making a team agile: quick, light, easy-moving, and nimble. Generally, a mixture of Lean and lightweight Scrum can work in that regard.
It seems that many agile development environments have architects and managers who do not quite trust the abilities of their developers, so they start adding more pairing, excessive code reviews, and processes to make sure everything is checked over before it is delivered for deployment to production. Relying too much on process, or adding far too many process-driven approaches to a development team, defeats the whole objective of agility and pretty much manoeuvres the team back into the waterfall or a seemingly unproductive iterative model. In summary, pair programming is an extreme practice that is a necessity neither for teamwork nor for agile software development. In fact, it is most appropriate for developers who require an excessive amount of hand-holding or for whom it is the only way they actually learn things in their line of work.

Dilbert Jokes on Extreme Programming

25 December 2013

Web Crawling

A web crawler allows one to search and scrape through document URLs based on specific criteria for indexing. It needs to be approached in a netiquette-friendly way that conforms to the robots.txt rules. Scalability can be an issue, and different approaches can be devised for an optimal outcome. An algorithm-driven approach is vital for meeting requirements and might incorporate either an informed or an uninformed search strategy. At times, crawlers even incorporate a combination of both, as well as heuristics. This ultimately implies that, from the algorithmic point of view of a crawler, the web is seen as a graph to be searched, which lends itself well to linked data. Crawls can be conducted in a distributed fashion using a multiagent approach, or by singular agents. Web crawlers can also be used for monitoring website usage and security, and for surfacing analytics that might otherwise be hidden from a webmaster. There are quite a few open source tools and services available for a developer. There is always a period in which testing needs to be done locally to work out the ideal and web-friendly approach. There is no one best solution out there if the needs go beyond what existing libraries can offer. In that case, it really means designing one's own custom search strategy and, perhaps, making it open source to share with the community.
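
The graph-search view of crawling, together with robots.txt compliance, can be sketched in a few lines of Python. The sketch below uses an uninformed strategy (breadth-first search) and the standard library's urllib.robotparser. It is an illustration under stated assumptions, not a production crawler: the site, its link graph, the robots.txt rules, and the example-bot agent name are all made up, and an in-memory dictionary stands in for fetching and parsing real pages so the example can be run locally without touching the network.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical site: page URL -> outgoing links (stands in for fetched HTML).
LINK_GRAPH = {
    "http://example.org/": ["http://example.org/a", "http://example.org/private/x"],
    "http://example.org/a": ["http://example.org/b", "http://example.org/"],
    "http://example.org/b": [],
    "http://example.org/private/x": ["http://example.org/secret"],
}

# Hypothetical robots.txt for the site above.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""


def crawl(seed, robots, fetch_links, agent="example-bot", max_pages=100):
    """Breadth-first crawl from seed, skipping URLs that robots.txt disallows."""
    visited = {seed}
    order = []
    queue = deque([seed])
    while queue and len(order) < max_pages:
        url = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue  # disallowed pages are neither fetched nor expanded
        order.append(url)
        for link in fetch_links(url):
            if link not in visited:
                visited.add(link)
                queue.append(link)
    return order


robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())
pages = crawl("http://example.org/", robots, lambda url: LINK_GRAPH.get(url, []))
print(pages)
```

Swapping the deque for a priority queue ordered by some relevance score would turn this into an informed (best-first) strategy. In a real crawler, fetch_links would download each page and extract its links, ideally with a delay between requests to the same host.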

Python:

Java:

LinkedData:

Services:

Also, HBase appears in general to be a very good back-end for a crawler architecture, and it plays well with Hadoop.

Obviously, there are a lot more options out there, most of which likely come at a premium. The majority of the premium options have not been mentioned here.

high performance distributed web crawler
high performance distributed web crawler survey 
learning and discovering structure in web pages
UbiCrawler: A Scalable Fully Distributed Web Crawler
Searching the Web