19 May 2013

Semantic Web and Linked Data Storage

The Semantic Web often depends almost entirely on an efficient back-end storage and indexing strategy, from which most of the processing stems. Leaving the most valuable aspect of a Semantic Web architecture to the end, as if it were just an interface detail, is a bad move. One should always think through the data layer first. The Semantic Web is like a workflow of services in a pipeline and has to be thought through in that manner, as everything depends on resources and the active querying of those resources. In fact, extending the model with linked data VoID interlinks multiplies the data requirements even further.

There are generally three ways of approaching a back-end for the Semantic Web. The first approach is to treat it as a pure W3C stack in a regular client-server model, with the triplestore as the server and the web interface of services as the client. The second approach is to apply more granularity by using property graphs with the TinkerPop framework. In this manner a whole range of graph properties and NoSQL options emerge. The third approach is to apply a standard relational model and convert that into an RDF repository. In all three cases, an RDF interface layer similar to JDBC is required, and possibly a search indexing layer as well.
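
To make the triplestore idea concrete, here is a minimal sketch of what such a store does at its core: it holds subject-predicate-object triples and answers pattern queries where any position may be left open. The class name TinyTripleStore, the ex: and foaf: prefixed names, and the sample data are all made up for illustration. A real triplestore adds indexing, persistence, and SPARQL on top of this.

```python
from typing import List, Optional, Set, Tuple

# A triple is (subject, predicate, object), all kept as plain strings here.
Triple = Tuple[str, str, str]


class TinyTripleStore:
    """Hypothetical in-memory store, for illustration only."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        # None acts as a wildcard, like a variable in a SPARQL triple pattern.
        return sorted(
            t
            for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TinyTripleStore()
store.add("ex:alice", "foaf:knows", "ex:bob")
store.add("ex:alice", "foaf:name", "Alice")
store.add("ex:bob", "foaf:name", "Bob")

# All triples that use the foaf:name predicate.
names = store.match(predicate="foaf:name")
print(names)
```

The linear scan in match is the part a production store replaces with indexes over the different subject, predicate, and object orderings, which is where most of the storage and performance concerns come from.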

The two most common interface layers, which also have their own storage layers, are Sesame and Jena. Sesame is the more versatile of the two, providing more robust features, and a majority of the triplestores are based on its model. Jena appears to follow a stricter W3C-driven approach. In both cases, the provided storage is not sufficient for production requirements, as the data can grow very quickly. One obviously has to leave room for current and future data needs. Often clustering is required to scale out SPARQL queries. In almost all cases a read-only SPARQL endpoint has to be provided for users to interface with. SPARQL 1.1 also adds update operations such as INSERT and DELETE. However, these should be restricted to admin level.
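
One way to picture the read-only restriction is a small gate in front of the endpoint that only lets the four SPARQL query forms (SELECT, ASK, CONSTRUCT, DESCRIBE) through. The sketch below is an assumption about how such a gate could look, not how any particular triplestore does it. The function names first_keyword and is_read_only are hypothetical, and the check only looks at the first keyword after the prologue rather than parsing the full grammar.

```python
import re

# The four read-only query forms defined by SPARQL.
READ_FORMS = {"SELECT", "ASK", "CONSTRUCT", "DESCRIBE"}

# A PREFIX or BASE declaration at the start of the remaining text.
_PROLOGUE = re.compile(
    r"(?:PREFIX\s+[^\s:]*:\s*<[^>]*>|BASE\s*<[^>]*>)", re.IGNORECASE
)


def first_keyword(query: str) -> str:
    """Return the first keyword after comments and PREFIX/BASE declarations."""
    rest = query
    while True:
        rest = rest.lstrip()
        if rest.startswith("#"):
            # Skip a whole comment line.
            _, _, rest = rest.partition("\n")
            continue
        declaration = _PROLOGUE.match(rest)
        if declaration:
            rest = rest[declaration.end():]
            continue
        break
    word = re.match(r"[A-Za-z]+", rest)
    return word.group(0).upper() if word else ""


def is_read_only(query: str) -> bool:
    # Naive check: anything that is not a known query form is refused.
    return first_keyword(query) in READ_FORMS


select_query = (
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
    "SELECT ?name WHERE { ?s foaf:name ?name }"
)
update_request = (
    "PREFIX ex: <http://example.org/>\n"
    "INSERT DATA { ex:a ex:b ex:c }"
)
print(is_read_only(select_query), is_read_only(update_request))
```

A gate like this is only a first line of defence. In practice one would still rely on the store's own access control and expose updates on a separate, authenticated endpoint.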

Open source triplestores are generally quite limited for production use, so a workaround sometimes has to be applied to meet scalability and storage needs. Currently, the top performing triplestores include Virtuoso, OWLIM, and AllegroGraph, all very much commercial and with quite a large toolset. The next best triplestore would be Bigdata, a fairly good open source option that provides clustering, sharding, and full-text indexing. It also has a ZooKeeper connector. For a property graph one can almost always use Neo4j or OrientDB. OrientDB provides a more liberal license option. Solutions that use Hadoop as the underlying back-end storage layer will not perform very well, due to the nature of its distributed design. The storage layer could be deployed to a clustered, production-ready environment of 64-bit machines with 4-8 CPU cores.

The Semantic Web is really starting to take off, and more and more interesting options are starting to emerge. However, it is still the case that open source solutions lack production quality and are more experimental, for research use. The field is still dominated by commercial players who provide a Swiss army knife of solutions at an obvious premium. There is still a lot to be done, even in making the Semantic Web more accessible for developers, as the W3C specifications can be quite complex and in a lot of ways there is just a bewildering set of models to apply in varied combinations. Perhaps the introduction of JSON-LD will help make linked data more accessible for front-end developers. Simplicity and convergence are key to making the Semantic Web the next evolution for Big Data and the Internet.
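
As a rough illustration of why JSON-LD might appeal to front-end developers, the sketch below builds a small JSON-LD document as an ordinary dictionary and reads triples out of it. The example.org identifiers and the to_triples helper are made up for this post, and the helper handles only this flat shape. It is not the JSON-LD expansion algorithm, for which one would use a proper library.

```python
import json

# A small JSON-LD document: plain JSON plus an @context that maps
# short keys to full property IRIs.
doc = {
    "@context": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "knows": {"@id": "http://xmlns.com/foaf/0.1/knows", "@type": "@id"},
    },
    "@id": "http://example.org/alice",
    "name": "Alice",
    "knows": "http://example.org/bob",
}


def to_triples(document):
    """Hypothetical helper: turn one flat JSON-LD node into triples."""
    context = document.get("@context", {})
    subject = document["@id"]
    triples = []
    for key, value in document.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @context and @id
        definition = context[key]
        # A term maps either directly to an IRI or to a dict holding "@id".
        predicate = definition["@id"] if isinstance(definition, dict) else definition
        triples.append((subject, predicate, value))
    return triples


triples = to_triples(doc)
print(json.dumps(doc, indent=2))
for triple in triples:
    print(triple)
```

To a front-end developer the document is just JSON with two extra keys, which is much of the appeal.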

Java:
Sesame
Jena
Tinkerpop
linkeddataapi
any23
marmotta
stanbol
rdf2go
sesametools
groovysparql
pellet
owl-api
jsonld for java

Python:
Redland
RDFLib
Bulbflow
RDFAlchemy
Fuxi
Surf
ORDF
Django-rdf
Djubby
pysparql
sparta
Oort
sparqlwrapper


JavaScript/Nodejs:
RDFQuery
Tabulator

Semantic NLP:
KEA
OpenNLP 
DBPedia Spotlight
Maui

Graph stores:
Neo4j
OrientDB
Allegrograph
Virtuoso
BigData
Ontotext
Titan
Stardog

W3C:
SPARQL 1.1
RDF
JSON-LD

Reconciliation:
GoogleRefine

11 May 2013

Food Places To Avoid In London

In a big city like London there is an unprecedented number of food places on offer, and with them comes a high risk factor, with food that ranges from edible to hygienically unpredictable. Often such places are made attractive by bargain pricing. However, with that option also comes cheaper quality. It is left to the individual to decide whether quality outweighs price. In my opinion, one should never bargain on the quality of food as one would for an item of clothing or electronics. A lot of these places are also made accessible to the tourist or the person on the move through quick self-service options and a competitive pricing model. A few of the common places that I have found to carry some degree of unpredictability, and a risk of falling sick the following morning, are listed below. I hope one does not rely on such fast food places on a daily basis, as that would quickly affect one's health over time. Cooking one's own meal at home is almost always the best option.

  • McDonald's

They advertise so much on TV that one can almost get brainwashed into paying them a visit. They also provide one of the most commercialized burger models around at an extremely competitive price. However, the quality of the food is on the unpredictable side. Often one will feel sick straight after a meal, almost like getting bad reflux. The food is also quite unhealthy if eaten on a regular basis. Best to avoid it and try some other, more authentic burger places. Sure, these other places may be more expensive, but they are well worth it.

  • Burger King

Another fast food place in competition with McDonald's. One would get a different style of reflux here, often more jerky and with a feeling of being bloated. Surely an unpredictable place as well. There are so many better alternatives for burgers around London. Does one really need to bargain on food and one's health?

  • Cottage Chicken

A cheaper version of KFC, where the fries are unsavory and the chicken almost feels like it's undercooked. KFC would be my advice, or better yet get a whole chicken from somewhere like Waitrose or Sainsbury's. Free range is even better than such factory-style fast food chicken.

  • Pret A Manger

One of the best places to obtain soggy sandwiches with some strange filling combinations that taste like they may almost be expired. If one just needs a quick sandwich while at work, a better option would be to make a homemade sandwich before heading off. That way one can control what one eats, the cost, and the quality, and avoid the hassle of standing in a queue. Another option might be to visit Paul instead.

  • Eat

Another competitor to Pret A Manger but with a little less quality, as if Pret A Manger's sandwich unpredictability was not enough. They also have a little less selection. Perhaps that means one less set of options for getting sick.

  • Certain Kebab Shops

Greasy kebab shops seem to be appearing on almost every high street these days. They can range from Turkish kebabs to Middle Eastern roll-ups. Although they turn out to be tasty in the moment, the following morning may just be one of the worst one has ever had after a night out. This is not to say all kebab shops are bad, but keeping an eye on their preparation and serving can be one way of testing the waters before placing an order. One hot spot around London for a mix of unpredictable to really good kebab shops is the Edgware Road. But one can also visit Best Mengal and Sofra, which are even better in their own right.

  • Bargain Bucket Style Asian Curries and Buffets

These are often the kind of places where one can see and smell a lot of the food out in the open. In fact, that is what they try to do as a way to sell cheaply and quickly. However, late at night they are basically trying to sell all of it off before it goes to waste. And even then, in some of these places one can't be too sure, as they may just reheat the food and serve it again the next day. Chinese and Indian foods are often unpredictable for the western palate, especially in the level of spiciness but also in the way they are cooked. Being vigilant about what one eats is the best way to go. Something that looks and smells good won't necessarily taste good, let alone be healthy.

  • Tesco

They serve value meals which are not very good but are pretty much bargains in terms of what Tesco classes as a bargain, and it is always best to check the expiry date. My view would be to steer clear of such sandwiches, unless one doesn't really care about what goes inside one's body.

  • CostCutter Style Sandwiches

Another one of those places where you get a mixed set of bargains, with leftovers later at night. Baked items can be very greasy and the sandwiches are on the cheaper side, pretty much on par with Tesco.

  • Tube Station Eat Outs

These places serve more expensive food with lots to choose from. There is an extensive variety, but some of the unpredictable eat outs are situated here too. That is not to say that they are all bad. One safe option for food is M&S, which has a fairly good level of quality. Others that work relatively well are places like Cornish Pasty. Also, would one really want to buy a sandwich from a place like Boots, where most of the stock is medicines or cosmetics?

4 May 2013

Subjectivity and Sentiments

Mining text for subjectivity is a hard problem that requires machine learning approaches to extract information from large and varied data. Subjectivity is all about finding opinions, affects, and sentiments in texts. This could take the form of processing blogs, reviews, tweets, editorials, and general articles, as well as several other sources of textual content. Subjectivity is important in a wide range of domains. It can be valuable for identifying trends and forecasting in real-world applications such as financial markets, fashion, events, economic indicators, social markers, political voting, chart toppers, and a lot more where attitudes and feelings about a particular 'thing' or 'concept' matter. Finding such information automatically is hard, and using machine learning to identify and extract opinions and sentiments from texts is an active area of research. Developing such an application requires understanding and applying statistical natural language processing.
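
As a small taste of the statistical approach, here is a sketch of a Naive Bayes polarity classifier with add-one smoothing, one of the standard starting points for sentiment analysis. The class name, the tokenizer, and the four training sentences are invented for illustration. A real system would train on a proper corpus, such as the datasets linked further down.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase word tokens; a deliberately crude tokenizer.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Toy multinomial Naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in sorted(self.doc_counts):
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # words never seen in training carry no signal
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training examples, far too few for real use.
classifier = NaiveBayesSentiment()
classifier.train("I love this phone, the screen is great", "pos")
classifier.train("what a wonderful and fun film", "pos")
classifier.train("I hate the battery, it is terrible", "neg")
classifier.train("a boring and awful film", "neg")

print(classifier.predict("great fun"))
print(classifier.predict("terrible boring battery"))
```

Even this toy version shows the main ingredients: a labelled corpus, a bag-of-words representation, and a probabilistic model. The hard parts in practice are negation, sarcasm, and domain-specific vocabulary.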

Sources of information on sentiment analysis are hard to find, as the subject is relatively new. The links below are a valuable entry point towards further information.

Subjectivity Analysis
sentiment-analysis
opinion mining sentiment analysis survey
Twitter-as-a-Corpus-for-Sentiment-Analysis-and-Opinion-Mining
sentiwordnet
twitter-sentiment-analysis-training-corpus-dataset-2012-09-22
sentiment140

2 May 2013

Holographic Messages For Mobile

I think it would be quite neat to have holographic message effects for mobile that could be sent just as one sends a text message, only with a lot of 3D realism. It would be quite similar to 3D TV, only without the need to wear glasses, and the effects would literally happen around the edges of the mobile. Message mates can be funny, comical, creative, and at the same time intimate, and they can relay the mood between two individuals quite well without the need to type wordy texts. They could come in handy on all sorts of occasions, with work colleagues, friends, family, or even a partner. A few examples would do justice here. On Valentine's Day someone could send a virtual kiss or a virtual rose that rises up out of the mobile phone, maybe even with a message attached. One could even send a virtual gift on someone's birthday with a pop-up clown or a stripogram. Another option could be sending particular moody messages. When one is angry at someone, one could send them a fire message that virtually sets their phone on fire and gives them a slight heated sensation in their hands. Or perhaps one wants to know when an email has arrived without the typical vibration or tone going off. One could set it so the phone feels like it is melting or freezing up. Virtualization on mobile could even allow for 3D conferencing, geomapping, flexible learning, remote security of car, home, and family, and so much more. Even the idea of allowing people to customize their own effects would be pretty cool. The effects could even include an aspect of ambient intelligence. It would be quite amazing to set one up for an Android or even an iPhone. It certainly doesn't seem impossible, especially looking at the introduction of Google Glass.

27 April 2013

Let's Go Programming

Go is an evolutionary programming approach in which programmers can be more productive by making use of multicores, and in the process have more control and flexibility over concurrency. The language is a mixture of Java and C. Its garbage collection is notable. Programs also tend to need fewer lines of code compared to Java's verbosity. It feels like a dynamic, interpreted language and yet is compiled with static typing. It offers what a lot of people would want in terms of processing big data and working with intelligent algorithms. Giving a programmer speeds comparable to C along with the clean modularity, concurrency, and garbage collection of Java could only mean more productivity. It should also be noted, though, that Go is not yet as fast as either Java or C in benchmarks. However, its use of multicores makes for less wastage in a cloud computing environment. Where simplicity, rapid prototyping for delivery, and a less overbearing engineering cycle are required, Go could be quite a suitable language. Go is still very new and does not facilitate much integration with other languages. It also does not yet seem to have a rich set of open libraries like Java. However, in time, as for all languages, new releases and community insights will add further to the road map. Go certainly seems like a promising language for future prototyping. Perhaps it is just not production ready yet.

A good reading guide to get started:

22 April 2013

The Ultimate Web API

APIs have been around for decades, across web, desktop, and mobile, and everywhere from software to hardware. They are ubiquitous these days. No doubt even the microwave and the washing machine use some form of an API or a stack of APIs. Now, as a result of REST and HTTP as well as cloud computing, even more APIs have started to flourish. The value of an API depends on the usability and scalability of its design. Badly developed APIs are unnaturally designed and cause grief to developers, both consumers and providers. Consuming applications, often built as mashups, rely on such APIs for access to resources, and badly mapped functions and parameters can impinge on the quality of the application as a whole. One can explore a behemoth of APIs on ProgrammableWeb. API development is a craft and an art which one can really only get right through exploration, learning, and experience in understanding both the application and its users. Below are a few useful books for API design.

20 April 2013

Open Source Licenses

For the uninitiated, and even for experienced developers, software licenses can be tricky terrain to walk on, often leaving even lawyers a bit misguided. There are also so many to choose from, each dictating a different degree of openness. How does one decide what open source software to use, or even what open source license to apply to one's own software? This process seems to be getting more and more complicated as more open source software is released on the web that eventually gains sponsored or commercial backing. For many businesses, certain licenses like the LGPL can mean very tricky territory. The following link attempts to bring a lot of clarity to the community license process.

open source initiative

Why Dart Will Never Replace JavaScript

Dart is a new client-side programming language intended to compete with or potentially replace JavaScript. As I snicker with doubt at the prospect of any client-side language trying to displace the dominance of JavaScript, I can only guess at how miserably such would-be replacements will fail. VBScript came along and met a quick demise. Now a new language called Dart has entered the periphery. This language has been developed under the auspices of Google. The developers at Google have themselves rejected the idea of Dart as a replacement for JavaScript. So what are the goals of such a new language?

"Create a structured yet flexible programming language for the web". Aren't there already plenty of structured and flexible programming languages out there for the web? 

"Make Dart feel familiar and natural to programmers and thus easy to learn". Even if that is the case, isn't it always a drag to learn a new language? If the language is so familiar, why not put the effort into improving the existing languages that programmers already know, together with the open source community?

"Make Dart appropriate for the full range of devices". If the mobile market is already so fragmented, why fragment it further by creating another language? Going further, why set up separate tools just to support another language on a platform? It seems to me like a wasted effort that seeks simplicity by making things more complicated.

"Provide tools that make Dart run fast across all major modern browsers". Here again it seems that, just to support the language, so many other tools have to be developed around it. Further complexity and fragmentation.

Dart also appears to be a replacement for GWT. Some of the developers on the GWT project migrated to the Dart project. It seems that to improve the scripting one will obviously still need to access JavaScript from time to time, even while using Dart. If one already knows JavaScript, why create another layer of compiler complexity? JavaScript is almost ubiquitous, even with its many quirks. It can be found everywhere from desktops and the web to mobile devices. Anyone who even hints at the prospect of replacing such a language would need to convince the world, and adoption will be extremely slow, perhaps even openly rejected by many. As HTML5 emerges, so has the importance of JavaScript and the many libraries that are brewing over the web. HTML5 has even relegated Flash/ActionScript to a certain degree, especially on mobile. Almost all web browsers support JavaScript today. Google can be applauded for its continuous effort towards innovation and rediscovery. Still, one cannot but wonder whether Dart will be another Google project brushed aside by the large open community, or by a business decision, eventually meaning its demise.