
10 June 2016

Brexit Pipeline

Sentiment analysis in the context of Brexit (the EU Referendum) is currently an intensive area of study, as the polling stations will very soon be open to voters. Input sources from social media and news feeds can be a focal point for storytelling about the various events. Social media and news feeds can be consumed as streams, processed with machine learning, and then indexed into Elasticsearch for summary. A sample workflow is provided below; the reader will notice that it also doubles as an example for learning Apache Flink.

The workflow can be modified as required. For example, one could place a Redis cache layer between the machine learning process and Elasticsearch, or extend the flow with an NLP pipeline (GATE/UIMA) or simply OpenNLP/CoreNLP for information extraction. One could replace Apache Flink with Spark or GraphLab, or replace Kafka with Kinesis and apply the AWS Data Pipeline, with the data sources stored in S3. Furthermore, one could use DL4J with Spark on Elastic MapReduce to apply a deep learning approach in the form of a convolutional neural network model, although Python developers may be more inclined to use Theano, TensorFlow, and possibly RabbitMQ. For a graph representation one could use Titan, GraphX, Elasticsearch Graph, Cayley, PowerGraph, or Gelly, among others. As one can see, there are several ways of implementing a solution on a case-by-case basis to translate the requirements of stories. However, prototyping small before scaling out incrementally is always the best way to go, i.e. fail fast.
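As a minimal sketch of the analysis step, a lexicon-based polarity scorer can assign a crude sentiment label to a feed item. The tiny word lists below are illustrative placeholders, not a real sentiment lexicon; a production pipeline would use one of the NLP toolkits mentioned above.

```python
# Minimal lexicon-based sentiment scorer (illustrative word lists only).
POSITIVE = {"gain", "strong", "optimistic", "confident", "growth"}
NEGATIVE = {"fear", "weak", "uncertain", "crash", "loss"}

def sentiment(text):
    """Return 'positive', 'negative', or 'neutral' for a piece of text."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("markets fear a crash amid uncertain polls"))  # negative
```

In a streaming setting this function would run inside the stream-processing stage, enriching each message before it is indexed.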

Input -> Kafka -> Apache Flink -> Elasticsearch -> Output

Steps:
  1. Collect
  2. Log 
  3. Analyze
  4. Serve & Store
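The four steps above can be sketched end to end with plain Python stand-ins for the real components: a list playing the role of the Kafka topic, a function playing Flink's analysis role, and a dict playing the Elasticsearch index. The function names and sample inputs are illustrative assumptions, not part of any of these systems' APIs.

```python
# Toy pipeline: Collect -> Log -> Analyze -> Serve & Store.
# A list stands in for a Kafka topic; a dict stands in for an Elasticsearch index.

def collect():
    """Step 1: gather raw items from input sources (hard-coded here)."""
    return ["Polls open soon across the UK", "Markets remain uncertain"]

def log(items, topic):
    """Step 2: append items to the topic (the durable message log)."""
    topic.extend(items)

def analyze(item):
    """Step 3: the stream-processing step (Flink's role); here, a word count."""
    return {"text": item, "words": len(item.split())}

def serve_and_store(doc, index, doc_id):
    """Step 4: index the enriched document for serving."""
    index[doc_id] = doc

topic, index = [], {}
log(collect(), topic)
for i, item in enumerate(topic):
    serve_and_store(analyze(item), index, i)

print(index[0]["words"])  # 6
```

Swapping each stand-in for its real counterpart (a Kafka producer/consumer, a Flink job, an Elasticsearch client) preserves the same shape of the workflow.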
List of Input Sources:

As a side note, GNIP and DataSift provide an entire data-source pipeline for building out a firehose of streaming inputs. Live polling data can also be used to gather voting trends as they happen. However, as the referendum is now past, one can probably get hold of the dataset or an API.

10 January 2014

Linked Data for Discussions

Disqus is an interesting platform for web communities of discussion. It can be used in fundamentally three different ways: through the Disqus API as a point of integration, via embedded JavaScript, or through direct integration into the supported blog platforms. It is a nice enough feature for driving traffic to a site as well as for collaborating on discussion flows. However, it does not suit everyone. Disqus requires you to host your comments with a third party, which may be risky for some, as they lose control over when, as well as how, their comments data will be used and made accessible. It also has the potential to slow down a site considerably, comments can be lost once Disqus is removed from the site, and its extensive reliance on JavaScript can add security risks. An alternative option is IntenseDebate, which integrates nicely with Blogger and even WordPress.

However, many of these discussion platforms divert data away from the hands of the user in a centralized way, making it less connected. A better option would be decentralized control of comments and discussions in a distributed fashion, following a linked data approach. That way each site controls its own comments data, choosing what to expose and what not to, while still allowing comments to be linked with other sites in a web of data, perhaps even via an extensible, generic comments API. One could then query for comments, find open answers, and even develop sentiment analysis. Connecting linked data as a discussion forum means a vast social network of annotated comments that can be harnessed and that allow people to connect globally based on interests, influence, and other contexts. It also means people maintain a sense of security and control over their own data. Information is a valuable commodity, but information is even more valuable when it is contextually and semantically enriched with other data sources.
Even social networks should really be interconnected. We already have options such as OAuth, which allows for socially connected authentication, and even linked data profiles. A web of communities essentially means linked data of communities, and that essentially means a social web of data at the hands of anyone who wants to query it. The load of such queries can then be distributed via the decentralized sources and control of data. This also means websites can be even more search-engine optimized and accessible for social content. It also means a better, semantically available social graph for navigation and analysis by all, without losing or compromising on security at the hands of a few.
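As a sketch of what such a linked data comment might look like, a JSON-LD document using the schema.org Comment type lets each site publish its own comments while keeping them linkable to the post they discuss. The URLs and names below are hypothetical placeholders.

```python
import json

# A single comment as JSON-LD, using the schema.org vocabulary.
# All URLs and the author name below are hypothetical placeholders.
comment = {
    "@context": "http://schema.org",
    "@type": "Comment",
    "@id": "http://example.org/posts/linked-data#comment-1",
    "author": {"@type": "Person", "name": "Alice"},
    "text": "Decentralized comments keep data in the site owner's hands.",
    "about": {"@id": "http://example.org/posts/linked-data"},
}

doc = json.dumps(comment, indent=2)
print(doc)
```

Because the `@id` values are dereferenceable URLs, comments on one site can point at posts and people on another, which is what makes a cross-site web of discussion queryable.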

Other Alternatives:
LiveFyre
Echo
Facebook Comment Box
Google Plus Comments
Realtidbits
Comments Plus
Moot
CommentLuv
TalkaTV
Juvia
Burnzone
SO: Unobtrusive Self-Hosted Comments
Tidlehash
Isso
and more...