
TIQ Solutions - Spend Quality Time with your Data!




Qlik Luminary

Ralf Becher on QlikCommunity

Member of LXQ - The League of eXtraordinary Qliketeers

Ralf Becher on GitHub

DZone Most-Valuable-Blogger





QlikView.next Extension Dependency Wheel - This extension was created at the 1st Qlik Hackathon during Qonnections 2014 in Orlando.

Code on GitHub for QlikView.next and for QlikView 11


QlikView Ad Hoc PDF Reporting Extension from TIQ Solutions:

This short video demonstrates how easily you can create ad hoc PDF files from the current QlikView sheet with our new QlikView Reporting Extension/Webservice.


QlikView Graph Extensions from TIQ Solutions using Neo4j Graph Database:

This video shows our current developments related to graph data analysis in QlikView. Audio comments will follow.


Web Crawling and Text Analysis with Yahoo! Pipes and QlikView

A few years ago I worked a lot with Yahoo! Pipes to manage and aggregate RSS feeds about QlikView, data quality and other fields of interest for my Netvibes dashboard. Eventually, I stopped following this information overload. However, a current project requires extracting opinions from websites, especially from forum and blog posts (including comments), for further text, content and sentiment analysis.

Remembering Pipes, I created a pipe that loops through the RSS feed of a blog to get the full content of each post, not only the teaser text. Although you could also loop through the feed’s links in QlikView, I thought it would be a nicer solution to have everything together in a single QlikView load statement from one web source.

Thanks to Barry Harmsen, I was allowed to use his famous blog as a source of inspiration AND data for my example:

The Qlik Fix! (don’t click :D)

Here is the link to the pipe I’ve created: QlikView-Crawling-Example

You can clone the pipe and edit it:

Yahoo! Pipe for QlikView-Crawling-Example

If you render the pipe as RSS and look at the page source (Ctrl+U in the browser), you will see the post content extracted from the website inside the <content:encoded> tag.
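If you want to consume the pipe’s RSS output outside QlikView, the <content:encoded> element can be read with Python’s standard library. This is a minimal sketch, assuming an RSS 2.0 feed that declares the standard RSS content module namespace; the function name `read_pipe_items` is hypothetical and not part of Pipes or QlikView:

```python
import xml.etree.ElementTree as ET

# Namespace URI of the RSS "content" module that defines <content:encoded>
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"


def read_pipe_items(rss_xml: str) -> list[dict]:
    """Hypothetical helper: return title, link and full HTML content per <item>.

    Assumes an RSS 2.0 document; CDATA sections are returned as plain text.
    """
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # ElementTree addresses namespaced tags as "{namespace-uri}localname"
            "content": item.findtext(f"{{{CONTENT_NS}}}encoded", default=""),
        })
    return items
```

Each returned dict corresponds roughly to one row of the table that QlikView builds when it loads the pipe as a web source.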

For each item, the pipe follows the link address in the RSS feed, fetches the linked page and extracts that page’s content.
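For comparison, the loop the pipe performs (read each item’s link, then fetch that page) can be sketched in Python. This is only an illustration, assuming an RSS 2.0 feed and UTF-8 encoded pages; `crawl_feed` and `fetch_url` are hypothetical names, and the fetch function is passed in as a parameter so the loop can be exercised without network access:

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable


def fetch_url(url: str, timeout: float = 10.0) -> str:
    """Hypothetical fetcher: download a page, assuming it is UTF-8 encoded."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_feed(rss_xml: str, fetch: Callable[[str], str] = fetch_url) -> dict[str, str]:
    """Hypothetical helper: loop over the feed items and fetch each linked page.

    Returns a mapping of link -> raw page HTML, in feed order.
    """
    root = ET.fromstring(rss_xml)
    pages: dict[str, str] = {}
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link and link not in pages:  # skip empty and duplicate links
            pages[link] = fetch(link)
    return pages
```

Injecting `fetch` keeps the loop independent of the HTTP client, so a cached or rate-limited fetcher could be swapped in when crawling a real blog.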

The following XPath expression is used (discovered with Firebug from the related <div> tag): //*[@id="content"]. It cuts out only the content of the post, nothing from the surrounding page frame. This is how it looks in the Pipes editor; I marked the important properties:
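Pipes evaluates this XPath expression on its side. To try the same expression locally, Python’s `xml.etree.ElementTree` supports a limited XPath subset that includes attribute predicates. This sketch assumes the page is well-formed XML without a default namespace; real-world HTML often is not, in which case an HTML-tolerant parser would be needed. `extract_by_id` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET


def extract_by_id(xhtml: str, element_id: str = "content") -> str:
    """Hypothetical helper: return the element matching //*[@id=element_id], or "".

    Assumes well-formed, namespace-free markup (ElementTree is a strict XML parser).
    """
    root = ET.fromstring(xhtml)
    # ElementTree XPath subset: ".//*" = any descendant, "[@id='...']" = attribute test
    node = root.find(f".//*[@id='{element_id}']")
    if node is None:
        return ""
    node.tail = None  # tostring() would otherwise append the text after the element
    return ET.tostring(node, encoding="unicode")
```

Everything outside the matched element (navigation, sidebars, footer text) is dropped, which mirrors what the expression does inside the pipe.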

Yahoo! Pipe Edit for QlikView-Crawling-Example

In the next step, everything is loaded into QlikView and the HTML tags are stripped (using the example code I also posted as a Gist). Now we have plain text for further use with text analysis and sentiment APIs.
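The HTML stripping itself is done in QlikView script. For readers who want to do the same step outside QlikView, here is a small Python sketch based on the standard library’s `html.parser`; it is not a port of that Gist. The names `strip_html` and `_TextExtractor`, and the set of tags treated as word boundaries, are assumptions:

```python
from html.parser import HTMLParser

# Assumed, non-exhaustive set of tags whose boundaries should separate words
BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "h1", "h2", "h3", "h4", "h5", "h6", "tr", "td"}
SKIP_TAGS = {"script", "style"}  # their content is not readable text


class _TextExtractor(HTMLParser):
    """Hypothetical parser that collects text nodes and ignores script/style."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decodes entities such as &amp;
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(html: str) -> str:
    """Remove tags, drop script/style content and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

Inserting a space at block-level boundaries keeps words from adjacent paragraphs from being glued together, which matters for the word-based text analysis that follows.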

QlikView Application for QlikView-Crawling-Example

You can download the example here: QlikView-Crawling-Example.zip

Please make sure to install the QlikView Minimalistic HtmlTextBox Object Extension by Stefan Walther (probably the extension with the longest name) before opening the QVW file.

In the next post I will show how to process the plain text with text analysis and sentiment APIs. Have fun so far, keep on Qliking!