TIQView Blog

Month

April 2013

1 post

Using Impala for Self Service BI

The Situation

Large amounts of daily delivered raw data lead into a constantly increasing load and transformation time in enterprise data warehouse environments realized with classical databases. In a current client project (telco) we encounter the problem that it already exceeds the daily processing timeframe and can only be handled by dropping older data every day. Also, ad hoc queries will take hours even on optimized R-OLAP models with daily partitioning and bitmap indexing (Oracle in this case) which will prevent users from doing self service analysis. The assumption is to replace the classical database with a Hadoop cluster to provide adequate load and query time for a large growing (maybe unlimited) data set.

The Use Case

  • raw data from multiple delivery systems as csv files
  • time slots (granularity 15 minutes) as partition criteria
  • long term inspection within big data (one source data amount per day: 400 million rows, 120 GB volume, 96 partitions )
  • possibility to create ‘free’ ad hoc queries through a well know query language (SQL)
  • JDBC drivers to use raw data with third party tools like SQuirrel or QlikView
  • extraction of data sets with more than 1 million rows for further in-memory analysis

Our Decision for Hadoop and Hive / Impala

  • Hadoop HDFS can store an unlimited amount of data
  • massive parallel processing through unlimited amount of nodes
  • Hive query language (HQL) as SQL dialect with less learning
  • Hive meta store for schema definition, used by Impala
  • improved query performance through Impala (omit Map/Reduce overhead and job execution sequence)
  • Capex and Opex (this would fill a long thread indeed..)

Pitfalls and Experiences

The transforming of an existing table into a partitioned one may be complicated by the limitation of Hive partitions and memory space. We tried to import 3 days (96 slices by 15 minutes per day) with time slots and device ids as partition and sub-partition criteria which would create about 14,400,000 partitions (3 * 96 * 50,000).

If one partition criteria equals to a column in the csv file you must either transform this file during import to remove this column or be careful to use the right one in your select to get Impala not to scan the whole table.

As Impala 0.6 does not support timestamp or date types you have to store the time as a big integer (eg. Unix epoche). Also there is no chance to create a where clause with human readable (literal) date and time values. You would need to use Hive function unix_timestamp but this would omit the use of partitions.

Also be careful on querying multiple partitions with an ‘or’ conjunction. It does not use the partitions any more, you have to transfer your query into a form like ‘select … where condition1 union all select … where condition2’.

Conclusion

We have set up an 8 node Hadoop cluster with Cloudera distribution CDH 4.2 over Cloudera Manager 4.5 and Impala 0.6. The node hardware configuration is:

  • 4 x 3.3 GHz Intel Xeon E3-1230v2 (Ivy Bridge)
  • 16 GB RAM
  • 2 x 1,000 GB RAID 1

We have loaded three days of raw data from one source:

  • 20,832 files
  • 1.2 billion records
  • 370 GB file size in total
  • 289 partitions

Typical analysis query:

  • extract raw data slice of 3 hrs per day (for the given 3 days) for one dimension (device), maybe additional filter conditions

Query execution time including transport of result set to client application via JDBC:

  • classical database: some hrs
  • Cloudera Impala: < 2 min

These results are based on Impala 0.6; Impala 0.7 contains several improvements to consider in the next steps..

One more thing..

As we noticed the Apache Thrift project used by Hive and Impala for communication between client and server has deficits on transporting big data volumes. We are currently working on a solution (eg. JDBC driver) which will bypass the protocol on large result sets.

Apr 22, 2013
#cloudera #impala #qlikview #selfservice #bi #bigdata #hadoop

March 2013

1 post

Going Qonnections 2013?

Hi Folks,

it’s only 4.5 weeks to Qonnections 2013. As in the last two years I release my QlikView Qonnections application “QonnView” with the actual Qonnections Session schedule, the Twitter search, a Bahamas Weather Forecast and NAS Flight Information:

Download QonnView2013.zip

QonnView 2013 now comes with a Gantt chart for better session planning..

Hope to see you over there. Have fun on Paradise Island!

- Ralf

Mar 7, 20131 note
#qlikview #qonnections #qonnections2013 #bahamas

February 2013

2 posts

QVD Converter - Create and Process QlikView Data Files everywhere

The QVD Converter is a reusable Java library from TIQ Solutions. The library enables Java applications to create and process QlikView data files (QVD format) from any size independently (e.g.. in Unix environments).

image

The offered Java classes can be used directly in any Java implementation (via import) and are a useful starting point for the development of plug-ins for ETL tools like Pentaho Kettle. With this, QlikView data files are integratable in any data integration process.

image

At this moment, TIQ Solutions develops a QVD JDBC driver based on the QVDReader class. This JDBC driver will make QlikView QVD data files available as queryable SQL data sources for a broad range of applications implementing the JDBC standard.

Read more about it on TIQ Solutions QVD Converter website:

http://www.tiq-solutions.de/display/enghome/QVD+Converter

I’ve pushed my latest QVDReader examples on GitHub:

https://github.com/ralfbecher/QlikView_QVDReader_Examples

Update: I’v added new features for version 1.3:

  • select a subset of fields, skip unwanted data (in preparation for QVD JDBC driver and for better performance)
  • cache of last records symbol values to prevent lookups (performance)
  • more example applications for batch processing (e.g. search over a bunch of QVD files with wildcards for files and search pattern)
Feb 10, 2013
#qlikview #qvd #data #java #integration #library #qvdreader #qvdwriter #qvdconverter #bigdata
Graph Data Visualization in QlikView

Since I wrote the post about “Working with Graph Data from Neo4j in QlikView” here I had the idea in mind to develop a QlikView Extension Object do replace the static Google Chart API graph display I used in the example application.

Now I’ve finished a first version of an extension using the Graph Dracula JavaScript Library (based on Raphaël.js): http://www.graphdracula.net

Here are a few impressions:

You can download the extension and QlikView demo application here.

I will do some further developments in the future related to QlikView graph data integration because I think relational data structures and models will disappear in the future. Reality is different than a table..

TIQ Solutions will provide professional services in this realm, also development and customization of QlikView extension objects.

Feb 4, 20131 note
#qlikview #neo4j #graph #extension #visualization #nosql #bigdata

January 2013

1 post

Our new QlikView JDBC Datasource: Kx Kdb+

As we’ve got an inquiry from a finance institute recently if our QlikView JDBC Connector is providing connectivity for Kx Systems number cruncher Kdb+ we’ve started to develop a test case and came to the result that it’s working pretty well.

We provide an adapted JDBC driver for Kdb+ and we will improve the integration with out QlikView connector. You can download our Kx JDBC driver here to try by your own.

Update: The new version of our QlikView JDBC Connector (v2) will work with the standard Kx JDBC driver, our patch is not needed anymore.

You also can process ‘q’ code in the QlikView load script thru the JDBC connection. Since ‘q’ code can contain embedded semicolons and other special symbols, just set the string into a variable and then execute it like:

   SQL $(qSQL);

Btw. this is the new list of actual working JDBC data sources for QlikView:

  • Hadoop HDFS and Hive (Cloudera, MapR, Amazon EMR)
  • Cloudera Impala (via Beeswax API)
  • SAP HANA
  • SAP ERP/BW (via SAP JCo)
  • Kx Systems Kdb+
  • Microsoft Windows SQL Azure
  • Google Cloud SQL
  • Neo4j Graph Database
  • Apache Derby / Java DB
  • encrypted CSV files
  • COBOL files
  • Pentaho Kettle Transformation Steps

For the most of them no alternative connectivity solution is provided for QlikView.

Jan 15, 2013
#jdbc #qlikview #kdb #kdb+ #kx #kxsystems #finance #dataintegration #nosql

November 2012

4 posts

Connect QlikView to SAP HANA, easily!

SAP HANA comes with an ODBC driver which would be the first choice if you’re working with QlikView. But, somehow the CONNECT statement in the QlikView script leads into a ODBC Connect popup window which cannot suppressed:

This could prevent an automatically data load on a QlikView server.

Fortunately we can use the SAP HANA JDBC driver which works great to bypass this behavior. The installation and configuration is simple:

The JDBC connect string looks like this:

jdbc:sap://<database_computer>[:<port>][/?<option1>[&<option2>]…]

After the connect we can fire SQL queries in the QlikView load script. And, the JDBC driver has an additional benefit: all decimal columns have to be casted to double if you work with ODBC, else they would return only NULL:

SELECT cast(<colum_name> as double) AS field_name …

This is not needed with the JDBC driver! Everything works just fine. You can even use SELECT * FROM…

Update: I’ve heared that there is a problem using SAP HANA analytical views with the ODBC driver. There is no limitation with the JDBC driver. You can access all objects types.

I have built a SAP HANA schema viewer in QlikView to create select statements from analytical views which get not shown inside the QlikView select dialog. This makes it easy to create select statements for QlikView script:

Update 2: Our first test with QlikView Direct Discovery (comes with QlikView version 11.2) on SAP HANA using JDBC connect has shown very impressive query result times on analytical views. According to QlikTech this is not possible with ODBC.

QlikView Direct Discovery and SAP HANA is a good combination. Amazing to see how fast in 1-2 seconds! HANA executes aggregations (and all other SQL statements behind QlikView’s Direct Discovery) on this big amount of records (148M records in my test case):

Where SAP HANA (as a backend) is maybe the fastest SQL database on the market, it integrates seamless with QlikView’s Direct Discovery because the response time is similar to QlikView! So, the user experience is same as it would be a QlikView application without Direct Discovery.

Here’s a short screencast covering this blog post:

Nov 29, 2012
#analytical_views #hana #jdbc #qlikview #sap #direct_discovery
Hadoop HDFS JDBC Driver

TIQ Solutions has released the Hadoop HDFS JDBC Driver. You can download a limited demo here:

Hadoop HDFS JDBC Driver demo for Hadoop version 1.0.3 (14.2 MB)
Hadoop HDFS JDBC Driver demo for Hadoop version 2.0.1 (21.4 MB)

Introduction

The Hadoop HDFS JDBC Driver is designed to connect Hadoop HDFS from an external system (outside the Hadoop cluster) via JDBC and to extract relational data in a line or CSV based format from a HDFS (Hadoop distributed file system).

Indeed, there are already several ways to extract data from Hadoop but no really easy way to use existing query or analysis software on the HDFS itself, on a file level. For instance, if you want to perform a quick check of the last map reduce results you will need an interface, which allows you to keep track of your data in time.


use of HDFS JDBC driver with Squirrel

Preparation

The following settings are mandatory for configuring the Hadoop cluster.Ensure that they are set properly, when using the Hadoop jdbc.

 There are some files in your Hadoop install directory (in the following refered as $HADOOP_HOME), which should be adjusted. Precisly speaking, there are located in $HADOOP_HOME/conf. They exists as .xml files, which contain a set of property elements. A property element is a key value pair, so you have to set a name tag and a value tag.

Take a look at the example configuration file: core-site.xml

<?xml version=”1.0”?>
<?xml-stylesheet type=”text/xsl” href=”configuration.xsl”?>
<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/hadoop</value>
  <description>A base for other temporary directories. The executor of the hdfs deamon need to have the permission to write in this directory.</description> </property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://<IP ADDRESS>:<PORT></value>
  <description>The name of the default file system.  A URI whose scheme and authority determine the FileSystem implementation.
  For being able to connect to the HDFS, the IP adress should be taken from the namenode machine of your local network.
  To avoid problems, you should configure a valid DNS setup between your cluster nodes too.
  </description>
</property>
</configuration>

Usage

You can include the driver like any usual JDBC Driver. Make sure, that the used Java library files (.jar) are in a directory called “lib” next to the connectors jar file.

Make sure that you have the appropriate Java libraries for your Hadoop deployment. There are library versions for the Hadoop version 1.0.3 (current stable release, e.g. Cloudera Version CDH3) and the version 2.0.1 (current alpha release, e.g. Cloudera Version CDH4).

Specify the driver class

The JDBC mechanism uses the class name of a driver to reflect and load it during runtime. Thus, you need to tell your application the driver name:

de.tiq.hadoop.TIQHdfsDriver

Connect to your HDFS

For being able to connect, specify the following JDBC URL:

jdbc:hdfs//<theHDFShost>:<PORT>?PARAMS=VALUE

Currently, there are no security protocols implemented.

Settings for your connection

There are parameters (key-value pairs), which are appended to the URL like ordinary HTTP GET parameters after the question mark token:

  • user=<YourHDFSUserName> - connects you to the HDFS with the specified user name, e.g. `…?user=root`
  • recursive=<boolean> - if true, the driver will match files in any subdirectory when querying the HDFS with a regular expression. Use: `…?recursive=true` The default is false.
  • separator=<MaskedSeparatorSign> - set the column separator of your data, the default is a TAB ‘\t’ `…?separator=,` you may also use a URL mask for special characters: `…?separator=%3B`(semicolon)
  • skip_header=<boolean> - if true, it will skip the header lines when concatenating a set of files, e.g. `…?skip_header=true` The default is false.

The whole URL might look like:

jdbc:hdfs://localhost:9000?user=hduser&recursive=true&separator=%3B&skip_header=true&raiseUnsupportedOperationException=false

Create Statements

The main functionality of the driver consists of issuing SQL queries. It will get the data from a file out of the HDFS. I remind you again that the data should be organized in a relational way, e.g. .csv files.

Single file:
For retrieving data from a file of the HDFS, you can set the path to the file in the from clause of the select statement. For a single file the syntax appears to be like this:

select * | CommaSeperatedColumnList From /path/to/hdfs/object

seeing that the from clause represents a path in the hdfs like `/input/data.csv`.

For a single file, it is allowed to omit the .csv suffix.

Set of files:
When concatenating files, please keep in mind that the first file determines the structure of the whole table. All other files will be interpreted as they would have the same column structure. Knowing that, missing columns will be treated as NULL values and additional columns will be lost.

We assume that you don’t want to mix up unrelated data with each other.
If you need join functionality, checkout a map reduce job in hadoop to transform the data in the appropriate structure.

You can state a set of files as a comma separated list, e.g.

select * from /path/file1,/path/file2,/file3

they don’t need to be in the same directory. Between the different files there must not be a whitespace.

The first given path in such an expression is called “base path”. We estimate that you want to extract more files of the base path. You can use the following query:

select * from /path/file1,file2,file3

which will return the data of file1, file2 and file3 out of the /path directory from your HDFS.

Regular expression:
You can use a (Java) regular expression to retrieve a set of files, which are matched by a given pattern. They should contain the same column structure again, because the first file still determines the header.

A pattern is initiated by a hash character (#). For example:

select * from /path/#csv$

would extract all files, which end with the character suffix csv. It will search recursively, if you specify the `recursive=true` parameter in the URL.

Directories:
For providing an easy way to navigate, you can also pose a query to retrieve directory informations. Try:

select * from /

and you see a basic Unix like index of the root directory from your HDFS.


use of HDFS JDBC driver with QlikView


QlikView script example to read out a whole data model from HDFS

Demo mode

The demo version of this software is limited to fetch 1000 rows of data.
As a demo, this program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

If you have any questions, feel free to ask us:

ralf.becher (at) tiq-solutions (dot) de

Update:

This images shows the configuration of the Hadoop HDFS JDBC Driver and the additional needed Java libraries in the QlikView JDBC Connector:

Nov 20, 2012
#hadoop #hdfs #jdbc #driver #suirrel #qlikview #bigdata #nosql
Connect QlikView with Google Cloud SQL

Google Cloud SQL is a new service which runs mySQL databases in Google’s cloud. You can try out this service without charge until June 2013 under a introductory trial.

Usually, you would need a database if you develop applications with the Google App Engine platform. Of course, there should be a possibility to connect Cloud SQL from external applications. This is possible with the command line tool and via JDBC. Here comes the next use case for our QlikView JDBC Connector.

To connect Google Cloud SQL you need to create OAuth 2.0 tokens to authorize applications to connect to your instance on your behalf. This can be done by using the command line tool (section Setup).

After following the command line tool setup steps the OAuth 2.0 tokens are created and stores in Windows registry under the key:

HKCU\Software\JavaSoft\Prefs\com.google.cloud.sqlservice

You need to copy this entries for the user who will be running the application on the deployment machine.

In the next step we configure the Google Cloud SQL JDBC driver in the connector:

Now we can start connecting Cloud SQL in QlikView. Just open the script editor choose JDBC Connector and click connect. Enter the JDBC connect string:

jdbc:google:rdbms://<your_instance_name>/<your_database_name>?
connector.driverClass=com.google.cloud.sql.Driver;

Leave the user/password properties free.

Now we can select the objects from an example mySQL schema I’ve created before with the Google API Console SQL Prompt. Use the dialog as usual to build your SELECT statements to load the data into QlikView:

As we can see it takes only some additional configuration steps to connect QlikView with Google Cloud SQL. Then you can retrieve data into your QlikView application as you would work with a standard mySQL instance.

Nov 18, 2012
#google cloud sql #jdbc #jdbcconnector #mysql #qlikview #cloud
Honey, I shrunk the QVDs

Although, not in this dramatic way the scientistof the film did, but at least without loss of any information.

During the development of my Java QVD creation library I have analyzed a lot of QlikView created QVD files (who else could create them before..) and found out that the utilization of numeric data types in QlikView (and in stored QVD files) has some room for optimization.

Especially if you have a lot of integer numericals where number formats are not needed. This means you could also store currency amounts in that way by multiplying with 100. You always can choose the formats in the QlikView application. I’m wondering why this is stored in the QVD files and takes place in the QlikView application memory (dual?). Ok, I can imagine some use cases like internationalization etc.

In one of my test cases I have used a CSV file (a very common format in Big Data environments like Hadoop) as a source for my QVD creation with the following characteristics:

Size: 1.09 GB
Records: 15.8 M
Columns: 21

In the next step we can compare the slight differences in the created QVD files:

QVD file QlikView STORE created: 630 MB
QVD file Java Converter created: 598 MB

Result: -32 MB (5% less disk space)

QVD Statistics:
Fields: 21
Symbols: 7997448
Records: 15806515
Datapoints: 225563265
Nullvalues: 106373550

Memory usage in a QlikView application (.qvw) discovered in QlikView’s memory statistics (.mem file):

QVD file QlikView STORE created: 686.6 MB
QVD file Java Converter created: 665.6 MB

Result: -21 MB (3% less RAM)

Ok, saving 21 MB memory usage is not a big deal here but it indicates that the QVD stored data types have an influence on the QlikView application also. I think this should be definitely considered in Big Data environments.

We should keep in mind that we can can move the memory boundary a little bit.

Nov 9, 2012
#qlikview #qvd #qvdreader #qvdwriter #qvdconverter #java #optimization #optimize

October 2012

3 posts

QlikView 11 for Developers: The Book

It was a pleasure to contribute with the review on this brand new book by Mike Garcia and Barry Harmsen published by Packt Publishing:

QlikView 11 For Developers



With this great book you will learn the techniques and best practices to create QlikView applications efficiently and to improve your QlikView development skills. This will help even experienced developers in their everyday development challenge. I highly recommend this book.

Thanks to Mike and Barry!

Oct 26, 2012
#qlikview #book #review
Hadoop HDFS JDBC driver with QlikView

This is a first test of our brand new developed HDFS JDBC driver. It can be used in combination with our QlikView JDBC Connector to query the HDFS file system and to load CSV files (raw data or MapReduce results) directly from HDFS into QlikView.

We have two deployments for HDFS version 1.0.3 CDH3 and 2.0.1 CHD4.

A full tutorial will be released soon..

Oct 17, 2012
#hadoop #hdfs #jdbc #qlikview #bigdata
QlikView Data Integration in a Java World

A brief introduction on our innovative QlikView data integration components:

TIQ Solutions - QlikView Data Integration in a Java World from TIQ Solutions GmbH

Oct 12, 2012
#qlikview #java #bigdata #jdbc #json #qvx #qvd

September 2012

1 post

Load a random sampling data set for Data Profiling in QlikView

A well know best practice in data profiling is to load only a random data set if you have large amounts of data. This will help to speed up profiling and saves time on load and processing especially if you want to check all data on a daily base.

We can do this in QlikView easily with the rand() function as shown in this load script example:

// in this case we want to load a 10% sampling set of the data
data:
LOAD ….<your fields>…
FROM <your QVD file>
WHERE ceil(rand() * 100) <= 10; 

This feature is included in the new TIQViewQVD version 2.6:

As you can see the records (%key is the physical record number) got selected randomly. The result would not be an exact percentage of sampling because we have a random factor. Sometimes you will get a small amount of records more or less.

You can download TIQViewQVD Limited Free Version here.

Hint: There is also the sample prefix to a Load or Select (SQL) statement which can be used for loading a random sample of records from the data source.

My tests showed that the resulting amount of sample rows were more inaccurate, mostly 10-20% higher as expected.

Sep 10, 20121 note
#qlikview #tiqview #dataprofiling #sampling

August 2012

4 posts

Facing Square Brackets in Field Names of QlikView Sources

I stepped into this in my current project. Our data sources are CSV flat files with field names in the header row. Unfortunately there are square brackets used for the field names.

Unfortunately QlikView cannot handle these brackets. The table file wizard shows the header line with the correct field names with square brackets:

but the created script is wrong and cannot be proccessed:

There are two natural ways to fix this problem by hand coding:

  1. using alias name (good luck if you have hundreds of fields)
  2. use double qoutes (not very nice for QlikView chart expressions)

I found a simple way to solve it with a three step approach:

  1. read field names as a list from the header row (use subfield function to transpose) into a mapping table and replace square brackets with paranthesis (or else)
  2. load data with no labels (field names @1, @2 etc.)
  3. rename field names using a mapping table

This is the code example (download here):

————————-

// load header row
Header:
FIRST 1 LOAD @1:n as First_Line
FROM flatfile.csv
(fix, codepage is 1252);

// load list of fields for renaming, replace square brackets with paranthesis
Map_Fields:
MAPPING LOAD '@' & RowNo() as Field1, Replace(Replace(SubField(First_Line, '|'), '[', '('), ']', ')') as Field2
Resident Header;

Drop Table Header; 

// load data without field names (save scripting work with aliases)
data:
LOAD @1, 
     @2, 
     @3, 
     @4, 
     @5, 
     @6, 
     @7
FROM flatfile.csv
(txt, codepage is 1252, no labels, delimiter is '|', msq, header is 1 lines);

//
RENAME Fields using Map_Fields;

————————-

You can use this mimic for several files also. It can help saving your coding time if you’re facing lovely square brackets.

Aug 29, 2012
#qlikview #flatfile #source #square_bracket
Connect to Hadoop Hive from within QlikView

This is an example to show how you can connect QlikView with Hadoop Hive by using the QlikView JDBC Connector. You need to install and activate the demo version from TIQ Solutions.

An easy way to start with Hadoop is the Cloudera Distribution Including Apache Hadoop (CDH) which is availble for download here: CDH4 (latest version is CDH4.0.1)

After starting the CentOS VM with Hadoop you can create the Beeswax for Hive examples (via web app Hue), which are two tables:

The Hive server service is already running on default port 10000. If not start the service:

/usr/bin/hive —service hiveserver

Don’t forget to find out the IP address of your VM (call ifconfig).

Next steps are on the client side. Download Hive JDBC libraries and extract the files from the hive_jdbc-0.8.1.zip archive to a folder. Add all libraries in the shown order to the CLASSPATH with the config dialog of the QlikView JDBC Connector:

Now connect to the Hive instance from QlikView by chosing the JDBCConnector dll

and click Connect Button where you have to specify the the connection credentials:

  • Hadoop host IP address
  • Hive port (default is 10000)
  • JDBC driver class (as URL parameter, in bold)

Copy this connect string into the connect dialog window:

jdbc:hive://192.168.254.163:10000/default?connector.driverClass=org.apache.hadoop.hive.jdbc.HiveDriver;

The dialog will past this custom connect statement into the script if you click OK:

CUSTOM CONNECT TO “Provider=JDBCConnector_x64.dll;jdbc:hive://192.168.254.163:10000/default?connector.driverClass=org.apache.hadoop.hive.jdbc.HiveDriver;XUserId=ZCNDJPC;XPassword=aSBFNZC;”;

Now you can use the table wizard to create your select statements (click select):

That’s it. Everything is done so far. You can start loading you QlikView application now.

I have prepared a sample QlikView application QVHive.qvw you can download here.

Aug 28, 20122 notes
#cdh4 #cloudera #hadoop #hive #jdbc #qlikview #bigdata
Stream Data from Pentaho Kettle into QlikView via JDBC

Pentaho has released the Thin Kettle JDBC Driver which is a part of the core libraries from version 5.0-M1 or higher.

This allows us to stream data from a Pentaho Kettle or PDI transformation right into QlikView without the need for any data persistence inbetween. This means we can connect to any output step of a transformation to gather the data.

I followed the configuration steps Matt Casters has described here. Basically, it is the creation of the carte config file: carte-config.xml

<slave_config> 
  <slaveserver>   
    <name>slave4-8084</name>   
    <hostname>localhost</hostname>   
    <port>8084</port>   
  </slaveserver> 

  <services> 
    <service>   
      <name>Service</name>    
      <filename>C:/Pentaho-Repo/examplecsv.ktr</filename>    
      <service_step>Output</service_step>  
    </service>
</slave_config>

You can add multiple <service> sections if you want to connect to more than one transformation or step. This configuration is used for this basic example transformation:

Then I configured the following Pentaho jar files (which are part of the distribution) in the QlikView JDBC Connector for this use case:

  • lib/kettle-core.jar
  • libext/commons/commons-httpclient-3.1.jar
  • libext/commons/commons-codec-1.4.jar
  • libext/commons/commons-lang-2.6.jar
  • libext/commons/commons-logging-1.1.jar
  • libext/pentaho/kettle-vfs-20100924.jar
  • libext/log4j-1.2.16.jar

After the start of the carte service (Carte.bat carte-config.xml) you can connect from QlikView with this connect string (user/password is cluster/cluster):

CUSTOM CONNECT TO “Provider=JDBCConnector_x64.dll;jdbc:pdi://localhost:8084/kettle?debugtrans=C:/Pentaho-Repo/jdbc.ktr&connector.driverClass=org.pentaho.di.core.jdbc.ThinDriver;XUserId=GdKfdRRNVbcIXSJOSB;XPassword=CFQeQRRNVbcIXSJOTH;”;

I have to mention here that the URL paramter connector.driverClass is used in the connect string to specify the JDBC driver class: org.pentaho.di.core.jdbc.ThinDriver

Then you can select from the configured service in QlikView Script with an SQL query as it would be a regular database table:

test:
select * from Service; // retrieves the data from step Output

This basic example shows how to stream data from a Pentaho Kettle (or PDI) transformation into QlikView. It is just the beginning of a new integration scenario if you can imagine how many Pentaho Input steps (how about the SAP Input or the Big Data section?) are available now for use in this case.

Also, if you already have Pentaho in production you can stream in data from existing transformations to tie both sides together more tightly.


Aug 20, 2012
#bigdata #dataintegration #jdbc #kettle #pentaho #qlikview #datafederation
How to use QlikView to discover data quality issues?

As a data quality management consultant at TIQ solutions I know and have used a lot of the data quality tools and suites of the market from big vendors and open source projects.

As I started to work with QlikView in 2006 (the first contact with QlikView was on version 3 or so in 1999) I realized very quickly the power of an associated in-memory analytics application to discover issues in the data.

This led into the decision to build some more professional data quality related QlikView applications which are availabe under the name TIQView: a set of QlikView applications helping consultants to deal with data quality issues.

I joined QlikCommunity in April 2009 and created this Data Quality group after the website relaunch to have a space for discussion and know how exchange on QlikView related data quality use cases, topics and questions.

Please contribute and tell us your story about the dirty data you’re facing in your project (I’m sure it was not only once) and share how you handled it with QlikView. Or, maybe you have questions about best practices in this realm and you’re looking for advice. Don’t hesitate to ask..

I hope I will find the time to post more of my experiences with this great Business AND Data Quality Discovery software. In the meantime you can try out TIQView and make your comments or post feature request.

Cheers,
Ralf

Aug 15, 2012
#qlikview #dataquality #dataprofiling #qlikcommunity

July 2012

3 posts

TIQ Solutions‘ QlikView Data Integration Landscape

Our QlikView data integration components are getting more and more versatile now.

(click on picture to enlarge)

Right now we have the following two products ready for use:

1. The QlikView JDBC Connector (already on QlikMarket):

It connects you to Java Database Connectivity (JDBC) data sources which is the industry standard for database-independent connectivity for a wide range of databases and tabular data sources (e.g. Java-based databases such as Apache Derby or Big Data environments like Hadoop Hive).

A new interesting development in this area has just started. Pentaho is working on a lightweight JDBC driver to call Kettle transformations. With this solution you are able to stream data into QlikView from any step of a Kettle transformation. This means that all Kettle input data sources and transformation steps can be used directly in QlikView via JDBC calls!

2. The JSON Proxy Server

It is a Java based program which converts any JSON formatted data from various sources such as RESTful web APIs, NoSQL databases or JSON files into an XML format. The converted XML data stream can be used as a web file source in reporting and analysis applications that do not support JSON format (such as QlikView). It supports all major 1.0a and 2.0 OAuth APIs out-of-the-box.

We work on several new use cases for the JSON proxy. It tends to be that this (JSON) porxy server evolves into a versatile any-to-any data and/or format conversation platform. It gives us also an environment where we can easy plug in new Java code. I think we will see some more interesting implementations of Business Intelligence related Java SDKs later this year.

The other QlikView related data integration components we provide are Java libraries which can deal with QlikView data files:

  • QVXConverter
  • QVDConverter

The QVXConverter is already used in several Java environments (you can read a bit more here), the QVDConverter’s reading part (the QVD Reader) is ready for testing. Don’t hesitate to contact us if you want to try out these components. Hopefully I get feedback from some keen QlikView enthusiasts in the next time.

We keep on working to close the white spots in the QlikView data integration landscape. Feel free to contact us to guide you through unknown territories..


Jul 16, 2012
#api #bigdata #dataintegration #java #jdbc #json #kettle #qlikview #qvd #qvx #restful #qvdreader #qvdwriter
QlikView JDBC Connector with Java 1.7 RE

If you want to try out our QlikView JDBC Connector (also on QlikMarket now) with the new Java 1.7 runtime environment you probably facing an error like “could not load jvm.dll”.

We found out there is an MSVC dll missing. To solve this problem install the Microsoft Visual C++ 2010 Redistributable Package and configer the Java 1.7 jvm.dll in the JDBC Connector.

Important note for the new Java 1.7 runtime environment:

You have to download and install the Microsoft Visual C++ 2010 Redistributable Package (x86) also!

Download for 32-bit:
http://www.microsoft.com/de-de/download/confirmation.aspx?id=5555

Download for 64-bit:
http://www.microsoft.com/de-de/download/confirmation.aspx?id=14632

Jul 12, 2012
#qlikview #jre #java #msvc #connector #jdbc
QlikView data integration in a Java world with QVX & QVD

After creating a Java lib called QVXConverter to read and write QlikView data eXchange files (QVX) which got integrated in several solutions:

  • Hadoop QVX Converter (Incentro HQC)
  • Pentaho Kettle QVX Writer Plugin (QlikCommunity)
    both from from Incentro, Netherlands
  • Lavastorm Analytics

I will start to work on a Java implementation of the QVD format.

The QVX file format still don’t shows the wanted loading performance in QlikView with large files even if it’s optimized for numericals like numbers and date/datetime and a blocksize is given.

Therefor the QVD file format - which can be used in an optimized QlikView load - must do the job in Big Data environments. Hence the QVD format is a bit more complex (but very delicious) it needs a different implementation approach and intelligent data caching (maybe I use Ehcache) in a Java application.

But I think this is doable and I can use some snippets of the QVXConverter because of some similarities..

Update: The QVDReader is finished! Now I can read and convert any QVD file outside of QlikView, under all OS platforms which can process Java.
Btw. Ehcache was not the right solution. I had to use a persisten key/value store to cover unlimited QVD file size. I have chosen JDBM3 which runs very stable so far.

Update 2: This is the very first demo of the Java-based QVDReader implementation. You can convert QlikView QVD files to CSV. Try it out!
I’m waiting for your feedback..

Update 3: Now I was able to increase performance of the QVDReader by 300% using a better JDBM3 implementation. It seems now more likely to go into a productive state..

Update 4: Our Java-based QVDWriter implementation is working now! This means we’re able to create QVD files in nearly any environment independently from QlikView.

Jul 6, 2012
#bigdata #dataintegration #java #qlikview #qvd #qvx #qvdreader #qvdwriter
Next page →
2012 2013
  • January 1
  • February 2
  • March 1
  • April 1
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
2011 2012 2013
  • January
  • February
  • March
  • April
  • May
  • June 9
  • July 3
  • August 4
  • September 1
  • October 3
  • November 4
  • December
2010 2011 2012
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
2009 2010 2011
  • January
  • February
  • March
  • April
  • May 3
  • June
  • July
  • August
  • September
  • October
  • November
  • December
2009 2010
  • January
  • February
  • March
  • April 1
  • May
  • June
  • July
  • August 3
  • September
  • October
  • November 9
  • December