I recently worked on a project that involved improving the performance of a user-developed QlikView document load script. The problem, I was told, was that the reload took approximately 12 hours to complete. My task was to examine the document and improve the data model and load strategy.
One consideration was that the user was running QlikView Desktop in one location while the database server was across the country in another. All of the data being loaded therefore traveled across the corporate WAN, with much less bandwidth than a typical LAN environment. To optimize performance, we needed to minimize the volume of data being transferred.
Upon examining the QlikView data model, I found one large fact table that was completely denormalized. Many dimensional description fields had been loaded into this single fact table, greatly expanding the row size. Another issue was that the SQL query was a very complex grouped statement, with many aggregated, calculated measures in both the SELECT list and the GROUP BY clause. We suspected this was a major cause of the poor performance.
To improve load performance, we decided that the calculated measures should be produced in the QlikView load script and the aggregations performed by QlikView rather than by the database query.
Since the non-aggregated query was returning 138 million rows, this led to a discussion of the most efficient way for QlikView to create the calculated measures.
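To make the change concrete, here is a sketch of the approach, using hypothetical table and field names (the real query was far more complex). The grouped SQL is replaced by a plain SELECT, and the aggregation moves into a QlikView GROUP BY load:

```
// Before (sketch): aggregation done by the database, across the WAN.
// All names here are hypothetical.
Sales:
SQL SELECT
    CustomerID,
    ProductID,
    SUM(Quantity * UnitPrice) AS SalesAmount
FROM dbo.SalesDetail
GROUP BY CustomerID, ProductID;

// After (sketch): pull the raw rows and let QlikView aggregate.
SalesDetail:
SQL SELECT CustomerID, ProductID, Quantity, UnitPrice
FROM dbo.SalesDetail;

Sales:
LOAD
    CustomerID,
    ProductID,
    Sum(Quantity * UnitPrice) AS SalesAmount
RESIDENT SalesDetail
GROUP BY CustomerID, ProductID;

DROP TABLE SalesDetail;
```

The trade-off is that the raw, non-aggregated rows now cross the wire, which is what led to the discussion below.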
We determined that our load script options were:
- Calculate the measures in the main QVD (preceding) load
This option prevents the QVD from loading in “qvd optimized” mode. This is an outcome that QlikView developers are taught to avoid.
- Add a preceding load to the standard preceding load to calculate the measures
This was the most popular option with the QlikView developers in the discussion. This seems to be the way most developers are taught.
- Use a RESIDENT load from a temp table
The RESIDENT option allows a “qvd optimized” load of the base table; the calculated measures can then be added in a RESIDENT load and the temp table dropped. Of course, two very large tables exist in QlikView before the temp table is dropped, so sufficient memory is necessary.
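The three options can be sketched as follows, again with hypothetical field names, assuming the raw rows are in Fact.qvd and the calculated measure is a simple expression:

```
// Option 1: calculate in the single QVD load.
// The transformation prevents a "qvd optimized" load.
Fact:
LOAD *,
    Quantity * UnitPrice AS SalesAmount
FROM Fact.qvd (qvd);

// Option 2: stack a second preceding load on the QVD load.
Fact:
LOAD *,
    Quantity * UnitPrice AS SalesAmount;
LOAD *
FROM Fact.qvd (qvd);

// Option 3: plain QVD load into a temp table (qvd optimized),
// then add the measures with a RESIDENT load.
TempFact:
LOAD * FROM Fact.qvd (qvd);

Fact:
LOAD *,
    Quantity * UnitPrice AS SalesAmount
RESIDENT TempFact;

DROP TABLE TempFact;
```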
So rather than selecting a strategy based on previous training, I decided to benchmark each one to determine the best approach in this scenario.
The test environment:

- Lenovo W510, i7 1.6 GHz (base)
- 16 GB RAM
- QlikView 11.2 SR1

Each test loaded from a 50,000,000-row QVD base.
The benchmark results show a clear winner in both load performance and memory efficiency when performing calculations during a QVD load: the single preceding load. This appears to be due largely to more efficient use of the available CPU power; all CPUs ran at nearly 100% in this scenario. In the “preceding load on a preceding load” scenario, the CPUs ran at less than 50% utilization.
In summary, keep it simple and perform all the calculations you can in the first LOAD statement, as this seems to be the most time-efficient process.