Skip to main content



      Home
Home » Archived » BIRT » Data Cube Count Distinct missing(Using count for a summary field in a data cube yields the wrong result, why is count distinct missing?)
Data Cube Count Distinct missing [message #1031941] Tue, 02 April 2013 07:44 Go to next message
Eclipse UserFriend
I have a dataset which returns rows of stories along with some data, see attached csv:
----------
storyid, date, category, views
1, 2013-01-01, news, 30
1, 2013-01-02, news, 465
..
2, 2013-01-01, sport, 65
2, 2013-01-02, sport, 423
..
3, 2013-01-01, games, 5
3, 2013-01-03, games, 54
----------

I want to analyze the data in regard to the categories and the time periods: How many views has a category per month and how many stories are published in a category in a month? So I create a data cube and add these dimensions as groups. As summary fields I pick the views (sum) and the storyid (count). See attached rptdesign.

Now everything is fine except the number of stories per category per month: Since the only option is to use COUNT as function in the summary field for storyid too many stories are counted. That happens because there are multiple rows for the same storyid containing data for different dates.

I assume a COUNTDISTINCT would work but it seems that function is not available for a cube summary field. Am i right? If yes (why oh why?) is there a way to circumevent this problem?

[Updated on: Tue, 02 April 2013 07:46] by Moderator

Re: Data Cube Count Distinct missing [message #1032231 is a reply to message #1031941] Tue, 02 April 2013 14:51 Go to previous messageGo to next message
Eclipse UserFriend
You could probably create a computed column in your dataSet to keep track of this for you, then, use this in your cube. In this example, would all of the counts be 1, since no category has multiple ids?
Re: Data Cube Count Distinct missing [message #1034548 is a reply to message #1032231] Fri, 05 April 2013 11:16 Go to previous messageGo to next message
Eclipse UserFriend
Thank you for the quick reply!

Michael Williams wrote on Tue, 02 April 2013 14:51
In this example, would all of the counts be 1, since no category has multiple ids?


Yes in the example above the expected result is 1 story per category.

I attached another example (cube_countdistinct_v2.csv) with more stories per category (this is dummy data of course). The expected result is as displayed in the attached cube_countdistinct_v2_result.png .

Before your answer I came up with the solution to calculate the uniqueness of every storyid in the summary function of the cube (I found pointers to this on birt-exchange). Part of my question is if this is the way to go (seems a bit hackish to me, seems to me that yes). See attached cube_countdistinct_v2.rptdesign . The code is:

var storyids = vars["storyids"];
var storyid = dataSetRow["comp_storyid"];
var date = dataSetRow["comp_date"];
var dateKey = Formatter.format(date, "YYYYMM");
var val = 0;
if (storyids[dateKey] == undefined) {
   storyids[dateKey] = {};
   storyids[dateKey][storyid] = 1;
   val = 1;
} else {
   if (storyids[dateKey][storyid] == undefined) {
      storyids[dateKey][storyid] = 1;
      val = 1;
   }
}
val


The performance impact is well observable (speed and in-memory-usage). Do you recommend something similar to use as a computed column? Will this increase performance (mabye because it is outside of the cube)? Is it advisable to use a report variable to hold the "hashmap" for the storyids?
Re: Data Cube Count Distinct missing [message #1034582 is a reply to message #1031941] Fri, 05 April 2013 12:10 Go to previous messageGo to next message
Eclipse UserFriend
My thoughts were very similar to that on how I'd do the computed column. Here is the computed column code I had that works the same:

monthYear = Formatter.format(row["comp_date"], "MMYY");
temp = reportContext.getPersistentGlobalVariable(row["storyid"] + row["category"] + monthYear);
if(temp == null){
	reportContext.setPersistentGlobalVariable(row["storyid"] + row["category"] + monthYear, "set");
	1;
}
else{
	0;
}


Then, you just use this in your cube as a summed measure.

[Updated on: Fri, 05 April 2013 12:12] by Moderator

Re: Data Cube Count Distinct missing [message #1036524 is a reply to message #1034582] Mon, 08 April 2013 09:49 Go to previous messageGo to next message
Eclipse UserFriend
Your code works flawlessly in my simplified example. When I applied it to my real report I noticed it stopped working: The real report uses a joint dataset as the source for the cube and that seems to be the problem. Interestingly it works when using the custom code inside the cube and not for a computed column in the joint dataset.

See attached report with a joint dataset made of the 2 attached csv datasources (the same data as before just separated). It shows 2 tables, one is not working (extra code as computed column) and the other is working (extra code inside the cube). I'll go with the working solution Smile but asked myself: Did i do something wrong or is this a bug or an unsupported feature?

[Updated on: Mon, 08 April 2013 10:36] by Moderator

Re: Data Cube Count Distinct missing [message #1036946 is a reply to message #1036524] Mon, 08 April 2013 22:13 Go to previous messageGo to next message
Eclipse UserFriend
Very weird. That definitely seems like a bug. Please log a bug report for this in the bugzilla.
Re: Data Cube Count Distinct missing [message #1037261 is a reply to message #1036946] Tue, 09 April 2013 06:53 Go to previous messageGo to next message
Eclipse UserFriend
I filed a bug report.
Re: Data Cube Count Distinct missing [message #1037519 is a reply to message #1037261] Tue, 09 April 2013 12:58 Go to previous message
Eclipse UserFriend
Thanks!
Previous Topic:Firefox not showing the selected fonts for report
Next Topic:Table Footer at end of Every Page
Goto Forum:
  


Current Time: Thu Apr 24 03:11:35 EDT 2025

Powered by FUDForum. Page generated in 0.03987 seconds
.:: Contact :: Home ::.

Powered by: FUDforum 3.0.2.
Copyright ©2001-2010 FUDforum Bulletin Board Software

Back to the top