Eclipse Community Forums
Forum Search:

Search      Help    Register    Login    Home
Home » Eclipse Projects » BIRT » Data Cube Count Distinct missing(Using count for a summary field in a data cube yields the wrong result, why is count distinct missing?)
Data Cube Count Distinct missing [message #1031941] Tue, 02 April 2013 11:44 Go to next message
Marius Lessiak is currently offline Marius LessiakFriend
Messages: 6
Registered: September 2012
Junior Member
I have a dataset which returns rows of stories along with some data, see attached csv:
----------
storyid, date, category, views
1, 2013-01-01, news, 30
1, 2013-01-02, news, 465
..
2, 2013-01-01, sport, 65
2, 2013-01-02, sport, 423
..
3, 2013-01-01, games, 5
3, 2013-01-03, games, 54
----------

I want to analyze the data in regard to the categories and the time periods: How many views has a category per month and how many stories are published in a category in a month? So I create a data cube and add these dimensions as groups. As summary fields I pick the views (sum) and the storyid (count). See attached rptdesign.

Now everything is fine except the number of stories per category per month: Since the only option is to use COUNT as function in the summary field for storyid too many stories are counted. That happens because there are multiple rows for the same storyid containing data for different dates.

I assume a COUNTDISTINCT would work but it seems that function is not available for a cube summary field. Am i right? If yes (why oh why?) is there a way to circumevent this problem?

[Updated on: Tue, 02 April 2013 11:46]

Report message to a moderator

Re: Data Cube Count Distinct missing [message #1032231 is a reply to message #1031941] Tue, 02 April 2013 18:51 Go to previous messageGo to next message
Michael Williams is currently offline Michael WilliamsFriend
Messages: 1687
Registered: July 2009
Senior Member

You could probably create a computed column in your dataSet to keep track of this for you, then, use this in your cube. In this example, would all of the counts be 1, since no category has multiple ids?

Michael

Add interactivity, scheduling, sharing, live Excel export, and JS integration to your BIRT app with the new, free BIRT server, BIRT iHub F-Type.
Re: Data Cube Count Distinct missing [message #1034548 is a reply to message #1032231] Fri, 05 April 2013 15:16 Go to previous messageGo to next message
Marius Lessiak is currently offline Marius LessiakFriend
Messages: 6
Registered: September 2012
Junior Member
Thank you for the quick reply!

Michael Williams wrote on Tue, 02 April 2013 14:51
In this example, would all of the counts be 1, since no category has multiple ids?


Yes in the example above the expected result is 1 story per category.

I attached another example (cube_countdistinct_v2.csv) with more stories per category (this is dummy data of course). The expected result is as displayed in the attached cube_countdistinct_v2_result.png .

Before your answer I came up with the solution to calculate the uniqueness of every storyid in the summary function of the cube (I found pointers to this on birt-exchange). Part of my question is if this is the way to go (seems a bit hackish to me, seems to me that yes). See attached cube_countdistinct_v2.rptdesign . The code is:

var storyids = vars["storyids"];
var storyid = dataSetRow["comp_storyid"];
var date = dataSetRow["comp_date"];
var dateKey = Formatter.format(date, "YYYYMM");
var val = 0;
if (storyids[dateKey] == undefined) {
   storyids[dateKey] = {};
   storyids[dateKey][storyid] = 1;
   val = 1;
} else {
   if (storyids[dateKey][storyid] == undefined) {
      storyids[dateKey][storyid] = 1;
      val = 1;
   }
}
val


The performance impact is well observable (speed and in-memory-usage). Do you recommend something similar to use as a computed column? Will this increase performance (mabye because it is outside of the cube)? Is it advisable to use a report variable to hold the "hashmap" for the storyids?
Re: Data Cube Count Distinct missing [message #1034582 is a reply to message #1031941] Fri, 05 April 2013 16:10 Go to previous messageGo to next message
Michael Williams is currently offline Michael WilliamsFriend
Messages: 1687
Registered: July 2009
Senior Member

My thoughts were very similar to that on how I'd do the computed column. Here is the computed column code I had that works the same:

monthYear = Formatter.format(row["comp_date"], "MMYY");
temp = reportContext.getPersistentGlobalVariable(row["storyid"] + row["category"] + monthYear);
if(temp == null){
	reportContext.setPersistentGlobalVariable(row["storyid"] + row["category"] + monthYear, "set");
	1;
}
else{
	0;
}


Then, you just use this in your cube as a summed measure.


Michael

Add interactivity, scheduling, sharing, live Excel export, and JS integration to your BIRT app with the new, free BIRT server, BIRT iHub F-Type.

[Updated on: Fri, 05 April 2013 16:12]

Report message to a moderator

Re: Data Cube Count Distinct missing [message #1036524 is a reply to message #1034582] Mon, 08 April 2013 13:49 Go to previous messageGo to next message
Marius Lessiak is currently offline Marius LessiakFriend
Messages: 6
Registered: September 2012
Junior Member
Your code works flawlessly in my simplified example. When I applied it to my real report I noticed it stopped working: The real report uses a joint dataset as the source for the cube and that seems to be the problem. Interestingly it works when using the custom code inside the cube and not for a computed column in the joint dataset.

See attached report with a joint dataset made of the 2 attached csv datasources (the same data as before just separated). It shows 2 tables, one is not working (extra code as computed column) and the other is working (extra code inside the cube). I'll go with the working solution Smile but asked myself: Did i do something wrong or is this a bug or an unsupported feature?

[Updated on: Mon, 08 April 2013 14:36]

Report message to a moderator

Re: Data Cube Count Distinct missing [message #1036946 is a reply to message #1036524] Tue, 09 April 2013 02:13 Go to previous messageGo to next message
Michael Williams is currently offline Michael WilliamsFriend
Messages: 1687
Registered: July 2009
Senior Member

Very weird. That definitely seems like a bug. Please log a bug report for this in the bugzilla.

Michael

Add interactivity, scheduling, sharing, live Excel export, and JS integration to your BIRT app with the new, free BIRT server, BIRT iHub F-Type.
Re: Data Cube Count Distinct missing [message #1037261 is a reply to message #1036946] Tue, 09 April 2013 10:53 Go to previous messageGo to next message
Marius Lessiak is currently offline Marius LessiakFriend
Messages: 6
Registered: September 2012
Junior Member
I filed a bug report.
Re: Data Cube Count Distinct missing [message #1037519 is a reply to message #1037261] Tue, 09 April 2013 16:58 Go to previous message
Michael Williams is currently offline Michael WilliamsFriend
Messages: 1687
Registered: July 2009
Senior Member

Thanks!

Michael

Add interactivity, scheduling, sharing, live Excel export, and JS integration to your BIRT app with the new, free BIRT server, BIRT iHub F-Type.
Previous Topic:Firefox not showing the selected fonts for report
Next Topic:Table Footer at end of Every Page
Goto Forum:
  


Current Time: Mon Dec 22 10:07:56 GMT 2014

Powered by FUDForum. Page generated in 0.01934 seconds
.:: Contact :: Home ::.

Powered by: FUDforum 3.0.2.
Copyright ©2001-2010 FUDforum Bulletin Board Software