Skip to Content

technology.epf

About this document

This document is a R notebook, dynamically created from the numbers extracted on the project. It lists all datasets published for the project, providing basic numbers, figures and a quick summary, and serves as a test case to make sure that all the required data is present and roughly consistent with requirements. All plots and tables are computed from the actual data as provided in the downloads.

To re-execute the document, simply start a R session, load rmarkdown and render the page with the project ID as a parameter:

require('rmarkdown')
render("datasets_report.Rmarkdown", params = list(project_id = "technology.epf"), output_format="html_document")

This website uses the blogdown R package, which provides a different output_format for the hugo framework.

This report was generated on 2022-08-14.

Downloads

All data is retrieved from Alambic, an open-source framework for development data extraction and processing.

This project’s analysis page can be found on the Alambic instance for the Eclipse forge, at https://eclipse.alambic.io/projects/technology.epf.

Downloads are composed of gzip’d CSV and JSON files. CSV files always have a header to name the fields, which makes it easy to import in analysis software like R:

data <- read.csv(file='myfile.csv', header=T)
names(data)

List of datasets generated for the project:

  • Git
    • Git Commits (CSV) – Full list of commits with id, message, time, author, committer, and added, deleted and modifed lines.
    • Git Commits Evol (CSV) – Evolution of number of commits and authors by day.
    • Git Log (TXT) – the raw export of git log.
  • Bugzilla
  • Eclipse Forums
    • Forums Posts (CSV) – list of all forum posts for this project.
    • Forums threads (CSV) – list of all forum threads for this project.
  • Eclipse PMI
    • PMI Checks (CSV) – list of all checks applied to the Project Management Infrastructure entries for the project.

Git

Git commits

Download: git_commits_evol.csv.gz

data <- read.csv(file=file_git_commits_evol, header=T)

File is git_commits_evol.csv, and has 3 columns for 20 entries.

data$commits_sum <- cumsum(data$commits)
data.xts <- xts(x = data[,c('commits_sum', 'commits', 'authors')], order.by=as.POSIXct(as.character(data[,c('date')]), format="%Y-%m-%d"))

time.min <- index(data.xts[1,])
time.max <- index(data.xts[nrow(data.xts)])
all.dates <- seq(time.min, time.max, by="days")
empty <- xts(order.by = all.dates)

merged.data <- merge(empty, data.xts, all=T)
merged.data[is.na(merged.data) == T] <- 0

p <-dygraph(merged.data[,c('commits')],
        main = paste('Daily commits for ', project_id, sep=''),
        width = 800, height = 250 ) %>%
      dyRangeSelector()
p


Git log

Download: git_log.txt.gz

File is git_log.txt, and full log has 89 lines.


Bugzilla

Bugzilla issues

Download: bugzilla_issues.csv.gz

data <- read.csv(file=file_bz_issues, header=T)

File is bugzilla_issues.csv, and has 17 columns for 3223 issues.

Bugzilla open issues

Download: bugzilla_issues_open.csv.gz

data <- read.csv(file=file_bz_issues_open, header=T)

File is bugzilla_issues_open.csv, and has 17 columns for 331 issues (all open).

Bugzilla evolution

Download: bugzilla_evol.csv.gz

data <- read.csv(file=file_bz_evol, header=T)

File is bugzilla_evol.csv, and has 3 columns for 859 weeks.

Let’s try to plot the monthly number of submissions for the project:

Versions

Download: bugzilla_versions.csv.gz

data <- read.csv(file=file_bz_versions, header=T)

File is bugzilla_versions.csv, and has 2 columns for 28 weeks.

Components

Download: bugzilla_components.csv.gz

data <- read.csv(file=file_bz_components, header=T)

File is bugzilla_components.csv, and has 2 columns for 6 weeks.

data.sorted <- data[order(data$Bugs, decreasing = T),]

g <- gvisColumnChart(data.sorted, options=list(title='List of product components', legend="{position: 'none'}", width="automatic", height="300px"))
plot(g)

Eclipse Forums

Forums posts

Download: eclipse_forums_posts.csv.gz

data <- read.csv(file=file_forums_posts, header=T)

File is eclipse_forums_posts.csv, and has 6 columns for 3238 posts. The evolution of posts

data$created.date <- as.POSIXct(data$created_date, origin="1970-01-01")
posts.xts <- xts(data, order.by = data$created.date)

time.min <- index(posts.xts[1,])
time.max <- index(posts.xts[nrow(posts.xts)])
all.dates <- seq(time.min, time.max, by="weeks")
empty <- xts(order.by = all.dates)

merged.data <- merge(empty, posts.xts$id, all=T)
merged.data[is.na(merged.data) == T] <- 0

posts.weekly <- apply.weekly(x=merged.data, FUN = nrow)
names(posts.weekly) <- c("posts")

p <- dygraph(
  data = posts.weekly[-1,],
  main = paste('Weekly forum posts for ', project_id, sep=''),
  width = 800, height = 250 ) %>%
  dyAxis("x", drawGrid = FALSE) %>%
  dySeries("posts", label = "Weekly posts") %>%
  dyOptions(stepPlot = TRUE) %>%
  dyRangeSelector()
p

The list of the 10 last active posts on the forums:

data$created.date <- as.POSIXct(data$created_date, origin="1970-01-01")
posts.table <- head(data[,c('id', 'subject', 'created.date', 'author_id')], 10)
posts.table$subject <- paste('<a href="', posts.table$html_url, '">', posts.table$subject, '</a>', sep='')
posts.table$created.date <- as.character(posts.table$created.date)
names(posts.table) <- c('ID', 'Subject', 'Post date', 'Post author')

print(
    xtable(head(posts.table, 10),
        caption = paste('10 most recent posts on', project_id, 'forum.', sep=" "),
        digits=0, align="lllll"), type="html",
    html.table.attributes='class="table table-striped"',
    caption.placement='bottom',
    include.rownames=FALSE,
    sanitize.text.function=function(x) { x }
)
10 most recent posts on technology.epf forum.
ID Subject Post date Post author
1853375 Termination review schedule for EPF Project 2022-06-29 19:48:22 25918
1853184 Proposed termination review for EPF project 2022-06-22 07:29:08 25918
1848296 Re: The EPF Wiki seems to be down (epf.eclipse.org) 2021-11-26 17:27:41 205578
1848294 Re: The EPF Wiki seems to be down (epf.eclipse.org) 2021-11-26 16:27:09 233405
1848276 Re: The EPF Wiki seems to be down (epf.eclipse.org) 2021-11-26 11:28:58 233405
1848271 Re: The EPF Wiki seems to be down (epf.eclipse.org) 2021-11-26 09:33:57 205578
1848233 The EPF Wiki seems to be down (epf.eclipse.org) 2021-11-25 16:32:50 233405
1832307 Generate static HTML and host on GitHub Pages? 2020-09-16 04:00:31 230115
1830352 Re: Where are the project downloads? 2020-07-23 20:47:57 229654
1818224 Re: SAFe in Composer format 2019-12-10 13:04:52 205578


Forums threads

Download: eclipse_forums_threads.csv.gz

data <- read.csv(file=file_forums_threads, header=T)

File is eclipse_forums_threads.csv, and has 8 columns for 1189 threads. A wordcloud with the main words used in threads is presented below.

The list of the 10 last active threads on the forums:

data$last.post.date <- as.POSIXct(data$last_post_date, origin="1970-01-01")
threads.table <- head(data[,c('id', 'subject', 'last.post.date', 'last_post_id', 'replies', 'views')], 10)
threads.table$subject <- paste('<a href="', threads.table$html_url, '">', threads.table$subject, '</a>', sep='')
threads.table$last.post.date <- as.character(threads.table$last.post.date)
names(threads.table) <- c('ID', 'Subject', 'Last post date', 'Last post author', 'Replies', 'Views')

print(
    xtable(threads.table,
        caption = paste('10 last active threads on', project_id, 'forum.', sep=" "),
        digits=0, align="lllllll"), type="html",
    html.table.attributes='class="table table-striped"',
    caption.placement='bottom',
    include.rownames=FALSE,
    sanitize.text.function=function(x) { x }
)
10 last active threads on technology.epf forum.
ID Subject Last post date Last post author Replies Views
1111095 Termination review schedule for EPF Project 2022-06-29 19:48:22 1853375 0 1682
1111038 Proposed termination review for EPF project 2022-06-22 07:29:08 1853184 0 17411
1109535 The EPF Wiki seems to be down (epf.eclipse.org) 2021-11-26 17:27:41 1848296 4 1640
1105215 Generate static HTML and host on GitHub Pages? 2020-09-16 04:00:31 1832307 0 3237
1101617 SAFe in Composer format 2019-12-10 13:04:52 1818224 1 6500
1101173 Composer is down again 2019-11-05 16:57:10 1816727 3 1559
1099034 EPF Composer in AMASS Platform 2019-11-05 17:00:25 1816728 3 37065
1098887 Where are the project downloads? 2020-07-23 20:47:57 1830352 5 6714
1097487 Current EPF Composer Usage 2019-02-12 12:23:08 1802593 3 2401
1096871 Knowing the effected files when a task/Capability pattern is modified in EPF Composer 2019-01-09 12:27:07 1800889 1 2502

PMI

PMI Checks

Download: eclipse_pmi_checks.csv.gz

data <- read.csv(file=file_pmi_checks, header=T)

File is eclipse_pmi_checks.csv, and has 3 columns for 17 commits.

checks.table <- head(data[,c('Description', 'Value', 'Results')], 10)

print(
    xtable(checks.table,
        caption = paste('Extract of the 10 first PMI checks for ', 
                        project_id, '.', sep=" "),
        digits=0, align="llll"), type="html",
    html.table.attributes='class="table table-striped"',
    caption.placement='bottom',
    include.rownames=FALSE,
    sanitize.text.function=function(x) { x }
)
Extract of the 10 first PMI checks for technology.epf .
Description Value Results
Checks if the URL can be fetched using a simple get query. https://bugs.eclipse.org/bugs/enter\_bug.cgi?product=EPF OK: Create URL could be successfully fetched.
Checks if the URL can be fetched using a simple get query. https://bugs.eclipse.org/bugs/buglist.cgi?product=EPF OK: Query URL could be successfully fetched.
Sends a get request to the given CI URL and looks at the headers in the response (200 404..). Also checks if the URL is really a Hudson instance (through a call to its API). Failed: could not get CI URL \[\].
Checks if the Dev ML URL can be fetched using a simple get query. https://dev.eclipse.org/mailman/listinfo/epf-dev OK: Dev ML URL could be successfully fetched.
Checks if the URL can be fetched using a simple get query. http://www.eclipse.org/epf/general/developers\_documentation.php OK: Documentation URL could be successfully fetched.
Checks if the URL can be fetched using a simple get query. http://www.eclipse.org/epf/downloads/downloads.php OK: Download URL could be successfully fetched.
Checks if the Forums URL can be fetched using a simple get query. http://www.eclipse.org/forums/eclipse.technology.epf OK. Forum \[eclipse.technology.epf\] correctly defined.\\OK: Forum \[eclipse.technology.epf\] URL could be successfully fetched.
Checks if the URL can be fetched using a simple get query. http://www.eclipse.org/epf/general/getting\_started.php OK: Documentation URL could be successfully fetched.
Checks if the Mailing lists URL can be fetched using a simple get query. Failed: no mailing list defined.
Checks if the URL can be fetched using a simple get query. http://www.eclipse.org/epf/plan/epf\_1-5-1-8\_project\_plan.xml OK: Plan URL could be successfully fetched.