technology.collections

About this document

This document is an R notebook, generated dynamically from the data extracted for the project. It lists all datasets published for the project with basic numbers, figures and a quick summary, and serves as a test case to check that all required data is present and roughly consistent with requirements. All plots and tables are computed from the actual data as provided in the downloads.

To re-execute the document, start an R session, load rmarkdown and render the page with the project ID as a parameter:

library(rmarkdown)
render("datasets_report.Rmarkdown", params = list(project_id = "technology.collections"), output_format="html_document")

This website uses the blogdown R package, which provides a different output_format for the Hugo framework.

This report was generated on 2022-08-14.

Downloads

All data is retrieved from Alambic, an open-source framework for development data extraction and processing.

This project’s analysis page can be found on the Alambic instance for the Eclipse forge, at https://eclipse.alambic.io/projects/technology.collections.

Downloads consist of gzip’d CSV and JSON files. CSV files always have a header row naming the fields, which makes them easy to import into analysis software such as R:

data <- read.csv(file='myfile.csv', header=T)
names(data)
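
Since the downloads are gzip’d, the CSV files can also be read without decompressing them first. The following is a minimal sketch rather than part of the report itself: the file name myfile.csv.gz and its columns are hypothetical placeholders, and the first lines only create a small sample file so that the example is self-contained.

```r
# Create a small gzip'd CSV so the example is self-contained.
# The file name and its columns are hypothetical placeholders.
gz_path <- file.path(tempdir(), "myfile.csv.gz")
con <- gzfile(gz_path, "w")
writeLines(c("date,commits,authors", "2022-08-01,3,2", "2022-08-02,1,1"), con)
close(con)

# gzfile() opens a decompressing connection, so read.csv() can read
# the compressed download directly.
data <- read.csv(gzfile(gz_path), header = TRUE)
names(data)
```

For a real download, only the last two lines are needed, with gz_path pointing at the downloaded file.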

List of datasets generated for the project:

  • Git
    • Git Commits (CSV) – Full list of commits with id, message, time, author, committer, and added, deleted and modified lines.
    • Git Commits Evol (CSV) – Evolution of number of commits and authors by day.
    • Git Log (TXT) – the raw export of git log.
  • Jenkins CI
  • Eclipse PMI
    • PMI Checks (CSV) – list of all checks applied to the Project Management Infrastructure entries for the project.
  • ScanCode

Git

Git commits

Download: git_commits_evol.csv.gz

data <- read.csv(file=file_git_commits_evol, header=T)

File is git_commits_evol.csv, and has 3 columns for 648 entries.

library(xts)       # time-series container
library(dygraphs)  # interactive plots; also provides the %>% pipe

# Running total of commits over time.
data$commits_sum <- cumsum(data$commits)
data.xts <- xts(x = data[, c('commits_sum', 'commits', 'authors')],
                order.by = as.POSIXct(as.character(data$date), format = "%Y-%m-%d"))

# Build an empty daily series spanning the full date range.
time.min <- index(data.xts[1, ])
time.max <- index(data.xts[nrow(data.xts), ])
all.dates <- seq(time.min, time.max, by = "days")
empty <- xts(order.by = all.dates)

# Merge so that days without commits appear as rows, then set them to 0.
merged.data <- merge(empty, data.xts, all = TRUE)
merged.data[is.na(merged.data)] <- 0

p <- dygraph(merged.data[, c('commits')],
        main = paste('Daily commits for ', project_id, sep = ''),
        width = 800, height = 250) %>%
      dyRangeSelector()
p
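
The zero-filling step used for the plot can also be sketched in base R, without xts. This is a simplified illustration on made-up sample values, not the notebook's actual code; the column names date and commits are assumed to match the CSV header.

```r
# Made-up sample: commits recorded on three non-consecutive days.
commits <- data.frame(
  date    = as.Date(c("2022-08-01", "2022-08-03", "2022-08-06")),
  commits = c(3, 1, 2)
)

# One row per calendar day between the first and last recorded dates.
all_days <- data.frame(date = seq(min(commits$date), max(commits$date), by = "day"))

# Left join: days without an entry get NA, which is then replaced by 0.
filled <- merge(all_days, commits, by = "date", all.x = TRUE)
filled$commits[is.na(filled$commits)] <- 0

# Cumulative total computed after filling, so it stays flat on quiet days.
filled$commits_sum <- cumsum(filled$commits)
filled
```

Computing the cumulative sum after the fill, rather than before it, avoids the running total dropping to zero on days without commits.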


Git log

Download: git_log.txt.gz

File is git_log.txt, and full log has 15209 lines.


Jenkins

Builds

Download: jenkins_builds.csv.gz

data <- read.csv(file=file_jenkins_builds, header=T)

File is jenkins_builds.csv, and has 7 columns for 379 entries.

| ID | Name | Time | Result |
|---|---|---|---|
| 62 | coverage-nightly #62 | 1.518303e+12 | ABORTED |
| 2016-06-06_21-00-37 | coverage-nightly #61 | 1.465261e+12 | FAILURE |
| 2016-06-05_21-00-37 | coverage-nightly #60 | 1.465175e+12 | FAILURE |
| 2016-06-04_21-00-38 | coverage-nightly #59 | 1.465088e+12 | FAILURE |
| 2016-06-03_21-00-37 | coverage-nightly #58 | 1.465002e+12 | FAILURE |
| 2016-06-02_21-00-37 | coverage-nightly #57 | 1.464916e+12 | FAILURE |
| 2016-06-01_21-00-37 | coverage-nightly #56 | 1.464829e+12 | FAILURE |
| 144 | deploy #144 | 1.657033e+12 | SUCCESS |
| 143 | deploy #143 | 1.656424e+12 | SUCCESS |
| 142 | deploy #142 | 1.655828e+12 | SUCCESS |
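
The Time column appears to hold Unix timestamps in milliseconds, as Jenkins usually reports them; this is an assumption based on the values shown, which line up with the dated build IDs. A minimal sketch of converting them to readable dates, using two values copied from the builds table (the column names name and time are hypothetical):

```r
# Sample values copied from the builds table; the column names are assumptions.
builds <- data.frame(
  name = c("coverage-nightly #61", "deploy #144"),
  time = c(1.465261e+12, 1.657033e+12)
)

# Divide by 1000 to go from milliseconds to seconds since 1970-01-01 UTC.
builds$time_utc <- as.POSIXct(builds$time / 1000, origin = "1970-01-01", tz = "UTC")
format(builds$time_utc, "%Y-%m-%d")
```

In UTC the first value falls early on 2016-06-07, a few hours after the 2016-06-06_21-00-37 of the matching build ID, which suggests that the IDs use the build server's local time zone.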


Jobs

Download: jenkins_jobs.csv.gz

data <- read.csv(file=file_jenkins_jobs, header=T)

File is jenkins_jobs.csv, and has 15 columns for 12 entries.

| Name | Colour | Last build time | Health report |
|---|---|---|---|
| coverage-nightly | aborted | 1.518303e+12 | 20 |
| deploy | blue | 1.657033e+12 | 100 |
| deploy-p2-maven | red | 1.557007e+12 | 0 |
| gsc-ec-converter upload | blue | 1.464575e+12 | 100 |
| hipp-setting-analysis | blue | 1.467796e+12 | 50 |
| javadoc | blue | 1.657035e+12 | 100 |
| master | blue | 1.659236e+12 | 100 |
| new-version | blue | 1.657122e+12 | 80 |
| publish-p2-repo | blue | 1.657034e+12 | 100 |
| release | blue | 1.657032e+12 | 100 |


PMI

PMI Checks

Download: eclipse_pmi_checks.csv.gz

data <- read.csv(file=file_pmi_checks, header=T)

File is eclipse_pmi_checks.csv, and has 3 columns for 17 entries.

library(xtable)  # HTML rendering of data frames

checks.table <- head(data[,c('Description', 'Value', 'Results')], 10)

# sep='' avoids the stray spaces that sep=" " inserts around the project ID.
print(
    xtable(checks.table,
        caption = paste('Extract of the first 10 PMI checks for ',
                        project_id, '.', sep=''),
        digits=0, align="llll"), type="html",
    html.table.attributes='class="table table-striped"',
    caption.placement='bottom',
    include.rownames=FALSE,
    sanitize.text.function=function(x) { x }
)

Extract of the first 10 PMI checks for technology.collections.

| Description | Value | Results |
|---|---|---|
| Checks if the URL can be fetched using a simple get query. | | Failed: no URL defined for create_url. |
| Checks if the URL can be fetched using a simple get query. | | Failed: no URL defined for query_url. |
| Sends a get request to the given CI URL and looks at the headers in the response (200 404..). Also checks if the URL is really a Hudson instance (through a call to its API). | https://hudson.eclipse.org/collections/ | OK. Fetched CI URL. OK. CI URL is a Hudson instance. Title is [built-in] |
| Checks if the Dev ML URL can be fetched using a simple get query. | https://dev.eclipse.org/mailman/listinfo/collections-dev | OK: Dev ML URL could be successfully fetched. |
| Checks if the URL can be fetched using a simple get query. | https://www.eclipse.org/collections/#refGuide | OK: Documentation URL could be successfully fetched. |
| Checks if the URL can be fetched using a simple get query. | https://www.eclipse.org/collections/#start | OK: Download URL could be successfully fetched. |
| Checks if the Forums URL can be fetched using a simple get query. | http://eclipse.org/forums/eclipse.collections | OK. Forum [Eclipse Collections forum] correctly defined. OK: Forum [Eclipse Collections forum] URL could be successfully fetched. |
| Checks if the URL can be fetched using a simple get query. | https://www.eclipse.org/collections/#learn | OK: Documentation URL could be successfully fetched. |
| Checks if the Mailing lists URL can be fetched using a simple get query. | | Failed: no mailing list defined. |
| Checks if the URL can be fetched using a simple get query. | | Failed: no URL defined for plan. |

ScanCode

Authors

Download: scancode_authors.csv.gz

data <- read.csv(file=file_sc_authors, header=T)

File is scancode_authors.csv, and has 2 columns for 2 entries.

| Author | Count |
|---|---|
| unknown | 3339 |
| collect | 2 |

suppressPackageStartupMessages(library(googleVis))
options(gvis.plot.tag='chart')

data.sorted <- data[order(data$count, decreasing = T),]

p <- gvisPieChart(data.sorted,
              options = list(
                title=paste("Authors for project ", project_id, sep=""),
                sliceVisibilityThreshold=0, height=280,
                pieHole= 0.4))

print(p, 'chart')


Copyrights

Download: scancode_copyrights.csv.gz

data <- read.csv(file=file_sc_copyrights, header=T)

File is scancode_copyrights.csv, and has 2 columns for 24 entries.

| Copyrights | Count |
|---|---|
| Copyright (c) Goldman Sachs | 2027 |
| Copyright (c) Goldman Sachs and others | 627 |
| unknown | 567 |
| Copyright (c) The Bank of New York Mellon | 52 |
| Copyright (c) Shotaro Sano and others | 11 |
| Copyright (c) Bhavana Hindupur | 9 |
| Copyright (c) The Eclipse Foundation | 8 |
| Copyright (c) Ivan Sopov and others | 7 |
| Copyright (c) Shotaro Sano | 7 |
| Copyright (c) Eclipse Foundation, Inc. | 5 |

suppressPackageStartupMessages(library(googleVis))
options(gvis.plot.tag='chart')

data.sorted <- data[order(data$count, decreasing = T),]

p <- gvisPieChart(data.sorted,
              options = list(
                title=paste("Copyrights for project ", project_id, sep=""),
                sliceVisibilityThreshold=0, height=280,
                pieHole= 0.4))

print(p, 'chart')


Holders

Download: scancode_holders.csv.gz

data <- read.csv(file=file_sc_holders, header=T)

File is scancode_holders.csv, and has 2 columns for 25 entries.

| Holders | Count |
|---|---|
| Goldman Sachs. ~ | 2027 |
| Goldman Sachs and others. ~ | 627 |
| unknown | 568 |
| The Bank of New York Mellon | 52 |
| Shotaro Sano and others | 11 |
| Bhavana Hindupur | 9 |
| The Eclipse Foundation | 8 |
| Ivan Sopov and others. ~ | 7 |
| Shotaro Sano | 7 |
| Eclipse Foundation, Inc. | 5 |

suppressPackageStartupMessages(library(googleVis))
options(gvis.plot.tag='chart')

data.sorted <- data[order(data$count, decreasing = T),]

p <- gvisPieChart(data.sorted,
              options = list(
                title=paste("Holders for project ", project_id, sep=""),
                sliceVisibilityThreshold=0, height=280,
                pieHole= 0.4))

print(p, 'chart')


Licences

Download: scancode_licences.csv.gz

data <- read.csv(file=file_sc_licences, header=T)

File is scancode_licences.csv, and has 2 columns for 10 entries.

| Licence | Count |
|---|---|
| epl-1.0 OR bsd-new | 2769 |
| unknown | 557 |
| bsd-new | 30 |
| epl-1.0 | 28 |
| cpl-1.0 AND other-permissive | 6 |
| bsd-new OR epl-2.0 | 5 |
| apache-2.0 | 1 |
| cpl-1.0 | 1 |
| eclipse-sua-2011 | 1 |
| mpl-1.1 | 1 |

suppressPackageStartupMessages(library(googleVis))
options(gvis.plot.tag='chart')

p <- gvisPieChart(data,
              options = list(
                title=paste("Licences for project ", project_id, sep=""),
                sliceVisibilityThreshold=0, height=280,
                pieHole= 0.4))

print(p, 'chart')
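
The proportions drawn by the pie chart can also be computed as plain numbers. A minimal sketch, using the counts listed for the licences; the column names licence and count are assumptions and may differ from the actual CSV header.

```r
# Counts copied from the licences table; column names are assumptions.
licences <- data.frame(
  licence = c("epl-1.0 OR bsd-new", "unknown", "bsd-new", "epl-1.0",
              "cpl-1.0 AND other-permissive", "bsd-new OR epl-2.0",
              "apache-2.0", "cpl-1.0", "eclipse-sua-2011", "mpl-1.1"),
  count = c(2769, 557, 30, 28, 6, 5, 1, 1, 1, 1)
)

# Share of each licence expression among all counted occurrences, in percent.
licences$share <- round(100 * licences$count / sum(licences$count), 1)
licences[order(licences$count, decreasing = TRUE), ]
```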


Programming Languages

Download: scancode_programming_languages.csv.gz

data <- read.csv(file=file_sc_pl, header=T)

File is scancode_programming_languages.csv, and has 2 columns for 7 entries.

| Programming Language | Count |
|---|---|
| Java | 2661 |
| Python | 371 |
| unknown | 234 |
| Scala | 54 |
| HTML | 19 |
| ActionScript 3 | 1 |
| Objective-C | 1 |

suppressPackageStartupMessages(library(googleVis))
options(gvis.plot.tag='chart')

p <- gvisPieChart(data,
              options = list(
                title=paste("Programming languages for project ", project_id, sep=""),
                sliceVisibilityThreshold=0, height=280,
                pieHole= 0.4))

print(p, 'chart')


Special files

Download: scancode_special_files.csv.gz

data <- read.csv(file=file_sc_sf, header=T)

File is scancode_special_files.csv, and has 2 columns for 40 entries.

| File | Type |
|---|---|
| LICENSE-EDL-1.0.txt | legal |
| LICENSE-EPL-1.0.txt | legal |
| pom.xml | manifest |
| README.md | readme |
| README_EXAMPLES.md | readme |
| acceptance-tests/pom.xml | manifest |
| eclipse-collections/pom.xml | manifest |
| eclipse-collections/src/main/resources/LICENSE-EDL-1.0.txt | legal |
| eclipse-collections/src/main/resources/LICENSE-EPL-1.0.txt | legal |
| eclipse-collections-api/pom.xml | manifest |