vizGrimoireR

vizgrimoire is an R package that makes life easier for those using Metrics Grimoire tools, and possibly vizGrimoireJS.

Some more documentation can be found in the vizGrimoireR wiki.

Dependencies and installation from source

See the vizGrimoireR wiki for up-to-date information about dependencies and how to install the vizgrimoire R library from source.

General issues

Each class is defined in its own file, whose name is "Class" followed by the name of the class. For example, class Query is defined in file ClassQuery.

Query class hierarchy

Hierarchy of R classes to deal with queries on SQL databases created by Metrics Grimoire.

Query: Root of the hierarchy

Methods:

  • run: Returns a data frame with the selected rows and fields

ITSTicketsTimes: class for handling the many times of each ticket

When initialized, this class queries an ITS (issue tracking system) database and stores the result as a data frame with the various times relevant to each ticket (open, closed, changed, etc.).

Methods:

  • initialize (constructor): Accepts a query (by default it uses its own, which should work). Stores several times as columns in the data frame: time to fix (first fix), time to fix (last fix), time to fix (in hours), etc.

  • JSON: Dumps a JSON file

  • QuantilizeYears: Obtains a data frame with yearly quantiles data. Each column in the data frame will correspond to the quantiles for each year.

Example:

issues_closed <- new ("ITSTicketsTimes")
quantiles_ttofixm_year <- QuantilizeYears (issues_closed, quantiles_spec)
plotTimeSerieYearN (quantiles_ttofixm_year, as.character(quantiles_spec),
                'its-quantiles-year-time_to_fix_min')
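
To illustrate the shape of the data frame that QuantilizeYears is described as returning (one column per year, holding the quantiles for that year), here is a minimal sketch in base R. It does not use the vizgrimoire package: the data frame tickets, its columns year and ttofix, and the helper quantiles_by_year are hypothetical names invented for this illustration, and the real method may compute its result differently.

```r
# Hypothetical sample data: one row per ticket, with the year it was
# closed and its time to fix (in days). Not from a real database.
tickets <- data.frame(
  year   = c(2011, 2011, 2011, 2012, 2012, 2012),
  ttofix = c(1, 5, 9, 2, 4, 6)
)
quantiles_spec <- c(0.5, 0.9)

# Compute the requested quantiles for each year. The result has one
# column per year and one row per requested quantile.
quantiles_by_year <- function(data, probs) {
  groups <- split(data$ttofix, data$year)
  cols <- lapply(groups, quantile, probs = probs, names = FALSE)
  result <- data.frame(cols, check.names = FALSE)
  rownames(result) <- as.character(probs)
  result
}

yearly <- quantiles_by_year(tickets, quantiles_spec)
print(yearly)
```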

ITSMonthly: class for dealing with monthly parameters

This class provides a framework for querying a database for aggregated monthly parameters (such as tickets open and ticket openers per month). Most of the functionality is here (initializing the object, creating JSON files, etc.), but each child class specializes the particulars, mainly the query needed to extract the data from the database.

Methods

  • initialize (constructor): uses the query in the children to get a monthly data frame. Each row corresponds to the data for a month. Each column is either one of the parameters queried, or some auxiliary value: id (year*12+month), year, month and a char format to show the month (such as Jun 2001).

  • Query: just an empty method, a placeholder for children to specify the query to be performed for the specific data they contain

  • JSON: writes the object into a JSON file
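
The auxiliary columns described for the constructor can be sketched in base R as follows. This is only an illustration of the id (year*12+month) and "Jun 2001" label conventions mentioned above; the data frame monthly, its open column and the name of the label column (date) are assumptions, not the package's actual layout.

```r
# Hypothetical monthly counts, as a query might return them.
monthly <- data.frame(
  year  = c(2001, 2001, 2002),
  month = c(6, 12, 1),
  open  = c(10, 7, 12)
)

# Sequential month id: consecutive months differ by exactly 1,
# even across a year boundary (Dec 2001 -> Jan 2002).
monthly$id <- monthly$year * 12 + monthly$month

# Human-readable label such as "Jun 2001". month.abb is a base R
# constant with the English three-letter month names.
monthly$date <- paste(month.abb[monthly$month], monthly$year)

print(monthly)
```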

ITSMonthlyOpen: class for tickets open, openers per month

Inherits from ITSMonthly.

Object with information about tickets open, and ticket openers, per month.

Methods

  • Query: returns the SQL query to obtain the data for the object

Example of use

open.monthly <- new ("ITSMonthlyOpen")
JSON(open.monthly, "its-open-monthly.json")

ITSMonthlyChanged: class for tickets changed, changers per month

Inherits from ITSMonthly.

Object with information about tickets changed, and ticket changers, per month.

Methods

  • Query: returns the SQL query to obtain the data for the object

ITSMonthlyClosed: class for tickets closed (first close) per month

Inherits from ITSMonthly.

Object with information about tickets closed (first close) per month.

Methods

  • Query: returns the SQL query to obtain the data for the object

ITSMonthlyLastClosed: class for tickets closed (last close) per month

Inherits from ITSMonthly.

Object with information about tickets closed (last close) per month.

Methods

  • Query: returns the SQL query to obtain the data for the object

ITSMonthlyVarious: class for all monthly parameters related to tickets

Inherits from ITSMonthly.

Object with information about all monthly parameters related to tickets. Internally, it instantiates objects of all the sister classes and merges them. Therefore, this class runs no query directly: the sister classes are the ones that actually query the database.

Methods

  • initialize (constructor): Instantiates objects of the sister classes and merges them to obtain a data frame with all monthly parameters relevant to tickets.
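
One way such a merge could look, sketched in base R. The three small data frames stand in for the data held by the sister classes and are made up for this example; the helper merge_monthly, the choice of id as the merge key and the decision to fill months missing from one source with 0 are all assumptions rather than the documented behaviour of ITSMonthlyVarious.

```r
# Hypothetical per-class monthly data, keyed by month id (year*12+month).
open_df    <- data.frame(id = c(24018, 24019), open = c(10, 4))
closed_df  <- data.frame(id = c(24019, 24020), closed = c(3, 5))
changed_df <- data.frame(id = c(24018, 24020), changed = c(8, 2))

# Merge any number of monthly data frames on "id", keeping every month
# that appears in at least one of them (full outer join).
merge_monthly <- function(frames) {
  merged <- Reduce(function(a, b) merge(a, b, by = "id", all = TRUE), frames)
  merged[is.na(merged)] <- 0   # assumption: no data for a month means 0
  merged[order(merged$id), ]
}

all_monthly <- merge_monthly(list(open_df, closed_df, changed_df))
print(all_monthly)
```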

Time series class hierarchy

Hierarchy for dealing with specialized time series

TimeSeries: Root of the hierarchy

Still to be written

TimeSeriesYears: Class for annual time series

Inherits from ts (should inherit from TimeSeries)

Methods

  • initialize (constructor): Accepts the time series to initialize with, along with the list of columns and the labels to use for those columns.

  • Plot: Plots columns in object, using labels (if specified)

  • JSON: Dumps object to file, as JSON

Times class hierarchy

Hierarchy for handling a vector with times for certain events (for example, time to fix for a list of tickets)

Times: Root of the hierarchy

Inherits from vector

Methods:

  • initialize (constructor): accepts a vector with times, and strings with units and label

  • PlotDist: Plots distribution of times (several histograms and density of probability)

Example of use:

issues_closed <- new ("ITSTicketsTimes")
tofix <- new ("Times", issues_closed$ttofix, "days",
              "Time to fix, first close")
PlotDist (tofix, 'its-distrib_time_to_fix')
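
PlotDist is described as plotting histograms and a probability density for a vector of times. A rough equivalent of one such plot can be sketched with base R graphics. The function plot_times_distribution, its arguments and the simulated data are hypothetical and only illustrate the idea; the real method produces several plots and may choose file formats and names differently.

```r
# Draw a histogram of the times with a kernel density estimate on top,
# and write it to a PDF file. Returns the file name invisibly.
plot_times_distribution <- function(times, units, label, file) {
  pdf(file)
  on.exit(dev.off())
  hist(times, freq = FALSE, main = label,
       xlab = paste0("Time (", units, ")"))
  lines(density(times))
  invisible(file)
}

# Simulated times to fix, in days (not real project data).
set.seed(1)
sample_times <- rexp(200, rate = 1 / 30)
out <- plot_times_distribution(sample_times, "days",
                               "Time to fix (simulated)",
                               tempfile(fileext = ".pdf"))
```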

vizgrimoirer's People

Contributors

canasdiaz, dicortazar, jgbarah, magicbroom, markdo, rodrigoprimo, sduenas, softmarina

vizgrimoirer's Issues

Add timezone analysis for mailing lists

Messages in mailing lists include headers with dates and times, which include timezone (TZ) information. The TZ can be used to infer the geographical area where the poster was working when the message was sent. Of course, this only locates people in one of 24 TZs, and only if their computer has correct TZ information. To make things worse, summer time, used in many countries, introduces some uncertainty when assigning messages to TZs, since we don't really know whether we're in x or in x+1. But some analysis can be done, and some results can be obtained.
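
As a starting point for this kind of analysis, the sketch below extracts the numeric TZ offset from the end of a Date header and counts messages per offset. It is a base R illustration only: the sample headers are invented, tz_offset_hours is a hypothetical helper, and it assumes headers that end in an offset of the form +HHMM or -HHMM.

```r
# Hypothetical Date headers, as found in mailing list messages.
date_headers <- c(
  "Mon, 06 Jan 2014 10:15:00 +0100",
  "Tue, 07 Jan 2014 18:40:12 -0500",
  "Wed, 08 Jan 2014 02:05:30 +0000",
  "Thu, 09 Jan 2014 09:00:00 +0100"
)

# Convert the trailing "+HHMM" / "-HHMM" into a signed number of hours.
# Headers without such a suffix are silently dropped by regmatches().
tz_offset_hours <- function(headers) {
  raw <- regmatches(headers, regexpr("[+-][0-9]{4}$", headers))
  sign <- ifelse(substr(raw, 1, 1) == "-", -1, 1)
  hours <- as.numeric(substr(raw, 2, 3))
  minutes <- as.numeric(substr(raw, 4, 5))
  sign * (hours + minutes / 60)
}

offsets <- tz_offset_hours(date_headers)
messages_per_tz <- table(offsets)
print(messages_per_tz)
```

Because of summer time, an offset of +0100 in January and +0200 in July may belong to the same poster, so counts per offset are only a first approximation.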

unifypeople script is not creating tables first time

If the user is not wise enough to use the --incremental=no flag, the unifypeople script does not create the tables the first time and crashes.

If the tables don't exist, they should be created no matter what incremental mode is selected.

< Incremental means more than before! >
 -------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

Add multireport option

Right now, R scripts have to be run once for each kind of report. The proposal is to support several reports in the same run: for example, "-r repos,companies" would be supported.

Error in GetLastActivityITS (bad SQL)

Error when invoking GetLastActivityITS:

RS-DBI driver: (could not run statement: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'ANDfield='closed'' at line 6)
Calls: GetLastActivityITS ... .valueClassTest -> is -> is -> mysqlExecStatement -> .Call
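
The fragment 'ANDfield=' in the message suggests that pieces of the query were concatenated without separating spaces. The sketch below reproduces that kind of mistake in base R; it is not the actual code of GetLastActivityITS, and the table name, column names and conditions are made up for illustration.

```r
# Hypothetical query fragments.
condition_date  <- "changed_on >= '2013-12-01'"
condition_field <- "field='closed'"

# paste0() joins its arguments with no separator, so the keyword AND
# ends up glued to its neighbours and the SQL is invalid.
broken_query <- paste0("SELECT COUNT(*) FROM changes WHERE ",
                       condition_date, "AND", condition_field)

# paste() separates its arguments with a single space by default.
fixed_query <- paste("SELECT COUNT(*) FROM changes WHERE",
                     condition_date, "AND", condition_field)

print(broken_query)
print(fixed_query)
```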

Help on running MetricsGrimoire Tool for GitHub example

Hi Guys,

I am trying to run this example: https://github.com/VizGrimoire/VizGrimoireR/wiki/Example-of-use-with-GitHub-projects

But I get the following error at this step:
[Run MetricsGrimoire tools, creating the corresponding databases, and prepare vizGrimoireR as a package ready to run]
~/src/vizGrimoire/VizGrimoireR/examples/github/vg-github.py --user jgb --passwd XXX
--dir /tmp/temp --removedb --ghuser jgbarah --ghpasswd XXXGH
--vgdir ~/src/vizGrimoire VizGrimoire/VizGrimoireR

Running MetricsGrimoire tool (cvsanaly)
Traceback (most recent call last):
  File "vg-github.py", line 461, in <module>
    run_mgtools (["cvsanaly", "bicho"], repos, dbPrefix)
  File "vg-github.py", line 164, in run_mgtools
    run_mgtool (tool, project, dbname)
  File "vg-github.py", line 144, in run_mgtool
    call(opts)
  File "/usr/lib/python2.7/subprocess.py", line 523, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1340, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

I can see that the script clones the repo successfully into the /tmp/temp directory and also creates the database in MySQL (with no tables yet).

I can also see that the tools were installed at /tmp/mg:
Bicho CVSAnalY MailingListStats mg-paths.sh RepositoryHandler

Further, echo $PATH is
/tmp/mg/CVSAnalY:/tmp/mg/Bicho/bin:/tmp/mg/MailingListStats:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

and echo $PYTHONPATH is
/tmp/mg/CVSAnalY:/tmp/mg/RepositoryHandler:/tmp/mg/Bicho:/tmp/mg/MailingListStats:

Your help is much appreciated. Gracias!

evol_countries function missing

Repositories per month

Executing the scm-analysis.R script I saw this error:

repositories <- evol_repositories(nperiod, conf$startdate, conf$enddate)
[1] "WARNING: evol_repositories is a deprecated function, use instead EvolRepositories"
data_repositories <- completePeriod(repositories, nperiod, conf)

if (conf$reports == 'companies') {
    companies <- evol_companies(nperiod, conf$startdate, conf$enddate)
    data_companies <- completePeriod(companies, nperiod, conf)
}
if (conf$reports == 'countries') {
    countries <- evol_countries(nperiod, conf$startdate, conf$enddate)
    data_countries <- completePeriod(countries, nperiod, conf)
}
Error: could not find function "evol_countries"

completePeriodId fails

When using weeks instead of months, completePeriodIds does not order the dataset properly, so the information it returns is wrong.

The functions EvolCommits, EvolAuthors, etc. seem to work well, but the result is wrong once the data is filtered by completePeriodIds and the unixtime and id fields are added.

AggAllParticipants doesn't need dates

Currently, AggAllParticipants in ITS.R is called as:

AggAllParticipants (startdate, enddate)

but it doesn't seem to use (or need) dates at all. If I'm right, those two arguments should be removed.

Error in GetDiffCommitsDays / GetDates

When running the current examples/github/scm-analysis-github.R script, I get an error:

$str_enddate
[1] "2014-01-03"
$str_startdate
[1] "2012-08-21"
[1] "Evolutionary"
Error in charToDate(x) : 
  character string is not in a standard unambiguous format
Calls: GetDiffCommitsDays ... GetDates -> as.Date -> as.Date.character -> charToDate

I suspect some trouble when converting dates in GetDates, I'm still researching...

Queries mix 'mailing_list_url' and 'mailing_list' fields in WHERE clause

In the MLS.R file, when the reposField() function returns the 'mailing_list' filter, reposNames() returns a list with the names of the mailing lists. If these names are passed as arguments to MLS functions, empty results are returned, because the queries filter on the 'mailing_list_url' field in their WHERE clauses, not on the 'mailing_list' field.

Quotes in strings with dates should not be needed

We have a rather odd way of specifying strings with dates. Usually, a string with a date should be something like "2013-12-31". However, in some cases, we have to use "\'2013-12-31\'" (note the backslash-quote as part of the string).

An example is the argument startdate (and enddate) for GetSCMEvolutionaryData.

This seems to be due to some lower-level functions needing the quotes in the string to compose queries for the database. I've traced this to at least two functions: GetSQLPeriod and GetSQLGlobal in Auxiliary.R. Instead of adding the quotes themselves (they are needed to compose the SQL queries), they rely on the parameters already having the quotes.

My proposal is that we should avoid all of this, since it is weird, difficult to explain, error-prone and probably has only historical reasons. Dates should always be strings, with no need for extra quotes.

However, this could break scripts calling all these functions, so the change should be done with care. As soon as I have some time, I could produce a patch for this. But meanwhile, opinions are welcome.
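
A minimal sketch of what the proposal could look like: callers pass plain date strings, and the lower-level helper adds the SQL quotes itself. The functions sql_quote_date and period_condition, and the field name date, are hypothetical; they are not the actual GetSQLPeriod or GetSQLGlobal code, whose signatures are not shown here.

```r
# Turn a plain "YYYY-MM-DD" string into a quoted SQL literal.
# Parsing with an explicit format rejects input that already carries
# quotes (or is not a date at all) instead of producing broken SQL.
sql_quote_date <- function(date) {
  parsed <- as.Date(date, format = "%Y-%m-%d")
  if (any(is.na(parsed))) {
    stop("dates must be plain strings such as \"2013-12-31\"")
  }
  paste0("'", format(parsed, "%Y-%m-%d"), "'")
}

# Compose a period condition for a WHERE clause from plain date strings.
period_condition <- function(startdate, enddate, field = "date") {
  paste0(field, " >= ", sql_quote_date(startdate),
         " AND ", field, " < ", sql_quote_date(enddate))
}

condition <- period_condition("2013-01-01", "2013-12-31")
print(condition)
```

During a transition, existing callers that still pass pre-quoted strings would fail loudly in sql_quote_date rather than silently producing doubled quotes, which may make them easier to find.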

Improve metrics for MLS

Right now in MLS we have sent and senders metrics. We need to add other metrics like:

  • Messages that init a thread
  • Avg Messages per thread
  • Avg persons per thread

and others

IRC.R Refactoring

Typically, functions should follow the same workflow. However, in IRC.R this depends on the function:

  • some use their own workflow, building their own query (StaticNumSentIRC)
  • others aim at the common workflow, but for some reason use the SCM query constructors (GetSentIRC)
  • and others use some internal workflow (GetListPeopleIRC)

Those using SCM functions will probably fail if another type of analysis different from the basic analysis is performed.

In addition extra testing should be added in any case...

Add new metrics for MLS Data Source

We have added new metrics for MLS:

  • threads
  • sent_init
  • sent_responses
  • sent_no_response (only static)
  • senders_init
  • senders_responses
  • thread_size_avg (only static)
  • thread_persons_avg (only static)

See commit ccb8834 and the commits before it.

ProcessAges, GetAges in ClassDemographics.R no longer working

It seems that when the class was refactored to work with mls, scm and its (adding, among others, the Aging and Birth functions), something broke: ProcessAges and GetAges no longer work, at least not the way they used to.

For example, this code is not working anymore:

demos <- new ("Demographics", type="mls", months=6)
ProcessAges (demos, "2007-10-01", "/tmp/htmltidy-")
