bupaverse / edeaR

Exploratory and descriptive analysis of event based data.

Home Page: https://bupaverse.github.io/edeaR/

License: Other

R 100.00%

edear's Introduction

bupaverse

The bupaverse is an open-source, integrated suite of R-packages for handling and analysing business process data, developed by the Business Informatics research group at Hasselt University, Belgium. Profoundly inspired by the tidyverse package, the bupaverse package is designed to facilitate the installation and loading of multiple bupaverse packages in a single step. Learn more about bupaverse at the bupaR.net homepage.

Installation

You can install bupaverse from CRAN with:

install.packages("bupaverse")

Development Version

You can install the development version of bupaverse from GitHub with:

# install.packages("devtools")
devtools::install_github("bupaverse/bupaverse")

Usage

library(bupaverse) will load the core bupaverse packages:

bupaR: Core package for business process analysis.
edeaR: Exploratory and descriptive analysis of event-based data.
eventdataR: Repository of sample process data.
processcheckR: Rule-based conformance checking and filtering.
processmapR: Visualise event-based data using, i.a., process maps.

An overview of the loaded packages and conflicts with other packages is shown after loading bupaverse:

library(bupaverse)
#> 
#> .______    __    __  .______      ___   ____    ____  _______ .______          _______. _______
#> |   _  \  |  |  |  | |   _  \    /   \  \   \  /   / |   ____||   _  \        /       ||   ____|
#> |  |_)  | |  |  |  | |  |_)  |  /  ^  \  \   \/   /  |  |__   |  |_)  |      |   (----`|  |__
#> |   _  <  |  |  |  | |   ___/  /  /_\  \  \      /   |   __|  |      /        \   \    |   __|
#> |  |_)  | |  `--'  | |  |     /  _____  \  \    /    |  |____ |  |\  \----.----)   |   |  |____
#> |______/   \______/  | _|    /__/     \__\  \__/     |_______|| _| `._____|_______/    |_______|
#>                                                                                                 
#> ── Attaching packages ─────────────────────────────────────── bupaverse 0.1.0 ──
#> ✔ bupaR         0.5.2     ✔ processcheckR 0.2.0
#> ✔ edeaR         0.9.1     ✔ processmapR   0.5.2
#> ✔ eventdataR    0.3.1     
#> ── Conflicts ────────────────────────────────────────── bupaverse_conflicts() ──
#> ✖ bupaR::filter()          masks stats::filter()
#> ✖ processmapR::frequency() masks stats::frequency()
#> ✖ edeaR::setdiff()         masks base::setdiff()
#> ✖ bupaR::timestamp()       masks utils::timestamp()
#> ✖ processcheckR::xor()     masks base::xor()
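
A masked function is still reachable through an explicit namespace prefix. The snippet below is a minimal sketch using only base R, so it runs without bupaverse attached; the input values are made up for illustration.

```r
# Calling the base versions explicitly sidesteps masking by edeaR::setdiff()
# and processcheckR::xor(), whichever packages happen to be attached.
remaining <- base::setdiff(1:5, 2:3)   # elements of 1:5 that are not in 2:3
either    <- base::xor(TRUE, FALSE)    # exclusive or on plain logicals

print(remaining)
print(either)
```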

edear's Issues

edeaR filter_infrequent_flows is not robust against event logs with `label` column and integer `activity_instance` column

I don't have time to make a reprex, but I want to post this so I don't forget.
I traced the issue with the function down to a column named label.

I think it goes wrong here (which is actually in base bupaR):

add_start_activity <- function(log, label = "Start") {    
    log %>%
      group_by_case() %>%
      arrange(.data[[timestamp(log)]]) %>%
      slice_events(1) %>%
      ungroup_eventlog() %>%
      mutate(!!timestamp(log) := .data[[timestamp(log)]] - 1,
             !!activity_id(log) := factor(label, levels = c(as.character(bupaR::activity_labels(log)), label)),
             !!activity_instance_id(log) := stri_c(.data[[case_id(log)]], "start", sep = "-")) -> start_states
    
    return(add_start_end_activity_bind_logs(log, start_states, label))
  }

It seems dplyr evaluates some of the label references as the column rather than the function parameter.
The other issue is that the stri_c call that defines the activity instance clashes when the instance column is an integer column.

This should probably be moved to bupaR, but this is how I ran into the error.
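
The two suspected causes can be illustrated outside bupaR. This is a minimal sketch on toy tibbles, not the package's actual fix: the data, the helper add_label() and the column names are hypothetical. It shows that inside mutate() a bare label resolves to a column of that name, while the .env pronoun forces the function argument, and that converting integer ids to character avoids a type clash when character ids are bound on.

```r
library(dplyr)

# Toy data with a column that happens to be called `label` (hypothetical)
df <- tibble(id = 1:2, label = c("a", "b"))

add_label <- function(data, label = "Start") {
  data %>%
    mutate(
      masked   = label,       # data masking: resolves to the column `label`
      explicit = .env$label   # .env pronoun: resolves to the function argument
    )
}

res <- add_label(df)

# Integer ids are converted to character before rows with character ids are added
ints     <- tibble(instance = 1:2)
starts   <- tibble(instance = "1-start")
combined <- bind_rows(mutate(ints, instance = as.character(instance)), starts)
```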

resource involvement tooltip

I have created a resource-level resource involvement plot and wrapped it with ggplotly function.

plot_obj <- patients %>% resource_involvement("resource") %>% plot()
ggplotly(plot_obj)

In the plot, the tooltip renders the same element multiple times. According to the ggplotly documentation, the tooltip defaults to "all", which shows all aesthetic mappings (including the unofficial "text" aesthetic).

Assuming the resource involvement plot maps the same element to several aesthetic attributes [edeaR/resource_involvement.r, code snippet attached for reference], the tooltip element is rendered multiple times.

The question: can this be handled at the plot generation level, i.e. in the resource_involvement function, or is there an alternative way to solve this problem?
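
One option on the caller's side is the tooltip argument of ggplotly(), which restricts the tooltip to selected aesthetics. The sketch below uses a stand-in ggplot2 plot built from mtcars, because the aesthetics mapped by the resource involvement plot are not shown here; the choice of "x" and "y" is an assumption and may need adjusting for the real plot.

```r
library(ggplot2)
library(plotly)

# Stand-in plot: the same variable is mapped to several aesthetics,
# so the default tooltip ("all") would list it more than once.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = wt, size = wt)) +
  geom_point()

# Restrict the tooltip to the x and y aesthetics only
widget <- ggplotly(p, tooltip = c("x", "y"))
```

Applied to the plot above, the call would presumably look like ggplotly(plot_obj, tooltip = c("x", "y")).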

Error in filter_infrequent_flows()

Hi,

I've been trying to apply the filter_infrequent_flows() function to eliminate infrequent flows in process discovery. However, every time I get the same error. You can reproduce it with the logs 'patients' or 'traffic_fines' (the latter is used as an example in the documentation).

patients %>% filter_infrequent_flows(min_n = 6) %>% process_map()

patients_act %>% filter_infrequent_flows(min_n = 6) %>% process_map()

traffic_fines %>%
     filter_infrequent_flows(min_n = 5) %>%
     process_map()

All of these examples return the same error.

"Error in `mutate()`:
ℹ In argument: `next_act = lead(activity, default = "END_ACT")`.
ℹ In group 1: `case_id = "A1"`.
Caused by error in `lead()`:
! Can't convert from `default` <character> to `x` <factor<e129f>> due to loss of generality.
• Locations: 1"

Is there a workaround I could apply to remove infrequent flows/traces? I think I could get the most frequent cases (or the other way around) and use them to filter the log, but I should test it first.

The version of the packages I've been working with:
"[1] processcheckR_0.1.4 processmapR_0.5.2 eventdataR_0.3.1 edeaR_0.9.3 bupaR_0.5.3
[6] bupaverse_0.1.0"
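
The error message itself can be reproduced in isolation: lead() refuses a character default that is not a level of a factor input. The sketch below shows the mismatch being avoided by converting to character first. It is an illustration on a toy vector, not a patch to filter_infrequent_flows().

```r
library(dplyr)

# Toy activity column stored as a factor, as in the bundled event logs
activity <- factor(c("A", "B", "C"))

# lead(activity, default = "END_ACT") fails because "END_ACT" is not a level.
# Converting to character first lets the default be any string.
next_act <- lead(as.character(activity), default = "END_ACT")

print(next_act)
```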

filter all cases limited to first n steps

Would it be possible to add a filter that limits the analysis (e.g. a process map) to the first n steps? I can see there are already filters for the first n cases, but it would be great to be able to consider all cases, filtered on only the start of the process, for example.
Many thanks.
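
Until such a filter exists, the idea can be sketched with plain dplyr on a data frame of events. The column names (case, activity, timestamp) and the helper first_n_steps() are hypothetical, and the sketch assumes one row per step; an actual event log with several lifecycle rows per activity instance would need to be reduced to one row per instance first.

```r
library(dplyr)

# Hypothetical event data: one row per step
events <- tibble(
  case      = c("c1", "c1", "c1", "c2", "c2"),
  activity  = c("Register", "Triage", "X-Ray", "Register", "Triage"),
  timestamp = as.POSIXct("2024-01-01 08:00:00", tz = "UTC") +
    c(0, 600, 1200, 300, 900)
)

# Keep only the first n steps of every case, ordered by timestamp
first_n_steps <- function(data, n) {
  data %>%
    group_by(case) %>%
    arrange(timestamp, .by_group = TRUE) %>%
    slice_head(n = n) %>%
    ungroup()
}

first_two <- first_n_steps(events, 2)
```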

filter resources according to date or time

Hello, in the code below I can filter the event log by resource name. But what if I want to filter resources by their processing time? For example, how can I add a filter to the code below to keep only the resources that spend more than 5 hours on an activity? I searched for a long time but couldn't find a solution. Thanks.

patients %>% filter_resource(c("r1","r2","r3")) %>%
processing_time("resource", units="hours") %>%
plot()
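
One way to approach this is to first derive the resource names that meet the duration condition and then pass them to filter_resource() as in the code above. The sketch below works on a hypothetical table with one row per activity instance; the columns start and complete are assumptions, not columns of the patients log.

```r
library(dplyr)

# Hypothetical table: one row per activity instance with start and end times
instances <- tibble(
  resource = c("r1", "r1", "r2", "r3"),
  activity = c("A", "B", "A", "B"),
  start    = as.POSIXct("2024-01-01 08:00:00", tz = "UTC") + c(0, 0, 3600, 7200),
  complete = start + c(6, 1, 7, 2) * 3600
)

# Resources with at least one activity instance longer than 5 hours
slow_resources <- instances %>%
  mutate(hours = as.numeric(difftime(complete, start, units = "hours"))) %>%
  filter(hours > 5) %>%
  distinct(resource) %>%
  pull(resource)

print(slow_resources)
```

The resulting character vector could then replace c("r1","r2","r3") in the filter_resource() call.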

Error with number_of_repetitions() in 0.9.3

After an upgrade to bupaR 0.5.3 and edeaR 0.9.3, there is an error that did not occur with 0.5.2 and 0.9.2 respectively:

> patients %>% group_by_case() %>% number_of_repetitions()
Error in `mutate()`:
ℹ In argument: `data = map(data, fun, ...)`.
Caused by error in `map()`:
ℹ In index: 1.
Caused by error in `!case_id_(eventlog)`:
! invalid argument type
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/dplyr:::mutate_error>
Error in `mutate()`:
ℹ In argument: `data = map(data, fun, ...)`.
Caused by error in `map()`:
ℹ In index: 1.
Caused by error in `!case_id_(eventlog)`:
! invalid argument type
---
Backtrace:
    ▆
 1. ├─patients %>% group_by_case() %>% number_of_repetitions()
 2. ├─edeaR::number_of_repetitions(.)
 3. ├─edeaR:::number_of_repetitions.grouped_eventlog(.)
 4. │ └─bupaR:::apply_grouped_fun(...)
 5. │   └─... %>% mutate(raw = map(data, attr, "raw"))
 6. ├─dplyr::mutate(., raw = map(data, attr, "raw"))
 7. ├─dplyr::mutate(., data = map(data, fun, ...))
 8. ├─dplyr:::mutate.data.frame(., data = map(data, fun, ...))
 9. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
10. │   ├─base::withCallingHandlers(...)
11. │   └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
12. │     └─mask$eval_all_mutate(quo)
13. │       └─dplyr (local) eval()
14. ├─purrr::map(data, fun, ...)
15. │ └─purrr:::map_("list", .x, .f, ..., .progress = .progress)
16. │   ├─purrr:::with_indexed_errors(...)
17. │   │ └─base::withCallingHandlers(...)
18. │   ├─purrr:::call_with_cleanup(...)
19. │   └─edeaR (local) .f(.x[[i]], ...)
20. │     └─eventlog %>% all_repetitions_case
21. ├─edeaR:::all_repetitions_case(.)
22. │ └─... %>% ...
23. ├─dplyr::mutate(...)
24. ├─base::merge(., cases, all.y = T)
25. ├─dplyr::select(., -trace_length)
26. ├─dplyr::mutate(., relative = absolute/trace_length)
27. ├─dplyr::summarize(., absolute = n(), trace_length = first(trace_length))
28. ├─dplyr::group_by(., !!case_id_(eventlog))
29. ├─edeaR:::all_repetitions(.)
30. │ └─... %>% as.data.frame()
31. ├─base::as.data.frame(.)
32. ├─data.table::as.data.table(.)
33. ├─dplyr::rename(...)
34. ├─edeaR:::rework_base(.)
35. │ └─... %>% as.data.frame
36. ├─base::as.data.frame(.)
37. ├─data.table::as.data.table(.)
38. ├─dplyr::rename(...)
39. └─bupaR:::rename.log(...)

end_activities

Good day,

I encounter some problems with the end_activities() function. The start_activities() function works without any problems but when I run the end_activities() function on the eventlog I always get the following error message:

"error in check_attributes(eventlog, attributes$attribute_name, attributes$attribute_values) :
No case_id provided nor found"

I also tried the patients dataset from the eventdataR package with the following code:
plot(end_activities(patients, "activity")), which gives the same error message as above.

Thanks in advance for your help :)

Units in diagram 'Idle Time' are not shown

Hi guys,

as seen in the picture below (and also in the documentation https://www.bupar.net/exploring.html), the time units are not properly shown in the diagram for idle time.

Since we'd like to present the Shiny app to customers as a web interface, it would be nice to have a consistent design of the diagrams.

Would be awesome to have this fixed!
Thanks for your reply!

Cheers,
Jan

The filter_precedence() function fails when activity labels contain parentheses.

The filter_precedence() function fails when activity labels contain parentheses.

Below is a minimal example where the filter is expected to return the entire event log:

library(lubridate)
library(bupaR)
library(edeaR)

events <- data.frame(
  case = c(1, 1),
  timestamp = c(as_datetime("2018-01-01 12:00:00"), as_datetime("2018-02-01 12:00:00")),
  activity = c("first event (1)", "second event (2)"),
  activity_instance = c(1, 2),
  status = c("complete", "complete"),
  resource = c("me", "you")
)

event_log <- events %>%
  eventlog(
    case_id = "case",
    timestamp = "timestamp",
    activity_id = "activity",
    activity_instance_id = "activity_instance",
    lifecycle_id = "status",
    resource_id = "resource"
  )


event_log %>%
  filter_precedence(antecedents = "first event (1)",
                    consequents = "second event (2)")
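
A plausible cause, stated here as an assumption rather than a confirmed diagnosis, is that the labels are used as regular expressions somewhere in the filter. Parentheses are regex metacharacters, so a label containing them does not match itself unless it is treated as a fixed string. A base R illustration:

```r
label <- "first event (1)"

# As a regular expression, "(1)" is a capture group matching "1",
# so the pattern looks for "first event 1" and does not match the label.
as_regex <- grepl(label, label)

# As a fixed string, the label matches itself.
as_fixed <- grepl(label, label, fixed = TRUE)

print(c(regex = as_regex, fixed = as_fixed))
```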

Implement filter_trace_id

We are working on a preliminary study in which we want to identify specific patterns in a set of cases. To act on the process, it is useful to subset those cases by specific traces. To achieve this, the function filter_activity_presence can be used, passing the whole set of trace activities to the filter. If the traces are known (for instance, from the trace_explorer raw output), the id of the trace could be used instead. This feature is not implemented, and it seems that the trace explorer output does not map the whole set of cases to the traces (is this true?), which would otherwise enable the use of dplyr::filter in a custom function.

Is there a way to do this with the existing features of the (by the way, amazing!) packages? Is this beyond the intended scope of the packages?

Maybe I am missing something. I have worked through the DataCamp course, but I think this approach is not covered there.

Thank you in advance for your effort.
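
The case-to-trace mapping described above can be sketched with plain dplyr. The data frame, its column names and the numbering of trace ids (by order of first appearance) are all assumptions for illustration; they do not reproduce the ids shown by trace_explorer.

```r
library(dplyr)

# Hypothetical event data: one row per step
events <- tibble(
  case      = c("c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3"),
  activity  = c("A", "B", "C", "A", "B", "A", "B", "C"),
  timestamp = as.POSIXct("2024-01-01 08:00:00", tz = "UTC") + 60 * (1:8)
)

# One row per case, with its trace and an id per distinct trace
case_traces <- events %>%
  group_by(case) %>%
  arrange(timestamp, .by_group = TRUE) %>%
  summarise(trace = paste(activity, collapse = ","), .groups = "drop") %>%
  mutate(trace_id = match(trace, unique(trace)))

# Cases following the trace with id 1, and their events
wanted_cases <- case_traces %>% filter(trace_id == 1) %>% pull(case)
selected     <- events %>% filter(case %in% wanted_cases)
```

On a real event log, the vector of case ids could presumably be handed to a case-level filter instead of dplyr::filter.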

Error in UseMethod("rescale") : no applicable method for 'rescale' applied to an object of class "difftime"

Hi,

I get this error when I calculate idle_time / processing_time at the "case" level and try to plot it.

processing_time(salariswijziging, level="case", units = "days") # %>% plot()

processing_time itself does produce output:

naam processing_time

1 ######### 88.07454 days
2 ######### 71.51288 days
3 ######### 71.01331 days
4 ######### 67.00580 days
5 ######### 66.32168 days
6 ######### 66.27387 days
7 ######### 66.23900 days
8 ######### 60.21536 days
9 ######### 59.75009 days

But plotting results in an error.

salariswijziging

Log of 422 events consisting of:

3 traces
33 cases
211 instances of 9 activities
58 resources
Events occurred from 2022-01-20 14:56:56 until 2022-06-15 11:04:24

Variables were mapped as follows:

Case identifier: naam
Activity identifier: taak
Resource identifier: door
Activity instance identifier: activity_instance_id_by_bupar
Timestamp: timestamp
Lifecycle transition: lifecycle_id

A tibble: 422 × 8

type naam taak door .order activ…¹ lifec…² timestamp

1 Salariswijziging (Profit ########g… JMS … Lied… 1 1 start 2022-01-20 14:56:56
2 Salariswijziging (Profit) ######## Cont… Patr… 2 2 start 2022-01-20 14:56:57
3 Salariswijziging (Profit) ######## Beve… Andy… 3 3 start 2022-01-26 12:51:18
4 Salariswijziging (Profit) ######## Beve… Jess… 4 4 start 2022-01-26 12:51:18
5 Salariswijziging (Profit) ######## Brie… Suze… 5 5 start 2022-02-01 11:54:27
6 Salariswijziging (Profit) ######## Beve… Jess… 6 6 start 2022-02-02 13:39:09
7 Salariswijziging (Profit) ######## Vraa… Lied… 7 7 start 2022-02-03 10:29:52
8 Salariswijziging (Profit) ######## Aanp… Lied… 8 8 start 2022-02-08 10:13:03
9 Salariswijziging (Profit) ######## Cont… Lied… 9 9 start 2022-02-08 15:04:57
10 Salariswijziging (Profit) ########… Beoo… Lied… 10 10 start 2022-02-15 10:11:54

… with 412 more rows, and abbreviated variable names ¹activity_instance_id_by_bupar,

²lifecycle_id

ℹ Use `print(n = ...)` to see more rows

Do you have any idea?

Greetings Tim
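
The error indicates that a difftime column reaches a plotting scale that only knows how to rescale plain numbers. A possible workaround, sketched here on three values copied from the output above, is to convert the durations to numeric in an explicit unit before plotting them yourself (for example with hist() or ggplot2). Whether the package's own plot() method accepts a converted column is not verified here.

```r
# Durations as a difftime vector, as returned in the processing_time column
pt <- as.difftime(c(88.07454, 71.51288, 71.01331), units = "days")

# Explicit conversion to plain numbers in a chosen unit
pt_days  <- as.numeric(pt, units = "days")
pt_hours <- as.numeric(pt, units = "hours")

print(class(pt_days))
```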

Get time between two subsequent activities for all cases

Hi,

first of all, thanks for the great bupaR package. I really enjoy process mining with it.

I would like to calculate the time (I guess you call it idle time) between two subsequent activities for all cases that run through them. Some cases might run through them several times. I only found ways to calculate the idle time at an aggregated level, but I need the time per case, because afterwards I would like to work with the sample and visualize the distribution of times.

Looking forward to your reply. Thanks!

Best Regards
Meras
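
A per-case calculation of this kind can be sketched with dplyr's lead() on a plain data frame. The data, the column names and the activity pair "A" followed by "B" are hypothetical, and the sketch assumes a single timestamp per activity; with separate start and complete events, the gap would run from the completion of the first activity to the start of the second.

```r
library(dplyr)

# Hypothetical event data: case c1 passes through A -> B twice
events <- tibble(
  case      = c("c1", "c1", "c1", "c1", "c2", "c2"),
  activity  = c("A", "B", "A", "B", "A", "B"),
  timestamp = as.POSIXct("2024-01-01 08:00:00", tz = "UTC") +
    c(0, 3600, 7200, 14400, 0, 1800)
)

# One row per occurrence of A directly followed by B, with the time in between
gaps <- events %>%
  group_by(case) %>%
  arrange(timestamp, .by_group = TRUE) %>%
  mutate(
    next_activity = lead(activity),
    gap_hours     = as.numeric(difftime(lead(timestamp), timestamp, units = "hours"))
  ) %>%
  ungroup() %>%
  filter(activity == "A", next_activity == "B")
```

The gap_hours column keeps one value per occurrence, so its distribution can be plotted directly.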

edeaR::resource_involvement with level="resource-activity" returns the expected output for level="resource"

This is for edeaR 0.9.1 (current CRAN version).

edeaR::resource_involvement(log = anEventLog, level = "resource-activity") matches the output of edeaR::resource_involvement(log = anEventLog, level = "resource")

The output does not contain resource-activity combinations, only resources. See the input event log in the file anEventLog.rda and the output in the file results.rda, both in the attached zip files.

anEventLog.zip
results.zip

Plot for Processing Time with Level = Resource-Activity not consistent with other Levels

Hey guys,

as mentioned in the title, I cannot find a proper way to plot the processing time for level "resource-activity".

While the plots for processing time at level "resource" and "activity" work quite intuitively, the plot for level "resource-activity" throws an error when called with the same arguments.

processing_time(
patients,
level = "resource",
append = F,
append_column = NULL,
units = "mins",
sort = T,
work_schedule = NULL
) %>% plot()

This renders the following:

Keeping the same arguments and changing only "level" to "resource-activity" gives the following:

processing_time(
patients,
level = "resource-activity",
append = F,
append_column = NULL,
units = "mins",
sort = T,
work_schedule = NULL
) %>% plot()

The only way for me to avoid this error is to set "append" to TRUE, which renders a matrix plot of the "patients" table (which I think is not the goal).

processing_time(
patients,
level = "resource-activity",
append = T,
append_column = NULL,
units = "mins",
sort = T,
work_schedule = NULL
) %>% plot()

Here are my Sys.info() and R.Version() output:
sysname release version nodename machine
"Windows" "10 x64" "build 19041" "LAPTOP-2GC29OTT" "x86-64"

platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 6.3
year 2020
month 02
day 29
svn rev 77875
language R
version.string R version 3.6.3 (2020-02-29)
nickname Holding the Windsock

It would be very helpful to hear from you whether a fix is planned or whether this can be fixed in the source code.

Thanks again for your help!

Best,
Jan

Does edeaR work with dplyr versions >= 0.8.0?

I'm seeing a consistent problem with edeaR using dplyr 0.8.3: whenever I try to use edeaR with grouped_eventlogs, I get various errors (only with grouped logs, though).

Here are two simple examples (taken from the RStudio console):

> library(bupaR)
> patients %>% group_by_activity() %>% throughput_time()
Error in check_attributes(eventlog, attributes$attribute_name, attributes$attribute_values) : 
  activity_id not found in data.frame

> library(bupaR)
> library(lubridate)
> patients %>% group_by(month(time)) %>% throughput_time()
Error: `.data` is a corrupt grouped_df, the `"groups"` attribute must be a data frame
In addition: Warning messages:
1: `cols` is now required.
Please use `cols = c(raw)` 
2: `cols` is now required.
Please use `cols = c(data)`

here is my sessionInfo if it helps:

R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lubridate_1.7.4     petrinetR_0.2.1     processmonitR_0.1.0 xesreadR_0.2.3      processmapR_0.3.3   eventdataR_0.2.0    edeaR_0.8.3         bupaR_0.4.2        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2         lattice_0.20-38    tidyr_1.0.0        visNetwork_2.0.8   zoo_1.8-6          utf8_1.1.4         assertthat_0.2.1   zeallot_0.1.0      digest_0.6.22      mime_0.7           R6_2.4.0           backports_1.1.5   
[13] httr_1.4.1         ggplot2_3.2.1      pillar_1.4.2       rlang_0.4.1        lazyeval_0.2.2     shinyTime_1.0.1    rstudioapi_0.10    data.table_1.12.6  miniUI_0.1.1.1     DiagrammeR_1.0.1   downloader_0.4     readr_1.3.1       
[25] stringr_1.4.0      htmlwidgets_1.5.1  igraph_1.2.4.1     munsell_0.5.0      shiny_1.4.0        compiler_3.5.1     influenceR_0.1.0   rgexf_0.15.3       httpuv_1.5.2       pkgconfig_2.0.3    htmltools_0.4.0    tidyselect_0.2.5  
[37] tibble_2.1.3       gridExtra_2.3      XML_3.98-1.20      fansi_0.4.0        viridisLite_0.3.0  crayon_1.3.4       dplyr_0.8.3        later_1.0.0        grid_3.5.1         jsonlite_1.6       xtable_1.8-4       gtable_0.3.0      
[49] lifecycle_0.1.0    magrittr_1.5       scales_1.0.0       cli_1.1.0          stringi_1.4.3      viridis_0.5.1      promises_1.1.0     ggthemes_4.2.0     xml2_1.2.2         brew_1.0-6         vctrs_0.2.0        RColorBrewer_1.1-2
[61] tools_3.5.1        forcats_0.4.0      glue_1.3.1         purrr_0.3.3        hms_0.5.2          Rook_1.1-1         fastmap_1.0.1      yaml_2.2.0         colorspace_1.4-1   plotly_4.9.0

filter_precedence "eventually_follows" doesn't seem to work

Traces from
patients %>% traces

1 Registration,Triage and Assessment,X-Ray,Discuss Results,Check-out 258 0.516
2 Registration,Triage and Assessment,Blood test,MRI SCAN,Discuss Results,Check-out 234 0.468
3 Registration,Triage and Assessment,Blood test,MRI SCAN,Discuss Results 2 0.004
4 Registration,Triage and Assessment,X-Ray 2 0.004
5 Registration,Triage and Assessment 2 0.004
6 Registration,Triage and Assessment,X-Ray,Discuss Results 1 0.002
7 Registration,Triage and Assessment,Blood test 1 0.002

When I try to filter all traces in which "Triage and Assessment" is eventually followed by "Discuss Results", I get none, whereas the table above shows that the rule applies to rows 1, 2, 3 and 6.

patients %>% filter_precedence(antecedents = "Triage and Assessment", consequents = "Discuss Results", precedence_type = "eventually_follows", filter_method = "all") %>% traces()

This should give 4 traces, but it gives 0.

"directly_follows" works fine, e.g.:
patients %>% filter_precedence(antecedents = "Triage and Assessment", consequents = "X-Ray", precedence_type = "directly_follows", filter_method = "all") %>% traces()
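
The expected result can be checked independently of the package with a few lines of base R over the seven traces listed above. The helper eventually_follows() is hypothetical and only encodes the intended meaning: the consequent occurs somewhere after the first occurrence of the antecedent.

```r
traces <- c(
  "Registration,Triage and Assessment,X-Ray,Discuss Results,Check-out",
  "Registration,Triage and Assessment,Blood test,MRI SCAN,Discuss Results,Check-out",
  "Registration,Triage and Assessment,Blood test,MRI SCAN,Discuss Results",
  "Registration,Triage and Assessment,X-Ray",
  "Registration,Triage and Assessment",
  "Registration,Triage and Assessment,X-Ray,Discuss Results",
  "Registration,Triage and Assessment,Blood test"
)

# TRUE when `consequent` appears at any position after the first `antecedent`
eventually_follows <- function(trace, antecedent, consequent) {
  steps <- strsplit(trace, ",", fixed = TRUE)[[1]]
  first_antecedent <- match(antecedent, steps)   # NA when the antecedent is absent
  consequent_pos   <- which(steps == consequent)
  !is.na(first_antecedent) && any(consequent_pos > first_antecedent)
}

matches <- vapply(
  traces, eventually_follows, logical(1),
  antecedent = "Triage and Assessment",
  consequent = "Discuss Results",
  USE.NAMES = FALSE
)

print(which(matches))
```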

processing time column duplication issue

Hi,

I tried to filter data using processing time but got column duplication errors; the same happens when computing the processing time for the event log. I used example_log_1.csv provided in the GitHub repo of bupaR.

Please find sample code below.

example_log_1 <- read.csv("example_log_1.csv")

example_log_1$timestamp <- as.POSIXct(example_log_1$timestamp)

example_log_1 %>% # a data.frame with the information in the table above
  eventlog(
    case_id = "patient",
    activity_id = "activity",
    activity_instance_id = "activity_instance",
    lifecycle_id = "status",
    timestamp = "timestamp",
    resource_id = "resource"
  ) -> example_log_1
example_log_1 %>% processing_time("log") %>% plot()

Please find the error below

Error: Column names `patient`, `activity`, and `resource` must not be duplicated. Run `rlang::last_error()` to see where the error occurred.

R Version

R version 4.0.1 (2020-06-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

bupaR Version

bupaR_0.4.3
edeaR_0.8.4

Thanks and Regards,
Amar

Filter precedence - package error

The filter_precedence function has stopped working. Thanks.