1kastner / conflowgen


A generator for synthetic container flows at maritime container terminals with a focus on yard operations

License: MIT License

Python 96.36% Batchfile 0.35% Jupyter Notebook 3.29%
container-terminal logistics maritime sea-container synthetic-data

conflowgen's People

Contributors

1grasse, 1kastner, bbargstaedt, lucedes27, ram24prasath


conflowgen's Issues

Add pre-defined seeds for random values in inits and store them in the DB

Currently, a re-invocation of the generation process leads to slightly different results every time because randomness is used at several places. There are two different sources of randomness:

  • The Python standard random library: Here, the seeds could e.g. be stored in a separate peewee model and used in the class init.
  • The SQLite random sort mechanism (see e.g. here): SQLite does not support seeding its random number stream, so we must avoid it. Probably we need to read all the data into a big list, sort it with the Python standard library, and then further digest it.

Once this is implemented, the resulting container flows should be reproducible given the same input data.
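A minimal sketch of how a stored seed could look. The model name, fields, and helper function are illustrative assumptions, not part of the current code base; in the project, the existing SQLite database would be used instead of the in-memory one:

```python
import random

from peewee import SqliteDatabase, Model, CharField, IntegerField

db = SqliteDatabase(":memory:")  # placeholder; the project database would be used here


class RandomSeedStore(Model):  # hypothetical model name
    """Persists the seed used for one part of the generation process."""
    name = CharField(unique=True)
    seed = IntegerField()

    class Meta:
        database = db


db.create_tables([RandomSeedStore])


def get_seeded_random(name: str) -> random.Random:
    """Return a random.Random instance whose seed is stored in / loaded from the database."""
    entry, _created = RandomSeedStore.get_or_create(
        name=name,
        defaults={"seed": random.randint(0, 2 ** 32 - 1)}
    )
    return random.Random(entry.seed)
```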

Add dwell time probability distribution during vehicle selection

Currently, for each container that is not picked up by a truck, the vehicle is chosen based on its free capacity (see e.g. here). This leads to approximately uniformly distributed container dwell times, as no preference for earlier or later vehicles exists.

This approach could be further improved by applying a factor to each of the weights assigned to the respective vehicles based on the dwell time, e.g. preferring vehicles that depart earlier over those that depart later. In other words, the existing weights are updated before usage. When combining the container dwell time distribution with the weights determined by the free capacity of each vehicle, multiplication might be more suitable than addition to ensure that a probability of 0 stays 0.
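A minimal sketch of this combination by multiplication; dwell_time_probability is an assumed callable that maps a candidate vehicle to the probability mass of the dwell time that would result from choosing it:

```python
import random


def choose_vehicle(vehicles, free_capacity_weights, dwell_time_probability):
    """
    Combine the existing free-capacity weights with a dwell-time-based factor
    by multiplication (so that a probability of 0 stays 0), then draw a vehicle.
    """
    combined_weights = [
        free_capacity_weights[vehicle] * dwell_time_probability(vehicle)
        for vehicle in vehicles
    ]
    return random.choices(vehicles, weights=combined_weights, k=1)[0]
```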

Possible assumptions for the dwell time distribution:

Add ship properties

Each ship has certain properties such as its length and its number of bays. These could be included in the model (https://github.com/1kastner/conflowgen/blob/main/conflowgen/domain_models/vehicle.py#L128 and https://github.com/1kastner/conflowgen/blob/main/conflowgen/domain_models/vehicle.py#L143). That would require the user to provide more information in the beginning. This information would not be digested by this tool itself, though. However, it helps to have a single source of data for the next processing steps (e.g., running a simulation or a mathematical optimization).

It would further be feasible to have a small add-on that determines the number of ship-to-shore gantry cranes which typically discharge and load such a ship. It could be based on a table such as the one shown at https://www.sciencedirect.com/science/article/pii/S2352146517306968.

Use transparent background for all plotting

Currently, matplotlib uses white as the default background color. However, the background should be transparent so that the plots can also be placed on other light background colors (such as a light gray).
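A minimal sketch using standard matplotlib options:

```python
import matplotlib
import matplotlib.pyplot as plt

# Option 1: make all figures written by savefig transparent by default
matplotlib.rcParams["savefig.transparent"] = True

# Option 2: make an individual figure and its axes transparent, also for inline display
fig, ax = plt.subplots()
fig.patch.set_alpha(0.0)
ax.patch.set_alpha(0.0)
```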

Use generic distribution validator instead of repeated custom code

Currently, an independent validator implementation exists for each distribution. This is burdensome, and since the same pattern occurs repeatedly, it should be centralized. In #70, a generic distribution validator was introduced, but it is not yet used for all distributions. This should be addressed soon.

Use-case-specific tests might still be useful, though, to check the adequacy of the error messages.

EDIT: The approach of #70 was elaborated in #73.
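A minimal sketch of the kind of checks such a generic validator could perform; function name and error messages are illustrative, the actual implementation lives in conflowgen/domain_models/distribution_validators:

```python
def validate_distribution(distribution: dict, context: str = "") -> None:
    """Raise a descriptive error if the given dict is not a valid probability distribution."""
    if any(fraction < 0 for fraction in distribution.values()):
        raise ValueError(f"Negative fraction encountered in the distribution {context!r}")
    total = sum(distribution.values())
    if abs(total - 1) > 1e-6:
        raise ValueError(
            f"The fractions of the distribution {context!r} sum up to {total}, expected 1"
        )
```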

Automatically identify end of ramp-up and beginning of ramp-down process

Currently, the start date and end date are provided to synthetically generate the data. All reported previews and analyses always take the whole period into account. However, at times it could be more interesting to restrict oneself to the data of the "normal" operations period. Then, the ramp-up and ramp-down periods should be neglected when estimating operational figures.

This is supposed to be an additional, optional feature. There might be different results depending on the identification approach. Thus, by default, the figures should be reported as is currently the case.

Add flexible weights of empty containers

Some weights are not drawn from a distribution; e.g. empty containers have a fixed container weight category, which is always the lowest one for the respective container length. Possibly, the container type (reefer, ...) should also be considered, as e.g. an empty reefer is a bit heavier than an empty standard container. The user should have an interface to determine the container weight category for each combination of storage requirement (= container type) and container length unless it is a laden container (then the distribution is used), and this information should be stored in the database.

This requires changes to the ContainerWeightDistributionManager, which would serve as the interface for the user.

The translation table from container length + storage requirement to fixed container weight should be stored in the database. This requires a new peewee model (see the sketch below). Most likely, the repository pattern should be used.

EDIT: For clarification, currently the fixed weight categories are hard-coded. With the weight distributions that are seeded by default, this works fine. The user might be surprised if they define their own weight distributions and then find out that their weight categories are not honoured for the empty containers.
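A minimal sketch of such a peewee model; the model name, field names, and field types are illustrative assumptions:

```python
from peewee import SqliteDatabase, Model, CharField, IntegerField

db = SqliteDatabase(":memory:")  # placeholder; the project database would be used here


class EmptyContainerWeight(Model):  # hypothetical model name
    """Maps container length and storage requirement to a fixed container weight category."""
    container_length = CharField()     # e.g. "twenty_feet"
    storage_requirement = CharField()  # e.g. "empty", "reefer"
    weight_category = IntegerField()   # fixed weight category, e.g. in metric tons

    class Meta:
        database = db
        indexes = (
            (("container_length", "storage_requirement"), True),  # each combination only once
        )
```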

Add UI

The API is designed to allow a user interface to repeatedly update and change the information before the generate method is invoked. Thus, in theory it could also be driven by a UI. It would be great to make further progress here.

First attempts were made to use the ipywidgets library in combination with the voila dashboard. It generally worked, but the required interaction between the UI elements was rather complex and thus tricky.

Refine hack of getting git lfs into RTD

In #74, a dirty hack was introduced at the end of docs/conf.py: git-lfs is downloaded, installed, and used to fetch the git lfs artifacts. This is done because git-lfs is not installed by default on RTD. This approach has some shortcomings:

  • Currently, this step is executed for all Linux users. It should be restricted to RTD. For that purpose, an environment variable has been set (see screenshot below).
  • We do not check whether git-lfs has already been installed. In that case, this step could be skipped as well.

(screenshot of the environment variable settings)
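A minimal sketch of the two guards, assuming the check uses the READTHEDOCS environment variable that Read the Docs sets in its build environment (or the project-specific variable from the screenshot) together with shutil.which:

```python
import os
import shutil

on_rtd = os.environ.get("READTHEDOCS") == "True"
git_lfs_already_installed = shutil.which("git-lfs") is not None

if on_rtd and not git_lfs_already_installed:
    ...  # download, install, and invoke git-lfs as currently done at the end of docs/conf.py
```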

Add preview for used yard capacity

For each hour, the number of containers that enter and leave the yard is estimated based on the schedules (without having the container instances). Only the container dwell time distribution is used as a reference. This supports validating the assumptions and comparing them with the analysis.

Add warning when schedule of the same name is redefined

In the demo scripts, port_call_manager.has_schedule() is used to ensure that a service of the same name and vehicle type does not exist twice. This is currently the only protection layer that exists. We could introduce stricter measures (e.g. SQL constraints), but maybe there are instances where two calls per week on different days are reasonable. In that case, having two schedule entries might be what you want (this requires a thorough check of the source code, though). In any case, the user should be warned if this happened by accident.
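A minimal sketch of such a warning; the has_schedule call is assumed to take the service name and vehicle type as in the demo scripts, the actual signature might differ:

```python
import logging

logger = logging.getLogger("conflowgen")


def warn_if_schedule_exists(port_call_manager, service_name, vehicle_type) -> None:
    """Emit a warning if a schedule of the same name and vehicle type already exists."""
    if port_call_manager.has_schedule(service_name, vehicle_type):
        logger.warning(
            "A schedule named '%s' for vehicle type '%s' already exists; "
            "adding it again might be unintended.",
            service_name, vehicle_type
        )
```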

Make help_text accessible to users (documentation or separate meta-data file?)

The peewee models are documented with the help of the help_text attribute, see e.g. here. This help text is part of the metadata accessible programmatically, but it is not visible to a user who does not have access to the model. However, users are confronted with the fields once they use the export function.

There are different paths to go:

  1. Is it possible to include this in the documentation in an automated fashion (no copy&paste of the texts!)?
  2. Is it better to have a meta-data file alongside the CSV/XLSX/XLS file at export which explains the meanings of each column? How to set up such a file?

Which one is best? And we need to implement it...
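For the automated path, the help texts can be read from the peewee field metadata. A minimal sketch; the usage example with a Container model is hypothetical:

```python
def iter_field_documentation(model_class):
    """Yield (field name, help text) pairs for a peewee model class."""
    for field in model_class._meta.sorted_fields:
        yield field.name, getattr(field, "help_text", None)


# Hypothetical usage, e.g. to write a meta-data file alongside the exported CSV:
# for field_name, help_text in iter_field_documentation(Container):
#     print(f"{field_name}: {help_text}")
```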

Add preview for truck gate throughput

For each time unit (e.g., hour), all containers that enter the yard and all containers that leave the yard through the truck gate are reported. This supports validating the assumptions. Only the total number of moved containers and the truck arrival distribution are used here; the container dwell times are neglected.

Validate container weight distribution of MacGregor

At https://www.erneuerbar-mobil.de/sites/default/files/2020-01/Abschlussbericht_16EM1004_final.pdf, there is another container weight distribution on page 71 (Abbildung 7). It would be interesting to add a validation. That distribution mixes empty and full containers of all lengths and uses slightly different weight categories. Still, it could be interesting to plot the two side by side and check whether the weight distributions are approximately the same.

Add API access to change the container weight distribution

The container weight distribution is currently only seeded; the default values cannot be overwritten through the API. The general pattern already exists (see e.g. TruckArrivalDistributionManager) and just needs to be transferred. Matching the current naming scheme, the new class would be called ContainerWeightDistributionManager and reside in a file called container_weight_distribution_manager.py.
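A minimal sketch of how the new manager could look when transferring the pattern; the repository class and all method names are assumptions:

```python
class ContainerWeightDistributionManager:
    """API class to read and overwrite the container weight distribution."""

    def __init__(self):
        # assumed repository, analogous to the other distribution managers
        self.container_weight_repository = ContainerWeightDistributionRepository()

    def get_container_weight_distribution(self) -> dict:
        """Return the container weight distribution currently stored in the database."""
        return self.container_weight_repository.get_distribution()

    def set_container_weight_distribution(self, distribution: dict) -> None:
        """Validate and persist a user-provided container weight distribution."""
        self.container_weight_repository.set_distribution(distribution)
```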

Add dwell time analysis

How long are the container dwell times in the generated data when containers arrive with vehicle type A and leave with vehicle type B?

  • See the yard capacity analysis for reference - 4f51804
  • Extracting commonly used parts from yard_capacity_analysis.py might be helpful?
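A minimal sketch of how the dwell times could be grouped; the delivered_by and picked_up_by attributes as well as the two time helpers are assumptions about the container model:

```python
from collections import defaultdict


def dwell_times_by_vehicle_type_pair(containers, get_arrival_time, get_departure_time):
    """Group container dwell times by (delivering vehicle type, picking-up vehicle type)."""
    dwell_times = defaultdict(list)
    for container in containers:
        dwell_time = get_departure_time(container) - get_arrival_time(container)
        dwell_times[(container.delivered_by, container.picked_up_by)].append(dwell_time)
    return dict(dwell_times)
```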

Import plotting libraries as normal libraries

With #74, the plotting libraries lost their status as an optional dependency. Thus, they should be imported at the top of each file, just like all other dependencies. The special exception # pylint: disable=import-outside-toplevel should be removed from the project.

Depends on #74 being merged.

Move transport_buffer to database

Currently, the transport buffer is spread all over the code; it is a number that shows up repeatedly at different places. This looks pretty much like technical debt. It should be encapsulated and stored in the database, e.g. to take one more step towards reproducibility.

Add exceptions to API

Currently, exceptions are scattered all over the code and are not an official part of the API. If people use the API, they should be able to invoke the methods in a try/except manner, and thus the exceptions should be available to them. To achieve this, the exceptions first need to be consolidated.

The major source of exceptions is invalid input parameters. The most complex data type currently handled is a distribution, represented as a dictionary with additional constraints to be checked. Thus, this task depends on #71 and #65.

Add delay of vessels

Recent events have led to major delays; some statistics are reported e.g. at https://www.worldcargonews.com/news/news/schedule-reliability-at-all-time-low-in-august-67360

This could be covered; there is even already a field in the model: https://github.com/1kastner/conflowgen/blob/main/conflowgen/domain_models/vehicle.py#L69

The delay information is not considered when assigning containers to outgoing vessels (see _get_arrival_time_of_container(container: Container) -> datetime.datetime), so the delay could even be added in the VehicleFactory at https://github.com/1kastner/conflowgen/blob/f58ed5b90fc0bbab77d9931685434d114fcb4b21/conflowgen/domain_models/factories/vehicle_factory.py

Most likely it is best to have another iteration (a new service on the container_flow_data_generation_process level) which re-assigns the containers that would violate their maximum dwell time to another vehicle in a very similar (maybe identical?) way as in the LargeScheduledVehicleForOnwardTransportationManager.

Possible first assumptions might be taken from:

Make previews and especially analyses smarter - avoid unnecessary recalculation by caching

Currently, some previews and post-hoc analyses are re-calculated several times because they are re-used by other previews or post-hoc analyses. For each report, the code is re-executed, which turns out to be expensive: e.g., all containers (there could be millions) are iterated over several times to calculate the very same value. This should be avoided in a smart way. Be aware, though, that the API is meant to always report the true state. For example, we can first generate the output data, run an analysis, re-run the generation process, and run the analysis again. Thus, the cache must be invalidated once the underlying data changes.

Ideas:

Alternative: Leave everything as it is because not too much time is wasted. A bug that leads to reporting wrong numbers would be much worse than waiting a few seconds more for the results.
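A minimal sketch of one possible caching idea with explicit invalidation; this is purely illustrative and not the project's API:

```python
class CachedResult:
    """Cache an expensive computation and recompute only after invalidation."""

    def __init__(self, compute_function):
        self._compute_function = compute_function
        self._cached_value = None
        self._cache_is_valid = False

    def get(self):
        if not self._cache_is_valid:
            self._cached_value = self._compute_function()
            self._cache_is_valid = True
        return self._cached_value

    def invalidate(self):
        """Must be called whenever the underlying data changes, e.g. after re-generation."""
        self._cache_is_valid = False
```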

Add preview for throughput at quay side

For each week, all containers that enter the yard and all containers that leave the yard at the quay side are reported. This supports validating the assumptions.

Side note:
A more granular perspective (e.g., per hour) is not helpful here because all containers of a vessel are delivered at once and all containers are picked up at once; it is the task of a later model (e.g., a simulation model) to represent the discharging and loading process in more detail.

Use mocking for TestModeOfTransportDistributionManager, separate testing of validator

The DatabaseChooser test at https://github.com/1kastner/conflowgen/blob/main/conflowgen/tests/api/test_database_chooser.py is a good example of how to do mocking correctly. In the API package, we only want to check whether the API class does its work; the class instances it uses should not actually be invoked. For those, we have other unit tests.

The test https://github.com/1kastner/conflowgen/blob/main/conflowgen/tests/api/test_mode_of_transport_distribution_manager.py is a bad example because the unit test does not isolate the ModeOfTransportDistributionManager. The validator that is currently tested here, and which resides in https://github.com/1kastner/conflowgen/tree/main/conflowgen/domain_models/distribution_validators, should get its own test at https://github.com/1kastner/conflowgen/tree/main/conflowgen/tests/domain_models.
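A minimal sketch of how the isolation could look with unittest.mock; the import path as well as the attribute and method names of the manager are assumptions and need to be aligned with the actual class:

```python
import unittest
from unittest import mock

# import path assumed
from conflowgen.api.mode_of_transport_distribution_manager import ModeOfTransportDistributionManager


class TestModeOfTransportDistributionManager(unittest.TestCase):
    def test_setter_delegates_to_repository(self):
        manager = ModeOfTransportDistributionManager()
        # replace the repository instance so that only the API class itself is exercised
        manager.mode_of_transport_distribution_repository = mock.Mock()  # attribute name assumed
        distribution = {"truck": {"truck": 1.0}}  # simplified placeholder distribution
        manager.set_mode_of_transport_distributions(distribution)  # method name assumed
        manager.mode_of_transport_distribution_repository.set_mode_of_transport_distributions.assert_called_once_with(
            distribution
        )
```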

Use normalize_nested_distribution instead of repeated code

The function normalize_nested_distribution at https://github.com/1kastner/conflowgen/blob/main/conflowgen/domain_models/distribution_repositories/__init__.py normalizes an input distribution. It should be used for nested distributions (dict of dicts) instead of repeating the code all over the repository.

The normalization should happen in the respective distribution manager (API level), and the respective repository should then enforce, with the help of a validator, that all fractions sum up to 1 (this should already be the case but should be re-checked).
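For illustration, a normalization of a nested distribution could look as follows; this is a sketch of the idea, not a copy of the actual helper:

```python
def normalize_nested_distribution(distributions: dict) -> dict:
    """Scale each inner distribution so that its fractions sum up to 1."""
    normalized = {}
    for outer_key, inner_distribution in distributions.items():
        total = sum(inner_distribution.values())
        normalized[outer_key] = {
            inner_key: fraction / total
            for inner_key, fraction in inner_distribution.items()
        }
    return normalized
```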

Add scaling functionality to the previews and analyses to report daily/monthly/annual statistics

Each preview and analysis report should offer the option to scale the reported numbers to daily, monthly, or annual statistics, as those are most likely the figures that are available for a container terminal or that can be approximated from port statistics. This scaling makes it easier to verify the assumptions with practitioners.

Currently, the previews and analyses only report the figures for the period between the start date and end date for which the synthetic data is generated.
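A minimal sketch of the scaling, assuming a simple linear extrapolation from the generated period to a full year:

```python
import datetime


def scale_to_annual_figure(
        value_in_period: float,
        start_date: datetime.date,
        end_date: datetime.date
) -> float:
    """Linearly extrapolate a figure reported for the generated period to a full year."""
    number_of_days = (end_date - start_date).days
    return value_in_period * 365 / number_of_days


# e.g. scale_to_annual_figure(120_000, datetime.date(2021, 7, 1), datetime.date(2021, 8, 1));
# daily or monthly figures could be derived analogously.
```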

Add manually defined closing periods of the terminal

At times, a terminal needs to be closed. In some countries (such as Germany), this happens on important holidays (a subset of the officially established holidays). In addition, strikes or Corona lockdowns could be reasons why vehicles are not scheduled to arrive during certain periods. Regardless of the reason for the closure, in ConFlowGen the user simply defines time windows during which no vehicles can arrive at the terminal. Vehicles that would arrive on such a day are removed from the schedule.

Add API use cases to documentation

Currently, https://conflowgen.readthedocs.io/en/latest/ is rather technical. The step-by-step guide does not show how (and why) to use the preview and analysis reports. This could be solved by having an interactive part in the documentation.

Outdated thoughts:

Updated thoughts:
Now, nbsphinx has been chosen for #31, #32, and #33. To minimize the dependencies, the same should be used here. By using Raw cells with reST, even Sphinx roles and directives can be used!

This issue can be split up into one issue per preview or analysis report on a given topic.

Add API option for truck arrival management

Typically, there are different approaches to truck arrival management:

  • Using a truck appointment system with vessel-dependent time windows - here, freight forwarders who pick up a container register for a time window before a vessel arrives at the berth. A sufficient time buffer must be given in case of small delays. This has the benefit that containers can be stacked depending on the planned truck arrivals (not covered in this tool, just for information).
  • Using a quota-based truck appointment system - here, freight forwarders are informed once the container has passed the check at the ship-to-shore gantry crane. They can immediately book a free time slot, but often enough this takes some hours or days. Thus, containers need to be stacked without that additional information (not covered in this tool, just for information).
  • Using no truck appointment system and just informing the freight forwarder as previously explained.

The fields at https://github.com/1kastner/conflowgen/blob/main/conflowgen/domain_models/arrival_information.py are not yet used to their full potential. The chosen approach could be considered during the truck creation process, e.g. at

There should be a user API offering a few options that are then considered for the trucks, especially when picking up containers that were delivered by vessel.

Reduce complexity of LargeScheduledVehicleForOnwardTransportationManager

The reload_properties method of the class LargeScheduledVehicleForOnwardTransportationManager takes a lot of parameters, which makes handling this method quite difficult. It previously required adjustments to the per-file-ignores in the .flake8 file. The code should be simplified to increase maintainability and to reduce the number of exceptions needed when checking the code quality (currently both flake8 and pylint).

Mention related possible projects that are out of scope of this tool

There are some very interesting ideas that are somehow related to this project but currently out of scope, such as:

  1. Currently, the API requires the user to specify the arrival patterns of the larger vehicles. There could be a related project / sub-project that automatically generates these schedules, which in turn serve as input for this tool.
  2. Currently, this tool produces data that need to be checked further; some numbers are reported. In addition, a related project / sub-project could automatically check whether a container terminal with certain properties (number of quay cranes, yard capacity, truck gate throughput, ...) would be sufficient for the generated container flows.
