1kastner / conflowgen


A generator for synthetic container flows at maritime container terminals with a focus on yard operations

License: MIT License

Python 96.36% Batchfile 0.35% Jupyter Notebook 3.29%
container-terminal logistics maritime sea-container synthetic-data

conflowgen's People

Contributors

1grasse, 1kastner, bbargstaedt, lucedes27, ram24prasath


conflowgen's Issues

Add pre-defined seeds for random values in inits and store them in the DB

Currently, a re-invocation of the generation process leads to slightly different results every time because randomness is used at several places. There are two different sources of randomness:

  • The Python standard random library: Here, the seeds could e.g. be stored in a separate peewee model and used in the class init.
  • The SQLite random sort mechanism (see e.g. here): SQLite does not support seeding its random number stream, so we must avoid it. Probably we need to read all the data into a big list, sort it with the Python standard library, and then further digest it.

Once this is implemented, the resulting container flows should be reproducible given the same input data.
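A minimal sketch of how a stored seed could look. The model name, fields, and helper function are illustrative assumptions, not part of the current code base; in the project, the existing SQLite database would be used instead of the in-memory one:

```python
import random

from peewee import SqliteDatabase, Model, CharField, IntegerField

db = SqliteDatabase(":memory:")  # placeholder; the project database would be used here


class RandomSeedStore(Model):  # hypothetical model name
    """Persists the seed used for one part of the generation process."""
    name = CharField(unique=True)
    seed = IntegerField()

    class Meta:
        database = db


db.create_tables([RandomSeedStore])


def get_seeded_random(name: str) -> random.Random:
    """Return a random.Random instance whose seed is stored in / loaded from the database."""
    entry, _created = RandomSeedStore.get_or_create(
        name=name,
        defaults={"seed": random.randint(0, 2 ** 32 - 1)}
    )
    return random.Random(entry.seed)
```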

Add dwell time probability distribution during vehicle selection

Currently, for each container that is not picked up by a truck, the vehicle is chosen based on its free capacity (see e.g. here). This leads to approximately uniformly distributed container dwell times, as no preference for earlier or later vehicles exists.

This approach could be further improved by applying a factor to each of the weights assigned to the respective vehicles based on the dwell time, e.g. preferring vehicles that depart earlier over those that depart later. In other words, the existing weights are updated before usage. When combining the container dwell time distribution with the weights determined by the free capacity of each vehicle, multiplication might be more suitable than addition to ensure that a probability of 0 stays 0.
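A minimal sketch of this combination by multiplication; dwell_time_probability is an assumed callable that maps a candidate vehicle to the probability mass of the dwell time that would result from choosing it:

```python
import random


def choose_vehicle(vehicles, free_capacity_weights, dwell_time_probability):
    """
    Combine the existing free-capacity weights with a dwell-time-based factor
    by multiplication (so that a probability of 0 stays 0), then draw a vehicle.
    """
    combined_weights = [
        free_capacity_weights[vehicle] * dwell_time_probability(vehicle)
        for vehicle in vehicles
    ]
    return random.choices(vehicles, weights=combined_weights, k=1)[0]
```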

Possible assumptions for the dwell time distribution:

Add ship properties

Each ship has certain properties such as its length and its number of bays. These could be included in the model (https://github.com/1kastner/conflowgen/blob/main/conflowgen/domain_models/vehicle.py#L128 and https://github.com/1kastner/conflowgen/blob/main/conflowgen/domain_models/vehicle.py#L143). That would require the user to provide more information in the beginning. This information would not be digested by this tool itself, though. However, it helps to have a single source of data for the next processing steps (e.g., running a simulation or a mathematical optimization).

It would further be feasible to have a small add-on that determines the number of ship-to-shore gantry cranes which typically discharge and load such a ship. It could be based on a table such as the one shown at https://www.sciencedirect.com/science/article/pii/S2352146517306968.

Use transparent background for all plotting

Currently, matplotlib uses white as the default background color. However, the background should be transparent so that the plots can also be placed on other light background colors (such as a light gray).
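A minimal sketch using standard matplotlib options:

```python
import matplotlib
import matplotlib.pyplot as plt

# Option 1: make all figures written by savefig transparent by default
matplotlib.rcParams["savefig.transparent"] = True

# Option 2: make an individual figure and its axes transparent, also for inline display
fig, ax = plt.subplots()
fig.patch.set_alpha(0.0)
ax.patch.set_alpha(0.0)
```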

Use generic distribution validator instead of repeated custom code

Currently, an independent validator implementation exists for each distribution. This is burdensome, and since the same pattern occurs repeatedly, it should be centralized. In #70, a generic distribution validator was introduced, but it is not yet used for all distributions. This should be addressed soon.

Use-case-specific tests might still be useful, though, to check the adequacy of the error messages.

EDIT: The approach of #70 was elaborated in #73.
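A minimal sketch of the kind of checks such a generic validator could perform; function name and error messages are illustrative, the actual implementation lives in conflowgen/domain_models/distribution_validators:

```python
def validate_distribution(distribution: dict, context: str = "") -> None:
    """Raise a descriptive error if the given dict is not a valid probability distribution."""
    if any(fraction < 0 for fraction in distribution.values()):
        raise ValueError(f"Negative fraction encountered in the distribution {context!r}")
    total = sum(distribution.values())
    if abs(total - 1) > 1e-6:
        raise ValueError(
            f"The fractions of the distribution {context!r} sum up to {total}, expected 1"
        )
```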

Automatically identify end of ramp-up and beginning of ramp-down process

Currently, the start date and end date are provided to synthetically generate the data. All reported previews and analyses always take the whole period into account. However, at times it could be more interesting to restrict oneself to the data of the "normal" operations period. Then, the ramp-up and ramp-down periods should be neglected when estimating operational figures.

This is supposed to be an additional, optional feature. There might be different results depending on the identification approach. Thus, by default, the figures should be reported as is currently the case.

Add flexible weights of empty containers

Some weights are not drawn from a distribution; e.g. empty containers have a fixed container weight category, which is always the lowest one for the respective container length. Possibly, the container type (reefer, ...) should also be considered, as e.g. an empty reefer is a bit heavier than an empty standard container. The user should have an interface to determine the container weight category for each combination of storage requirement (= container type) and container length unless it is a laden container (then the distribution is used), and this information should be stored in the database.

This requires changes to the ContainerWeightDistributionManager, which would serve as the interface for the user.

The translation table from container length + storage requirement to fixed container weight should be stored in the database. This requires a new peewee model (see the sketch below). Most likely, the repository pattern should be used.

EDIT: For clarification, currently the fixed weight categories are hard-coded. With the weight distributions that are seeded by default, this works fine. The user might be surprised if they define their own weight distributions and then find out that their weight categories are not honoured for the empty containers.
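A minimal sketch of such a peewee model; the model name, field names, and field types are illustrative assumptions:

```python
from peewee import SqliteDatabase, Model, CharField, IntegerField

db = SqliteDatabase(":memory:")  # placeholder; the project database would be used here


class EmptyContainerWeight(Model):  # hypothetical model name
    """Maps container length and storage requirement to a fixed container weight category."""
    container_length = CharField()     # e.g. "twenty_feet"
    storage_requirement = CharField()  # e.g. "empty", "reefer"
    weight_category = IntegerField()   # fixed weight category, e.g. in metric tons

    class Meta:
        database = db
        indexes = (
            (("container_length", "storage_requirement"), True),  # each combination only once
        )
```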

Add UI

The API is designed to allow a user interface to repeatedly update and change the information before the generate method is invoked. Thus, in theory it could also be driven by a UI. It would be great to make further progress here.

First attempts were made to use the ipywidgets library in combination with the voila dashboard. It generally worked, but the required interaction between the UI elements was rather complex and thus tricky.

Refine hack of getting git lfs into RTD

In #74, a dirty hack was introduced at the end of docs/conf.py: git-lfs is downloaded, installed, and used to fetch the git lfs artifacts. This is done because git-lfs is not installed by default on RTD. This approach has some shortcomings:

  • Currently, this step is executed for all Linux users. It should be restricted to RTD. For that purpose, an environment variable has been set (see screenshot below).
  • We do not check whether git-lfs has already been installed. In that case, this step could be skipped as well.

(screenshot of the environment variable settings)
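A minimal sketch of the two guards, assuming the check uses the READTHEDOCS environment variable that Read the Docs sets in its build environment (or the project-specific variable from the screenshot) together with shutil.which:

```python
import os
import shutil

on_rtd = os.environ.get("READTHEDOCS") == "True"
git_lfs_already_installed = shutil.which("git-lfs") is not None

if on_rtd and not git_lfs_already_installed:
    ...  # download, install, and invoke git-lfs as currently done at the end of docs/conf.py
```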

Add preview for used yard capacity

For each hour, the number of containers that enter and leave the yard is estimated based on the schedules (without having the container instances). Only the container dwell time distribution is used as a reference. This supports validating the assumptions and comparing them with the analysis.

Add warning when schedule of the same name is redefined

In the demo scripts, port_call_manager.has_schedule() is used to ensure that a service of the same name and vehicle type does not exist twice. This is currently the only protection layer that exists. We could introduce stricter measures (e.g. SQL constraints), but maybe there are instances where two calls per week on different days are reasonable. In that case, having two schedule entries might be what you want (this requires a thorough check of the source code, though). In any case, the user should be warned if this happened by accident.
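A minimal sketch of such a warning; the has_schedule call is assumed to take the service name and vehicle type as in the demo scripts, the actual signature might differ:

```python
import logging

logger = logging.getLogger("conflowgen")


def warn_if_schedule_exists(port_call_manager, service_name, vehicle_type) -> None:
    """Emit a warning if a schedule of the same name and vehicle type already exists."""
    if port_call_manager.has_schedule(service_name, vehicle_type):
        logger.warning(
            "A schedule named '%s' for vehicle type '%s' already exists; "
            "adding it again might be unintended.",
            service_name, vehicle_type
        )
```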

Make help_text accessible to users (documentation or separate meta-data file?)

The peewee models are documented with the help of the help_text attribute, see e.g. here. This help text is part of the metadata accessible programmatically, but it is not visible to a user who does not have access to the model. However, users are confronted with the fields once they use the export function.

There are different paths to go:

  1. Is it possible to include this in the documentation in an automated fashion (no copy&paste of the texts!)?
  2. Is it better to have a meta-data file alongside the CSV/XLSX/XLS file at export which explains the meanings of each column? How to set up such a file?

Which one is best? And we need to implement it...
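For the automated path, the help texts can be read from the peewee field metadata. A minimal sketch; the usage example with a Container model is hypothetical:

```python
def iter_field_documentation(model_class):
    """Yield (field name, help text) pairs for a peewee model class."""
    for field in model_class._meta.sorted_fields:
        yield field.name, getattr(field, "help_text", None)


# Hypothetical usage, e.g. to write a meta-data file alongside the exported CSV:
# for field_name, help_text in iter_field_documentation(Container):
#     print(f"{field_name}: {help_text}")
```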

Add preview for truck gate throughput

For each time unit (e.g., hour), all containers that enter the yard and all containers that leave the yard through the truck gate are reported. This supports validating the assumptions. Only the total number of moved containers and the truck arrival distribution are used here; the container dwell times are neglected.

Validate container weight distribution of MacGregor

At https://www.erneuerbar-mobil.de/sites/default/files/2020-01/Abschlussbericht_16EM1004_final.pdf, there is another container weight distribution on page 71 (Abbildung 7). It would be interesting to add a validation. That distribution mixes empty and full containers of all lengths and uses slightly different weight categories. Still, it could be interesting to plot the two side by side and check whether the weight distributions are approximately the same.

Add API access to change the container weight distribution

The container weight distribution is currently only seeded; the default values cannot be overwritten through the API. The general pattern already exists (see e.g. TruckArrivalDistributionManager) and just needs to be transferred. Matching the current naming scheme, the new class would be called ContainerWeightDistributionManager and reside in a file called container_weight_distribution_manager.py.
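A minimal sketch of how the new manager could look when transferring the pattern; the repository class and all method names are assumptions:

```python
class ContainerWeightDistributionManager:
    """API class to read and overwrite the container weight distribution."""

    def __init__(self):
        # assumed repository, analogous to the other distribution managers
        self.container_weight_repository = ContainerWeightDistributionRepository()

    def get_container_weight_distribution(self) -> dict:
        """Return the container weight distribution currently stored in the database."""
        return self.container_weight_repository.get_distribution()

    def set_container_weight_distribution(self, distribution: dict) -> None:
        """Validate and persist a user-provided container weight distribution."""
        self.container_weight_repository.set_distribution(distribution)
```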

Add dwell time analysis

How long are the container dwell times in the generated data when containers arrive with vehicle type A and leave with vehicle type B?

  • See the yard capacity analysis for reference - 4f51804
  • Extracting commonly used parts from yard_capacity_analysis.py might be helpful?
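A minimal sketch of how the dwell times could be grouped; the delivered_by and picked_up_by attributes as well as the two time helpers are assumptions about the container model:

```python
from collections import defaultdict


def dwell_times_by_vehicle_type_pair(containers, get_arrival_time, get_departure_time):
    """Group container dwell times by (delivering vehicle type, picking-up vehicle type)."""
    dwell_times = defaultdict(list)
    for container in containers:
        dwell_time = get_departure_time(container) - get_arrival_time(container)
        dwell_times[(container.delivered_by, container.picked_up_by)].append(dwell_time)
    return dict(dwell_times)
```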

Import plotting libraries as normal libraries

With #74, the plotting libraries lost their status as an optional dependency. Thus, they should be imported at the top of each file, just like all other dependencies. The special exception # pylint: disable=import-outside-toplevel should be removed from the project.

Depends on #74 being merged.

Move transport_buffer to database

Currently, the transport buffer is spread all over the code; it is a number that shows up repeatedly at different places. This looks pretty much like technical debt. It should be encapsulated and stored in the database, e.g. to take one more step towards reproducibility.

Add exceptions to API

Currently, exceptions are scattered all over the code and are not an official part of the API. If people use the API, they should be able to invoke the methods in a try/except manner, and thus the exceptions should be available to them. To achieve this, the exceptions first need to be consolidated.

The major source of exceptions is invalid input parameters. The most complex data type currently handled is a distribution, represented as a dictionary with additional constraints to be checked. Thus, this task depends on #71 and #65.

Add delay of vessels

Recent events have led to major delays; some statistics are reported e.g. at https://www.worldcargonews.com/news/news/schedule-reliability-at-all-time-low-in-august-67360

This could be covered; there is even already a field in the model: https://github.com/1kastner/conflowgen/blob/main/conflowgen/domain_models/vehicle.py#L69

The delay information is not considered when assigning containers to outgoing vessels (see _get_arrival_time_of_container(container: Container) -> datetime.datetime), so the delay could even be added in the VehicleFactory at https://github.com/1kastner/conflowgen/blob/f58ed5b90fc0bbab77d9931685434d114fcb4b21/conflowgen/domain_models/factories/vehicle_factory.py

Most likely it is best to have another iteration (a new service on the container_flow_data_generation_process level) which re-assigns the containers that would violate their maximum dwell time to another vehicle in a very similar (maybe identical?) way as in the LargeScheduledVehicleForOnwardTransportationManager.

Possible first assumptions might be taken from:

Make previews and especially analyses smarter - avoid unnecessary recalculation by caching

Currently, some previews and post-hoc analyses are re-calculated several times because they are re-used by other previews or post-hoc analyses. For each report, the code is re-executed, which turns out to be expensive: e.g., all containers (there could be millions) are iterated over several times to calculate the very same value. This should be avoided in a smart way. Be aware, though, that the API is meant to always report the true state. For example, we can first generate the output data, run an analysis, re-run the generation process, and run the analysis again. Thus, the cache must be invalidated once the underlying data changes.

Ideas:

Alternative: Leave everything as it is because not too much time is wasted. A bug that leads to reporting wrong numbers would be much worse than waiting a few seconds more for the results.
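A minimal sketch of one possible caching idea with explicit invalidation; this is purely illustrative and not the project's API:

```python
class CachedResult:
    """Cache an expensive computation and recompute only after invalidation."""

    def __init__(self, compute_function):
        self._compute_function = compute_function
        self._cached_value = None
        self._cache_is_valid = False

    def get(self):
        if not self._cache_is_valid:
            self._cached_value = self._compute_function()
            self._cache_is_valid = True
        return self._cached_value

    def invalidate(self):
        """Must be called whenever the underlying data changes, e.g. after re-generation."""
        self._cache_is_valid = False
```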

Add preview for throughput at quay side

For each week, all containers that enter the yard and all containers that leave the yard at the quay side are reported. This supports validating the assumptions.

Side note:
A more granular perspective (e.g., per hour) is not helpful here because all containers of a vessel are delivered at once and all containers are picked up at once; it is the task of a later model (e.g., a simulation model) to represent the discharging and loading process in more detail.

Use mocking for TestModeOfTransportDistributionManager, separate testing of validator

The DatabaseChooser test at https://github.com/1kastner/conflowgen/blob/main/conflowgen/tests/api/test_database_chooser.py is a good example of how to do mocking correctly. In the API package, we only want to check whether the API class does its work; the class instances it uses should not actually be invoked. For those, we have other unit tests.

The test https://github.com/1kastner/conflowgen/blob/main/conflowgen/tests/api/test_mode_of_transport_distribution_manager.py is a bad example because the unit test does not isolate the ModeOfTransportDistributionManager. The validator that is currently tested here, and which resides in https://github.com/1kastner/conflowgen/tree/main/conflowgen/domain_models/distribution_validators, should get its own test at https://github.com/1kastner/conflowgen/tree/main/conflowgen/tests/domain_models.
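A minimal sketch of how the isolation could look with unittest.mock; the import path as well as the attribute and method names of the manager are assumptions and need to be aligned with the actual class:

```python
import unittest
from unittest import mock

# import path assumed
from conflowgen.api.mode_of_transport_distribution_manager import ModeOfTransportDistributionManager


class TestModeOfTransportDistributionManager(unittest.TestCase):
    def test_setter_delegates_to_repository(self):
        manager = ModeOfTransportDistributionManager()
        # replace the repository instance so that only the API class itself is exercised
        manager.mode_of_transport_distribution_repository = mock.Mock()  # attribute name assumed
        distribution = {"truck": {"truck": 1.0}}  # simplified placeholder distribution
        manager.set_mode_of_transport_distributions(distribution)  # method name assumed
        manager.mode_of_transport_distribution_repository.set_mode_of_transport_distributions.assert_called_once_with(
            distribution
        )
```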

Use normalize_nested_distribution instead of repeated code

The function normalize_nested_distribution at https://github.com/1kastner/conflowgen/blob/main/conflowgen/domain_models/distribution_repositories/__init__.py normalizes an input distribution. It should be used for nested distributions (dict of dicts) instead of repeating the code all over the repository.

The normalization should happen in the respective distribution manager (API level), and the respective repository should then enforce, with the help of a validator, that all fractions sum up to 1 (this should already be the case but should be re-checked).
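For illustration, a normalization of a nested distribution could look as follows; this is a sketch of the idea, not a copy of the actual helper:

```python
def normalize_nested_distribution(distributions: dict) -> dict:
    """Scale each inner distribution so that its fractions sum up to 1."""
    normalized = {}
    for outer_key, inner_distribution in distributions.items():
        total = sum(inner_distribution.values())
        normalized[outer_key] = {
            inner_key: fraction / total
            for inner_key, fraction in inner_distribution.items()
        }
    return normalized
```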

Add scaling functionality to the previews and analyses to report daily/monthly/annual statistics

Each preview and analysis report should offer the option to scale the reported numbers to daily, monthly, or annual statistics, as those are most likely the figures that are available for a container terminal or that can be approximated from port statistics. This scaling makes it easier to verify the assumptions with practitioners.

Currently, the previews and analyses only report the figures for the period between the start date and end date for which the synthetic data is generated.
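A minimal sketch of the scaling, assuming a simple linear extrapolation from the generated period to a full year:

```python
import datetime


def scale_to_annual_figure(
        value_in_period: float,
        start_date: datetime.date,
        end_date: datetime.date
) -> float:
    """Linearly extrapolate a figure reported for the generated period to a full year."""
    number_of_days = (end_date - start_date).days
    return value_in_period * 365 / number_of_days


# e.g. scale_to_annual_figure(120_000, datetime.date(2021, 7, 1), datetime.date(2021, 8, 1));
# daily or monthly figures could be derived analogously.
```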

Add manually defined closing periods of the terminal

At times, a terminal needs to be closed. In some countries (such as Germany), this happens on important holidays (a subset of the officially established holidays). In addition, strikes or Corona lockdowns could be reasons why vehicles are not scheduled to arrive during certain periods. Regardless of the reason for the closure, in ConFlowGen the user simply defines time windows during which no vehicles can arrive at the terminal. Vehicles that would arrive on such a day are removed from the schedule.

Add API use cases to documentation

Currently, https://conflowgen.readthedocs.io/en/latest/ is rather technical. The step-by-step guide does not show how (and why) to use the preview and analysis reports. This could be solved by having an interactive part in the documentation.

Outdated thoughts:

Updated thoughts:
Now, nbsphinx has been chosen for #31, #32, and #33. To minimize the dependencies, the same should be used here. By using Raw cells with reST, even Sphinx roles and directives can be used!

This issue can be split up into one issue per preview or analysis report on a given topic.

Add API option for truck arrival management

Typically, there are different approaches to truck arrival management:

  • Using a truck appointment system with vessel-dependent time windows - here, freight forwarders who pick up a container register for a time window before a vessel arrives at the berth. A sufficient time buffer must be given in case of small delays. This has the benefit that containers can be stacked depending on the planned truck arrivals (not covered in this tool, just for information).
  • Using a quota-based truck appointment system - here, freight forwarders are informed once the container has passed the check at the ship-to-shore gantry crane. They can immediately book a free time slot, but often enough this takes some hours or days. Thus, containers need to be stacked without that additional information (not covered in this tool, just for information).
  • Using no truck appointment system and just informing the freight forwarder as previously explained.

The fields at https://github.com/1kastner/conflowgen/blob/main/conflowgen/domain_models/arrival_information.py are not yet used to their full potential. The chosen approach could be considered during the truck creation process, e.g. at

There should be a user API offering a few options that are then considered for the trucks, especially when picking up containers that were delivered by vessel.

Reduce complexity of LargeScheduledVehicleForOnwardTransportationManager

The reload_properties method of the class LargeScheduledVehicleForOnwardTransportationManager takes a lot of parameters, which makes handling this method quite difficult. It previously required adjustments to the per-file-ignores in the .flake8 file. The code should be simplified to increase maintainability and to reduce the number of exceptions needed when checking the code quality (currently both flake8 and pylint).

Mention related possible projects that are out of scope of this tool

There are some very interesting ideas that are somehow related to this project but currently out of scope, such as:

  1. Currently, the API requires the user to specify the arrival patterns of the larger vehicles. There could be a related project / sub-project that automatically generates these schedules, which in turn serve as input for this tool.
  2. Currently, this tool produces data that need to be checked further; some numbers are reported. In addition, a related project / sub-project could automatically check whether a container terminal with certain properties (number of quay cranes, yard capacity, truck gate throughput, ...) would be sufficient for the generated container flows.
