lumpr's Introduction

lumpR

Landscape Unit Mapping Program for R

DESCRIPTION

This project deals with an R package called "lumpR". The package provides functions for a semi-automated approach to delineating and describing landscape units and partitioning them into terrain components. It can be used for pre-processing semi-distributed large-scale hydrological and erosion models that use a catena representation (WASA-SED, CATFLOW). It is closely connected to, and uses functionalities of, GRASS GIS. Additional pre-processing tools beyond the scope of the original LUMP algorithm are included.

INSTALLATION

  • command line installation:
install.packages("devtools") 
library(devtools)
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS=TRUE) # tell install_github() to ignore warnings; otherwise installation stops at each warning
install_github("tPilz/lumpR")
  • from zip/tar:
    • download zip/tar from github: >LINK<
    • install via R-GUI

The main branch relies on GRASS 7. The migration of the package to GRASS 8 is underway, but not fully tested: https://github.com/tpilz/lumpR/tree/grass8

MORE INFORMATION

Have a look at our wiki for more detailed information: >LINK<

FEEDBACK and BUGS

Feel free to comment via github issues: >LINK<

LICENSE

lumpR is distributed under the GNU General Public License version 3 or later. The license is available in the file GPL-3 of lumpR's source directory or online: >LINK<

NOTE

This package was formerly known as LUMP and was renamed on Jan 9th, 2017 to distinguish it from the LUMP algorithm published by Francke et al. (2008).

REFERENCES

A paper describing lumpR along with an example study was published in GMD:

Pilz, T., Francke, T., and Bronstert, A.: lumpR 2.0.0: an R package facilitating landscape discretisation for hillslope-based hydrological models, Geosci. Model Dev., 10, 3001-3023, doi: 10.5194/gmd-10-3001-2017, 2017.

See also the accompanying github repository: https://github.com/tpilz/lumpr_paper

For the original LUMP algorithm see:

Francke, T., Güntner, A., Mamede, G., Müller, E. N., and Bronstert, A.: Automated catena-based discretization of landscapes for the derivation of hydrological modelling units, Int. J. Geogr. Inf. Sci., 22, 111-132, doi:10.1080/13658810701300873, 2008.

lumpr's People

Contributors: tillf, tpilz

lumpr's Issues

Actual coordinates of most representative catena

Could you please give guidance on how to add a printout of the actual (x,y) coordinates of the (central line?) of all catenas to rstats.txt, in addition to the id? It is easy to get the ids of the most representative catenas, but to explore them in the field (we do soil profiling) the actual coordinates would be an asset. Thanks!

lumpR & WASA-snow

It would be nice to have lu2.dat, as needed by WASA-snow, produced automatically by lumpR.
The parameters "aspect" and "slope" would be needed if "do_rad_corr"=T in snow_params.crtl (i.e., a radiation correction is executed).

create database based on function output

Function that creates a database (e.g., MySQL) and writes parameters estimated by existing functions into that database. I am not sure if there is a practical way to realise that in an R function.

calc_subbas: Snapping to flowlines can be problematic

Snapping to flow lines (rivers) can result in outlet points that are not intended (see picture "snap2line": circle: original outlet point; arrow: snapped point; red cells: river from flow accumulation; orange: erroneously constructed basin; black line: expected watershed).

parameter database: correcting fractions of entities (normalization)

The current practice in "filter_small_areas" drops the respective tables and uploads the corrected ones.
Fine with me, but this will probably violate constraints/relations. Do you still use them anywhere?
If so, using the SQL queries in my latest version of database.R may be the better option to do the correction.
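An in-place correction could rescale the fractions with a single UPDATE instead of dropping and re-uploading the table. A minimal sketch in SQLite syntax, assuming a table r_lu_contains_svc with columns lu_id and fraction (illustrative names, not necessarily lumpR's actual schema):

```sql
-- Hypothetical schema: r_lu_contains_svc(lu_id, svc_id, fraction).
-- Rescale the fractions within each LU so they sum to 1,
-- leaving the table (and any constraints on it) in place.
UPDATE r_lu_contains_svc
SET fraction = fraction / (
  SELECT SUM(x.fraction)
  FROM r_lu_contains_svc AS x
  WHERE x.lu_id = r_lu_contains_svc.lu_id
);
```

This keeps foreign-key relations intact, since no rows are dropped and re-inserted.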

rainy_season() Fortran code

The external code used by rainy_season() is old Fortran 77 with implicit declarations, hardly any comments, etc. Thus, re-coding it in, e.g., Fortran 90/95 and adding some comments would be a nice improvement, though it is not absolutely necessary.

parameter database: Include reservoir/river tables

Include reservoir table into parameter database.

Possibly affects:

  • db_create()
  • db_update() as this comes with a new database version
  • db_fill() based on output of function wasa_reservoir_par()
  • db_check() ?!

Suggestions for performance improvement

to be updated

Some suggestions for performance improvement:

  • area2catena() / prof_class(): catena_file could be written/imported as an Rdata file (maybe optionally via an argument flag), which would speed up writing/reading operations; relevant for large catchments and/or high resolution -> commit 24e281c

  • prof_class(): source code is still rather messy and could be improved (consider tidyverse philosophy)

to be updated
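The binary-file point could be as simple as switching the I/O backend; a sketch, where use_rdata is a hypothetical argument flag and catena_data stands for the data currently written to catena_file:

```r
# Sketch only: 'use_rdata' and 'catena_data' are illustrative names,
# not actual lumpR arguments.
if (use_rdata) {
  saveRDS(catena_data, file = "catena.rds")   # compact binary write
  catena_data <- readRDS("catena.rds")        # much faster than read.table()
} else {
  write.table(catena_data, file = "catena.txt", row.names = FALSE)
  catena_data <- read.table("catena.txt", header = TRUE)
}
```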

remarks lumpR description & output

2 small remarks:

  1. the do.dat currently produced by lumpR contains spaces after the [ brackets in lines 6 and 7
  2. in soter.dat, frgw_delay[day] is given with 11 decimal places (regarding significant digits, maybe no decimal places are needed)

parameter database: sometimes error occurs when closing sqlite database

Function odbcClose() from package RODBC, which is used internally within the db_* functions, sometimes causes an error when using SQLite 3 (reproduced under openSUSE using unixODBC, and under Windows):

library(RODBC)
# connect to ODBC sqlite database
con <- odbcConnect(dbname, believeNRows=F)
# fetch data from database table
dat <- sqlFetch(con, table)
# close database
odbcClose(con)

Error in odbcGetErrMsg(channel) : 
  first argument is not an open RODBC channel
In addition: Warning messages:
1: In odbcClose(con) : [RODBC] error in SQLDisconnect
2: In odbcClose(con) : [RODBC] error in SQLFreeconnect

However, the channel is still open, as this works:

library(RODBC)
con <- odbcConnect(dbname, believeNRows=F)
dat <- sqlFetch(con, table)
dat2 <- sqlFetch(con, other_table)

And this causes no trouble:

library(RODBC)
con <- odbcConnect(dbname, believeNRows=F)
odbcClose(con)

Using another DBMS works without errors as well.

Classified catena output

Could you please provide txt output with the partitioning class (x-distance, y-relative elevation gain) for each catena of the resulting classes that are currently printed to the final section of plots_prof_class.pdf? This is needed for direct processing in code other than the WASA model.

doMC breaks Windows-compatibility

Afaik, doMC only runs on *nix. It should be moved to the "Suggests" section of the DESCRIPTION file and the related calls made optional. Otherwise, the package cannot be installed under Windows.
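With doMC in "Suggests", the registration could be wrapped in an availability check; a sketch (ncores stands for the user-supplied core count):

```r
# Sketch: doMC is Unix-only, so register it only if it can be loaded;
# otherwise fall back to sequential execution, which also works on Windows.
if (requireNamespace("doMC", quietly = TRUE)) {
  doMC::registerDoMC(cores = ncores)
} else {
  foreach::registerDoSEQ()  # %dopar% loops then run sequentially
}
```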

Revise messaging

The print() command was used to generate informative messages during function execution, which was a mistake. It should be replaced by message(). The whole messaging system of the package should be revised.
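The difference matters for users: message() writes to stderr and can be silenced selectively, while print() clutters stdout. A minimal sketch:

```r
# message() instead of print() for progress information:
message("Processing catenas ...")     # goes to stderr
suppressMessages(message("hidden"))   # users can now mute the package's output
```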

Problem with installation of lumpR

On the latest versions of RGui and Rtools running on a Windows 10 x64 machine, I cannot install the current version of lumpR.
The error message is the following:
Error: package or namespace load failed for 'lumpR' in namespaceExport(ns, exports): undefined exports: db_compute_musleK
I use the standard set of commands from the RGui command line:
install.packages("devtools")
library(devtools)
install_github("tPilz/lumpR")

reduce GRASS messages

When executing GRASS commands within functions, a lot of GRASS messages appear on the screen, which seems a bit messy. This should be reduced.

parameter database: db_check should only check

In the first place, "db_check" should only check and issue a report. Only when setting an argument fix=TRUE should the changes be made. Thus, one may be better able to detect the reason for certain inconsistencies and possibly change them manually.
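The proposed behaviour, sketched with the existing db_check() interface (the fix argument default is an assumption):

```r
# Sketch: report-only by default, modify the database only on explicit request.
db_check(dbname, check = "delete_obsolete", fix = FALSE)  # just report inconsistencies
db_check(dbname, check = "delete_obsolete", fix = TRUE)   # actually apply the corrections
```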

prof_class(): problems when classification method = 1

When classification method = 1 (i.e. specifying the overall number of classes and a weighting factor for each attribute), the calculation of the dissimilarity matrix using function daisy() causes problems. If classification method = -1, the matrix is re-calculated ("quick and dirty computation of distance matrix to allow the rest of the script to be run without problems"). The procedure should be revised.

Performance db-cleaning operations

I went through the db-cleaning operations and noticed some things that could be changed if performance becomes an issue:

  • the removal of obsolete entities could be done with a selective SQL statement instead of dropping and reloading the entire table (or was there a reason for this?)
  • given the former, the obsolete entries could be obtained by an SQL query instead of requesting the entire table
  • the entire routine could be made more generic by iterating through the list of tables (subbas, r_subbas_contains_lu, lu, r_lu_contains_svc, ...) and checking, for each entry in the main table, whether it is contained in the subsequent one.
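The first two points could be sketched in SQL (SQLite syntax; key column names such as pid and lu_id are assumptions, not necessarily lumpR's actual schema):

```sql
-- Delete obsolete LUs directly instead of dropping and re-uploading the table:
DELETE FROM lu
WHERE pid NOT IN (SELECT lu_id FROM r_subbas_contains_lu);

-- Query only the obsolete entries instead of fetching the whole table:
SELECT pid FROM lu
WHERE pid NOT IN (SELECT lu_id FROM r_subbas_contains_lu);
```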

use g.region to focus on core area

Currently, we mainly employ the MASK to define the area of interest.
g.region is used only once, in lump_grass_prep.R, to define the region, but I assume it would be better to save the region under a specific name and set this region explicitly in each of the subsequent steps. This would ensure more consistent behaviour when resuming the workflow in between:
g.region --overwrite zoom=MASK save=saved_region # creates region from MASK
g.region region=saved_region # recalls saved region

RAM limitation observation with area2catena() and large study areas

Dear Tobias and Till,
We found out something about study area size and RAM limitation working with lumpR that might be interesting for you:
My study area is quite large: at 90 x 90 m resolution it has 75,545,028 cells (details further down). area2catena() works with 6 layers, which means that 75,545,028 cells * 6 layers = 453,270,168 cells are used, transferred and processed in the function. That process reaches the limits of my computer's RAM (8 GB). Using the recommended parallelisation (more cores) leads to some kind of overflow of the RAM requirements. Observing a task manager, you can see that the CPU drops to 1-5% while the RAM is full (95%). The function will never end (I interrupted it after 5 days of computing during the long Easter weekend); you need to force-stop the whole machine.
The solution that worked for us is to use only 1 core. With that, the function completes successfully. But: the computation takes about 12 h and the RAM is at its limit (nothing else works anymore, the R session gives the dubious "cannot allocate memory" message for any action you try, and you need to restart the computer).
A possible explanation is that by needing more RAM than your computer offers, processes are transferred to swap (deferred for later). Transferring data to swap and back takes much time, which can explain the long computing duration.

Summary: for study areas larger than ours, you should use a server or a computer that has more than 8 GB of RAM.

Details:
EHAs = ca. 28,000 (with these parameter settings, nearly all of them are included)

Region:
projection: 1 (UTM)
zone: -23
datum: wgs84
ellipsoid: wgs84
north: 8394608.94608946
south: 7573260.73260733
west: 150000
east: 895015.88956751
nsres: 90.00090001
ewres: 89.99950345
rows: 9126
cols: 8278
cells: 75545028

Mask:
Type of Map: raster Number of Categories: 1
Data Type: CELL
Rows: 9126
Columns: 8278
Total Cells: 75545028
Projection: UTM (zone -23)
N: 8394608.94608946 S: 7573260.73260733 Res: 90.00090001
E: 895015.88956751 W: 150000 Res: 89.99950345
Range of data: min = 1 max = 1
Data Description:
generated by r.mapcalc
Comments:
if(isnull(elev_riv), null(), 1)
+----------------------------------------------------------------------------+

DATA:

Digital Elevation Model:

|----------------------------------------------------------------------------|
| |
| Type of Map: raster Number of Categories: 255 |
| Data Type: FCELL |
| Rows: 9126 |
| Columns: 8278 |
| Total Cells: 75545028 |
| Projection: UTM (zone -23) |
| N: 8394608.94608946 S: 7573260.73260733 Res: 90.00090001 |
| E: 895015.88956751 W: 150000 Res: 89.99950345 |
| Range of data: min = 299 max = 2076 |
| |
| Data Description: |
| generated by r.mapcalc |
| |
| Comments: |
| if(mask_with_dam == 100, dem_shrink + 100, dem_shrink) |
| |
+----------------------------------------------------------------------------+

Flow Accum

|----------------------------------------------------------------------------|
| |
| Type of Map: raster Number of Categories: 255 |
| Data Type: DCELL |
| Rows: 9126 |
| Columns: 8278 |
| Total Cells: 75545028 |
| Projection: UTM (zone -23) |
| N: 8394608.94608946 S: 7573260.73260733 Res: 90.00090001 |
| E: 895015.88956751 W: 150000 Res: 89.99950345 |
| Range of data: min = 1 max = 7048270 |
| |
| Data Description: |
| generated by r.mapcalc |
| |
| Comments: |
| abs(flow_accum_t) |
| |
+----------------------------------------------------------------------------+

Eha

|----------------------------------------------------------------------------|
| |
| Type of Map: raster Number of Categories: 0 |
| Data Type: CELL |
| Rows: 9126 |
| Columns: 8278 |
| Total Cells: 75545028 |
| Projection: UTM (zone -23) |
| N: 8394608.94608946 S: 7573260.73260733 Res: 90.00090001 |
| E: 895015.88956751 W: 150000 Res: 89.99950345 |
| Range of data: min = 21189 max = 53450 |
| |
| Data Description: |
| generated by r.grow |
| |
| Comments: |
| r.grow input="eha_t2" output="eha" radius=100 metric="euclidean" |
| |
+----------------------------------------------------------------------------+

dist_riv

|----------------------------------------------------------------------------|
| |
| Type of Map: raster Number of Categories: 255 |
| Data Type: DCELL |
| Rows: 9126 |
| Columns: 8278 |
| Total Cells: 75545028 |
| Projection: UTM (zone -23) |
| N: 8394608.94608946 S: 7573260.73260733 Res: 90.00090001 |
| E: 895015.88956751 W: 150000 Res: 89.99950345 |
| Range of data: min = 0 max = 197.989052746315 |
| |
| Data Description: |
| generated by r.mapcalc |
| |
| Comments: |
| dist_riv_t / 90.000202 |
| |
+----------------------------------------------------------------------------+

elev_riv

|----------------------------------------------------------------------------|
| |
| Type of Map: raster Number of Categories: 255 |
| Data Type: FCELL |
| Rows: 9126 |
| Columns: 8278 |
| Total Cells: 75545028 |
| Projection: UTM (zone -23) |
| N: 8394608.94608946 S: 7573260.73260733 Res: 90.00090001 |
| E: 895015.88956751 W: 150000 Res: 89.99950345 |
| Range of data: min = -78 max = 669 |
| |
| Data Description: |
| generated by r.stream.distance |
| |
| |
+----------------------------------------------------------------------------+

svc

|----------------------------------------------------------------------------|
| |
| Type of Map: raster Number of Categories: 960 |
| Data Type: CELL |
| Rows: 9126 |
| Columns: 8278 |
| Total Cells: 75545028 |
| Projection: UTM (zone -23) |
| N: 8394608.94608946 S: 7573260.73260733 Res: 90.00090001 |
| E: 895015.88956751 W: 150000 Res: 89.99950345 |
| Range of data: min = 0 max = 960 |
| |
| Data Description: |
| generated by r.cross |
| |
| |
+----------------------------------------------------------------------------+

Kind regards,
Lisa and Josee

calc_subbas(): Single flow vs. Multiple flow algorithm

At the moment, subbasin delineation using the GRASS function r.watershed uses the single flow direction (SFD) algorithm. In the literature it is usually stated that the multiple flow direction (MFD) algorithm should be superior. In my tests, however, when employing MFD (still using GRASS 6.4.5), there were problems, as the generated flow accumulation map diverges at some points (causing trouble with the identification of river cells and catchment outlets) and the generated subbasins are more variable in size.

One could think a little deeper about that problem. So far, I decided to use the SFD algorithm.

Suggestions for revision

The code in its current form is too messy, and the chances of introducing new bugs with commits are too high. A list of general points that should be considered for code revision:

  • Implement object-oriented programming
    • e.g. object of class lumpr which contains meta-information, which would make it easier to process the chain of lumpR functions (user only needs to pass the lumpr object from function to function)
    • S3 would probably be the most straightforward way; see this general introduction
  • replace the messy tryCatch() calls by on.exit(), where it makes sense
  • re-organise code and package structure following these guidelines
    • let roxygen2 organise NAMESPACE to prevent bugs such as b7e5021
    • more sub-functions
  • improve performance
  • create an example dataset and write vignettes covering different topics
    • use data stored as binaries to avoid license conflicts (and reduce memory)?!
  • add unit tests to reduce the chance of introducing new bugs with commits
  • implement default parameter (function argument) values as much as possible to simplify the workflow for new users (I think at the moment it's just too confusing for them)
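The on.exit() point could look like this; a sketch using rgrass7's execGRASS(), where my_grass_step and the temporary map name are illustrative:

```r
# Sketch: clean-up registered with on.exit() runs on both normal return and
# error, avoiding deeply nested tryCatch() constructs.
my_grass_step <- function(dem) {
  on.exit(execGRASS("g.remove", type = "raster", name = "tmp_map", flags = "f"),
          add = TRUE)
  # ... GRASS processing that may fail goes here ...
}
```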

Test different OS

Package developed on Linux opensuse 13.1 and GRASS 6.4.3.

Should be tested on (at least) Windows and macOS.

db_compute_musleK extra options

Wish list of additional features:

  • option to automatically set musle_p to 1, if desired
  • automatically copy (from "horizons.dat"), column "coarse_frag" for 1st soil horizon into data base table "soil_veg_components", column "coarse_frac"
  • inconsistent nomenclature between "horizons.dat" column "coarse_frag" and "svc.dat" column "coarse_frac"?

Error in data cleaning in db_check delete obsolete

db_check(dbname,
         check=c("delete_obsolete"),
         fix=T,
         verbose=T)

When executing, the following error message is returned:

% Write changes into database and update 'meta_info' (might take a while) ...
Error in [.data.frame(dat_tbl, , key_t[1]) : undefined columns selected

The corresponding data base can be found at: V:\xchange\erwin\4TPilz

sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] lumpR_2.2.0 [...]

Compatibility to GRASS 7

The package needs to be adjusted to be compatible with GRASS 7. GRASS 7, however, is not backward compatible with GRASS 6.4, as many function argument names changed, etc. I.e., I guess it will take a few days to adjust lumpR. Nevertheless, it should be worth it, as GRASS 7 is supposed to be more efficient, and maybe lumpR will run faster.

memory issues with calc_subbas

For large catchments (e.g. Sao Francisco) calc_subbas issues this error:

Calculate drainage and river network...
SECTION 1a (of 4): Initialising memory.
Current region rows: 20217, columns: 14047
ERROR: G_malloc: unable to allocate 2272853112 bytes at init_vars.c:134

According to other users, this is a memory issue of the GRASS GIS module r.watershed.

It is recommended to activate the flag -m (enable the disk swap memory option; operation is slow and only needed if memory requirements exceed the available RAM; see the manual on how to calculate memory requirements: http://grass.osgeo.org/grass70/manuals/r.watershed.html#in-memory-mode-and-disk-swap-mode).
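A call with disk-swap mode enabled might look like this (map names and the threshold are placeholders; the memory option, in MB, is per the GRASS 7 manual):

```shell
# -m trades speed for a bounded memory footprint; 'memory' caps RAM use in MB.
r.watershed -m elevation=dem threshold=100000 \
    accumulation=flow_accum drainage=drain_dir memory=2000
```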

prof_class: cluster analysis

In function prof_class(), the cluster analysis option is deactivated and a variance-based method is used instead. Maybe cluster analysis could be re-activated in a revised form?

Potential simplifications

A list of potential simplifications to make life easier, especially for new users:

-- to be updated --

  • calc_subbas(): determine reasonable parameter values automatically (e.g., use resolution and region size as proxies)
    -- to be updated --

revise parallelisation of area2catena

In its current form, the parallelisation of area2catena() seems to require replicating the large grids. Instead, parallelising the calls using just the data required for single EHAs could improve performance significantly.

include contents of database.R (database interface)

Proposed subroutines:

  • create_db (con, ver_no)
  • update_db(con, ver_no)
  • fill_db(filenames, ...)
  • check_db (probably not a true function, but needs to be completed line by line)
  • export_db(con, dest_dir) (from make_wasa_input minus the checking)

Improve generation of do.dat

Some more parts in do.dat could be set automatically according to content of db:

  • doreservoirs
  • doacudes
  • dosnow (? tables yet to be implemented)

reservoir_lumped(): unify behaviour

Currently, reservoir_lumped() directly creates WASA input files. It would be better if it behaved like the other routines: create output files that can be re-imported into the db using db_fill().

issues grass7_2018

- to be continued -

Remaining issues I observed with my test dataset along with GRASS 7 adaptation:

  • calc_subbas(): message of number of subbasins left after removing spurious subbasins is wrong (i.e. not the actual number found in GRASS) -> resolved by commit 3fc4c67
  • calc_subbas(): differing number of categories in subbas and drainage points might be intentional (e.g. when drain_points and thresh_sub are given) -> remove or adjust warning message -> resolved by commit 3fc4c67
  • calc_subbas(): column subbas_id in output <points_processed>_snap does not correspond to categories in output basin_out -> resolved by commit 3fc4c67
  • calc_subbas(): if column subbas_id is given in input drain_points the categories in the resulting basin_out raster do NOT correspond to this column as promised by the documentation -> resolved by commit 3fc4c67
  • area2catena(): GRASS reclass files produced even with argument grass_files = FALSE; this occurs only if an element in arg supp_qual comes without explicit mapset declaration
  • reservoir_lumped() and reservoir_strategic(): @TillF Adaptation to Windows might be necessary (relates to readVECT() mapset issues)
  • db_create(): When trying to re-create an existing database (v. 26) with overwrite="drop", there is an error: Table 'x_seasons' already exists when updating to version 26. Rename / delete manually, and repeat update. Note: the message naming x_seasons is exemplary; when removing x_seasons, the same error occurs with the next table. I don't understand how I am supposed to overwrite an existing table.
  • db_create(): does not work with overwrite = 'empty' applied to an existing database (error raised by an internal call to db_update(); see post below).

Observed changes in behaviour in comparison to lumpR v2.5.0, latest GRASS 6.4 based version (@TillF check if this could be reasonable):

  • order of subbasins (i.e. their ID) in output files changed (but statistics, i.e. cell counts for specific subbasins, remain unchanged) -> intended and no problem
  • lump_grass_prep(): differences in the EHA map (IDs -> no problem; slightly different EHA sizes, max. 13 grid cells in my test setup) -> I made some tests and it turned out the raw outputs of r.watershed (i.e. argument half.basin/half_basin) are different, even with the same input data, threshold values and the same algorithm (SFD)
  • results from prof_class() deviate slightly in some occasions (TC definition), even with the same seed and the same input files and parameter settings (should not be problematic)
  • reservoir_outlet(): for vector map res_vct, the columns res_id and name are now required. I don't understand the reason; I think this is unnecessary. Maybe an argument should be added where the user can choose which column contains reservoir IDs (the standard should be column cat). -> removed the necessity to contain column name; res_id should be fine
  • reservoir_lumped(): When looking at the source code, I get the feeling that there are superfluous commands resulting in higher computation time (e.g. res_lump <- readVECT(res_vect_class, type="point") in lines 323 and 341).

Function execution times:

  • calc_subbas() now takes twice the execution time in my example -> seems to be case specific; in a further test I made with different data it was faster... I guess, on the one hand, GRASS operations are faster, but on the other hand some recent extensions (specifically, ensuring that drain points are not directly in the middle of a cell, which includes some additional GRASS as well as read and write operations) make it slower
  • lump_grass_prep() takes less than half of the former runtime which is good but might be connected to the issue above? -> GRASS operations are faster, different results caused by different behaviour of r.watershed, see above
  • reservoir_lumped() is considerably slower

- to be continued -

calc_seasonality() performance

The function uses loops in R code which make it very slow for large datasets. These loops could be moved into the underlying Fortran code, which would result in a large improvement in performance.
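Until the loops are moved into Fortran, vectorising them on the R side may already help; an illustrative example of the pattern (dummy data, not the actual calc_seasonality() code):

```r
# Element-wise loop vs. vectorised equivalent on a stations-by-days matrix:
rain <- matrix(runif(100 * 365), nrow = 100)   # dummy data: 100 stations
res <- numeric(nrow(rain))
for (i in seq_len(nrow(rain))) res[i] <- sum(rain[i, ] > 0.5)  # slow R loop
res2 <- rowSums(rain > 0.5)                    # vectorised, same result
```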

R CMD check issues in v1.0.0 (RainySeason.f)

Running R CMD check LUMP for release v1.0.0 produces one warning and one note which still have to be resolved:

Compiler warnings from RainySeason.f:

Found the following significant warnings:
  Warning: Possible change of value in conversion from REAL(8) to REAL(4) at (1)
  Warning: Possible change of value in conversion from REAL(8) to REAL(4) at (1)
  Warning: Possible change of value in conversion from REAL(4) to INTEGER(4) at (1)
  Warning: Possible change of value in conversion from REAL(4) to INTEGER(4) at (1)
  Warning: Possible change of value in conversion from REAL(4) to INTEGER(4) at (1)
  Warning: Possible change of value in conversion from REAL(4) to INTEGER(4) at (1)
  Warning: Possible change of value in conversion from REAL(4) to INTEGER(4) at (1)
  Warning: Possible change of value in conversion from REAL(8) to REAL(4) at (1)
  Warning: Possible change of value in conversion from REAL(4) to INTEGER(4) at (1)
  Warning: Possible change of value in conversion from REAL(4) to INTEGER(4) at (1)
  Warning: Possible change of value in conversion from REAL(8) to REAL(4) at (1)
  Warning: Possible change of value in conversion from REAL(4) to INTEGER(4) at (1)
  Warning: Possible change of value in conversion from REAL(8) to REAL(4) at (1)
  Warning: Possible change of value in conversion from REAL(4) to INTEGER(4) at (1)
  Warning: Possible change of value in conversion from REAL(8) to REAL(4) at (1)
  Warning: Possible change of value in conversion from REAL(4) to INTEGER(4) at (1)
  Warning: Possible change of value in conversion from REAL(4) to INTEGER(4) at (1)
  Warning: Possible change of value in conversion from REAL(4) to INTEGER(4) at (1)
  Warning: Possible change of value in conversion from REAL(4) to INTEGER(4) at (1)
  Warning: Possible change of value in conversion from REAL(4) to INTEGER(4) at (1)
  Warning: Possible change of value in conversion from REAL(4) to INTEGER(4) at (1)

These implicit type conversions might be an issue.

One note regarding calc_seasonality.f90:

* checking compiled code ... NOTE
File '/home/tobias/R/R.checks/LUMP.Rcheck/LUMP/libs/LUMP.so':
  Found '_gfortran_stop_string', possibly from 'stop' (Fortran)
    Object: 'calc_seasonality.o'
Compiled code should not call entry points which might terminate R nor
write to stdout/stderr instead of to the console.

See 'Writing portable packages' in the 'Writing R Extensions' manual.

database operations are slow

Database operations using RODBC are very slow, which becomes significant when processing large amounts of data. I also noticed that on Linux it is slower than on Windows (for whatever reason), and there are also differences regarding the employed DBMS (e.g., SQLite is slower than MariaDB/MySQL).

There are a few discussions around regarding this issue. However, it seems to be necessary to employ a different R package (and adapt lumpR accordingly) to speed up database processing.

A solution could be the package RJDBC, see http://stackoverflow.com/questions/30943748/r-painfully-slow-read-performance-using-rodbc-sql-server.
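Besides RJDBC, the DBI family of packages would be another candidate; a sketch using DBI with RSQLite (a different backend than the ODBC route, shown only as an alternative; the table name is illustrative):

```r
# Sketch: DBI-based access, bypassing the ODBC layer entirely.
library(DBI)
con <- dbConnect(RSQLite::SQLite(), dbname)  # 'dbname' as in the RODBC examples
dat <- dbReadTable(con, "subbas")            # replaces sqlFetch()
dbDisconnect(con)                            # replaces odbcClose()
```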

dists (column closest_dist in lu.dat) is always zero

This column was intended to indicate the (virtual) distance of the closest (most similar) catena to the LU toposequence. It is not needed for operational purposes.
Still, the column should either be computed correctly or removed completely.
