If the user collects values over a long period of time (weeks, months) and with a shor

[FEATURE REQUEST] Modifying the graph to allow the long-term data logging, about fredlcore/bsb-lan

Comments (166)

fredlcore commented on June 9, 2024

I wanted to address the original issue using SQLite. You argued against that and suggested a binary format. I agreed to look into that approach. If you've changed your focus because all of a sudden you don't see a connection to the original issue, that's perfectly fine; no one is forced to do anything here that they do not find useful, and that includes both of us.

As for the coherence of data points, this could also be done using the unique row ID if the row ID changes only when a new log interval is logged.

from bsb-lan.

fredlcore commented on June 9, 2024

Sure, people can try your repository to do that, but there is no need to send a PR which means you want this code included in the main repo.
Anyway, I'll try to approach the data storage first and see if I go for a binary-coded rotating buffer or if the SQLite approach is easier in the end to implement despite the added overhead. Once this is done, you can see if your /DG modification still works and if it does and uses the "days" and not "kilobyte" approach, I will of course consider merging it.

fredlcore commented on June 9, 2024

Hm, I was 100% sure that we had an open ticket for this request already, but I can't find it right now, so I'll leave this open until I find the other one. Otherwise, we'll continue here. In short: Yes, that would be a very handy feature, but since there is no fully working SQL(ite) or other kind of database available on the Due/ESP32, it's difficult to implement this. If someone has ideas how to effectively purge data older than a certain timeframe, I'm happy to hear more about it.

CZvacko commented on June 9, 2024

The values are currently read from a file, not from memory, right? My idea was to write a separate file for each week, e.g. datalog_w01.txt, datalog_w02.txt. Then just plot the desired file (datalog_w01.txt ...), no need to have a dynamic selection for the date range.
Maybe some users will prefer to keep the current way, then some setting could be added to allow the user to choose if they want a single log file or separated by weeks.
Also, there should be an option to download all log files at once.

fredlcore commented on June 9, 2024

Yes, they are read from a file. Separating the files into weeks is not too feasible, I think, because then you can't see, for example, what happens from Sunday over to Monday in one graph. Usually time intervals are more intuitive and could easily be adjusted to the users' needs (7 days, 1 month etc.). The problem is the "maintenance work" of discarding old log entries, which would have to be done at some point. In SQL(ite3) that's really easy and fast, but there is no full implementation for microcontrollers yet.

CZvacko commented on June 9, 2024

Yes, SQL would be ideal, but separate weekly logs are still better than nothing. As for logging across a week boundary: if someone expects to need that in advance, they can switch the (proposed) setting to continuous logging.

DE-cr commented on June 9, 2024

The JavaScript libraries used for /DG displays (C3+D3) are indeed slow when dealing with larger datasets. The older /DG implementation (using D3 only) might be faster. You can use it instead of the C3+D3 version by preceding #define USE_ADVANCED_PLOT_LOG_FILE in BSB_LAN_config.h with //.

One easy (?) hack with the C3+D3 version might be to filter the /D data in its JavaScript right after reading, e.g. by applying a regular expression passed in with the url, e.g. /DG?DateFilterRegEx=1.\.02\.2023 or something like that for 10.-19. Feb 2023 only. However, this seems hardly user friendly. Providing a more usable way to specify the time frame for filtering is conceivable, but would require more work.
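
The regular-expression filter described above can be sketched as follows. This is only an illustration in Python (the real /DG code is JavaScript); the function name `filter_by_date_regex`, the header line and the uptime values in the sample are made up, and the date is assumed to be the second `;`-separated field.

```python
import re

def filter_by_date_regex(lines, pattern):
    """Keep only /D lines whose date field matches the regular expression."""
    date_re = re.compile(pattern)
    kept = []
    for line in lines:
        fields = line.split(";")
        # Field 1 is assumed to hold "dd.mm.yyyy hh:mm:ss"; the header
        # and malformed lines simply fail to match and are skipped.
        if len(fields) > 1 and date_re.match(fields[1]):
            kept.append(line)
    return kept

# Made-up sample data in the /D line format quoted in this thread.
sample = [
    "Milliseconds;Date;Parameter;Description;Value;Unit",
    "364593010;09.02.2023 23:59:45;8314;Kesselrücklauftemperatur Ist;66.7;°C",
    "364608010;10.02.2023 00:00:00;8314;Kesselrücklauftemperatur Ist;66.5;°C",
    "365472010;19.02.2023 23:59:45;8314;Kesselrücklauftemperatur Ist;65.9;°C",
    "365487010;20.02.2023 00:00:00;8314;Kesselrücklauftemperatur Ist;65.8;°C",
]

# The pattern from the URL example: days 10 to 19 of February 2023.
february_10_to_19 = filter_by_date_regex(sample, r"1.\.02\.2023")
print(len(february_10_to_19))
```

Matching against the date field only (instead of the whole line) avoids accidental hits in the uptime or value columns.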

My personal solution to this problem has been to keep log intervals long and/or log periods short. :)

fredlcore commented on June 9, 2024

Hm, the filter is actually quite a nice idea, because if we use dropdowns or whatever for setting the timeframe, the corresponding regexes could be created from that setting. However, it would still require always reading the complete dataset from the SD card...

DE-cr commented on June 9, 2024

Yes, but that size penalty would hurt only once, not for working with the graph (zoom, pan, ...), and also not for its initial display.

Interactive filter settings could take different forms, e.g. show only the most recent n lines, show only data between dates x and y (possibly with y=now as a default), using every tenth value only, ... Since /D shows dates as dd.mm.yyyy and not yyyy-mm-dd, deciding if a date lies between x and y isn't as simple as it could be; for that, I'd (suggest to) transform the current format into the latter one before doing the actual filtering.
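
A minimal sketch of these filter options, again in Python rather than the JavaScript that /DG actually uses. The names `to_iso_date` and `filter_lines` are hypothetical, and the order in which the filters are applied (date range, then last n lines, then thinning) is just one possible choice.

```python
def to_iso_date(d_date):
    """Convert 'dd.mm.yyyy' (as shown by /D) to 'yyyy-mm-dd'."""
    day, month, year = d_date.split(".")
    return f"{year}-{month}-{day}"

def filter_lines(lines, date_from=None, date_to=None, last_n=None, every_nth=1):
    """Filter /D lines by ISO date range, then keep the last n, then every nth."""
    kept = []
    for line in lines:
        fields = line.split(";")
        if len(fields) < 2:
            continue
        date_part = fields[1].split(" ")[0]
        if date_part.count(".") != 2:
            continue  # header or malformed line
        iso = to_iso_date(date_part)
        # ISO dates sort correctly as plain strings, dd.mm.yyyy dates do not.
        if date_from is not None and iso < date_from:
            continue
        if date_to is not None and iso > date_to:
            continue
        kept.append(line)
    if last_n is not None:
        kept = kept[-last_n:] if last_n > 0 else []
    return kept[::every_nth]
```

For example, `filter_lines(lines, date_from="2023-02-10", date_to="2023-02-19")` keeps the same ten days as the regular expression example, without requiring the user to write a regex.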

Do you know if there's enough people using /DG to display large datasets to warrant the implementation effort? So far I've thought that /DG is not meant to replace a proper FHEM system's graphing capabilities.

fredlcore commented on June 9, 2024

Of course it would hurt for its initial display because you would still have to load the complete data set even if you only want to display the data for last week. That's where a SQLite implementation would significantly speed things up.

As for FHEM: If you have ever worked with FHEM, you'll know that its graphing capabilities are among its worst qualities. FHEM's strength lies in fine-grained control of functions, which is something that BSB-LAN cannot reproduce. The functionality of /DG as it is now is not really usable unless the user takes regular action (i.e. regularly deleting/moving data). Having a function that is half-baked does not make sense, so we either have to improve it or remove it. And yes, there are quite a few people using it.

DE-cr commented on June 9, 2024

That's a bit like saying: all cars that cannot carry a ton of bricks in their trunk doing 200 km/h don't make sense, or all cars that need to be refueled are not really usable. I consider the current /DG implementation very useful, and much more versatile than cars, to stick to that example: It gives you both detailed (speed) and long period (load capacity) graphs - and if you're patient or willing to clear the data (refuel) every once in a while, even both at the same time. ;)

I'll take a look at extending the current /DG implementation to interactively filter data at the client side, if that's desired.
Doing the filtering on the server side (SQLite, ...) or providing a completely different /DG implementation is something I don't see in my capacity.

DE-cr commented on June 9, 2024

If/when filtering on the client side is done: The data in /D is highly redundant and would compress very well, to reduce transmission time. Of course, server side filtering would be better, though.

fredlcore commented on June 9, 2024

The older I get, the less I think that analogies are helpful in any way, because people tend to discuss the details of the analogy more than the actual matter.
I have found an implementation of SQLite that could work with BSB-LAN, but it's written for the IDF environment, so I guess it needs patching to work with the Arduino IDE, which we use in this project:
https://github.com/siara-cc/esp32-idf-sqlite3/tree/c34beba7318fc00953b33db215b10dd23cb31924

DE-cr commented on June 9, 2024

Sqlite3.c is 7.36 MB. Sounds pretty big to me for an arduino/esp32 library used for what, to my understanding, is not bsb-lan's core functionality.

fredlcore commented on June 9, 2024

You know that the size of the source code (including in this case hundreds of lines of comments) says nothing about the size of the compiled binary?

DE-cr commented on June 9, 2024

I've seen enough code to know that, as a general rule, there is some correlation between source code size and binary code size. And I've seen enough code to be skeptical about fitting SQLite3 functionality in just a few KB. The size given here sounds more reasonable to me - or at least that order of magnitude.

The current /DG implementation accounts for less than 2 KB of bsb-lan's binary code, iirc.

fredlcore commented on June 9, 2024

So you seriously compare a full-blown SQLite installation on a regular PC with an adaptation with reduced functionality specifically designed for a microcontroller?

If you want me to take your argument seriously, provide relevant, fact-based information, for example by compiling one of the examples and seeing how much memory it consumes. This is what I will eventually do once I have spare time. Otherwise this discussion leads us nowhere.

DE-cr commented on June 9, 2024

On my system, the sqlite3_spiffs example from https://github.com/siara-cc/esp32_arduino_sqlite3_lib uses 624653 bytes for the program and 22004 for global variables.

The ESP32_Console example from https://github.com/siara-cc/sqlite_micro_logger_arduino uses 376505 / 22288 bytes.

The BareMinimum example that comes with the Arduino IDE uses 211053 / 16048 bytes.

Subtracting the BareMinimum values from the SQLite / MicroLogger examples' values leaves us with 413600 / 5956 and 165452 / 6240 bytes, respectively.

Both 404 KB and 162 KB are the same order of magnitude as 500 KB.
q.e.d. ;)

fredlcore commented on June 9, 2024

Thanks. Have you deducted the boilerplate that comes with every esp32 binary? That alone is already several dozen kilobytes and already part of the BSB-LAN binary, so it would not be added when using sqlite.

DE-cr commented on June 9, 2024

Yes, see above.

fredlcore commented on June 9, 2024

Ah, ok, noted. Adding one of the SD_MMC examples (that's what will be used for SD card logging) to BSB-LAN's code adds 366kB to flash memory and 5kB of used RAM. While 500kB would have more or less maxed out the available flash in standard ESP32 configuration running BSB-LAN, 366kB still leaves quite some air to breathe.
The Micro Logger library is not an option as it does not (yet) support the VACUUM command to actually shrink the size of a database after deleting expired rows.

DE-cr commented on June 9, 2024

I've implemented client side filtering to alleviate /DG performance issues when dealing with large datasets:

Two date input fields (from ... to) to narrow down data to be plotted. When loading /DG, they are set to the min/max dates found in the dataset (/D) - unless there's >10k lines of data, in which case the 'from' date is set to where the last 10k lines of data begin.

To signal when not all available dates are selected for plotting, a '!' is put next to the date input field that's cutting off data (both in the screenshot here). Changing one of the date input fields redraws the graph.

Compared to server side filtering (using SQLite or whatever), this solution still wastes time by both transmitting the whole /D dataset for /DG, and filtering it each time the plot is redrawn. However, what takes up the bulk of the time is highly reduced (unless the user sets the from/to dates to include all data): the actual plotting of thousands of values.

I've tested this implementation using Firefox 109.0.1 and Chromium 110.0.5481.100 on Ubuntu 22.04.1, as well as on my cheap, old Android phone with Chrome.

The code change adds 544 Bytes to the bsb-lan binary's size.

Let me know if I should upload this /DG version to github!

DE-cr commented on June 9, 2024

P.S. I've tested this on my i5-3320M / 2.6 GHz laptop using a data set of >300k data points and found the performance quite good - unless I set the from/to date filter to plot more than about 30k data points at once.

fredlcore commented on June 9, 2024

Cool, thanks!
544 Bytes sounds great of course. How long did it take to display, let's say, a week's worth of data in this 300k dataset? I assume the file would then be around 30MB (or that order of magnitude ;) )?

EDIT: I'm not sure if I remember correctly, but I think you are not using an SD card in your setup? The main problem is that the data throughput of the SD card is rather slow (on the Arduino it was several dozen kB/s). So downloading a 30MB file to display would take 300 seconds at 100kB/s. That was the point of departure for me to think of a server-side solution that in itself does not have to parse the whole file for just a subsection, because SQL(ite) would know where to look for the relevant data sets.

fredlcore commented on June 9, 2024

We could then think of a URL-command that purges the dataset except for the last x days of data (worst case by copying over the remaining data points from the old data file to a new file, removing the old and renaming the new file).
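
The worst-case "copy, remove, rename" purge could look roughly like this. It is a Python sketch, not BSB-LAN code: `purge_log` is a hypothetical name, the first line is assumed to be a header, and lines without a parsable `dd.mm.yyyy hh:mm:ss` timestamp in the second field are dropped.

```python
import os
from datetime import datetime, timedelta

def purge_log(path, keep_days, now):
    """Rewrite the log so that only the last keep_days days of data remain."""
    cutoff = now - timedelta(days=keep_days)
    tmp_path = path + ".tmp"
    with open(path, encoding="utf-8", newline="") as src, \
         open(tmp_path, "w", encoding="utf-8", newline="") as dst:
        for i, line in enumerate(src):
            if i == 0:
                dst.write(line)  # the header line is always kept
                continue
            fields = line.split(";")
            try:
                stamp = datetime.strptime(fields[1], "%d.%m.%Y %H:%M:%S")
            except (IndexError, ValueError):
                continue  # no usable timestamp: drop the line
            if stamp >= cutoff:
                dst.write(line)
    # Replace the old file with the new one in a single step.
    os.replace(tmp_path, path)
```

Note that the whole file still has to be read once per purge, so on a slow SD card this would block for about as long as a full /D download.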

DE-cr commented on June 9, 2024

Times on my system (see above), using Firefox, data set = 31 days * 24 hours * 60 minutes * 7 parameters = 312480 data points:

1104 ms to plot one day (10080 data points in 7 lines),
3848 ms for one week,
15932 ms for the whole month.

The above times are for plotting only, not including the /D data transfer from server to client. I've not done repeated timings for the data subsets above, only a single timing each.

I've done the above timing tests w/o including bsb-lan communication, loading a local file from my computer instead. Since I've created that file artificially (and because I'm lazy), it is smaller than actual /D output and weighs in at 14 MB. A typical line in this file looks like this:
20230131235900000;2023-01-31 23:59:00;7;7;120;
instead of something like
364593010;01.05.2022 00:00:15;8314;Kesselrücklauftemperatur Ist;66.7;°C.

DE-cr commented on June 9, 2024

Btw, I don't see myself running into /DG performance issues anytime soon, even with the existing implementation: I currently log two parameters only, once a day. :)

DE-cr commented on June 9, 2024

Yes, my ESP32 doesn't have an SD card.

fredlcore commented on June 9, 2024

Thanks, I didn't see the rendering on the local PC or mobile phone as a problem and your findings support that assumption, although I'm a bit surprised that rendering a month takes 15 seconds. For comparison: I log 49 values every five minutes for 28 days (i.e. approx. 395.000 datapoints) into a SQLite database on my Synology NAS (which runs on a mediocre Intel processor) and have a PHP script displaying these datapoints in 14 different graphs, and to dynamically generate these 14 graphs with all the data contained on one page takes 6-7 seconds, from pressing the reload button to the page finishing loading.
But even though it's 50% faster, I wouldn't mind waiting these 15 seconds on the client side to display this amount of data because that does not affect running BSB-LAN. The problem is rather waiting for the server to send (in your case 14 MB of) data. I just ran a check on a 350kB file and the performance is around 150-180 kB/s. Weirdly, the Due isn't much slower than the ESP32 Olimex EVB, and even more weirdly, performance is significantly lower (approx. 50-80kB/s) when using WiFi compared to Ethernet (not sure why, the WiFi connection should not be a problem, BSB-LAN sits right next to the router).

So to transfer a 14 MB file would take 1.5 minutes to download the data on the Due and (Ethernet) Olimex during which BSB-LAN couldn't do anything else, i.e. queries or logging entries during that time would not be possible. On the ESP32, there might be a chance to handle the data preparation and sending in a different core, but not on the Due.
The advantage of this approach would be that moving through the data could be done without additional data transfers. The drawback is that renewing the data would always fetch the complete file again, taking another 1.5 minutes.
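
The transfer-time estimates used in this thread are simple divisions. A small Python sketch to reproduce them, assuming 1 kB = 1000 bytes and 1 MB = 1000 kB (the thread does not say which convention is meant):

```python
KB = 1000
MB = 1000 * KB

def transfer_seconds(size_bytes, throughput_bytes_per_s):
    """Time needed to send size_bytes at a constant throughput."""
    return size_bytes / throughput_bytes_per_s

# 14 MB at the measured 150-180 kB/s: roughly 78 to 93 seconds.
slow = transfer_seconds(14 * MB, 150 * KB)
fast = transfer_seconds(14 * MB, 180 * KB)
# 30 MB at 100 kB/s, as in the earlier estimate: 300 seconds.
earlier = transfer_seconds(30 * MB, 100 * KB)
print(round(fast), round(slow), round(earlier))
```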

I guess I'll have to run some performance tests with the ESP32 SQLite implementation and see if that significantly speeds things up on the server side. If it doesn't then I'd happily take you up on your offer.

fredlcore commented on June 9, 2024

I just ran the tests that come with the ESP32 SQLite implementation:
Counting all 9000 rows in a randomly generated database takes around 3.5 seconds.
Running a query across these 9000 entries that matches a certain condition and lists the first 10 of them takes 0.25 seconds - including the time it takes to output the contents on the serial console.

DE-cr commented on June 9, 2024

An easy solution for people logging much data and usually just wanting to display the newest entries would be to add a URL command to output the newest n kB of /D only, e.g. /DEn or maybe just /DE with n fixed or configurable in /C (/DE = data end). This could very easily be served by seeking n kB back from the end of the open datalog.txt, skipping to the first \n from there, and delivering the rest. If the user should miss the header line in this approach, that could also be easily fixed (but the current /DG implementation doesn't need it: it just skips the first line).
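
The seek-back approach can be sketched like this, in Python for illustration only (the real implementation would use the Arduino file API). `read_tail` is a hypothetical name, and 1 kB is taken as 1024 bytes.

```python
import io

def read_tail(f, n_kb, include_header=True):
    """Return the last n_kb kilobytes of a binary file, starting on a line boundary."""
    f.seek(0, io.SEEK_END)
    size = f.tell()
    start = size - n_kb * 1024
    if start <= 0:
        f.seek(0)
        return f.read()  # the file is smaller than the requested tail
    header = b""
    if include_header:
        f.seek(0)
        header = f.readline()
    f.seek(start)
    f.readline()  # skip to the first \n: drops the (probably incomplete) line
    return header + f.read()

# Usage with an in-memory stand-in for datalog.txt: a header plus 200 lines.
data = b"H;D\n" + b"".join(b"%018d;\n" % i for i in range(200))
tail = read_tail(io.BytesIO(data), 1)
print(len(tail.splitlines()))
```

Only two seeks and one short `readline` are needed before streaming, regardless of how large the file is.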

This could be implemented in addition to my current /DG modification, which could then use e.g. /DE100 per default to fetch the data, with a possible extension to fetch /D completely on request.

And yes, it seems C3 and/or D3 has not been designed to handle large datasets. However, I'm still amazed by what it does, and how easy it is to use - for free.

DE-cr commented on June 9, 2024

As for the transfer times: Did I mention that /D data could easily be compressed, either by running it through a general purpose compressor or by using a different format? ;)

When testing my current /DG mod, I repeatedly found myself jumping between different days in that month's log data, to compare those days. This use case would still require to transfer lots of data, i.e. server side filtering would not help as much as you might think/wish.

DE-cr commented on June 9, 2024

Btw, waiting 15s for the initial display might be ok, but as @CZvacko says: "working with the chart is quite slow. Any zooming or other operation takes a long time". He/she did not complain about the transfer time. Just saying. ;)

Of course, I also see not binding the server 100% to /DG (or /D) data transfer for extended periods of time as a priority for a system with bsb-lan's main functionality (which I see in communicating with the boiler control, not with a user / web client).

DE-cr commented on June 9, 2024

One advantage of using a naked standard ESP32 for logging is that you're unlikely to run into /DG performance issues. ;)

Ensuring that bsb-lan doesn't get blocked by extended /D data transfers could also be done by artificially imposing a size limit on datalog.txt. If the standard use case is to look at the most recent data only, this could still be easily achieved by using datalog.txt as a rolling buffer, i.e. opening it rw for logging and seeking to (or staying at) the position after the most recent write operation (or to the file's start after roll-over) before adding new data. Staying with a plain text format for datalog.txt, this could be made easier by switching to a constant line length format, filling lines up with spaces where required. (The current behavior is to just stop logging before bsb-lan runs out of file system space, which will then lose more recent values.)
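
A rolling buffer with constant line length could work roughly as sketched below (Python, illustration only). `LINE_LEN`, `format_record` and `RollingLog` are hypothetical names, and how the write position survives a reboot is left open here.

```python
import io

LINE_LEN = 32  # assumed fixed record length in bytes, including the newline

def format_record(text):
    """Pad a log line with spaces to exactly LINE_LEN bytes (newline included)."""
    raw = text.encode("utf-8")
    if len(raw) > LINE_LEN - 1:
        raise ValueError("record too long for the fixed line length")
    return raw.ljust(LINE_LEN - 1, b" ") + b"\n"

class RollingLog:
    """Fixed-size log file that overwrites its oldest record after roll-over."""

    def __init__(self, f, max_records, next_slot=0):
        self.f = f                      # file object opened for reading and writing
        self.max_records = max_records  # size limit, expressed in records
        self.next_slot = next_slot      # slot that receives the next record

    def append(self, text):
        self.f.seek(self.next_slot * LINE_LEN)
        self.f.write(format_record(text))
        # Wrap around to the start of the file once the limit is reached.
        self.next_slot = (self.next_slot + 1) % self.max_records

# Usage: a buffer of three records receives five entries.
buf = io.BytesIO()
log = RollingLog(buf, max_records=3)
for i in range(5):
    log.append(f"r{i}")
print([line.strip() for line in buf.getvalue().splitlines()])
```

Because every record has the same length, the file never grows beyond `max_records * LINE_LEN` bytes and any record can be located by a single seek.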

fredlcore commented on June 9, 2024

Compressing data doesn't help because the bottleneck is the transfer from the SD card to the microcontroller, not the transfer time via the network. For compression, you would still have to get all the relevant data into memory, compress it and then send it through the network. Without proof that this will significantly reduce the overall transfer time, I'm not going to pursue this any further.
We can discuss many different use cases, but from my experience (and that is the one that counts in the end ;) ), I monitor the last 24 to 48 hours most of the time and when there are anomalies, I go back several more days or in case I find a bigger problem, up to a month. So most of the time server-side data selection/reduction will lead to a significant reduction in transfer time compared to downloading everything all at once, especially if the client-side display caches values so that if you jump back to a previously viewed day, the data won't have to be downloaded again (this would just be an added bonus, don't know if this is doable with C3/D3, if it isn't then that's fine and still better than anything else at the moment).
I have also thought about using fixed length entries to create a rolling buffer and/or implement something like what you suggested with /Dn, but this is easier said than done, first and foremost because you couldn't select a timeframe based on actual time and date. Yes, you could do some kind of calculation based on the log interval and then estimate how many entries make up for an hour or a day, but changes in the log interval or any kind of interruptions will quickly compromise this approach, and the larger the interval the bigger the impact. But yes, something in this direction would be the last resort I would fall back to if there are no better ways to deal with this.

At the moment, I find the access times of SQLite quite compelling.

DE-cr commented on June 9, 2024

When I said "using a different format" could "compress" data, I meant not using a simple plaintext line for each log value, but either using binary format, or including a look-up table for the parameter names instead of spelling them out each time, or something like that. This would (or at least could) also reduce file sizes, i.e. SD card read times.
Client side "caching" would (or should I say "should", as I haven't done it, yet?) be possible in JavaScript, outside of C3/D3.
If time frame selection is of paramount importance, I have to agree. Personally, I wouldn't care much if I get a plot showing 2.9 days instead of two. Hey, that's almost "buy one, get one free"! ;) With my current /DG mod, you could easily narrow the plot down to just two or one day, after having transmitted a bit more data than was necessary.
I also like SQLite. I'm just not sure I would want to include a library this big into a micro controller project where (if?) the main application is not a database (server).

fredlcore commented on June 9, 2024

If you have proof that a binary or any other compressed form of data storage will significantly reduce server-side timeload, I'm more than happy to look into it. Personally, I doubt it.
As I said, yes, if other options prove to be less viable, going with a rolling buffer and fixed-length entries will be the way to go, even if time frame selection will be less accurate.
To finally get this "main application" thing out of the discussion: Data logging has been a part of BSB-LAN since almost exactly six years ago. Displaying log files as a graph came a few months later. Given that this project was started eight years ago, I'd say yes, that is a core part of BSB-LAN.
But apart from "we've always done it that way": For people who use home automation systems, neither logging nor graph display is necessary, so if you argue that this is not a core feature, then both SD-card logging and graph display should be removed. However, many users are ordinary guys who may not even have a home automation system. Especially now in this energy crisis, they just want to see if their heating system is properly configured (and boy, you won't believe how many of them aren't). So what they need is an easy way to monitor their heating system. Here, logging on SD-card is a must. And unless you want to download the CSV and force Excel each time to generate some kind of understandable graph for you, the graph functionality adds significant value for this type of user. And if it stays, it should be done "right", as far as circumstances allow. Not that it doesn't work as it is, but this way it is quite a pain to work with.

DE-cr commented on June 9, 2024

If SD card reading is the bottleneck:

364593010;01.05.2022 00:00:15;8314;Kesselrücklauftemperatur Ist;66.7;°C is 73 bytes, plus line ending.
;Kesselrücklauftemperatur Ist and ;°C (34 bytes) would not need to be stored; they could be added on-the-fly in delivering /D.
Both uptime and date+time could be stored in four bytes each (ms, Unix epoch). In the plaintext example above, this information takes up 29 bytes.
The parameter number could be stored in 2+1 bytes (xxxxx.y) instead of five here (including the ';').
That's 57 bytes saved. Significant? Depends on your expectations. I wouldn't mind a 78% tax cut, for example. ;) Worth the effort? You decide. The current bsb-lan logging serves my needs all right, as it is.
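
The byte counts above can be checked with a short Python sketch. The record layout (little-endian, value kept as text with its `;` separator) and the interpretation of the logged time as UTC are assumptions made only for this example, not a proposed BSB-LAN format.

```python
import struct
from datetime import datetime, timezone

line = "364593010;01.05.2022 00:00:15;8314;Kesselrücklauftemperatur Ist;66.7;°C"
uptime_ms, stamp, param, description, value, unit = line.split(";")

# Treating the logged time as UTC is an assumption made only for this example.
epoch = int(datetime.strptime(stamp, "%d.%m.%Y %H:%M:%S")
            .replace(tzinfo=timezone.utc).timestamp())

# uint32 uptime + uint32 Unix time + parameter number as 2+1 bytes (xxxxx.y)
prefix = struct.pack("<IIHB", int(uptime_ms), epoch, int(param), 0)
# The value stays plain text with its separator, as in the estimate above.
record = prefix + (";" + value).encode("utf-8")

plain_len = len(line.encode("utf-8"))  # UTF-8: 'ü' and '°' take two bytes each
saved = plain_len - len(record)
print(plain_len, len(record), saved, f"{100 * saved / plain_len:.0f}%")
# prints: 73 16 57 78%
```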

...and do count me in on users of /DG. :)

CZvacko commented on June 9, 2024

"Data logging has been a part of BSB-LAN since almost exactly six years ago."

In my case the story is as follows, after completing the construction of the house and setting up the heating, I noticed incorrect behavior (cycling) during the spring-summer or summer-autumn transition. This problem led me to find some way to debug it, then I found out there is some BSB-LAN, so I ordered it. Datalogging is an essential feature, without it I can't tell the service technician exactly what is happening and when. Without logging, it's just guesswork that leads to a trial-and-error setup procedure (even for an experienced technician). And the reason why a longer data log is needed is because the technician will usually come to you maybe two weeks after the call and want to see some history (compare multiple days). Now that my cycling issue has been resolved, there is just some "room for improvement" in efficiency. In the long term I plan to set up a more robust solution like Grafana, but that's the distant future (I have other priorities right now).

As for the log format, I've never used uptime, not sure of the purpose. I also sometimes export data and create some chart in Excel. Then the date+time, parameter number and value should stay AS IS. The description parameter can be removed and added later with some VLOOKUP.

fredlcore commented on June 9, 2024

@DE-cr: Again: Please provide proof that such a compression actually saves server-side timeload. Yes, some information could be generated on the fly, but it would require thousands of lookups in cmdtbl which in itself is not the most ideal form of storage for lookups, but it is what we have. Caching may not be possible because we don't know the exact number of logged parameters, plus, log parameters may change in one and the same file.
Assuming on average a 50% shrink is possible and my pessimistic assumptions are not true, that's still close to a minute for the download. Better, but still not really satisfactory.

The main advantage I see in this is that it would also run on an Arduino Due where I'm not sure the SQLite solution would work. So implementing this approach would not be wasted time because we would have to support two separate ways of storing data anyways, depending on the platform. So if you or anyone else would like to come up with something (binary storage, rolling buffer), I would be interested to see how it performs, and if it is faster by any means, switch to that for the time being.

DE-cr commented on June 9, 2024

If so desired, I'd be willing to try my hands at the /DE approach I've drafted above, as I think this would be easy and sufficient to serve the most common /DG use cases well.

If someone wants to look at graphs of huge datasets, I'd recommend patience, or systems better suited to that task than a micro controller serving data over wifi from an SD card. :)

DE-cr commented on June 9, 2024

My implementation of /Dn sending only the most recent n KB (minus the most likely incomplete line at that cut-over point, plus the datalog header) adds 244 bytes to the bsb-lan binary.
This should be helpful for both /D and /DG when dealing with large datalogs, allowing to cut down on transmission times.

In case we can agree on a fixed value for n to be used in /DG (how about 999, which should translate to about 15000 data points?), all that's left to do is add that n in the loading of /D in the /DG implementation (adding another 3 bytes to the bsb-lan binary), and to decide if we want to couple this with the client side filtering in my /DG modification described above.

Should a variable value of n be required for /DG, that could be handled in /DG's javascript code, introducing another GUI control there to set n (which, of course, would further increase the size of the bsb-lan binary).

Please note that my testing of this latest modification has been limited: I don't have an SD card in my bsb-lan system, and my current datalog.txt is less than 10k in size. However, I don't see how this should affect the test's significance.

fredlcore commented on June 9, 2024

Thanks, but does it make sense to implement this before potentially switching to a binary storage format etc.? I often hate it if I have to touch the same thing twice in a short time only to revert part of the changes I've made shortly before.
As for the value used for DG, I would calculate it dynamically to match the last three days. I.e. (((3*24*60*60)/log_interval)*log_parameters)*60 (bytes?)
Log parameters and interval can be obtained via /JI for example.
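
A sketch of that estimate in Python, assuming the leading factor is meant as the number of seconds in three days (3*24*60*60), that `log_interval` is given in seconds, and that 60 bytes is a guessed average line length. `tail_bytes_for_days` is a hypothetical name.

```python
def tail_bytes_for_days(days, log_interval_s, log_parameters, bytes_per_line=60):
    """Estimate how many bytes of datalog.txt cover the given number of days."""
    log_runs = (days * 24 * 60 * 60) // log_interval_s  # logging passes in the period
    return log_runs * log_parameters * bytes_per_line

# Example: 7 parameters logged once a minute, last three days.
estimate = tail_bytes_for_days(3, 60, 7)
print(estimate, "bytes, i.e. about", -(-estimate // 1024), "KB")
```

The result is only a ballpark figure: it assumes that the interval and the set of logged parameters did not change and that logging was not interrupted during those days.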

fredlcore commented on June 9, 2024

I'm just thinking about the compression of the fields: I think we can do away with the milliseconds. While this is a nice way of finding out when the microcontroller had rebooted (due to a crash or a blackout maybe), it is not entirely necessary.
With UNIX time we have the problem that it is still 32 bits on the ESP32, at least at the moment. Making it workable until the year 9999 would cost us 38 bits for YMD-HMS. 15 bits for the parameter number, so we end up with 53 bits, except for the value and a "key" field which we need in order to figure out which is the newest/oldest data row as reference point (we cannot use the date/time field because not all heaters provide usable time, so these would start at Epoch after each reboot). The key would probably be 32 bit, so that's 85 bits in total for everything else except the value. For fixed length rows, the value field would have to be set at the maximum of 32 bytes which is the maximum length of a telegram payload. Not sure if reducing this would be feasible because especially novice users who are logging for example the heater's status won't be happy if only the numerical status is logged and not the clear text (which of course won't make sense to display as a chart, but we have had this dual use before). Of course the option values could be converted to readable strings on the fly as well, but then I'm even more sceptical if this will save timeload on the server. But even if we do so, we would have to set aside a 'safe' number of bytes to display floating point numbers up to a certain degree. Maybe 10 bytes could be considered safe if option values will be decoded on the fly.

Another thing to consider is to keep track of the most recent data row in the rolling buffer. While a variable can keep track of this during runtime, a fast method to find this row during startup is necessary that doesn't block the startup process for too long, maybe with a binary tree search using above-mentioned key field.
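
One way to find the newest row quickly at startup is a plain binary search over the key field, sketched here in Python. This swaps in an ordinary binary search for the rotation point rather than a tree structure, and it assumes that keys strictly increase in write order, that never-written slots read as a key smaller than any real key, and that key wrap-around is ignored. `find_newest_slot` and `read_key` are hypothetical names.

```python
def find_newest_slot(read_key, n_slots):
    """Return the slot holding the largest key in a ring buffer of n_slots rows.

    read_key(slot) returns the key stored in that slot; it is called
    only O(log n_slots) times, so few SD card reads are needed.
    """
    first = read_key(0)
    lo, hi = 0, n_slots - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if read_key(mid) >= first:
            lo = mid       # mid belongs to the newer run that starts at slot 0
        else:
            hi = mid - 1   # mid holds an older row (or was never written)
    return lo

# Usage: slots 0..2 were overwritten after roll-over, slots 3..6 are older.
keys = [7, 8, 9, 3, 4, 5, 6]
print(find_newest_slot(keys.__getitem__, len(keys)))
```

The next record would then be written to the slot after the one returned, wrapping around at the end of the buffer.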

from bsb-lan.

DE-cr commented on June 9, 2024

As for the value used for DG, I would calculate it dynamically to match the last three days. I.e. (((3*24*60*60)/log_interval)*log_parameters)*60 (bytes?)
Log parameters and interval can be obtained via /JI for example.
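As a rough illustration of that estimate (a sketch only; the function name and the 60 bytes per line default are assumptions, and in practice the interval and parameter count would come from /JI):

```python
def estimated_tail_bytes(days, log_interval_s, log_parameters, bytes_per_line=60):
    """Estimate how many bytes at the end of datalog.txt cover the last `days` days."""
    log_events = (days * 24 * 60 * 60) // log_interval_s
    return log_events * log_parameters * bytes_per_line

# Five parameters logged once a minute, last three days:
print(estimated_tail_bytes(3, 60, 5))         # 1296000

# Two parameters logged once a day, last three days:
print(estimated_tail_bytes(3, 24 * 60 * 60, 2))  # 360
```

The second call shows why the estimate depends so strongly on the log settings: with one log event per day, three days are only a few hundred bytes.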

Current log parameters are available via /JI, but they could have been changed during the previous days, plotting could have been stopped in between, etc - and 60 bytes may or may not be the average line length. And with my current logging setup (two values, once a day), limiting /DG to three days' worth doesn't make sense.

I'm in favor of just using a reasonable KB value that won't bog down the system too much. Hopefully that covers more than enough data, so that the user can just narrow it down to the last three (e.g.) days by using my current /DG javascript filtering. ...and if we let the user change that KB value, there will be enough flexibility to cover most needs.

And after some more thinking, I'm against changing the datalog.txt format to e.g. binary:

about 50% savings is less than what you need / expect ("order of magnitude")
using a different format in /D transfers would break clients (including /DG; do you know how many others exist?)
I don't like the cost/benefit ratio
it will not stop users from transferring SD card datalog.txt files measuring 2 GB ;)

As for the uptime [ms] value: /DG uses it to avoid gaps in the plots when values from one data set have differing timestamps, which is quite common at the one-second resolution used. It is much easier to check if the uptime ms differs by more than 999 from that of the previous value than to figure out that 31.12.2022 23:59:59 and 01.01.2023 00:00:00 are almost the same.
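The check described here could look roughly like the following. This is a Python sketch of the idea, not the actual /DG JavaScript; the row layout and names are assumptions.

```python
def group_log_events(rows, max_gap_ms=999):
    """Group (uptime_ms, value) rows that belong to the same log event.

    A new group starts whenever the uptime differs by more than max_gap_ms
    from that of the previous row (this also covers a reboot, where the
    uptime counter starts over).
    """
    groups = []
    prev_ms = None
    for uptime_ms, value in rows:
        if prev_ms is None or abs(uptime_ms - prev_ms) > max_gap_ms:
            groups.append([])
        groups[-1].append((uptime_ms, value))
        prev_ms = uptime_ms
    return groups

rows = [(1000, 21.5), (1400, 48.0), (61000, 21.6), (61300, 47.5)]
print(len(group_log_events(rows)))  # 2
```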

from bsb-lan.

fredlcore commented on June 9, 2024

Well, the /JI information is just a ballpark which will fit most of the time, but as you said before, if you don't exactly get what you expect, it won't hurt that much. However, in most cases, it will be much more accurate than just using a random amount of kB.

I don't understand why you're now against switching to a binary format when you proposed it before and actually got me mostly convinced to do that instead of going down the SQLite way:

For one, it is the only way we can reasonably implement a rolling buffer. How else will you be able to quickly jump back let's say 1000 rows? And that is btw also why we (have to) know that each row will be 60 bytes (or whatever we choose, that's why I put the question mark). Letting ordinary users determine an abstract amount of kB is counter-intuitive and will just increase support work for us.
If three days are not suitable for you, you will be able to change it in BSB_LAN_custom_defs.h
/D transfer will always be CSV separated in order not to break compatibility (with the exception of probably removing milliseconds and replacing it with a row ID). Transformation will be done on-the-fly as you yourself suggested above.
I thought I had learned that 78% (as mentioned above by you) and 50% are in the same order of magnitude ;)? In any case, it would be a significant increase that does not come at any cost except switching to a new data storage model. So why don't you like the cost/benefit ratio?
It will definitely stop users from transferring SD card datalog.txt files measuring 2 GB because by default, only X number of rows will be read. If they choose to let /DG display the data of the last two years, it's their choice.

Does /DG actually use the ms value to sort plot data? I thought that it uses the full date. And then it doesn't really matter if the points are not actually second-aligned as you won't notice it unless you hover over the data point and compare the times. The scales won't reflect such differences because they are in larger units IIRC.

from bsb-lan.

DE-cr commented on June 9, 2024

Binary format in file storage only, with the client interface /D untouched? Sure, if you consider the gain worth the effort.
I don't see it directly addressing this feature request's original issue, though, which is why I've now changed my focus.

/DG currently uses the ms values to check temporal coherence of adjacent data points, to avoid gaps in the plot lines. => #459 (comment)

from bsb-lan.

fredlcore commented on June 9, 2024

As for the gain worth the effort: You said you would take a "78% tax cut". And a 50% cut is suddenly not worth the effort? Where do you draw the line and why?

from bsb-lan.

DE-cr commented on June 9, 2024

Letting ordinary users determine an abstract amount of kB is counter-intuitive and will just increase support work for us.

I'd expect even more confusion when bsb-lan promises to deliver three days' worth of data, but delivers either more or less than that, due to changes in the log settings during the last three days. (This of course could easily be fixed by SELECTing by date.)

For me, it's usually the gain/effort ratio that guides me, not gain alone, but as you've said: YOU decide what is (not) done.

from bsb-lan.

fredlcore commented on June 9, 2024

Again, above you said that it doesn't matter if you get 2.9 days instead of 2 days - and now this is suddenly creating too much confusion? How would users be able to make some kind of distinction based on an abstract amount of kilobyte?
Anyways, I think all relevant arguments have been exchanged, I'll see what and when I can implement.

EDIT: Plus, the JavaScript will be able to find out if the requested date range is covered by the calculated amount. If it is not enough, it can load more. This seems to be far more efficient than always loading at least X kB and then figure out that it is way too much (if someone just logs two parameters per day as you do).

from bsb-lan.

DE-cr commented on June 9, 2024

If the promise is n days, I find delivering more or less confusing.
If the promise is n KB of csv data, I'll gladly accept a few bytes less.
How to gauge how many KB are needed to get n days? Try and adapt. I would expect no more than one adapt cycle in most cases, and at least with frequent use, most people should be able to memorize the value that suits their needs best.

From decades of dealing with user requirements, I've learned that clients often change their mind, or realize that they guessed wrong, e.g. that they need seven days of data in /DG instead of three. My proposal is to (per default) deliver as much as the system can handle without problems, and let the user filter out unwanted data (which frequently proves to be more interesting than the client's initial order), or ask for more, if so desired.

As for my currently active daily-averages-logging use case: With that I always want to see as much data as available.

from bsb-lan.

fredlcore commented on June 9, 2024

As I described above, the amount of days isn't even fixed. It's the default setting (which users can adjust) and if the calculated fetch gives less than expected data, a subsequent fetch can be done in JavaScript, so the user won't even notice it. This will obviously produce less load on the server than downloading a fixed amount of kB each time no matter how much or little data is actually necessary.

from bsb-lan.

DE-cr commented on June 9, 2024

My solution approach is in https://github.com/DE-cr/BSB-LAN (which means it is now also part of #499):

Please note that I've decided to also adapt the highlighted link on the fly, to reflect the data being displayed here, and more importantly to stop the user from accidentally calling /D on a huge datalog.txt file.

Changing the datalog.txt format to binary encoding, e.g., would nicely complement this in (hopefully) reducing SD card read times.

from bsb-lan.

fredlcore commented on June 9, 2024

As I said, working with kB is not an option for me. It would really save both our time (especially yours) if you ask me first before making changes that you expect to be included in the master repo.
Furthermore, please make separate PRs for separate functions. I'm most likely going to reject #499 as it is now because I'm most likely not going to accept the original SD card aspect of it.

from bsb-lan.

DE-cr commented on June 9, 2024

The harder part was the /DG modification.

Once my easy implementation of /Dn has been changed from limiting KB to limiting days (?), adapting /DG is trivial. In the meantime, my implementation might prove useful to people such as @CZvacko - or maybe not.

I have yet to figure out how to separate code changes into different PRs, sorry.

Btw, I'd have no problem removing "the original SD card aspect" from my PR/repo, as I always completely disable using the SD card to calculate 24h averages in my installation, anyway. For me, the gain of not losing averages in case of device resets is not worth the massively increased spiff wear.

from bsb-lan.

DE-cr commented on June 9, 2024

Just fyi, @fredlcore: To serve the use case of having /Dn deliver the most recent n days worth of data: In case days here means calendar dates and not 24h blocks, I'd approach this by writing an index file alongside datalog.txt, with fixed length entries marking at what position in datalog.txt a new calendar day starts.
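A minimal sketch of such an index file, written in Python with in-memory files instead of the real firmware code. The line layout (semicolon-separated, date and time in the second field), the sample values, and all names are assumptions for illustration.

```python
import io
import struct

ENTRY = struct.Struct("<I")  # one fixed-length index entry: byte offset into the log

def append_line(log, index, line, last_date):
    """Append one CSV line; add an index entry when a new calendar day starts."""
    date = line.split(";")[1].split(" ")[0]  # assumed "DD.MM.YYYY HH:MM:SS" in field 2
    offset = log.seek(0, io.SEEK_END)
    if date != last_date:
        index.seek(0, io.SEEK_END)
        index.write(ENTRY.pack(offset))
    log.write(line.encode() + b"\n")
    return date

def read_last_days(log, index, n):
    """Return the log data for the most recent n calendar days."""
    entries = index.seek(0, io.SEEK_END) // ENTRY.size
    if entries == 0 or n <= 0:
        return b""
    index.seek(max(entries - n, 0) * ENTRY.size)
    (offset,) = ENTRY.unpack(index.read(ENTRY.size))
    log.seek(offset)
    return log.read()

log, index, last = io.BytesIO(), io.BytesIO(), None
for line in ("1000;08.06.2024 23:59:30;8700;Outside temp;14.5;",
             "31000;09.06.2024 00:00:00;8700;Outside temp;14.4;",
             "61000;09.06.2024 00:00:30;8700;Outside temp;14.4;"):
    last = append_line(log, index, line, last)

print(read_last_days(log, index, 1).decode())  # only the two lines dated 09.06.2024
```

Because every index entry has the same length, finding the start of "n days ago" is a single seek in the index file followed by a single seek in the log.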

from bsb-lan.

DE-cr commented on June 9, 2024

With UNIX time we have the problem that it is still 32 bits on the ESP32, at least at the moment. Making it workable until the year 9999

...if you expect bsb-lan to live beyond 2038-01-19T03:14:08 UTC. Doing an unsigned variant of the unix epoch would add 68 years, which should be plenty. Using 2000 (or even 2023) as a base point instead of 1970 would add even more leeway.
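For reference, the limits mentioned here can be computed directly (a small Python check; the variable names are only illustrative):

```python
from datetime import datetime, timedelta, timezone

EPOCH_1970 = datetime(1970, 1, 1, tzinfo=timezone.utc)
EPOCH_2000 = datetime(2000, 1, 1, tzinfo=timezone.utc)

signed_overflow = EPOCH_1970 + timedelta(seconds=2**31)    # first value a signed 32-bit counter cannot hold
unsigned_overflow = EPOCH_1970 + timedelta(seconds=2**32)  # same for an unsigned 32-bit counter
unsigned_from_2000 = EPOCH_2000 + timedelta(seconds=2**32) # unsigned counter with 2000 as base point

print(signed_overflow.isoformat())    # 2038-01-19T03:14:08+00:00
print(unsigned_overflow.isoformat())  # 2106-02-07T06:28:16+00:00
print(unsigned_from_2000.year)        # 2136
```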

a 'safe' number of bytes to display floating point numbers up to a certain degree. Maybe 10 bytes could be considered safe if option values will be decoded on the fly.

What kind of floats are to be expected? I can only remember seeing values with (one or?) two digits after the decimal point. If that's all there is, one could use ints, shifted two decimal places (*=100).

Another thing to consider is to keep track of the most recent data row in the rolling buffer. While a variable can keep track of this during runtime, a fast method to find this row during startup is necessary that doesn't block the startup process for too long, maybe with a binary tree search using above-mentioned key field.

...or log that index with the data, either in a separate file or even in datalog.bin e.g., as the very first entry that gets overwritten with each logging?
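A sketch of how that could work with fixed-length rows: the first few bytes of the file hold the slot number of the most recent row and are overwritten with each logging. This is Python with an in-memory file; the header size, the 17-byte row length, and the function names are assumptions.

```python
import io
import struct

HEADER = struct.Struct("<I")  # slot number of the most recently written row
ROW_SIZE = 17                 # assumed fixed row length in bytes

def write_row(f, row, capacity):
    """Write one row into the next slot of the rolling buffer and update the header."""
    if len(row) != ROW_SIZE:
        raise ValueError("row must be exactly ROW_SIZE bytes")
    f.seek(0)
    raw = f.read(HEADER.size)
    last = HEADER.unpack(raw)[0] if len(raw) == HEADER.size else capacity - 1
    slot = (last + 1) % capacity
    f.seek(HEADER.size + slot * ROW_SIZE)
    f.write(row)
    f.seek(0)
    f.write(HEADER.pack(slot))
    return slot

def read_newest(f):
    """Find the newest row with one header read and one seek, no searching."""
    f.seek(0)
    (slot,) = HEADER.unpack(f.read(HEADER.size))
    f.seek(HEADER.size + slot * ROW_SIZE)
    return f.read(ROW_SIZE)

datalog = io.BytesIO()
slots = [write_row(datalog, bytes([i]) * ROW_SIZE, capacity=3) for i in range(1, 5)]
print(slots)                    # [0, 1, 2, 0]
print(read_newest(datalog)[0])  # 4
```

On startup, reading the header replaces any search for the newest row. The cost is one extra small write per log event.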

from bsb-lan.

fredlcore commented on June 9, 2024

Logging the index in a (same) file is a good idea, thanks!

from bsb-lan.

DE-cr commented on June 9, 2024

a "key" field which we need in order to figure out which is the newest/oldest data row as reference point

Would that still be necessary when saving the index to the most recent data row?

As for the coherence of data points, this could also be done using the unique row ID if the row ID changes only when a new log interval is logged.

So far, /DG has been using uptime/ms from the datalog entries to figure out which data points belong to the same log event. In case something similar to unix time should be stored as a timestamp in a future datalog.bin, an integer representation of that value (number of seconds since begin of the epoch) could possibly be transmitted instead of the current ms field, to guide /DG in its decisions. To other users of /D, this should look similar enough to the current state (same number of fields in each text line) to not cause any incompatibilities - unless, of course, someone has been using the current ms field to really look at the uptime.

from bsb-lan.

DE-cr commented on June 9, 2024

Just fyi, @fredlcore: To serve the use case of having /Dn deliver the most recent n days worth of data: In case days here means calendar dates and not 24h blocks, I'd approach this by writing an index file alongside datalog.txt, with fixed length entries marking at what position in datalog.txt a new calendar day starts.

In case you're interested: I've created a version that does just that - because I had the time for another programming exercise.

Still no binary datalog.bin (shouldn't make a difference to /Dn) or rolling datalog (would need adapting), though, and only limited testing so far (I have no Arduino, e.g.).

from bsb-lan.

DE-cr commented on June 9, 2024

I've uploaded my new version to https://github.com/DE-cr/BSB-LAN

For the standard use case of plotting the most recent days' worth of log data, it improves the speed in working with those plots and reduces the load on the bsb-lan unit - quite significantly for huge data logs.

This improvement is achieved by both offering /Dn as a new url command, with n now denoting the number of most recent calendar days required, and interactive filters for the plots themselves.

Functional testing has been done on my esp32 system. For arduino, I can only confirm that the changed code compiles.

To address other use cases, further improvements could be added, e.g.:

reduction of time required to read long data logs from an SD card, which might be achievable by changing the file format from "csv" to binary e.g. - as discussed above
providing a new user interface to let the user pick certain dates from the data log for display (/D) before reading the data from file and transferring it to a client via http

from bsb-lan.

DE-cr commented on June 9, 2024

@CZvacko you might want to take a look at #542, which imo should address this feature request here nicely. Please note, though, that the date selection (and with that, looking at the data in /DG) will only be possible for data logged after applying my PR (however, a data log started before will still be accessible in full via /D).

@fredlcore the reasons, why I haven't tackled the conversion to a binary format for the log file (yet?) are:

For common use cases (looking at a few days' data only), I don't really see a need for a (possible) speed increase in reading data from the SD card. (Is SD card reading on bsb-lan hardware really the limiting factor here, not (W)LAN transmission (or /DG rendering!)?)
Even with reducing the transfer time by a factor of 3-4, using /D on a data log measuring several MB or even GB would still be painful. For such a use case, my personal recommendation would be to either periodically transfer the most recent days' worth only (/Dn in my PR) and combine those data logs on the computer used to evaluate them, or transfer the SD card to a computer which can read it much faster.
With a binary data log file, interpreting the file from the SD card on a system other than bsb-lan would be much more difficult (probably impossible for most users?).
You've said you want to try your hands at implementing a binary datalog file format yourself. ;)
As for a rolling data log: The only use case I see for that is with limited file storage space, e.g. on naked ESP32 systems w/o an SD card. However, even there some users will prefer to have the logging stopped when storage space runs out, instead of overwriting older data. (What other use cases do you have in mind?)

I've reasonably tested my PR, but only on the one system I have available: a naked ESP32. Therefore I would appreciate others testing its functionality on different bsb-lan hardware, most notably an Arduino and/or using an SD card.

from bsb-lan.

fredlcore commented on June 9, 2024

Yes, the SD card is the limiting factor here. Transferring data via /D is around 150kb/s, that's just the plain data file read from SD and sent out via LAN without any browser rendering. That's way below the data rates that are usually possible with LAN transfers and an ESP32. Reducing this time by factor of 3-4 would already make quite a difference.
A common use case is to activate logging and not worry about it until you notice that something is wrong. Then you want to go back some days or weeks to identify since when the problem occurs. If in the meantime logging has stopped because the SD card is full, then you are out of luck. That's why a rolling data approach is necessary where the user defines how many days/months he wants to be able to go back to.
Users can still read the SD card data. I'll write a small Perl script to do the decoding. But there is hardly any scenario why a user wouldn't just use /D and simply download the CSV data without the nuisance of going to the BSB-LAN installation (which might not even be where you are), remove the SD card, read the data on the PC and reinsert it back and in the meantime lose the data that would have been recorded inbetween.
Yes, I said I'll take care of the binary rolling data when I have the time for it. But I won't stop you if you want to do it. If you want to do it, we should discuss the details first.

from bsb-lan.

DE-cr commented on June 9, 2024

Thanks for the clarification/confirmation, @fredlcore!

Seems like I have currently more time than you to tackle this, so let's discuss the details:

There's already some ideas regarding the binary file format in this discussion here, so I'd start with that. When I see room for further improvements, I'll discuss it here (?) with you.

In general, I'm in favor of delivering added value to the customers asap, therefore I suggest to implement and release this in increments:

a..b date range selection in /DG (as already implemented in #542), to address the main reason for this feature request here
binary datalog file format, to (hopefully) speed up content delivery (and further help this feature request here)
Perl script to let the users decode datalog.bin files off-line.
rolling datalog, to serve the "keep datalog running forever" use case you've described

I would very much prefer to keep the "rolling" aspect for the end (and in a separate PR), for the following reasons:

Rolling buffers are reasonably easy to implement with a fixed buffer size. A flexible buffer size ("SD card is full") is harder, and also requires more thought regarding the implications (at least that's what I expect; I haven't given it much thought, yet). Artificially limiting the buffer to way less than (a single file on) the SD card could hold is something I'd personally try to avoid, to provide the users with as much data as possible, in case they need/want it.
I'd consider this a new feature in bsb-lan, anyway, as it currently doesn't address this use case at all.

What do you think?

from bsb-lan.

fredlcore commented on June 9, 2024

Yes, we can discuss it here because the mentioned requirements are a prerequisite to implement long-term data logging as mentioned in the subject of this issue.
I'm not in favor of releasing new functionality in stages because it means adjusting the manual in several steps and in this case the change from a non-rolling to a rolling log also means a change in functionality that may confuse users.
There is no need for a flexible buffer. The user will be able to state the amount of days he wants to keep the logs in the settings. From that, we can calculate the maximum number of entries before the buffer rolls over.
I would set the maximum file size at two months worth of data polling 10 parameters every 30 seconds. That's the maximum the bus can more or less tolerate while still leaving room for intra-heater-communication.
In the probably rare case that people want more than that, they'd have to download the file every two months.
As mentioned above, I would make the date workable until the year 9999 which would cost us 38 bits for YMD-HMS. 15 bits for the parameter number, so we end up with 53 bits plus the value and a "key" field to identify the most recent entry. But as you mentioned above, we could just store that value in the first few bytes of the file and update it with every new log entry, so we won't need a "key" field because we can calculate the byte position due to the same length of every row.
The main issue remains the value field. Most, but not all use-cases will be centered around drawing a graph. Clear-text options (for example as with burner status) will have to decode the option value to clear text before sending it to the browser/user. There are also values that are sent by the heater as strings instead of integers, such as parameter 3110 ("Abgegebene Wärme"). This can be quite a long number/string. So I guess it is easier to store the retrieved value "as is" instead of converting strings, floats etc. into some kind of common integer/binary format and maybe set aside 10 bytes in total for it. For clear text option values we would then have to decide whether we use a magic byte at the beginning of the value field that would indicate that only the option value is stored and the clear text will have to be looked up during output on-the-fly or if we cut the option value after 10 bytes. This I would decide based on performance impact and if this is negligible prefer the on-the-fly output with full text.
Unit and parameter description can be added to the output on the fly.

So it looks to me that we could do at best with 53 bits plus 10 bytes, i.e. 133 bits, i.e. 17 bytes in total per row. Sounds good to me :).

from bsb-lan.

DE-cr commented on June 9, 2024

I'm not in favor of releasing new functionality in stages because it means adjusting the manual in several steps

Changing the manual should be easier than changing the code. :)

in this case the change from a non-rolling to a rolling log also means a change in functionality that may confuse users.

Which could be taken as an argument against a rolling log. ;)

from bsb-lan.

fredlcore commented on June 9, 2024

No it doesn't. We have had several breaking changes, that's fine, but that's no reason to do it twice. It's really not such an important feature that users cannot wait a few weeks more.

from bsb-lan.

fredlcore commented on June 9, 2024

As for manual vs. code: I don't know if you have worked in first-level user support, but people don't tend to re-read the manual every time there is an update. They rather e-mail us directly. And that's something I want to prevent as much as possible.

from bsb-lan.

DE-cr commented on June 9, 2024

I would set the maximum file size at two months worth of data polling 10 parameters every 30 seconds.

For ESP32 systems w/o an SD card, that limit would have to be different.

from bsb-lan.

DE-cr commented on June 9, 2024

I would make the date workable until the year 9999 which would cost us 38 bits for YMD-HMS

Do Arduino and ESP32 have integer data types exceeding 32 bits, to calculate with ("long long" or something like that)?

from bsb-lan.

DE-cr commented on June 9, 2024

I would make the date workable until the year 9999 which would cost us 38 bits for YMD-HMS

Do Arduino and ESP32 have integer data types exceeding 32 bits, to calculate with ("long long" or something like that)?

In case bit fields are used anyway, I'd propose to directly encode Y/M/D/h/m/s into separate variables. This would result in a much easier, less error prone, and more run-time efficient encoding/decoding algorithm:

Y:14 + M:4 + D:5 + h:5 + m:6 + s:6 = 40 bits,
i.e. a mere 2 bits more (or Y:13, i.e. just one extra bit, should you consider 8191 enough for YearMax; however, I would prefer using an even five bytes for YMDhms).

...and with that, my question regarding extra long ints becomes irrelevant.
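To illustrate the proposed layout: in C the compiler would handle the bit fields, so the sketch below only makes the 40-bit layout explicit in Python. The field order within the five bytes and the function names are assumptions.

```python
# Field widths as proposed: Y:14 + M:4 + D:5 + h:5 + m:6 + s:6 = 40 bits
FIELDS = (("Y", 14), ("M", 4), ("D", 5), ("h", 5), ("m", 6), ("s", 6))

def pack_ymdhms(Y, M, D, h, m, s):
    """Pack the six fields into five bytes, first field in the highest bits."""
    word = 0
    for (name, width), value in zip(FIELDS, (Y, M, D, h, m, s)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit into {width} bits")
        word = (word << width) | value
    return word.to_bytes(5, "big")

def unpack_ymdhms(raw):
    """Reverse of pack_ymdhms: returns (Y, M, D, h, m, s)."""
    word = int.from_bytes(raw, "big")
    values = []
    for _name, width in reversed(FIELDS):
        values.append(word & ((1 << width) - 1))
        word >>= width
    return tuple(reversed(values))

raw = pack_ymdhms(9999, 12, 31, 23, 59, 59)
print(len(raw))            # 5
print(unpack_ymdhms(raw))  # (9999, 12, 31, 23, 59, 59)
```

No calendar arithmetic is involved in either direction, which is the main point of storing the fields separately.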

from bsb-lan.

CZvacko commented on June 9, 2024

@DE-cr

you might want to take a look

I just flashed your version.
I had a little trouble compiling it because your BSB_LAN_defs.h is missing "const char ENUM701[]", so I copied the related lines from Master...
For a short moment it even worked (was able to specify date ranges) with data recorded before applying your PR.
Then it started to go wrong, so I ran the D0 command, now I need to collect some data to try your solution.

from bsb-lan.

CZvacko commented on June 9, 2024

people don't tend to re-read the manual every time there is an update

The manual is 206 pages long, if there will be any way how to easily compare the two pdf files, then user can only read the difference. I do this with BSB_LAN_config.h.default to see the differences against my own BSB_LAN_config.h.

from bsb-lan.

DE-cr commented on June 9, 2024

@CZvacko thank you VERY much for doing some additional tests on my mod!
I'm sorry for the problems it caused you.
As my master branch currently lags a couple of commits behind the official version's (github won't let me merge without losing my changes?), only merging my PR would have avoided the compile issue.
As for the other problem you've encountered: You should have been able to see only log data from after flashing the modified software in /DG. (Older log entries would have been available via /D, though.) Maybe what you experienced was caused by browser caching(?).
By now, you should be able to see the new data in /DG, and after midnight (according to your boiler's clock), you should be able to try the new /DG functionality.
Again, thanks a lot for the testing, and sorry for the inconvenience!

p.s. Do you use averages calculation in bsb-lan? My version (master branch, not the PR) disables saving/restoring them across resets!

from bsb-lan.

1coderookie commented on June 9, 2024

people don't tend to re-read the manual every time there is an update

The manual is 206 pages long, if there will be any way how to easily compare the two pdf files, then user can only read the difference. I do this with BSB_LAN_config.h.default to see the differences against my own BSB_LAN_config.h.

This is one of the reasons people should use the online version of the manual instead of the PDF, which is just in the repo as a backup in case one doesn't have internet access and needs the manual.
https://1coderookie.github.io/BSB-LPB-LAN_EN/toc.html

from bsb-lan.

CZvacko commented on June 9, 2024

@DE-cr No big inconveniences were caused to me, I appreciate the efforts made for this task.
I don't use the calculation of averages because as I know, they are only calculated for the previous day, so it's not very useful. For me, it can only be useful if daily/monthly result can be saved to a sd card. Then I can make a comparison, for example last January with this January, what the weather was like and how it affected the gas consumption.
Or I can currently save the datalog to a file and do the average calculation in Excel.

from bsb-lan.

CZvacko commented on June 9, 2024

people should use the online version

But it's the same problem, only the document format is different, because the user still needs to re-read everything. Because he can't display the difference from the selected version. There should be some feature like reading commit details in git (should be more user friendly). But I realise that even the big reputable projects don't have something like this.

P.S. You may consider to follow ISO document template, where there is usually a "List of Changes" at the beginning of the document. If you write a list of changed chapters for each major version, the user can easily see what they need to pay attention to.

from bsb-lan.

fredlcore commented on June 9, 2024

In case bit fields are used anyway, I'd propose to directly encode Y/M/D/h/m/s into separate variables. This would result in a much easier, less error prone, and more run-time efficient encoding/decoding algorithm:
Y:14 + M:4 + D:5 + h:5 + m:6 + s:6 = 40 bits, i.e. a mere 2 bits more (or Y:13, i.e. just one extra bit, should you consider 8191 enough for YearMax; however, I would prefer using an even five bytes for YMDhms).

I'd say we use 7 bits for year and add 2000 to it when converting on the fly. This will give us an even six bytes together with the 15 bits for the parameter number which then can easily be dissected using ANDs on this "long long" number. Or you group them as Year(7)-Day(5)-hour(5)-Parameter(15) and as Month(4)-minute(6)-second(6) and get a 32bit and a 16 bit value, in case long long does not perform well.
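The second variant (a 32-bit and a 16-bit value) could be laid out like this. A Python sketch using shifts and masks; the exact bit order inside each group and the byte order are assumptions, and the sample parameter number is just an example.

```python
import struct

ROW_KEY = struct.Struct("<IH")  # one 32-bit and one 16-bit value, 6 bytes in total

def pack_key(year, month, day, hour, minute, second, parameter):
    """Year(7)-Day(5)-hour(5)-Parameter(15) in 32 bits, Month(4)-minute(6)-second(6) in 16 bits."""
    if not 2000 <= year <= 2127:
        raise ValueError("year must fit into 7 bits after subtracting 2000")
    hi = ((year - 2000) << 25) | (day << 20) | (hour << 15) | parameter
    lo = (month << 12) | (minute << 6) | second
    return ROW_KEY.pack(hi, lo)

def unpack_key(raw):
    """Reverse of pack_key: returns (year, month, day, hour, minute, second, parameter)."""
    hi, lo = ROW_KEY.unpack(raw)
    return (2000 + (hi >> 25), lo >> 12, (hi >> 20) & 0x1F, (hi >> 15) & 0x1F,
            (lo >> 6) & 0x3F, lo & 0x3F, hi & 0x7FFF)

raw = pack_key(2024, 6, 9, 13, 37, 42, 8700)
print(len(raw))         # 6
print(unpack_key(raw))  # (2024, 6, 9, 13, 37, 42, 8700)
```

With 7 bits for the year and a base of 2000, the last representable year is 2127.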

from bsb-lan.

DE-cr commented on June 9, 2024

Off-topic here, but possibly helpful info to @CZvacko:

I don't use the calculation of averages because as I know, they are only calculated for the previous day, so it's not very useful. For me, it can only be useful if daily/monthly result can be saved to a sd card.

bsb-lan does floating/rolling averages for the most recent 24h (or less, when the calculations haven't been running that long, yet). You could for example set the log interval at 24x60x60 to log only once a day for your use case - which is what I did just recently.

from bsb-lan.

DE-cr commented on June 9, 2024

which then can easily be dissected using ANDs

With bit fields, there'd be no need to and/or/shift bits in the C code. The compiler will take care of that "behind the scenes".
As I said: easier algorithm, less error prone.

from bsb-lan.

DE-cr commented on June 9, 2024

A few more questions:

Can we drop the ms field? If so, /DG will have small gaps in the plots. An easy way to fix that would be to use the timestamp from the first parameter of a parameter set / log event for all parameters of that log event, and not determine the exact date/time for each parameter in a set individually. Would that be okay?
I don't know yet how the exact decoding of values from the bsb works. Could the data received be directly used for the binary log? Would that still require ten (or more) bytes for the value field?
Would 15 bits for parameter numbers include the "fractional part" introduced in 2022, e.g. 4711.1?

from bsb-lan.

DE-cr commented on June 9, 2024

...and one more question: Do you want to stick to the DD.MM.YYYY format in /D or could that be changed to a more internationally accepted and computationally easier (string comparison to evaluate date ordering) YYYY-MM-DD?
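A short Python illustration of the string comparison point (the helper name is made up):

```python
def to_iso(date_ddmmyyyy):
    """Convert 'DD.MM.YYYY' to 'YYYY-MM-DD'."""
    dd, mm, yyyy = date_ddmmyyyy.split(".")
    return f"{yyyy}-{mm}-{dd}"

older, newer = "31.12.2022", "01.01.2023"

print(older < newer)                  # False: plain string order is wrong for DD.MM.YYYY
print(to_iso(older) < to_iso(newer))  # True: YYYY-MM-DD sorts chronologically as text
```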

from bsb-lan.

DE-cr commented on June 9, 2024

Yes, the SD card is the limiting factor here. Transferring data via /D is around 150kb/s, that's just the plain data file read from SD and sent out via LAN without any browser rendering. That's way below the data rates that are usually possible with LAN transfers and an ESP32. Reducing this time by factor of 3-4 would already make quite a difference.

Just fyi: My datalog.txt is currently at 846322 bytes in 11365 lines, and the transfer from the esp32's eeprom(?) via wifi takes about 5.5s. That's about 150 kB/s transfer speed, and an average line length of about 74 bytes, confirming the numbers discussed so far.
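The arithmetic behind those two numbers, as a quick Python check:

```python
size_bytes = 846322
lines = 11365
seconds = 5.5

throughput_kb_s = size_bytes / seconds / 1000
avg_line_bytes = size_bytes / lines

print(f"{throughput_kb_s:.0f} kB/s, {avg_line_bytes:.1f} bytes/line")  # 154 kB/s, 74.5 bytes/line
```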

from bsb-lan.

DE-cr commented on June 9, 2024

So I guess it is easier to store the retrieved value "as is" instead of converting strings, floats etc. into some kind of common integer/binary format and maybe set aside 10 bytes in total for it. For clear text option values we would then have to decide whether we use a magic byte at the beginning of the value field that would indicate that only the option value is stored and the clear text will have to be looked up during output on-the-fly or if we cut the option value after 10 bytes.

I guess that already answers my question 2 above, sorry.

Please let's still use the option value for clear text parameters in /D. Numbers can be plotted in /DG, to let the user more easily see when those parameters have changed. I've been doing this frequently, and using the url command /En to decipher the option values plotted.

from bsb-lan.

fredlcore commented on June 9, 2024

Bit fields are of course fine, my main point was to fit the necessary data into one byte less.
You are right, 15 bits won't cover the decimal point parameters. So we have to multiply by ten, i.e. three bits more if we want to cover up to 26200.X
Data payload in the BSB data telegrams varies each time and can be up to 32(?) bytes (for storing phone numbers as strings). I thought about that, too, because we have the raw data by the time we log, but I'm not sure if decoding it again when we send the data to the browser will have a performance impact. OTOH, we have to do lookups anyways for the data unit and parameter clear text, so this might be feasible. However, we still would have to set aside 6-7 bytes for string-encoded parameters such as 3100 mentioned above.
As for timestamp: We have the time encoded down to the second already, so there is no need for me to add another timestamp. If really necessary, we could also convert the stored time into epoch or whatever.
As for time format: Yes, because people might have to do changes on their side if we change the format. Plus, there are hardly any users outside continental Europe, so most of them are familiar with DD.MM.YYYY

Thanks for confirming the transfer speed. I just did a comparison with /JK=58 (which is "Diagnose Verbraucher" here), which is about 36 kB, and it also comes down to 184 kB/s. Since the complete transfer takes just 0.2 seconds, I'm not sure if the boilerplate of the HTTP header is the slow-down here, but we should definitely check first if the lookups that /JK does (and which would be similar to the ones we discuss here) are not slowing down the transfer in the same way that the SD card transfer does.

from bsb-lan.

DE-cr commented on June 9, 2024

I would set the maximum file size at two months worth of data polling 10 parameters every 30 seconds.

With 16 bytes per data log entry, that would make for 16 x 10 x 30.5 x 24 x 60 x 2 = 14 MB, which is MUCH less than what could fit into a single file (2 GB, equalling almost 24 years at these settings?) on SD cards that would be bigger still. Personally, I'd prefer to better utilize the storage capacity available.
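
The estimate can be re-checked with a few lines of Python. This is only a back-of-the-envelope sketch: the constant and function names are made up for illustration, and the 2 GB limit is taken as 2 x 10^9 bytes. Read literally, the factors in the product above (16 bytes, 10 parameters, 30.5 days, 24 x 60 minutes, 2 samples per minute) describe one month of logging at a 30 second interval, so two months would come to roughly 28 MB and a 2 GB file would last roughly 12 years. Either way, a single file holds many years of data.

```python
BYTES_PER_ENTRY = 16     # fixed record size proposed in the thread
PARAMETERS = 10          # parameters logged per polling cycle
POLL_INTERVAL_S = 30     # one polling cycle every 30 seconds
DAYS_PER_MONTH = 30.5    # average month length used in the thread


def bytes_per_month():
    """Log growth per month for the settings above."""
    cycles_per_day = 24 * 60 * 60 // POLL_INTERVAL_S  # 2880 cycles
    return BYTES_PER_ENTRY * PARAMETERS * cycles_per_day * DAYS_PER_MONTH


def years_until_full(file_limit_bytes):
    """How long a single file of the given size would last."""
    return file_limit_bytes / bytes_per_month() / 12


per_month = bytes_per_month()
print(f"one month:  {per_month / 1e6:.1f} MB")
print(f"two months: {2 * per_month / 1e6:.1f} MB")
print(f"2 GB lasts: {years_until_full(2e9):.1f} years")
```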

from bsb-lan.

fredlcore commented on June 9, 2024

But who needs 24 years of storage? And if people really want more, we can make it a #define and users can divert from the recommended preset.

from bsb-lan.

DE-cr commented on June 9, 2024

As for timestamp: We have the time encoded down to the second already, so there is no need for me to add another timestamp. If really necessary, we could also convert the stored time into epoch or whatever.

As I've said: conversions between calendar dates and epoch are not as straightforward as one would like. I'd try to avoid them, if possible.

As for time format: Yes, because people might have to do changes on their side if we change the format. Plus, there are hardly any users outside continental Europe, so most of them are familiar with DD.MM.YYYY

With the first part of this argument, we should also keep the ms field - and should also deliver the same value field entries as now.

from bsb-lan.

DE-cr commented on June 9, 2024

But who needs 24 years of storage?

Personally, as a user I'd prefer 24 years of storage to two months' worth in a rolling log.

...and as a programmer, I'd love to avoid the added complexity of a rolling buffer, and of having to search for data points in files (or keeping a rolling index as well). Especially since I don't see a need for a rolling buffer when you can store 24 years of data w/o one. (For people like myself, who use an esp32 w/o an SD card, I'd put into the manual: go get an SD card for use cases involving lots of data logging! :)

from bsb-lan.

DE-cr commented on June 9, 2024

Bitfields are of course fine, my main point was to fit the necessary data into one byte less.

Bit fields are always (?) stored in multiples of four bytes, as I've been able to confirm with my esp32.

Using seven bits for the year (minus 2000), a date+time bit field would measure 33 bits, leading to eight bytes for the bit field. Of course these eight bytes could then be used to also include 15+3 bits for the parameter number.
...and there'd be room for another couple of bits for the year, if required. I'd use them to avoid the +/- 2000 calculations.

As an alternative, a date+time bit field with only six bits for the year (minus 2000 or even 2023, meaning "only" years up to at most 2086 could be encoded) could be used. Then that bit field would fit into four bytes. The parameter could be stored as a short (parameter number) plus a byte (fractional part) next to the bit field. This version would then require seven bytes, i.e. one byte less.

These are the only two options that seem to make sense to me.
The first version would require slightly less code (writing/reading one variable instead of three, no +/- 2000 calculations for the year). It would be my preference.
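
To make the first option concrete, here is a minimal sketch of the 8-byte layout using manual shifts and masks in Python. It is not the C bit field itself: the field order, the little-endian byte order and all names (`FIELDS`, `pack_record`, `unpack_record`) are assumptions for illustration, and a compiler is free to lay out a real bit field differently. The widths follow the numbers above: 7 + 4 + 5 + 5 + 6 + 6 = 33 bits for date and time, plus 18 bits for the parameter number multiplied by ten.

```python
import struct

# Field widths in bits, most significant field first (layout is an assumption).
FIELDS = [("year", 7), ("month", 4), ("day", 5), ("hour", 5),
          ("minute", 6), ("second", 6), ("param_x10", 18)]


def pack_record(year, month, day, hour, minute, second, param):
    """Pack date, time and parameter number into one 8-byte word."""
    values = {"year": year - 2000, "month": month, "day": day, "hour": hour,
              "minute": minute, "second": second,
              "param_x10": round(param * 10)}
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit into {width} bits")
        word = (word << width) | value
    return struct.pack("<Q", word)  # 51 bits used, stored in 8 bytes


def unpack_record(blob):
    """Reverse of pack_record: peel the fields off again, last field first."""
    (word,) = struct.unpack("<Q", blob)
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return (values["year"] + 2000, values["month"], values["day"],
            values["hour"], values["minute"], values["second"],
            values["param_x10"] / 10)
```

For example, `unpack_record(pack_record(2023, 3, 7, 7, 52, 27, 8770))` gives back the same date, time and parameter number. The second option would work the same way with a 6-bit year in a 32-bit word plus a separate short and byte for the parameter.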

from bsb-lan.

DE-cr commented on June 9, 2024

If reducing the datalog file size as much as (reasonably) possible is the design goal, I would still expect considerable gains (on average) with using '\0' zero terminated value field entries instead of fixed length fields for them.

Pros:

substantial size reduction (on average)
could also encode longer strings - and keep the /D output as it currently is, i.e. no incompatibilities with any possibly existing post processing by the users
easier encoding/decoding (compared to using a magic number to alternate between different encoding schemes)

Cons:

much harder to find a particular data set in the log (especially w/o an extra byte (sequence?) to clearly mark record boundaries, as 0x00 might be part of the date/time/paramNo fields)
slightly harder to decode from the log file (read until '\0', instead of read n bytes)
rolling buffer (if used) slightly more complicated to implement

The first con could be addressed by keeping an index for important points in the log e.g. date changes, and not supporting directly addressing individual data points in between - which I've already implemented in my open PR.
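
A rough sketch of how zero terminated value fields and a per-day index could fit together is shown below. It is not the code from the PR. The 7-byte header (an opaque 32-bit packed date/time, the parameter number and one spare byte) and the names `HEADER`, `append_record` and `read_from` are placeholders invented for this example. The index only remembers the offset of the first record of each day, so reading always starts at a known record boundary and a 0x00 byte inside the header cannot be mistaken for a terminator.

```python
import io
import struct

# Hypothetical fixed header: packed date/time, parameter number, spare byte.
HEADER = struct.Struct("<IHB")


def append_record(log, index, day, packed_time, param, value):
    """Append one record and note the offset of each day's first record."""
    log.seek(0, io.SEEK_END)
    if not index or index[-1][0] != day:
        index.append((day, log.tell()))
    log.write(HEADER.pack(packed_time, param, 0))
    log.write(value.encode("ascii") + b"\0")


def read_from(log, offset):
    """Yield (packed_time, param, value) from a known record boundary on."""
    log.seek(offset)
    while True:
        header = log.read(HEADER.size)
        if len(header) < HEADER.size:
            return  # end of log
        packed_time, param, _ = HEADER.unpack(header)
        value = bytearray()
        while (byte := log.read(1)) not in (b"\0", b""):
            value += byte
        yield packed_time, param, value.decode("ascii")
```

With `log = io.BytesIO()` and `index = []`, appending a few records and then calling `read_from(log, index[-1][1])` returns only the records of the most recent day without scanning the earlier part of the file.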

from bsb-lan.

DE-cr commented on June 9, 2024

Some test results from my system (esp32 w/o SD card, wifi connection):

regular /D: reading datalog.txt, http transmission in 1024 bytes blocks -> about 6 s
reading datalog.txt, http transmission in 20 bytes blocks ("unbuffered") -> more than 100 s!
not reading datalog.txt, just transmitting an equivalent number of bytes in 1024 bytes blocks via http -> about 4 s
not reading datalog.txt, http transmission in 10 KB blocks -> about 5 s
not reading datalog.txt, http transmission in 2 KB blocks -> about 3 s
not reading datalog.txt, http transmission in 512 bytes blocks -> about 5 s
not reading datalog.txt, http transmission in 4 KB blocks -> about 2.5 s
not reading datalog.txt, http transmission in 8 KB blocks -> about 2.5 s

Seems as if datalog.txt reading on my system is not limiting /D delivery speed as much as the actual http transmission is, and that using 1 KB for buffered http transmission has been a reasonable choice. This also means to me that, for system setups like mine, conversion to a binary datalog file format may not be worth the effort. It also means that for systems like mine, increasing the http transmission buffer size to 4 KB might be worth considering.

Could somebody with a bsb-lan unit that uses an SD card for logging and is connected to the network by wire please do similar timings for their system?
P.s. I'd be interested in seeing those values for scenarios 1, 3 and 7 above.

P.s. datalog.txt size was about 900 KB.
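
For easier comparison, the timings of the runs without file reading can be turned into throughput figures. This is only a small sketch based on the rounded numbers above (about 900 KB of data, times rounded to half a second), so the results are rough, and the names `RUNS` and `throughput_kb_per_s` are made up for the example.

```python
DATALOG_KB = 900  # approximate datalog.txt size reported above

# (label, block size in bytes, measured seconds), http transmission only
RUNS = [
    ("512 B", 512, 5.0),
    ("1 KB", 1024, 4.0),
    ("2 KB", 2048, 3.0),
    ("4 KB", 4096, 2.5),
    ("8 KB", 8192, 2.5),
    ("10 KB", 10240, 5.0),
]


def throughput_kb_per_s(size_kb, seconds):
    """Average transfer rate for a transfer of size_kb taking the given time."""
    return size_kb / seconds


for label, _, seconds in RUNS:
    rate = throughput_kb_per_s(DATALOG_KB, seconds)
    print(f"{label:>6}: {rate:6.1f} KB/s")

# Smallest block size among the fastest runs.
best = min(RUNS, key=lambda run: run[2])
print("fastest:", best[0])
```

On these numbers the 4 KB and 8 KB buffers both land at roughly 360 KB/s, compared with about 225 KB/s for the 1 KB buffer in the same test and about 150 KB/s for the regular /D.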

from bsb-lan.

DE-cr commented on June 9, 2024

If the ms field in the data log entries can be dropped: That would be an easy >10% reduction for the datalog.txt (current plaintext format) file size and transmission time. If that's feasible, I'd be happy to provide a PR. :)

P.s. For me, the only value of the ms field would be to see device resets (and to help avoid gaps in /DG plots, but that could be fixed, as suggested above).

from bsb-lan.

DE-cr commented on June 9, 2024

Applied to my current datalog.txt (containing an overall ~13k lines for a set of four temperature parameters), the file size could be reduced as follows:

to ~24% with ten bytes fixed for the value encoding
to ~17% with zero terminated value encoding
("12.3\0" = five bytes instead of ten, or even four for our current single digit outside temperatures)
p.s. to ~88% by just dropping the ms field and leaving everything else as is

(http transmission size would of course remain at 100%)
p.s. No, it wouldn't: Since we've never seriously considered keeping the ms field with a binary format (or did we?), I had already dropped that field in my binary file calculations, and http transmission size would be reduced to ~88% in all of the cases mentioned above.

from bsb-lan.

fredlcore commented on June 9, 2024

Thanks for the transfer time tests. I don't know why I remember much faster transfer rates, but even with a larger buffer size, that's just about 360 kB/s, i.e. just about 3 MBit/s. So let's think first before we move forward. If (apparently) the SD card file reads are faster than the http transmission, then there is no point in reducing the file size on the SD card if we will be sending CSV-style data to the browser anyways. Then it's probably more convenient to keep the data on the SD card in a readable format.

So what could be done is to reduce the data transferred to the browser. As for zero-terminated strings, they wouldn't make a difference compared to the semicolon delimiter currently in use.

I don't understand how zero-terminated encoding would save 17% or dropping the ms field 88%. Can you elaborate further?

As far as I see it now, dropping the ms field and maybe encoding date/time a bit differently could save a few bytes, but nothing with a major impact :(.

from bsb-lan.

DE-cr commented on June 9, 2024

I don't understand how zero-terminated encoding would save 17% or dropping the ms field 88%. Can you elaborate further?

Sorry, I meant "to x%", i.e. leaving x% of the original size remaining, e.g.
35631879;07.03.2023 07:52:27;8770;Raumtemperatur-Istwert 2;20.2;°C\r\n = 69 bytes = 100 %
=>
07.03.2023 07:52:27;8770;Raumtemperatur-Istwert 2;20.2;°C\r\n = 60 bytes = 87 %
=>
20230307075227;8770;Raumtemperatur-Istwert 2;20.2;°C\r\n (still legible enough?) = 55 bytes = 80 %
=> (logfile only, not for client consumption)
BinaryDateTime:5 + BinaryParamNo:3 + "20.2\0" = 13 bytes = 19 %
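
The byte counts above can be reproduced with a short Python sketch. It assumes the log line is stored as UTF-8, where the degree sign takes two bytes, which is what the count of 69 bytes suggests. The helper name `size` and the variable names are made up for the example.

```python
def size(text):
    """Length in bytes when the line is stored as UTF-8."""
    return len(text.encode("utf-8"))


full = "35631879;07.03.2023 07:52:27;8770;Raumtemperatur-Istwert 2;20.2;°C\r\n"
no_ms = full.split(";", 1)[1]                          # drop the leading ms field
compact = "20230307075227;" + no_ms.split(";", 1)[1]   # date/time without separators
binary = 5 + 3 + len(b"20.2\0")                        # date/time + param no + value

for label, n in [("full", size(full)), ("no ms", size(no_ms)),
                 ("compact date", size(compact)), ("binary", binary)]:
    print(f"{label:>12}: {n:2d} bytes = {100 * n / size(full):.0f} %")
```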

As far as I see it now, dropping the ms field and maybe encoding date/time a bit differently could save a few bytes, but nothing with a major impact :(.

One could try to be clever and e.g. use the following in data transmitted to the client: send parameter name and unit only the first time for each parameter number, requiring the client to keep a parameter dictionary for the full text. It should be possible to expand that format via javascript code on the client side, transparent to the user, with reasonable effort - at least for /DG, which already uses javascript in html, instead of plain text, as /D does. However, would that really be worth the effort?
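
As an illustration of the dictionary idea, the sketch below sends the name and unit only the first time a parameter number appears and rebuilds the full lines on the receiving side. It is written in Python for brevity; in /DG the expanding part would have to be JavaScript. The function names `compress` and `expand`, and the choice of empty fields as the "look it up" marker, are assumptions made for this example. It also assumes lines without the ms field and without semicolons inside the parameter name.

```python
def compress(lines):
    """Server side: blank out name and unit after a parameter's first line."""
    seen = set()
    out = []
    for line in lines:
        timestamp, param, name, value, unit = line.split(";")
        if param in seen:
            out.append(f"{timestamp};{param};;{value};")
        else:
            seen.add(param)
            out.append(line)
    return out


def expand(lines):
    """Client side: restore name and unit from the parameters seen so far."""
    known = {}
    out = []
    for line in lines:
        timestamp, param, name, value, unit = line.split(";")
        if param in known:
            name, unit = known[param]
        else:
            known[param] = (name, unit)
        out.append(";".join((timestamp, param, name, value, unit)))
    return out
```

For the example line above, every repeated occurrence of parameter 8770 would shrink from "…;8770;Raumtemperatur-Istwert 2;20.2;°C" to "…;8770;;20.2;", and `expand(compress(lines))` returns the original lines.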

On the other hand: I do consider my open PR to have a major positive impact for many (or most?) /DG users. ;)

from bsb-lan.

fredlcore commented on June 9, 2024

Yes, I think under the current circumstances, your open PR might just be a good compromise. Let me sleep over it (a few days ;) )...

from bsb-lan.

DE-cr commented on June 9, 2024

Let me sleep over it (a few days ;) )...

Maybe you could leave a datalog running while you do that, with my PR code enabled? That way you could see how it feels (shouldn't be much of a difference until you pass midnight for the first time). And maybe you could even come up with a better (preferably non-verbal?) "busy" indicator.

from bsb-lan.

DE-cr commented on June 9, 2024

More timing tests from my system (esp32 w/o SD card, wifi connection, currently 964 KB datalog.txt):

/D -> 6.3 s (153 KB/s)
/D w/o sending the data over http (only the "table" heading is sent to the client) -> 2.9 s (332 KB/s)

Both versions use the standard 1 KB buffer for file reading and http transfer (where done).

from bsb-lan.

DE-cr commented on June 9, 2024

If my PR gets merged:

Should the changes be mentioned in the change log comment in BSB_LAN.ino? If so, should I include that in the PR, @fredlcore?
Shall I provide you a draft for the description of the changes in the manual (new URL commands, how to interpret/operate the new UI controls in /DG), @1coderookie?

from bsb-lan.
