openwisp / openwrt-openwisp-monitoring Goto Github PK
View Code? Open in Web Editor NEWOpenWRT monitoring agent for openwisp-monitoring
Home Page: https://openwisp.org
License: GNU General Public License v3.0
OpenWRT monitoring agent for openwisp-monitoring
Home Page: https://openwisp.org
License: GNU General Public License v3.0
Merge the two scripts netjson-monitoring.lua
and netjson-monitoring-wan.lua
Hi,
openwisp-config has been working fine on OpenWrt-21.02.
Config:
netjson-monitoring=y
openwisp-config-openssl=y
openwisp-monitoring=y
Error:
Collected errors:
* pkg_hash_check_unresolved: cannot find dependency openwisp-config for openwisp-monitoring
* pkg_hash_fetch_best_installation_candidate: Packages for openwisp-monitoring found, but incompatible with the architectures configured
* opkg_install_cmd: Cannot install package openwisp-monitoring.
make[2]: *** [package/Makefile:69: package/install] Error 255
I know this is annoying, but in an embedded system every byte counts.
Because every byte must also be interpreted.
Therefore, tabs should always be used by shell scripts.
https://github.com/openwisp/openwrt-openwisp-monitoring/blob/gsoc21/openwrt-openwisp-monitoring/files/monitoring.agent
https://github.com/openwisp/openwrt-openwisp-monitoring/blob/gsoc21/openwrt-openwisp-monitoring/files/monitoring.init
This is also the way it is done throughout OpenWrt.
From my point of view, this is a script that only
is used by openwisp-monitoring.
To avoid polluting the system path, it is usual to move such scripts to /usr/libexec.
I would create a separate subdirectory named openwisp in this location and would drop this file there.
See #59 (comment)
lua: /usr/sbin/netjson-monitoring:67: Interrupted system call
stack traceback:
[C]: in function '(for generator)'
/usr/sbin/netjson-monitoring:67: in function 'get_ip_neigh'
/usr/sbin/netjson-monitoring:84: in function 'get_neighbors'
/usr/sbin/netjson-monitoring:181: in main chunk
[C]: ?
I need to test this on more devices before ruling out if it's a system issue or something that has to be dealt with in the code.
The current method we use to get clients associated to a WiFi interface only works for access point mode, it doesn't work for mesh
(802.11s), an most probably also not for adhoc mode.
I verified that ubus call iwinfo assoclist '{"device": "<interface_name>"}'
works also in 802.11s, so we can switch to this instead.
Reindent lua code in a consistent way (4 spaces).
In OpenWRT it's possible to define bridge members that do not exist, they'll be added only if they exist and be ignored otherwise.
This is pretty handy in OpenWISP because it's possible to define different bridge members which get added only if the right template is used.
@devkapilbansal can you change the code so that only existing bridge members are collected in the NetJSON monitoring output?
EG:
{
"name": "br-wifi",
"bridge_members": [
"wlan0",
"wlan1",
"tap_tcp",
"tap_tcp_nocryp",
"tap_udp",
"tap_udp_nocryp"
]
}
In the case above, these members do not exist and therefore should not be added:
As I see it, the scripts do not have to be executable that are under /usr/lib/lua/openwisp
on the system.
Please replace INSTALL_BIN with INSTALL_DATA in the openwrt makefile install routine.
I also noticed that the openwsip-agent also has lua scripts under /usr/lib/lua/openwisp.
Does it make sense to use a separate directory?
Hey,
Just letting you know, we are changing this from device
to ctl_device
upstream in the OpenWrt ModemManager protocol handler because of compatibility issues with DSA. I will link the PR here when it is in.
This change is requested here:-
Can you please refactoring the script that it has more have a look and feel of a C programm with a main and help function?
Have a look at https://github.com/openwrt/packages/blob/master/net/mwan3/files/usr/sbin/mwan3rtmon thats only an example.
From my point of view, this has certain advantages. If you later decide to make a C program out of it, then you already know what you need for functions. My experience as a maintainer has often shown me that.Also add a valid license header
Current dependencies:
We may replace cjson with some other library already available on OpenWRT (asked for a suggestion in the OpenWRT forum).
Make sure that if the package rpcd-mod-iwinfo
is not installed, these two lines won't fail:
https://github.com/openwisp/lua-monitoring/blob/master/netjson-monitoring.lua#L398-L399
Ideally we should check for the presence of the feature before the loop, and skip the loop if the feature is not available.
We should add a brief README which should tell users to install package X.
Since we have already added a functionality to upload packages directly to downloads.openwisp.io, it will be beneficial to also upload checksums for built packages.
This way users will be able to verify the authenticity of downloaded packages.
Mention in Debugging section about verbose mode
Up to now only the modem manager is supported to get information from them wwan modem. Not all system does have a modem manager instance running. There are the following other backends
So far, there is no uniform way to read the information from these different backends. The needed information are not all the same. We would have to agree on the smallest common set here.
I would suggest an ubus interface that provides the necessary information. Then there would be no need to go to the trouble of collecting this information from the different backends system. The ubus interface then takes over the collection of the information. The openwisp-monitoring then only calls the ubus endpoint and returns what is there.
Add runbuild script
Add LuaCov for test coverage
Add an install script (to install openwrt dependencies in container)
This issue depends on openwisp/openwisp-monitoring#30.
In this script:
https://github.com/openwisp/lua-monitoring/blob/master/netjson-monitoring.lua
For each monitored interface, we have to find out the ip addresses.
Once we get the list of IP, for each ip we can check the configuration of that interface to see if any static address is configured, and find out if the IP we got matches the static IP configured, if yes, that's a static address and we'll flag it as such, otherwise it's most probably an address got in DHCP client mode or manually configured, we can further check the configuration to see if the interface is flagged as proto dhcp, if yes we'll flag the address as dhcp, otherwise we'll flag it as static.
If the device is not registered yet, the agent exits:
openwisp-monitoring: OpenWISP monitoring daemon stopping
openwisp-monitoring: uuid or key are not set, please add these to /etc/config/openwisp
This is wrong, because it means users will have to restart the agent manually each time they register a device, which is not what we want.
The agent needs to be smart enough and figure out that if there's a shared secret in the openwisp configuration, it means that openwisp-config has not registered yet but it should not exit like that.
We have to introduce a mechanism which allows the agent to wait for configuration changes and start automatically after the device registers.
I wonder if we can somehow force the openwisp-monitoring agent to automatically restart whenever the config of openwisp-config changes? @feckert do you know if this is possible?
This was a bug that I fixed in the early version of this package, it seems to have popped up again:
Invalid data in "#/interfaces/4/bridge_members", validator says:\n\n{} is not of type 'array'
This happens when there's an empty bridge, the validator expects an empty list but since in lua there's no difference between list and objects, both are represented with tables, so an empty table is converted to JSON as {}
which is not accepted to the validator.
The solution I had used was: fa69556.
After uploading the compiled package to google cloud, we shall remove any latest symbolic link and create a new one pointing to the directory just uploaded.
PS: it looks like GC does not support symbolic links (yuck), we shall then just create a copy of the last compilation called "latest".
We need to write tests for the lua functions.
I hope we can use mocking to simulate the calls to the network programs we use to get the data (ubus, arp, etc).
How to reproduce the bug:
wlan0 wlan1 lan1 lan2 lan3 eth0 wan
, eg:config monitoring 'monitoring'
option interval '300'
option verbose_mode '0'
option required_memory '0.05'
option max_retries '5'
option monitored_interfaces 'wlan0 wlan1 lan1 lan2 lan3 eth0 wan mesh0 mesh1'
Expected outcome: I expected the agent to work normally.
Actual outcome: the agent doesn't start, it gives this error:
Fri Apr 22 20:30:00 2022 daemon.err openwisp-monitoring[23774]: missing required --mode option
Fri Apr 22 20:30:00 2022 daemon.err openwisp-monitoring[23775]: root: No data file found to send.
Fri Apr 22 20:30:05 2022 daemon.err openwisp-monitoring[23802]: missing required --mode option
Fri Apr 22 20:30:10 2022 daemon.err openwisp-monitoring[23803]: missing required --mode option
Fri Apr 22 20:30:15 2022 daemon.err openwisp-monitoring[23804]: missing required --mode option
Fri Apr 22 20:30:20 2022 daemon.err openwisp-monitoring[23834]: missing required --mode option
Fri Apr 22 20:30:26 2022 daemon.err openwisp-monitoring[23843]: missing required --mode option
Fri Apr 22 20:30:26 2022 daemon.info procd: Instance openwisp-monitoring::openwisp-monitoring_collect_data s in a crash loop 6 crashes, 0 seconds since last crash
If the proto address of an interface is wireguard, we can consider its type to be "virtual".
Depends on #29 openwisp/openwisp-monitoring#298
Update:-
This change was originally requested here:
#42 (review)
Logged response can be improved
Here's an example output: Data not sent successfully. Response code is {"detail":"Authentication credentials were not provided."}403.. The message is readable but it's weird. The response body is {"detail":"Authentication credentials were not provided."} while the response status code is 403. Can you turn this into something like:
The response received was: 403 {"detail":"Authentication credentials were not provided."}?
We can reduce the use of memory of more than 70% thanks to gzip.
We should just do the following:
gzip $path
, this command also removes the original filegzip -d $path
While testing the daeamon I get the following:
Mon Jul 5 00:23:06 2021 daemon.info monitoring: Collecting NetJSoN Monitoring data
Mon Jul 5 00:23:06 2021 daemon.err monitoring: Data not sent successfully. Exit code is 127
First of all, can you please change any occurrence of output or docs mentioning "NetJSON" which is not spelled this way and end ensure follows this spelling?
Then, Data not sent successfully. Exit code is 127
is not super helpful in finding out what's wrong.
What steps can we take in order to improve the life of users which will be using this package?
Can we include the code line and command which are failing?
We are now using this template: https://gist.github.com/nemesisdesign/cb005dc80a75536cf8537105c3038a71
Steps:
Also document how to add the template to an already registered device
Monitor addresses of all interfaces by default, but do not include traffic statistics unless explicitly configured.
I also want to make the monitoring information regarding addresses more complete by using:
nixio = require('nixio')
cjson = require('cjson')
print(cjson.encode(nixio.getifaddrs()))
Because ubus call network.interface dump
only gives partial information, although it's very useful as well, so I think I will try to merge the two.
Possible new algorithm:
ubus call network.interface dump
, build a dictionary of all the interfaces and their addressesnixio.getifaddrs()
, update the dictionary created previously and add only new informationubus call network.device status '{"name": "<NAME>"}'
to get additional information and traffic statisticsIn Linux, bridged interface show tx/rx packets in reverse: download is upload and upload is download.
I couldn't find a canonical reference on why this is the way it is, if anyone finds it, let me know.
Now the problem we have is that in the traffic graph, the upload is greater than the download on wifi interfaces that are part of a bridge, which most users will consider as a bug.
So I think we should do the following:
Addendum: I'm trying to do some more research on the subject before proceeding with this: https://forum.openwrt.org/t/inverted-tx-rx-packets-in-bridged-interfaces/85093.
root@E4-C3-2A-90-84-97:/usr/sbin# ./netjson_monitoring '*'
lua: ./netjson_monitoring:226: bad argument #2 to 'tonumber' (integer expected, got string)
stack traceback:
[C]: in function 'tonumber'
./netjson_monitoring:226: in function 'get_cpus'
./netjson_monitoring:260: in main chunk
[C]: ?
Still not sure why this error is occuring, will try to test more until I got some satisfactory results
Mention that pre-build packages can be downloaded and installed from https://downloads.openwisp.io
We need to convert this into an OpenWRT package with compilation instructions in the README.
It could be named openwrt-netjson-monitoring
or openwrt-openwisp-monitoring
.
The logger
shell call is used over and over again.
Which makes the code confusing.
I would suggest that you move the call to a subroutine.
https://github.com/openwisp/openwrt-openwisp-monitoring/blob/master/openwrt-openwisp-monitoring/files/monitoring.init
https://github.com/openwisp/openwrt-openwisp-monitoring/blob/master/openwrt-openwisp-monitoring/files/monitoring.agent
This issue depends on openwisp/openwisp-monitoring#33.
In this script:
https://github.com/openwisp/lua-monitoring/blob/master/netjson-monitoring.lua
Find out where the lease file is stored, (uci command is uci get dhcp.@dnsmasq[0].leasefile
, this has to be translated into the lua way of doing it).
Parse the lease file to produce the output described in openwisp/openwisp-monitoring#33 (see also https://serverfault.com/a/786141/91152 for a good explanation of the contents of the lease file).
This issue depends on openwisp/openwisp-monitoring#34.
In this script:
https://github.com/openwisp/lua-monitoring/blob/master/netjson-monitoring.lua
Execute the arp
command and parse its output to produce the output described in openwisp/openwisp-monitoring#34
I am not sure but there should be some ways to write unittests for monitoring agent or monitoring daemon too.
We can research about the possibilities and try to cover these too
Yes, I remove
-t
flag when logging in verbose mode. In verbose_mode, it is automatically tagging the logged message by the procd service name and if I pass a tag, then the message is logged twiceWhat do you think about adding a new "debug mode" to turn this on?
In verbose mode we could avoid using it. In debug mode we would know (and document) that any output is sent to the logging facility, causing some output to be repeated, Or alternatively, we could even modify our log function to use echo instead of
logger
.
See #57 (comment)
@devkapilbansal is there a way to remove the old builds?
I think we can keep stable releases (which we shall name differently), but we should highly limit the build results because space has a cost, I think we can keep the last 5 builds.
An autolinter to lint the shell script codes should be added to maintain consistency
This change was requested here: ๐
#25 (comment)
In general I would split the netjson into small plugin parts. The problem I see here is that not all have all related information on the system, so we should not check on every run if we have a wlan or a cellular. I do not have for example a cellular on my system. So why should the openwisp-monitoring always try to check the cellular backend.
Format lua files to be consistent with openwisp-config in general and following lua best practices
When collecting device status from OpenWrt, the bridge member list may include members of interfaces which are not active.
Before adding the interface to the bridge members, it would be great to ensure the interface is present and active.
It would be good to create a utility function which does the read, close and returning of the output when using io.popen().
This would also ensure that whoever will put their hands on this code will keep reusing our function and avoid messing it up again by forgetting to close a pipe.
In #34 we are storing data in temporary storage if specific percentage(required_memory) of total memory is available.
But it may be possible that the server is not reachable for a long time and memory is filled with the json data files.
In that case, previous data files should be deleted to store the new one.
I just found out some output from a device like the following:
"mobile": {
"operator_name": "xxxx",
"power_status": "on",
"operator_code": "xxxx",
"model": "xxxx",
"signal": {
"umts": {
"ecio": -5,
"rscp": "--",
"rssi": -69
}
},
"manufacturer": "xxxx",
"imei": "xxxx",
"connection_status": "connected"
},
I guess we should either set null
instead of "--"
or we shouldn't include the attribute at all.
The server returns error because it expects a number. It returns an error also with null
.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.