Comments (41)

HomeSpan avatar HomeSpan commented on May 18, 2024

Can you be more specific? Which libraries are you using, what are they doing, etc.? NVS is a core function of the ESP32.

from homespan.

danilkorotkov avatar danilkorotkov commented on May 18, 2024

https://github.com/dgomes/homeGW
What it does: it listens to 433 MHz radio sensors.
My receiver is an RXB6.

HomeSpan avatar HomeSpan commented on May 18, 2024

If you look through the issues list on the homeGW site it appears others have already found and encountered the same problem on ESP8266 chips under a variety of conditions. The solution seems to be similar to what you've identified (disable the interrupt), so I'm afraid this is an issue with the way the homeGW is constructed.

Note that many libraries are designed for standalone use, and were developed for a single-core Arduino running nothing but that library. For example, recently another HomeSpan user found that adding a certain library to run a stepper motor within HomeSpan caused the motor to stutter.

HomeSpan needs to run a LOT of things in order to operate. And the ESP32 needs to run even more things in the background, such as the entire WiFi stack. Libraries can be designed to operate seamlessly with HomeSpan provided they are constructed to operate in larger environments and/or use the appropriate hardware accelerations built into the ESP32. In the case of the stepper motor problem, the library was trying to control a stepper motor in real-time using software delays and software-based timing. This obviously can't work in a larger environment because all the resources of the ESP32 can't be dedicated to operating just the stepper motor. Fortunately, the ESP32 has a built-in peripheral (e.g. motor-control PWM) to perform just this task in the background, using hardware, and without relying on software-based delays. Other stepper-motor libraries were identified that use this method and should readily operate within HomeSpan.

Processing incoming RF signals falls into this same category. The library you are using unfortunately relies on brute-force reading and interpretation of the signal using a series of interrupts, which are triggered by any unfiltered RF noise, whether or not related to a real signal from the weather station. Using interrupts is usually a good thing, unless you generate an enormous number, in which case you prevent other critical functions from operating correctly. It seems that these interrupts are causing a crash in the NVS code of the ESP32.

There are numerous solutions to this, but they all require significant changes to the RF library and/or new hardware. One possibility would be to reprogram the RF-reading library to take advantage of different ESP32 priority interrupts. It may be possible to lower the priority of the interrupt used by the RF library so that it is only called under certain circumstances in a way that would not interfere with NVS. However, it's not obvious that even with a lower priority the RF interrupts would not interfere with other critical functions (such as WiFi).

The second option would be to use a library designed to take advantage of the ESP32's built in RMT peripheral. I use this peripheral myself to send RF signals for RF appliances I control with HomeSpan (see Extras), though I have not tried using it to read and decode RF signals (which is a much bigger lift). Perhaps there is a library someone has created that does this already?

The third option is to use a dedicated RF chip that reads and decodes signals independently, and only interrupts the ESP32 when it receives a fully-formed and valid RF signal. Using such chips requires a dedicated library designed directly for that chip.

Sorry for the long answer, but my guess is that many signal-processing and signal-generating libraries that were developed for the Arduino, were designed for standalone use without much else running. Since the Arduino does not have the power to run other tasks, such a design makes complete sense (and as a long-time user of the Arduino platform, it's exactly what I have done in the past). But these libraries often do not translate well when run as part of a larger environment.

danilkorotkov avatar danilkorotkov commented on May 18, 2024

What I did
I placed gpio_intr_disable(gpio_num_t(22)) as the first statement in every sensitive method of homespan.cpp and hap.cpp to prevent the panic. In the main loop, where homeSpan.poll() lives, I placed gpio_intr_enable(gpio_num_t(22)), so loop() automatically restores the interrupt after HomeSpan's “save mode”.

This works great, so I think you could add something more robust in the future. For example, let the user register an array of such interrupt pins before calling .begin(), the same way we set the status LED and control button pins.

https://github.com/danilkorotkov/Digoo-Homekit

HomeSpan avatar HomeSpan commented on May 18, 2024

I'm glad that worked and you can run the library, though I'm hesitant to generalize this as a function inside the HomeSpan library itself.

danilkorotkov avatar danilkorotkov commented on May 18, 2024

Why? It is easy to define some weak functions, e.g. void stopPinInterrupts(){nop;} and startPinInterrupts(), and call them when critical routines start.
The user can redefine them in the sketch.
(STM32 framework style)

danilkorotkov avatar danilkorotkov commented on May 18, 2024

why? it is easy to define some weak functions

void __attribute__((weak)) criticalF() {}

No?
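
A minimal sketch of the weak-hook idea, for illustration only. The hook names come from the comment above; `runCritical` is a hypothetical helper, and none of this is HomeSpan's actual API. It assumes a GCC or Clang toolchain, since `__attribute__((weak))` is a compiler extension.

```cpp
// Library side: weak no-op defaults. Because they are weak, a non-weak
// definition with the same signature in another file (for example the
// user's sketch) replaces them at link time.
__attribute__((weak)) void stopPinInterrupts() {}
__attribute__((weak)) void startPinInterrupts() {}

// Library side (hypothetical helper): run a critical routine between the hooks.
template <typename Fn>
void runCritical(Fn criticalWork) {
  stopPinInterrupts();   // user override could disable its GPIO interrupts here
  criticalWork();        // e.g. the NVS write
  startPinInterrupts();  // user override could re-enable them here
}
```

A sketch could then supply its own definitions in the .ino file, for example a `stopPinInterrupts()` that calls `gpio_intr_disable(gpio_num_t(22))`, and the linker would prefer those over the weak defaults.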

HomeSpan avatar HomeSpan commented on May 18, 2024

It's a good idea, but I'm struggling with how to generalize this in a robust fashion. The issue you identified is specific to the library you are trying to use. There is nothing wrong with using interrupts - the problem is the way the RF library is using them. Also, there is nothing special about the NVS code --- it's something the RF library is doing that the NVS code does not like. But if you review the issues list for this RF library you'll see others have had OTHER issues, totally unrelated to NVS. You were able to determine that in your specific use case the NVS code is "sensitive" to the RF library, so you wrapped interrupt disables around that portion of the code. But there's nothing about the NVS code that makes it special. Another poorly-written library that uses interrupts could interfere with any part of HomeSpan, or even the WiFi code. There is no way I can know ahead of time where there would be problems.

The only general way of dealing with this would be to simply disable the offending interrupts before homeSpan.poll() and re-enable them after. But since that's in the main sketch, users can do this in the *.ino file without needing a hook into HomeSpan. I'll give some more thought to this and see if there is a way of generalizing, lest before you know it, you've wrapped dozens of subsets of code in an interrupt-protective zone, only some of which are needed depending on the library in use. And simply turning them all off may have a negative impact on the external library, which must be using interrupts for some time-sensitive reason.

danilkorotkov avatar danilkorotkov commented on May 18, 2024

The problem is not in homeSpan.poll() as a whole but in specific functions: pairing, WiFi connect, and NVS operations. The RF library has a simple ISR, even without millis(), only micros(), and serial output is under DEBUG only.

https://gitter.im/espressif/arduino-esp32?at=5d386d298fe53b671dcc1efb

attempting to access flash while an ISR is actively running, this is not supported. You will need to disable

So the RF library is always in its ISR, because that is its intended purpose.

HomeSpan avatar HomeSpan commented on May 18, 2024

Thanks! That was the information I needed to generalize the issue. Will build something in for next patch release.

danilkorotkov avatar danilkorotkov commented on May 18, 2024

news?

danilkorotkov avatar danilkorotkov commented on May 18, 2024

The encoder causes a reboot if WiFi isn't connected.

HomeSpan avatar HomeSpan commented on May 18, 2024

what encoder?

danilkorotkov avatar danilkorotkov commented on May 18, 2024

ky-040 or similar with interrupt polling

HomeSpan avatar HomeSpan commented on May 18, 2024

Can you try running without defining a Status LED pin? One difference between WiFi connected vs. not connected is that the HomeSpan status LED blinks without WiFi, and is steady once WiFi is on and the device is paired. The HomeSpan blinking routine uses an alarm interrupt that I have not wrapped with IRAM_ATTR, which, according to the Espressif docs, is needed under certain circumstances. I can easily add that, but one way to test if this is even related to the issue is by not using the blinking routine interrupt at all (which is accomplished by not setting the Status LED pin).

danilkorotkov avatar danilkorotkov commented on May 18, 2024

Is there no default Status LED pin?
I tried, but the result is the same.
I think it is better to add something like this around every NVS call:

HAP_gpio_intr_disable(gpioArray);
nvs_set_blob(HAPClient::srpNVS,"VERIFYDATA",&verifyData,sizeof(verifyData));                              // update data
nvs_commit(HAPClient::srpNVS);
HAP_gpio_intr_enable(gpioArray);  

it works

also with saving Characteristic settings in NVS I get

Guru Meditation Error: Core  1 panic'ed (Cache disabled but cached memory region accessed)
11:14:39.213 -> Core 1 register dump:
11:14:39.213 -> PC      : 0x400e0480  PS      : 0x00060034  A0      : 0x800812cc  A1      : 0x3ffc07f0  
11:14:39.213 -> A2      : 0x00000000  A3      : 0xb0000000  A4      : 0x00000000  A5      : 0x00000000  
11:14:39.248 -> A6      : 0x3ff42000  A7      : 0x700000eb  A8      : 0x80081280  A9      : 0xd0000040  
11:14:39.248 -> A10     : 0x3ffc22c4  A11     : 0x3ffb1c40  A12     : 0x8008ac6d  A13     : 0x3ffb1c40  
11:14:39.248 -> A14     : 0x0000b038  A15     : 0x3ffb1cbc  SAR     : 0x00000016  EXCCAUSE: 0x00000007  
11:14:39.248 -> EXCVADDR: 0x00000000  LBEG    : 0x4008c220  LEND    : 0x4008c23c  LCOUNT  : 0xffffffff  
11:14:39.248 -> Core 1 was running in ISR context:
11:14:39.282 -> EPC1    : 0x40062226  EPC2    : 0x00000000  EPC3    : 0x00000000  EPC4    : 0x400e0480
11:14:39.282 -> 
11:14:39.282 -> ELF file SHA256: 0000000000000000
11:14:39.282 -> 
11:14:39.282 -> Backtrace: 0x400e0480:0x3ffc07f0 0x400812c9:0x3ffc0810 0x40089f6d:0x3ffc0830 0x40062223:0x3ffb1b70 0x40093cb7:0x3ffb1b90 0x40093cf2:0x3ffb1bc0 0x40094029:0x3ffb1bf0 0x40094259:0x3ffb1c10 0x4008a9d5:0x3ffb1c40 0x4008ac6a:0x3ffb1c60 0x4016e949:0x3ffb1cb0 0x4016e9a3:0x3ffb1ce0 0x4016ecc2:0x3ffb1d00 0x4016ddaa:0x3ffb1d80 0x4016e26e:0x3ffb1e00 0x4016d65f:0x3ffb1e70 0x400d2109:0x3ffb1eb0 0x400d2225:0x3ffb1f10 0x400d3d7f:0x3ffb1f40 0x400d4c7c:0x3ffb1f60 0x400e5ce4:0x3ffb1fb0 0x4008ea0a:0x3ffb1fd0
11:14:39.316 -> 
11:14:39.316 -> Rebooting...

An interrupt during an NVS operation causes a reboot.

HomeSpan avatar HomeSpan commented on May 18, 2024

Can you provide a link to your code?

danilkorotkov avatar danilkorotkov commented on May 18, 2024

https://github.com/danilkorotkov/recuperator

danilkorotkov avatar danilkorotkov commented on May 18, 2024

tested with empty isr call

void IRAM_ATTR isrENC() {
  
}

no Core panic
i think enc1.tick() is too big for ISR routines

danilkorotkov avatar danilkorotkov commented on May 18, 2024

tested with different encoder libs
the same result: if enc is rotated fast -> many interrupts->core panic

danilkorotkov avatar danilkorotkov commented on May 18, 2024

call setVal() with disabled int and timeout for the fast rotation
no panic

void IRAM_ATTR isrENC() {
  enc1.tickISR();
  inc_dec_time = millis();
}

if (inc_dec != 0 && (millis() - inc_dec_time) > inc_dec_timeout){
..........
        gpio_intr_disable(gpio_num_t(CLK));
        gpio_intr_disable(gpio_num_t(DT));
        gpio_intr_disable(gpio_num_t(SW));
        
        recuperator->RotationSpeed->setVal(tempSpeed);
        recuperator->setSpeed();

        gpio_intr_enable(gpio_num_t(CLK));
        gpio_intr_enable(gpio_num_t(DT));
        gpio_intr_enable(gpio_num_t(SW));
..........

HomeSpan avatar HomeSpan commented on May 18, 2024

I took a look at the enc1.tickISR() code and indeed it seems much too big for a typical interrupt routine. Most interrupt routines do very little except set flags, count events, reload data into registers, etc. If the interrupt gets too complex (both in terms of stack usage and/or time) it can impact the rest of the code. Since NVS is the most sensitive, the code is causing NVS to fail.
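
As a rough illustration of "do very little in the interrupt", here is a minimal sketch in which the handler only counts events and the heavier decoding is left to the polling loop. All names (`pendingEdges`, `onEncoderEdge`, `takePendingEdges`) are hypothetical and not taken from the encoder library or from HomeSpan. On an ESP32 the handler would typically also be marked IRAM_ATTR and registered with attachInterrupt(); that part is omitted so the sketch stays portable.

```cpp
#include <atomic>
#include <cstdint>

// Shared between the interrupt handler and the main loop (hypothetical name).
static std::atomic<uint32_t> pendingEdges{0};

// Interrupt handler: record that an edge happened and return immediately.
// No decoding, no library calls, no flash access.
void onEncoderEdge() {
  pendingEdges.fetch_add(1, std::memory_order_relaxed);
}

// Called from loop(): atomically take the accumulated count and reset it to
// zero. The caller then does the expensive work (decoding, setVal(), etc.)
// outside interrupt context.
uint32_t takePendingEdges() {
  return pendingEdges.exchange(0, std::memory_order_relaxed);
}
```

The design choice is that the handler's cost stays constant no matter how fast the encoder is turned; only the number reported to the loop changes.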

To test this hypothesis and check whether pin interrupts are somehow fundamentally incompatible with NVS, I created some test code that implements a simple pin interrupt using core ESP IDF functions. I tried with IRAM and without IRAM interrupts and everything worked as expected regardless of the IRAM settings. However, I realized that you are using attachInterrupt(), which is an Arduino-ESP32 overlay of the ESP IDF calls. To check whether this is causing any issues I changed my test code to instead use attachInterrupt(), but everything still works fine.

The test code below generates a 10kHz PWM signal on pin 13, which I feed through a jumper wire into pin 23 (code is running on an ESP32). I enabled a change-state pin interrupt on pin 23, which means the interrupt routine is called 20,000 times per second. The interrupt routine is very brief - it simply counts down from 50,000, sets a flag when it reaches 0, and resets the counter.

The flag is read in the main loop(). When set to true, I use setVal() to toggle the on/off status of a lightbulb. The On Characteristic of the lightbulb is enabled for NVS storage, which means NVS is called every 2.5 seconds while the pin interrupt remains enabled and continues to be checked 20,000 times per second.

This seems to work fine, suggesting that the isrENC() code is indeed too complex, and as you've done, needs to be disabled before any calls that invoke NVS. There may be other effects as well (on WiFi?) but the NVS call seems to be failing first. To check whether there are any non-NVS issues, you can try changing your Characteristics so that NVS is not called in setVal() and see what happens.

Though I have not reviewed the isrENC() code in detail, my guess is it could be re-written as a more streamlined interrupt with flags that trigger more complex actions taken in a polling routine outside the actual interrupt code.

#include "HomeSpan.h"         // HomeSpan sketches always begin by including the HomeSpan library
#include "extras/PwmPin.h"

LedPin *led;
#define COUNT   50000

// Here are two variables used by interrupt - note they are set as volatile

volatile int count=COUNT;
volatile boolean flag=false;

// Here is the interrupt routine - logic is minimal.  It simply counts pin changes (20,000 per second) and sets a flag when the count reaches zero

void isrHandler2(){
  if(--count)
    return;
  count=COUNT;
  flag=true;  
}

SpanCharacteristic *on;

//////////////////////////////////////

void setup() {
 
  Serial.begin(115200);

  homeSpan.begin(Category::Lighting,"HomeSpan LightBulb");

  new SpanAccessory();
  
    new Service::AccessoryInformation();
      new Characteristic::Name("My Table Lamp");
      new Characteristic::Manufacturer("HomeSpan");
      new Characteristic::SerialNumber("123-ABC");
      new Characteristic::Model("120-Volt Lamp");
      new Characteristic::FirmwareRevision("0.9");
        new Characteristic::Identify();
  
    new Service::HAPProtocolInformation();
      new Characteristic::Version("1.1.0");

    new Service::LightBulb();
    on=new Characteristic::On(0,true);      // save pointer to this Characteristic and set NVS to save

  led=new LedPin(13,50,10000);              // create PWM on pin 13 to generate 10kHz signal (50% duty cycle)
  pinMode(23,INPUT);                        // connect output of PWM on pin 13 to pin 23
  gpio_intr_enable(GPIO_NUM_23);            // enable interrupt of pin 23
       
  attachInterrupt(23,isrHandler2,CHANGE);   // attach interrupt to pin 23; with trigger=CHANGE the interrupt will be called 20,000 times per second

} // end of setup()

//////////////////////////////////////

void loop(){

  homeSpan.poll();         // run HomeSpan!

  // Here is where we implement the logic triggered by the interrupt.  Whenever flag is set (every 2.5 seconds if COUNT=50000), flip the on/off status of lightbulb
  
  if(flag){
    Serial.println("Count is completed\n");
    flag=false;
    on->setVal(!on->getVal());                    // Note since NVS storage for this characteristic is set to true, the NVS is accessed while interrupts are still occurring in background (at rate of 20,000 per second)
  }
  
} // end of loop()

//////////////////////////////////////

danilkorotkov avatar danilkorotkov commented on May 18, 2024

Yes. A simple counter never causes a core panic. But it is hard to rewrite libraries every time because of this ESP32 feature.

HomeSpan avatar HomeSpan commented on May 18, 2024

Agreed. Now that the problem is fully diagnosed I'll add an option to turn off select pin interrupts prior to NVS calls in next patch release.

HomeSpan avatar HomeSpan commented on May 18, 2024

I've been working through a few different ways of effecting an "ISR protection" per above, but since I can't reproduce the actual failure it's hard to know for sure where within the NVS calls things are failing. Can you test your set-up without disabling the interrupts, but instead comment out the nvs_commit() lines in HomeSpan.h? The commit is what does all the writing and updating to flash. If that's the offending line I will replace nvs_commit() with a "protected" version that disables a user-definable set of GPIO ISRs.

Simply disabling them at the top of the block and re-enabling them afterwards may work in some instances, but in others it may be that the interrupt was not enabled at that particular time, and re-enabling it will cause a problem for the code using the interrupt. Though there is no IDF function to check whether or not an interrupt is enabled, you can get this from the underlying GPIO registers. My "protect" function will save the enabled state of all "protected" interrupts before disabling them so they can be re-enabled after the nvs_commit() is completed. Assuming that the nvs_commit() is the offending line.
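
The save-and-restore idea described above can be sketched as a small scope guard. This is only an illustration under stated assumptions: `PinInterruptBank` is a hypothetical in-memory stand-in for the GPIO interrupt-enable registers, and `InterruptGuard` is a made-up name; the thread does not show how HomeSpan would actually implement its "protect" function.

```cpp
#include <cstdint>

// Hypothetical stand-in for the hardware interrupt-enable state, one bit per
// pin. On a real ESP32 this information would come from the GPIO registers.
struct PinInterruptBank {
  uint64_t enabledMask = 0;
  bool isEnabled(int pin) const { return (enabledMask >> pin) & 1ULL; }
  void enable(int pin) { enabledMask |= (1ULL << pin); }
  void disable(int pin) { enabledMask &= ~(1ULL << pin); }
};

// Scope guard: remembers which of the protected pins were enabled, disables
// all protected pins, and on destruction re-enables only the remembered ones.
class InterruptGuard {
 public:
  InterruptGuard(PinInterruptBank &bank, uint64_t protectedMask)
      : bank_(bank), saved_(bank.enabledMask & protectedMask) {
    bank_.enabledMask &= ~protectedMask;
  }
  ~InterruptGuard() { bank_.enabledMask |= saved_; }

  InterruptGuard(const InterruptGuard &) = delete;
  InterruptGuard &operator=(const InterruptGuard &) = delete;

 private:
  PinInterruptBank &bank_;
  uint64_t saved_;  // protected pins that were enabled when the guard was created
};
```

Wrapping the flash write in a block that constructs such a guard means a pin that was already disabled on entry stays disabled on exit, which is the case plain disable/enable pairs get wrong.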

danilkorotkov avatar danilkorotkov commented on May 18, 2024

done but core panic
added LOG1() and found

if(nvsKey){
        //nvs_set_blob(homeSpan.charNVS,nvsKey,&value,sizeof(UVal));    // store data
        LOG1("h charNVS4 \n");
        //nvs_commit(homeSpan.charNVS);
      }

Interestingly, nvs_set_blob() causes a panic too.

danilkorotkov avatar danilkorotkov commented on May 18, 2024

can you test https://github.com/danilkorotkov/isrESP32 ?
this is encoder button pin connected to your pwm pin

danilkorotkov avatar danilkorotkov commented on May 18, 2024

updated example above
tested
core panic

safe

    gpio_intr_disable(gpio_num_t(SW));
    Serial.println("Count is completed");
    on->setVal(!on->getVal());                    // Note since NVS storage for this characteristic is set to true, the NVS is accessed while interrupts are still occurring in background (at rate of 20,000 per second)
    gpio_intr_enable(gpio_num_t(SW));

HomeSpan avatar HomeSpan commented on May 18, 2024

When I try your example the program sends out a continuous stream of setVal() updates. setVal() updates should be limited to at most one per second.
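
One way to keep updates to at most one per second is a small interval check in the main loop before calling setVal(). The sketch below is an assumption-laden illustration: `RateLimiter` is a hypothetical helper, not part of HomeSpan, and the timestamp is passed in (on Arduino it would be the value of millis()).

```cpp
#include <cstdint>

// Hypothetical helper: allows an action at most once per minIntervalMs.
class RateLimiter {
 public:
  explicit RateLimiter(uint32_t minIntervalMs) : minIntervalMs_(minIntervalMs) {}

  // Returns true (and records nowMs) if the action may run now.
  // Unsigned subtraction keeps the comparison correct across a 32-bit
  // millisecond counter wrap-around.
  bool ready(uint32_t nowMs) {
    if (hasFired_ && (nowMs - lastMs_) < minIntervalMs_) {
      return false;
    }
    lastMs_ = nowMs;
    hasFired_ = true;
    return true;
  }

 private:
  uint32_t minIntervalMs_;
  uint32_t lastMs_ = 0;
  bool hasFired_ = false;
};
```

In a sketch, a call such as `if (limiter.ready(millis())) on->setVal(...)` would drop the extra updates rather than queue them, which is usually what a rapidly changing input needs.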

danilkorotkov avatar danilkorotkov commented on May 18, 2024

It is just for ISR test. Not necessary to pair this board

HomeSpan avatar HomeSpan commented on May 18, 2024

Yes, but continuous calls to setVal will try to update the NVS repeatedly even without the board paired. This itself can be problematic. Do you have a simplified baseline -- something that calls setVal periodically while the interrupt is polling at high-speed?

danilkorotkov avatar danilkorotkov commented on May 18, 2024

I don't understand why we need to build something else if the main aim is to simulate an interrupt during NVS access.
I already implemented a time limit in the /recuperator project, and without turning off interrupts I still get a core panic.

HomeSpan avatar HomeSpan commented on May 18, 2024

It's because if the nvs_commit() is not the problem (since you disabled that) the problem may not lie with the NVS. The NVS calls are used throughout the code base so I want to make sure that I have properly identified the root cause and can create an appropriate method to rectify it. You've identified that the issue is in setVal(), but that does more than just call NVS. If I can duplicate a baseline scenario I can then step through all the code and determine exactly where (and why) things are failing. A scenario that calls setVal() continuously is not a good baseline for testing this issue.

Since you seem to have been able to address the issue in your own code by disabling the interrupts, it sounds like you have a good solution that works for your set-up. I'm happy to try to add something more generic to the code, but want to make sure it operates as intended lest it create other unexpected errors with other libraries.

danilkorotkov avatar danilkorotkov commented on May 18, 2024

I have the Digoo radio sensor example (the one from the start of this topic). Sending setVal() rapidly is not a problem in that case, because the sensors send an update only once every 60-90 s. But the board is always in the ISR, because many other 433 MHz devices are on the air and every event calls the ISR. As a result I get a core panic.

I will try to call setVal() with millis() tomorrow

HomeSpan avatar HomeSpan commented on May 18, 2024

That would be a good base case to test - can you convert that so it uses a PWM signal instead of reading from a receiver?

danilkorotkov avatar danilkorotkov commented on May 18, 2024

https://github.com/danilkorotkov/isrESP32 updated
2 s setVal() interval
Do not connect pins 13 and 23 until WiFi is connected (that is an NVS issue too).

danilkorotkov avatar danilkorotkov commented on May 18, 2024

The core panic was tested with IDF 1.0.6,
but Arduino IDF 2.0.2 handles interrupts differently.

HomeSpan avatar HomeSpan commented on May 18, 2024

That would explain why I'm having problems duplicating the issue. Espressif fixed a lot of issues in moving to version 2. Are there any remaining issues when running under 2.0.2?

danilkorotkov avatar danilkorotkov commented on May 18, 2024

yes, 2.0.2 fixed this issue

HomeSpan avatar HomeSpan commented on May 18, 2024

okay - I will consider this issue resolved.

danilkorotkov avatar danilkorotkov commented on May 18, 2024

I think the case is closed?)
