Comments (9)

smarlowucf commented on August 28, 2024

For all the save or update hooks (cache, csp config), maybe the expectation is that the plugin logs the error and re-raises a native exception with a useful message. This can be caught in the main event loop and handled accordingly. If continuing to run could cause trouble, then we want the adapter to crash.

For example, if the adapter meters a bill and is unable to cache the latest bill, this should lead to the adapter halting; otherwise the adapter will keep metering a bill every time it runs. If there's an error preventing metering in the CSP, an error can be logged and the adapter can retry at the next wake-up event.

For issues with usage retrieval the adapter can log, update state and continue. That way if it's just a temporary network issue it doesn't require manual restart.
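
A minimal sketch of that policy, not the adapter's actual code. The exception classes, `save_cache`, `run_once` and the `state` dict are all hypothetical names chosen for illustration: retryable problems are logged and recorded, while a failure to save the cache is re-raised so the adapter halts.

```python
import logging

log = logging.getLogger("adapter_sketch")


class FatalAdapterError(Exception):
    """Continuing to run could cause harm, e.g. metering the same bill twice."""


class RetryableAdapterError(Exception):
    """Safe to log, record in state, and retry at the next wake-up."""


def save_cache(cache, store):
    """Hypothetical save wrapper: log, then re-raise a native exception."""
    try:
        store(cache)
    except Exception as err:
        log.error("Unable to save cache: %s", err)
        raise FatalAdapterError(f"Unable to save cache: {err}") from err


def run_once(get_usage, meter_bill, store, state):
    """One event loop iteration: returns 'ok' or 'retry', or raises to halt."""
    try:
        usage = get_usage()
        bill = meter_bill(usage)
    except RetryableAdapterError as err:
        # Usage retrieval or metering problem: log, update state, continue.
        log.warning("Will retry at next wake-up: %s", err)
        state["errors"].append(str(err))
        return "retry"
    # A FatalAdapterError raised here is deliberately not caught, so the
    # adapter stops instead of metering the same bill again on the next run.
    save_cache({"last_bill": bill}, store)
    return "ok"
```

The caller of `run_once` would let `FatalAdapterError` propagate and treat a `"retry"` result as a normal outcome.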

Possible error cases:

from csp-billing-adapter.

rtamalin commented on August 28, 2024

@smarlowucf PR #77 addresses the specific cases you have mentioned above. I will continue working along the same lines.

rtamalin commented on August 28, 2024

Going forward we want to clarify how the csp_config will be updated for various error scenarios, and what the implications of that are from a support perspective and also the product UI integration.

For example, if we have ongoing issues getting usage data from the product, but are able to successfully submit bills, we would technically be "in-support" because the deployment is able to submit billing requests, but those billing requests would be for zero usage once any existing usage records had been consumed/billed. The product UI would need some way of indicating that state to the operator, to let them know that they should submit a support request.

rtamalin commented on August 28, 2024

One thought that occurs to me is that it may make sense to refactor the high level csp_config management to leverage a CSPConfig (or perhaps with a better name?) class object that can be updated during the run to reflect the progress/issues encountered during an event loop handler run, and then committed at the end of the run, setting the appropriate fields in the actual csp_config data store based upon the recorded state...

That would give us the scope to:

  • make decisions about what values to assign to the csp_config data store fields based upon the aggregate state for the entire event loop handling run, rather than just the specific location where the error occurred.
  • track multiple errors occurring during a run
  • compare existing previously stored errors with ones from the current run, retaining the original ones in the case of repeated failures, allowing us to track when an issue first started appearing

That last one may require us to define a structured format for our error messages, e.g. a timestamp followed by the actual error message, or even a timestamp followed by an error id followed by an error message...
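
A rough sketch of the idea, under stated assumptions: `RunState`, `ErrorRecord`, the error ids and the `"<timestamp> <error id> <message>"` line format are all hypothetical, and `csp_config` is treated as a plain dict with an `errors` list. Errors are collected during the run and written out once at the end, and an error that was already stored keeps its original entry so the first occurrence stays visible.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ErrorRecord:
    error_id: str
    message: str
    first_seen: str  # ISO 8601 timestamp, contains no spaces

    def to_line(self):
        return f"{self.first_seen} {self.error_id} {self.message}"

    @classmethod
    def from_line(cls, line):
        first_seen, error_id, message = line.split(" ", 2)
        return cls(error_id, message, first_seen)


@dataclass
class RunState:
    previous: dict = field(default_factory=dict)  # error_id -> ErrorRecord
    errors: dict = field(default_factory=dict)

    @classmethod
    def from_config(cls, csp_config):
        records = [ErrorRecord.from_line(e) for e in csp_config.get("errors", [])]
        return cls(previous={r.error_id: r for r in records})

    def record_error(self, error_id, message, now=None):
        if error_id in self.errors:
            return  # already recorded during this run
        record = self.previous.get(error_id)
        if record is None:
            now = now or datetime.now(timezone.utc)
            record = ErrorRecord(error_id, message, now.isoformat())
        # A repeated failure keeps the originally stored record.
        self.errors[error_id] = record

    def commit(self, csp_config):
        """Write the aggregate state for the whole run in one place."""
        csp_config["errors"] = [r.to_line() for r in self.errors.values()]
        return csp_config
```

An error that does not recur in a later run is simply absent from that run's `errors`, so it drops out of the stored list on commit.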

smarlowucf commented on August 28, 2024

There's already a number of mechanisms to cover this:

  • Adapter is expected to always run. It updates the timestamp and expiry date on every run. This ensures the adapter is running. If date is beyond expiry date the adapter is in failed state.
  • The last bill details are provided. It's expected that the adapter bills usage every cycle. If the last bill is beyond the end of cycle then something is wrong with billing.
  • The billing details also provides a way to validate what's getting billed.
  • Then the state and errors list provide more fine grained details if there is a mixed state. Such as usage not accessible but last metering was okay.
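
To make these checks concrete, here is a sketch of how a consumer of the csp_config might combine them. The key names `expiry`, `last_billed` and `errors` follow the wording used in this thread and may not match the real data store; `assess` and the `bill_due` parameter are hypothetical, and "last bill beyond the end of cycle" is read here as "a bill was due and nothing has been billed since".

```python
from datetime import datetime


def assess(csp_config, now, bill_due):
    """Return a list of findings; an empty list means nothing looks wrong."""
    findings = []
    # The adapter refreshes timestamp and expiry on every run.
    if now > datetime.fromisoformat(csp_config["expiry"]):
        findings.append("adapter not running: expiry date has passed")
    # The adapter is expected to bill usage every cycle.
    last_billed = datetime.fromisoformat(csp_config["last_billed"])
    if now > bill_due and last_billed < bill_due:
        findings.append("billing overdue: no bill since the cycle ended")
    # The errors list carries the finer-grained, mixed-state detail.
    if csp_config["errors"]:
        findings.append("errors reported: " + "; ".join(csp_config["errors"]))
    return findings
```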

rtamalin commented on August 28, 2024

Yes there are a number of possible mechanisms to handle this, but doing so will require adding appropriate logic handling to ensure that the code is using them appropriately to reflect the varying problem scenarios, if we allow the event loop handling to proceed to the metering phase after encountering issues with earlier steps, such as usage data retrieval.

Taking the example scenario I've been referring to, namely a successfully deployed system, for which valid usage records exist for the current billing period, that starts getting persistent get_data_usage() errors.

With our current code flow, we will hit the usage errors, causing us to update the csp_config to indicate an error and stop processing for this event loop iteration. This state will persist as long as the usage data retrieval issues persist, and eventually the "in-support" status will expire...

However if we change the code flow so that we proceed to the metering phase even if we have issues with getting usage data, at least initially we would have usage records that could be billed at the end of the current billing cycle.

At each event loop cycle we would either just validate metering is working, or, if we hit the end of the billing period, then we would submit a bill, both of which will result in the timestamp and expiry fields being updated, as well as potentially the last_billed and dimensions fields for a case where a bill is submitted, and the billing_api_is_ok field would remain set to true because there are no problems there...

And once into the next billing cycle we would have no usage records to bill for so we would only bill for whatever the configured minimum consumption billing amount is, if any, and we would continue to report that the customer is in support, and has successfully paid whatever billing we may have submitted.

Only the errors list in the csp_config is available to us, currently at least, to record any state information about the fact that there were problems getting usage data for the application in a given event loop cycle, since technically the customer's deployment is validly able to use the billing API, and the customer has potentially been paying the submitted bills for minimum consumption amounts, meaning that the timestamp, expiry, billing_api_is_ok, last_billed and dimensions fields are being updated regularly.

That being the case we would want to:

  1. Make sure that the Product UI integration still flags to the operator that there is an error reported in the csp_config that should be resolved, potentially by raising a support case.
  2. Ensure that the error message in the csp_config is as useful as possible, e.g. makes an effort to indicate when the issue first started happening, rather than just telling us when the most recent occurrence was...

Agreed we will not be able to get this sort of thing addressed in the Product UI integrations in the near future, but still something we should be aware of.

smarlowucf commented on August 28, 2024

"""
And once into the next billing cycle we would have no usage records to bill for so we would only bill for whatever the configured minimum consumption billing amount is, if any, and we would continue to report that the customer is in support, and has successfully paid whatever billing we may have submitted.
"""

Yes, with a min billing amount set it would show up okay in the configmap. However, support will be given more information about the cluster in support documents. Thus, if the adapter last billed a min amount yet the cluster under support has more nodes, this will be visible. Couple that with the errors, one of which states an issue with usage data retrieval, and there's enough information to deduce that something is wrong and, to some degree, what caused it.

rjschwei commented on August 28, 2024

I think we are making life difficult for support and for consumers of our information:

"""
However, support will be given more information about the cluster in support documents. Thus if adapter last billed a min amount yet the cluster under support has more nodes this will be visible. Couple that with the errors, one of which stating an issue with usage data retrieval there's enough information to deduce something is wrong and to some degree what caused it.
"""

This has the potential to trigger lots of questions to us.

"""
There's already a number of mechanisms to cover this:

  • Adapter is expected to always run. It updates the timestamp and expiry date on every run. This ensures the adapter is running. If date is beyond expiry date the adapter is in failed state.
  • The last bill details are provided. It's expected that the adapter bills usage every cycle. If the last bill is beyond the end of cycle then something is wrong with billing.
  • The billing details also provides a way to validate what's getting billed.
  • Then the state and errors list provide more fine grained details if there is a mixed state. Such as usage not accessible but last metering was okay.
"""

This sounds a lot like an if-then-else condition, and we try to avoid those in our own code. Why would we inflict this on someone else?

Can we make a state diagram of the failure modes? There should be relatively few:

  • usage data retrieval
  • usage data storage
  • billing API access
  • ...?

And then layer in how we expect NV to behave and how we want to communicate those conditions out to NV? Also to consider in the state diagram, which could probably even be a simple table, is the temporal aspect, i.e. do we want to behave differently if any of the failure modes happens 10 times (or some other arbitrary number) in a row?
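
As a starting point, such a table could be sketched in code like this. Everything here is an assumption for discussion: the mode names, the actions, the `"escalate"` outcome and the threshold of 10 are placeholders, not agreed behavior.

```python
from collections import Counter

# Hypothetical policy table: failure mode -> (action, escalate after N in a row).
# None means the action applies immediately and never changes.
POLICY = {
    "usage_data_retrieval": ("continue", 10),
    "usage_data_storage": ("halt", None),
    "billing_api_access": ("retry", 10),
}


class FailureTracker:
    """Counts consecutive failures per mode and looks up the action to take."""

    def __init__(self):
        self.streaks = Counter()

    def success(self, mode):
        # Any success resets the streak for that mode.
        self.streaks[mode] = 0

    def failure(self, mode):
        self.streaks[mode] += 1
        action, escalate_after = POLICY[mode]
        if escalate_after is not None and self.streaks[mode] >= escalate_after:
            return "escalate"
        return action
```

Keeping the rules in one table means the adapter, rather than the consumer of the csp_config, decides what a given combination of failures means.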

smarlowucf commented on August 28, 2024

Closing this with #88. I agree, @rjschwei, that the adapter can make error cases clearer in the csp config. However, any changes there are technically API changes, so we should target "v2". I will copy the comment to a new issue.
