census-ecosystem / opencensus-go-exporter-stackdriver

OpenCensus Go exporter for Stackdriver Monitoring and Trace

License: Apache License 2.0

Go 98.97% Shell 0.41% Makefile 0.63%
stackdriver opencensus trace stackdriver-monitoring monitoring metrics instrumentation

opencensus-go-exporter-stackdriver's People

Contributors

aabmass, aeneev, anniefu, ascherkus, bentekkie, bogdandrutu, csbell, dashpole, draffensperger, fabxc, gaplyk, gottwald, gsiffert, gunturaf, haito, harwayne, ibawt, james-bebbington, jeanbza, knyar, lucacome, nilebox, odeke-em, olagacek, peiqinzhao, punya, rakyll, rghetia, songy23, x13n

opencensus-go-exporter-stackdriver's Issues

Import cycle

$  go get -u contrib.go.opencensus.io/exporter/stackdriver
import cycle not allowed
package contrib.go.opencensus.io/exporter/stackdriver
	imports cloud.google.com/go/monitoring/apiv3
	imports google.golang.org/api/transport
	imports google.golang.org/api/transport/http
	imports contrib.go.opencensus.io/exporter/stackdriver

Improve error message for distribution with no buckets [was: Stackdriver export is failing]

I have a simple pubsub app using the Stackdriver exporter and some views. When I go to Stackdriver, I see my stats but without any resource (I expect "global"). I put some logging around handleUpload and noticed it's erroring out:

rpc error: code = InvalidArgument desc = Field timeSeries[1].points[0].distributionValue had an invalid value: Distribution |explicit_buckets.bounds| does not have at least one entry.
2018/09/11 15:28:45 Failed to export to Stackdriver: rpc error: code = InvalidArgument desc = Field timeSeries[1].points[0].distributionValue had an invalid value: Distribution |explicit_buckets.bounds| does not have at least one entry.

Full program is here.

PublishSuccessMillisView looks like this:

PublishSuccessMillis = stats.Int64(statsPrefix+"publish_success_millis", "Number of milliseconds to publish a message", stats.UnitMilliseconds)

...

PublishSuccessMillisView *view.View = distView(PublishSuccessMillis)

...

func distView(m *stats.Int64Measure) *view.View {
	return &view.View{
		Name:        m.Name(),
		Description: m.Description(),
		TagKeys:     []tag.Key{subscriptionKey},
		Measure:     m,
		Aggregation: view.Distribution(),
	}
}

It gets used as such:

millis := end.Sub(start).Nanoseconds() / 1000000
stats.Record(ctx, PublishSuccessMillis.M(millis))
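
A likely fix, as a minimal sketch: pass explicit bucket bounds to view.Distribution so the exported distribution has at least one boundary. The bounds below are illustrative, not values from the original program:

func distView(m *stats.Int64Measure) *view.View {
	return &view.View{
		Name:        m.Name(),
		Description: m.Description(),
		TagKeys:     []tag.Key{subscriptionKey},
		Measure:     m,
		// Explicit bucket bounds in milliseconds; choose values that fit the workload.
		Aggregation: view.Distribution(1, 5, 10, 25, 50, 100, 250, 500, 1000),
	}
}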

Incorrect trace link types

The Google Stackdriver Trace documentation defines trace link types as:

CHILD_LINKED_SPAN (1) = "The linked span is a child of the current span."
PARENT_LINKED_SPAN (2) = "The linked span is a parent of the current span."

While the OpenCensus implementation documents this as:

LinkTypeChild (1) = "The current span is a child of the linked span."
LinkTypeParent (2) = "The current span is the parent of the linked span."

From what I can tell, this exporter currently translates LinkTypeChild into CHILD_LINKED_SPAN and LinkTypeParent into PARENT_LINKED_SPAN, which seems to be the opposite of what the documentation states it should be.
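
Under the quoted semantics, LinkTypeChild means the current span is the child, so the linked span is its parent; a corrected translation would therefore invert the mapping. A minimal sketch, assuming the go.opencensus.io/trace link types and the Cloud Trace v2 proto constants:

import (
	"go.opencensus.io/trace"

	tracepb "google.golang.org/genproto/googleapis/devtools/cloudtrace/v2"
)

// linkType inverts the roles: the linked span's role in Stackdriver is the
// opposite of the current span's role in OpenCensus.
func linkType(t trace.LinkType) tracepb.Span_Link_Type {
	switch t {
	case trace.LinkTypeChild: // current span is the child => linked span is the parent
		return tracepb.Span_Link_PARENT_LINKED_SPAN
	case trace.LinkTypeParent: // current span is the parent => linked span is the child
		return tracepb.Span_Link_CHILD_LINKED_SPAN
	default:
		return tracepb.Span_Link_TYPE_UNSPECIFIED
	}
}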

pkg/monitoredresource.Autodetect(): causes me to vendor aws-sdk-go

vendor/contrib.go.opencensus.io/exporter/stackdriver/monitoredresource/aws_identity_doc_utils.go

has imports:

18:	"github.com/aws/aws-sdk-go/aws/ec2metadata"
19:	"github.com/aws/aws-sdk-go/aws/session"

but importing stackdriver/monitoredresource causes me to vendor github.com/aws/aws-sdk-go/aws in my godep package.

I'm not using AWS at all, so the AWS SDK should not end up in my final binary.

There are ways to avoid this such as building a wrapper that satisfies an interface.

Stats: Flaky errors when exporting to Stackdriver

I'm using the OpenCensus Stackdriver exporter in a container running on GKE. I use cloud.google.com/go/compute/metadata to get the ProjectID and pass it to the OpenCensus Stackdriver exporter. Sometimes I get the following errors when my container starts running in a pod.

2019/02/14 21:10:04 Failed to export to Stackdriver: context deadline exceeded
2019/02/14 21:10:04 Failed to export to Stackdriver: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: authentication handshake failed: read tcp 10.32.4.155:48788->74.125.129.95:443: read: connection reset by peer"

If I delete the pod and let GKE recreate a new one for me without doing anything else, it sometimes works again.

What could be the reason that I sometimes get these errors and sometimes not?

Stats exporter does not work out of the box on app engine

I'm running an app on App Engine with the go111 runtime, and the stats exporter does not work out of the box: I get rpc error: code = Internal desc = One or more TimeSeries could not be written: An internal error occurred.: timeSeries[0-5]. This is because multiple instances try to write to the same time series.

By default, the exporter tags stats with hostname and pid to make the destination time series unique (https://github.com/census-ecosystem/opencensus-go-exporter-stackdriver/blob/master/stats.go#L152). However, hostname and pid are the same across all App Engine instances.

We can use the GAE_INSTANCE environment variable, which is available on App Engine instances and is unique across instances.

Is it OK to add App Engine specific code (it will only rely on the os package, not google.golang.org/appengine)? If so, I'm willing to send a PR. Thanks!
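
A minimal sketch of what that could look like; the helper name and fallback behaviour are illustrative, not the exporter's actual code:

import (
	"os"
	"strconv"
)

// taskValue prefers the unique GAE_INSTANCE id when running on App Engine,
// and falls back to a hostname/pid pair elsewhere.
func taskValue() string {
	if inst := os.Getenv("GAE_INSTANCE"); inst != "" {
		return "go-" + strconv.Itoa(os.Getpid()) + "@" + inst
	}
	hostname, err := os.Hostname()
	if err != nil {
		hostname = "localhost"
	}
	return "go-" + strconv.Itoa(os.Getpid()) + "@" + hostname
}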

proposal: provide concrete error to report failed rows and failed spans to users

Requested by a cloud team that needs an enhanced Stackdriver exporter: there is a need for an error that reports which fields are related to the failure. They requested that we change the stats exporter to take an OnError of the form

func OnError(err error, rows ...*view.Row)

instead of

func OnError(err Error)

However, that would:
a) be a breaking change -- the two signatures differ for users who have already defined:

OnError: func(e error) {
       // handle error
}

b) only apply to stats, yet this exporter is both a trace and stats exporter, so we also need a solution that handles both stats and tracing.

Proposition

We create an introspectable error that can be type-asserted on, e.g.

type DetailsError struct {
    failedSpanData []*trace.SpanData
    failedViewData []*view.Data
    err            error
}

func (de *DetailsError) Error() string {
    if de == nil || de.err == nil {
       return ""
    }
    return de.err.Error()
}

func (de *DetailsError) FailedSpanData() []*trace.SpanData { return de.failedSpanData }
func (de *DetailsError) FailedViewData() []*view.Data { return de.failedViewData }

Obviously, with the contract that the returned attributes are read-only.

Sample usage

sd, err := stackdriver.NewExporter(stackdriver.Options{
     OnError: func(err error) {
          de, ok := err.(*DetailsError)
          if !ok {
               return
          }
          if fsd := de.FailedSpanData(); len(fsd) > 0 {
               // Handle the failed spans
          } else if fvd := de.FailedViewData(); len(fvd) > 0 {
               // Handle the failed view data/rows
          }
     },
})

I believe this kind of error would satisfy that requirement and give users the ability to introspect their errors and figure out which rows or spans failed.

/cc @Ramonza @lychung83

Document GOOGLE_APPLICATION_CREDENTIALS

In order to use a custom service account key, users can set the GOOGLE_APPLICATION_CREDENTIALS env variable. The exporter should document this capability in the godoc.

No way to set an empty MetricPrefix

It is currently impossible to set an empty MetricPrefix. Setting this field to an empty string causes it to assume the default value of "OpenCensus".

In many cases (perhaps most), the default view names are already well namespaced enough to not require any global prefix. For example, this is what the gRPC metrics look like with a prefix of "testapp":

[screenshot: gRPC metric names in Stackdriver shown with the "testapp" prefix]

LastValue exported as aggregation and not as gauge

Hi,

I'm pretty sure I'm doing something wrong, but I can't seem to get it to export as a gauge.

Versions:

Locking in v0.14.0 (e262766) for direct dep go.opencensus.io
Locking in v0.5.0 (37aa280) for transitive dep contrib.go.opencensus.io/exporter/stackdriver

Sample:

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"go.opencensus.io/exporter/stackdriver"
	"go.opencensus.io/stats"
	"go.opencensus.io/stats/view"
)

var (
	numOpenMeasure = stats.Int64("opencensus.io/test/num_open_test_measure_v3", "Number open connections", stats.UnitDimensionless)
)

func main() {
	exporter, err := stackdriver.NewExporter(stackdriver.Options{
		ProjectID: "xxx"})

	if err != nil {
		log.Fatalln(err)
	}

	view.RegisterExporter(exporter)

	numOpenView := &view.View{
		Name:        "opencensus.io/test/num_open_test_view_v3",
		Description: "Number open connections",
		Measure:     numOpenMeasure,
		Aggregation: view.LastValue(),
	}

	if err := view.Register(
		numOpenView,
	); err != nil {
		log.Fatal(err)
	}

	numOpen := int64(50)

	for {
		stats.Record(context.Background(), numOpenMeasure.M(numOpen))
		fmt.Println("Num open: ", numOpen)
		time.Sleep(5 * time.Second)
		numOpen++
	}
}

Stackdriver:
[screenshot: the metric shown in Stackdriver as an aggregated value rather than a gauge]

Any ideas?

Thanks!

No Resource information tied with traces?

I am moving to this library from cloud.google.com/go/trace due to the obsolete notice, but noticed a few things that have me scratching my head.

I set up the package as follows (I'm using App Engine Flexible):

import (
	"net/http"
	"os"

	"contrib.go.opencensus.io/exporter/stackdriver"
	"contrib.go.opencensus.io/exporter/stackdriver/propagation"
	"go.opencensus.io/plugin/ocgrpc"
	"go.opencensus.io/plugin/ochttp"
	"go.opencensus.io/stats/view"
	"go.opencensus.io/trace"
	mrpb "google.golang.org/genproto/googleapis/api/monitoredres"
)

func InitTraceClient() error {
	res := &mrpb.MonitoredResource{}
	res.Type = "gae_app"
	res.Labels = make(map[string]string)
	res.Labels["project_id"] = os.Getenv("GOOGLE_CLOUD_PROJECT")
	res.Labels["module_id"] = os.Getenv("GAE_SERVICE")
	res.Labels["version_id"] = os.Getenv("GAE_VERSION")

	exporter, err := stackdriver.NewExporter(stackdriver.Options{
		ProjectID: os.Getenv("GOOGLE_CLOUD_PROJECT"),
		Resource:  res,
	})
	if err != nil {
		return err
	}

	view.RegisterExporter(exporter)
	trace.RegisterExporter(exporter)

	if err = view.Register(ochttp.DefaultClientViews...); err != nil {
		return err
	}

	if err = view.Register(ocgrpc.DefaultClientViews...); err != nil {
		return err
	}
	return nil
}

For my datastore client I do:

func getDatastoreClient(ctx context.Context) (*datastore.Client, error) {
	var options []option.ClientOption
	options = append(options, option.WithGRPCDialOption(grpc.WithStatsHandler(&ocgrpc.ClientHandler{})))
	return datastore.NewClient(ctx, "", options...)
}

And I add middleware to start tracing on every request:

func TraceHandler(h http.Handler) http.Handler {
	traceHandler := &ochttp.Handler{
		Handler:          h,
		Propagation:      &stackdriver.HTTPFormat{},
		IsPublicEndpoint: false, // I've tried true here as well...
		StartOptions: trace.StartOptions{
			Sampler: trace.AlwaysSample(),
		},
	}
	fn := func(w http.ResponseWriter, r *http.Request) {
		traceHandler.Handler.ServeHTTP(w, r)
	}
	return http.HandlerFunc(fn)
}

However, I get no HTTP traces in Stackdriver, just Datastore ones and other gRPC traces I didn't even intend to monitor (Logging). There are no labels associated with the spans to help identify the service/version/HTTP request the spans originated from.

Am I missing something or is this package still not ready to trace requests like cloud.google.com/go/trace is able to?

Thanks for this feature in general, it's a really helpful tool and I really like the idea of OpenCensus!

Users cannot provide custom context

I am trying to use OpenCensus to report Stackdriver metrics from an App Engine app. This does not currently work, since App Engine expects a custom context to be used for all API calls, while the exporter just uses context.Background() whenever it needs one.

Is there any reason why stackdriver.NewExporter() does not accept a custom context?

FYI, this is the specific error I am getting:

panic: not an App Engine context

goroutine 14 [running]:
panic(0x1672a60, 0xc0084374e0)
	go/src/runtime/panic.go:491 +0x283
google.golang.org/appengine/internal.fullyQualifiedAppID(0x1cb64c0, 0xc008414080, 0x1d003c0, 0x0)
	google.golang.org/appengine/internal/identity_classic.go:54 +0x95
google.golang.org/appengine/internal.FullyQualifiedAppID(0x1cb64c0, 0xc008414080, 0xc00870d290, 0x1ca4b00)
	google.golang.org/appengine/internal/api_common.go:77 +0x98
google.golang.org/appengine/internal.AppID(0x1cb64c0, 0xc008414080, 0xc00870d290, 0x1)
	google.golang.org/appengine/internal/identity.go:13 +0x35
google.golang.org/appengine.AppID(0x1cb64c0, 0xc008414080, 0xc008669301, 0x33)
	google.golang.org/appengine/identity.go:20 +0x35
golang.org/x/oauth2/google.findDefaultCredentials(0x1cb64c0, 0xc008414080, 0xc0086c5440, 0x4, 0x4, 0x163e6c0, 0x1, 0xc0086c5440)
	golang.org/x/oauth2/google/default.go:65 +0x52f
golang.org/x/oauth2/google.FindDefaultCredentials(0x1cb64c0, 0xc008414080, 0xc0086c5440, 0x4, 0x4, 0x4, 0x4, 0x4)
	golang.org/x/oauth2/google/go19.go:48 +0x53
google.golang.org/api/internal.Creds(0x1cb64c0, 0xc008414080, 0xc008621c20, 0x18, 0x167d580, 0x30)
	google.golang.org/api/internal/creds.go:41 +0x108
google.golang.org/api/transport/grpc.dial(0x1cb64c0, 0xc008414080, 0xf0fd00, 0xc0086c5400, 0x3, 0x4, 0xc008527368, 0xf3ddf4, 0xc0086c5400)
	google.golang.org/api/transport/grpc/dial.go:65 +0x50e
google.golang.org/api/transport/grpc.Dial(0x1cb64c0, 0xc008414080, 0xc0086c5400, 0x3, 0x4, 0xc0086c53c0, 0xc0086c5400, 0x1d)
	google.golang.org/api/transport/grpc/dial.go:37 +0x58
google.golang.org/api/transport.DialGRPC(0x1cb64c0, 0xc008414080, 0xc0086c5400, 0x3, 0x4, 0x1, 0x1, 0x1)
	google.golang.org/api/transport/dial.go:41 +0x53
cloud.google.com/go/monitoring/apiv3.NewMetricClient(0x1cb64c0, 0xc008414080, 0xc0087072f0, 0x1, 0x1, 0xc0087072f0, 0x0, 0x1)
	cloud.google.com/go/monitoring/apiv3/metric_client.go:106 +0xfe
contrib.go.opencensus.io/exporter/stackdriver.newStatsExporter(0xc00870b5a0, 0x20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	contrib.go.opencensus.io/exporter/stackdriver/stats.go:81 +0x112
contrib.go.opencensus.io/exporter/stackdriver.NewExporter(0xc00870b5a0, 0x20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	contrib.go.opencensus.io/exporter/stackdriver/stackdriver.go:169 +0x7e

"Duplicate TimeSeries encountered" errors from Stackdriver

After upgrading from 0.7.0 to the latest master, I started seeing occasional errors like this from CreateTimeSeries:

rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Field timeSeries[1] had an invalid value: Duplicate TimeSeries encountered. Only one point can be written per TimeSeries per request.: timeSeries[1]

Here's a sample request matching the error message above (slightly reformatted):

name:"projects/xxxxx"
time_series:<metric:<type:"custom.googleapis.com/opencensus/ts_bridge/metric_import_latencies" labels:<key:"metric_name" value:"sli_sample_ratio10m" > labels:<key:"opencensus_task" value:"go-2@localhost" > > resource:<type:"global" > points:<interval:<end_time:<seconds:1539597540 nanos:971130331 > start_time:<seconds:1539597540 nanos:971119624 > > value:<distribution_value:<count:1 mean:602 bucket_options:<explicit_buckets:<bounds:100 bounds:250 bounds:500 bounds:1000 bounds:2000 bounds:3000 bounds:4000 bounds:5000 bounds:7500 bounds:10000 bounds:15000 bounds:20000 bounds:40000 bounds:60000 bounds:90000 bounds:120000 bounds:300000 bounds:600000 > > bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:1 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 > > > >
time_series:<metric:<type:"custom.googleapis.com/opencensus/ts_bridge/metric_import_latencies" labels:<key:"metric_name" value:"sli_sample_ratio10m" > labels:<key:"opencensus_task" value:"go-2@localhost" > > resource:<type:"global" > points:<interval:<end_time:<seconds:1539597541 nanos:703625748 > start_time:<seconds:1539597540 nanos:971119624 > > value:<distribution_value:<count:1 mean:602 bucket_options:<explicit_buckets:<bounds:100 bounds:250 bounds:500 bounds:1000 bounds:2000 bounds:3000 bounds:4000 bounds:5000 bounds:7500 bounds:10000 bounds:15000 bounds:20000 bounds:40000 bounds:60000 bounds:90000 bounds:120000 bounds:300000 bounds:600000 > > bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:1 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 > > > >
time_series:<metric:<type:"custom.googleapis.com/opencensus/ts_bridge/metric_import_latencies" labels:<key:"metric_name" value:"total_bytes_rcvd" > labels:<key:"opencensus_task" value:"go-2@localhost" > > resource:<type:"global" > points:<interval:<end_time:<seconds:1539597541 nanos:703625748 > start_time:<seconds:1539597540 nanos:971119624 > > value:<distribution_value:<count:1 mean:718 bucket_options:<explicit_buckets:<bounds:100 bounds:250 bounds:500 bounds:1000 bounds:2000 bounds:3000 bounds:4000 bounds:5000 bounds:7500 bounds:10000 bounds:15000 bounds:20000 bounds:40000 bounds:60000 bounds:90000 bounds:120000 bounds:300000 bounds:600000 > > bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:1 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 > > > >
time_series:<metric:<type:"custom.googleapis.com/opencensus/ts_bridge/metric_import_latencies" labels:<key:"metric_name" value:"total_bytes_sent" > labels:<key:"opencensus_task" value:"go-2@localhost" > > resource:<type:"global" > points:<interval:<end_time:<seconds:1539597541 nanos:703625748 > start_time:<seconds:1539597540 nanos:971119624 > > value:<distribution_value:<count:1 mean:735 bucket_options:<explicit_buckets:<bounds:100 bounds:250 bounds:500 bounds:1000 bounds:2000 bounds:3000 bounds:4000 bounds:5000 bounds:7500 bounds:10000 bounds:15000 bounds:20000 bounds:40000 bounds:60000 bounds:90000 bounds:120000 bounds:300000 bounds:600000 > > bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:1 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 > > > >
time_series:<metric:<type:"custom.googleapis.com/opencensus/ts_bridge/import_latencies" labels:<key:"opencensus_task" value:"go-2@localhost" > > resource:<type:"global" > points:<interval:<end_time:<seconds:1539597541 nanos:703638673 > start_time:<seconds:1539597540 nanos:971119624 > > value:<distribution_value:<count:1 mean:1337 bucket_options:<explicit_buckets:<bounds:100 bounds:250 bounds:500 bounds:1000 bounds:2000 bounds:3000 bounds:4000 bounds:5000 bounds:7500 bounds:10000 bounds:15000 bounds:20000 bounds:40000 bounds:60000 bounds:90000 bounds:120000 bounds:300000 bounds:600000 > > bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:1 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 > > > >
time_series:<metric:<type:"custom.googleapis.com/opencensus/ts_bridge/oldest_metric_age" labels:<key:"opencensus_task" value:"go-2@localhost" > > resource:<type:"global" > points:<interval:<end_time:<seconds:1539597541 nanos:703641788 > > value:<int64_value:479964 > > >

As you can see, timeSeries[0] and timeSeries[1] are identical except for the end_time (which is less than a second apart).

Stackdriver requires at most one data point per request per time series. I think the exporter will need either to only send the latest point (discarding earlier ones), or use several separate CreateTimeSeries calls per time series.
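
A minimal sketch of the first option, keeping only the most recent point per series within a request (the helper and key derivation are illustrative, not the exporter's actual code):

import (
	"fmt"

	monitoringpb "google.golang.org/genproto/googleapis/monitoring/v3"
)

// latestOnly keeps the last point seen for each (metric type, labels) pair,
// assuming later entries in the slice are more recent, so a single request
// never repeats a time series.
func latestOnly(ts []*monitoringpb.TimeSeries) []*monitoringpb.TimeSeries {
	seen := make(map[string]int) // series key -> index in out
	var out []*monitoringpb.TimeSeries
	for _, s := range ts {
		key := s.GetMetric().GetType() + "|" + fmt.Sprint(s.GetMetric().GetLabels())
		if i, ok := seen[key]; ok {
			out[i] = s // later (newer) entry wins
			continue
		}
		seen[key] = len(out)
		out = append(out, s)
	}
	return out
}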

I believe I've not seen such errors on 0.7.0. I've not examined the diff between 0.7.0 and master very closely, but looking at the list of commits 2f26a5d seems the most suspicious.

Allow MonitoredResource to be mutable.

Reported offline by a few users. Currently we only allow setting MonitoredResource when creating the exporter, and later user cannot update the resource. While in most cases we expect MonitoredResource to be the same throughout the application lifetime, there may be cases when users want to associate different MonitoredResources with different metric batches. Consider supporting dynamic MonitoredResource when uploading metrics to Stackdriver.

Another reference: OC-Agent metrics protocol also supports dynamic resources:

  // The resource for the metrics in this message that do not have an explicit
  // resource set.
  // If unset, the most recently set resource in the RPC stream applies. It is
  // valid to never be set within a stream, e.g. when no resource info is known
  // at all or when all sent metrics have an explicit resource set.

stats: do not create MetricDescriptor for Stackdriver built-in metrics.

The Stackdriver Monitoring API only allows customers to create MetricDescriptors for metrics with the custom.googleapis.com/ or external.googleapis.com/prometheus/ metric prefixes. For other prefixes, the API returns a permission denied error.

Currently the Stackdriver exporter always creates a MetricDescriptor before sending metrics. For metrics with other domain prefixes, it therefore fails to send data to Stackdriver.

on stats, i always have an Unauthenticated error

Hello, I'm trying to test OpenCensus with my Stackdriver project.

With the example code, I always get an unauthenticated error when I record stats.

After some debugging I removed the ProjectID to use the default credentials, because if it is set the exporter does not get authentication (

if o.ProjectID == "" {
)

But I still get the same error; the context.Context is empty after authentication (

creds, err := google.FindDefaultCredentials(ctx, traceapi.DefaultAuthScopes()...)
)

And I always get errors on record...

I have tested my credentials with the Google example (https://github.com/GoogleCloudPlatform/golang-samples/blob/master/monitoring/monitoring_quickstart/main.go) and everything works.

Can you help me understand why?

This is with the Stackdriver exporter 0.8.0 and opencensus 0.18.0.

Retry mechanism?

Moved the issue from census-instrumentation/opencensus-go#766 (comment).

Original issue said:

I am instrumenting an API client for some latency sensitive applications. When I turn the sampling rate to always-on, which would mimic a server receiving say 50,000 requests per second but with a sampling rate of 1 in 100 (so ideally 500 traced QPS), I get back thousands of Stackdriver export errors logged on almost a 5 second interval [1]. Perhaps we need a retry mechanism with exponential backoff, or a large buffer by default, since people will want to use OpenCensus for very high-traffic applications and it would be worrying to see a bunch of those logs.

[1] https://gist.github.com/odeke-em/32cf7359f397a4b93692bcf46109e184 with a sample inlined

$ GOOGLE_APPLICATION_CREDENTIALS=~/creds.json go run main.go 
2018/05/27 20:44:47 OpenCensus Stackdriver exporter: failed to upload span: buffer full
2018/05/27 20:44:52 OpenCensus Stackdriver exporter: failed to upload 1126 spans: buffer full
2018/05/27 20:44:57 OpenCensus Stackdriver exporter: failed to upload 1652 spans: buffer full
2018/05/27 20:45:02 OpenCensus Stackdriver exporter: failed to upload 1447 spans: buffer full
2018/05/27 20:45:07 OpenCensus Stackdriver exporter: failed to upload 1598 spans: buffer full
2018/05/27 20:45:12 OpenCensus Stackdriver exporter: failed to upload 925 spans: buffer full
2018/05/27 20:45:17 OpenCensus Stackdriver exporter: failed to upload 1534 spans: buffer full
2018/05/27 20:45:22 OpenCensus Stackdriver exporter: failed to upload 1403 spans: buffer full
2018/05/27 20:45:27 OpenCensus Stackdriver exporter: failed to upload 1034 spans: buffer full
2018/05/27 20:45:32 OpenCensus Stackdriver exporter: failed to upload 1538 spans: buffer full
....

/cc @odeke-em @Ramonza
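
A minimal sketch of the retry-with-exponential-backoff idea suggested above; the uploadSpans callback, attempt limit, and delays are illustrative, not the exporter's actual behaviour:

import "time"

// uploadWithRetry retries a failed upload with exponential backoff instead of
// dropping the batch outright.
func uploadWithRetry(uploadSpans func() error) error {
	backoff := 100 * time.Millisecond
	const maxAttempts = 5
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = uploadSpans(); err == nil {
			return nil
		}
		time.Sleep(backoff)
		backoff *= 2 // double the delay between attempts
	}
	return err
}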

stats Flush not working

According to the documentation, Flush waits for exported view data to be uploaded. This is useful if your program is ending and you do not want to lose recent spans.
This is supposed to be for stats, but the comment mentions spans. If it is supposed to work for stats, it is not working.

Here is my test case:

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "go.opencensus.io/exporter/stackdriver"
    "go.opencensus.io/stats"
    "go.opencensus.io/stats/view"
    "go.opencensus.io/tag"
)

var MerrorCount = stats.Int64("razvan.test/measures/error_count", "number of errors encounterd", "1")

var (
    ErrorCountView = &view.View{
        Name:        "demo/razvan/rep_period_test",
        Measure:     MerrorCount,
        Description: "Testing various reporting periods",
        Aggregation: view.Count(),
    }
)

func main() {

    KeyMethod, _ := tag.NewKey("test")
    ctx := context.Background()
    ctx, _ = tag.New(ctx, tag.Insert(KeyMethod, "30 sec reporting"))
    // Register the views
    if err := view.Register(ErrorCountView); err != nil {
        log.Fatalf("Failed to register views: %v", err)
    }
    // SD Exporter
    sd, err := stackdriver.NewExporter(stackdriver.Options{
        ProjectID: "opencensus-test",
        // MetricPrefix helps uniquely identify your metrics.
        MetricPrefix: "opencensus-test",
    })
    if err != nil {
        log.Fatalf("Failed to create the Stackdriver exporter: %v", err)
    }
    // It is imperative to invoke flush before your main function exits
    defer sd.Flush()

    view.RegisterExporter(sd)

    // Set reporting period to report data at every second.
    view.SetReportingPeriod(60000 * time.Millisecond)

    ticker := time.NewTicker(1000 * time.Millisecond)
    i := 0
    go func() {
        for {
            select {
            case <-ticker.C:
                stats.Record(ctx, MerrorCount.M(1))
                i++
            }
        }
    }()
    runtime := 65000
    time.Sleep(time.Duration(runtime) * time.Millisecond)
    fmt.Printf("Incremented  %d times\n", i)
}

I only get the value exported after 60 seconds.

stats: don't let bad View data fail the whole export process

Actual:
We call createMeasure before createTimeSeries when reporting metrics data. If createMeasure returns an error, the whole reporting process will fail.

Expected:
Drop the bad View data if createMeasure returns an error, and let the remaining data be sent to Stackdriver.

Remove enforcing single-project-per-process

Currently in newStatsExporter we record seenProjects[o.ProjectID] = true and check that we only call this function once per project ID.

This makes it inconvenient to use the exporter in situations where it might be dynamically created at runtime, for example in an Istio adapter.

We should remove this enforcement and just document that creating multiple exporters with the same project ID and monitored resource in the same process is not supported if you register as a stats exporter.

ClientCompletedRPCsView does not have meaningful groupings

In my code I have,

// Subscribe views to see stats in Stackdriver Monitoring.
if err := view.Register(
	ocgrpc.ClientSentBytesPerRPCView,
	ocgrpc.ClientReceivedBytesPerRPCView,
	ocgrpc.ClientRoundtripLatencyView,
	ocgrpc.ClientCompletedRPCsView,
	ocgrpc.ClientSentMessagesPerRPCView,
	ocgrpc.ClientReceivedMessagesPerRPCView,
	ocgrpc.ClientServerLatencyView,
	pubsub.AckCountView,
	pubsub.ModAckCountView,
	pubsub.NackCountView,
	pubsub.PullCountView,
	pubsub.StreamOpenCountView,
	pubsub.StreamRequestCountView,
	pubsub.StreamRetryCountView,
); err != nil {
	panic(err)
}

When I go to Stackdriver, I expect to be able to group by RPC. However, the groupings for ClientCompletedRPCsView all have meaningless names.

Unexpected span display name prefixes

I've been using Stackdriver Trace for some time via the zipkin-gcp exporter, but have recently been experimenting with Istio, which sends spans to Stackdriver via this exporter. Istio lets you configure how your spans will look, including their name. I spent a bunch of time racking my brain trying to figure out why Istio was prefixing my client spans with Sent. and my server spans with Recv. before tracking it down to this exporter.

Is this a convention I'm unaware of? It was quite unexpected, and the period delimiter could be confusing given that I want my span names to be derived from the HTTP host header. Assuming the Host header example.org the spans would appear as Sent.example.org and Recv.example.org in the Stackdriver Trace UI.

RPC errors should include the failed method.

An RPC error (e.g., "ResourceExhausted") results in a log line that does not include the failed RPC method, e.g.:

2019/01/29 02:53:20 Failed to export to Stackdriver: rpc error: code = ResourceExhausted desc = Resource has been exhausted (e.g. check quota).

That makes it hard to debug the failed RPCs, because it's not clear which RPC actually failed.
The error handler should propagate and log the name of the failed RPC.

The recommended reporting period for the Monitoring API is 60 seconds.

There appears to be no recommendation here for a suitable reporting period for the Monitoring API. The sample code sets it to one second:

view.SetReportingPeriod(1 * time.Second)

If this corresponds to the period of the time series data sent to the Monitoring API, it is way too short. Best practice is a period of 60 seconds.
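
For example, following the 60-second guidance above (same view package as in the sample code):

view.SetReportingPeriod(60 * time.Second)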

Apologies if I have misunderstood and this does not actually determine the period between points in a time series. I am not very familiar with OpenCensus.

blank HTTP Host variable is logged

Blank /http/host values are being sent because this exporter uses the r.URL.Host variable (which is often empty) and does not log r.Host (which is usually provided) on http.Request objects.

The reason why is described in this stackoverflow answer: https://stackoverflow.com/questions/42921567/what-is-the-difference-between-host-and-url-host-for-golang-http-request

I've not found a case in which r.URL.Host was the desired value, since HTTP/1.1 and newer (which require the Host header) are now standard.
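
A minimal sketch of the host selection the issue argues for; this is illustrative, not the plugin's actual code:

import "net/http"

// requestHost prefers r.Host, which is populated from the Host header, and
// falls back to r.URL.Host, which is usually empty for server-side requests.
func requestHost(r *http.Request) string {
	if r.Host != "" {
		return r.Host
	}
	return r.URL.Host
}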

Allow arbitrary metric prefixes

Today OpenCensus Go only supports the custom.googleapis.com domain prefix, but Stackdriver supports more prefixes (and the list is expanding - https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.metricDescriptors#MetricDescriptor). OC should allow users to report metrics for their registered domain.

return path.Join("custom.googleapis.com", "opencensus", v)

Counterpart in Java: census-instrumentation/opencensus-java#1440

Exporter errors and stops reporting after some time

I noticed that when I leave my app running for a period of time, eventually it stops reporting with this error:

{
 insertId:  "i1qjvug3b9hn98"
 labels: {…}
 logName:  "projects/deklerk-sandbox/logs/go-ps-consumer"
 receiveTimestamp:  "2018-10-15T15:03:37.561005583Z"
 resource: {…}
 severity:  "INFO"
 textPayload:  "got stackdriver-opencensus err rpc error: code = Internal desc = One or more TimeSeries could not be written: An internal error occurred.: timeSeries[10,11]
"
 timestamp:  "2018-10-15T15:03:33Z"
}

What does this error mean? Minimally, should it include the reason why the timeseries could not be written?

Traces do not span across service calls

The traces do not currently span across the different services in my setup. I am using a GKE cluster with the following setup for my gRPC endpoint. I have written a wrapping function that lives in a middleware package. The dialer and the gRPC server setups are shown below.

I have looked at some resources. Mainly the following two links:

The first is empty and requires additional documentation, but I thought it would be what I was looking for. The second uses the old Stackdriver trace package, but the author wraps the logic with his own code to pass on the headers. Is that required?

grpcServer := grpc.NewServer(unaryInterceptor, tracer.ServerOptionPublicFacing())

unaryInterceptorBack := grpc.UnaryInterceptor(grpcMiddleware.ChainUnaryServer(
	grpcLogging.UnaryServerInterceptor(log.NewEntry(l)),
	grpcRecovery.UnaryServerInterceptor(),
))

grpc.Dial(c.Services.Address, grpc.WithInsecure(), tracer.ClientDialOption())
package middleware

import (
	"context"
	"fmt"
	"go.opencensus.io/exporter/stackdriver"
	"go.opencensus.io/plugin/ocgrpc"
	"go.opencensus.io/trace"
	"google.golang.org/genproto/googleapis/api/monitoredres"
	"google.golang.org/grpc"
	"os"
)

// StackDriverTracer is a middleware component for the stackdriver tracer on GCP
type StackDriverTracer struct {
}

const googleProjectID = "GOOGLE_PROJECT_ID"

// NewStackDriverTracer returns a structure with the tracer, GOOGLE_PROJECT_ID must be set as environment
// variable
func NewStackDriverTracer(ctx context.Context) (*StackDriverTracer, error) {
	projectID := os.Getenv(googleProjectID)
	if projectID == "" {
		return nil, fmt.Errorf("the following environment variable must be set %s", googleProjectID)
	}
	exporter, err := stackdriver.NewExporter(stackdriver.Options{
		ProjectID: projectID,
		// Set a MonitoredResource that represents a GKE container.
		Resource: &monitoredres.MonitoredResource{
			Type: "gke_container",
			Labels: map[string]string{
				"project_id": projectID,
			},
		},
	})
	if err != nil {
		return nil, err
	}
	trace.ApplyConfig(trace.Config{DefaultSampler: trace.AlwaysSample()})
	trace.RegisterExporter(exporter)
	return &StackDriverTracer{}, nil
}

// ClientDialOption provides a client option for the client
func (t *StackDriverTracer) ClientDialOption() (option grpc.DialOption) {
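	// Note: ocgrpc.ServerHandler is the server-side stats handler; an outgoing
	// client connection would normally use ocgrpc.ClientHandler instead.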
	return grpc.WithStatsHandler(&ocgrpc.ServerHandler{})
}

// ServerOptionPublicFacing provides an option for the server that is public facing
func (t *StackDriverTracer) ServerOptionPublicFacing() (grpc.ServerOption) {
	return grpc.StatsHandler(&ocgrpc.ServerHandler{IsPublicEndpoint: true})
}

// ServerOptionInternal provides an option for the server that is internal
func (t *StackDriverTracer) ServerOptionInternal() (grpc.ServerOption) {
	return grpc.StatsHandler(&ocgrpc.ServerHandler{IsPublicEndpoint: false})
}

Provide convenience constructors for MonitoredResource types

Custom metrics (as produced by this exporter) are only compatible with a handful of MonitoredResource types. These have a fairly complicated set of properties associated with them. We should provide strongly-typed constructor functions to correctly build these MonitoredResources.
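
A minimal sketch of what such a strongly-typed constructor could look like for the "gke_container" resource type; the function name is illustrative and the caller still supplies the label values:

import "google.golang.org/genproto/googleapis/api/monitoredres"

// GKEContainer builds a "gke_container" MonitoredResource from explicit fields,
// so callers cannot misspell or omit the expected labels.
func GKEContainer(projectID, clusterName, namespaceID, instanceID, podID, containerName, zone string) *monitoredres.MonitoredResource {
	return &monitoredres.MonitoredResource{
		Type: "gke_container",
		Labels: map[string]string{
			"project_id":     projectID,
			"cluster_name":   clusterName,
			"namespace_id":   namespaceID,
			"instance_id":    instanceID,
			"pod_id":         podID,
			"container_name": containerName,
			"zone":           zone,
		},
	}
}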

We could also consider an "Autodetect" MonitoredResource, which would rely on auto-detecting based on the runtime environment (Java currently does this).

vgo: importing exporter/stackdriver vastly increases dependencies

If I'm using go.mod with go1.11 and have

require (
	contrib.go.opencensus.io/exporter/stackdriver v0.8.0

in my go.mod file, it brings in some unused dependencies like aws-sdk-go. If I require v0.5.0 this problem doesn't exist, because it was more or less fixed in #35, but v0.8.0 re-exposes the problem.

For example, v0.8.0 brings in github.com/aws/[email protected], which brings in github.com/jmespath/go-jmespath, which causes my go build to fail on a golang:1.11-alpine image because go-jmespath requires gcc.

I see issues like #60 that also describe this. Can we do anything about it?

cc: @rghetia

Export from GKE to Stackdriver broken with latest update

Hi,

I think this commit (#90) may have broken my export. I haven't changed my export code; I recompiled yesterday and now I am getting:

2019/02/28 12:01:10 Failed to export to Stackdriver: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized resource label: pod_id: timeSeries[0]
2019/02/28 12:02:09 Failed to export to Stackdriver: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized resource label: namespace_id: timeSeries[0,1,3,4,7,9]; Unrecognized resource label: pod_id: timeSeries[2,6,8]; Unrecognized resource label: zone: timeSeries[5]
2019/02/28 12:02:10 Failed to export to Stackdriver: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized resource label: zone: timeSeries[0]
2019/02/28 12:03:09 Failed to export to Stackdriver: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized resource label: namespace_id: timeSeries[0-3,9]; Unrecognized resource label: pod_id: timeSeries[5]; Unrecognized resource label: zone: timeSeries[4,6-8]
2019/02/28 12:03:10 Failed to export to Stackdriver: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized resource label: namespace_id: timeSeries[0]
2019/02/28 12:04:09 Failed to export to Stackdriver: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized resource label: namespace_id: timeSeries[0,1,4]; Unrecognized resource label: pod_id: timeSeries[6-9]; Unrecognized resource label: zone: timeSeries[2,3,5]
2019/02/28 12:04:10 Failed to export to Stackdriver: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized resource label: instance_id: timeSeries[0]
2019/02/28 12:05:09 Failed to export to Stackdriver: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized resource label: namespace_id: timeSeries[2-7]; Unrecognized resource label: pod_id: timeSeries[0,1,9]; Unrecognized resource label: zone: timeSeries[8]
2019/02/28 12:05:10 Failed to export to Stackdriver: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized resource label: zone: timeSeries[0]

I am running on GKE. My code is very simple, you can see it here:

https://github.com/DanTulovsky/playground/blob/master/frontend/run.go#L45

Is there more information I can provide here?
Thanks
Dan

stats, metrics: deduplicate TimeSeries before making CreateTimeSeriesRequest-s

A bug/inadequacy that I found while doing a live test with the OpenCensus Agent. If multiple metrics are streamed from multiple sources and more than one of them shares the same name at export time, we'll get an error from Stackdriver's backend. This is because each CreateTimeSeriesRequest expects unique metrics. This problem has plagued even the stats exporter for years; the advice/work-around was setting view.SetReportingPeriod, but that just masked the problem because it gave time for aggregation to occur within an exporting period.

In the case where metrics are concurrently streamed in, all bets are off. For example, given this data:

{
    "name": "projects/census-demos",
    "time_series": [
        {
            "metric": {
                "type": "custom.googleapis.com/opencensus/oce/dev/latency",
                "labels": {
                    "client": "cli",
                    "method": "repl",
                    "opencensus_task": "[email protected]"
                }
            },
            "resource": {
                "type": "global"
            },
            "points": [
                {
                    "interval": {
                        "end_time": {
                            "seconds": 1547860312,
                            "nanos": 655706000
                        },
                        "start_time": {
                            "seconds": 1547857792,
                            "nanos": 658197000
                        }
                    },
                    "value": {
                        "Value": {
                            "DistributionValue": {
                                "count": 399,
                                "mean": 6461.507067283209,
                                "sum_of_squared_deviation": 5680369911.614502,
                                "bucket_options": {
                                    "Options": {
                                        "ExplicitBuckets": {
                                            "bounds": [
                                                0,
                                                10,
                                                50,
                                                100,
                                                200,
                                                400,
                                                800,
                                                1000,
                                                1400,
                                                2000,
                                                5000,
                                                10000
                                            ]
                                        }
                                    }
                                },
                                "bucket_counts": [
                                    0,
                                    0,
                                    1,
                                    0,
                                    3,
                                    3,
                                    21,
                                    5,
                                    17,
                                    15,
                                    89,
                                    153,
                                    92
                                ]
                            }
                        }
                    }
                }
            ]
        },
        {
            "metric": {
                "type": "custom.googleapis.com/opencensus/oce/dev/process_counts",
                "labels": {
                    "client": "cli",
                    "method": "repl",
                    "opencensus_task": "[email protected]"
                }
            },
            "resource": {
                "type": "global"
            },
            "points": [
                {
                    "interval": {
                        "end_time": {
                            "seconds": 1547860312,
                            "nanos": 655722000
                        },
                        "start_time": {
                            "seconds": 1547857792,
                            "nanos": 658197000
                        }
                    },
                    "value": {
                        "Value": {
                            "Int64Value": 399
                        }
                    }
                }
            ]
        },
        {
            "metric": {
                "type": "custom.googleapis.com/opencensus/oce/dev/latency",
                "labels": {
                    "client": "cli",
                    "method": "repl",
                    "opencensus_task": "[email protected]"
                }
            },
            "resource": {
                "type": "global"
            },
            "points": [
                {
                    "interval": {
                        "end_time": {
                            "seconds": 1547860372,
                            "nanos": 653868000
                        },
                        "start_time": {
                            "seconds": 1547857792,
                            "nanos": 658197000
                        }
                    },
                    "value": {
                        "Value": {
                            "DistributionValue": {
                                "count": 409,
                                "mean": 6443.895616823964,
                                "sum_of_squared_deviation": 5882240635.357754,
                                "bucket_options": {
                                    "Options": {
                                        "ExplicitBuckets": {
                                            "bounds": [
                                                0,
                                                10,
                                                50,
                                                100,
                                                200,
                                                400,
                                                800,
                                                1000,
                                                1400,
                                                2000,
                                                5000,
                                                10000
                                            ]
                                        }
                                    }
                                },
                                "bucket_counts": [
                                    0,
                                    0,
                                    1,
                                    0,
                                    4,
                                    3,
                                    22,
                                    6,
                                    17,
                                    16,
                                    90,
                                    156,
                                    94
                                ]
                            }
                        }
                    }
                }
            ]
        }
    ]
}

we get an error

err: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Field timeSeries[2] had an invalid value: Duplicate TimeSeries encountered. Only one point can be written per TimeSeries per request.: timeSeries[2]

because we've got both

  • A
{
            "metric": {
                "type": "custom.googleapis.com/opencensus/oce/dev/latency",
                "labels": {
                    "client": "cli",
                    "method": "repl",
                    "opencensus_task": "[email protected]"
                }
            },
            "resource": {
                "type": "global"
            },
            "points": [
                {
                    "interval": {
                        "end_time": {
                            "seconds": 1547860312,
                            "nanos": 655706000
                        },
                        "start_time": {
                            "seconds": 1547857792,
                            "nanos": 658197000
                        }
                    },
                    "value": {
                        "Value": {
                            "DistributionValue": {
                                "count": 399,
                                "mean": 6461.507067283209,
                                "sum_of_squared_deviation": 5680369911.614502,
                                "bucket_options": {
                                    "Options": {
                                        "ExplicitBuckets": {
                                            "bounds": [
                                                0,
                                                10,
                                                50,
                                                100,
                                                200,
                                                400,
                                                800,
                                                1000,
                                                1400,
                                                2000,
                                                5000,
                                                10000
                                            ]
                                        }
                                    }
                                },
                                "bucket_counts": [
                                    0,
                                    0,
                                    1,
                                    0,
                                    3,
                                    3,
                                    21,
                                    5,
                                    17,
                                    15,
                                    89,
                                    153,
                                    92
                                ]
                            }
                        }
                    }
                }
            ]
        }

and

  • B
        {
            "metric": {
                "type": "custom.googleapis.com/opencensus/oce/dev/latency",
                "labels": {
                    "client": "cli",
                    "method": "repl",
                    "opencensus_task": "[email protected]"
                }
            },
            "resource": {
                "type": "global"
            },
            "points": [
                {
                    "interval": {
                        "end_time": {
                            "seconds": 1547860372,
                            "nanos": 653868000
                        },
                        "start_time": {
                            "seconds": 1547857792,
                            "nanos": 658197000
                        }
                    },
                    "value": {
                        "Value": {
                            "DistributionValue": {
                                "count": 409,
                                "mean": 6443.895616823964,
                                "sum_of_squared_deviation": 5882240635.357754,
                                "bucket_options": {
                                    "Options": {
                                        "ExplicitBuckets": {
                                            "bounds": [
                                                0,
                                                10,
                                                50,
                                                100,
                                                200,
                                                400,
                                                800,
                                                1000,
                                                1400,
                                                2000,
                                                5000,
                                                10000
                                            ]
                                        }
                                    }
                                },
                                "bucket_counts": [
                                    0,
                                    0,
                                    1,
                                    0,
                                    4,
                                    3,
                                    22,
                                    6,
                                    17,
                                    16,
                                    90,
                                    156,
                                    94
                                ]
                            }
                        }
                    }
                }
            ]
        }

which both have a metric with Type "custom.googleapis.com/opencensus/oce/dev/latency".
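
A minimal sketch of the deduplication proposed in the title: partition the series so that no single request repeats a (metric type, labels, resource) key, and send one CreateTimeSeriesRequest per partition. Names and the key derivation are illustrative, not the exporter's actual code:

import (
	"fmt"

	monitoringpb "google.golang.org/genproto/googleapis/monitoring/v3"
)

// splitUnique partitions time series into batches in which every series key
// appears at most once; each batch becomes its own CreateTimeSeriesRequest.
func splitUnique(name string, ts []*monitoringpb.TimeSeries) []*monitoringpb.CreateTimeSeriesRequest {
	var batches []map[string]*monitoringpb.TimeSeries // one map per request
	for _, s := range ts {
		key := fmt.Sprintf("%s|%v|%v", s.GetMetric().GetType(), s.GetMetric().GetLabels(), s.GetResource())
		placed := false
		for _, b := range batches {
			if _, dup := b[key]; !dup {
				b[key] = s
				placed = true
				break
			}
		}
		if !placed {
			batches = append(batches, map[string]*monitoringpb.TimeSeries{key: s})
		}
	}
	reqs := make([]*monitoringpb.CreateTimeSeriesRequest, 0, len(batches))
	for _, b := range batches {
		req := &monitoringpb.CreateTimeSeriesRequest{Name: name}
		for _, s := range b {
			req.TimeSeries = append(req.TimeSeries, s)
		}
		reqs = append(reqs, req)
	}
	return reqs
}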

monitoredresource.Autodetect() only works on the first call

The monitoredresource.Autodetect() function uses a closure structure with a sync.Once.Do in an attempt to avoid executing the slow operation of detecting the application resources multiple times.

However, the way it is implemented makes the function work only for a single call: after the first execution, the function returns nil when called again.

Is that the expected behavior for this function? If so, it could be better documented.
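
A minimal, self-contained sketch of the caching pattern the issue implies: run the slow detection once, but store the result so every later call returns the same value instead of nil. The detectResource helper is hypothetical:

package main

import (
	"fmt"
	"sync"
)

// detectResource stands in for the expensive environment probe.
func detectResource() string {
	return "gke_container"
}

var (
	detectOnce sync.Once
	detected   string
)

// autodetect runs the detection exactly once and caches the result, so
// repeated calls return the same value rather than the zero value.
func autodetect() string {
	detectOnce.Do(func() {
		detected = detectResource()
	})
	return detected
}

func main() {
	fmt.Println(autodetect()) // gke_container
	fmt.Println(autodetect()) // still gke_container, not ""
}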
