Comments (18)
My apologies for taking so long to get back to this. Re-reading your #36 (comment) @inge4pres, are you saying you are trying to view trace type profiles? If so, I think this is our mistake: we don't actually support viewing trace profiles, as we never finished #56 / #61.
I think we should probably disable scraping trace profiles by default until we have this support implemented, to avoid confusing situations like this (if this is the case).
Regarding your last comment, do you have a sequence of steps to reproduce this? I can't reproduce it by just running a conprof instance from scratch. However, if I run conprof, collect some data, shut it down, and then start it back up, I can indeed reproduce some "failed to fetch any source profiles" errors. I'll investigate why this is happening.
from parca.
Quick update: it appears I have found part of the problem. There was an unsafe loading of data in the WAL, so the good news is that there is no corruption of the data on disk. The bad news is that, for some reason, new appends still yield this error, but after a restart those appends can be viewed without issues. I'm continuing to investigate.
I'll open a PR as soon as I fix the remaining issue.
from parca.
I finally managed to find that last bug, and all new e2e tests are passing with this patch: #113
Thank you so much everyone for bearing with me!
from parca.
Yes, I've noticed this as well. I haven't fully figured out why this happens, as we do parse profiles before appending them. Maybe we need to run further validation on the profile rather than just parsing it. Maybe an empty profile is a valid one, I'm not sure.
from parca.
I just double checked, and honestly I don't understand how those samples get persisted, as parsing the profile should encounter the exact same issue judging by: https://github.com/google/pprof/blob/27840fff0d09770c422884093a210ac5ce453ea6/profile/profile.go#L167-L177
from parca.
We hit this as well. I think the stored data is getting corrupted: I've seen a particular profile go from readable (because it rendered nicely) to "failed to fetch any source profiles". A large proportion of our samples right now are failing...
from parca.
Thanks for reporting! I believe since the last storage rebase this started happening for data older than 8h, potentially even 2h.
I'm gonna try to build some data integrity tooling to test this over time.
from parca.
Hey, I'm using the latest version too and facing this same issue for data just collected. How can I debug this? Is there an option to add verbose logging?
Can you please also point me to a document or part of the code that handles the scraping? I am not sure if I need to configure the targets with the HTTP host-port only or if I need to add the /debug URI too...
Thanks for this beautiful tool
Additional context: I see this log line when issuing the HTTP request to visualize a single trace in pprof-ui
2020/11/18 16:54:30 : parsing profile: unrecognized profile format
from parca.
One more hint on this: when switching to version master-2020-11-04-ce50636, this log line appears:
2020/11/18 17:09:35 : decompressing profile: gzip: invalid header
But after removing the old tsdb storage and recreating it from scratch, that version works on the first data point of all inputs (heap, goroutines, etc...) but not on any subsequent data points.
Might it be that if a profile collection times out, its format is somehow stored corrupted in the time series?
from parca.
Hi @brancz, the steps you describe are exactly the ones needed to reproduce this. For some reason it's like the tsdb files become unreadable?
from parca.
I have two hunches right now: either the chunks mmap'ed to disk are somehow corrupted, or they get corrupted in some way when WAL replay happens. The latter would be better for us, as that would mean the storage isn't corrupted, just the way we load the chunks. Once I have a better idea I'll report back here! :)
Thank you so much for reporting!
from parca.
If you can give a pointer to a piece of code and/or a test to debug, I'd be happy to fork and see if I can help.
from parca.
I just opened a PR to fix most of the problems that I found so far in the tsdb (conprof/db#2), but the latter issue is still present. To debug, I clone both repos in the same directory and then use a replace directive in the conprof repo to use the local tsdb.
from parca.
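For anyone wanting to try the local debugging setup described above, the replace directive could look roughly like this. It is a sketch only: the module path `github.com/conprof/db` and the relative path `../db` are assumptions based on both repos being cloned side by side, so check them against the actual `go.mod` files.

```
// In the conprof repo's go.mod (module path and directory are assumptions):
replace github.com/conprof/db => ../db
```

With a directive like this in place, builds and tests in the conprof repo pick up local edits to the tsdb without publishing a new version.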
I just wrote an extensive test that seems to indicate that the database is functioning just fine. I'll continue investigating further up the stack.
from parca.
I tried a number of things, and definitely found a couple of small problems, but none of those ended up fixing this symptom. For what it's worth I was finally able to write a test to reproduce this.
from parca.
I think I found the last "failed to fetch" errors with: #112.
There is still a remaining problem though, which is that after a restart, previous series don't seem to be continued for some reason. At least now we're in a state where all data is viewable and queryable though (with more tests to prevent this from happening again in the future)!
(unfortunately we're being rate limited heavily in our CI environment by docker, so it may take a couple of hours until images are available; I'll look into moving to github actions to prevent this)
edit: looks like at least the amd64 image managed to push
from parca.
With #112 and #113 merged I think we can close this. Thank you everyone for reporting, and please open new issues if you find anything else or if you think this isn't resolved with the latest versions!
from parca.
Thanks! Will test asap and send some feedback
from parca.