Comments (7)
It looks like we can have "Rows Removed by XXX" where XXX can be either:
- "Filter",
- "Join Filter",
- "Index Recheck".
There are cases where we have two of those counts (for example Filter + Index Recheck).
Should I sum them up to get the total of discarded rows?
Or should I treat the different cases differently and only concentrate on "by Filter" for now?
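If the counters were summed, a minimal sketch could look like the following. The type and function names are hypothetical (not pev2's actual code); only the property names are real, matching the keys PostgreSQL emits with `EXPLAIN (FORMAT JSON)`:

```typescript
// Hypothetical helper: sums the three "Rows Removed by …" counters
// of a JSON-format plan node. Absent counters count as zero.
type PlanNode = {
  "Rows Removed by Filter"?: number
  "Rows Removed by Join Filter"?: number
  "Rows Removed by Index Recheck"?: number
}

function totalRowsRemoved(node: PlanNode): number {
  return (
    (node["Rows Removed by Filter"] ?? 0) +
    (node["Rows Removed by Join Filter"] ?? 0) +
    (node["Rows Removed by Index Recheck"] ?? 0)
  )
}

// Example: a node with both a filter and an index recheck.
totalRowsRemoved({ "Rows Removed by Filter": 500, "Rows Removed by Index Recheck": 20 }) // 520
```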
from pev2.
"Rows Removed by Join Filter" and "Rows Removed by Index Recheck" have different meanings. I will not take them into account for now.
Here are several alternatives to show the information. I personally prefer the last one.
I think that the ratio is the most important, so it should appear in any case. In this example it is obvious that 190165 >> 522, but it is not obvious at first sight that 91234567899 >> 9123456799 (unless you round to get 91 billion vs. 9 billion).
Keep the color on the 190165 if you prefer (it's only there to catch the eye after all).
So I'd vote for the 1st or a mix with the others:
Rows removed by Filter: **190165** out of 190687 scanned rows (99%), 522 returned
But this may be too much for one line.
About "Rows Removed by Join Filter": I think this is a similar problem, an indication that some filtering occurs that an index might perhaps solve.
"Index Recheck" is another beast; it is more a problem of index efficiency (a bit like the heap fetches in #99), which may be interesting too.
Here's an additional example showing that the number of rows actually returned should be taken from the direct parent, not from the Gather node.
```
explain (analyze) select sum(i) from t_un_million_int where i+0 = 500000 ;
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=11680.42..11680.43 rows=1 width=8) (actual time=56.924..56.925 rows=1 loops=1)
   ->  Gather  (cost=11680.21..11680.42 rows=2 width=8) (actual time=56.657..63.498 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=10680.21..10680.22 rows=1 width=8) (actual time=49.794..49.795 rows=1 loops=3)
               ->  Parallel Seq Scan on t_un_million_int  (cost=0.00..10675.00 rows=2083 width=4) (actual time=42.558..49.772 rows=0 loops=3)
                     Filter: ((i + 0) = 500000)
                     Rows Removed by Filter: 333333
 Planning Time: 0.129 ms
 Execution Time: 63.581 ms
(10 rows)
```
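The arithmetic on the Parallel Seq Scan above illustrates the rounding problem: the per-loop average is what EXPLAIN reports, and multiplying it back by the number of loops cannot recover the true total once it has been rounded to 0. A small sketch (plain arithmetic, no pev2 code) with the numbers from this plan, assuming the filter matches exactly one of the million rows:

```typescript
// Per-loop averages reported by EXPLAIN for the Parallel Seq Scan.
const rowsPerLoop = 0        // rounded down: 1 matching row / 3 loops ≈ 0.33 → 0
const loops = 3
const removedPerLoop = 333333

// Reconstructed totals: rounding makes the returned count unrecoverable.
const totalReturned = rowsPerLoop * loops    // 0, although the scan really returned 1 row
const totalRemoved = removedPerLoop * loops  // 999999 (plus 1 returned ≈ the million scanned)
```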
After a deeper investigation I think it's a bad idea to take parent nodes into account to count the number of actual rows. First of all, the number of actual rows is indeed the average number of rows per loop, and not only in the case of parallel nodes. It has to be multiplied by the number of loops in order to get the number of rows returned by the node. This is imprecise though, since the average value is rounded; in some cases the average is 0, and it is then impossible to know the actual value for the node itself.
Nevertheless, it is in my opinion not very safe to guess the value from the parent node, because it may differ greatly from one node type to another.
So I decided to use the values that are given in the plan without trying to guess more. In some cases the UI will show "100% of rows removed", which may not be very accurate. When it's not accurate (because of several loops), the UI will show that the values are approximate.
As suggested by @ioguix, I also took the duration into account to decide when to show and highlight the "rows removed ratio". The ratio by itself is not very relevant: it can be very high but with almost no impact on the query duration. It becomes relevant when the relative duration is high as well.
For example, if a node takes 1% of the whole execution time, we don't really care that 90% of its rows were removed. However, 50% of rows removed in a node taking 95% of the execution time must be taken into account.
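The heuristic described above can be sketched as follows. This is an illustration only: the function name, parameters, and thresholds are hypothetical, not pev2's actual values.

```typescript
// Highlight the removed-rows ratio only when the node also accounts
// for a significant share of the whole query's execution time.
function shouldHighlight(
  rowsRemoved: number,
  rowsScanned: number,
  nodeDuration: number,   // exclusive time of the node, in ms
  totalDuration: number,  // whole query execution time, in ms
): boolean {
  if (rowsScanned === 0 || totalDuration === 0) return false
  const removedRatio = rowsRemoved / rowsScanned
  const timeShare = nodeDuration / totalDuration
  // Thresholds are illustrative, not pev2's actual values.
  return removedRatio >= 0.5 && timeShare >= 0.05
}

shouldHighlight(90, 100, 1, 100)  // false: 90% removed, but only 1% of the time
shouldHighlight(50, 100, 95, 100) // true: 50% removed in 95% of the time
```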