Giter Site home page Giter Site logo

pgindexes's Introduction

Postgres Indexes for Python Programmers

Setup

dropdb pgindexes
createdb -E utf8 pgindexes 

What is an index?

An index is just a data structure that makes the searching faster for a specific column in a database.

What would happen without an index?

To see if the row value matched the query, database software would have to look at every single row in the table.

Why not index every column?

It takes up space, and the bigger your table is, the bigger your index will be. Another performance hit with indexes is that you must do the same actions on your index whenever you add, delete, or update entries in the corresponding table. Keep in mind that an index must include the same up-to-date data as whatever is in the table column covered by the index.

Postgres Index Types

Hash

A hash index, inverts the relationship between a primary key and a value in the table.

Supported Operators

  • =

B-tree

Balanced tree, not binary tree.

  • Optimised for parallel index scans
  • Very shallow unlike normal binary trees
  • Cheap to keep balanced
  • Sorted which allows for the < / > operators.

Official docs: https://www.postgresql.org/docs/current/btree-intro.html

Good explainer: https://rahulreddyr3.medium.com/indexing-in-postgresql-b-tree-4377bd72d7e5

Supported Operators

  • <
  • <=
  • =
  • >=
  • BETWEEN
  • IN
  • IS NULL
  • IS NOT NULL

GIN

Generalised Inverted Indexes

  • Most commonly used for full text searching
  • Good for when you have multiple values in a column, e.g array, jsonb types.

Further reading:

Table

pkmenu item
1chicken burger
2chickpea burger

Index

valpk
chicken1
chickpea2
burger[1, 2]

Supported operators

  • <@
  • @>
  • =
  • &&

GiST

Generalised Search Tree

Good for when a value of a row can overlap with the same column in another row. Think polygons on a map, or to return only rows where the polygon contains a point.

GiST indexes can also be used on text columns for full text search similar to GINs but there are some fixed size constrains for GiST indexes.

Further reading: https://medium.com/postgres-professional/indexes-in-postgresql-5-gist-86e19781b5db

./poly.jpg

SP-GiST

Space partitioned GiST

As the name implies this index type is similar to GiST, but this works better for data where the values can be highly clustered like post codes or IP addresses.

BRIN

Block range indexes

When you have a huge dataset that is organised by dates, such as log events, BRIN indexes allow you to rapidly skip or eliminate a lot of the irrelevant data. BRIN indexes are also kept as smaller indexes in relation to the entire data size, making them an excellent choice for huge datasets.

Docs: https://www.postgresql.org/docs/current/brin-intro.html

Python Implementation

Example: A database table storing logins

Rows

Rows will be represented as a namedtuple.

from collections import namedtuple

Login = namedtuple('Login', ['email', 'date', 'user_agent', 'ip'])

Tables

Tables will be represented as a Python list

[
    Login(

        email='[email protected]',
        date=datetime.date(2010, 4, 13),
        user_agent='Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/531.0 (KHTML, like Gecko) Chrome/54.0.869.0 Safari/531.0',
        ip='44.14.199.207'
    ),
    Login(
        email='[email protected]',
        date=datetime.date(2013, 11, 3),
        user_agent='Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/531.0 (KHTML, like Gecko) Chrome/94.0.822.0 Safari/541.0',
        ip='22.12.189.17'
    )
]

Postgres equivalent

CREATE TABLE IF NOT EXISTS logins (
       id serial primary key,
       email VARCHAR(100) unique not null,
       date timestamp not null default NOW(),
       user_agent VARCHAR(1000) not null,
       ip inet not null
);
CREATE TABLE

Populate a “table”

from pprint import pprint
import faker
import random

fake = faker.Faker()

def make_row():
    return Login(
        fake.unique.email(),
        fake.date_object(),
        fake.chrome(),
        fake.ipv4_public()
    )

table = [make_row() for _ in range(10000)]
random_login = random.choice(table)
last_login = table[-1]
pprint(last_login)

Find a login by email address

from timeit import timeit

def find_by_email(t, email):
    for x in t:
        if x.email == email:
            return x
timeit(
    'find_by_email(table, last_login.email)',
    globals=globals(),
    number=10000
)

With a hash index

h_i = {login.email: i for i, login in enumerate(table)}
def find_by_email_with_hash_index(t, email, hash_index):
    return table[hash_index[email]]

# peek at the index
pprint(dict(list(h_i.items())[:3]))
timeit(
    'find_by_email_with_hash_index(table, last_login.email, h_i)',
    globals=globals(),
    number=10000
)

pgindexes's People

Contributors

r4vi avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.