
Comments (6)

sergioisidoro commented on September 28, 2024

Ok, maybe I need to contextualise "unreasonable" because it greatly depends on the use case.

I had a small project running on a very simple VM on DigitalOcean. We had <100 visitors per day and very few events. I deployed Plausible for that project on a Docker Swarm. In such a small project (Postgres, Django, Redis, worker, Ghost and MariaDB for a blog, + Plausible), ClickHouse hogged the disk space although there were not that many events (~50 GB if I recall).

Self-hosting is sometimes small, and small-footprint projects (in memory and disk) such as Shynet are super nice for that use case. If all projects start chasing scale and adopting dependencies with a larger starting footprint (ClickHouse, Elastic, etc.), the requirements for self-hosting an entire small stack (e.g., Service + Blog + Analytics) start to go up.

There is nothing wrong with having a large footprint when there is scale. All I'm arguing here is that if there is no scale, the footprint to self-host should be minimal :)

Caveat: bear in mind that I might have done something wrong in deploying ClickHouse, since I was mostly using the defaults from the official image.

from shynet.

rallisf1 commented on September 28, 2024

I know I'm late to the party and I've barely used Shynet, but I'd like to share my 2 cents:

  • I believe Timescale can greatly improve Postgres' performance without many code changes. It is an open source Postgres extension, has a plug-n-play Docker image (which includes Postgres), and data can be easily migrated.
  • Django can be a bottleneck, especially when handling high traffic. Decoupling the data collection endpoint and serving it with pretty much anything else (e.g. Bun) would address the performance issue without the need to change any other parts of Shynet.
  • I'm leaning against client-side aggregation. With all the performance gains from the above steps, I don't think the added complexity will bring much benefit. You could optimize the data transfers by replacing REST with gRPC streams, but that would possibly add even more complexity, and you'd also need a memory buffer (Redis?) before aggregating all that data and writing it to the main DB. On the other hand, you can do heatmaps or advanced fingerprinting with that type of data.
  • Don't save IPs: YES, but improve geolocation.
  • IMHO a community-driven open source project can't really be compliant with anything. You can't take the blame for how everyone uses your code, at least not yet. If there were some sort of company or organization behind the project it would be different. Also keep in mind that Shynet could be fully compliant by itself while the websites being tracked are not (e.g. because of a missing ToS/Privacy Policy and tracking consent).
  • You can never test everything. Don't sweat over full test coverage.
  • Custom events, yes please!
  • Differential privacy? Sure, that would make it a commercial analytics killer. You could run a Laplace function straight in Postgres using a cron job.
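
To make the differential privacy point concrete, here is a minimal sketch of the Laplace mechanism applied to a single count. It is written in Python rather than as a Postgres function, and the names `laplace_noise` and `noisy_count` as well as the default sensitivity of 1 are assumptions made for illustration, not anything Shynet provides.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample zero-centred Laplace noise with the given scale.

    The difference of two independent exponential samples with mean
    `scale` follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random,
                sensitivity: float = 1.0) -> int:
    """Return a count perturbed by the Laplace mechanism.

    Assumption: one visitor changes the count by at most `sensitivity`.
    Smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    noisy = true_count + laplace_noise(sensitivity / epsilon, rng)
    # Rounding and clamping are post-processing, so they do not weaken
    # the privacy guarantee.
    return max(0, round(noisy))


if __name__ == "__main__":
    rng = random.Random()
    print(noisy_count(100, epsilon=0.5, rng=rng))
```

A scheduled job could apply something like this once per reporting period and store only the noisy value; re-noising the same count repeatedly would spend more of the privacy budget.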


haaavk commented on September 28, 2024

First things first. Thanks a lot for building Shynet. It helped me escape from terrible Google Analytics.
Some thoughts about Shynet in no particular order:

  • I understand your worries about performance. Python and Postgres aren't a good choice for huge traffic.
    There is definitely a place for "Enterprise Shynet", which should use appropriate technology.
    I really love Shynet's simplicity and I benefit from it a lot. I'm afraid I will never have 1M+ requests daily.
    I think there is room for both "Simple Shynet" and "Enterprise Shynet".
  • ClickHouse is a good idea for "Enterprise Shynet".
  • Aggregation on-device is an interesting idea, but I would start with aggregation on the server.
    I think archiving old data as aggregates may be a good path.
  • I don't like collecting any user data, so I'm 100% for removing the option to track IPs.
  • I'm not a big fan of full test coverage because tests need maintenance too.
    No tests at all is obviously a no-go. I may try to add some tests in my spare time.
  • I want custom events too. I'm planning to work on them after adding a full API.
  • If there is a problem with engineering capacity, I'm happy to become a maintainer and help when I can.
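
As a rough illustration of archiving old data as aggregates on the server, the sketch below keeps recent raw hits and collapses older ones into per-day, per-path counts. The `Hit` record, the `archive_old_hits` function, and the 30-day retention window are all hypothetical; they do not reflect Shynet's actual models.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, datetime, timedelta


@dataclass(frozen=True)
class Hit:
    """Hypothetical raw page-view record (not Shynet's real model)."""
    path: str
    timestamp: datetime


def archive_old_hits(hits, today: date, keep_days: int = 30):
    """Split hits into recent raw hits and daily counts for older ones.

    Returns (recent, daily_counts), where daily_counts maps
    (day, path) to the number of hits seen on that day.
    """
    cutoff = today - timedelta(days=keep_days)
    recent = []
    daily_counts = Counter()
    for hit in hits:
        day = hit.timestamp.date()
        if day < cutoff:
            # Older than the retention window: keep only the count.
            daily_counts[(day, hit.path)] += 1
        else:
            recent.append(hit)
    return recent, dict(daily_counts)
```

In a real deployment the same idea would more likely be a periodic database job that inserts the aggregated rows and then deletes the raw ones in a single transaction.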


c4lliope commented on September 28, 2024

Hello, I've been running Shynet for 2! days now and I'm really happy at how simple it's been to deploy using docker-compose.

So long as Clickhouse has a docker image which can be packed easily into a docker-compose.yml file, I see no reason to hold back from adoption. https://hub.docker.com/r/clickhouse/clickhouse-server/#

In my highly-localized application, I rely on IPs to see which states people are logging in from. As a mainly-USA application, I care less than many people do about GDPR, and so I'd make a proposal here: if you could make a small engine inside the application for plugins or bespoke code, then end-users could make up the logic on a per-application basis. In my case, this could be:

  • Do IP geolocation using https://ipinfo.io
  • If in the US, record the state-or-county level identifier
  • If in the EU, record the country-level identifier
  • In other places, decide based on local laws
  • Discard or blur the IP address
  • (possibly) map the accrued locations inside the app
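
The per-application logic listed above could look roughly like the following sketch. Everything here is hypothetical: `coarse_location` is not a Shynet hook, the `lookup` callable stands in for whatever geolocation service is used, and the `country` and `region` keys are assumed fields of its response. Returning nothing for places outside the US and EU is only a placeholder for the "decide based on local laws" step.

```python
from typing import Callable, Mapping, Optional

# ISO 3166-1 alpha-2 codes of the 27 EU member states.
EU_COUNTRIES = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE",
    "GR", "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT",
    "RO", "SK", "SI", "ES", "SE",
})


def coarse_location(ip: str,
                    lookup: Callable[[str], Mapping[str, str]]) -> Optional[str]:
    """Turn an IP into a coarse location label; the IP itself is not returned.

    `lookup` is an injected geolocation function (an assumption) that
    returns a mapping with optional "country" and "region" keys.
    """
    info = lookup(ip)
    country = info.get("country")
    if not country:
        return None
    if country == "US":
        # US visitors: record the state-level identifier when available.
        region = info.get("region")
        return f"US/{region}" if region else "US"
    if country in EU_COUNTRIES:
        # EU visitors: record the country only.
        return country
    # Elsewhere: placeholder policy, record nothing.
    return None
```

Because only the returned label would be stored, the raw IP is discarded as soon as the function returns.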

I like and encourage your decision on Elixir and Phoenix; this seems like a prime use case for both.


sergioisidoro commented on September 28, 2024

My 2 cents on ClickHouse:

I've used Plausible, which runs on ClickHouse, and for small projects it starts using unreasonable amounts of disk space, even at a few hundred events per day (maybe I'm doing something wrong). Also, I'm so much more familiar with Postgres backup and restore procedures that it was a bit of a pain to learn them and set them up for ClickHouse.

So I keep coming back to Shynet as the alternative for small projects. What if this is Shynet's niche?

I do miss custom events tho... I could give it another shot at #168 if you want.


c4lliope commented on September 28, 2024
