apache / flagon-distill

Apache Flagon Distill is a Python package to support and analyze Flagon UserALE.js logs.

Home Page: https://flagon.apache.org/

License: Apache License 2.0

Python 98.53% CSS 0.57% Makefile 0.91%
flagon apache behavioral-analytics business-analytics usability usage user-monitoring python jupyter pypi behavioral-sciences

flagon-distill's Introduction

Apache Flagon Distill

This project is a work in progress, prior to an official Apache Software Foundation release. Check back soon for important updates.

Please see our readthedocs.org pages for documentation.

A contribution guide has been provided here.

Installation

Distill uses Poetry, a dependency and package management tool, to install and set up the project. Poetry manages project dependencies and virtual environments, which keeps builds consistent and reproducible.

Prerequisites

Before you begin, make sure you have the following prerequisites installed on your system:

  • Python (>= 3.8)
  • Poetry (>= 1.0)

You can check your Python version by running:

python --version

This will return the version of Python installed on your system. If you do not have Python installed, you can download it from the official website. However, we recommend using a Python version manager such as pyenv. You can refer to this guide for setting it up: pyenv guide.

You can install Poetry a number of ways (see the Poetry docs for all methods). We recommend installing one of the following two ways:

Official Installer:

Linux, macOS, Windows (WSL)

curl -sSL https://install.python-poetry.org | python3 -

Windows (Powershell)

(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -

pipx:

pipx install poetry

Both methods should minimize the chance of dependency conflicts with your system-wide (global) Python installation. Some users who installed Poetry with plain pip have reported that it picked up the wrong Python environment instead of the project's local virtual environment. If you run into issues, please refer to the official Poetry docs or GitHub for more in-depth installation instructions.

Installation Steps

Follow these steps to set up and install the project:

  1. Clone the repository:

    git clone https://github.com/apache/flagon-distill.git
  2. Navigate to the project directory:

    cd flagon-distill
  3. Use Poetry to install project dependencies and create a virtual environment:

    poetry install

    This command reads the pyproject.toml file and installs all required packages into a dedicated virtual environment.

  4. Activate the virtual environment:

    poetry shell

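    You are now inside the project's virtual environment, which isolates the project's dependencies from your system-wide Python packages. On Poetry 2.0 and later, the shell command is no longer bundled by default: it is provided by the poetry-plugin-shell plugin, and poetry env activate prints the activation command for your shell instead.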

  5. Run the tests:

    You can now run the tests to make sure everything installed properly. For example:

    make test

    Remember that you need to activate the virtual environment (step 4) each time you work on the project.

Updating Dependencies

To update project dependencies, you can use the following command:

poetry update

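This command updates the installed packages to the latest versions allowed by the constraints in pyproject.toml and records the resolved versions in poetry.lock. The pyproject.toml file itself is not modified.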

Uninstalling

To uninstall the project and its dependencies, simply deactivate the virtual environment (if activated) by typing:

exit

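This will exit the virtual environment. You can then safely delete the project directory. Note that Poetry, by default, keeps the virtual environment in its own cache directory rather than inside the project, so deleting the project directory alone leaves the installed dependencies on disk. To remove them as well, run poetry env remove from the project directory first (see the Poetry docs for the arguments your version accepts).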

By following these installation steps, you can easily set up and manage the Python project using Poetry. Enjoy coding!

flagon-distill's People

Contributors

amirmghaemi, dependabot[bot], eandrewjones, grtnation, hungryarthi, jyyjy, krassmann12, lewismc, mdiep-cese, michellebeard, poorejc, vl8x

flagon-distill's Issues

getUUID refactor

This indeed fixes the problem. However, it's unclear why we need to construct an id from the individual field values in the log.

Questions:

* How is this function used elsewhere in the codebase?

* Do we want/need to be able to parse the id to retrieve those values in some way?

* Do we want our ids to have a partial-logical ordering?

If the only requirement is to generate a uuid, then why not just use str(uuid.uuid4()) and call it a day? We'll never get a collision.

Tagging @Jyyjy or @amirmghaemi

As the package is written, getUUID has to return the same value when the same log is passed in. str(uuid.uuid4()) will create a different uuid when the same log is passed. hash() might be a better option.
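A minimal sketch of the determinism requirement described here, not Distill's actual getUUID. The names stable_log_id and LOG_NAMESPACE are hypothetical, and the namespace value is arbitrary. It uses uuid.uuid5 over a canonical JSON serialization rather than the built-in hash(), because Python salts string hashes per process unless PYTHONHASHSEED is fixed, so hash() would not give the same value across runs.

```python
import json
import uuid

# Hypothetical fixed namespace; any constant UUID works as long as it never changes.
LOG_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def stable_log_id(log: dict) -> str:
    """Return the same ID whenever the same log content is passed in."""
    # Sort keys so that dictionary ordering does not change the result;
    # default=str is a fallback for values JSON cannot serialize natively.
    canonical = json.dumps(log, sort_keys=True, separators=(",", ":"), default=str)
    # uuid5 is a name-based (SHA-1) UUID: identical input yields identical output.
    return str(uuid.uuid5(LOG_NAMESPACE, canonical))
```

Unlike str(uuid.uuid4()), two byte-identical logs map to the same ID here, so exact duplicate logs would collide by design.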

  1. getUUID isn't really used within the distill package. Users are expected to use it to create a dictionary mapping UUIDs to logs. That dictionary is then what's passed to the segmentation functions. This is one of the biggest pains of working with distill: you have to manage the UUIDs and the dictionary of logs yourself.
  2. No, all that info is in the log, which the UUID (assuming the user set things up correctly) maps to.
  3. Not sure exactly what you mean. One of the assumptions of the segmentation functions is that the user sorts the log dictionary by clienttime.
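The bookkeeping described in points 1 and 3 might look roughly like the sketch below. It assumes each log is a dict with a numeric clientTime field; log_key and build_log_dict are hypothetical helpers, not part of Distill, and the key function is a stand-in for whatever ID function the user chooses.

```python
import hashlib
import json


def log_key(log: dict) -> str:
    """Hypothetical stand-in for an ID function: a stable digest of the log content."""
    canonical = json.dumps(log, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_log_dict(logs: list) -> dict:
    """Map ID -> log, with entries inserted in ascending clientTime order."""
    # Python dicts preserve insertion order, so sorting first gives a
    # dictionary that iterates in time order.
    ordered = sorted(logs, key=lambda log: log["clientTime"])
    # Note: logs with identical content share a key and collapse into one entry.
    return {log_key(log): log for log in ordered}
```

Sorting before building the dictionary keeps the ordering assumption in one place instead of leaving it to every caller.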

Also, the reason @mdiep-cese ran into this issue is that interval logs have some inconsistencies in UserALE, and we have historically filtered out all interval logs. I'm not sure about the details, but that's been Josh's guidance. This may be the relevant ticket. But my point is that nothing in this package is built to deal with interval logs.

Distill is even less mature than userale. The upside is that we can change things a lot without really affecting anyone. I'm team fresh rewrite.

Edit: found an old PR which sparked a discussion about this last year
UMD-ARLIS#18

Originally posted by @Jyyjy in #29 (comment)

Migrate from rst to markdown

The current release uses a mixture of .rst files (held over from pre-release code) and .MD files (introduced in first release) for documentation. Using a mixture of markup languages for documentation leads to confusion, especially for new contributors.

I recommend migrating everything to MD for simplicity's sake.

Add function to segment logs into user sessions

What are "User Sessions"?

Most user behavior services provide some definition of a user "session" and then segment the log stream into sessions for further analysis. For example, LogRocket defines a session as:

A session is a series of user interactions on your site, beginning with the first page they visit and ending with either:
a.) a period of inactivity lasting longer than 30 minutes, or
b.) after the user has navigated away from your app for more than 2 minutes. This includes closing the tab or navigating to a different domain on the tab.

"Activity" is defined as any user mouse movement, clicks, or scrolls.

As an example, if your user visits your landing page, then your app, and then refreshes the page all within 30 minutes of each other without closing the tab, the entire experience is recorded in a single session. If the user returns back to your site after another hour, a new session recording starts from the moment that they do the first action.

LogRocket sessions also support recording across multiple tabs, so a user opening a link in your app in a new tab will count as the same session. This means that if your app is running in multiple tabs, each tab would need to be navigated away from in order to end a session after 2 minutes. Otherwise, it wouldn't end until a period of inactivity across all tabs lasting longer than 30 minutes.

Why do we need "User Sessions"?

Sessions are a particularly useful unit by which to analyze user behavior since they represent a logical clustering of activity. Answers to simple questions such as:

  • How long did the user's first session last?
  • How long are a user's sessions, on average?
  • What actions did the user perform in their session?

all provide quite a bit of insight into whether and how users engage with an application. Generally speaking, they are a great entry point for building an understanding of UX in your app.

Proposed change

We should add a method that segregates the entire log stream into appropriate session buckets according to some definition of a "user session." It need not necessarily be the LogRocket definition shared above; however, I am proposing that as a reasonable starting point.
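As a starting point for discussion, here is a minimal sketch of inactivity-based segmentation. It implements only rule (a) of the LogRocket definition (a gap longer than 30 minutes), not the 2-minute navigate-away rule or the multi-tab handling. The function name segment_sessions and the field names userId and clientTime (assumed to be milliseconds since the epoch) are assumptions for illustration, not an existing Distill API.

```python
from collections import defaultdict

# 30 minutes of inactivity, expressed in milliseconds.
INACTIVITY_TIMEOUT_MS = 30 * 60 * 1000


def segment_sessions(logs, timeout_ms=INACTIVITY_TIMEOUT_MS,
                     user_key="userId", time_key="clientTime"):
    """Return {user: [session, ...]}, where each session is a list of logs."""
    # Bucket logs per user first so that one user's activity
    # never extends another user's session.
    by_user = defaultdict(list)
    for log in logs:
        by_user[log[user_key]].append(log)

    sessions = {}
    for user, user_logs in by_user.items():
        user_logs.sort(key=lambda log: log[time_key])
        current = [user_logs[0]]
        user_sessions = [current]
        # Compare each log with its predecessor; a gap strictly longer
        # than the timeout starts a new session.
        for prev, log in zip(user_logs, user_logs[1:]):
            if log[time_key] - prev[time_key] > timeout_ms:
                current = []
                user_sessions.append(current)
            current.append(log)
        sessions[user] = user_sessions
    return sessions
```

Keeping the timeout and field names as parameters would let the session definition change later without touching callers.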

Deprecated package files

There are still some leftover files from before we migrated to Poetry for the official release:

  • setup.py
  • setup.cfg

among others. These should be removed.
