pamaxie / --old--pamaxie

Worker "clients" for analyzing media content. Pamaxie is a Natural Learning API for accessing several Neural Networks and hand-crafted algorithms for moderating content on platforms.

Home Page: https://pamaxie.com

License: Apache License 2.0

C# 91.69% Mustache 5.56% Python 2.75%
moderation content-filtering machine-learning internet content-detection

--old--pamaxie's Issues

Think about an update delivery strategy

We still need to implement automated update download and installation. If possible, this should be worked out before the initial full release of the API to make self-hosting easier.

Write FFmpeg Library Bindings for I-Frame Extraction from Video Files

We need to extract I-frames (keyframes) from video files for scanning. This should be done with FFmpeg, using interop (application pipes) between Pamaxie and the FFmpeg process. If anyone needs help with this, feel free to reach out to me and I will provide support along the way. This is a separate library and should be implemented as such (please put it into Assemblies with all the other libs). FFmpeg also needs to be delivered during installation.
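
A minimal sketch of the pipe-based approach, written in Python for brevity rather than as the actual C# binding. The function names are hypothetical. It assumes an `ffmpeg` binary on the PATH and uses the `select` filter to keep only I-frames, streaming them as PNGs over stdout. Note that newer FFmpeg releases prefer `-fps_mode vfr` over `-vsync vfr`.

```python
import subprocess


def iframe_extraction_command(video_path, ffmpeg="ffmpeg"):
    """Build an FFmpeg argument list that writes every I-frame as PNG to stdout."""
    return [
        ffmpeg,
        "-hide_banner", "-loglevel", "error",
        "-i", video_path,
        "-vf", "select='eq(pict_type,I)'",   # keep only intra-coded frames
        "-vsync", "vfr",                      # do not duplicate frames to fill gaps
        "-f", "image2pipe", "-c:v", "png",   # emit a stream of PNG images
        "-",                                  # write to stdout (the pipe)
    ]


def extract_iframes(video_path):
    """Run FFmpeg and return the raw bytes of the concatenated PNG stream."""
    proc = subprocess.run(
        iframe_extraction_command(video_path),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if FFmpeg fails
    )
    return proc.stdout
```

The caller still has to split the returned byte stream into individual images before scanning them.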

Website Button Rate Limiting

Currently, spamming our website's buttons (except the ones for redirects) could crash our web server. This should not happen. Please validate every button and enforce a rate limit on a per-session basis so people can't just spam buttons to kill our website.
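
One possible shape for the per-session limit is a sliding window, sketched below in Python. The class name, the default limit, and the window length are assumptions for illustration, not values taken from the website.

```python
import time
from collections import defaultdict, deque


class SessionRateLimiter:
    """Allow at most `limit` actions per `window` seconds for each session."""

    def __init__(self, limit=5, window=10.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # session id -> timestamps of recent actions

    def allow(self, session_id):
        now = self.clock()
        hits = self._hits[session_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: the button handler should reject the click
        hits.append(now)
        return True
```

Each button handler would call `allow(session_id)` first and return early when it yields `False`.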

Media Type Detection

We require a way to detect the type of media sent to the API. First and foremost, we need to evaluate the possible ways of doing this.
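
One candidate approach is to sniff the leading "magic bytes" of the payload instead of trusting the file extension or the declared content type. The sketch below covers only a handful of well-known signatures; the function name and the fallback type are assumptions.

```python
# Well-known file signatures that sit at offset 0.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def detect_media_type(data: bytes) -> str:
    """Guess a MIME type from the first bytes of a payload."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    # RIFF containers name their format in bytes 8..12.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    # MP4-family files carry an 'ftyp' box type at offset 4.
    if data[4:8] == b"ftyp":
        return "video/mp4"
    return "application/octet-stream"  # unknown: treat as opaque binary
```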

Create Unit Tests

We should create unit tests for the libraries, databases, and API. This is best done by someone experienced with unit testing.

Rewrite Pamaxie's Tooling

Pamaxie's tooling should be rewritten to be easier for people to use, preferably with a UI. This way people can train custom datasets more easily and quickly.

Test Configuration for CI

We need to change how we approach configuration inside the testing classes.
Currently we rely on appsettings.test.json, but that will fail when run by the CI.

One approach is to build a JSON string and load it into the Configuration property through code.

Currently the appsettings.test.json looks like this:

{
    "AuthData": {
        "Secret": "",
        "ExpiresInMinutes": 15
    },
    "ApiData": {
        "Instance": ""
    },
    "RedisData": {
        "ConnectionString": ""
    },
    "UserData": {
        "EmailAddress": ""
    },
    "EmailSender": {
        "EmailAddress": "",
        "Password": ""
    },
    "JwtToken": {
        "Secret": ""
    }
}

The secret in AuthData can be auto-generated in the TestBase constructor.
The ApiData instance and RedisData can be empty.

We need the option to test the EmailSender without testing it in CI: when the CI is running, the test should simply pass without actually running the EmailSender.
UserData and EmailSender are used for the EmailSender, so they can be empty, with the option to be filled.

JwtToken is currently used to encode/decode the email confirmation link. It will be filled in the same way as the Secret in AuthData and removed once the website has been reworked a bit.
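
A minimal sketch of the idea in Python (the real tests are C#, so this only illustrates the shape). The helper names are hypothetical, and it assumes the CI sets a `CI` environment variable, which many CI systems do but which should be verified for ours.

```python
import json
import os
import secrets


def build_test_config(overrides=None):
    """Build the test configuration as a JSON string instead of reading a file."""
    config = {
        "AuthData": {"Secret": secrets.token_urlsafe(32), "ExpiresInMinutes": 15},
        "ApiData": {"Instance": ""},
        "RedisData": {"ConnectionString": ""},
        "UserData": {"EmailAddress": ""},
        "EmailSender": {"EmailAddress": "", "Password": ""},
        "JwtToken": {"Secret": secrets.token_urlsafe(32)},
    }
    # Optional values, e.g. real EmailSender credentials on a developer machine.
    for section, values in (overrides or {}).items():
        config[section].update(values)
    return json.dumps(config)


def email_tests_enabled(config_json, environ=os.environ):
    """Run EmailSender tests only outside CI and only when credentials are filled."""
    if environ.get("CI"):  # assumption: the CI sets this variable
        return False
    sender = json.loads(config_json)["EmailSender"]
    return bool(sender["EmailAddress"] and sender["Password"])
```

An EmailSender test would check `email_tests_enabled(...)` first and pass immediately when it returns `False`.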

Create Media Virus Scanning

We don't only want to develop software that scans images; we also want to detect malicious files (at least the worst of them). Our solution would be to interact with an anti-virus API. We could do software forensics ourselves, but it would take decades and a ton of our time to build up a good database. Since this is a side product of our API, I just want to hook into an existing database of hashes or use an AV API.

Any recommendations in this regard are welcome.
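
The hash-database variant could look roughly like the sketch below. The set of known-bad hashes is a stand-in for whatever external database we end up using, and SHA-256 is an assumption; the real choice depends on what that database provides.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=65536):
    """Hash a binary stream in chunks so large files are never fully in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_known_malicious(stream, known_bad_hashes):
    """Return True when the stream's SHA-256 appears in the known-bad set."""
    return sha256_of_stream(stream) in known_bad_hashes


# Usage with an in-memory stand-in for the hash database.
bad_hashes = {hashlib.sha256(b"evil payload").hexdigest()}
flagged = is_known_malicious(io.BytesIO(b"evil payload"), bad_hashes)
```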

Hamming-Distance based Redis Search

A current concern is how to implement a search algorithm that finds similar images in the database via the Hamming distance between image hashes. If anyone feels like implementing an algorithm for that, feel free to do so, after brainstorming the idea with everyone, of course.
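
As a starting point for the brainstorming, here is a brute-force baseline in Python. It assumes image hashes are stored as integers and that a plain dictionary stands in for the Redis data. It scans every stored hash, so it is a reference for correctness, not a proposal for the final index.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


def find_similar(query, stored, max_distance=5):
    """Return (distance, key) pairs within max_distance, closest first.

    `stored` maps a key (e.g. an image id) to its integer hash.
    """
    matches = [(hamming_distance(query, h), key) for key, h in stored.items()]
    return sorted((d, k) for d, k in matches if d <= max_distance)
```

A faster search would need an index on top of this, which is exactly the open question of this issue.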

Think about installation strategy

Currently, the installation is very convoluted. We need to think about an installation strategy like a setup script or installer.

Separate Pamaxie's API from the Scanning

We need to separate the API from the actual scanning process, since scanning can take over 1 second, which may cause issues under high workloads. I am thinking about a work handler that dynamically distributes all work tasks across a backbone via gRPC. We could then use a queue system where work nodes take work out of the queue once they are ready.

If you have another idea, please contact me. This is quite complicated to implement.
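
The pull-based queue idea can be sketched in a single process with threads standing in for work nodes. This leaves out gRPC and any networking entirely; `scan` is a hypothetical placeholder for the slow scanning step, and all names are assumptions.

```python
import queue
import threading


def scan(item):
    """Placeholder for the slow scanning step (hypothetical)."""
    return f"scanned:{item}"


def worker(jobs, results):
    """Take jobs out of the queue whenever this worker is ready for more."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work, shut down
            jobs.task_done()
            break
        job_id, payload = job
        results[job_id] = scan(payload)
        jobs.task_done()


def run_pool(payloads, worker_count=3):
    """Distribute payloads over workers and return results in input order."""
    jobs = queue.Queue()
    results = {}
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for job_id, payload in enumerate(payloads):
        jobs.put((job_id, payload))
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    jobs.join()
    for thread in threads:
        thread.join()
    return [results[i] for i in range(len(payloads))]
```

Because workers pull jobs themselves, a busy node simply takes fewer jobs, and the API side only has to enqueue.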

Create Content Delivery Network for Application Profile Pictures and Data

We should create a content delivery network to deliver application-specific data to users (provided they are allowed to access the resource) as well as the images for our website (this can be anonymous).

This would make updating pictures on the website as easy as replacing a database value instead of hard-coding them into each delivery.
This is a future request. Over the CDN we could also deliver things like ticket data, tickets that users created, etc. (see #27).
Resource files could theoretically also be delivered over a CDN, but I don't think that's a feasible idea (maybe only an update system for localizations, but even that seems pointless to me).

DIN 66399 Compliance

We would love compliance with the DIN 66399 data request policies. As with DIN 66398 (see #2), this has to be handled by someone who understands data privacy.

WebClient Download implementation: make stream seekable

Currently, when scanning files via URL, the stream is not seekable. This needs to change, because a non-seekable stream can cause issues. We need an implementation that alleviates this issue.
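
One common way to do this is to buffer the download before scanning it. A sketch in Python, under the assumption that downloads are small enough to hold in memory; the function name and the size cap are hypothetical.

```python
import io


def make_seekable(stream, max_bytes=50 * 1024 * 1024, chunk_size=65536):
    """Return a seekable stream with the same content as `stream`.

    Seekable streams are returned unchanged; anything else is copied
    into an in-memory buffer, up to `max_bytes`.
    """
    if stream.seekable():
        return stream
    buffer = io.BytesIO()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        if buffer.tell() + len(chunk) > max_bytes:
            raise ValueError("download exceeds max_bytes")
        buffer.write(chunk)
    buffer.seek(0)  # rewind so the scanner starts at the beginning
    return buffer
```

For very large files, the buffer could be swapped for a temporary file to keep memory usage bounded.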

Rework Authentication Flow for APIs

Currently, the authentication flow for the API doesn't follow HTTP standards.
The issues we are currently aware of are:

  • We don't use Basic authentication in the header for the initial authentication; instead we use the body of the initial request. This is a security concern.
  • We require an object containing the user ID to re-authenticate. This is completely unnecessary, as bearer tokens can be decoded to find the user who originally authenticated and re-authenticate them.
  • Some of the methods use POST. That's not correct. Authentication should use GET requests and follow the HTTP authentication framework defined by RFC 7235.
  • If the login credentials are incorrect, we need to return 401 Unauthorized.

See https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication for more info and for other things we may be doing wrong.
Please fix this.
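
The first and last points above can be sketched as follows. This is Python rather than the API's C#; the function names and the realm value are hypothetical, and `check_credentials` stands in for the real user lookup.

```python
import base64
import binascii


def parse_basic_auth(header_value):
    """Return (username, password) from an Authorization header, or None."""
    if not header_value:
        return None
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None  # not valid base64 or not valid UTF-8
    username, sep, password = decoded.partition(":")
    if not sep:
        return None  # credentials must have the form user:password
    return username, password


def login(header_value, check_credentials):
    """Return (status, headers): 401 with a challenge on failure, 200 otherwise."""
    creds = parse_basic_auth(header_value)
    if creds is None or not check_credentials(*creds):
        return 401, {"WWW-Authenticate": 'Basic realm="pamaxie"'}
    return 200, {}
```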

Increase Website Performance

The current performance score measured via Lighthouse is abysmal (26). We are aiming for a score of at least 85 to 90 for Pamaxie's website.

Separate Sql and Redis from Pamaxie.Database.Extensions

We have to separate the Sql and Redis parts of Pamaxie.Database.Extensions, since Redis uses PamaxieML.Model, which uses the TensorFlow library. Currently the website uses the Extensions library and therefore references TensorFlow too, but we want to avoid copying an unnecessary library every time we publish.

Use / Update APIs to use proper status codes

We need to adhere to HTTP status code standards and not just return whatever we want.
If you need help fixing this, the status codes can be found here:
https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
It's OK to use unofficial codes as long as they are unique and describe the error clearly.
Newly created status codes should be documented here. Also make sure they do NOT conflict with any of the status codes listed on that page (official or not).

Verify emails of new signups

This should be relatively easy. Just send out an email when a new user is created, and add a data entry for the user recording whether the email has been verified. If it has not, the user can't create a new application.
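
One way to build the confirmation link is a signed, expiring token. The sketch below uses a plain HMAC signature, which is a swapped-in technique for illustration and not the JWT-based encoding the website currently uses; the function names and the token layout are assumptions.

```python
import hashlib
import hmac
import time


def make_token(email, secret, expires_at):
    """Create 'email|expiry|signature', signed with a server-side secret (bytes)."""
    payload = f"{email}|{expires_at}"
    signature = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{signature}"


def verify_token(token, secret, now=None):
    """Return the verified email address, or None if the token is invalid or expired."""
    try:
        email, expires_at, signature = token.rsplit("|", 2)
        expiry = int(expires_at)
    except ValueError:
        return None  # wrong number of fields or non-numeric expiry
    payload = f"{email}|{expires_at}"
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid leaking signature prefixes.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    if (time.time() if now is None else now) > expiry:
        return None
    return email
```

When `verify_token` returns an address, the matching user entry can be marked as verified.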

Rework database to utilize JUST Redis

We want to move away from having a separate SQL database for the website. This means everything should use JUST Redis. If you feel like it, please think of a data storage concept and post it here.

Introduce Application Based Rate Limiting

We need to rate limit applications so they cannot just flood our API from the internet.
This has to be implemented before logging into the website is opened up to everyone.

This would also mean adding a button for requesting an increase of an application's rate limit (it can be triggered once the limit has been hit once).
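
A token bucket per application is one option here. In the sketch below, the class name, the capacity, and the refill rate are assumptions; the `limit_was_hit` flag is a hypothetical hook for enabling the increase request described above.

```python
import time


class ApplicationBucket:
    """Token bucket: `capacity` burst size, refilled at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock  # injectable for testing
        self.tokens = float(capacity)
        self.updated = clock()
        self.limit_was_hit = False  # lets the UI offer a limit-increase request

    def try_consume(self):
        """Spend one token for a request; return False when the bucket is empty."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        self.limit_was_hit = True
        return False
```

Raising an application's limit then amounts to creating its bucket with a larger `capacity` or `refill_per_second`.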

Rewrite Pamaxie.Leecher

Pamaxie.Leecher needs a rewrite to feature a UI similar to #39.
Pamaxie.Leecher is designed to be a crawler for media that we use for training or validating our neural network. It should automatically crawl the internet and attempt to auto-sort things into folders, which we can later validate. It should implement a way to run on multiple servers, since this can be quite an intensive process.
This is sadly the only viable way we could think of to gather data automatically. We checked the legality of web crawlers, and for research purposes like ours it seems to be legal. Before running them, we will consult with a lawyer again about our software to make sure this is legal.

Create Management Interface Website

We need a management interface for the website to manage APIs. It should also allow us to restart API node services and update them (see #14). It should also show the current status of the nodes and the requests per second each of them is handling (if possible).

Create way for users to submit false positives / negatives

We should allow users to submit false positives or negatives easily, so that if they spot a hole in our system we can fix it quickly. We require PhotoDNA for this. This is not an optional step, since we could otherwise end up with illegal imagery stored permanently on our servers. If someone wants to do this, please reach out to [email protected] first so we can discuss a solution and talk with Microsoft beforehand.

Create Installation Script / Program

We need software to make installing our software easier. I will handle that, but I'm happy to take suggestions. The current plan is to make a self-contained C# executable that deletes itself after it has downloaded and installed everything. The only supported platform will be Linux (this software isn't really meant to be run under Windows anyway).
