
sptm's Introduction

Software Project

Description

The goal of this project is to classify books by genre using a machine learning algorithm.

Software

Dependencies

  • SciKit-Learn
  • Pandas
  • SpaCy
  • NLTK

Features

  • Words per sentence
  • Number of Sentences
  • Relative frequency of nouns, adjectives and verbs
  • Number of Symbols
  • Degree of correspondence with created wordbooks of each genre
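As a rough illustration of the simpler features in the list above, here is a minimal sketch that uses only the Python standard library. The function name `basic_features`, the regex-based sentence splitting, and the reading of "Number of Symbols" as a count of non-alphanumeric, non-whitespace characters are all assumptions made for this example; the project itself may compute these with SpaCy or NLTK instead.

```python
import re


def basic_features(text):
    """Compute a few surface features of a text (hypothetical helper)."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    # This is an assumption; a real tokenizer handles abbreviations better.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Words are runs of word characters (Unicode-aware, so umlauts count).
    words = re.findall(r"\w+", text)
    # "Symbols" is interpreted here as punctuation and other non-alphanumeric,
    # non-whitespace characters.
    symbols = [ch for ch in text if not ch.isalnum() and not ch.isspace()]
    n_sent = len(sentences)
    return {
        "num_sentences": n_sent,
        "words_per_sentence": len(words) / n_sent if n_sent else 0.0,
        "num_symbols": len(symbols),
    }


sample = "Der Hund bellt. Die Katze schläft! Warum?"
features = basic_features(sample)
print(features)
```

The relative frequency of nouns, adjectives and verbs needs a part-of-speech tagger and is left out of this sketch.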

sptm's People

Contributors

rotrixx, jdommers, michelle19827, sophie714, leschu, vertex2go

Forkers

sophie714

sptm's Issues

improve wordbooks

Make the dictionary weight each word based on how many other wordbooks also contain it.
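One possible way to do this, sketched under assumptions: give each word a weight of 1 divided by the number of wordbooks that contain it, so that words shared across many genres count for less. The names `wordbook_weights` and `score`, the weighting formula, and the toy wordbooks are all hypothetical and not taken from the project.

```python
from collections import Counter


def wordbook_weights(wordbooks):
    """Map genre -> {word: weight}, where weight = 1 / (#wordbooks containing word)."""
    # Count in how many wordbooks each word appears.
    doc_freq = Counter(w for words in wordbooks.values() for w in set(words))
    return {
        genre: {w: 1.0 / doc_freq[w] for w in words}
        for genre, words in wordbooks.items()
    }


def score(tokens, weights):
    """Sum the weights of all tokens found in one genre's wordbook."""
    return sum(weights.get(t, 0.0) for t in tokens)


# Toy wordbooks, invented for illustration only.
wordbooks = {
    "Krimi": {"mord", "kommissar", "nacht"},
    "Fantasy": {"drache", "zauber", "nacht"},
}
weights = wordbook_weights(wordbooks)
krimi_score = score(["mord", "nacht", "haus"], weights["Krimi"])
print(krimi_score)
```

A word that appears in only one wordbook keeps its full weight, while a word shared by all genres contributes little to any of them. A logarithmic scheme in the style of inverse document frequency would be an alternative.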

Problem with lemmatization

Textblob-de doesn't support lemmatization yet, or it's not working correctly.

  • Use TextBlob (English) and translate between German and English (may be performance-heavy)
  • Write our own lemmatizer with a big database (may also be performance-heavy)
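To make the second option concrete, here is a minimal sketch of a lookup-based lemmatizer. The function `lemmatize` and the tiny `LEMMA_TABLE` are hypothetical; a real table would have to be built from a large German word-form database, which is where the performance concern comes from.

```python
def lemmatize(tokens, lemma_table):
    """Replace each token by its lemma if known, else by its lowercased form."""
    return [lemma_table.get(t.lower(), t.lower()) for t in tokens]


# Tiny illustrative table (inflected form -> lemma); not a real database.
LEMMA_TABLE = {
    "häuser": "haus",
    "ging": "gehen",
    "bücher": "buch",
}

lemmas = lemmatize(["Häuser", "ging", "Bücher", "Baum"], LEMMA_TABLE)
print(lemmas)
```

Dictionary lookups are constant time per token, so the cost lies mainly in loading and storing the table. Note that lowercasing discards the capitalization of German nouns, which may or may not matter for the features used here.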

Fixing Radius Neighbor

The confusion matrix for Radius Neighbor Classification shows that almost all books are classified as Literatur/Unterhaltung.
How this classifier works: it looks at a certain radius around the book that is to be classified. When calling the function, we set a fixed radius and a "default class" (L/U), which the book is assigned to if no neighbor is found inside that radius.

PROBLEM:

EITHER: The radius is way too small, so every time we check for neighbors, none are found and we just assign the default class "Literatur und Unterhaltung".

OR: The radius is way too big, so every time we check for neighbors, hundreds or thousands of neighbors from other classes fall inside that radius. "L/U", being the most represented class in the training set, would then always be the majority inside that big radius.

ERGO: Make the radius bigger or smaller and check if we get different results :)

Maybe I'll find the motivation to download all of it and try it myself. Maybe, but not today :)

In case we already tried that... ignore everything above.
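Both failure modes described above can be reproduced on synthetic data. The sketch below uses scikit-learn's `RadiusNeighborsClassifier` with its `radius` and `outlier_label` parameters; the data, the class encoding (0 as a stand-in for the majority class L/U, 1 for a minority genre), the helper `minority_share`, and the radius values are all assumptions made for illustration. The project's real feature space will need different radii.

```python
import numpy as np
from sklearn.neighbors import RadiusNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the training set: class 0 is the majority
# (playing the role of L/U), class 1 is a smaller, well-separated genre.
X_major = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
X_minor = rng.normal(loc=4.0, scale=1.0, size=(40, 2))
X = np.vstack([X_major, X_minor])
y = np.array([0] * 200 + [1] * 40)

# Test books that all truly belong to the minority class.
X_test = rng.normal(loc=4.0, scale=1.0, size=(20, 2))


def minority_share(radius):
    """Fraction of test books predicted as the minority class for a given radius."""
    # outlier_label=0 mimics the "default class" used when no neighbor is found.
    clf = RadiusNeighborsClassifier(radius=radius, outlier_label=0)
    clf.fit(X, y)
    return float(np.mean(clf.predict(X_test) == 1))


# Far too small, moderate, and far too large radius.
shares = {r: minority_share(r) for r in (0.001, 1.5, 100.0)}
for r, share in shares.items():
    print(f"radius={r}: share predicted as minority class = {share:.2f}")
```

With the tiny radius no neighbors are found, so every book gets the default class. With the huge radius the whole training set votes, so the majority class always wins. Only a radius in between lets the minority class show up, which is the kind of sweep the issue suggests trying on the real data.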
