Notebooks

This is the repo for my projects, both finished and in progress. Here are the ones you should check out:

In this project, I:

  • Clean text data (news article titles and headlines from this paper)
  • Use Word2Vec to create word embeddings, and visualize word clusters on a t-SNE plot
  • Create several illuminating visualizations of popularity and sentiment using Seaborn
  • Do the same with titles, by averaging the word vectors in each title
  • Use model stacking to engineer new features, with the goal of improving performance for a larger popularity model
  • Train a model based on title embedding, topic, time since publishing, and sentiment, in order to predict the article's popularity on Facebook
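The title-averaging step can be sketched as follows. This is a minimal illustration, not the notebook's actual code: `title_vector` and `toy_vectors` are hypothetical names, and the three-dimensional vectors are made up. With a trained gensim Word2Vec model, `model.wv` could be passed in place of the dictionary, since it supports the same lookup and membership checks.

```python
import numpy as np


def title_vector(title, word_vectors, dim):
    """Average the vectors of the words in `title` that have an embedding.

    `word_vectors` is any mapping from word to a 1-D NumPy array.
    Titles with no known words get a zero vector.
    """
    tokens = title.lower().split()
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)


# Made-up 3-dimensional embeddings, for illustration only.
toy_vectors = {
    "economy": np.array([1.0, 0.0, 0.0]),
    "grows": np.array([0.0, 1.0, 0.0]),
    "slowly": np.array([0.0, 0.0, 1.0]),
}

vec = title_vector("Economy grows", toy_vectors, dim=3)
empty = title_vector("unseen words only", toy_vectors, dim=3)
```

Averaging discards word order, but it gives every title a fixed-length vector that can be fed to a downstream model alongside the other features.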

I am no longer actively working on this project, but future directions would include further feature engineering and perhaps joining external data to improve the accuracy of the popularity model.

At work, I've been analyzing a lot of survey data to produce insights for the teams that need them. Along the way, I came up with a few tricks for producing large numbers of charts and plots to answer various questions, particularly when working with the data as it is structured in a SurveyMonkey export. Mostly, this involves some setup with pandas, followed by a few carefully designed functions that output the desired results. Personally, I've found working on survey data to be quite fun, and I hope this tutorial is helpful to anyone looking to provide more value to their org while sharpening their Python data manipulation skills. Disclaimer: there may well be a better way of doing things; I wrote these functions to get the analysis done quickly, as I work in a fast-paced startup environment!

Also, please note that the notebook uses randomly generated data, not data from my employer.
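As a rough idea of the "setup with pandas, then a few functions" pattern, here is a minimal sketch. It is an assumption-laden illustration rather than the notebook's code: the column names, the answer options, and the `answer_shares` helper are all hypothetical, the data is randomly generated, and it does not try to reproduce the layout of a real SurveyMonkey export.

```python
import numpy as np
import pandas as pd

# Randomly generated responses; columns and answer options are made up.
rng = np.random.default_rng(0)
n = 200
responses = pd.DataFrame({
    "team": rng.choice(["Sales", "Engineering", "Support"], size=n),
    "Q1_satisfaction": rng.choice(["Low", "Medium", "High"], size=n),
    "Q2_recommend": rng.choice(["Yes", "No"], size=n),
})


def answer_shares(df, question, by):
    """Percentage of each answer to `question` within each `by` group."""
    table = pd.crosstab(df[by], df[question], normalize="index")
    return table.mul(100).round(1)


# One table per question, ready to be plotted or exported in a loop.
questions = ["Q1_satisfaction", "Q2_recommend"]
tables = {q: answer_shares(responses, q, by="team") for q in questions}
```

Because every question goes through the same function, producing a chart per question becomes a short loop over `tables` (for example, calling each table's `plot` method with a stacked bar chart).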

This is an exploration of Altair, a new plotting library built on top of Vega/Vega-Lite. It is a very nice interface for building modern-looking, interactive visualizations. Altair provides an idiomatic API, easy interactivity and tooltips, intelligent interpretation of variable types, quick aggregations within the plotting call itself, straightforward chart concatenation (no more subplotting headaches), and more!

Sadly, the interactivity doesn't seem to work on GitHub or nbviewer, so please clone or download the notebook to your own machine (or visit the Altair documentation) if you'd like to play around with that.

Includes:

  • Preprocessing the text data (this takes significant work, including regex, because the papers are in raw LaTeX format)
  • Creating a feature matrix, using both NMF (Non-negative Matrix Factorization) and LDA (Latent Dirichlet Allocation)
  • Finding topic groups using the feature matrices
  • Clustering the documents themselves with K-Means

I may come back to this project and try to remove some more of the LaTeX artifacts now that I've had more experience with regular expressions. (I use regex in the project, but it is only partly effective.)
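For reference, a regex-based cleanup of LaTeX source might look something like the sketch below. It is only a rough, hypothetical `strip_latex` helper, not the notebook's implementation, and like any regex approach it handles common cases (comments, inline math, citations, simple commands) while missing nested or unusual constructs.

```python
import re


def strip_latex(text):
    """Remove common LaTeX markup and keep the readable words."""
    # Comments: an unescaped % through the end of the line.
    text = re.sub(r"(?<!\\)%.*", "", text)
    # Display and inline math.
    text = re.sub(r"\$\$.*?\$\$|\$.*?\$", " ", text, flags=re.DOTALL)
    # Environment markers such as \begin{figure} and \end{figure}.
    text = re.sub(r"\\(begin|end)\{[^}]*\}", " ", text)
    # Citations and cross-references, dropped together with their keys.
    text = re.sub(r"\\(cite|ref|label|eqref)\w*\{[^}]*\}", " ", text)
    # Any remaining command name (its braced argument text is kept).
    text = re.sub(r"\\[a-zA-Z]+\*?(\[[^\]]*\])?", " ", text)
    # Leftover braces and non-breaking-space tildes.
    text = re.sub(r"[{}~]", " ", text)
    # Collapse runs of whitespace.
    return re.sub(r"\s+", " ", text).strip()


sample = r"\section{Intro} We study $x^2$ models \cite{foo2010}. % note"
cleaned = strip_latex(sample)
# cleaned == "Intro We study models ."
```

The order of the substitutions matters: citations and environments are removed before the generic command pattern, so that their keys and names do not survive as stray words.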

Recently, I had a take-home case study for an interview. Because I didn't have access to a database, but I wanted to be certain that my SQL queries were correct, I decided to create my own database using sqlite3 and write a function to generate data similar to that which I'd be working with on the job.

Includes:

  • Setting up a SQL database using sqlite3 and creating your first table
  • Writing a function to reproducibly generate random data, including dates
  • Best practices, plus an explanation of the SQL syntax and why the queries work
  • Sanity checks for ensuring the queries produced the correct results
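The workflow in the list above can be sketched with the standard library alone. The `orders` table, its columns, and the `generate_orders` function are hypothetical stand-ins chosen for illustration; the case study's actual schema is not shown here.

```python
import random
import sqlite3
from datetime import date, timedelta


def generate_orders(n, seed=42):
    """Generate n fake orders; a fixed seed makes the data reproducible."""
    rng = random.Random(seed)
    start = date(2023, 1, 1)
    rows = []
    for order_id in range(1, n + 1):
        order_date = start + timedelta(days=rng.randrange(365))
        customer_id = rng.randint(1, 20)
        amount = round(rng.uniform(5, 500), 2)
        rows.append((order_id, customer_id, order_date.isoformat(), amount))
    return rows


# An in-memory database is enough for checking queries.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        order_date  TEXT NOT NULL,   -- ISO 8601 date string
        amount      REAL NOT NULL
    )
    """
)
rows = generate_orders(200)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Orders and revenue per month.
monthly = conn.execute(
    """
    SELECT strftime('%Y-%m', order_date) AS month,
           COUNT(*)                      AS n_orders,
           SUM(amount)                   AS revenue
    FROM orders
    GROUP BY month
    ORDER BY month
    """
).fetchall()

# Sanity check: recompute the monthly counts in plain Python.
expected = {}
for _, _, order_date, _ in rows:
    month = order_date[:7]
    expected[month] = expected.get(month, 0) + 1
assert {month: n for month, n, _ in monthly} == expected
```

Storing dates as ISO 8601 strings keeps them sortable as text and lets SQLite's `strftime` work on them directly.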

This project uses the same dataset as the Word2Vec project. It includes:

  • Seaborn visualization of article sentiment by topic
  • Defining a function to identify the most positive and most negative headlines for each topic
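A function of this kind could be sketched with pandas as below. The column names, the example headlines, the sentiment scores, and the `extreme_headlines` name are all made up for illustration and do not come from the dataset or the notebook.

```python
import pandas as pd


def extreme_headlines(df, topic_col="topic", text_col="headline", score_col="sentiment"):
    """Return the most positive and most negative headline for each topic."""
    rows = []
    for topic, group in df.groupby(topic_col):
        most_positive = group.loc[group[score_col].idxmax()]
        most_negative = group.loc[group[score_col].idxmin()]
        rows.append({
            "topic": topic,
            "most_positive": most_positive[text_col],
            "most_negative": most_negative[text_col],
        })
    return pd.DataFrame(rows).set_index("topic")


# Made-up headlines and scores, for illustration only.
articles = pd.DataFrame({
    "topic": ["economy", "economy", "economy", "tech", "tech"],
    "headline": [
        "Markets rally on strong jobs report",
        "Recession fears deepen",
        "Central bank holds rates",
        "New chip doubles battery life",
        "Data breach exposes millions",
    ],
    "sentiment": [0.6, -0.7, 0.0, 0.5, -0.8],
})

result = extreme_headlines(articles)
```

Here `result.loc["tech", "most_negative"]` is the lowest-scoring tech headline in the toy data.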

Contributors

chambliss
