
UnclePhilip's Projects

air-quality-dataset icon air-quality-dataset

Data source for the 'Air Quality' dataset: http://archive.ics.uci.edu/ml/machine-learning-databases/00360/. Uses pandas, NumPy, IPython.display, and scikit-learn's SimpleImputer.
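A minimal sketch of the loading-and-imputation step, assuming the AirQualityUCI.csv file from the archive (which uses ';' separators, ',' decimal marks, and -200 as its missing-value code):

    import pandas as pd
    import numpy as np
    from sklearn.impute import SimpleImputer

    # the UCI file uses ';' separators, ',' decimals, and -200 for missing readings
    df = pd.read_csv("AirQualityUCI.csv", sep=";", decimal=",")
    df = df.dropna(axis="columns", how="all")   # drop the trailing empty columns
    df = df.replace(-200, np.nan)

    # fill missing numeric readings with the column mean
    numeric_cols = df.select_dtypes(include="number").columns
    df[numeric_cols] = SimpleImputer(strategy="mean").fit_transform(df[numeric_cols])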

binance-api icon binance-api

A Python library that implements the Binance exchange REST API and WebSocket communication

census_income icon census_income

Instructions: use the attached "Adult" data set (http://archive.ics.uci.edu/ml/datasets/Census+Income) of census data collected to predict income for the following steps. The basic idea is to use the apply() function (Chapter 9) to clean the data, and the split-apply-combine pattern (Chapter 10) to analyze it.
1. Similar to last week, replace '-' with spaces, where appropriate, using the apply() function.
2. Determine how to deal with missing values (if any) and use apply() to make the changes.
3. Use apply() with User Defined Functions (UDFs) to analyze missing values, similar to page 178 (if appropriate).
4. Use the grouping and aggregation methods in Chapter 10 to analyze data vs. income in several different ways, for example: education vs. income, job vs. income, job & education vs. income, etc. (This is not an exhaustive list; I expect you to do more.)
Remember to document your steps and reasoning using markdown cells.
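A minimal sketch of steps 1, 2, and 4, assuming a local copy of the standard adult.data file with the column names documented in adult.names:

    import pandas as pd

    # column names as documented in the UCI 'adult.names' file
    cols = ["age", "workclass", "fnlwgt", "education", "education_num",
            "marital_status", "occupation", "relationship", "race", "sex",
            "capital_gain", "capital_loss", "hours_per_week",
            "native_country", "income"]
    df = pd.read_csv("adult.data", names=cols, skipinitialspace=True)

    # step 1: replace '-' with spaces in the string columns via apply()
    str_cols = df.select_dtypes(include="object").columns
    df[str_cols] = df[str_cols].apply(lambda s: s.str.replace("-", " ", regex=False))

    # step 2: missing values are coded as '?'; surface them as NA
    df = df.replace("?", pd.NA)

    # step 4: split-apply-combine -- income distribution by education level
    print(df.groupby(["education", "income"]).size().unstack(fill_value=0))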

csv-to-json_and_json-to-csv_roundtrip_converter icon csv-to-json_and_json-to-csv_roundtrip_converter

We examined CSV and JSON file formats and wrote code to manually convert a specific CSV file to a specific JSON file in the process. We then wrote functions to do a "round-trip" (CSV->JSON->CSV or JSON->CSV->JSON) on the Consumer Complaint Database data found at https://catalog.data.gov/dataset/consumer-complaint-database#topic=consumer_navigation
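A minimal round-trip sketch using only the standard library; the file names are hypothetical local copies of the Consumer Complaint Database export:

    import csv
    import json

    def csv_to_json(csv_path, json_path):
        with open(csv_path, newline="") as f:
            rows = list(csv.DictReader(f))          # one dict per CSV row
        with open(json_path, "w") as f:
            json.dump(rows, f, indent=2)

    def json_to_csv(json_path, csv_path):
        with open(json_path) as f:
            rows = json.load(f)
        with open(csv_path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=rows[0].keys())
            writer.writeheader()
            writer.writerows(rows)

    # round-trip: CSV -> JSON -> CSV
    csv_to_json("complaints.csv", "complaints.json")
    json_to_csv("complaints.json", "complaints_roundtrip.csv")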

data_assembly_and_missing_data icon data_assembly_and_missing_data

Using data assembly to tidy data, add rows, add columns, and merge data. Using missing-data strategies to find and deal with missing values: import the needed modules, use scikit-learn's SimpleImputer, join/merge DataFrames, and concat tables based on unique identifiers. Source: https://github.com/chendaniely/pandas_for_everyone/tree/master/data
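A small sketch of the assembly-plus-imputation workflow on toy frames (all names here are made up for illustration):

    import pandas as pd
    import numpy as np
    from sklearn.impute import SimpleImputer

    left = pd.DataFrame({"id": [1, 2, 3], "score": [10.0, np.nan, 30.0]})
    right = pd.DataFrame({"id": [1, 2, 3], "group": ["a", "b", "a"]})

    more = pd.concat([left, left], ignore_index=True)   # add rows
    merged = left.merge(right, on="id", how="inner")    # join on the unique identifier

    # fill the missing score with the column mean via SimpleImputer
    merged[["score"]] = SimpleImputer(strategy="mean").fit_transform(merged[["score"]])
    print(merged)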

denver_international_airport_climate_data icon denver_international_airport_climate_data

The attached data set is climate data from Denver International Airport for the first half of February, 2019. Drop any columns you deem unnecessary. Set the date column as the index of the DataFrame. Create an "Elapsed Time" column that shows the amount of time since the first observation. Format the "Elapsed Time" column into some easily readable form; for example, after two hours, the column should NOT read 7200. Do all the things we've already been doing: format the headings, deal with missing values, etc. Perform analysis with the tools we've looked at so far. Keep in mind that the data may have to be grouped to be meaningful (average temp per day may be more useful than the average for the whole two weeks, for example). Justify your analysis choices. The deliverable is your Jupyter notebook. Just attach the notebook; don't change the file extension and don't zip it.
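A minimal sketch of the index and "Elapsed Time" steps; the file and column names below are assumptions:

    import pandas as pd

    df = pd.read_csv("denver_feb2019.csv", parse_dates=["date"]).set_index("date")

    # elapsed time since the first observation, rendered readably
    # (e.g. "0 days 02:00:00" rather than 7200 seconds)
    df["Elapsed Time"] = (df.index - df.index[0]).astype(str)

    # group before averaging: mean temperature per day, not per fortnight
    print(df["temperature"].resample("D").mean())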

docs-pages icon docs-pages

The hosted static files for the Holochain developer documentation

dow_jones_index_full_analysis icon dow_jones_index_full_analysis

The purpose of this lab is to use models to look for relationships between observed features and their outcomes. Based on the content of the dataset, it would be interesting to see whether there is any correlation between some crucial variables. At first glance this is a fairly basic dataset, but after running it through the methods we demonstrate, we look for unique observations: we aggregate information by stock, search for correlations that could support deeper analysis, and train and test linear regression models in the hope of uncovering patterns from past data that might point to future events. We primarily look at volume and its effect on the other variables.
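A minimal sketch of the correlation and train/test regression steps, assuming the UCI dow_jones_index.data file (whose price columns carry a leading '$'):

    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("dow_jones_index.data")
    df["close"] = df["close"].str.replace("$", "", regex=False).astype(float)

    # aggregate by stock, then see how volume correlates with the other numerics
    print(df.groupby("stock")["volume"].mean())
    print(df.select_dtypes(include="number").corr()["volume"])

    # train/test split and a simple linear regression of close price on volume
    X_train, X_test, y_train, y_test = train_test_split(
        df[["volume"]], df["close"], test_size=0.25, random_state=0)
    model = LinearRegression().fit(X_train, y_train)
    print("R^2 on the test set:", model.score(X_test, y_test))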

floodrisk icon floodrisk

Study and assessment of the probable impact of catastrophic flood events, and management of flood risk, with a first-order flood-fill model developed using Python geospatial libraries
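Not the repository's geospatial model, but a toy illustration of the first-order flood-fill idea on a small elevation grid: cells connected to a seed point flood if they sit below the water level.

    from collections import deque
    import numpy as np

    def flood_fill(dem, seed, water_level):
        # mark cells reachable from `seed` whose elevation is below `water_level`
        flooded = np.zeros(dem.shape, dtype=bool)
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            if not (0 <= r < dem.shape[0] and 0 <= c < dem.shape[1]):
                continue
            if flooded[r, c] or dem[r, c] >= water_level:
                continue
            flooded[r, c] = True
            queue.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
        return flooded

    dem = np.array([[3, 3, 3, 3],
                    [3, 1, 2, 3],
                    [3, 1, 1, 3],
                    [3, 3, 3, 3]], dtype=float)
    print(flood_fill(dem, seed=(1, 1), water_level=2.5))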

gapminder icon gapminder

Excerpt from the Gapminder data, as an R data package and in plain text delimited form

holo-nixpkgs icon holo-nixpkgs

Modules, packages and profiles that drive Holo, Holochain, and HoloPortOS

hypothesis_testing icon hypothesis_testing

Using RStudio, we perform a paired t-test on the means of two sample populations. By comparing the means of the datasets, with unknown variances, we test for equality of means of the two samples, to see if the sets of data are somehow related. In this exercise, we test the effectiveness of a new training method used by a new athletic trainer at a school. The scenario gives before- and after-training results for the same 10 runners under two different new coaches. We test for differences between the two coaches by comparing their runners' means before and after training.
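The project itself runs in RStudio (R's t.test() with paired = TRUE); an equivalent sketch in Python with SciPy, using made-up times for the 10 runners:

    from scipy import stats

    # hypothetical before/after times (seconds) for the same 10 runners
    before = [61.2, 59.8, 63.1, 60.5, 62.0, 58.9, 61.7, 60.1, 62.4, 59.5]
    after  = [60.1, 59.0, 62.5, 59.8, 61.2, 58.2, 61.0, 59.4, 61.8, 58.9]

    # paired t-test: H0 is that the mean before-minus-after difference is zero
    t_stat, p_value = stats.ttest_rel(before, after)
    print(t_stat, p_value)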

khronos icon khronos

A flexible python library for building your own cron-like system, with REST APIs and a Web UI.

national_center_for_immunization_and_respiratory_diseases_about_national_immunizations_in_children icon national_center_for_immunization_and_respiratory_diseases_about_national_immunizations_in_children

Retrieve all of the data within nispuf14.dat and store it in a more accessible format: a CSV file, a JSON file, or a relational database. For this assignment, feel free to use a DataFrame (Python library pandas) for intermediate steps. We will work with two files this week: NISPUF14_CODEBOOK.PDF and nispuf14.dat (attached), from the National Center for Immunization and Respiratory Diseases, about national immunizations in children. We will need to read the PDF to better understand the .dat file, as outlined below: NISPUF14_CODEBOOK.PDF describes the format of the data in nispuf14.dat; in other words, the PDF tells you how to read the data. Why would we need a PDF to tell us how to read our data? This data file is stored in a positional format, meaning both the value and the relative position of each character provide meaning within the dataset.
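A minimal sketch of reading a positional (fixed-width) file with pandas; the field spans and names below are illustrative stand-ins for what the codebook actually specifies:

    import pandas as pd

    # each (start, end) pair is the character span of one variable per the codebook
    colspecs = [(0, 6), (6, 8), (8, 12)]
    names = ["SEQNUMC", "AGEGRP", "YEAR"]

    df = pd.read_fwf("nispuf14.dat", colspecs=colspecs, names=names)
    df.to_csv("nispuf14.csv", index=False)   # store in an accessible format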

rest_api icon rest_api

A REST API using Flask that triggers workflow DAGs in Apache Airflow upon request. CouchDB lets the end-user application query the state of a request via the API, and the Airflow scripts update that status through REST calls within the Dockerized workflow.
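A minimal sketch of the Flask side, assuming Airflow 2's stable REST API and made-up service names and credentials:

    import requests
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    AIRFLOW = "http://airflow-webserver:8080/api/v1"   # assumed container hostname

    @app.route("/runs/<dag_id>", methods=["POST"])
    def trigger(dag_id):
        # POST /dags/{dag_id}/dagRuns starts a DAG run in Airflow 2
        resp = requests.post(
            f"{AIRFLOW}/dags/{dag_id}/dagRuns",
            json={"conf": request.get_json(silent=True) or {}},
            auth=("airflow", "airflow"),               # assumed credentials
        )
        return jsonify(resp.json()), resp.status_code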

uci_ml_archive icon uci_ml_archive

From the Bank Marketing data set in the UCI ML Archive (http://archive.ics.uci.edu/ml/datasets/Bank+Marketing). The data set has 20 feature columns plus one result column, and we need to do some work to get it ready for further processing.
1. Reference the bank-additional-names.txt file for column types and what the names mean.
2. Make the following changes: change column names to remove abbreviations, capitalize, add spaces, and generally make the names more "meaningful" to casual readers; change column types to match the associated feature types; replace word separators in strings, like "-" or ".", with spaces.
3. Missing attribute values: several categorical attributes have missing values, all coded with the "unknown" label. These can be treated as a possible class label or handled with deletion or imputation techniques.
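A minimal sketch of step 2 and the "unknown" handling, assuming the ';'-separated bank-additional-full.csv from the archive (the renamed columns are examples, not the full mapping):

    import pandas as pd

    df = pd.read_csv("bank-additional-full.csv", sep=";")

    # make names meaningful, e.g. 'emp.var.rate' -> 'Employment Variation Rate'
    df = df.rename(columns={"emp.var.rate": "Employment Variation Rate",
                            "nr.employed": "Number Employed"})

    # replace word separators like '-' or '.' in string values with spaces
    for col in df.select_dtypes(include="object").columns:
        df[col] = (df[col].str.replace("-", " ", regex=False)
                          .str.replace(".", " ", regex=False))

    # 'unknown' codes a missing value; surface it for deletion or imputation
    df = df.replace("unknown", pd.NA)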

wholesale_customer_data icon wholesale_customer_data

The dataset used, 'Wholesale customers data' from the UCI Machine Learning Repository, is from this source: http://archive.ics.uci.edu/ml/datasets/Wholesale+customers. This script creates DataFrames and filters, aggregates, slices, groups, and compares variables using pandas functions.
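A minimal sketch of the filter and groupby/aggregate steps, assuming the column names in the UCI file:

    import pandas as pd

    df = pd.read_csv("Wholesale customers data.csv")
    spend = ["Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicassen"]

    high_fresh = df[df["Fresh"] > df["Fresh"].median()]        # filter
    print(df.groupby(["Channel", "Region"])[spend].mean())     # group + aggregate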

worldbank_gdp icon worldbank_gdp

We use the "government expenditure on education" dataset from the intro, found here: https://databank.worldbank.org/source/education-statistics-%5e-all-indicators, and look at GDP % spending by country, following the instructions verbatim to avoid confusion. We delete useless columns, analyze the data, and evaluate how reshaping the dataset makes analysis easier; we discuss how we tidy and reshape the data throughout. Uses os, requests, pandas, and NumPy.
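A minimal reshape sketch; the file and column names are assumptions based on a typical DataBank export:

    import pandas as pd

    df = pd.read_csv("worldbank_education_gdp.csv")   # hypothetical local export

    # drop unneeded columns, then melt the wide year columns into tidy long form
    df = df.drop(columns=["Series Code", "Country Code"], errors="ignore")
    tidy = df.melt(id_vars=["Country Name", "Series Name"],
                   var_name="Year", value_name="Expenditure (% of GDP)")
    print(tidy.head())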
