sampling_102017182's Introduction

SAMPLING By Sandhya Goyal 102017182:

Sampling assignment

Sampling Definition

Sampling of a dataset is the process of selecting a subset of data from a larger dataset for the purpose of analysis or modeling. In other words, it involves taking a representative portion of the data that can be used to draw conclusions about the larger dataset.

Formula Used:

z = 1.96 # z-score for a 95% confidence level

e = 0.05 # margin of error

p = 0.05 # expected proportion of frauds (5%)

n = (z^2 * p * (1 - p)) / e^2 # required sample size
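
As a quick check of the formula, the sample size can be computed directly. This is a minimal sketch using the values listed above; the function name `sample_size` is hypothetical, and rounding up to a whole record is an assumption, since the source does not say how the result is rounded.

```python
import math

def sample_size(z: float, p: float, e: float) -> int:
    """Return n = z^2 * p * (1 - p) / e^2, rounded up to a whole record."""
    n = (z ** 2) * p * (1 - p) / (e ** 2)
    # Rounding up is an assumption; it keeps the margin of error at or below e.
    return math.ceil(n)

n = sample_size(z=1.96, p=0.05, e=0.05)
print(n)  # 72.99... rounds up to 73
```

With z = 1.96, p = 0.05 and e = 0.05 the unrounded value is about 72.99, so roughly 73 records are needed under these assumptions.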

There are several types of sampling methods, including:

Random Sampling:

This involves selecting a subset of the data at random, so that every record has the same chance of being chosen.

Stratified Sampling:

This involves dividing the dataset into strata (groups) based on some characteristic, and then randomly selecting a sample from each stratum.

Cluster Sampling:

This involves dividing the dataset into clusters (groups) based on some characteristic, and then randomly selecting whole clusters to analyze, rather than individual records.

Systematic Sampling:

This involves selecting every kth item in a dataset, where k is a predetermined interval.

The choice of sampling method depends on the research question, the available data, and the analysis techniques being used. Sampling can help to reduce the time and resources required for data analysis while still providing accurate and reliable results.
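
The four methods described above can be sketched in a few lines of plain Python. This is an illustration only, not the code used in the assignment: the function names, the toy `rows` dataset, and its `label` and `region` fields are all hypothetical, and the interval is written as `k` to keep it distinct from the sample size.

```python
import random

def random_sample(rows, n, seed=0):
    """Simple random sampling: pick n rows, each equally likely."""
    return random.Random(seed).sample(rows, n)

def systematic_sample(rows, k, start=0):
    """Systematic sampling: take every k-th row, beginning at index start."""
    return rows[start::k]

def stratified_sample(rows, key, fraction, seed=0):
    """Stratified sampling: draw the same fraction from every stratum."""
    rng = random.Random(seed)
    strata = {}
    for row in rows:
        strata.setdefault(key(row), []).append(row)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))  # at least one row per stratum
        sample.extend(rng.sample(members, size))
    return sample

def cluster_sample(rows, key, n_clusters, seed=0):
    """Cluster sampling: pick whole clusters at random and keep all their rows."""
    rng = random.Random(seed)
    clusters = {}
    for row in rows:
        clusters.setdefault(key(row), []).append(row)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [row for c in chosen for row in clusters[c]]

# Hypothetical toy data: 100 rows, 10 of them labelled 1, spread over 4 regions.
rows = [{"id": i, "label": int(i % 10 == 0), "region": i % 4} for i in range(100)]

print(len(random_sample(rows, 20)))
print([r["id"] for r in systematic_sample(rows, k=10, start=3)])
print(len(stratified_sample(rows, key=lambda r: r["label"], fraction=0.2)))
print(len(cluster_sample(rows, key=lambda r: r["region"], n_clusters=2)))
```

Note that stratified sampling keeps both labels in the sample by construction, whereas the other three methods can miss a rare class entirely on a small draw.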

Analysis:

The dataset used is highly imbalanced: it contains 763 records of class 0 and only 9 of class 1. Balancing is needed because a model trained on the dataset as it stands would tend to predict 0 for every input. The following data-balancing techniques were used:

1. SMOTE

2. Random Oversampling

3. Tomek Links

4. Random Undersampling

5. NearMiss
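
To see why balancing matters, the sketch below uses the class counts reported above (763 and 9). It first computes the accuracy of a model that always predicts 0, then applies plain-Python versions of random oversampling and random undersampling. These helpers are hypothetical stand-ins written for illustration; they are not the library implementations used in the assignment, and SMOTE, Tomek Links and NearMiss are not shown.

```python
import random
from collections import Counter

# Class counts taken from the analysis: 763 zeros and 9 ones.
labels = [0] * 763 + [1] * 9

# A model that always predicts 0 is correct on every class-0 record.
baseline_accuracy = labels.count(0) / len(labels)

def _indices_by_class(labels):
    groups = {}
    for i, y in enumerate(labels):
        groups.setdefault(y, []).append(i)
    return groups

def random_oversample(labels, seed=0):
    """Return row indices where smaller classes are repeated up to the largest class size."""
    rng = random.Random(seed)
    groups = _indices_by_class(labels)
    target = max(len(ix) for ix in groups.values())
    picked = []
    for ix in groups.values():
        picked.extend(ix)
        picked.extend(rng.choices(ix, k=target - len(ix)))  # sample with replacement
    return picked

def random_undersample(labels, seed=0):
    """Return row indices where larger classes are cut down to the smallest class size."""
    rng = random.Random(seed)
    groups = _indices_by_class(labels)
    target = min(len(ix) for ix in groups.values())
    picked = []
    for ix in groups.values():
        picked.extend(rng.sample(ix, target))  # sample without replacement
    return picked

print(round(baseline_accuracy, 4))
print(Counter(labels[i] for i in random_oversample(labels)))
print(Counter(labels[i] for i in random_undersample(labels)))
```

The always-0 baseline already scores about 0.988 accuracy without detecting a single class-1 record, which is why accuracy on the unbalanced data would be misleading.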

The sampling techniques used are as follows:

1. Simple / Random Sampling (RS)

2. Proportional Stratified Sampling (PSS)

3. Disproportional Stratified Sampling (DSS)

4. Cluster Sampling (CS)

5. Systematic Sampling (SS)
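
The difference between proportional and disproportional stratified sampling lies in how the sample is split across strata. The sketch below is an assumption-laden illustration: the source does not state which allocation rule was used for DSS, so equal allocation is assumed here, and the function names, stratum names and the sample size of 73 are hypothetical.

```python
def proportional_allocation(stratum_sizes, n):
    """PSS: each stratum gets a share of n proportional to its size."""
    total = sum(stratum_sizes.values())
    # Rounding means the shares may not always add up to exactly n.
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}

def equal_allocation(stratum_sizes, n):
    """DSS (assumed rule): each stratum gets the same share, capped at its size."""
    share = n // len(stratum_sizes)
    return {name: min(share, size) for name, size in stratum_sizes.items()}

# Illustrative strata using the class counts of the unbalanced dataset.
sizes = {"class_0": 763, "class_1": 9}

print(proportional_allocation(sizes, 73))
print(equal_allocation(sizes, 73))
```

On these counts, proportional allocation draws almost everything from class 0, while equal allocation takes all 9 available class-1 records.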

The models used are:

1. Support Vector Machine (SVM)

2. Decision Tree Classifier (DT)

3. K-Nearest Neighbours (KNN)

4. Naive Bayes (NB)

5. Random Forest (RF)

Model Evaluation:

The accuracy of each model under each sampling technique is as follows:

Model   RS      SS      CS      PSS     DSS
SVM     0.8888  0.9000  0.9000  0.9301  0.9301
DT      0.7777  0.7000  0.9685  0.9781  0.9781
KNN     0.5555  0.6000  0.8455  0.8209  0.8209
NB      0.6666  1.0000  0.8481  0.8995  0.8995
RF      0.8333  0.7000  0.9895  0.9956  0.9956
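
The table can be read programmatically to find the best sampling technique for each model. This is a small sketch that only restates the reported accuracies; the variable names are hypothetical, and ties (PSS and DSS report identical values) are resolved in column order.

```python
# Accuracies copied from the table above.
accuracy = {
    "SVM": {"RS": 0.8888, "SS": 0.9000, "CS": 0.9000, "PSS": 0.9301, "DSS": 0.9301},
    "DT":  {"RS": 0.7777, "SS": 0.7000, "CS": 0.9685, "PSS": 0.9781, "DSS": 0.9781},
    "KNN": {"RS": 0.5555, "SS": 0.6000, "CS": 0.8455, "PSS": 0.8209, "DSS": 0.8209},
    "NB":  {"RS": 0.6666, "SS": 1.0000, "CS": 0.8481, "PSS": 0.8995, "DSS": 0.8995},
    "RF":  {"RS": 0.8333, "SS": 0.7000, "CS": 0.9895, "PSS": 0.9956, "DSS": 0.9956},
}

# For each model, pick the technique with the highest accuracy (first one wins on ties).
best = {model: max(scores, key=scores.get) for model, scores in accuracy.items()}

for model, technique in best.items():
    print(model, technique, accuracy[model][technique])
```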
