chandana124,Chandana Reddy,github

building-model-on-unsupervised-data-kmeans-heirarchical-clustering

Cervical Cancer Risk Factors for Biopsy: This dataset was obtained from the UCI Repository, which is gratefully acknowledged. The file contains a list of risk factors for cervical cancer leading to a biopsy examination.
About 11,000 new cases of invasive cervical cancer are diagnosed each year in the U.S. However, the number of new cervical cancer cases has been declining steadily over the past decades. Although it is the most preventable type of cancer, each year cervical cancer kills about 4,000 women in the U.S. and about 300,000 women worldwide. In the United States, cervical cancer mortality rates plunged by 74% between 1955 and 1992 thanks to increased screening and early detection with the Pap test.
AGE
Fifty percent of cervical cancer diagnoses occur in women ages 35 - 54, and about 20% occur in women over 65 years of age. The median age of diagnosis is 48 years. About 15% of women develop cervical cancer between the ages of 20 - 30. Cervical cancer is extremely rare in women younger than age 20. However, many young women become infected with multiple types of human papilloma virus, which can then increase their risk of getting cervical cancer in the future. Young women with early abnormal changes who do not have regular examinations are at high risk for localized cancer by the time they are age 40, and for invasive cancer by age 50.
SOCIOECONOMIC AND ETHNIC FACTORS
Although the rate of cervical cancer has declined among both Caucasian and African-American women over the past decades, it remains much more prevalent in African-Americans, whose death rates are twice as high as those of Caucasian women. Hispanic American women have more than twice the risk of invasive cervical cancer as Caucasian women, also due to a lower rate of screening. These differences, however, are almost certainly due to social and economic differences. Numerous studies report that high poverty levels are linked with low screening rates. In addition, lack of health insurance, limited transportation, and language difficulties hinder a poor woman’s access to screening services.
HIGH SEXUAL ACTIVITY
Human papilloma viru

covid-19-in-india

Coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).[6] The disease was first identified in December 2019 in Wuhan, the capital of China's Hubei province, and has since spread globally, resulting in the ongoing 2019–20 coronavirus pandemic.[7][8] Common symptoms include fever, cough, and shortness of breath

data-visualization-on-haberman-data-set

decision-tree_classifier

Context This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by ML researchers to this date. The "goal" field refers to the presence of heart disease in the patient. It is integer valued from 0 (no presence) to 4. Content Attribute Information: age

dt-donars-data

Decision Trees on the DonorsChoose data set

eda-fifa-2019

FIFA-2019 data Exploratory Data Analysis

github-practice

A simple demo for learning GitHub

k-means

k-means-on-abalon-data-set

K-means is the most popular and yet the simplest of all the clustering algorithms. It works as follows:
1. Select the number of clusters k that you think is the optimal number.
2. Initialize k points as "centroids" randomly within the space of our data.
3. Assign each observation to its closest centroid.
4. Update each centroid to the center of the set of observations assigned to it.
5. Repeat steps 3 and 4 a fixed number of times or until all of the centroids are stable (i.e. no longer change in step 4).
This algorithm is easy to describe and visualize. Let's take a look.
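The steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code: the function names `kmeans` and `squared_distance` are hypothetical, and the initial centroids are drawn from the data points themselves (an assumption; the description only says they are placed randomly within the data space).

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Step 3: index of the closest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Step 2 (assumption): use k distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    for _ in range(max_iters):
        labels = assign(points, centroids)
        # Step 4: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Step 5: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, labels = kmeans(data, k=2)
    print(centers, labels)
```

The result depends on the random initialization, so in practice the algorithm is often run several times with different seeds and the best clustering is kept.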

k-means-on-mall-customer-segmentation-data

Context: This data set was created only for the purpose of learning customer segmentation concepts, also known as market basket analysis. I will demonstrate this by using an unsupervised ML technique (the KMeans clustering algorithm) in its simplest form.
Content: You own a supermarket mall and, through membership cards, you have some basic data about your customers, such as customer ID, age, gender, annual income and spending score. Spending score is something you assign to the customer based on parameters you define, such as customer behavior and purchasing data.
Problem Statement: You own the mall and want to understand which customers can be easily converted [Target Customers], so that this insight can be given to the marketing team and the strategy planned accordingly.

k-means-vehicles-data-set

The data set contains silhouette features extracted from images of different vehicles. Four "Corgie" model vehicles were used for the experiment: a double-decker bus, a Chevrolet van, a Saab 9000 and an Opel Manta 400. This particular combination of vehicles was chosen with the expectation that the bus, the van and either one of the cars would be readily distinguishable, but that it would be more difficult to distinguish between the cars.

k-nearest-neighbour

The data set we’ll be using is the Iris Flower Dataset which was first introduced in 1936 by the famous statistician Ronald Fisher and consists of 50 observations from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals.

k-nn-on-donars-choose-data-set

light-gbm-classifier

Telco Customer Churn: Each row represents a customer, and each column contains the customer’s attributes as described in the column metadata. The raw data contains 7032 rows (customers) and 21 columns (features). The “Churn” column is our target.

logistic_regression

Context: "Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets]
Content: Each row represents a customer, each column contains customer’s attributes described on the column Metadata. The data set includes information about:
- Customers who left within the last month – the column is called Churn
- Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
- Customer account information – how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges
- Demographic info about customers – gender, age range, and if they have partners and dependents

lr-donars-choose

naive-bayes-on-donars-choose-dataset

pca-pima-diabetes-data

Data set: diabetes data
Data information: The data set consists of several medical predictor (independent) variables and one target (dependent) variable, Outcome. Independent variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.
Attribute information:
- Pregnancies: number of times pregnant
- Glucose: plasma glucose concentration at 2 hours in an oral glucose tolerance test
- BloodPressure: diastolic blood pressure (mm Hg)
- SkinThickness: triceps skin fold thickness (mm)
- Insulin: 2-hour serum insulin (mu U/ml)
- BMI: body mass index (weight in kg/(height in m)^2)
- DiabetesPedigreeFunction: diabetes pedigree function
- Age: age (years)
- Outcome: class variable (0 or 1); 268 of 768 are 1, the others are 0
In-class assignment expectations/steps:
- Apply data cleaning to the data set and then apply PCA.
- Find patterns and then choose the desired number of principal components.
- Provide inferences for the above analysis.
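One common way to choose the number of principal components is to look at the cumulative explained variance. The sketch below assumes NumPy is installed; the function names and the 95% threshold are illustrative assumptions, not something specified by the assignment, and the toy matrix stands in for the cleaned predictor columns.

```python
import numpy as np


def explained_variance_ratio(X):
    """Share of total variance captured by each principal component.

    Assumes X is a 2-D array of numeric predictors with no missing values
    and no constant columns (standardization divides by the column std).
    """
    X = np.asarray(X, dtype=float)
    # Standardize so that features on different scales contribute equally.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    cov = np.cov(Z, rowvar=False)
    # eigh is appropriate because a covariance matrix is symmetric.
    eigvals = np.linalg.eigh(cov)[0]
    eigvals = np.sort(eigvals)[::-1]  # largest variance first
    return eigvals / eigvals.sum()


def components_for_variance(ratio, threshold=0.95):
    """Smallest number of components whose cumulative ratio reaches threshold."""
    cumulative = np.cumsum(ratio)
    n = int(np.searchsorted(cumulative, threshold)) + 1
    return min(n, len(ratio))


if __name__ == "__main__":
    # Toy data: the second column is exactly twice the first.
    toy = [[1, 2], [2, 4], [3, 6], [4, 8]]
    ratio = explained_variance_ratio(toy)
    print(ratio, components_for_variance(ratio))
```

A scree plot of the same ratios is another way to make this choice visually.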

python-basics

sgd-for-linear-regression

slr-model

Context: This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by ML researchers to this date. The "goal" field refers to the presence of heart disease in the patient. It is integer valued from 0 (no presence) to 4.
Content: Attribute information:
- age
- sex
- chest pain type (4 values)
- resting blood pressure
- serum cholesterol in mg/dl
- fasting blood sugar > 120 mg/dl
- resting electrocardiographic results (values 0,1,2)
- maximum heart rate achieved
- exercise induced angina
- oldpeak = ST depression induced by exercise relative to rest
- the slope of the peak exercise ST segment
- number of major vessels (0-3) colored by fluoroscopy
- thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
The names and social security numbers of the patients were recently removed from the database, replaced with dummy values. One file has been "processed", that one containing the Cleveland database. All four unprocessed files also exist in this directory. To see Test Costs (donated by Peter Turney), please see the folder "Costs".
Acknowledgements: Creators:
- Hungarian Institute of Cardiology, Budapest: Andras Janosi, M.D.
- University Hospital, Zurich, Switzerland: William Steinbrunn, M.D.
- University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D.
- V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D.
Donor: David W. Aha (aha '@' ics.uci.edu) (714) 856-8779
Inspiration: Experiments with the Cleveland database have concentrated on simply attempting to distinguish presence (values 1,2,3,4) from absence (value 0). See if you can find any other trends in heart data to predict certain cardiovascular events or find any clear indications of heart health.
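The presence/absence distinction mentioned above amounts to collapsing the 0 to 4 "goal" values into a binary label. A minimal sketch, where the function name `to_binary_target` is hypothetical and the input is assumed to be a sequence of integer goal values:

```python
def to_binary_target(goal_values):
    """Map goal values 1-4 (disease present) to 1 and 0 (absent) to 0."""
    return [1 if value > 0 else 0 for value in goal_values]


if __name__ == "__main__":
    print(to_binary_target([0, 1, 2, 3, 4, 0]))
```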


Chandana Reddy's Projects
