The dataset columns include customer ID (CUST_ID
), various transaction-related features, and other financial metrics. Before proceeding with the analysis, it's important to handle missing values, standardize the numerical features, and address any other preprocessing steps.
# Handling Missing Values
creditcard_df.fillna(0, inplace=True) # Replace missing values with zeros for simplicity
# Standardizing Numerical Features
scaler = StandardScaler()
creditcard_df_scaled = scaler.fit_transform(creditcard_df.drop(columns=['CUST_ID']))
# Check the shape of the scaled data
creditcard_df_scaled.shape
Exploratory Data Analysis (EDA)
Explore the distribution of each feature in the dataset to understand the underlying patterns and relationships.
plt.figure(figsize=(10, 50))
for i in range(len(creditcard_df.columns)):
plt.subplot(17, 1, i + 1)
sns.distplot(creditcard_df[creditcard_df.columns[i]], kde_kws={'color': 'b', 'lw': 3, 'label': 'KDE'},
hist_kws={'color': 'g'})
plt.title(creditcard_df.columns[i])
plt.subplots_adjust(hspace=0.5)
Clustering with KMeans
Apply KMeans clustering to identify groups within the data. Determine the optimal number of clusters using the elbow method.
score_1 = []
range_value = range(1, 20)
for i in range_value:
kmeans = KMeans(n_clusters=i)
kmeans.fit(creditcard_df_scaled)
score_1.append(kmeans.inertia_)
kmean = KMeans(8)
kmean.fit(creditcard_df_scaled)
labels = kmean.labels_
# Display cluster centers
kmean.cluster_centers_
Autoencoder Neural Network
Implement an autoencoder neural network for dimensionality reduction and feature extraction.
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
input_df = Input(shape=(17,))
# Define the architecture of the autoencoder
# ...
autoencoder.compile(optimizer='adam', loss='mean_squared_error')
autoencoder.fit(creditcard_df_scaled, creditcard_df_scaled, batch_size=128, epochs=25, verbose=1)