Use a Neural Network instead of the Random Forest Regressor (llm-fraud-detection, CLOSED)

Philipp-Sc commented on September 6, 2024

Use a Neural Network instead of the Random Forest Regressor

from llm-fraud-detection.

Comments (3)

Philipp-Sc commented on September 6, 2024

Random Forest vs Neural Net:

Pros:

captures (most important) independent variables

Cons:

cannot capture complex relationships between variables

Philipp-Sc commented on September 6, 2024

Test results when evaluating on the training data (train = test):

Random Forest Regressor:
Threshold >= 0.4: True Positive = 8876, False Positive = 128, Precision = 0.986, Recall = 0.985, F-Score = 0.985

Neural Net:
Threshold >= 0.4: True Positive = 8900, False Positive = 30, Precision = 0.997, Recall = 0.988, F-Score = 0.992

Neural Net performs best when all features are used.
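For reference, the metrics in these result lines can be reproduced with a few small helpers. This is a minimal sketch, not the project's evaluation code: the function names are hypothetical, and the toy scores and labels are made up for illustration. Only the true positive and false positive counts and the rounded precision and recall values are taken from the results above.

```python
def confusion_counts(scores, labels, threshold):
    """Count (TP, FP, FN) when a score >= threshold is treated as a positive."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label:
            fn += 1
    return tp, fp, fn


def precision(tp, fp):
    """Fraction of predicted positives that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Fraction of actual positives that were found."""
    return tp / (tp + fn) if tp + fn else 0.0


def f_score(p, r):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Counts reported above for Threshold >= 0.4 on the train = test run.
rf_precision = precision(8876, 128)
nn_precision = precision(8900, 30)

# F-Score recomputed from the rounded precision and recall reported above.
rf_f = f_score(0.986, 0.985)
nn_f = f_score(0.997, 0.988)

# Hypothetical toy data, only to show how the threshold is applied.
toy_scores = [0.9, 0.45, 0.3, 0.7]
toy_labels = [1, 0, 1, 1]
toy_counts = confusion_counts(toy_scores, toy_labels, threshold=0.4)
```

The false negative counts are not listed in the results, so recall is taken as reported rather than recomputed.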

Train/test split with a ratio of 80%/20%:

Train:
Threshold >= 0.5: True Positive = 7076, False Positive = 27, Precision = 0.996, Recall = 0.987, F-Score = 0.992

Test:
Threshold >= 0.5: True Positive = 1669, False Positive = 69, Precision = 0.960, Recall = 0.946, F-Score = 0.953

Might be overfitted to some extent; still, the test F-Score lags behind the Random Forest.
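An 80%/20% split like the one above could be produced as follows. This is a sketch under assumptions: the helper name, the fixed seed, and the placeholder sample list are hypothetical, and the project may split its data differently.

```python
import random


def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical placeholder data: 100 sample ids.
samples = list(range(100))
train, test = train_test_split(samples)
```

Comparing the F-Score on `train` with the F-Score on `test`, as done above, is what exposes the gap that suggests overfitting.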

Selected features with importance >= 0.01:

Random Forest Regressor:
Threshold >= 0.4: True Positive = 1700, False Positive = 94, Precision = 0.948, Recall = 0.939, F-Score = 0.943

Neural Net:
Threshold >= 0.4: True Positive = 1593, False Positive = 97, Precision = 0.943, Recall = 0.899, F-Score = 0.920

The Neural Net performs worse than the Random Forest when the features are reduced via importance score.
If the importance is computed from the NN, it performs worse (F-Score = 0.79).
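Selecting features with importance >= 0.01 amounts to a simple filter over the importance scores. The sketch below assumes the scores are available as a name-to-score mapping; the feature names, scores, and helper functions are hypothetical and only illustrate the cut-off.

```python
def select_features(importances, min_importance=0.01):
    """Return the names of features whose importance is >= min_importance."""
    return [name for name, score in importances.items() if score >= min_importance]


def project(rows, feature_names, selected):
    """Keep only the selected columns of each row, in the selected order."""
    indices = [feature_names.index(name) for name in selected]
    return [[row[i] for i in indices] for row in rows]


# Hypothetical importance scores, not taken from the project.
importances = {"topic_a": 0.42, "topic_b": 0.004, "sentiment": 0.13, "length": 0.01}
selected = select_features(importances)

# Hypothetical feature rows, one value per feature in the order above.
rows = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
reduced = project(rows, list(importances), selected)
```

Both models would then be trained on `reduced` instead of the full feature rows.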

Philipp-Sc commented on September 6, 2024

Add the Neural Net prediction as a feature to the final Random Forest.
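One way to read this suggestion is as a simple form of stacking: the Neural Net score is appended to each feature row before the Random Forest is trained. The sketch below only shows the feature augmentation step; `toy_nn_predict` is a hypothetical stand-in for a trained network, and the rows are made-up values.

```python
def add_prediction_feature(rows, predict):
    """Return new rows with predict(row) appended as an extra last column."""
    return [list(row) + [predict(row)] for row in rows]


def toy_nn_predict(row):
    # Hypothetical stand-in for the trained Neural Net: the mean of the features.
    return sum(row) / len(row)


# Hypothetical feature rows.
rows = [[0.25, 0.75], [1.0, 0.5]]
augmented = add_prediction_feature(rows, toy_nn_predict)
```

When stacking like this, the predictions added to the training rows are usually produced by a network that has not seen those rows (for example via cross-validation); otherwise the Random Forest can learn to over-trust the new feature.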

As long as the number of topics is limited by the available compute (CPU only), it makes no sense to replace the Random Forest.
