Preliminary investigation of machine learning techniques for parameter estimation of cubic crystal structures.
Starting from a spectrum that represents a cubic crystal structure, the aim is to develop an ML model that predicts the three size parameters: a, b, c.
Each observation in a spectrum is a pair (xi, yi), where xi ranges from 0 to 90 in increments of 0.02 and yi is the corresponding intensity.
For the cubic structure the three dimensions are all equal (a = b = c), so this is a single-output regression problem.
Several ML algorithms have been implemented: Regression Tree, Random Forest, LSBoost, and Neural Network.
Three different experiments have been run.
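As a quick check on the sampling described above, the following Python sketch rebuilds the x axis and counts its points. It is illustrative only and is not part of the repository's MATLAB code; the function name `sampling_grid` is hypothetical, and the sketch assumes both endpoints (0 and 90) are included.

```python
# Illustrative sketch, not the repository's MATLAB code.
# Assumption: the x axis includes both endpoints, 0 and 90.
X_MIN, X_MAX, STEP = 0.0, 90.0, 0.02


def sampling_grid(x_min=X_MIN, x_max=X_MAX, step=STEP):
    """Return the list of x values from x_min to x_max spaced by step."""
    n_points = round((x_max - x_min) / step) + 1
    # Build each value from its index to avoid accumulating float error.
    return [round(x_min + i * step, 2) for i in range(n_points)]


grid = sampling_grid()
print(len(grid))          # 4501
print(grid[0], grid[-1])  # 0.0 90.0
```

Under that assumption, one spectrum consists of 4501 (xi, yi) pairs.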
To run Experiment #1:
run_experiment.m
In Experiment 1, the user can set different options before running the experiment. In particular:
- the threshold above which a peak in the spectrum is selected
- the number N of first peaks (greater than the threshold) to use
- in this experiment only the positions (NOT the intensities) of these peaks are used as features for model training
- whether the position of the biggest peak is used as a feature
- whether the total number of peaks is used as a feature
- whether missing values are replaced
The experiment is then run with the selected settings.
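The settings listed above can be pictured with a small Python sketch of the feature extraction. It is not the repository's MATLAB implementation: the names `find_peaks` and `build_features`, the local-maximum definition of a peak, the reading of "first N peaks" as the first N by position, and the fill value 0.0 for missing entries are all assumptions made for illustration.

```python
# Illustrative sketch, not the repository's MATLAB code.
# Assumptions: a peak is a local maximum above the threshold, "first N peaks"
# means the first N in order of position, and missing values are filled with 0.0.


def find_peaks(x, y, threshold):
    """Return (position, intensity) pairs for local maxima above threshold."""
    peaks = []
    for i in range(1, len(y) - 1):
        if y[i] > threshold and y[i] > y[i - 1] and y[i] >= y[i + 1]:
            peaks.append((x[i], y[i]))
    return peaks


def build_features(x, y, threshold, n_peaks, use_biggest_peak=False,
                   use_peak_count=False, replace_missing=False, fill_value=0.0):
    """Build one feature vector from the positions of the first n_peaks peaks."""
    peaks = find_peaks(x, y, threshold)
    missing = fill_value if replace_missing else float("nan")
    # Only positions are used as features, never intensities.
    features = [pos for pos, _ in peaks[:n_peaks]]
    # Pad when the spectrum has fewer than n_peaks peaks.
    features += [missing] * (n_peaks - len(features))
    if use_biggest_peak:
        features.append(max(peaks, key=lambda p: p[1])[0] if peaks else missing)
    if use_peak_count:
        features.append(float(len(peaks)))
    return features


x = [0.0, 0.02, 0.04, 0.06, 0.08, 0.10, 0.12]
y = [0.0, 5.0, 0.5, 0.8, 0.2, 9.0, 0.0]
features = build_features(x, y, threshold=1.0, n_peaks=3, use_biggest_peak=True,
                          use_peak_count=True, replace_missing=True)
print(features)  # [0.02, 0.1, 0.0, 0.1, 2.0]
```

In this toy spectrum the local maximum at 0.06 falls below the threshold, so only two peaks are found; the third position is filled, and the last two entries are the position of the biggest peak and the total peak count.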
To run Experiment #2:
run_iteration_experiment.m
In Experiment 2, the settings are fixed. In particular:
- threshold = 1
- the number N of first peaks (greater than the threshold) takes each value in [10 15 20 25 30 40 50]
- in this experiment only the positions (NOT the intensities) of these peaks are used as features for model training
- whether the position of the biggest peak is used as a feature (NOT USED)
- whether the total number of peaks is used as a feature (USED)
- whether missing values are replaced (NOT REPLACED)
The experiment is then run for each number of peaks, and a comparison of the performance is provided.
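The fixed settings above imply one run per peak count. The Python sketch below enumerates those runs and the resulting feature-vector width. It is illustrative only and does not reproduce `run_iteration_experiment.m`; the dictionary keys and function names are hypothetical, and it assumes each enabled option adds exactly one feature column.

```python
# Illustrative sketch, not the repository's MATLAB code.
# Assumption: each peak position and each enabled option is one feature column.
FIXED_SETTINGS = {
    "threshold": 1,
    "use_biggest_peak": False,  # NOT USED
    "use_peak_count": True,     # USED
    "replace_missing": False,   # NOT REPLACED
}
PEAK_COUNTS = [10, 15, 20, 25, 30, 40, 50]


def feature_count(n_peaks, settings):
    """Number of features: N peak positions plus one per enabled option."""
    return (n_peaks
            + int(settings["use_biggest_peak"])
            + int(settings["use_peak_count"]))


def iteration_plan(peak_counts=PEAK_COUNTS, settings=FIXED_SETTINGS):
    """Return one configuration dictionary per number of peaks."""
    return [dict(settings, n_peaks=n, n_features=feature_count(n, settings))
            for n in peak_counts]


plan = iteration_plan()
for run in plan:
    print(run["n_peaks"], run["n_features"])
```

With these settings each run uses N + 1 features: the N peak positions plus the total number of peaks.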
To run Experiment #3:
run_experiment_using_all_spectrum_data.m
In Experiment 3, instead of the positions of the first N peaks, the entire spectrum is used: all the yi values serve as features to train the models.
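A minimal Python sketch of this idea follows: each spectrum contributes its full list of intensities as one row of the feature matrix, and the x position is implied by the column index. This is not the code in `run_experiment_using_all_spectrum_data.m`; the function name `spectra_to_matrix` is hypothetical, and the sketch assumes all spectra are sampled on the same x grid.

```python
# Illustrative sketch, not the repository's MATLAB code.
# Assumption: every spectrum is sampled on the same x grid, so column j
# always corresponds to the same x value.


def spectra_to_matrix(spectra):
    """Stack intensity lists into a feature matrix, one row per spectrum."""
    if not spectra:
        return []
    width = len(spectra[0])
    for k, intensities in enumerate(spectra):
        if len(intensities) != width:
            raise ValueError(
                f"spectrum {k} has {len(intensities)} points, expected {width}")
    return [[float(v) for v in intensities] for intensities in spectra]


# Two toy spectra with four points each (real spectra are much longer).
matrix = spectra_to_matrix([[0, 1, 5, 1], [0, 2, 7, 0]])
print(len(matrix), len(matrix[0]))  # 2 4
```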
- MATLAB Version 9.14 (R2023a) (https://it.mathworks.com/products/matlab.html)
- Statistics and Machine Learning Toolbox Version 12.5 (R2023a) (https://it.mathworks.com/products/statistics.html)
- Parallel Computing Toolbox Version 7.8 (R2023a) (https://it.mathworks.com/products/parallel-computing.html)