This is a natural language processing (NLP) project that detects the language of a text using machine learning. The model can detect
79 languages.
Data:-
We got the dataset from here. It has more than 94 lakh (9.4 million) rows, but we did not need that many, so after preprocessing we
have 6K rows for some languages. To get the preprocessed data, visit
Model Training:-
First, we trained a pipeline of Logistic Regression with a TF-IDF vectorizer without setting any hyperparameters, which reached 85% accuracy.
Then we used TF-IDF with an n-gram range of (1, 3), meaning unigrams, bigrams, and trigrams, and with the analyzer set to characters, so the model trains on character n-grams rather than on words. This reached 95% accuracy.
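
The two setups described above can be sketched roughly as follows. This is a minimal illustration assuming scikit-learn. The tiny corpus, the labels, the build_pipeline helper, and the max_iter value are made up for the example and are not the project's actual data or code. The baseline is assumed to use scikit-learn's default word-level vectorizer.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus; the real project trains on the preprocessed dataset.
texts = [
    "the weather is nice today", "where is the train station",
    "this is a good book", "i would like some water",
    "el clima es agradable hoy", "donde esta la estacion de tren",
    "este es un buen libro", "quisiera un poco de agua",
    "le temps est agreable aujourd'hui", "ou est la gare",
    "c'est un bon livre", "je voudrais un peu d'eau",
]
labels = ["en"] * 4 + ["es"] * 4 + ["fr"] * 4


def build_pipeline(analyzer="word", ngram_range=(1, 1)):
    """Hypothetical helper: TF-IDF features followed by Logistic Regression."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(analyzer=analyzer, ngram_range=ngram_range)),
        # max_iter is raised only to avoid convergence warnings (an assumption).
        ("clf", LogisticRegression(max_iter=1000)),
    ])


# Setup 1: default vectorizer settings (word-level unigrams).
baseline = build_pipeline()
baseline.fit(texts, labels)

# Setup 2: character n-grams of length 1 to 3.
char_model = build_pipeline(analyzer="char", ngram_range=(1, 3))
char_model.fit(texts, labels)

print(char_model.predict(["the book is on the table"]))
```

Character n-grams capture spelling patterns such as common letter combinations, so they still work on words the model has never seen. That is one plausible reason for the accuracy gain reported above.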