soma_backend

Software Maestro backend assignment - 이미정

Result link

(http://125.180.52.121:18887) The address above is served through port forwarding on a home router.

Name on the evaluation server

이미정(\uc774\ubbf8\uc815)(lmjing20) : 0.72387755102

docker

(https://hub.docker.com/r/lmjing/swmaestro_backend/)

Image download: docker pull lmjing/swmaestro_backend. Inside the image, run:

import nltk
nltk.download()

When this runs, enter d, then book, then q, in that order.

Performance improvements

  1. Morphological analysis with konlpy
  2. Removal of stop words that do not help classify products
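from konlpy.tag import Twitter
from nltk.corpus import stopwords

# English stop words plus the single digits 0-9
stop_words = set(stopwords.words('english') + [u'0',u'1',u'2',u'3',u'4',u'5',u'6',u'7',u'8',u'9'])
# Reuse one tagger instead of constructing it on every call
tagger = Twitter()
# Keep only alphabetic tokens, numbers, and nouns
keep_tags = ('Alpha', 'Number', 'Noun')

def set_konlpy(text):
    words = tagger.pos(text)
    nn = [word for word, tag in words if tag in keep_tags and word not in stop_words]
    return ' '.join(nn)

# d_list holds the product names read from the training data
n_list = []
for each in d_list:
    cleaned = set_konlpy(each)  # analyze each name once
    n_list.append(cleaned)
    print(cleaned)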
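The filter in set_konlpy can be illustrated without running the tagger. The sketch below applies the same tag and stop-word checks to hand-written (word, tag) pairs. The pairs, the reduced stop-word set, and the helper name filter_tagged are assumptions made for illustration, not actual konlpy output.

```python
# Tags kept by the original filter
KEEP_TAGS = ('Alpha', 'Number', 'Noun')
# Reduced stand-in for the stop-word list: one English stop word plus single digits
STOP_WORDS = {'the'} | {str(d) for d in range(10)}

def filter_tagged(tagged):
    """Keep words whose tag is in KEEP_TAGS and which are not stop words."""
    kept = [word for word, tag in tagged
            if tag in KEEP_TAGS and word not in STOP_WORDS]
    return ' '.join(kept)

# Hand-written pairs in the (word, tag) shape that pos() returns
tagged = [
    ('나이키', 'Noun'),
    ('의', 'Josa'),
    ('the', 'Alpha'),
    ('air', 'Alpha'),
    ('270', 'Number'),
    ('5', 'Number'),
    ('!', 'Punctuation'),
]
cleaned = filter_tagged(tagged)
print(cleaned)
```

Only single-digit tokens are dropped by the digit stop words; a multi-digit token such as 270 passes through the filter.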
  3. Use n-grams up to trigrams to add a wider variety of features
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(ngram_range=(1, 3),token_pattern=r'(?u)\b\w+\b', min_df=1)
x_list = vectorizer.fit_transform(n_list)
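To make the effect of ngram_range=(1, 3) concrete, the following pure-Python sketch lists the unigrams, bigrams, and trigrams that a whitespace-separated string yields. The helper name word_ngrams and the sample product name are illustrative assumptions. CountVectorizer also lowercases the text and applies token_pattern, which this sketch does not reproduce.

```python
def word_ngrams(text, min_n=1, max_n=3):
    """Return all word n-grams of text for n from min_n to max_n."""
    tokens = text.split()
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of n tokens across the token list
        for start in range(len(tokens) - n + 1):
            grams.append(' '.join(tokens[start:start + n]))
    return grams

sample = 'red cotton shirt'
features = word_ngrams(sample)
print(features)
```

A three-word name yields six features instead of three, which is how the wider n-gram range adds variety to the feature set.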
  4. Use tf-idf to capture the importance and frequency of words
from sklearn.feature_extraction.text import TfidfTransformer
tfidf = TfidfTransformer()
z_list = tfidf.fit_transform(x_list)
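As a rough illustration of what TfidfTransformer computes with its default settings (smoothed idf followed by L2 normalization of each row), here is a pure-Python sketch on a tiny count matrix. The function name tfidf_rows and the counts are made up for the example.

```python
import math

def tfidf_rows(counts):
    """Turn rows of term counts into L2-normalized tf-idf rows."""
    n_docs = len(counts)
    n_terms = len(counts[0])
    # Document frequency: number of rows in which each term appears
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    # Smoothed idf: ln((1 + n_docs) / (1 + df)) + 1
    idf = [math.log((1.0 + n_docs) / (1.0 + d)) + 1.0 for d in df]
    result = []
    for row in counts:
        weighted = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted))
        result.append([v / norm if norm else 0.0 for v in weighted])
    return result

# Three documents, three terms; term 0 appears in every document
counts = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
]
weights = tfidf_rows(counts)
for row in weights:
    print([round(v, 3) for v in row])
```

In the first row the rarer term gets a larger weight than the term that appears in every document, which is the down-weighting of common words that tf-idf is used for here.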

Other attempts

  • Using the https://github.com/irony/caffe-docker-classifier model: irony/caffe-docker-classifier was pulled with docker and run. The plan was to add the five result values for each image to the features and train on them, but this took too much time, so it was never actually tested.
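# d_list: product names, cate_list: joined category labels, i_list: image features
d_list = []
cate_list = []
i_list = []
for idx, row in train_df.iterrows():
    cate = ";".join([row['cate1'], row['cate2'], row['cate3']])
    cate_list.append(cate)
    d_list.append(row['name'])
    # get_image_feature was intended to return the image classifier results for row idx
    i_list.append(get_image_feature(idx))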

The plan was to add the image features as shown above.
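Since this attempt was never completed, the following is only a sketch of how five image values per product might be appended to the text feature vectors. The stub get_image_feature_stub, its placeholder zeros, and the helper append_image_features are all hypothetical; the original get_image_feature is not shown in this README.

```python
def get_image_feature_stub(index):
    """Hypothetical stand-in: five placeholder values for product `index`."""
    return [0.0, 0.0, 0.0, 0.0, 0.0]

def append_image_features(text_rows, image_rows):
    """Concatenate each text feature row with its image feature row."""
    if len(text_rows) != len(image_rows):
        raise ValueError('text and image feature lists must have the same length')
    return [list(t) + list(i) for t, i in zip(text_rows, image_rows)]

# Made-up text features for two products, three columns each
text_rows = [[0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
image_rows = [get_image_feature_stub(i) for i in range(len(text_rows))]
combined = append_image_features(text_rows, image_rows)
print(len(combined), len(combined[0]))
```

With the sparse matrices that scikit-learn returns, the same column-wise concatenation could presumably be done with scipy.sparse.hstack rather than Python lists.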
