
skill_transform not found (skill2vec, closed)

ks716 commented on August 27, 2024
skill_transform not found


Comments (1)

Delmark1904 commented on August 27, 2024
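import html
import re

# Assumption: wordnet_lemmatizer is NLTK's WordNetLemmatizer, which needs the
# WordNet corpus (nltk.download("wordnet")) to be available.
from nltk.stem import WordNetLemmatizer

wordnet_lemmatizer = WordNetLemmatizer()


def skill_transform(skill, remove_stopwords=True):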
    skill = str(skill)
    skill = html.unescape(skill)
    
    skill = skill.replace("_", " ").split()
    skill = " ".join([sk for sk in skill if sk])
    
    skill = re.sub(r"\(.*\)", "", skill)
    # Strip punctuation. Hyphens and underscores are already gone at this point,
    # so separate "-js" / "_js" replacements are not needed.
    skill = skill.replace("-", "") \
        .replace(".", "") \
        .replace(",", "") \
        .replace(":", "") \
        .replace("(", "") \
        .replace(")", "") \
        .replace(u"åá", "") \
        .replace(u"&", "and") \
        .replace(" js", "js") \
        .replace("java script", "js")
    
    skill = skill.lower()
    
    # Special cases replace
    special_case = {}
    special_case["javascript"] = [ "js", "java script", "javascripts", "java scrip" ]
    special_case["wireframe"] = [ "wireframes", "wire frame", "wire frames", "wire-frame", "wirefram", "wire fram", "wireframing" ]
    special_case["OOP"] = [  "object oriented", "object oriented programming", ]
    special_case["OOD"] = [ "object oriented design", ]
    special_case["OLAP"] = [ "online analytical processing",  ]
    special_case["Ecommerce"] = [ "e commerce",  ]
    special_case["consultant"] = [ "consulting",  ]
    special_case["ux"] = [ "user experience", "web user experience design", "user experience design", "ux designer", "user experience/ux" ]
    special_case["html5"] = [ "html 5",  ]
    special_case["j2ee"] = [ "jee",  ]
    special_case["osx"] = [ "mac os x", "os x" ]
    special_case["senior"] = [ "sr" ]
    special_case["qa"] = [ "quality",  ]
    special_case["bigdata"] = [ "big data",  ]
    special_case["webservice"] = [ "webservices", "website", "webapps" ]
    special_case["xml"] = [ "xml file", "xml schemas", "xml/json", "xml web service" ]
    special_case["nlp"] = [ "natural language process", "natural language", "nltk" ]
    for root_skill in special_case:
        if skill in special_case[root_skill]:
            skill = root_skill
    
    # Special case regex
    special_case_regex = {
        r'^angular.*$': 'angularjs',
        r'^node.*$': 'nodejs',
        r'^(.*)[_\s]js$': '\\1js',
        r'^(.*) js$': '\\1js',
    }
    for regex_rule in special_case_regex:
        after_skill = re.sub(regex_rule, special_case_regex[regex_rule], skill)
        if after_skill != skill:
            skill = after_skill
            break
    
    # Lemmatize each word as a verb (skills of one or two characters are left as-is)
    if len(skill) > 2:
        skill_after = skill.split(" ")
        skill_after = [wordnet_lemmatizer.lemmatize(sk, pos="v") for sk in skill_after]
        skill_after = " ".join(skill_after)
        skill = skill_after
    
    # skill stopwords 
    if remove_stopwords:
        skill_stopwords = [ "app", "touch", "the", "application" ]
        skill_after = skill.split(" ")
        skill = " ".join([ sk for sk in skill_after if sk not in skill_stopwords ])
    
    # Collapse repeated whitespace first, then join the words with underscores
    skill = re.sub(r"\s+", " ", skill.lower()).strip()
    skill = skill.replace(" ", "_")
    
    # NOTE: strip a trailing "js", so "angularjs" becomes "angular"
    skill = re.sub('js$','', skill)
    
    return skill

print(skill_transform("js"), skill_transform("angularjs"), skill_transform("Python"))
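
The alias lookup in the "Special cases replace" step scans every list on each call. A minimal sketch of one alternative, assuming the alias table can be built once at import time, is to invert it into a flat dict. The names SPECIAL_CASE, ALIAS_TO_ROOT and canonical_skill are hypothetical and not part of skill2vec, and only a subset of the aliases above is copied in:

```python
# Hypothetical names for illustration; not part of skill2vec.
# A subset of the special_case table from the function above.
SPECIAL_CASE = {
    "javascript": ["js", "java script", "javascripts", "java scrip"],
    "html5": ["html 5"],
    "osx": ["mac os x", "os x"],
    "bigdata": ["big data"],
}

# Invert once: alias -> canonical skill name.
ALIAS_TO_ROOT = {
    alias: root
    for root, aliases in SPECIAL_CASE.items()
    for alias in aliases
}


def canonical_skill(skill):
    """Return the canonical name for a known alias, else the input unchanged."""
    return ALIAS_TO_ROOT.get(skill, skill)


print(canonical_skill("js"), canonical_skill("os x"), canonical_skill("python"))
```

Unlike the original loop, this lookup never chains from one canonical name to another. That makes no difference for the table above, because no canonical name is also listed as an alias.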
