
kasivisu4 / cordiance-experential-project



License: MIT License

Language: Jupyter Notebook (100%)
Topics: anytree, inflect, multiprocessing, nltk-python, observablehq, python3


Cordiance-experential-project

String Matching using Longest Common Subsequence with Words between two sentences

Project Description:

This project is an experiential algorithms course project carried out in collaboration with Cordiance. Its goal is to find the closest possible UNSPSC code match for each Avalara tax description. To start the project, we were provided with two data files: UNSPSC and Avalara.

The UNSPSC file contains the codes to be matched. They are organized into four levels, listed here from most specific to most general:

  • Commodity level

  • Class level

  • Family level

  • Segment level
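To show how the four levels relate, here is a minimal sketch that assumes each UNSPSC code is stored as an 8-digit string (the provided file may lay the levels out differently, for example as separate columns). In the UNSPSC scheme each level adds two digits, so the parent codes can be derived by zero-padding a prefix. The function name unspsc_levels is hypothetical and is not taken from the project notebook.

```python
def unspsc_levels(code: str) -> dict:
    """Split an 8-digit UNSPSC code into its segment, family, class and commodity codes.

    Hypothetical helper for illustration; assumes codes are 8-digit strings.
    """
    if len(code) != 8 or not code.isdigit():
        raise ValueError(f"expected an 8-digit UNSPSC code, got {code!r}")
    return {
        "segment": code[:2] + "000000",  # first 2 digits identify the segment
        "family": code[:4] + "0000",     # first 4 digits identify the family
        "class": code[:6] + "00",        # first 6 digits identify the class
        "commodity": code,               # all 8 digits identify the commodity
    }


print(unspsc_levels("43211503"))
# {'segment': '43000000', 'family': '43210000', 'class': '43211500', 'commodity': '43211503'}
```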

The Avalara file contains the Avalara tax system codes, their descriptions, and related additional information.


Algorithm:

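The core idea named in the project title, a longest common subsequence computed over words rather than characters, can be sketched as follows. This is an illustrative implementation, not the notebook's code: the tokenizer, the function names, and the normalization by the longer sentence's word count are all assumptions.

```python
import re


def tokenize(sentence: str) -> list:
    # Lowercase and keep runs of letters/digits as words (punctuation is dropped)
    return re.findall(r"[a-z0-9]+", sentence.lower())


def word_lcs_length(a: list, b: list) -> int:
    # dp[i][j] = LCS length of the first i words of a and the first j words of b
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def lcs_similarity(s1: str, s2: str) -> float:
    # Assumed normalization: LCS length over the word count of the longer sentence
    a, b = tokenize(s1), tokenize(s2)
    if not a or not b:
        return 0.0
    return word_lcs_length(a, b) / max(len(a), len(b))


print(lcs_similarity("Office chairs and desks", "Office desks and chairs"))  # 0.5
```

Because an LCS respects word order, the two sentences above share a word-level LCS of only two words (for example "office and"), so their similarity is 0.5 even though they contain exactly the same words.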

Documentation:

https://github.com/kasivisu4/cordiance-experential-project/blob/main/Group3-Final%20Report.pdf

Code Details:

The project is written in Python using JupyterLab. The notebook cells are meant to be run in order from top to bottom. Each step of the algorithm is implemented as a function in the code, and the final output is stored in the final_result object at the end of the notebook.

https://github.com/kasivisu4/cordiance-experential-project/blob/main/Experential_project_code.ipynb

Test Cases:

We created a test dataset of 30 records by manually mapping each Avalara description to its corresponding UNSPSC code. Our algorithm is divided into two phases:

  • Phase 1 (Possible Outcome): In this phase, we obtain one candidate match for each level. Out of 30 records, we were able to match 27.
  • Phase 2: In this phase, we select the candidate with the maximum score across all levels from Phase 1. Out of 30 records, we were able to match 20, giving an accuracy of 66.7%.
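The two phases can be sketched as below under stated assumptions: the candidate descriptions for each level are held in a dictionary keyed by level name and then by code, and a simple word-overlap (Jaccard) score stands in for the project's LCS-based similarity so that the block runs on its own. The function names, the data layout, and the toy codes are hypothetical.

```python
from typing import Callable, Dict, Tuple

Scorer = Callable[[str, str], float]


def jaccard(a: str, b: str) -> float:
    # Stand-in scorer for this sketch only; the project scores with word-level LCS
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / len(x | y) if x | y else 0.0


def phase1(description: str, levels: Dict[str, Dict[str, str]],
           score: Scorer) -> Dict[str, Tuple[str, float]]:
    # For each level, keep the single best-scoring (code, score) candidate
    best = {}
    for level, entries in levels.items():
        best[level] = max(
            ((code, score(description, text)) for code, text in entries.items()),
            key=lambda pair: pair[1],
        )
    return best


def phase2(candidates: Dict[str, Tuple[str, float]]) -> str:
    # Across all levels, return the code of the candidate with the maximum score
    code, _ = max(candidates.values(), key=lambda pair: pair[1])
    return code


def accuracy(predicted: list, expected: list) -> float:
    # Fraction of records whose predicted code equals the manually mapped code
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


# Toy, made-up codes and descriptions for illustration only
levels = {
    "segment": {"90000000": "information technology equipment"},
    "commodity": {"90101010": "notebook computers", "90101020": "desktop computers"},
}
candidates = phase1("notebook computers and laptops", levels, jaccard)
print(candidates)  # {'segment': ('90000000', 0.0), 'commodity': ('90101010', 0.5)}
print(phase2(candidates))  # 90101010
```

With this accuracy function, 20 correct matches out of 30 records gives 20 / 30 ≈ 0.667, the 66.7% reported for Phase 2.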

https://github.com/kasivisu4/cordiance-experential-project/blob/main/TestRecords.xlsx

Observable Custom Stop Word Visualizations:

We performed a word count on the Avalara file and visualized the results as a zoomable treemap to identify custom stop words.

https://observablehq.com/@kasivisu4/avalara-data-analysis
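A word count like the one behind the treemap can be reproduced in a few lines of Python. The project identified custom stop words by inspecting the visualization; the document-frequency threshold below (min_share) is an assumed automated stand-in for that manual step, and the sample descriptions are made up.

```python
import re
from collections import Counter


def words(text: str) -> list:
    # Lowercase and keep alphabetic runs only
    return re.findall(r"[a-z]+", text.lower())


def word_counts(descriptions: list) -> Counter:
    # Total occurrences of each word across all descriptions (what a treemap would size by)
    counts = Counter()
    for text in descriptions:
        counts.update(words(text))
    return counts


def custom_stop_words(descriptions: list, min_share: float = 0.5) -> set:
    # Assumed rule: flag words that occur in at least min_share of the descriptions
    doc_freq = Counter()
    for text in descriptions:
        doc_freq.update(set(words(text)))
    n = len(descriptions)
    return {w for w, c in doc_freq.items() if c / n >= min_share}


# Made-up sample descriptions
sample = ["Services - taxable", "Services - nontaxable", "Digital goods taxable"]
print(word_counts(sample).most_common(2))
print(sorted(custom_stop_words(sample, min_share=0.6)))  # ['services', 'taxable']
```

Words that appear in most descriptions carry little information for telling codes apart, which is why they are candidates for removal before scoring.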

Contributors:

Akhila Sulgante

Kasi Viswanath

Shital Waters

© 2022 MIT
