kcet_frontend's Introduction

KCET Cutoff Analyzer

Context:

The Karnataka Common Entrance Test (KCET) is an annual entrance exam conducted by the Karnataka Examinations Authority (KEA) for admission to undergraduate courses in engineering, architecture, pharmacy, agriculture, and allied fields offered by colleges in the state of Karnataka.

Pain Point

The default cutoff list from KEA is huge and confusing. A custom table built from specific filters would help during the application process. I hit this pain point when I was applying, and again recently when my brother was.

The idea is for the app to act as a filter over the cutoff list and return consistent data. The major setback is parsing the damn cutoff PDF, which is tightly packed, has very little cell padding, and uses a weird table format. See: KCET Cutoff List
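As a rough illustration of the "act like a filter" idea, here is a minimal Python sketch. It is not the project's actual code: the `Cutoff` record, the `filter_cutoffs` function, and the sample college codes and ranks are all hypothetical. It assumes that each cutoff entry stores a closing rank and that a candidate qualifies when their rank is less than or equal to it.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Cutoff:
    # Hypothetical record shape; the real schema may differ.
    college: str
    branch: str
    category: str
    closing_rank: int


def filter_cutoffs(
    rows: Iterable[Cutoff],
    rank: int,
    category: str,
    branch: Optional[str] = None,
) -> List[Cutoff]:
    """Return entries whose closing rank admits `rank`, toughest cutoff first."""
    matches = [
        r
        for r in rows
        if r.category == category
        and r.closing_rank >= rank  # a lower rank number is a better rank
        and (branch is None or r.branch == branch)
    ]
    return sorted(matches, key=lambda r: r.closing_rank)


# Made-up sample data, for illustration only.
SAMPLE_ROWS = [
    Cutoff("E001", "CS", "GM", 1500),
    Cutoff("E002", "CS", "GM", 9000),
    Cutoff("E003", "CS", "2AG", 12000),
    Cutoff("E004", "EC", "GM", 7000),
]

for entry in filter_cutoffs(SAMPLE_ROWS, rank=5000, category="GM"):
    print(entry.college, entry.branch, entry.closing_rank)
```

Sorting by closing rank puts the most competitive options that are still reachable at the top, which is one plausible ordering for a shortlist.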

Tech Stack

I kept the tech stack simple so I could focus on improving PDF table parsing. Deployments run on Vercel and DigitalOcean.

  • Frontend: Vite, React, Tailwind, PrimeReact, and React Router
  • Backend: Python, Frappe, Redis, MariaDB

Solution

I started with tabula-py, but it missed a lot of rows. It also split rows unpredictably: for a single row with 20 columns, the data for 4 columns would land in one row, the data for some other columns would be spread over two rows, and so on. No pattern, pure randomness. I tried writing logic to handle it, while exploring other alternatives at the same time.
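To make the row-splitting problem concrete, here is a small Python sketch of one possible repair heuristic. It is not the logic used in this project. It assumes (and this is a big assumption, given how random the splits were) that a fragment can be recognized because its first cell, for example a college code, is empty. The function name `merge_fragments` and the sample rows are hypothetical.

```python
from typing import List, Optional, Sequence


def merge_fragments(rows: Sequence[Sequence[Optional[str]]]) -> List[List[str]]:
    """Fold continuation fragments into the preceding logical row.

    Assumption: a row whose first cell is empty is a fragment of the
    row above it. Real extractor output may not follow this rule.
    """
    merged: List[List[str]] = []
    for row in rows:
        cells = [(c or "").strip() for c in row]
        if not cells:
            continue  # skip completely empty rows
        if merged and not cells[0]:
            prev = merged[-1]
            # Pad both rows to the same width before merging.
            width = max(len(prev), len(cells))
            prev.extend([""] * (width - len(prev)))
            cells.extend([""] * (width - len(cells)))
            for i, cell in enumerate(cells):
                if cell:
                    prev[i] = f"{prev[i]} {cell}".strip()
        else:
            merged.append(cells)
    return merged


# Made-up extractor output: the second row is a fragment of the first.
raw = [
    ["E001", "College A", "1200", ""],
    ["", "", "", "3400"],
    ["E002", "College B", "--", "5600"],
]
for row in merge_fragments(raw):
    print(row)
```

A heuristic like this only helps when fragments are detectable at all; when values from several cells are fused into one, there is nothing reliable to key on.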

[Image: Single Cell Value Aggregation problem] [Image: More such problems]

The second approach was OCR. The output was decent but required some cleanup, and it went some way toward resolving the value aggregation issue. So the idea was to use OCR to convert the PDF tables to CSV sheets.

But there's one more problem: if you don't scale down the table (in Google Sheets) before exporting, you get value aggregation again, which seems like a nightmare to solve. So I scaled it down to 70-80% and then exported it in A3 landscape format with custom page breaks.

Using this CSV sheet, I was able to populate my DB. There are still consistency issues, and I am looking for alternatives, but this will have to do for the MVP. I want something live so I can get critical feedback on my work, rather than spending time on an optimized approach before releasing. My thinking was that once the pipeline is set up, I can improve data quality later.
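The CSV-to-DB step could look roughly like the Python sketch below. It is only an illustration under stated assumptions: it uses the standard library's `sqlite3` as a stand-in for the real MariaDB/Frappe backend, the column names (`college`, `branch`, then one column per category) are hypothetical, and it assumes missing cutoffs appear as blanks or `--`.

```python
import csv
import io
import sqlite3
from typing import Optional, TextIO


def parse_rank(text: str) -> Optional[int]:
    """Return the rank as an int, or None for blanks and '--' placeholders."""
    cleaned = text.strip().replace(",", "")
    return int(cleaned) if cleaned.isdigit() else None


def load_cutoffs(csv_file: TextIO, conn: sqlite3.Connection) -> int:
    """Insert one row per (college, branch, category) and return the count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cutoff "
        "(college TEXT, branch TEXT, category TEXT, closing_rank INTEGER)"
    )
    inserted = 0
    for row in csv.DictReader(csv_file):
        college = row.pop("college").strip()
        branch = row.pop("branch").strip()
        # Every remaining column is treated as a category (assumption).
        for category, value in row.items():
            if category is None:
                continue  # extra cells beyond the header row
            rank = parse_rank(value or "")
            if rank is None:
                continue  # no cutoff published for this category
            conn.execute(
                "INSERT INTO cutoff VALUES (?, ?, ?, ?)",
                (college, branch, category.strip(), rank),
            )
            inserted += 1
    conn.commit()
    return inserted


# Made-up sample sheet, for illustration only.
SAMPLE = """college,branch,GM,1G,2AG
E001,CS,1500,--,"2,300"
E002,CS,9000,12000,
"""

conn = sqlite3.connect(":memory:")
print(load_cutoffs(io.StringIO(SAMPLE), conn), "rows inserted")
```

Storing one row per category (a long format) rather than one wide row per college keeps later filtering simple, at the cost of more rows.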

And that's what I did: after 2 months of effort, the app is at the MVP stage, and I plan to deploy it. Planned improvements include exporting the cutoff sheet as CSV and exposing CRUD APIs so others can build on top of the existing KCET cutoff data.

Result

[Image: Result Comparison] The output columns need to be more verbose, which I am working on.

Note:

Right now, the project is not open for open-source development, as I still have to update the development backend for debugging and testing. I plan to open it up once the project gets some traction or users. Here's the link to the Backend repository.

ToDos:

  • Combine Frontend and Backend into a monorepo
  • Create a Development setup and expose cutoff APIs
