cornelltech / company-projects-matcher
The matching algorithm for Company Projects: CS 5999.
If there are unmatched students, the state's energy should be very high.
Need to find a function that curves the student interest points so that low-ranked assignments are penalized disproportionately. I.e. if a student receives her 10th choice, there is a big penalty, so that the algorithm will optimize towards four students each receiving their 4th choice instead of three receiving their 2nd choice and one receiving her 10th. Thinking about x*sqrt(x).
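A quick sketch for sanity-checking the x*sqrt(x) idea against a plain linear score. The function names and the two example rank lists are made up for illustration; only the x*sqrt(x) formula comes from the note above.

```python
import math


def curved_penalty(rank):
    """Penalty for receiving the rank-th choice (1 = first choice): x * sqrt(x)."""
    return rank * math.sqrt(rank)


def total_penalty(ranks):
    """Sum of curved penalties over every student's assigned rank."""
    return sum(curved_penalty(rank) for rank in ranks)


# Two hypothetical outcomes for four students.
balanced = [4, 4, 4, 4]    # everyone gets their 4th choice
lopsided = [2, 2, 2, 10]   # three 2nd choices and one 10th choice

# A linear score cannot tell the two outcomes apart...
print(sum(balanced), sum(lopsided))
# ...while the curved penalty is lower for the balanced outcome.
print(total_penalty(balanced), total_penalty(lopsided))
```

Any convex function of the rank would behave similarly; x*sqrt(x) is just the candidate mentioned here.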
Only generated two teams of 5 from 14 students:
Difficult to reproduce.
Ameyas-MacBook-Pro:classes ameyaacharya$ python perry_geo_main.py
There are 14 students
Len unmatched students is 14
There are 6 MBAs
There are 8 MEngs
True
Length of this project is 0
Len of matched projects is 1
Len unmatched students is 10
There are 4 MBAs
There are 6 MEngs
True
Length of this project is 0
Len of matched projects is 2
Len unmatched students is 6
There are 2 MBAs
There are 4 MEngs
True
Length of this project is 4
Len of matched projects is 3
Len unmatched students is 2
There are 0 MBAs
There are 2 MEngs
False
Length of this project is 4
Should remove project 4615
Len unmatched students is 1
There are 0 MBAs
There are 1 MEngs
False
Length of this project is 4
Should remove project 3640
INITIAL SOLUTION:
3640: [6249314, 1678231, 8291021, 5467123, 6666666]
4615: [3333333, 9191919, 4990324, 5092102, 8888888]
Doesn't actually do anything currently.
With "tests.csv" as input, removing infeasible projects leaves no projects remaining. Fix this.
For use by Greg and Aaron in classes. This depends on #1 because once I find a way to properly calculate distance between these vectors, I can create the diverse teams using this metric.
Currently, the calculation between our 12-vectors in do_mahal_distance in covariance.py is incorrect. When we return the sorted list of pairwise distances, there are some "dissimilar" vectors that are supposedly "more similar than" or "the same as" a vector against itself.
For example:
[3 1 0 4]
[3 1 0 4]
79512674.1057
[3 1 0 4]
[0 3 0 4]
79512674.1057.
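For reference, a minimal pure-Python sketch of the Mahalanobis distance, sqrt((x - y)^T S^-1 (x - y)), that could be used to cross-check individual pairs. This is not the code in covariance.py: the function names and the idea of passing the covariance matrix in as a list of lists are assumptions. By definition a vector against itself must give 0, so that is a useful first test. The sketch raises an error when the covariance matrix is singular, which may or may not be related to the values above.

```python
import math


def solve_linear(matrix, rhs):
    """Solve matrix * z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the caller's data is not modified.
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * z[k] for k in range(row + 1, n))
        z[row] = (aug[row][n] - tail) / aug[row][row]
    return z


def mahalanobis(x, y, cov):
    """Mahalanobis distance between x and y for covariance matrix cov."""
    diff = [a - b for a, b in zip(x, y)]
    z = solve_linear(cov, diff)          # z = cov^-1 * diff
    squared = sum(d * v for d, v in zip(diff, z))
    return math.sqrt(max(squared, 0.0))  # clamp tiny negative rounding error


# With an identity covariance the result reduces to Euclidean distance.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(mahalanobis([3, 1, 0, 4], [3, 1, 0, 4], identity))
print(mahalanobis([3, 1, 0, 4], [0, 3, 0, 4], identity))
```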
If # MBAs < (total num students / team size) or # MEngs < (total num students / team size) then we can't make the # of required teams.
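The check above could be sketched as follows. The function name is hypothetical, and using integer division for the number of teams is an assumption (the note only says "total num students / team size").

```python
def can_form_teams(num_mbas, num_mengs, team_size):
    """Return False when there are fewer MBAs or fewer MEngs than required teams."""
    # Assumption: the number of required teams is the floor of students / team size.
    num_teams = (num_mbas + num_mengs) // team_size
    return num_mbas >= num_teams and num_mengs >= num_teams


print(can_form_teams(6, 8, 5))    # 14 students, teams of 5
print(can_form_teams(1, 13, 5))   # too few MBAs for the required teams
```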
Investigate what the change is. Maybe changing work experience from 0-6 to 0-4?
rrdhcp-10-33-45-22:classes ameyaacharya$ python greedy_attempt_two.py
Traceback (most recent call last):
File "greedy_attempt_two.py", line 121, in
initial_solution(students, all_projects)
File "greedy_attempt_two.py", line 93, in initial_solution
unmatched_students.remove(student)
ValueError: list.remove(x): x not in list
If I run greedy_attempt_two.py on tests.csv, I get the above error.
When we remove students from the list (after the random shuffle) while iterating over it, one student still in the list is skipped in the next iteration of the for-each loop (greedy_attempt_two/initial_solution).
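The skipping behaviour is a general property of removing from a Python list while looping over it. A small sketch of the problem and one common workaround (iterating over a shuffled copy); the function names are hypothetical and this is not the code in greedy_attempt_two.py.

```python
import random


def buggy_visit(items):
    """Remove each visited item from the list being iterated: skips elements."""
    visited = []
    for item in items:
        visited.append(item)
        items.remove(item)   # shifts the remaining items left under the iterator
    return visited


def visit_in_random_order(unmatched_students, seed=None):
    """Visit every student once in random order, removing each from the list."""
    order = list(unmatched_students)      # snapshot, so removal cannot skip anyone
    random.Random(seed).shuffle(order)
    visited = []
    for student in order:
        visited.append(student)
        unmatched_students.remove(student)
    return visited


print(buggy_visit([1, 2, 3, 4]))          # only some of the items are visited
students = [1, 2, 3, 4, 5]
print(visit_in_random_order(students, seed=0), students)
```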
add_student doesn't allow more than num_MBAs + num_MEngs students on a team.
However, in our case we want teams of 4 or 5, and add_student does not support that.
Currently just doing project.students.append(new_student).
Could change add_student to do that.
In exhaustive.py, we need to replace the constants (in getting rid of the infeasible teams) with the actual variables that they represent.
These must not be updated often enough: they are not up to date with the number of spots actually remaining.
The current version is a simple subtraction of the project rank from 10. It should be the curved function that I designed to close #2.
Update the energy function to include a diversity calculation as well as the cost, which is already included.
Need to think about what exactly to do here.
Before, undergrad major had many different options. Now going to change it to was_cs_ug or not. This is the data that matters for diversity.
This will involve changes in classes.py (valid values) and in survey_responses_altered.csv. This will decrease the size of our vectors by a lot. Should not require many other changes but we will see.
Not sure how long annealing is supposed to run for, but read docs and figure it out. Get it to terminate.
At the end of running initial_solution, all of the students are unmatched.
'For project 1625:
Students: [666666, 3922650, 1678231, 6249314]
Waiting: []
For project 1820:
Students: [5092102, 7894231]
Waiting: [(3, 8888888), (4, 8291021)]
For project 2145:
Students: [5092102, 8888888]
Waiting: []
For project 2860:
Students: [4102938, 3333333]
Waiting: [(1, 8888888)]
For project 2990:
Students: [8291021, 3333333, 4102938]
Waiting: []
For project 3705:
Students: [3333333, 4102938, 8291021]
Waiting: []
For project 3900:
Students: [8291021, 4102938, 9191919]
Waiting: []
For project 4225:
Students: [4990324, 5467123, 2886650, 3333333]
Waiting: [(0, 8888888), (1, 4102938), (3, 7894231)]
Unmatched
[2886650, 4990324, 6249314, 5092102, 5467123, 9191919, 3333333, 7894231, 1678231, 8291021, 4102938, 3922650, 8888888, 666666]'
With "tests.csv" as the input:
Student is on a team and on the waiting list as well:
For project 2860:
Students: [4102938, 3333333, 8888888]
Waiting: [(1, 8888888)]
Unmatched students is not updated:
For project 2860:
Students: [4102938, 3333333, 8888888]
Waiting: [(1, 8888888)]
At the end the unmatched students are
[8291021, 5092102, 7894231]
Students [4102938, 3333333, 8888888] should be in the unmatched students list, because a project with 3 students is not matched.
Need coding ability, CSUG, and years of work experience.
Generate lots of test cases and set up the Python testing framework.
Since #23 is closed, we need to get rid of the check (in calculating the cost of assigning a student to a certain project) that makes sure that the rank is within some range.
Create a Solution: a student cannot be on two teams.
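One way the "no student on two teams" rule could be checked, as a sketch. Representing a solution as a dict from project ID to a list of student IDs is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def students_on_multiple_teams(solution):
    """Return the sorted IDs of students that appear on more than one team."""
    counts = Counter(sid for team in solution.values() for sid in team)
    return sorted(sid for sid, count in counts.items() if count > 1)


def is_valid_solution(solution):
    """A solution is valid only if no student is on two teams."""
    return not students_on_multiple_teams(solution)


# Two of the teams printed in the output further down share students.
overlapping = {
    2860: [4102938, 3333333],
    2990: [8291021, 3333333, 4102938],
}
print(students_on_multiple_teams(overlapping))
print(is_valid_solution(overlapping))
```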
There is a bug in do_all_subtracted_distances_data. When calculating two distances individually, I get the right answer, but it does not match the answer recorded by do_all_subtracted_distances_data.
How should we fix this? Could possibly change move(state). Could also be smarter about picking the initial projects.
regression.py relies on deprecated variables in the input file (group experience, multidisciplinary group experience). Need to restore original version of .csv file with these two columns as variables. Also, need to include this file as the default file in regression.py
Need in distance.py, probably clustering.py as well.
This is necessary for implementing the basic greedy algorithm (to generate the initial solution).
Important because:
Annealing just does random swaps. So, it preserves the number of students on each project. We need to have the right number of people on each project.
Traceback (most recent call last):
File "perry_geo_main.py", line 57, in
perry_geo_annealing.move(state)
File "/Users/ameyaacharya/Documents/Projects/Company Projects/Code/company-projects-matcher/src/classes/perry_geo_annealing.py", line 62, in move
second_team.students.remove(student_two)
ValueError: list.remove(x): x not in list
rrdhcp-10-33-45-22:classes ameyaacharya$
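The ValueError above is raised because student_two is not in second_team.students at the moment of removal. One way to rule that out, sketched under the assumption that teams can be treated as plain lists of student IDs, is to pick each student from the team itself and swap by index. The function name swap_move is hypothetical; this is not the move function in perry_geo_annealing.py.

```python
import random


def swap_move(teams, rng=random):
    """Swap one randomly chosen student between two distinct teams, in place.

    Requires at least two teams, each with at least one student.
    Team sizes are preserved.
    """
    first, second = rng.sample(range(len(teams)), 2)
    # Choose positions inside each team, so both students are guaranteed members.
    i = rng.randrange(len(teams[first]))
    j = rng.randrange(len(teams[second]))
    teams[first][i], teams[second][j] = teams[second][j], teams[first][i]


teams = [
    [6249314, 1678231, 8291021, 5467123, 6666666],
    [3333333, 9191919, 4990324, 5092102, 8888888],
]
swap_move(teams, random.Random(0))
print(teams)
```

Because the swap is done by position, no call to list.remove is needed and the number of students on each team cannot change.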
Create a list of IDs whose students were already removed from unmatched_students.
teams.py (hashtable or list membership)
Not tracked by git.
Ask the team at the meeting whether changing work experience years from 0-6 to 0-4 will be a problem. If they say no, then change the values in vals_valid_work_experience (or something like that) in classes.py.
Change from survey_responses.csv to survey_responses_altered.csv. This accounts for the change made in work experience values in #5.
Getting the following errors because of this:
Can add a student s to a project p by saying p.students.append(s).
Need to protect against this.
Currently use this unstable method in greedy_attempt_two.py / initial_solution.
(Towards the end of the function).
If we get the ranking of a project not in the index, should return a huge number.
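A sketch of that fallback, assuming a student's preferences are stored as an ordered list of project IDs (most preferred first). The names get_ranking and UNRANKED_COST, and the sentinel value itself, are hypothetical.

```python
# Hypothetical sentinel: large enough to dominate any real rank.
UNRANKED_COST = 10 ** 6


def get_ranking(ranked_project_ids, project_id):
    """Return the 1-based rank of project_id, or a huge number if it is unranked."""
    try:
        return ranked_project_ids.index(project_id) + 1
    except ValueError:
        # The project is not in this student's rankings.
        return UNRANKED_COST


preferences = [3640, 4615, 1625]
print(get_ranking(preferences, 4615))
print(get_ranking(preferences, 9999))
```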
All projects? greedy_attempt_two/initial_solution?
Have random_initial_solution pick only from the feasible projects. Not sure if this is already implemented.
Some bug in teams.py (it is noted there). Random team formation works for ints but gets funky when Student objects are passed in.