iit-cs579 / assignments

Python 99.76% Shell 0.24%

assignments's People

akumar67 swapnil2095 blazers07 pliu19 pratik27shah hmnshu34 smrutipatel14 cicirao rmasand1 srana6 lun910329 mmalviya24 swapnil2195 sowjanyamallu sgunjal shrutinaik1991 msp16scm01n abhikumar427 shravni yteng2 sowmyabala11 afa17scm77g n-getty ddabhane ishujaswal hrshtsngh98 aashka-agarwal sure00 redhat8983 mychandru024 zhizhengli kshah105 asp17scm66s alishaaleem yashagarwal93 sharvaripatil02 jzgiang ghltshubh dpengbebop surajdidwania rzou15 karankatiyar92 smahadik1 whyfightfordreams dkaramchandani nerdysingh zihanniu mengjue1992 vsp18scm51n szeinali leiliu0718 abhinavp1989 saumilajmera sbidwai zaimeibian nidhi39 hiral181994 lweicdsor ashly2018 pkharate mohanarunachalam thedeepdarkfantasies yluo39github usagi1990 minump maruoxi krishnachaitanya1995 shiwanshukr mpatel120 vidhiamin qfa18scm70g waleywei nchoudhary1 gsharma7 andy10utd rodaina96 littlepretty fashokkumar sanath-25 chenmetanoia pri12890 aaditya11 siddhantpriyadarshi jayashreegit xchen182 toopricey msharee3 isnina mayank31796 kirubadhayalan praveenkumarselvan zhaowang-iit

assignments's Issues

get_friends() doc test

Sir,
When I run the doctest for get_friends(twitter, 'aronwc')[:5], I get a list of friends, but the order does not match the expected [695023, 1697081, 8381682, 10204352, 11669522]. My order is [90155533, 216939636, 264501255, 10204352, 142594034]. The full list does contain all the friend ids, including the ones in the doctest sample,
and it also works fine for the presidential candidates.
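
The expected ids in the doctest are in ascending order, so one plausible reading is that the doctest assumes the ids are sorted before slicing. A minimal sketch of that idea follows; `first_sorted_ids` is a hypothetical helper, and the input list is simply the ids quoted in this issue, not real API output.

```python
def first_sorted_ids(ids, n=5):
    """Return the n smallest ids in ascending order (hypothetical helper)."""
    return sorted(ids)[:n]


# Ids quoted in this issue, in an arbitrary "as returned" order (illustrative).
returned = [90155533, 216939636, 264501255, 10204352, 142594034,
            695023, 1697081, 8381682, 11669522]

print(first_sorted_ids(returned))
# [695023, 1697081, 8381682, 10204352, 11669522]
```

Sorting makes the result independent of the order in which the API happens to return ids.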

Need information regarding source of data

Can you help me with this?

How can we download data the way you did with this link?
Can you please name the source this data was collected from?
url = 'https://www.dropbox.com/s/h9ubx22ftdkyvd5/ml-latest-small.zip?dl=1'

If possible please help with the code.
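
One way to fetch and unpack a zip archive from a URL with only the standard library is sketched below. The function names are hypothetical, and nothing here is confirmed to be the code the course used. The archive name ml-latest-small.zip matches the MovieLens "latest small" dataset distributed by GroupLens, though this thread does not confirm the source.

```python
import io
import urllib.request
import zipfile


def extract_zip_bytes(data, dest_dir):
    """Unpack an in-memory zip archive into dest_dir and return member names."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


def download_and_extract(url, dest_dir='.'):
    """Fetch url (for example the Dropbox link above) and unpack it into dest_dir."""
    with urllib.request.urlopen(url) as response:
        return extract_zip_bytes(response.read(), dest_dir)
```

Calling `download_and_extract(url)` with the `url` from this issue would write the archive contents to the current directory, assuming the link is still live.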

Processing time evaluation

Hello @aronwc ,

Just a brief question regarding assignment 1.
Is there a processing time evaluation of our algorithm for the computation of score_max_depths on the Bill Gates graph?
Given the number of nodes and the number of BFS runs we have to process for each of them at each depth, this algorithm takes more than 10 minutes on my side.

Best,

Output Issue

Hi,

My output matches log.txt, but the order of the nodes under 'cluster 2 nodes:' is not the same as in the log file. Is this acceptable? Only the order of the output for 'cluster 2 nodes:' changes; the contents are the same on every run.

Assignment 2 different worst result

Now my output is exactly the same as the Log.txt file except for the following:

worst cross-validation result:
{'min_freq': 10, 'accuracy': 0.64749999999999996, 'punct': True, 'features': (<function lexicon_features at 0x10eb58e18>,)}

In the Log.txt, min_freq is 2 (everything else is the same, even accuracy). What a coincidence!

I'm just wondering why this is the case... Does anyone have any idea?
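
If two settings reach exactly the same accuracy, which one is reported as "worst" depends only on the order in which the results are scanned. The sketch below illustrates this with Python's built-in `min`, which returns the first minimal element. The result dicts are hypothetical and trimmed to two keys, and `worst_result` is not the assignment's function.

```python
def worst_result(results):
    """Return the lowest-accuracy entry; ties go to the earliest entry."""
    return min(results, key=lambda r: r['accuracy'])


# Two hypothetical settings with identical accuracy, as described above.
results = [
    {'min_freq': 2, 'accuracy': 0.6475},
    {'min_freq': 10, 'accuracy': 0.6475},
]

print(worst_result(results)['min_freq'])                  # 2
print(worst_result(list(reversed(results)))['min_freq'])  # 10
```

A different iteration or sort order over the settings is therefore enough to swap which tied entry is reported.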

def count_friends(user)

Hey!
Can anyone explain what this function does? I'm not able to understand it, and the example is not clear to me either.

twitter.request ALWAYS returns HTTP error code 401

Hi,
The request twitter.request('users/show', {'screen_name': 'abc'}) always returns error code 401.
It says unauthorized access, although I have cross-checked my credentials multiple times and they are correct.

Is there anything missing?

a2.py output

Hello Professor,
Every time I run the main function I get this error: "X has 17925 features per sample; expecting 17698". I don't know where it might have originated. I also wanted to ask whether all the output numbers should match the numbers in log.txt exactly, or whether there could be some small variations.
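
An error of this shape usually means the test matrix was built with a different vocabulary than the training matrix, so the column counts differ. The sketch below shows the general idea of building the vocabulary once and reusing it. `build_vocab` and `vectorize` here are simplified stand-ins written for this note, not the assignment's functions.

```python
def build_vocab(docs):
    """Map each token seen in docs to a column index, in sorted order."""
    tokens = sorted({tok for doc in docs for tok in doc.split()})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(docs, vocab):
    """Count tokens per document, ignoring tokens missing from vocab."""
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in doc.split():
            if tok in vocab:
                row[vocab[tok]] += 1
        rows.append(row)
    return rows


train = ['great movie', 'bad movie']
test = ['great unseen plot']

vocab = build_vocab(train)        # built once, from the training data only
X_train = vectorize(train, vocab)
X_test = vectorize(test, vocab)   # same vocab, so the same number of columns

print(len(X_train[0]), len(X_test[0]))  # 3 3
```

Tokens that appear only in the test data are dropped rather than given new columns, which keeps both matrices the same width.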

Bonus Assignment : (note the edge order)

Hello Professor,
In the return section of the method's documentation there is a statement saying "note the edge order", which I did not understand. Does that mean we should sort the edges?
Thanks

edges in subgraph

Hi professor,

About edges in subgraph, is the following understanding correct?

In the subgraph, edges exist only between nodes whose degree is greater than min_degree. If an edge connects a node with degree greater than min_degree to a node with degree less than min_degree, we remove that edge.
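
The understanding described above can be sketched on a plain adjacency dict. This is an illustration only: `degree_subgraph` is a hypothetical name, degrees are taken from the original graph, and the strict `>` follows the wording of this issue (the assignment text may specify `>=` instead).

```python
def degree_subgraph(adj, min_degree):
    """Keep nodes with degree > min_degree and only the edges among them.

    adj maps each node to the set of its neighbours (undirected graph).
    The strict '>' is an assumption taken from the issue text.
    """
    keep = {n for n, nbrs in adj.items() if len(nbrs) > min_degree}
    # Intersecting with `keep` drops every edge that touches a removed node.
    return {n: adj[n] & keep for n in keep}


adj = {
    'A': {'B', 'C', 'D'},
    'B': {'A', 'C'},
    'C': {'A', 'B'},
    'D': {'A'},
}

sub = degree_subgraph(adj, 1)
print(sorted(sub))       # ['A', 'B', 'C']
print(sorted(sub['A']))  # ['B', 'C']
```

Node D has degree 1, so it is removed, and the edge A-D disappears with it.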

Regarding Community Detection

Hi Professor,

I have collected tweets in collect.py. I was wondering whether the clustering of friends/followers should happen in cluster.py?

[HW1] Do we need to implement the BFS function without calling other libraries

Hi Professor,
Can we call the function _single_source_shortest_path_basic(graph, s) to find the shortest paths between the root and the other nodes? Or must we implement the BFS function ourselves, without calling library functions such as _single_source_shortest_path_basic?

[HW1] node2num_paths in the function bfs

Can someone give me more details about node2num_paths? According to the explanation, for each node in the graph we should give the number of shortest paths passing through it. If this understanding of the variable is correct, then the example output should have different values.
For example:
node2distances, node2num_paths, node2parents = bfs(example_graph(), 'E', 2)
For node D there are 2 shortest paths passing through it:
[path 1] E-D-B
[path 2] E-D-G
So the number corresponding to this node should be 2, but it is being tested for 1.

Either my understanding of the problem is wrong or the test case is incorrect.
Could someone kindly help me see what I am missing here?

Thank you.
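
One reading that is consistent with the tested value of 1 for D is that node2num_paths counts the shortest paths from the root that end at a node, not the paths passing through it. A minimal BFS sketch under that assumption is shown below. The edge list is a hypothetical graph chosen to reproduce the parent lists quoted elsewhere in this thread; it is not taken from the assignment code.

```python
from collections import deque


def bfs(adj, root, max_depth):
    """Breadth-first search from root, exploring at most max_depth levels.

    Returns (node2distances, node2num_paths, node2parents), where
    node2num_paths[n] counts the shortest root-to-n paths (paths ENDING at n).
    """
    node2distances = {root: 0}
    node2num_paths = {root: 1}
    node2parents = {}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node2distances[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for nbr in sorted(adj[node]):
            if nbr not in node2distances:
                # First visit: record the depth and schedule the node.
                node2distances[nbr] = node2distances[node] + 1
                node2num_paths[nbr] = 0
                node2parents[nbr] = []
                queue.append(nbr)
            if node2distances[nbr] == node2distances[node] + 1:
                # node sits one level above nbr, so it is a parent of nbr.
                node2num_paths[nbr] += node2num_paths[node]
                node2parents[nbr].append(node)
    return node2distances, node2num_paths, node2parents


# Hypothetical undirected graph (an assumption, see the note above).
edges = [('A', 'B'), ('B', 'C'), ('B', 'D'), ('D', 'E'),
         ('D', 'F'), ('D', 'G'), ('E', 'F'), ('F', 'G')]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

distances, num_paths, parents = bfs(adj, 'E', 2)
print(sorted(num_paths.items()))
# [('B', 1), ('D', 1), ('E', 1), ('F', 1), ('G', 2)]
```

Under this reading D gets 1 because there is a single shortest path E-D, while G gets 2 because both E-D-G and E-F-G are shortest.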

Not able to return objects from get_users function

I am not able to return any users from the get_users(twitter, screen_names) function.

user = twitter.request('users/search', {'q': screen_names[2]}) 

print('Code returned =', user.status_code)

if user.status_code==200 :
    print('Call SuccessFul')
    for item in user.get_iterator():
        print('id=%d screen_name=%s, name=%s, location=%s' % (item['id'],item['screen_name'],item['name'], item['location']))

else:
    print('Call failed')

Output->

Traceback (most recent call last):
  File "C:\Users\Swapnil\workspace\Assignment1\a0.py", line 410, in <module>
    main()
  File "C:\Users\Swapnil\workspace\Assignment1\a0.py", line 387, in main
    users = sorted(get_users(twitter, screen_names), key=lambda x: x['screen_name'])
TypeError: 'NoneType' object is not iterable
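
The TypeError says that get_users returned None, which is what a Python function returns when it finishes without a `return` statement. The snippet above prints results but never returns them. A minimal sketch of the collect-and-return pattern follows; `get_users_sketch` and `fake_fetch` are hypothetical stand-ins so the example runs without Twitter credentials.

```python
def get_users_sketch(fetch, screen_names):
    """Collect one user dict per screen name and RETURN the list.

    fetch is a hypothetical stand-in for the Twitter lookup: it takes a
    screen name and returns a dict describing that user.
    """
    users = []
    for name in screen_names:
        users.append(fetch(name))
    return users  # without this line the caller receives None


def fake_fetch(name):
    """Offline stand-in so the example runs without credentials."""
    return {'screen_name': name, 'id': len(name)}


users = sorted(get_users_sketch(fake_fetch, ['b_user', 'a_user']),
               key=lambda x: x['screen_name'])
print([u['screen_name'] for u in users])  # ['a_user', 'b_user']
```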

a1.py running for too long

Can anybody help me reduce the running time?

I have passed all my doctests, but because my program runs for so long, I am not able to see its output.

How to calculate credit for node in following example

In the bottom_up function:

Please tell me whether I am thinking about this correctly.
Consider the class example, which is the same as in the doctest.

To calculate the credit at node D,
we need to find D's children, i.e. B and G.
Then,
as G has two parents, we have to divide its credit:
Credits(G) divided by node2path = 1 divided by 2 = 0.5

Question ->

Is this the case? I am not sure whether it will work for a more complex network,
or whether any extra care needs to be taken.

If I am right in the above case,

then
Credit(D) = sum of credits of children + 1
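
In the standard Girvan-Newman credit rule, a child passes to each parent a share of its credit proportional to that parent's number of shortest paths, so D receives 0.5 from G rather than G's full credit. The sketch below applies that rule. The function name is hypothetical, and the three input dicts are entered by hand as assumptions for a depth-2 search from E on the class example, not output from the assignment code.

```python
def bottom_up_credits(root, node2distances, node2num_paths, node2parents):
    """Return {edge: credit} for one BFS tree rooted at root.

    Every non-root node starts with credit 1 and passes its final credit to
    its parents, split in proportion to each parent's shortest-path count.
    """
    node_credit = {n: 1.0 for n in node2distances if n != root}
    edge_credit = {}
    # Deepest nodes first, so a node's credit is final before it is shared.
    for node in sorted(node_credit, key=lambda n: -node2distances[n]):
        for parent in node2parents[node]:
            share = (node_credit[node] * node2num_paths[parent]
                     / node2num_paths[node])
            edge_credit[tuple(sorted((node, parent)))] = share
            if parent != root:
                node_credit[parent] += share
    return edge_credit


# Hand-entered values for a depth-2 search from 'E' (assumptions).
node2distances = {'E': 0, 'D': 1, 'F': 1, 'B': 2, 'G': 2}
node2num_paths = {'E': 1, 'D': 1, 'F': 1, 'B': 1, 'G': 2}
node2parents = {'D': ['E'], 'F': ['E'], 'B': ['D'], 'G': ['D', 'F']}

credits = bottom_up_credits('E', node2distances, node2num_paths, node2parents)
print(sorted(credits.items()))
# [(('B', 'D'), 1.0), (('D', 'E'), 2.5), (('D', 'G'), 0.5), (('E', 'F'), 1.5), (('F', 'G'), 0.5)]
```

Here Credit(D) = 1 + 1.0 (from B) + 0.5 (from G) = 2.5, which is the value carried by edge D-E. Processing nodes from the deepest level upward is what lets the same rule work on larger graphs.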

Don't open issues here.

Go here instead:
https://github.com/iit-cs579/main/issues

score_max_depths outputs not matching with the Professor's output

Hi,

My score_max_depths function is returning scores with discrepancies in the 5th decimal place.
Professor's output: [(1, 1.0070175438596491), (2, 1.0005847953216374), (3, 0.12177725118483412), (4, 0.12177725118483412)]

My output: [(1, 1.0070298769771528), (2, 1.0005858230814295), (3, 0.12178041543026706), (4, 0.12178041543026706)]

I'm trying to debug where my approach is off track.

Can someone whose values match post their cluster 2 nodes for every depth (the output of the Girvan-Newman partition for the range of depths)? This would help me find the faulty function faster.

My output for each cluster partition (printing only cluster 2 nodes):

Cluster 2 nodes of partition whose depth is 1 : ['(RED)']
Cluster 2 nodes of partition whose depth is 2 : ['Beyond Access']
Cluster 2 nodes of partition whose depth is 3 : ['The Hunger Games', 'Scholastic', 'WordGirl', 'READ 180', 'Scholastic Reading Club', 'Scholastic Canada', 'Scholastic Teachers', 'Scholastic Parents', 'Scholastic Book Fairs', 'Clifford The Big Red Dog', 'Arthur A. Levine Books']
Cluster 2 nodes of partition whose depth is 4 : ['The Hunger Games', 'Scholastic', 'WordGirl', 'READ 180', 'Scholastic Reading Club', 'Scholastic Canada', 'Scholastic Teachers', 'Scholastic Parents', 'Scholastic Book Fairs', 'Clifford The Big Red Dog', 'Arthur A. Levine Books']

I'm trying to see whether my betweenness function is giving wrong scores, whether my partition approach is wrong, or whether I have a problem with my norm_cut calculation.

Regards
Tejas
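
When narrowing down a discrepancy like this, it can help to check the normalized cut by hand on a graph small enough to verify on paper. The sketch below assumes the common definition NormCut(S, T) = cut(S, T)/vol(S) + cut(S, T)/vol(T), with vol(X) taken as the number of edges that have at least one end in X. Whether the assignment uses exactly this definition of volume is an assumption, and the function names are illustrative.

```python
def cut(S, T, edges):
    """Number of edges with one end in S and the other in T."""
    return sum(1 for u, v in edges
               if (u in S and v in T) or (u in T and v in S))


def volume(nodes, edges):
    """Number of edges with at least one end in nodes (assumed definition)."""
    return sum(1 for u, v in edges if u in nodes or v in nodes)


def norm_cut(S, T, edges):
    """Normalized cut of the partition (S, T)."""
    c = cut(S, T, edges)
    return c / volume(S, edges) + c / volume(T, edges)


# Path graph a-b-c-d split in the middle.
edges = [('a', 'b'), ('b', 'c'), ('c', 'd')]
print(norm_cut({'a', 'b'}, {'c', 'd'}, edges))  # 1.0
```

In this example one edge is cut and each side touches two edges, giving 1/2 + 1/2 = 1.0.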

A2 top coefficients per class

The log.txt file looks like:
positive words:
neg_words: 0.66113
token_pair=the__worst: 0.37465
...
negative words:
pos_words: 0.52554
...
Why do neg_words and token_pair=the__worst have such high coefficients for the positive label?

My testing accuracy is 0.54%. Is there any way to resolve this?

Scope/Code Indentation problems :|

Somehow indenting the code is not keeping it in scope.

Details:

I have an if statement block where, if the test fails, roughly the first half of the code doesn't run but the latter half does :P

How to handle case for a graph with max_depth = 1

In the partition_girvan_newman function, when I call the approximate_betweenness function

with max_depth == 1, it gives me no edges to remove.
Please suggest how to handle this condition.

G.remove_edge(*edge_to_remove)
TypeError: remove_edge() missing 2 required positional arguments: 'u' and 'v'
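
The TypeError means that `*edge_to_remove` unpacked to zero arguments, i.e. edge_to_remove was an empty sequence when remove_edge was called. One defensive pattern is to select the edge in a helper that reports the empty case explicitly, so the caller can stop instead of unpacking nothing. The helper below is a hypothetical sketch; it does not explain why the betweenness result was empty in the first place.

```python
def pick_edge_to_remove(betweenness):
    """Return the highest-scoring edge, or None if there are no scores.

    betweenness maps (u, v) tuples to floats. Ties are broken
    alphabetically so repeated runs pick the same edge.
    """
    if not betweenness:
        return None
    return min(betweenness, key=lambda e: (-betweenness[e], e))


print(pick_edge_to_remove({}))                                  # None
print(pick_edge_to_remove({('A', 'B'): 1.0, ('B', 'D'): 3.0}))  # ('B', 'D')
```

A caller would then check for None before calling `G.remove_edge(*edge)`.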

Should we use the BFS algorithm to fill the node2num_paths dict as well?

In the function bfs(graph, root, max_depth), should we use the BFS algorithm to fill the node2num_paths dict as well?

I have filled the node2distances dict, but I don't see how to find multiple shortest paths with BFS.
Or

can we use _single_source_shortest_path_basic(graph, s) for this purpose?

doctest failed on vectorize...

When I explicitly define the CSR matrix with "dtype=np.int64", the output looks like:
Expected:
array([[1, 0, 1, 1, 1, 1],
       [0, 2, 0, 1, 0, 0]], dtype=int64)
Got:
array([[1, 0, 1, 1, 1, 1],
       [0, 2, 0, 1, 0, 0]])
However, when I define it as "dtype=np.int32", the output looks like:
Expected:
array([[1, 0, 1, 1, 1, 1],
       [0, 2, 0, 1, 0, 0]], dtype=int64)
Got:
array([[1, 0, 1, 1, 1, 1],
       [0, 2, 0, 1, 0, 0]], dtype=int32)
Does anyone have an idea how I can pass this doctest?

File upload problem

Hey,
I have uploaded my assignment files to the private repository, in folder ao.
The files that I have uploaded are
AO.ipynb and twitter.cgf.
But when I execute my code (it works correctly), I can't see my network.png file in the repository.
Can anyone tell me whether they are facing the same problem, or where the network.png file gets generated?

Question about Assignment 2 eval_all_combinations()

Hello Dr. Aron,

So sorry to bother you during the weekend. I have a quick question about A2, function eval_all_combinations(). In the description, it says if feature_fns = [token_features, token_pair_features, lexicon_features], then we will consider all 4 combinations of features.

Assuming we have a set [A,B,C], we can have 8 combinations: (empty), A, B, C, AB, AC, BC, ABC. In the log:
features=token_pair_features lexicon_features: 0.75125
features=token_features token_pair_features lexicon_features: 0.74583
features=token_features token_pair_features: 0.73542
features=token_pair_features: 0.72875
This is equivalent to having BC, ABC, AB and B. Why is this the case?

Regards,
Zhidu
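
For reference, the non-empty subsets of three feature functions can be enumerated with itertools.combinations, which gives 7 combinations (8 minus the empty set). The sketch below uses the function names from this issue as plain strings. Whether the log lines quoted above are the full set of evaluated combinations or only a portion of the results is not something this thread settles.

```python
from itertools import combinations


def all_nonempty_combinations(items):
    """Yield every non-empty subset of items, smallest subsets first."""
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield combo


names = ['token_features', 'token_pair_features', 'lexicon_features']
combos = list(all_nonempty_combinations(names))
print(len(combos))  # 7
```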

Pushing problem

I've tried to push my assignment. My terminal shows "success", but nothing changes on the GitHub web page.

Is there anything I can do? Thanks.

Robust_request Function

Hey,
I am having trouble understanding how the robust_request function works and what its parameters do.
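
Going by its name, a function like this typically wraps an API call in a retry loop: send the request, return it on success, otherwise wait and try again up to a maximum number of attempts. The sketch below shows that general pattern only. It is not the course's robust_request; `send`, `wait_seconds`, `sleep`, and `FakeResponse` are hypothetical names introduced for this illustration.

```python
import time


def robust_request_sketch(send, max_tries=5, wait_seconds=15 * 60,
                          sleep=time.sleep):
    """Call send() until it reports HTTP 200 or max_tries is exhausted.

    send is a hypothetical zero-argument callable returning an object with a
    status_code attribute. Returns that object on success, otherwise None.
    """
    for _ in range(max_tries):
        response = send()
        if response.status_code == 200:
            return response
        # Wait before retrying, e.g. until a rate-limit window has passed.
        sleep(wait_seconds)
    return None


class FakeResponse:
    """Offline stand-in for an API response."""

    def __init__(self, status_code):
        self.status_code = status_code


# Simulate two rate-limited replies followed by a success, without sleeping.
codes = iter([429, 429, 200])
result = robust_request_sketch(lambda: FakeResponse(next(codes)),
                               sleep=lambda seconds: None)
print(result.status_code)  # 200
```

Passing the sleep function as a parameter is a design choice made here so the retry logic can be exercised without actually waiting.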

A2 output not the same as log.txt

Hello Dr. Aron,

I just finished A2, but the output is slightly different from our Log.txt. Besides, every time I run my code, it gives a slightly different result. Do you think it might be due to the behavior of LogisticRegression? Or is this not supposed to happen?

With Respects,
Zhidu

Regarding next assignment (A2) and topics covered by midterm

Hello Dr. Aron,

How are you? I hope you are enjoying this week. I just have a quick question, as below.

Could you share some information about the upcoming assignment, A2? I'm wondering what topics will be covered by this assignment. As you know, A1 only covers up to Link prediction. But our midterm also includes Cascading and Sentiment Analysis. If A2 is about these topics, could you please post A2 today or tomorrow?

By the way, will the solution for A1 (or sample code) be published after it is due? We talked about the solution to A0 during one of our lectures, but we will not have a lecture next week...

Warmest Regards,
Zhidu

How to consider parents

Hi Professor/TA,

How should we treat the parent-child relation in the following example?
If we have edges (node1, node2) and (node2, node1),

then they are parents of each other as well.
This is my output when I consider the relation both ways:
[('A', ['B']), ('B', ['D']), ('C', ['B']), ('D', ['B', 'E', 'F', 'G']), ('F', ['D', 'E', 'G']), ('G', ['D', 'F'])]

And your output is
[('A', ['B']), ('B', ['D']), ('C', ['B']), ('D', ['E']), ('F', ['E']), ('G', ['D', 'F'])]

I am confused about how you treat this case.
Please explain what you want here.
Do I need to consider the max_depth parameter here as well?
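
The second output above is consistent with treating only those neighbours that are exactly one level closer to the root as parents, which makes the relation one-directional even in an undirected graph. A small sketch of that reading follows. The edge list and the distances from root 'E' are hypothetical values chosen to reproduce the lists quoted in this issue.

```python
def parents_from_distances(adj, node2distances):
    """For each non-root node, list the neighbours one level closer to the root."""
    node2parents = {}
    for node, dist in node2distances.items():
        if dist == 0:
            continue  # the root has no parent
        node2parents[node] = sorted(
            nbr for nbr in adj[node]
            if node2distances.get(nbr) == dist - 1
        )
    return node2parents


# Hypothetical undirected graph and BFS distances from 'E' (assumptions).
edges = [('A', 'B'), ('B', 'C'), ('B', 'D'), ('D', 'E'),
         ('D', 'F'), ('D', 'G'), ('E', 'F'), ('F', 'G')]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
node2distances = {'E': 0, 'D': 1, 'F': 1, 'B': 2, 'G': 2, 'A': 3, 'C': 3}

print(sorted(parents_from_distances(adj, node2distances).items()))
# [('A', ['B']), ('B', ['D']), ('C', ['B']), ('D', ['E']), ('F', ['E']), ('G', ['D', 'F'])]
```

Nodes that a depth-limited search never reaches have no entry in node2distances, so with this approach they would simply not appear in the result.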
