assignments's People

Contributors

aronwc

assignments's Issues

get_friends() doc test

Sir,
When I run the doctest for get_friends(twitter, 'aronwc')[:5], I get a list of friends, but the order differs from the expected output. The doctest expects [695023, 1697081, 8381682, 10204352, 11669522], while I get [90155533, 216939636, 264501255, 10204352, 142594034]. The full list does contain every friend ID, including the ones in the doctest sample,
and the function also works fine for the presidential candidates.
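
The expected IDs in the doctest are in ascending order, while the returned ones are not, so one plausible explanation is that the doctest assumes a sorted list. A minimal sketch, assuming sorting is what the assignment wants (the post does not confirm this); the helper name is hypothetical and the list simply combines the IDs quoted above:

```python
def first_ids_sorted(friend_ids, n=5):
    """Return the n smallest IDs in ascending order."""
    return sorted(friend_ids)[:n]


# IDs quoted in the post: the five returned first, then the remaining
# expected ones (the post says the full list contains them all).
returned = [90155533, 216939636, 264501255, 10204352, 142594034,
            695023, 1697081, 8381682, 11669522]

first_five = first_ids_sorted(returned)
print(first_five)  # [695023, 1697081, 8381682, 10204352, 11669522]
```

If get_friends sorts before returning, the slice no longer depends on the order in which the API happens to return IDs.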

Processing time evaluation

Hello @aronwc,

Just a brief question regarding assignment 1.
Is there a processing-time evaluation of our algorithm when computing score_max_depths for the Bill Gates graph?
Given the number of nodes and the number of BFS runs needed for each node at each depth, this computation takes more than 10 minutes on my machine.

Best,

Output Issue

Hi,

My output matches log.txt, but the order of the nodes under 'cluster 2 nodes:' is not the same as in the log file. Is this acceptable? Only the order of the output for 'cluster 2 nodes:' changes; the contents are the same on every run.

Assignment 2 different worst result

My output is now exactly the same as the Log.txt file, except for the following:

worst cross-validation result:
{'min_freq': 10, 'accuracy': 0.64749999999999996, 'punct': True, 'features': (<function lexicon_features at 0x10eb58e18>,)}

In Log.txt, min_freq is 2; everything else is the same, even the accuracy. What a coincidence!

I'm just wondering why this is the case. Does anyone have any idea?
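
Since the accuracy is identical, this looks like a tie between two settings, and which one is reported as "worst" then depends on the order in which results are stored and sorted. A small sketch of that effect with made-up results (only the tied accuracy is taken from the post, rounded; how the assignment itself breaks ties is not known here):

```python
# Hypothetical cross-validation results; two settings tie on accuracy.
results = [
    {'min_freq': 2, 'accuracy': 0.6475},
    {'min_freq': 5, 'accuracy': 0.70},
    {'min_freq': 10, 'accuracy': 0.6475},
]

# Python's sort is stable, so among ties the input order decides.
worst_a = sorted(results, key=lambda r: r['accuracy'])[0]
worst_b = sorted(reversed(results), key=lambda r: r['accuracy'])[0]

# An explicit tie-breaker makes the answer independent of input order.
worst_fixed = min(results, key=lambda r: (r['accuracy'], r['min_freq']))

print(worst_a['min_freq'], worst_b['min_freq'], worst_fixed['min_freq'])
```

Both answers describe an equally bad setting; they differ only in which tied entry is listed first.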

def count_friends(user)

Hey!
Can anyone explain what this function does? I'm not able to understand it, and the example is not clear to me either.

twitter.request ALWAYS returns HTTP error code 401

Hi,
The request twitter.request('users/show', {'screen_name': 'abc'}) always returns error code 401.
It says unauthorized access, even though I cross-checked my credentials multiple times and they are correct.

Is there anything missing?

a2.py output

Hello Professor,
Every time I run the main function I get this error: "X has 17925 features per sample; expecting 17698". I don't know where it originates. I also wanted to ask whether all the output numbers should match the numbers in log.txt exactly, or whether small variations are acceptable.
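
An error of this shape usually means the test matrix was built with a different vocabulary than the training matrix. A minimal pure-Python sketch of the usual remedy, assuming that is the cause here (the post does not show the code); all function names and documents below are hypothetical:

```python
def build_vocab(token_lists):
    """Map each distinct training token to a column index."""
    tokens = sorted({tok for toks in token_lists for tok in toks})
    return {tok: i for i, tok in enumerate(tokens)}


def to_rows(token_lists, vocab):
    """Count tokens per document using a fixed vocab; unseen tokens are dropped."""
    rows = []
    for toks in token_lists:
        row = [0] * len(vocab)
        for tok in toks:
            if tok in vocab:
                row[vocab[tok]] += 1
        rows.append(row)
    return rows


train_docs = [['great', 'movie'], ['bad', 'movie']]
test_docs = [['great', 'plot'], ['awful']]

vocab = build_vocab(train_docs)       # built once, from training data only
X_train = to_rows(train_docs, vocab)
X_test = to_rows(test_docs, vocab)    # reuses the same columns

print(len(X_train[0]), len(X_test[0]))
```

The point is that the vocabulary is created from the training documents and then passed unchanged when vectorizing the test documents, so both matrices have the same number of columns.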

Bonus Assignment : (note the edge order)

Hello Professor,
The Returns section for this method has a statement saying "note the edge order", which I did not understand. Does that mean we should sort the edges?
Thanks

edges in subgraph

Hi professor,

About edges in subgraph, is the following understanding correct?

In the subgraph, edges exist only between nodes whose degree is greater than min_degree. If an edge connects one node with degree greater than min_degree and another with degree less than min_degree, we remove that edge.
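
The understanding above matches what is usually called an induced subgraph. A minimal sketch on a plain edge list, with a hypothetical helper name and toy graph; whether the threshold is inclusive (>=, as used here) or strict is an assumption that the assignment text would settle:

```python
from collections import defaultdict


def induced_subgraph(edges, min_degree):
    """Keep nodes with degree >= min_degree and the edges between kept nodes."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    keep = {node for node, d in degree.items() if d >= min_degree}
    # An edge survives only if both of its endpoints survive.
    kept_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    return keep, kept_edges


edges = [('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'D'), ('D', 'E')]
nodes, sub_edges = induced_subgraph(edges, 2)
print(sorted(nodes), sub_edges)
```

Here E has degree 1, so E is dropped and the edge (D, E) goes with it, even though D itself is kept.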

Regarding Community Detection

Hi Professor,

I have collected tweets in collect.py. Should the clustering of friends/followers happen in cluster.py?

[HW1] node2num_paths in the function bfs

Can someone give me more details about node2num_paths? According to the explanation, for each node in the graph it should give the number of shortest paths passing through that node. If this understanding is correct, the example output should have different values.
For example:
node2distances, node2num_paths, node2parents = bfs(example_graph(), 'E', 2)
For node D there are 2 shortest paths passing through it:
[path 1] E-D-B
[path 2] E-D-G
So the number for this node should be 2, but the test expects 1.

Either my understanding of the problem is wrong or the test case is incorrect.
Could someone kindly help me see what I am missing?

Thank you.
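
One reading that agrees with the tested value is that node2num_paths counts the shortest paths from the root to each node, not the paths passing through it. A sketch under that assumption; the adjacency dict is a hypothetical graph chosen to agree with the parent lists quoted elsewhere in these issues, and this is not the assignment's reference solution:

```python
from collections import deque


def bfs_sketch(graph, root, max_depth):
    """BFS that also counts shortest paths from root to each reached node."""
    node2distances = {root: 0}
    node2num_paths = {root: 1}
    node2parents = {}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node2distances[node] == max_depth:
            continue  # do not expand beyond max_depth
        for nbr in sorted(graph[node]):
            if nbr not in node2distances:
                node2distances[nbr] = node2distances[node] + 1
                node2num_paths[nbr] = 0
                node2parents[nbr] = []
                queue.append(nbr)
            # nbr is one level deeper, so node is one of its parents
            if node2distances[nbr] == node2distances[node] + 1:
                node2num_paths[nbr] += node2num_paths[node]
                node2parents[nbr].append(node)
    return node2distances, node2num_paths, node2parents


# Hypothetical stand-in for example_graph()
graph = {
    'A': {'B', 'C'}, 'B': {'A', 'C', 'D'}, 'C': {'A', 'B'},
    'D': {'B', 'E', 'F', 'G'}, 'E': {'D', 'F'},
    'F': {'D', 'E', 'G'}, 'G': {'D', 'F'},
}
distances, num_paths, parents = bfs_sketch(graph, 'E', 2)
print(num_paths['D'], num_paths['G'])  # 1 2
```

Under this reading D gets 1 because only E-D reaches it, while G gets 2 because both E-D-G and E-F-G are shortest paths to G.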

Not able to return objects from get_user function

I am not able to return any users from the get_users(twitter, screen_names) function.

user = twitter.request('users/search', {'q': screen_names[2]}) 

print('Code returned =', user.status_code)

if user.status_code==200 :
    print('Call SuccessFul')
    for item in user.get_iterator():
        print('id=%d screen_name=%s, name=%s, location=%s' % (item['id'],item['screen_name'],item['name'], item['location']))

else:
    print('Call failed')

Output->

Traceback (most recent call last):
  File "C:\Users\Swapnil\workspace\Assignment1\a0.py", line 410, in <module>
    main()
  File "C:\Users\Swapnil\workspace\Assignment1\a0.py", line 387, in main
    users = sorted(get_users(twitter, screen_names), key=lambda x: x['screen_name'])
TypeError: 'NoneType' object is not iterable
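
The traceback is consistent with get_users printing results but never returning them: a Python function without a return statement returns None, and sorted(None) raises exactly this TypeError. A small sketch with hypothetical stand-in functions (no Twitter calls involved):

```python
def get_users_missing_return(screen_names):
    users = [{'screen_name': name} for name in screen_names]
    # No return statement: Python returns None implicitly.


def get_users_with_return(screen_names):
    users = [{'screen_name': name} for name in screen_names]
    return users


try:
    sorted(get_users_missing_return(['b', 'a']), key=lambda x: x['screen_name'])
except TypeError as err:
    error_message = str(err)

users = sorted(get_users_with_return(['b', 'a']), key=lambda x: x['screen_name'])
print(error_message)
print([u['screen_name'] for u in users])
```

Collecting the items from the response into a list and returning that list is probably all that is missing.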

a1.py running for too long

Can anybody help me reduce the running time?

All my doctests pass, but my program runs for so long that I am not able to see its output.

How to calculate credit for node in following example

In the bottom_up function:

Please tell me whether I am thinking about this correctly.
Consider the class example, which is the same as the one in the doctest.

To calculate the credit at node D,
we first need to find D's children, i.e. B and G.
Then,
since G has two parents, we have to divide its credit:
Credits(G) divided by node2path = 1 divided by 2 = 0.5

Question:

Is this correct? I am not sure whether it will work for a more complex network,
or whether any extra care is needed.

If I am right in the above case, then

Credit(D) = sum of credits of children + 1
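
A sketch of the bottom-up pass as commonly described for Girvan-Newman: each non-root node starts with credit 1, and a child passes its credit to its parents in proportion to the number of shortest paths through each parent. The dictionaries are assumed BFS results for root E on the class example, and the function name is hypothetical; this is not the assignment's reference solution:

```python
def bottom_up_sketch(root, node2distances, node2num_paths, node2parents):
    """Return (node credits, edge credits) for one BFS tree."""
    credit = {n: 1.0 for n in node2distances if n != root}
    edge_credit = {}
    # Visit the deepest nodes first so children are final before their parents.
    for node in sorted(credit, key=lambda n: -node2distances[n]):
        for parent in node2parents[node]:
            share = credit[node] * node2num_paths[parent] / node2num_paths[node]
            edge_credit[tuple(sorted((node, parent)))] = share
            if parent != root:
                credit[parent] += share
    return credit, edge_credit


# Assumed BFS output for root 'E' (hypothetical values for illustration)
node2distances = {'E': 0, 'D': 1, 'F': 1, 'B': 2, 'G': 2, 'A': 3, 'C': 3}
node2num_paths = {'E': 1, 'D': 1, 'F': 1, 'B': 1, 'G': 2, 'A': 1, 'C': 1}
node2parents = {'D': ['E'], 'F': ['E'], 'B': ['D'], 'G': ['D', 'F'],
                'A': ['B'], 'C': ['B']}

credit, edge_credit = bottom_up_sketch('E', node2distances,
                                       node2num_paths, node2parents)
print(credit['D'], edge_credit[('D', 'G')])  # 4.5 0.5
```

Under these assumptions D receives all of B's credit (3) but only half of G's (0.5), so Credit(D) is 1 plus the shares D receives, rather than 1 plus the children's full credits.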

score_max_depths outputs not matching with the Professor's output

Hi,

My score_max_depths function returns scores that differ from the expected ones in the fifth decimal place.
Professor's output: [(1, 1.0070175438596491), (2, 1.0005847953216374), (3, 0.12177725118483412), (4, 0.12177725118483412)]

My output: [(1, 1.0070298769771528), (2, 1.0005858230814295), (3, 0.12178041543026706), (4, 0.12178041543026706)]

I'm trying to debug where my approach is off track.

Can someone whose values match post their cluster 2 nodes for every depth (the output of the Girvan-Newman partition for the range of depths)? This would help me find the faulty function faster.

My output for each partition (printing only cluster 2 nodes):

Cluster 2 nodes of partition whose depth is 1 : ['(RED)']
Cluster 2 nodes of partition whose depth is 2 : ['Beyond Access']
Cluster 2 nodes of partition whose depth is 3 : ['The Hunger Games', 'Scholastic', 'WordGirl', 'READ 180', 'Scholastic Reading Club', 'Scholastic Canada', 'Scholastic Teachers', 'Scholastic Parents', 'Scholastic Book Fairs', 'Clifford The Big Red Dog', 'Arthur A. Levine Books']
Cluster 2 nodes of partition whose depth is 4 : ['The Hunger Games', 'Scholastic', 'WordGirl', 'READ 180', 'Scholastic Reading Club', 'Scholastic Canada', 'Scholastic Teachers', 'Scholastic Parents', 'Scholastic Book Fairs', 'Clifford The Big Red Dog', 'Arthur A. Levine Books']

I'm trying to see whether my betweenness function is giving wrong scores, whether my partition approach is wrong, or whether I have a problem with my norm cut calculation.

Regards
Tejas

A2 top coefficients per class

The log.txt file looks like:
positive words:
neg_words: 0.66113
token_pair=the__worst: 0.37465
...
negative words:
pos_words: 0.52554
...
Why do neg_words and token_pair=the__worst have such high coefficients for the positive label?

Scope/Code Indentation problems :|

Somehow, indenting the code is not keeping it in scope.

Details:

I have an if block where, when the test fails, roughly the first half of the code doesn't run but the latter half does :P
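
Without the actual code this is only a guess, but the symptom matches the second half of the block sitting at a shallower indentation level than the first half, which places it outside the if. A tiny illustration with a hypothetical function:

```python
def run(flag):
    log = []
    if flag:
        log.append('first half')    # inside the if block
    log.append('second half')       # dedented: outside the if, always runs
    return log


print(run(True))
print(run(False))
```

Mixing tabs and spaces can make two lines look aligned in an editor when Python sees them at different levels, so converting the file to spaces only is worth trying.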

How to handle case for a graph with max_depth = 1

In the partition_girvan_newman function, when I call the approximate_betweenness function

with max_depth == 1, it gives me no edges to remove.
Please suggest how to handle this condition.

G.remove_edge(*edge_to_remove)
TypeError: remove_edge() missing 2 required positional arguments: 'u' and 'v'
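
The TypeError itself comes from unpacking an empty sequence into a call that needs two arguments. A sketch that reproduces the message and shows one way to guard against it; remove_edge below is a plain stand-in function and pick_edge is a hypothetical helper, neither is taken from the assignment:

```python
def remove_edge(u, v):
    """Stand-in with the same two required positional arguments."""
    return (u, v)


def pick_edge(betweenness):
    """Return the highest-scoring edge, or None when there are no scores."""
    if not betweenness:
        return None
    # Sorting the keys first makes ties resolve in a repeatable order.
    return max(sorted(betweenness), key=betweenness.get)


try:
    remove_edge(*())                 # unpacking an empty tuple
except TypeError as err:
    message = str(err)

print(message)
print(pick_edge({}))
print(pick_edge({('A', 'B'): 1.0, ('B', 'C'): 2.0}))
```

If the betweenness result really is empty for max_depth == 1, it may be worth checking the BFS first: with a depth limit of 1 every node still reaches its direct neighbors, so some edges would normally receive credit.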

Do you want to use bfs algo to fill node2num_paths dict as well

In the function bfs(graph, root, max_depth), are we expected to use the BFS algorithm to fill the node2num_paths dict as well?

I have filled the node2distances dict, but I don't see how to find multiple shortest paths with BFS.
Or

can we use _single_source_shortest_path_basic(graph, s) for this purpose?

doctest failed on vectorize...

When I explicitly define the CSR matrix with "dtype=np.int64", the output looks like:
Expected:
array([[1, 0, 1, 1, 1, 1],
       [0, 2, 0, 1, 0, 0]], dtype=int64)
Got:
array([[1, 0, 1, 1, 1, 1],
       [0, 2, 0, 1, 0, 0]])
However, when I define it with "dtype=np.int32", the output looks like:
Expected:
array([[1, 0, 1, 1, 1, 1],
       [0, 2, 0, 1, 0, 0]], dtype=int64)
Got:
array([[1, 0, 1, 1, 1, 1],
       [0, 2, 0, 1, 0, 0]], dtype=int32)
Does anyone have an idea how I can pass this doctest?

File upload problem

Hey,
I have uploaded my assignment files to the private repository, in the folder ao.
The files I uploaded are
AO.ipynb and twitter.cgf.
When I execute my code it works correctly, but I can't see my network .png file in the repository.
Can anyone tell me whether they are facing the same problem, or where network.png gets generated?

Question about Assignment 2 eval_all_combinations()

Hello Dr. Aron,

Sorry to bother you during the weekend. I have a quick question about A2, function eval_all_combinations(). The description says that if feature_fns = [token_features, token_pair_features, lexicon_features], then we will consider all 4 combinations of features.

Given a set [A, B, C], we can have 8 combinations: (empty), A, B, C, AB, AC, BC, ABC. In the log:
features=token_pair_features lexicon_features: 0.75125
features=token_features token_pair_features lexicon_features: 0.74583
features=token_features token_pair_features: 0.73542
features=token_pair_features: 0.72875
This is equivalent to having BC, ABC, AB and B. Why is this the case?

Regards,
Zhidu
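
For reference, the non-empty subsets of three feature functions can be enumerated with itertools.combinations, which gives 7 of them (8 if the empty set is counted). This sketch uses the function names as plain strings and a hypothetical helper name; it does not explain why the quoted log shows only four lines, which may just be an excerpt:

```python
from itertools import combinations


def all_nonempty_combinations(items):
    """Every combination of size 1..len(items), smallest sizes first."""
    combos = []
    for size in range(1, len(items) + 1):
        combos.extend(combinations(items, size))
    return combos


feature_names = ['token_features', 'token_pair_features', 'lexicon_features']
combos = all_nonempty_combinations(feature_names)
for combo in combos:
    print(' '.join(combo))
```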

Pushing problem

I've tried to push my assignment. My terminal shows "success", but nothing changes on the GitHub web page.

Is there anything I can do? Thanks.

Robust_request Function

Hey,
I am having trouble understanding how the robust_request function works and what its parameters do.
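
The assignment's robust_request is not shown in this thread, so the following is only a sketch of the general retry pattern such a function typically implements: call the API, return on HTTP 200, otherwise wait and try again up to a limit. Every name, parameter and class here is hypothetical, and the fake client exists only so the sketch runs without network access:

```python
import time


def robust_request_sketch(client, resource, params, max_tries=5, wait_seconds=0.0):
    """Retry client.request until it returns status 200 or tries run out."""
    for _ in range(max_tries):
        response = client.request(resource, params)
        if response.status_code == 200:
            return response
        time.sleep(wait_seconds)  # a real client would wait out the rate limit
    return None


class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code


class FlakyClient:
    """Hypothetical stand-in: fails twice with 429, then succeeds."""

    def __init__(self):
        self.calls = 0

    def request(self, resource, params):
        self.calls += 1
        return FakeResponse(200 if self.calls >= 3 else 429)


client = FlakyClient()
response = robust_request_sketch(client, 'users/lookup', {'screen_name': 'abc'})
print(client.calls, response.status_code)
```

In this pattern, resource and params are passed straight through to the underlying request call, and max_tries bounds how long the function keeps retrying.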

A2 output not the same as log.txt

Hello Dr. Aron,

I just finished A2, but the output is slightly different from our Log.txt. Besides, every time I run my code it gives a slightly different result. Could this be due to the behavior of LogisticRegression, or is this not supposed to happen?

With Respects,
Zhidu

Regarding next assignment (A2) and topics covered by midterm

Hello Dr. Aron,

How are you? I hope you are enjoying this week. I just have a quick question.

Would you be willing to share some information about the upcoming assignment, A2? I'm wondering what topics it will cover. As you know, A1 only covers material up to link prediction, but our midterm also includes cascading and sentiment analysis. If A2 is about these topics, could you please post A2 today or tomorrow?

By the way, will the solution for A1 (or sample code) be published after it is due? We talked about the solution to A0 during one of our lectures, but we will not have a lecture next week...

Warmest Regards,
Zhidu

How to consider parents

Hi Professor/TA,

How should we treat the parent-child relation in the following example?
If we have edges (node1, node2) and (node2, node1),

then the nodes are parents of each other as well.
Treating the relation as going both ways, my output is
[('A', ['B']), ('B', ['D']), ('C', ['B']), ('D', ['B', 'E', 'F', 'G']), ('F', ['D', 'E', 'G']), ('G', ['D', 'F'])]

and your output is
[('A', ['B']), ('B', ['D']), ('C', ['B']), ('D', ['E']), ('F', ['E']), ('G', ['D', 'F'])]

I am confused about how parents are determined here.
Please explain what is expected.
Do I need to consider the max_depth parameter here as well?
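
The second listing is what you get if a neighbor counts as a parent only when it is exactly one step closer to the root. A sketch under that assumption, for root E; the adjacency dict and the distances are hypothetical values chosen to be consistent with the two listings above, and the helper name is made up:

```python
def parents_from_distances(graph, node2distances):
    """A neighbor is a parent only if it is one level closer to the root."""
    node2parents = {}
    for node, dist in node2distances.items():
        if dist == 0:
            continue  # the root has no parents
        node2parents[node] = sorted(
            nbr for nbr in graph[node] if node2distances.get(nbr) == dist - 1
        )
    return node2parents


graph = {
    'A': {'B', 'C'}, 'B': {'A', 'C', 'D'}, 'C': {'A', 'B'},
    'D': {'B', 'E', 'F', 'G'}, 'E': {'D', 'F'},
    'F': {'D', 'E', 'G'}, 'G': {'D', 'F'},
}
# Assumed BFS distances from root 'E' with no depth limit
node2distances = {'E': 0, 'D': 1, 'F': 1, 'B': 2, 'G': 2, 'A': 3, 'C': 3}

parents = parents_from_distances(graph, node2distances)
print(sorted(parents.items()))
```

With this rule the undirected edge D-B makes D a parent of B but not the reverse, because B is farther from the root than D. Nodes beyond max_depth would simply be absent from node2distances and therefore from the result.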
