lda-python-beginning-practice's Introduction

Knowledge learning diaries

Something to say before starting

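From my undergraduate years until now I have drifted along for several years: my direction has been muddled and unstructured, my skills are narrow and inefficient, and I am lazy yet unwilling to put in the effort.

So I am picking a skill that, for now, looks like it will let me learn a lot in a short time.

I am not good at this, but I have to start learning it anyway. This repository holds the things I need at this stage. I hope to record the learning process and give 2019 a good ending.

Because I am not very familiar with the fundamentals either, some statistics notes and the like will be mixed in.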

2019

Aug 25-Sep 19 (Checked on Sep 19)

Plan: networkx, matplotlib visualization (how to plot)

  how to map relationships and networks (preliminary) -- not finished, did it with UCINET and Gephi  
  
  Sep 19 Meet Prof.   --about to meet  
  
  Sort paper and read how to write article   -- learned some basics
  
  Statistics Chapter 1-4   -- not finished, do it on the night of Sep 19.

Finished: jieba + partial LDA

Sep 19-Sep 24 web crawl basics

Plan: how to get past official government websites; get, read and save the data

  draw a worldmap  
  
  draw relationships and network  
  
  know numpy, pandas well
  
  Statistics Chapter 5 

October 2--

Still Web Scraping TT...

--Oct 14, Oct15-22

Went through the courses by Mr. Han and gained some thoughts on Python and ML. (mostly completed)

  • Web scraping and NLP on policy, plus the work done over the past weeks. Learn more about modeling and algorithms. (having difficulty)

  • Look for cases and think about the application forms; think clearly about what data and what areas I may work with in the future. (not yet)

  • Be more specific and clear about the things I am doing

  • Find a day to write the manuscript and send the draft to my supervisor

Oct 22-Oct 30

  • Think more about algorithms, and do something with networkx

  • About spark, big data.

  • Write down the methods, results, build up a framework before Oct.27

------Edited on Oct.29---------

Started the paper. Writing the methods is painful; I wonder to what extent I should elaborate on the analysis. (started)

  • two major failures:

Tried yellowbrick, failed to produce the dispersion plot, don't know why. (completed)

Tried to compute the perplexity using the one from scikit-learn, and failed to install MALLET, don't know why. (used coherence to replace it)

  • So the research structure becomes simpler: jieba + gensim + tf-idf + LDA (completed)
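
As a reminder of what the tf-idf step in that pipeline does, here is a minimal sketch in plain Python. It is not the code used for the paper: the token lists are hypothetical stand-ins for text already segmented with jieba, the function name `tf_idf` is made up for illustration, and the formula is the plain textbook one (term frequency times log inverse document frequency), which is not necessarily the exact weighting gensim applies by default.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {token: weight} dict per document.

    docs is a list of token lists (hypothetical, already-segmented text).
    weight = (count / doc length) * log(n_docs / document frequency)
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each token appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            tok: (cnt / total) * math.log(n_docs / df[tok])
            for tok, cnt in counts.items()
        })
    return weights


# Hypothetical, already-segmented policy snippets (assumption, not real data).
docs = [
    ["smart", "city", "policy", "data"],
    ["housing", "policy", "city"],
    ["data", "platform", "policy"],
]
weights = tf_idf(docs)

# Print tokens of the first document from most to least distinctive.
for tok, w in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(f"{tok}\t{w:.3f}")
```

A token that appears in every document (here "policy") gets weight zero, which is why tf-idf helps push generic words out of the way before the topics are fitted with LDA.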

Keep writing 500+ every day, fighting~~

Nov 7-11

--------Edited on Nov.7----------

The past week was tough. Many things happened; I kept apologizing and had a really hard time telling whether people were telling the truth.

The good thing is that I finished my visa application and started writing, phew!

I have already drafted the intro, data and methods of my paper, but I found that the data I use has a small problem and needs revision.

  • Revise data, find out more about DTM and how to plot, finish the results and send them to Prof before next Monday.

  • Read that book on Hong Kong by Gu Rude, very interesting; or maybe I should read something related to the housing problems

  • Sort out the knowledge map in this area, figure out what I should do after this paper

  • Learn about visualization, art and science

  • Take the courses on DataCamp and finish the one on Coursera (Oops, my money, TT)

Nov 12-16

  • There will be some digital/technology events in Shenzhen, will go there.

Nov 24-30

The past days were chaos. HK has become too horrible to stay in, so I went to SZ. I met Jack, and lots of things happened, so there was little time to write the paper or start self-learning.

After 10 days, I finally have some courage to start my paper, and the deadline is coming.

Should start SNA and geography studies: innovation spread, networks, relationships.

I think the goals for next year are text mining, social network analysis, machine learning and urban planning, plus GIS.

  • for 25, results plot

2020

Covid-19 has caused a lot of trouble. I haven't been able to sit down and start learning new things; basically I just spent my days writing the lengthy article. Now I think I can move on.

June 11 - 19

  • Figure out the method for policy text mining, read the fundamental book every day.
  • Write the intro and literature review at the same time for the sc policy paper, lock down the data source and set the objectives.
  • Try BERT + LDA + Clustering to do the topic modeling
  • Learn SNA and try to draw a concept map.

------------On June 17-----------

  • Found a book somewhat related to policy analysis and ML. It turns out to be written by J. R. Gil-Garcia, no wonder: he is one of the people who wrote that initiative framework.

  • There is not much information on policy semantic/text analytics, let alone topic modeling or other methods. Some work is on comments, some extracts from social media like Twitter. It is hard to identify the objectives, plans and processes separately. And one great drawback here is that I am not capable of conducting sophisticated and integrated analysis. (should continue to explore and think about this)

  • Have collected 15+ papers for sc policy. The truth is that few scholars are in this field. Some important people are Angelidou and her team, and other than them, nearly zero. So I need to expand the literature review search a lot more. (Possibly to public strategic planning and policy making)

  • The basic concept map is drawn using QDA tools, since the edges and the weights are vague.

  • For policy networks, most work is about the parties or some other dynamic changes, and it is difficult to get data. Should think about this.

  • Have started reading about the ML algorithms. Good news is that they are not that hard when I read them this time, and I seem to have found some things they share. (Will keep doing this)

---------- On June 19-------------

  • Last night I struggled with seaborn and bokeh; 1h passed and nothing.
  • I found a book on deep learning, quite interesting; I think I can carry on.
  • I managed to draw a QDA map of sorts. The problem now is that the categories are undefined and unclear; the input should be expanded, and I should see if there are other tools I could apply.

Not so much time left, fighting!

June 20-June 30

  • This time really need to send out the sc knowledge paper
  • Draft for sc policy/PPP (content, framework, development, 8 pages is ok)
  • ML, NLP: quickly absorb as much as possible, and recap visualization like seaborn.
  • GIS beginner, draw the distribution map
  • Go through the tableau book, if my tableau is not expired.
  • Think about next step.

----------On June 22--------------------

  • Today I found that e-government, smart governance and other management issues are largely about building databases, such as distributed systems, Hadoop and so on, and that scholars in this field are mostly in computer science. So what is left for the traditional strand, or for interdisciplinary policy modeling? Or does everyone need simulation skills?
  • General Equilibrium Models; System Dynamic Modelling; Markov Modelling; Agent-Based Modelling; Discrete Event Modelling

Dec 07-Dec 30

Oops, it's been half a year since my last update. What I've done in the past months: submitted one paper, got one paper accepted, gave a keynote at one conference, and am now revising another paper, but I am still far from the graduation criteria.

Goals for the remaining days in 2020:

  • Get familiar with discourse network/political network/governance network analysis; get more familiar with graph theory;
  • Keep looking for postdoc
  • one conference abstract before Dec 9
  • Draft main contents framework for thesis

https://mc-stan.org/docs/2_25/stan-users-guide/index.html#overview
