Comments (6)
Oh, I also meant that we derive the topics once and then just check, for each year, which topic each document belongs to the most. I would postpone this a bit though, because of the presentation.
from cs-insights-crawler.
It would be cool if you could just plot the importance of the same topics over the years, if possible. See here under "Topics over time".
I'm not entirely sure which behaviour you expect once this is implemented. Currently you can add date ranges that you want to plot, but this gives you only one plot for all years combined. So you want an additional option where each year is handled separately and the results are then combined in a plot like the one you linked? And you just want the importance of the topics, which I assume is their numbering, not the pyLDAvis plots over time?
Once we have the topics generated for a collection, LDA allows us to identify which documents "belong" to each topic (and which words belong to that topic). With that, we can plot the same set of topics across a time window (let's say 2011-2020), using the document count to show the importance of each topic in each year. Probably some topics are more popular in specific years. For example, say we have the years 2011, 2012, and 2013 and the topics A, B, and C in a corpus of 10 documents. We would have:
- 2011
  - A: 1
  - B: 1
  - C: 0
- 2012
  - A: 1
  - B: 3
  - C: 1
- 2013
  - A: 0
  - B: 1
  - C: 2
Using a raw count, we can see that B was a hyped topic that peaked in 2012 and then faded away, while C was a growing theme.
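The counting step in this example could be sketched roughly as follows. This is only a minimal illustration: the `docs` list of (year, dominant topic) pairs and the helper `count_topics_per_year` are hypothetical names, and the pairs are hand-written to match the toy numbers above rather than taken from a trained model.

```python
from collections import Counter, defaultdict

# Hypothetical (year, dominant_topic) pairs for the 10 documents in the example.
# In practice these would come from the trained model, not be typed by hand.
docs = [
    (2011, "A"), (2011, "B"),
    (2012, "A"), (2012, "B"), (2012, "B"), (2012, "B"), (2012, "C"),
    (2013, "B"), (2013, "C"), (2013, "C"),
]
topics = ["A", "B", "C"]


def count_topics_per_year(docs, topics):
    """Return {year: {topic: number of documents with that dominant topic}}."""
    counts = defaultdict(Counter)
    for year, topic in docs:
        counts[year][topic] += 1
    # Emit explicit zeros so every topic appears in every year.
    return {year: {t: counts[year][t] for t in topics} for year in sorted(counts)}


per_year = count_topics_per_year(docs, topics)
for year, row in per_year.items():
    print(year, row)
```

Keeping explicit zeros (C in 2011, A in 2013) means every topic has a value for every year, which makes a later line plot straightforward.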
Of course, there are different ways to account for the documents over the years. If I'm not wrong, LDA gives an actual % for how important a given word is to a topic, and for each document you can see what % of it "belongs" to each topic. Again, here we can try a couple of things. We can use the weights/raw output and look at the number of documents per topic (probably 2-3 topics will dominate all the others for a given document), we can obtain the top-K most similar documents with respect to the topics at hand, or we could use the topic vectors as the centroids in a K-Means algorithm and see how the documents are distributed. Probably there will be a lot of overlap between the clusters.
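To make the difference between the dominant-topic view and the weight-based view concrete, here is a small sketch. The weights in `doc_topic_weights` are made up for illustration, and `dominant_topic` and `soft_counts_per_year` are hypothetical helpers. If the model happens to be a gensim `LdaModel` (an assumption; the thread does not name the library), `get_document_topics` returns comparable (topic id, probability) pairs per document.

```python
from collections import defaultdict

# Hypothetical per-document topic weights (each dict sums to 1), tagged with a year.
doc_topic_weights = [
    (2011, {"A": 0.7, "B": 0.2, "C": 0.1}),
    (2011, {"A": 0.1, "B": 0.8, "C": 0.1}),
    (2012, {"A": 0.2, "B": 0.6, "C": 0.2}),
]


def dominant_topic(weights):
    """Hard assignment: the topic with the largest weight for one document."""
    return max(weights, key=weights.get)


def soft_counts_per_year(rows):
    """Soft assignment: sum the raw topic weights of all documents per year."""
    totals = defaultdict(lambda: defaultdict(float))
    for year, weights in rows:
        for topic, weight in weights.items():
            totals[year][topic] += weight
    return {year: dict(per_topic) for year, per_topic in totals.items()}


hard = [(year, dominant_topic(weights)) for year, weights in doc_topic_weights]
soft = soft_counts_per_year(doc_topic_weights)
print(hard)
print(soft)
```

The hard assignment discards how strongly a document belongs to its secondary topics, while the soft sums keep that information; which one reads better in a plot is something to try out.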
Thank you for the explanation. So, if I understand correctly, we train a model on the data from 2010 to 2020. Then we have 10 topics (or however many we defined previously). Next we check, e.g., which documents published in 2010 belong to which topic and count how many documents there were per topic. Then we do this for every year and plot it like Jan showed.
If this is correct, I can try to do this, but I currently do not know how to get the information from the model. I'll have to look into this first. If I run into problems, I'll let you know.
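Once the per-year counts exist, arranging them for a "topics over time" plot could look something like the sketch below. The function `topic_shares_over_time` is a hypothetical helper, and dividing by the number of documents per year is an added assumption (it compensates for years with more publications); the raw counts could be plotted just as well. The input reuses the A/B/C toy counts from the earlier comment.

```python
def topic_shares_over_time(per_year_counts):
    """Turn {year: {topic: count}} into (sorted years, {topic: [share per year]})."""
    years = sorted(per_year_counts)
    topics = sorted({t for row in per_year_counts.values() for t in row})
    series = {t: [] for t in topics}
    for year in years:
        row = per_year_counts[year]
        total = sum(row.values())
        for t in topics:
            # Share of that year's documents assigned to topic t (0.0 for empty years).
            series[t].append(row.get(t, 0) / total if total else 0.0)
    return years, series


# Toy counts from the A/B/C example in the earlier comment.
counts = {
    2011: {"A": 1, "B": 1, "C": 0},
    2012: {"A": 1, "B": 3, "C": 1},
    2013: {"A": 0, "B": 1, "C": 2},
}
years, series = topic_shares_over_time(counts)
for topic, shares in series.items():
    print(topic, [round(s, 2) for s in shares])
```

Each entry in `series` is one line of the plot: with matplotlib, for instance, `plt.plot(years, series[topic], label=topic)` per topic would draw it.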
I thought about deriving the topics from the entire corpus at once and then doing the other checks from there. However, your idea also seems interesting. If we have the topics per year, we will be able to compare them against the overall ones. My guess is that some topics, like AI and Machine Learning, will always be there. Others will be more seasonal.