artic-poc's Introduction

ARTIC

Artic is a metadata extractor for scientific papers that uses a two-layer CRF. The project is not yet ready to be used. We have open-sourced it so that anyone who wants a better understanding of what is being developed can follow along.

Currently, we are working on supporting this project as a tool and making it available as version 1.0.

Naturally, we will introduce some changes with the goal of reducing the size of the project and making it more functional.

This is a Master's project developed at the Universidade Federal do Rio Grande do Sul.

Presentation is available here.

Please find below the list of 100 papers we used to test Artic:

1 - Sesame: informing user security decisions with system visualization

2 - TALC: Using Desktop Graffiti to Fight Software Vulnerability

3 - You've been warned: an empirical study of the effectiveness of web browser phishing warnings

4 - Don't look now, but we've created a bureaucracy: the nature and roles of policies and rules in wikipedia

5 - A frequency-based and a poisson-based definition of the probability of being informative

6 - A new statistical formula for Chinese text segmentation incorporating contextual information

7 - A pseudo random coordinated scheduling algorithm for Bluetooth scatternets

8 - A similarity measure for motion stream segmentation and recognition

9 - Analysis of soft handover measurements in 3G network

10 - Conversation pivots and double pivots

11 - An expressive aspect language for system applications with Arachne

12 - A taxonomy of ambient information systems: four patterns of design

13 - Exploring the role of the reader in the activity of blogging

14 - 2-source dispersers for sub-polynomial entropy and Ramsey graphs beating the Frankl-Wilson construction

15 - A geometric constraint library for 3D graphical applications

16 - A resilient packet-forwarding scheme against maliciously packet-dropping nodes in sensor networks

17 - Looking at, looking up or keeping up with people?: motives and use of facebook

18 - A new approach to intranet search based on information extraction

19 - Ambient Social TV: Drawing People into a Shared Experience

20 - A computational approach to reflective meta-reasoning about languages with bindings

21 - Accelerated focused crawling through online relevance feedback

22 - Harvesting with SONAR: the value of aggregating social network information

23 - An intensional approach to the specification of test cases for database applications

24 - A Dependability Perspective on Emerging Technologies

25 - A Machine Learning Based Approach for Table Detection on The Web

26 - A two-phase sampling technique for information extraction from hidden web databases

27 - The Adaptation of Visual Search Strategy to Expected Information Gain

28 - Automatic extraction of titles from general documents using machine learning

29 - Heterogeneous Transfer Learning for Image Clustering via the Social Web

30 - Unsupervised Multilingual Grammar Induction

31 - Unsupervised Argument Identification for Semantic Role Labeling

32 - Automated Rich Presentation of a Semantic Topic

33 - Investigations on Word Senses and Word Usages

34 - A Comparative Study on Generalization of Semantic Roles in FrameNet

35 - Exploiting Heterogeneous Treebanks for Parsing

36 - Cross Language Dependency Parsing using a Bilingual Lexicon

37 - Topological Field Parsing of German

38 - Reinforcement Learning for Mapping Instructions to Actions

39 - A Distributed 3D Graphics Library

40 - Brutus: A Semantic Role Labeling System Incorporating CCG, CFG, and Dependency Features

41 - Temporal Summaries of News Topics

42 - Generating Event Storylines from Microblogs

43 - Discovering Evolutionary Theme Patterns from Text - An Exploration of Temporal Text Mining

44 - Temporal Web Page Summarization

45 - A Cross-Collection Mixture Model for Comparative Text Mining

46 - Temporal Corpus Summarization Using Submodular Word Coverage

47 - From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series

48 - Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews

49 - Large-Scale Sentiment Analysis for News and Blogs

50 - Mining and Summarizing Customer Reviews

51 - Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment

52 - Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

53 - QoS Guaranteed Resource Block Allocation Algorithm in LTE Downlink

54 - Resource Allocation for Real Time Services Using Cooperative Game Theory and a Virtual Token Mechanism in LTE Networks

55 - Downlink Packets Scheduling in Enterprise WLAN

56 - Cross-layer Scheduling with Secrecy Demands in Delay-aware OFDMA Network

57 - Computational Analysis and Efficient Algorithms for Micro and Macro OFDMA Scheduling

58 - EFFICIENTLY LOCATING COLLECTIONS OF WEB PAGES TO WRAP

59 - Growing Parallel Paths for Entity-Page Discovery

60 - Crawling Deep Web Entity Pages

61 - Semi-Supervised Learning of Attribute-Value Pairs from Product Descriptions

62 - The Role of Query Sessions in Extracting Instance Attributes from Web Search Queries

63 - Understanding Deep Web Search Interfaces: A Survey

64 - Structured Databases on the Web: Observations and Implications

65 - Harnessing the Deep Web: Present and Future

66 - Using Latent-Structure to Detect Objects on the Web

67 - Supporting the Automatic Construction of Entity Aware Search Engines

68 - Example Based Entity Search in the Web of Data

69 - Object Search: Supporting Structured Queries in Web Search Engines

70 - Ad-hoc Object Ranking in the Web of Data

71 - Gulliver in the land of data warehousing: practical experiences and observations of a researcher

72 - Deciding the Physical Implementation of ETL Workflows

73 - Defining ETL Worfklows using BPMN and BPEL

74 - A Model-Driven Framework for ETL Process Development

75 - Modeling How Students Learn to Program

76 - The WEKA Data Mining Software: An Update

77 - GraphLab: A New Framework For Parallel Machine Learning

78 - Machine Learning in Computer Forensics (and the Lessons Learned from Machine Learning in Computer Security)

79 - Exploring Factors that Influence Computer Science Introductory Course Students to Persist in the Major

80 - Towards energy-aware scheduling in data centers using machine learning

81 - Introduction to Probabilistic Topic Models

82 - A Few Useful Things to Know about Machine Learning

83 - EnsembleMatrix: Interactive Visualization to Support Machine Learning with Multiple Classifiers

84 - You Are Where You Tweet: A Content-Based Approach to Geo-locating Twitter Users

85 - Large-Scale Machine Learning with Stochastic Gradient Descent

86 - Cellular Traffic Offloading through Opportunistic Communications: A Case Study

87 - Learning Behavior Styles with Inverse Reinforcement Learning

88 - Uncovering Social Spammers: Social Honeypots + Machine Learning

89 - The Tradeoffs of Large Scale Learning

90 - Bob: A Free Signal Processing and Machine Learning Toolbox for Researchers

91 - Connecting K-16 Curriculum & Policy: Making Computer Science Engaging, Accessible, and Hospitable for Underrepresented Students

92 - Using Scalable Game Design to Teach Computer Science From Middle School to Graduate School

93 - Expressing Computer Science Concepts Through Kodu Game Lab

94 - A Survey of Computer Science Teacher Preparation Programs in Israel Tells Us: Computer Science Deserves a Designated High School Teacher Preparation!

95 - A Geographical Analysis of Knowledge Production in Computer Science

96 - VLFeat - An open and portable library of computer vision algorithms

97 - The CS10K Project: Mobilizing the Community to Transform High School Computing

98 - Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices

99 - NIMBLE: A Toolkit for the Implementation of Parallel Data Mining and Machine Learning Algorithms on MapReduce

100 - Coupled Semi-Supervised Learning for Information Extraction

Annotations for the first-level CRF of these papers can be found here. Author Information CRF annotations can be found here. Footnote annotations can be found here. Finally, the JSON gold-standard (expected output) is available here.

Papers 1 to 40 come from the SectLabel project.

artic-poc's People

Contributors

grommet-github-bot, alansouzati

artic-poc's Issues

k-fold cross validation

Hi Alan,

This is great work, but I've noticed that you're running k-fold cross validation only once.
In theory, you should run k-fold cross validation t times, with a different random arrangement of the data each time, to get better accuracy estimates. Your results might turn out even better than they are today.
If you already did this manually by executing your code t times, you can ignore this issue. 👍
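
For illustration, here is a minimal sketch of the repeated k-fold procedure suggested above. It is not taken from the Artic code base: the function names, the `evaluate` callback, and the `seed` parameter are all assumptions made for this example, and `dummy_evaluate` is a hypothetical stand-in for training and scoring a real model.

```python
import random
import statistics
from typing import Callable, List, Tuple


def k_fold_indices(n: int, k: int, rng: random.Random) -> List[List[int]]:
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def repeated_k_fold(
    n: int,
    k: int,
    t: int,
    evaluate: Callable[[List[int], List[int]], float],
    seed: int = 0,
) -> Tuple[float, float]:
    """Run k-fold cross validation t times, reshuffling before each run.

    Returns the mean and population standard deviation of all t * k scores.
    """
    rng = random.Random(seed)
    scores: List[float] = []
    for _ in range(t):
        # A fresh random arrangement of the data for every repetition.
        folds = k_fold_indices(n, k, rng)
        for held_out in range(k):
            test_idx = folds[held_out]
            train_idx = [
                i for j, fold in enumerate(folds) if j != held_out for i in fold
            ]
            scores.append(evaluate(train_idx, test_idx))
    return statistics.mean(scores), statistics.pstdev(scores)


def dummy_evaluate(train_idx: List[int], test_idx: List[int]) -> float:
    # Hypothetical placeholder: a real callback would train on train_idx
    # and return an accuracy or F1 score measured on test_idx.
    return len(test_idx) / (len(train_idx) + len(test_idx))


mean_score, spread = repeated_k_fold(n=100, k=10, t=5, evaluate=dummy_evaluate)
```

Averaging over all t * k scores reduces the effect of any single lucky or unlucky split, and the spread shows how much the estimate depends on the arrangement of the data.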
