Giter Site home page Giter Site logo

malib-dataset's Introduction

MALib-Dataset

Introduction

MALib is an open-source dataset which contains 56,000+ mobile apps gathered from Google Play and 763 corresponding third-party libraries that have been used by those apps.

It can be used for investigating the potential relationship between the existing mobile apps and third-party libraries, especially the third-party library use patterns. Besides, it can be used for recommending potential useful third-party libraries for mobile apps underdeveloped.

Please feel free to use this dataset in your research, and it would be appreciated if you could cite the following paper.

Diversified Third-Party Library Prediction for Mobile App Development. Q. He, B. Li, F. Chen, J. Grundy, X. Xia and Y. Yang. IEEE Transactions on Software Engineering (2020).

Dataset Overview

Similar to classical recommendation databases, MALib contains three tables/files, namely apk_info.csv, lib_info.csv, and relation.csv respectively.

lib_info.csv

Column 1 Column 2
Library Id Library Name

In this file, we provide the exact name of each third-party library. Those names are gathered from Maven and Github.

app_info.csv

Column 1 Column 2
App Id App Name

In this file, we provide the exact name of each app as well as its version. This information is gathered from the meta file of their .apk files.

relation.csv

Column 1 Column 2
App Id Library Id

As the usage information of third-party libraries is binary attributes, i.e., 1 stand for the corresponding app has used the corresponding library, and 0 otherwise, we only have two columns in this datasheet. The first column is the id of each app and the second column is the id of each library. For example, when we find a tuple <1,2>, it means app 1 has used library 2. In addition, if we cannot find a tuple <1,5> in this datasheet, it means app 1 hasn't used library 5.

Dataset Analysis

The following two figures demonstrate the characteristics of MALib. In general, we find that 92.25% of these collected apps use 5 TPLs or more third-party libraries, with an average of 11.81.

Usage Distribution

In addition, the top 1% most popular libraries were involved in approximately 29.91% of the app-library usage records in MALib.

Usage Distribution

malib-dataset's People

Contributors

malibdata avatar

Stargazers

Zhenpeng Zhou avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.