pldb's Introduction

title PLDB Readme

import rootHeader.scroll

printTitle

# A Programming Language Database

wideColumns 1

#### View this readme as HTML
 https://pldb.io/readme.html

import code/ciBadges.scroll

PLDB is a public domain ScrollSet and website containing over 135,000 facts about over 4,000 programming languages.

This repo contains the entire ScrollSet, code, and website for https://pldb.io.

## To download the data
The entire ScrollSet is ready to analyze in popular formats. Full documentation is here: https://pldb.io/csv.html
- As CSV: https://pldb.io/pldb.csv
- As TSV: https://pldb.io/pldb.tsv
- As JSON: https://pldb.io/pldb.json
- The JSON file is also available via npm:
javascriptCode
 // npm install pldb
 console.log(require("pldb").javascript.appeared)

## To add a new language
Create a new Scroll file in `concepts` with a unique, URL-friendly filename and send a pull request.

## To update a language
Edit the corresponding `concepts/*.scroll` file and send a pull request.

## To add a new measure
Update the file `code/measures.parsers`, add at least one measurement to a concept in `concepts`, and send a pull request.

## To build the site locally
code
 git clone https://github.com/breck7/pldb
 cd pldb
 # Only needed on first install.
 npm i -g cloc
 # Run this on a fresh checkout, when upgrading from an old checkout, and periodically when there are new releases.
 npm install .
 # (Optional) Run tests
 npm run test
 npm run build
 # After you make changes and before you commit make sure to run:
 npm run format

## To explore this repo
The most important folder is `concepts`, which contains the ScrollSet (a file for each concept). The file `code/measures.parsers` contains the Parsers (schema) for the ScrollSet.
You can see the `cloc` language stats on this repo at https://pldb.io/pages/about.html.

## To use as an npm package
code
 npm install pldb
javascriptCode
 console.log(require("pldb").javascript.appeared)

import citation.scroll

All sources for PLDB can be found here: https://pldb.io/pages/acknowledgements.html

endColumns

import footer.scroll

pldb's People

Contributors

adityaxdiwakar, adriantintpilver, breck7, celtic-coder, codelani, cyberneticist-uk, dalance, dancergraham, hassamalhajaji, hirrolot, johnwcowan, josevalim, lguzzon, lngns, mahadwaseem123, martinfjohansen, munksgaard, notpeter, pavelvozenilek, pldbbot, rkimera94, rtfeldman, sarang0218, shaiber, stavares843, stuartdambi, superfola, tif-calin, wickedsmoke, yairchu

pldb's Issues

Building pldb with cloc requires at least 3 GB of memory

Locally building and testing with cloc requires at least 3 GB of RAM according to informal tests.

Should the build and test scripts be changed to warn people who want to download and build the project?

Note: Without cloc, the memory requirement for building pldb seems to be considerably lower.

Related Pull request: #87

EDIT LOG: Made the text description clearer.
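
One way the warning asked about above could look is a small pre-build check. This is only a sketch: the 3 GB threshold is the informal figure from this issue, and `memoryWarning` is a hypothetical helper, not something that exists in the pldb build scripts.

```javascript
const os = require("os");

// Assumption: 3 GB is the informal threshold reported in this issue.
const MIN_BUILD_MEMORY_BYTES = 3 * 1024 ** 3;

// Hypothetical helper: returns a warning string when total memory is
// below the threshold, otherwise null.
function memoryWarning(totalBytes = os.totalmem(), minBytes = MIN_BUILD_MEMORY_BYTES) {
  if (totalBytes >= minBytes) return null;
  const toGb = bytes => (bytes / 1024 ** 3).toFixed(1);
  return `Warning: building with cloc may need about ${toGb(minBytes)} GB of RAM, ` +
    `but this machine reports ${toGb(totalBytes)} GB.`;
}

// Print the warning (if any) before the build starts.
const warning = memoryWarning();
if (warning) console.warn(warning);
```

Note that `os.totalmem()` reports installed memory, not free memory, so this is a coarse check at best.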

Not able to check out the repo

This is because there is a file in the repo named nul.lani, and nul is a reserved word in Windows

Because your instructions say c# should be called c-sharp to get around filesystem limitations, I suggest the same for nul.lani

PR incoming

john@LAPTOP-PE9BBGOJ MINGW64 ~/projects
$ git clone https://github.com/StoneCypher/codelani.git
Cloning into 'codelani'...
remote: Enumerating objects: 5964, done.
remote: Counting objects: 100% (299/299), done.
remote: Compressing objects: 100% (268/268), done.
remote: Total 5964 (delta 27), reused 250 (delta 12), pack-reused 5665
Receiving objects: 100% (5964/5964), 1.14 MiB | 1.06 MiB/s, done.
Resolving deltas: 100% (626/626), done.
error: invalid path 'database/nul.lani'
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
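
The checkout fails because Windows reserves a small set of device names, and the reservation applies regardless of case or file extension. A check along the following lines could catch such filenames before they are committed. It is a minimal sketch; `isWindowsReservedName` is a hypothetical helper and is not part of the repo.

```javascript
// Device names that Windows reserves, regardless of case or extension.
const RESERVED_WINDOWS_NAMES = new Set([
  "CON", "PRN", "AUX", "NUL",
  ...Array.from({ length: 9 }, (_, i) => `COM${i + 1}`),
  ...Array.from({ length: 9 }, (_, i) => `LPT${i + 1}`),
]);

// Hypothetical helper: true when `filename` would collide with a reserved name.
function isWindowsReservedName(filename) {
  // Only the part before the first dot matters: "nul.lani" still maps to NUL.
  const base = filename.split(".")[0].toUpperCase();
  return RESERVED_WINDOWS_NAMES.has(base);
}

console.log(isWindowsReservedName("nul.lani"));     // true
console.log(isWindowsReservedName("c-sharp.lani")); // false
```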

Spam / vandalism prevention

It looks like you're allowing anyone to make changes to the site, and any user can impersonate anyone. I don't see anything obviously malicious, although changes like 2bdc8cd look wrong.

Have you considered adding an authentication mechanism? Since the content is on GitHub, you could probably use that for auth.

Fill `screenshot` keyword for all languages with `type visual`

For visual languages, a picture is worth a thousand words. PLDB now shows a screenshot for visual languages when that keyword is present. Example: https://pldb.com/languages/scratch.html

For every visual language, let's take our own nice screenshot of using the language in action.

See the below commit which added 2 examples:
b367296

Steps:

  1. Take a screenshot of your screen using the visual language.
  2. Save it in site/screenshots/[pldbId].png
  3. Add a line to the file like this: screenshot https://pldb.com/screenshots/explorer.png
  4. If doing this locally, run npm run format before committing.
  5. Commit and push and send a PR
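
To find which files still need a screenshot, a check like the one below could be run over each language file's text. It is a sketch under two assumptions taken from this issue: visual languages carry a `type visual` line, and the keyword appears as a line starting with `screenshot `. The function name `needsScreenshot` is hypothetical.

```javascript
// Hypothetical helper: true when the file describes a visual language
// but has no `screenshot` line yet.
function needsScreenshot(conceptText) {
  const lines = conceptText.split(/\r?\n/);
  const isVisual = lines.some(line => line.trim() === "type visual");
  const hasScreenshot = lines.some(line => line.startsWith("screenshot "));
  return isVisual && !hasScreenshot;
}

// Example with inline text instead of a real file:
console.log(needsScreenshot("title Scratch\ntype visual")); // true
```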

Broken links in "Written In"

Hi Breck (@breck7),

In the "Written In" section of the https://edit.pldb.com/pages/acknowledgements.html page, the individual URLs contain the BASE_URL replacement word, which has either not been replaced or is not required in its current form.

For example, the JavaScript link is: https://edit.pldb.com/pages/BASE_URL/languages/javascript.html

I'm guessing that the URL should be either https://edit.pldb.com/languages/javascript.html or https://pldb.com/languages/javascript.html, so perhaps a replacement has already taken place?

It would appear that the WRITTEN_IN_TABLE is being replaced correctly, but that any internal replacements are being left "as-is". Should this replacement happen recursively, or does it need to occur at another point in the processing of the page?
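
For illustration, recursive replacement could be done by repeating the substitution until the text stops changing. This is a sketch only: `expandPlaceholders` and the shape of `values` are assumptions, and the actual site builder may handle templates quite differently.

```javascript
// Escape regex metacharacters so placeholder names are matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: `values` maps non-empty placeholder names to
// replacement strings. Replacement repeats until nothing changes, so
// placeholders nested inside other replacements are expanded as well.
function expandPlaceholders(template, values, maxPasses = 10) {
  const pattern = new RegExp(Object.keys(values).map(escapeRegExp).join("|"), "g");
  let current = template;
  for (let pass = 0; pass < maxPasses; pass++) {
    const next = current.replace(pattern, name => values[name]);
    if (next === current) return next;
    current = next;
  }
  // Guard against placeholders that expand into each other forever.
  throw new Error("Placeholders did not settle; possible cycle");
}

const page = expandPlaceholders("WRITTEN_IN_TABLE", {
  WRITTEN_IN_TABLE: '<a href="BASE_URL/languages/javascript.html">JavaScript</a>',
  BASE_URL: "https://pldb.com",
});
console.log(page);
```

The alternative raised above, replacing BASE_URL at a later stage after the table has been inserted, avoids the loop entirely but makes the result depend on the order of the passes.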

Kind Regards,
Liam

speed up `git pull`

adding the branch for the github.pldb.com mirror slowed down git pull. figure that out.

(probably just change:)
pldb/.github/workflows/buildGithubDotPldbDotCom.yaml

All ids to external sources should be urls (where possible)

Right now, for things like Wikipedia the grammar asks for the full URL, but for things like Reddit it just asks for the subreddit id. For example: `subreddit Python`

The URL is the clear way to go. It's a little bit of redundancy, but it makes each pldb file more useful on its own. And it's clearer for a new contributor what needs to be added (always just a URL, never need to look up the encoding/decoding scheme).
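
Migrating existing ids could be a one-off conversion like the sketch below. The helper name `subredditUrl` and the `https://www.reddit.com/r/<name>` form are assumptions, not a format PLDB has settled on.

```javascript
// Hypothetical helper: turns a bare subreddit id such as "Python" into a
// full URL and leaves values that are already URLs untouched.
function subredditUrl(idOrUrl) {
  if (/^https?:\/\//i.test(idOrUrl)) return idOrUrl;
  const name = idOrUrl.replace(/^\/?r\//i, "");
  return `https://www.reddit.com/r/${name}`;
}

console.log(subredditUrl("Python")); // https://www.reddit.com/r/Python
```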

Features upgrade

Spot on feedback to address when upgrading features stuff:

https://news.ycombinator.com/item?id=32628257

`I understand whis is pretty much WIP, but still, it's too unorganized to be anything useful. I thought the most interesting to be features page[1], which is nearly empty, and this effort in taxonomy is rather too complicated to be crowd-sourced without supervision. For example, let's take a look at traits[2] and mixins[3]. There are a couple of issues here. First off, why it's 2 different pages? There's no real difference between a trait in PHP, and mixin in… well, no languages except for Racket actually have a syntactic construct called "mixin", but I guess modules in Ruby or Julia are close enough. Scala also has something that's called "traits", and it's also basically the same thing, but with caveats.
On the other hand, D has both "mixins" and "traits", but these are completely different features, and these "traits" have nothing to do with traits in Scala or PHP. So if somebody were to make a comprehensive list of features of D in this DB, should these "traits" appear on the same page as PHP and Scala traits (which are mixins)?

Furthermore, unlike PHP, Scala, Ruby or Julia — Python's "mixins" aren't just mixins with a different name. It's not even clear if it has mixins at all. There's something people call a "mixin" in Python, but these are just classes, so you cannot really say "yes". However, Python has multiple inheritance, which makes "mixins" borderline pointless: classes are (or can be used as) mixins, if you have multiple inheritance! Templates in some languages can be used this way as well.

Which brings us to the next issue — it's not clear, if a language should be marked as having a feature if it comes built-in, explicitly, or if a feature can be implemented in it. Does every language have a semaphore? I cannot remember any where it couldn't be implemented (that would be weird), but I cannot remember any where it's an explicit feature construct either (well, arguably, maybe some SQL-extensions?).

All this isn't to say that the current list is bad. All the questions above can be answered in any way, and it's up to a "researcher" which definition to use in order to actually get a useful taxonomy. It's a non-trivial job.`

[1] - https://pldb.com/lists/features.html
[2] - https://pldb.com/languages/traits-feature.html
[3] - https://pldb.com/languages/mixin-feature.html
[4] - https://pldb.com/languages/semaphores-feature.html

Invalid Julia (package count) info

Hi, I like your site that I just discovered.

Regarding Julia: its package repository has moved, so your info at

https://codelani.com/posts/does-every-programming-language-have-a-central-package-repository.html

is out of date, i.e. the count is not only 1,906 (I'm not sure it was right even at the time; I don't recall when the move happened).

If you need accurate numbers, you could substitute
https://github.com/JuliaRegistries/General

or juliahub.com (for user-friendly access)

for https://julialang.org/packages/

everywhere.

Website loading with SSL error (on Firefox, RX too long, on curl, wrong version number)

Edit: Sorry, Enter submitted the form without a description 😄

curl https://pldb.com

curl: (35) error:1408F10B:SSL routines:ssl3_get_record:wrong version number

I tried on 2 different machines.

Maybe http could be left available, and not provide the redirect (so we can still use the site when https isn't available).

Second edit: Removing the redirect headers, those appear to come from my ISP.

database/things/* errata discussion

alfred.pldb:
https://medium.com/@nikitavoloboev/writing-alfred-workflows-in-go-2a44f62dc432
References an IDE for mac users

alcor.pldb:
The summary section seems to contain a few extra sentences copy-pasted from Wikipedia.

algobox.pldb:
Wikipedia page https://en.wikipedia.org/wiki/Algoboxn does not seem to exist. There are no other links.

database/things/alpha-programming-language.pldb and database/things/alpha.pldb :
Seem to be duplicates: Each file contains new information not present in the other.

build.pldb.com refresh

We need to make it easier for people to add content:
73706e5

  • when you add a subreddit the reddit importer should run and update immediately
  • we should have autocomplete for the language ids

GitHub repository count from BigQuery is inaccurate, easy fix with GitHub's API

Hello, just saw your project on HN and it looks very interesting :) One thing I immediately noticed is that pldb seems to be using BigQuery GitHub data to show the repository count of different languages. Sadly that BigQuery dataset seems to be quite limited, and there's a much better way to find the number of repositories written in a specific language:

# Use per_page=1 so that we don't waste much bandwidth
$ curl "https://api.github.com/search/repositories?q=language:nim&per_page=1"
{
  "total_count": 8013,
  "incomplete_results": false,
  "items": [
    {
     <omitted for readability purposes>
    }
  ]
}

The actual count is in the total_count field, and it's only unique repositories (it doesn't count forks). If you want to also count forks (but I don't think it'd be a good idea) you can do

$ curl "https://api.github.com/search/repositories?q=language:nim+fork:true&per_page=1"
{
  "total_count": 18320,
  "incomplete_results": false,
  "items": [
    {
     <omitted for readability purposes>
    }
  ]
}

I don't know if these results are 100% exact, but they seem to be much closer to reality than the BigQuery count.
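
The same lookup could be scripted roughly as follows. This is a sketch and not pldb's importer: `repoSearchUrl` and `countRepos` are hypothetical names, the global `fetch` assumes Node 18 or later, and unauthenticated search requests are rate-limited, so a real importer would need throttling and probably a token.

```javascript
// Build the search URL; per_page=1 keeps the response small.
function repoSearchUrl(language, includeForks = false) {
  const query = `language:${language}${includeForks ? " fork:true" : ""}`;
  return `https://api.github.com/search/repositories?q=${encodeURIComponent(query)}&per_page=1`;
}

// Hypothetical helper: resolves to the `total_count` field of the response.
// `fetchImpl` is injectable so the function can be exercised without network access.
async function countRepos(language, { includeForks = false, fetchImpl = fetch } = {}) {
  const response = await fetchImpl(repoSearchUrl(language, includeForks), {
    headers: { Accept: "application/vnd.github+json" },
  });
  if (!response.ok) throw new Error(`GitHub API returned ${response.status}`);
  const body = await response.json();
  return body.total_count;
}

// Usage:
// countRepos("nim").then(count => console.log(count));
```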

Language Count Discrepancy

Hi Breck (@breck7),

The list All Languages states that the PLDB has 4,058 languages:

All-Languages-count

Doing a search from the home page, with nothing in the search box, gives a different count for the languages:

https://edit.pldb.com/search?q=#

Blank-search-count

On my local copy of the repo, when I search the .pldb files in the /things/ folder for the "title" keyword, which the CSV Documentation says has 100% coverage, then I get the same result:

4671-files-found

There is, however, a further issue with the search results. When I search on the page for the $ language, for example, I get two matches rather than one (the URL https://pldb.com/languages/dollar-sign.html is the same for both).

Here is the first match:

Dollar-Sign-first-result

... and here is the second:

Dollar-Sign-second-result

Something is getting repeated in the search results, which can be seen about halfway down the page:

Results-getting-repeated

I am guessing that the 4,671 count is correct, but how the results are being displayed as well as the difference with the "All Languages" list would need further investigation.
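
If the repeats come from the same entry being emitted more than once, filtering by URL before rendering would hide them. This is a sketch that assumes each result carries a `url` field; the real search code may be structured differently, and the root cause would still need investigating.

```javascript
// Hypothetical helper: keeps the first result for each URL, in original order.
function dedupeByUrl(results) {
  const seen = new Set();
  return results.filter(result => {
    if (seen.has(result.url)) return false;
    seen.add(result.url);
    return true;
  });
}

const results = [
  { title: "$", url: "https://pldb.com/languages/dollar-sign.html" },
  { title: "$", url: "https://pldb.com/languages/dollar-sign.html" },
];
console.log(dedupeByUrl(results).length); // 1
```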

Kind Regards,
Liam

influenced graph and links between languages

should start adding more and better links on how these languages are related.

i think Diarmuid Pigott's HOPL really pioneered this. does anyone know him? he would definitely be the expert here i think.

License Missing

Hi there,

fantastic project with lots of useful information on programming languages!

One suggestion for the GitHub repository. Please add a LICENSE file, then the License information is visible on the right.

Thanks!
-Thomas

Remove all *-feature.pldb files

Those files are a mistake. I think all that information should be moved to the grammar files, and then we should have a /site/features/ folder, and a buildFeaturesPagesCommand() in SiteBuilder that generates those pages. That would make the code a lot clearer and fix a number of things.

Number of papers referencing Julia is way too low.

Google Scholar lists 3750 articles citing the main Julia paper (https://scholar.google.com/scholar?cites=12373977815425691465&as_sdt=40000005&sciodt=0,22&hl=en) and Semantic Scholar shows 38000 papers with Julia as a keyword since 2012, and of the first 10 pages, all appear to be Julia papers.

Also, GitHub shows 14000 repositories with Julia code https://github.com/search?q=language%3AJulia&type=Repositories&ref=advsearch&l=Julia&l=.

I'm also pretty sure the number of downloads is wrong given that https://www.hpcwire.com/2021/01/13/julia-update-adoption-keeps-climbing-is-it-a-python-challenger/ lists 9 million downloads in 2020.
