breck7 / pldb Goto Github PK

PLDB: a Programming Language DataBase

JavaScript 94.01% Shell 0.52% CSS 5.47%

programming-languages data knowledge-graph

pldb's Introduction

title PLDB Readme

import rootHeader.scroll

printTitle

# A Programming Language Database

wideColumns 1

#### View this readme as HTML
 https://pldb.io/readme.html

import code/ciBadges.scroll

PLDB is a public domain ScrollSet and website containing over 135,000 facts about over 4,000 programming languages.

This repo contains the entire ScrollSet, code, and website for https://pldb.io.

## To download the data
The entire ScrollSet is ready to analyze in popular formats. Full documentation is here: https://pldb.io/csv.html
- As CSV: https://pldb.io/pldb.csv
- As TSV: https://pldb.io/pldb.tsv
- As JSON: https://pldb.io/pldb.json
- The JSON file is also available via npm:
javascriptCode
 // npm install pldb
 console.log(require("pldb").javascript.appeared)
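The package also works for quick analyses. Below is a minimal sketch that counts languages per decade. It assumes the export is an object keyed by PLDB id whose entries may carry a numeric `appeared` year (inferred from the one-liner above). `countByDecade` is a hypothetical helper, not part of the package.

```javascript
// Sketch: count PLDB languages by the decade they appeared.
// Assumes the npm package exports an object keyed by PLDB id, as in
// require("pldb").javascript.appeared, and that `appeared` is a year.
function countByDecade(pldb) {
  const counts = {};
  for (const language of Object.values(pldb)) {
    const year = Number(language && language.appeared);
    if (!year || !Number.isFinite(year)) continue; // skip entries with no usable year
    const decade = Math.floor(year / 10) * 10;
    counts[decade] = (counts[decade] || 0) + 1;
  }
  return counts;
}

// Usage, after `npm install pldb`:
// console.log(countByDecade(require("pldb")));
```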

## To add a new language
Create a new Scroll file in `concepts` with a unique, URL-friendly filename and send a pull request.

## To update a language
Edit the corresponding `concepts/*.scroll` file and send a pull request.

## To add a new measure
Update the file `code/measures.parsers`, add at least one measurement to a concept in `concepts`, and send a pull request.

## To build the site locally
code
 git clone https://github.com/breck7/pldb
 cd pldb
 # Only required on first install.
 npm i -g cloc
 # Run on a fresh checkout, when upgrading an old checkout, and periodically when there are new releases.
 npm install .
 # (Optional) Run tests
 npm run test
 npm run build
 # After making changes and before committing, run:
 npm run format

## To explore this repo
The most important folder is `concepts`, which contains the ScrollSet (a file for each concept). The file `code/measures.parsers` contains the Parsers (schema) for the ScrollSet.
You can see the `cloc` language stats on this repo at https://pldb.io/pages/about.html.

## To use as an npm package
code
 npm install pldb
javascriptCode
 console.log(require("pldb").javascript.appeared)

import citation.scroll

All sources for PLDB can be found here: https://pldb.io/pages/acknowledgements.html

endColumns

import footer.scroll

pldb's People

Contributors

Stargazers

Watchers

Forkers

remierichards tito markbastian jp0d lguzzon vrthra adityaxdiwakar pavelvozenilek arakov superfola yang724687930 shade cajagobe aaveshdev cyberneticist-uk rohit484 paulbone dehilsterlexis stonecypher jenshaase rzimmerman cgccuser torocruzand johnwcowan smitshetye arronlacey gabssnake hherman1 akkartik shoff zsarge rtfeldman virtuoushub apb-uxvu76hj-upueekiai7w5l-twf-v nnurmano tiagozhang pengelana valrcs hax thearchiver datascientist1976 goheeca wshao12 issafrullah dancergraham sarang0218 aardappel lx-files cindywu likhithshankarprithvi joelethan mahadwaseem123 rkimera94 adriantintpilver lngns martin12333 stuartdambi hassamalhajaji nairboon tif-calin gabriel-vivas-sonarsource martinfjohansen stavares843 crt-fork daeer-projects fractalqualia ccoenen kspalaiologos shree-c georgi-sonar xzlinux kaby76 andreainfufsm jching83 seanpm2001 hg0428 ego axcheiste-rooney-mara refaktor fox-forks rochekollie raiph notpeter dalance sunbcy ell1e

pldb's Issues

Building pldb with cloc requires at least 3 GB of memory

Locally building and testing with cloc requires at least 3 GB of RAM according to informal tests.

Should the build and test scripts be changed to warn people who want to download and build the project?

Note: without cloc, the memory requirement for building pldb seems to be quite a bit lower.
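One way the build could warn people is a pre-build check on total RAM. This is a hypothetical sketch: `clocMemoryWarning` is an illustrative name, and the 3 GB threshold comes only from the informal tests reported in this issue, not from cloc's documentation.

```javascript
const os = require("node:os");

// Threshold taken from the informal tests in this issue (hypothetical default).
const MIN_CLOC_MEMORY_GB = 3;

// Returns a warning string if the machine has less RAM than the threshold,
// otherwise null. 1 GB is treated as 1024^3 bytes here.
function clocMemoryWarning(totalBytes = os.totalmem(), minGb = MIN_CLOC_MEMORY_GB) {
  const totalGb = totalBytes / 1024 ** 3;
  if (totalGb >= minGb) return null;
  return (
    `Warning: building pldb with cloc may need at least ${minGb} GB of RAM; ` +
    `this machine has about ${totalGb.toFixed(1)} GB.`
  );
}

// Print the warning (if any) before the build starts.
const warning = clocMemoryWarning();
if (warning) console.warn(warning);
```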

Related Pull request: #87

EDIT LOG: Made the text description clearer.

Not able to check out the repo

This is because the repo contains a file named nul.lani, and NUL is a reserved device name on Windows.

Since your instructions say c# should be named c-sharp to get around filesystem limitations, I suggest doing the same for nul.lani.

PR incoming.

john@LAPTOP-PE9BBGOJ MINGW64 ~/projects
$ git clone https://github.com/StoneCypher/codelani.git
Cloning into 'codelani'...
remote: Enumerating objects: 5964, done.
remote: Counting objects: 100% (299/299), done.
remote: Compressing objects: 100% (268/268), done.
remote: Total 5964 (delta 27), reused 250 (delta 12), pack-reused 5665
Receiving objects: 100% (5964/5964), 1.14 MiB | 1.06 MiB/s, done.
Resolving deltas: 100% (626/626), done.
error: invalid path 'database/nul.lani'
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
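A check like the one below could catch such names before they are committed. It is a sketch: the reserved-name list is the commonly cited one and is not exhaustive (Microsoft's file naming documentation lists further variants), and `isWindowsReservedName` is a hypothetical helper.

```javascript
// Device names Windows reserves even when followed by an extension,
// which is why "nul.lani" fails to check out. Not an exhaustive list.
const RESERVED = new Set([
  "CON", "PRN", "AUX", "NUL",
  ...Array.from({ length: 9 }, (_, i) => `COM${i + 1}`),
  ...Array.from({ length: 9 }, (_, i) => `LPT${i + 1}`),
]);

function isWindowsReservedName(filename) {
  const base = filename.split(/[\\/]/).pop(); // drop any directory part
  const stem = base.split(".")[0].trimEnd().toUpperCase(); // "nul.lani" -> "NUL"
  return RESERVED.has(stem);
}

// Usage: flag offending paths from a file listing.
// paths.filter(isWindowsReservedName)
```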

Spam / vandalism prevention

It looks like you're allowing anyone to make changes to the site, and any user can impersonate anyone. I don't see anything obviously malicious, although changes like 2bdc8cd look wrong.

Have you considered adding an authentication mechanism? Since the content is on GitHub, you could probably use that for auth.

another possible data source

https://glosario.carpentries.org/
https://twitter.com/gvwilson/status/1566904419857440768

look at fileType keyword

probably want to add binary + text

ab0be48

Possible duplicates: database/things/accent.pldb and database/things/accent-programming-language.pldb

The files database/things/accent-programming-language.pldb and database/things/accent.pldb seem to be possible duplicates, as both list https://en.wikipedia.org/wiki/Rational_Synergy#History as a reference.
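Duplicates like this could be surfaced automatically by grouping files that cite the same URL. A minimal sketch, assuming the file contents have already been read into an object mapping each path to its text; `findSharedUrls` is hypothetical, and a shared URL is only a candidate for review, not proof of duplication.

```javascript
// Hypothetical helper: group files by any URL they share.
// `files` maps a file path to its text contents.
function findSharedUrls(files) {
  const filesByUrl = new Map();
  for (const [path, text] of Object.entries(files)) {
    const urls = new Set(text.match(/https?:\/\/\S+/g) || []); // unique URLs per file
    for (const url of urls) {
      if (!filesByUrl.has(url)) filesByUrl.set(url, []);
      filesByUrl.get(url).push(path);
    }
  }
  // Keep only URLs referenced by more than one file.
  return [...filesByUrl].filter(([, paths]) => paths.length > 1);
}

// Usage (Node):
// const fs = require("node:fs"); const dir = "database/things";
// const files = Object.fromEntries(fs.readdirSync(dir).map((f) => [f, fs.readFileSync(`${dir}/${f}`, "utf8")]));
// console.log(findSharedUrls(files));
```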

Possibly related Issue #78

New language request: ZenScript

https://github.com/CraftTweaker/ZenScript

Paradigms/Categories/Use cases/Industries et cetera

A single type is obviously very limiting. We should improve the other ways of tagging things; a lot of this can probably be automated or done with AI.

refresh `creators.tree`

https://github.com/breck7/pldb/blob/main/site/lists/creators.tree

Every creator should be able to add their:

appeared (birth year)
city/country of birth
website
github
twitter
anything else?

Fill `screenshot` keyword for all languages with `type visual`

For visual languages, a picture is worth a thousand words. PLDB now shows a screenshot for visual languages when that keyword is present. Example: https://pldb.com/languages/scratch.html

For every visual language, let's take our own nice screenshot of the language in action.

See the commit below, which added 2 examples:
b367296

Steps:

Take a screenshot of your screen while using the visual language.
Save it in site/screenshots/[pldbId].png.
Add a line to the language's file like this: screenshot https://pldb.com/screenshots/explorer.png
If doing this locally, run npm run format before committing.
Commit, push, and send a PR.
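To find which languages still need a screenshot, a small script could scan the files. This sketch assumes each file's text is available keyed by PLDB id, and that `type visual` and `screenshot <url>` appear as unindented top-level lines, as in the steps above; the helper name is hypothetical.

```javascript
// Hypothetical helper: `files` maps a PLDB id to the text of its file.
// Returns ids of visual languages that have no `screenshot` line yet.
function visualLanguagesMissingScreenshots(files) {
  return Object.entries(files)
    .filter(([, text]) => {
      const lines = text.split(/\r?\n/);
      const isVisual = lines.some((line) => line === "type visual");
      const hasScreenshot = lines.some((line) => line.startsWith("screenshot "));
      return isVisual && !hasScreenshot;
    })
    .map(([id]) => id);
}
```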

Fill column `country`

Currently fewer than 3,000. This one should be relatively easy to triple.

https://pldb.com/posts/originCountries.html

Clean up, document, and rerun `code/crawlers`

We can probably delete half that code, and we should provide instructions on what the best patterns have turned out to be.

In the "Written In" section of the https://edit.pldb.com/pages/acknowledgements.html page, the individual URLs contain the BASE_URL replacement word, which has either not been replaced or is not required in its current form.

For example, the JavaScript link is: https://edit.pldb.com/pages/BASE_URL/languages/javascript.html

Since I'm guessing that the URL should be either https://edit.pldb.com/languages/javascript.html or https://pldb.com/languages/javascript.html, perhaps a replacement has already taken place?

It would appear that the WRITTEN_IN_TABLE is being replaced correctly, but that any internal replacements are being left "as-is". Should this replacement happen recursively, or does it need to occur at another point in the processing of the page?
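A recursive replacement could look like the sketch below: keep applying all replacements until the text stops changing, so a placeholder inside an inserted value (BASE_URL inside WRITTEN_IN_TABLE) also gets expanded. The function name and the `maxPasses` guard are illustrative assumptions, not SiteBuilder's actual code.

```javascript
// Replace placeholders repeatedly until a pass changes nothing.
// `replacements` maps placeholder strings to their values.
function expandPlaceholders(template, replacements, maxPasses = 10) {
  let result = template;
  for (let pass = 0; pass < maxPasses; pass++) {
    let next = result;
    for (const [key, value] of Object.entries(replacements)) {
      next = next.split(key).join(value); // replace every occurrence
    }
    if (next === result) return result; // stable: nothing left to expand
    result = next;
  }
  // A value that contains its own placeholder would grow forever.
  throw new Error("Placeholders still changing after maxPasses; possible cycle");
}
```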

Kind Regards,
Liam

speed up `git pull`

Adding the branch for the github.pldb.com mirror slowed down `git pull`. Figure that out.

(probably just change:)
pldb/.github/workflows/buildGithubDotPldbDotCom.yaml

Speed up and automate build so new edits and submissions are deployed immediately

All ids to external sources should be urls (where possible)

Right now, for things like Wikipedia, the grammar asks for the full URL, but for things like Reddit it just asks for the subreddit id. For example: `subreddit Python`.

The url is the clear way to go. It's a little bit of redundancy, but it makes each pldb file more useful on its own. And it's clearer for a new contributor what needs to be added (always just a url, never need to look up the encoding/decoding scheme).
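Migrating existing id-style lines could be scripted. A minimal sketch, assuming lines of the form `keyword id`; only the `subreddit` keyword (from this issue) is mapped here, and `idLineToUrlLine` is hypothetical. Lines that are unknown or already URLs pass through unchanged.

```javascript
// Map from keyword to a function that turns an id into a full URL.
const URL_TEMPLATES = new Map([
  ["subreddit", (id) => `https://www.reddit.com/r/${encodeURIComponent(id)}`],
]);

function idLineToUrlLine(line) {
  const [keyword, ...rest] = line.split(" ");
  const id = rest.join(" ");
  const toUrl = URL_TEMPLATES.get(keyword);
  // Leave unknown keywords, empty values, and existing URLs untouched.
  if (!toUrl || !id || /^https?:\/\//.test(id)) return line;
  return `${keyword} ${toUrl(id)}`;
}
```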

Wrong layout for `Add a language` page

My screen width is 1920px. It seems to occur only when the width is > 1600px.

Question: anyone know if GitHub has a URL scheme where I can generate a GET URL with params to pre-populate a pull request?

Does that make sense? The idea is as you edit a file on build, you have the choice to save directly or go to GitHub and submit it as a PR.
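One possible answer, hedged: GitHub's "create new file" page lives at `/OWNER/REPO/new/BRANCH`, and it is commonly reported to accept `filename` and `value` query parameters that pre-fill the form; for users without write access, GitHub should then offer to fork and propose the change as a pull request. Treat those parameter names as an assumption to verify. This covers new files only, and very long contents may exceed URL length limits. `githubNewFileUrl` is a hypothetical helper.

```javascript
// Sketch: build a GitHub "create new file" URL with name and contents pre-filled.
// The `filename` and `value` query parameters are an assumption to verify.
function githubNewFileUrl({ owner, repo, branch, path, contents }) {
  const url = new URL(`https://github.com/${owner}/${repo}/new/${branch}`);
  url.searchParams.set("filename", path); // URLSearchParams handles encoding
  url.searchParams.set("value", contents);
  return url.toString();
}

// Usage (the file name is illustrative):
// githubNewFileUrl({ owner: "breck7", repo: "pldb", branch: "main",
//   path: "concepts/zenscript.scroll", contents: "title ZenScript\n" })
```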

Features upgrade

Spot-on feedback to address when upgrading the features pages:

https://news.ycombinator.com/item?id=32628257

`I understand this is pretty much WIP, but still, it's too unorganized to be anything useful. I thought the most interesting to be features page[1], which is nearly empty, and this effort in taxonomy is rather too complicated to be crowd-sourced without supervision. For example, let's take a look at traits[2] and mixins[3]. There are a couple of issues here. First off, why it's 2 different pages? There's no real difference between a trait in PHP, and mixin in… well, no languages except for Racket actually have a syntactic construct called "mixin", but I guess modules in Ruby or Julia are close enough. Scala also has something that's called "traits", and it's also basically the same thing, but with caveats.
On the other hand, D has both "mixins" and "traits", but these are completely different features, and these "traits" have nothing to do with traits in Scala or PHP. So if somebody were to make a comprehensive list of features of D in this DB, should these "traits" appear on the same page as PHP and Scala traits (which are mixins)?

Furthermore, unlike PHP, Scala, Ruby or Julia — Python's "mixins" aren't just mixins with a different name. It's not even clear if it has mixins at all. There's something people call a "mixin" in Python, but these are just classes, so you cannot really say "yes". However, Python has multiple inheritance, which makes "mixins" borderline pointless: classes are (or can be used as) mixins, if you have multiple inheritance! Templates in some languages can be used this way as well.

Which brings us to the next issue — it's not clear, if a language should be marked as having a feature if it comes built-in, explicitly, or if a feature can be implemented in it. Does every language have a semaphore? I cannot remember any where it couldn't be implemented (that would be weird), but I cannot remember any where it's an explicit feature construct either (well, arguably, maybe some SQL-extensions?).

All this isn't to say that the current list is bad. All the questions above can be answered in any way, and it's up to a "researcher" which definition to use in order to actually get a useful taxonomy. It's a non-trivial job.`

[1] - https://pldb.com/lists/features.html [2] - https://pldb.com/languages/traits-feature.html [3] - https://pldb.com/languages/mixin-feature.html [4] - https://pldb.com/languages/semaphores-feature.html

Fix contributors section on acknowledgements page

When I visit https://api.github.com/repos/breck7/pldb/contributors, the results seem to be pretty unstable (some days contributors disappear).

Perhaps `CONTRIBUTORS_TABLE: JSON.parse(` in pldb/code/SiteBuilder.ts (line 218 in 1361617) should be figuring out the contributors from the git repo, and not from that GET URL.
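A minimal sketch of reading contributors from the local repo instead of the API, using `git shortlog -sne HEAD` (commit count, name, and email per author, sorted by count). `HEAD` is passed explicitly because shortlog otherwise reads from stdin when not attached to a terminal. Both function names are hypothetical; authors using several emails may appear more than once unless the repo has a `.mailmap`.

```javascript
const { execFileSync } = require("node:child_process");

// Parses `git shortlog -sne HEAD` output, whose lines look like
// "    42\tAda Lovelace <ada@example.com>".
function parseShortlog(output) {
  return output
    .split("\n")
    .map((line) => line.match(/^\s*(\d+)\t(.+?)(?:\s+<([^>]*)>)?\s*$/))
    .filter(Boolean) // drop blank or unparseable lines
    .map(([, commits, name, email]) => ({ name, email: email || null, commits: Number(commits) }));
}

// Hypothetical replacement for the GitHub API call; run inside the repo.
function contributorsFromGit(cwd = process.cwd()) {
  const output = execFileSync("git", ["shortlog", "-sne", "HEAD"], { cwd, encoding: "utf8" });
  return parseShortlog(output);
}
```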

Help! Fix `video` tag width issues on the new #BuildPublicDomain page ahead of the release of the launch video

Here's what the page looks like now:
https://pldb.com/posts/buildPublicDomain.html

It obviously has some issues.

Does anyone know <video> tags well enough to come up with a fix (or fixes)? Pull requests wanted!!!

(And of course we have to use the video tag and host these videos ourselves; we cannot use a third-party video service.)

nits from reddit to fix

from Kokaiinum

https://www.reddit.com/r/ProgrammingLanguages/comments/x2m24s/comment/imrf4j5/?context=3

If you're taking issue reports, one I noticed - "Cish" and "SuperForth" are apparently the same language (SuperForth renamed to Cish).

Also the examples for BEEF appear to be those of BeefLang (although I must admit I've no idea what actual BEEF looks like)

Invalid Julia (package count) info

Hi, I like your site that I just discovered.

Regarding Julia: its package repository has moved, so your info at:

https://codelani.com/posts/does-every-programming-language-have-a-central-package-repository.html

is out of date. There are not only 1,906 packages (I'm not sure that number was accurate even at the time; I don't recall when the move happened).

If you need accurate numbers, you could substitute https://github.com/JuliaRegistries/General, or juliahub.com (for user-friendly access), for https://julialang.org/packages/ everywhere.

Help filling in field `originCommunity` across all languages

https://pldb.com/lists/originCommunities.html

Create scroll keyword for use in posts named something like `filledMissingTable`

https://pldb.com/posts/originCountries.html
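As a starting point, the data behind such a keyword might be computed like this. It is a hypothetical sketch of what `filledMissingTable` could do, not an existing Scroll feature; it assumes languages are available as objects with an `id` property and treats `undefined`, `null`, and empty strings as missing.

```javascript
// Split languages into those with a value for `field` and those without,
// plus the coverage ratio (filled / total).
function filledMissing(languages, field) {
  const filled = [];
  const missing = [];
  for (const language of languages) {
    const value = language[field];
    const isFilled = value !== undefined && value !== null && value !== "";
    (isFilled ? filled : missing).push(language.id);
  }
  const coverage = languages.length ? filled.length / languages.length : 0;
  return { filled, missing, coverage };
}
```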

Suggested importer: Sample programs in every language

Hi Breck (@breck7),

Back in mid-July, you added the leachim6 importer (8fdd117) for the "hello-world" programs.

Might it also be possible to create an importer for the Sample Programs in Every Language, a collection started in 2018 by Jeremy Grifski (@jrg94) as part of The Renegade Programmer project? At the end of July, the repo contained 162 languages with 597 code snippets.

Kind Regards,
Liam

Add CI

https://scroll.pub/ has it done well.

New source: https://github.com/EbookFoundation/free-programming-books

Thanks JS!

Website loading with SSL error (on Firefox, RX too long, on curl, wrong version number)

Edit: Sorry, Enter submitted the form without a description 😄

curl https://pldb.com

curl: (35) error:1408F10B:SSL routines:ssl3_get_record:wrong version number

I tried on 2 different machines.

Maybe HTTP could be left available without the redirect (so we can still use the site when HTTPS isn't available).

Second edit: removed the redirect headers; those appear to come from my ISP.

find and add more `demoVideo` data to languages

example: https://pldb.com/languages/explorer.html demoVideo https://www.youtube.com/watch?v=0l2QWH-iV3k

database/things/* errata discussion

alfred.pldb:
https://medium.com/@nikitavoloboev/writing-alfred-workflows-in-go-2a44f62dc432
References an IDE for Mac users.

alcor.pldb:
The summary section seems to have a few extra sentences copy-pasted from Wikipedia.

algobox.pldb:
Wikipedia page https://en.wikipedia.org/wiki/Algoboxn does not seem to exist. There are no other links.

database/things/alpha-programming-language.pldb and database/things/alpha.pldb :
These seem to be duplicates, although each file contains information not present in the other.

build.pldb.com refresh

We need to make it easier for people to add content:
73706e5

When you add a subreddit, the Reddit importer should run and update immediately.
We should have autocomplete for the language ids.

GitHub repository count from BigQuery is inaccurate, easy fix with GitHub's API

Hello, I just saw your project on HN and it looks very interesting :) One thing I immediately noticed is that pldb seems to be using BigQuery GitHub data to show the repository count for different languages. Sadly, that BigQuery dataset seems to be quite limited, and there's a much better way to find the number of repositories written in a specific language:

# Use per_page=1 so that we don't waste much bandwidth
$ curl "https://api.github.com/search/repositories?q=language:nim&per_page=1"
{
  "total_count": 8013,
  "incomplete_results": false,
  "items": [
    {
     <omitted for readability purposes>
    }
  ]
}

The actual count is in the total_count field, and it only counts unique repositories (not forks). If you also want to count forks (though I don't think that'd be a good idea), you can do:

$ curl "https://api.github.com/search/repositories?q=language:nim+fork:true&per_page=1"
{
  "total_count": 18320,
  "incomplete_results": false,
  "items": [
    {
     <omitted for readability purposes>
    }
  ]
}

I don't know if these results are 100% exact, but they seem to be much more real than the BigQuery count.
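The same query could be wrapped for a crawler. A sketch assuming Node 18+ (global `fetch`); the helper names are hypothetical, and because unauthenticated search requests are heavily rate-limited, a real crawler would likely need to send a token.

```javascript
// Builds the search URL shown above; includeForks adds "fork:true".
function repoSearchUrl(language, includeForks = false) {
  const url = new URL("https://api.github.com/search/repositories");
  const query = `language:${language}` + (includeForks ? " fork:true" : "");
  url.searchParams.set("q", query);
  url.searchParams.set("per_page", "1"); // only total_count is needed
  return url.toString();
}

// Fetches total_count for a language (network call; not run at load time).
async function githubRepoCount(language, includeForks = false) {
  const response = await fetch(repoSearchUrl(language, includeForks), {
    headers: { Accept: "application/vnd.github+json" },
  });
  if (!response.ok) throw new Error(`GitHub API returned ${response.status}`);
  const body = await response.json();
  return body.total_count;
}

// Usage: githubRepoCount("nim").then(console.log);
```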

Make Scroll Dataset for PLDB contributors. Then acknowledgements page: everyone who has contributed code to this repo should be able to add their preferred website/email/twitter/etc if they want

Language Count Discrepancy

Hi Breck (@breck7),

The list All Languages states that the PLDB has 4,058 languages:

Doing a search from the home page, with nothing in the search box, gives a different count for the languages:

https://edit.pldb.com/search?q=#

On my local copy of the repo, when I search the .pldb files in the /things/ folder for the "title" keyword, which the CSV Documentation says has 100% coverage, I get the same result:

There is, however, a further issue with the search results. When I search on the page for the $ language, for example, I get two matches rather than one (the URL https://pldb.com/languages/dollar-sign.html is the same for both).

Here is the first match:

... and here is the second:

Something is getting repeated in the search results, which can be seen about halfway down the page:

I am guessing that the 4,671 count is correct, but how the results are being displayed, as well as the difference from the "All Languages" list, would need further investigation.
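While the root cause is investigated, deduplicating hits by page URL would show whether the repeats fully explain the extra entries. A hypothetical sketch assuming each result object has a `url` field; it hides the symptom rather than fixing whatever produces the repeats.

```javascript
// Keep the first search hit for each page URL, dropping repeats.
function dedupeByUrl(results) {
  const seen = new Set();
  return results.filter((result) => {
    if (seen.has(result.url)) return false; // already kept a hit for this page
    seen.add(result.url);
    return true;
  });
}
```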

Kind Regards,
Liam

Steal Massimo's Idea: list of most famous bugs?

https://twitter.com/Rainmaker1973/status/1568258255088156672

"github" should possibly be named something else, like "repository"

I suggest the data entry "github" should possibly be named something else, like "repository". There are other places, like gitlab.com, so naming a data key after just one platform (no matter how popular) when quite a few others are also used is a bit weird.

new source?

@celtic-coder you thinking what I'm thinking?

https://github.com/dbohdan/compilers-targeting-c

influenced graph and links between languages

We should start adding more and better links showing how these languages are related.

I think Diarmuid Pigott's HOPL really pioneered this. Does anyone know him? I think he would definitely be the expert here.

New language request: ccs

CCS is a scripting language for Infoblox NetMRI. Here's a link to the official documentation: https://www.infoblox.com/wp-content/uploads/infoblox-eval-download-netmri-NetMRI_CCS_Scripting_Guide.pdf

turn on incremental parsing in goaccess to reduce server load

It looks like the goaccess cronjob is now taking too long and causing some spikes.

There's no reason for us to reparse everything each time; there is an incremental option:
PROCESSING LOGS INCREMENTALLY
https://goaccess.io/man

Add left arrow and right arrow for previous/next to default scroll layout that functions the same way as keyboard shortcuts left and right

See mockup, with the < and >:
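The on-screen arrows and the keyboard shortcuts could share one source of truth for page order. A hypothetical sketch: `neighbors` and the wrap-around at the ends are assumptions, not the current Scroll layout's behavior.

```javascript
// Given the ordered list of pages, return the previous and next page for
// `current`, wrapping around at the ends. Unknown pages get nulls.
function neighbors(pages, current) {
  const index = pages.indexOf(current);
  if (index === -1) return { previous: null, next: null };
  const n = pages.length;
  return { previous: pages[(index - 1 + n) % n], next: pages[(index + 1) % n] };
}

// In the browser, both the < and > links and the ArrowLeft/ArrowRight keys
// could navigate with, e.g.:
// window.location.href = neighbors(pages, currentPage).next;
```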

CipherLab seems not a programming language

According to Wikipedia, it's a company name.

According to their homepage, they have C and Basic compiler products.

Every language with one should have a `documentation` keyword

Example:
https://pldb.com/languages/alumina.html: documentation https://docs.alumina-lang.net/

License Missing

Hi there,

fantastic project with lots of useful information on programming languages!

One suggestion for the GitHub repository: please add a LICENSE file, so that the license information is visible on the right.

Thanks!
-Thomas

Add more information about how // was first put in C++ and then moved to C

2 people have mentioned it:
https://news.ycombinator.com/item?id=32621392
https://www.reddit.com/r/ProgrammingLanguages/comments/x2m24s/comment/imqyscm/?context=3

It's important that we get the history correct.

Feedback from TW: improve `type` column?

It needs to be improved.

#39

Remove all *-feature.pldb files

Those files are a mistake. I think all that information should be moved to the grammar files, and then we should have a /site/features/ folder, and a buildFeaturesPagesCommand() in SiteBuilder that generates those pages. That would make the code a lot clearer and fix a number of things.

Number of papers referencing Julia is way too low.

Google Scholar lists 3750 articles citing the main Julia paper (https://scholar.google.com/scholar?cites=12373977815425691465&as_sdt=40000005&sciodt=0,22&hl=en) and Semantic Scholar shows 38000 papers with Julia as a keyword since 2012, and of the first 10 pages, all appear to be Julia papers.

Also, GitHub shows 14000 repositories with Julia code: https://github.com/search?q=language%3AJulia&type=Repositories&ref=advsearch&l=Julia&l=.

I'm also pretty sure the number of downloads is wrong given that https://www.hpcwire.com/2021/01/13/julia-update-adoption-keeps-climbing-is-it-a-python-challenger/ lists 9 million downloads in 2020.
