fast's People

Contributors

b1naryth1ef, etcimon, martinnowak, mleise

fast's Issues

Please drop me an email

Hi Marco.

Hope you're well.

When you have time, would you mind dropping me an email, please?

Laeeth

At

Kaleidicassociates.com

Thanks a lot.

Laeeth

failure on os x

When I try to run the benchmark (or include the library) on OS X, it fails with:

dub --build=release -c benchmark
Performing "release" build using dmd for x86_64.
fast 0.3.1+commit.3.gbd6c92e: building configuration "benchmark"...
source/fast/parsing.d(808,29): Error: cannot directly load global variable 'SIMDFromString' with PIC code
source/fast/parsing.d(602,3): Error: template instance fast.parsing.vpcmpistri!(char, "\x01\x1f\"\"\\\\\x7f\xff", cast(Operation)4, cast(Polarity)0, false) error instantiating
source/fast/json.d(375,11):        instantiated from here: seekToRanges!"\x00\x1f\"\"\\\\\x7f\xff"
source/fast/json.d(328,7):        instantiated from here: scanString!true
source/fast/benchmarks.d(171,17):        instantiated from here: Json!(2u, true)
source/fast/parsing.d(808,29): Error: cannot directly load global variable 'SIMDFromString' with PIC code
source/fast/parsing.d(595,3): Error: template instance fast.parsing.vpcmpistri!(char, "\"\\", cast(Operation)0, cast(Polarity)0, false) error instantiating
source/fast/json.d(414,10):        instantiated from here: seekToAnyOf!"\\\"\x00"
source/fast/json.d(328,7):        instantiated from here: scanString!false
source/fast/json.d(97,1):        instantiated from here: Json!(0u, false)
source/fast/parsing.d(808,29): Error: cannot directly load global variable 'SIMDFromString' with PIC code
source/fast/parsing.d(655,3): Error: template instance fast.parsing.vpcmpistri!(char, " \x09\x0d\x0a", cast(Operation)0, cast(Polarity)16, false) error instantiating
source/fast/parsing.d(675,3):        instantiated from here: skipAllOf!" \x09\x0d\x0a"
source/fast/parsing.d(808,29): Error: cannot directly load global variable 'SIMDFromString' with PIC code
source/fast/parsing.d(690,3): Error: template instance fast.parsing.vpcmpistri!(char, "\x0d\x0a", cast(Operation)0, cast(Polarity)0, false) error instantiating

I guess it's because OS X uses PIC by default. Can I disable this, or could there be a workaround in fast to make it work on OS X for development?

parse stream?

Hi Marco.

Small enhancement request. (Apologies if it's implemented already and I didn't see it.)

Quite often one wants to parse a JSON stream (like the one from Twitter or the Reddit comment dump). It would be nice to have that implemented as part of the library, so it's very easy to use. I have written a small range to do this, but it's quite crude, and I haven't paid attention to efficiency. I can make a pull request if you would like (and you can refine it later), but you may prefer to implement it yourself. Let me know.

Here is some very simple code to process Reddit comments:
https://gist.github.com/Laeeth/bbd08dd576cb7aeff444

The original comments are here:
https://archive.org/details/2015_reddit_comments_corpus

On one core it takes 35 minutes to process one month's data (35 GB).
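
To make the request concrete, here is a minimal sketch of such a range, assuming the stream is newline-delimited JSON (one document per line, which is how the Reddit dump is laid out). It uses Phobos' std.json rather than fast, so it only illustrates the shape of the API being asked for, not its speed; byJsonLines is a hypothetical name, not something fast provides.

```d
import std.algorithm : filter, map;
import std.json : parseJSON;
import std.stdio : writeln;
import std.string : strip;

/// Hypothetical helper: lazily turns a range of text lines into parsed
/// JSON documents, skipping blank lines. Works on string[] as well as
/// on File.byLine (std.json copies the strings it keeps).
auto byJsonLines(R)(R lines)
{
    return lines
        .map!(l => l.strip)
        .filter!(l => l.length > 0)
        .map!(l => parseJSON(l));
}

void main()
{
    // In-memory stand-in for something like File("comments.json").byLine;
    // the file name and the "score" field are made up for illustration.
    auto lines = [`{"author":"a","score":3}`, ``, `{"author":"b","score":5}`];

    long total;
    foreach (doc; lines.byJsonLines)
        total += doc["score"].integer;

    writeln("total score: ", total);
}
```

Because the range is lazy, only one document is held in memory at a time, which is the property that matters for a 35 GB input.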

Thanks for getting in touch by email. That was about something else. I have had to figure out some other things first, but will respond shortly.

Laeeth.

Pull out custom File from fast.json?

I noticed you use mmap in fast.json.

It is a nice trick, but it doesn't need to be part of fast.json.

  1. Could this mmap file implementation be refactored into a separate module, say fast.file? It currently uses alias m_json this; Json m_json;, but the JSON type could simply be a template parameter, so the file type could still live outside the json module, which would just instantiate it, for example as FastFile!Json. It looks like a static class/struct, so that should be all. If you need access to some outer class variables, it can still be a template in a separate module and be pulled in as a mixin, mixin FastFile!Json file;, with the protocol defined and documented.

  2. There is std.mmfile, which works nicely and has a read-only mode too, so it should be used instead. If some flags (madvise, etc.) make a difference, they could be upstreamed to std.mmfile as options.

  3. For small files it probably doesn't make sense to use mmap, as it can be expensive, may not scale with the number of cores / threads, and wastes memory. Each mmap will probably map at least a whole 4 kB page, which stays occupied if the mapping is left in place for long. So for small files it is better to read explicitly into a properly sized buffer, or even do so only for strings. A custom arena allocator can also be used. Yes, there is a function to parse from a memory block, but that puts a lot of extra logic on the caller side instead of in the library.

  4. What about reading a stream from a network socket, a Unix pipe, or a decompression library in chunks? Such a stream can't easily be mmapped, but it is an important application of parsing, and probably the most common use case. Right now the only option is to buffer the whole thing and then parse it, which in some synthetic scenarios can use about 2x more memory.
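
Points 2 and 3 could be combined roughly as follows. This is only a sketch under assumptions: withFileContents and the 64 kB cutoff are hypothetical and would need measuring, and the callback receives plain text that a parser working on memory blocks could consume. It uses std.mmfile's read-only constructor for large files and std.file.read for small ones.

```d
import std.file : getSize, read, remove, tempDir, write;
import std.mmfile : MmFile;
import std.path : buildPath;
import std.stdio : writeln;

/// Hypothetical cutoff below which a plain read is used instead of mmap.
enum size_t mmapThreshold = 64 * 1024;

/// Hypothetical helper: hands the file's contents to `dg`, reading small
/// files into a buffer and memory-mapping larger ones read-only.
void withFileContents(string path, scope void delegate(const(char)[]) dg)
{
    if (getSize(path) < mmapThreshold)
    {
        // Small file: one explicit read into a properly sized buffer.
        dg(cast(const(char)[]) read(path));
    }
    else
    {
        // Large file: MmFile(filename) opens the file read-only.
        auto mm = new MmFile(path);
        scope (exit) destroy(mm); // unmap as soon as the callback returns
        dg(cast(const(char)[]) mm[]);
    }
}

void main()
{
    auto path = buildPath(tempDir, "fast_mmap_sketch.json");
    write(path, `{"now":10}`);
    scope (exit) remove(path);

    withFileContents(path, (text) { writeln(text.length, " bytes"); });
}
```

Keeping the choice inside one function means the caller does not need to know which strategy was used, which is the point made in item 3.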

No support for Apple Silicon and fails on virtualized x86_64 (EC2)

I'm unable to get this to run on Apple Silicon: it fails with an illegal hardware instruction error, which comes from the use of vpcmpistri, an AVX instruction not supported by Apple Rosetta. I've tried building through dub with LDC2, GDC, and DMD, and hit the same issue at runtime with each.

Here's the problematic assembly from lldb:

Process 25914 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x000000010000d275 app`_D4fast7parsing__T10vpcmpistriTaVyAaa2_225cVEQBrQBp9Operationi0VEQClQCj8Polarityi0Vbi0ZQCrFNaNbNiKPxaZv(p=0x000000030456ae38) at parsing.d:817
   814                                          movdqu      XMM0, [RAX];
   815                                          mov         RAX, [RDI];
   816                                  L1:
-> 817                                          vpcmpistri  XMM0, [RAX], mode;
   818                                          add         RAX, 16;
   819                                          cmp         ECX, 16;
   820                                          je          L1;

I have a use case for parsing JSON as fast as possible: sequentially loading a massive pile of JSON files into a processing queue. So I decided to try this on a Xeon EC2 instance (which does support AVX-512). The same line of assembly throws a SIGSEGV on an m5.xlarge EC2 instance running Ubuntu, with the following trace:

* thread #1, name = 'historical-load', stop reason = signal SIGSEGV: address access protected (fault address: 0x7ffff6d2e000)
    frame #0: 0x000055555568ac21 app`_D4fast7parsing__T10vpcmpistriTaVyAaa3_227b7dVEQBtQBr9Operationi0VEQCnQCl8Polarityi0Vbi0ZQCtFNaNbNiKPxaZv(p=0x00007fffffffdcf8) at parsing.d:817
   814                                          movdqu      XMM0, [RAX];
   815                                          mov         RAX, [RDI];
   816                                  L1:
-> 817                                          vpcmpistri  XMM0, [RAX], mode;
   818                                          add         RAX, 16;
   819                                          cmp         ECX, 16;
   820                                          je          L1;

I also tried this on an m5.metal EC2 instance and got the same result.

My simple test implementation:

auto parsedJSON = parseJSON(`{"now":10}`);
writeln(parsedJSON.singleKey!"now");

Is there any plan to provide broader CPU architecture support?
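
One possible direction, sketched under assumptions: choose the scanning routine at runtime from the CPU features reported by druntime's core.cpuid, and fall back to plain scalar code when the string-compare instructions are missing. ScanPath and pickScanPath are hypothetical names, not part of fast, and this only covers feature detection on x86, not the SIGSEGV seen on EC2.

```d
import core.cpuid : avx, sse42;
import std.stdio : writeln;

/// Hypothetical set of implementations a scanner could dispatch to.
enum ScanPath
{
    scalar,     // portable byte-by-byte loop
    pcmpistri,  // SSE4.2 encoding of the string-compare instruction
    vpcmpistri  // VEX (AVX) encoding, the one that faults under Rosetta
}

/// Picks the widest path the CPU claims to support. The string-compare
/// instruction itself comes from SSE4.2, so AVX alone is not enough.
ScanPath pickScanPath(bool hasSse42, bool hasAvx) pure nothrow @nogc @safe
{
    if (hasSse42 && hasAvx)
        return ScanPath.vpcmpistri;
    if (hasSse42)
        return ScanPath.pcmpistri;
    return ScanPath.scalar;
}

void main()
{
    // core.cpuid fills these flags in at program start.
    writeln("selected path: ", pickScanPath(sse42, avx));
}
```

Keeping the decision in a pure function makes it easy to test every combination without needing the corresponding hardware.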

How to use fast's jsonparser

Hi Marco, I really like the speed that comes with your pull-based approach.
I have a simple program that I would like to implement, but I am struggling to apply the pull-based approach to the problem:
I want to analyse NIST's CVE data (https://nvd.nist.gov/vuln/data-feeds), e.g. by searching through a data file and printing out the whole JSON entry that matches an ID.
The data looks like this:

"CVE_Items" : [ {
  "cve" : {
    "data_type" : "CVE",
    "data_format" : "MITRE",
    "data_version" : "4.0",
    "CVE_data_meta" : {
      "ID" : "CVE-1999-0001",
      "ASSIGNER" : "[email protected]"
    },
    "affects" : {
      ...
    },
    "problemtype" : {
      ...
    },
    "references" : {
      ...
    },
    "description" : {
      ...
    }
  },
  "configurations" : {
    ...
  },
  "impact" : {
    ...
  },
  "publishedDate" : "1999-12-30T05:00Z",
  "lastModifiedDate" : "2010-12-16T05:00Z"
}, {
  "cve" : {
   ...

With your nice library I can easily write something like this:

  foreach (cveFile; cves) {
      foreach (item; cveFile.CVE_Items) {
          cveFile.cve.CVE_data_meta.keySwitch!("ID")(
                                             {
                                                 auto id = cveFile.read!string;
                                                 if (id in toFind) writeln(id);
                                             });
      }
  }

But instead of just outputting the ID, I would like to dump everything that belongs to the object containing the matching ID.

What's the best way to do this?
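
For comparison, the desired behaviour is easy to state with a DOM parser. The sketch below uses Phobos' std.json, not fast's pull API, so it says nothing about how to do this with fast; it only pins down the expected result. findItems is a hypothetical helper, and the sample feed is trimmed to the fields needed, with a made-up second entry.

```d
import std.json : JSONValue, parseJSON;
import std.stdio : writeln;

/// Hypothetical helper: returns every element of "CVE_Items" whose
/// cve.CVE_data_meta.ID is a key of `wanted`.
JSONValue[] findItems(JSONValue feed, const bool[string] wanted)
{
    JSONValue[] hits;
    foreach (item; feed["CVE_Items"].array)
    {
        auto id = item["cve"]["CVE_data_meta"]["ID"].str;
        if (id in wanted)
            hits ~= item; // keep the whole enclosing object, not just the ID
    }
    return hits;
}

void main()
{
    // Trimmed sample; the second entry is invented for illustration.
    auto feed = parseJSON(`{"CVE_Items":[
        {"cve":{"CVE_data_meta":{"ID":"CVE-1999-0001"}},
         "publishedDate":"1999-12-30T05:00Z"},
        {"cve":{"CVE_data_meta":{"ID":"CVE-0000-0000"}}}
    ]}`);

    foreach (hit; findItems(feed, ["CVE-1999-0001": true]))
        writeln(hit.toPrettyString);
}
```

With a pull parser the same result needs either the start and end offsets of each item in the input text, or a second pass over the item once the ID has matched.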

GDC compile failure (gdc-5.2.0-1 arch linux)

[laeeth@engine parsereddit]$ dub build --compiler=gdc
WARNING: A deprecated branch based version specification is used for the dependency fast. Please use numbered versions instead. Also note that you can still use the dub.selections.json file to override a certain dependency to use a branch instead.
Performing "debug" build using gdc for x86_64.
fast ~master: building configuration "library"...
parsereddit ~master: building configuration "application"...
/home/laeeth/.dub/packages/fast-master/source/fast/json.d: In member function 'skipWhitespace':
/home/laeeth/.dub/packages/fast-master/source/fast/parsing.d:661:6: error: inlining failed in call to always_inline 'skipAsciiWhitespace': function body not available
void skipAsciiWhitespace(ref const(char)* p)
^
/home/laeeth/.dub/packages/fast-master/source/fast/json.d:1336:4: error: called from here
m_text.skipAsciiWhitespace();
^
gdc failed with exit code 1.

License Issue

Hello,

I'm currently upgrading the spasm framework so that web applications can be developed in WebAssembly, much like you do with React or Angular. I modified your library to be compatible with betterC so it can generate wasm code: https://github.com/etcimon/libwasm/tree/master/fast

I intend to put more work into libwasm so that developers can build their apps with it, mostly for mobile development, but I recently noticed the GPL3 license. I don't think anyone would want to create an application using a tool that forces them to make it open source. Do you think it could be changed to a more permissive license for this specific case (with the SIMD parts removed)?

Thanks!

build break on dmd v2.092.0

On Windows 10:

Performing "debug" build using C:\project\dmd2\windows\bin64\dmd.exe for x86_64.
fast 0.3.5: building configuration ""...


\fast-0.3.5\fast\source\fast\cstring.d(198,59): Error: function fast.cstring.string2wstringSize(const(char[]) src) is not callable using argument types (const(ushort[]))
\fast-0.3.5\fast\source\fast\cstring.d(198,59):        cannot pass argument fname of type const(ushort[]) to parameter const(char[]) src
...

etc.

json: wrong utf validation

Example:

import std.stdio; // for writeln
import std.utf;
import fast.json;

auto str = `{"a":"SΛNNO"}`;
str.validate;

auto js = Json!validateAll(str);
foreach(key; js.byKey)
{
    if(key == "a")
    {
        auto t = js.read!string;
        writeln(t);
    }
    else
    {
        writeln(key);
        js.skipValue();
    }
}

Error:

std.json.JSONException@std/json.d(1168): Byte 0xce forms invalid UTF-8 sequence in string. (Line 1:7)
----------------
../../.dub/packages/fast-0.3.0/source/fast/json.d:1317 void fast.json.Json!(2u, true).Json.handleError(immutable(char)[]) [0x48f7f3]
../../.dub/packages/fast-0.3.0/source/fast/json.d:1286 void fast.json.Json!(2u, true).Json.expectNot(immutable(char)[]) [0x48f63d]
../../.dub/packages/fast-0.3.0/source/fast/json.d:403 bool fast.json.Json!(2u, true).Json.scanString!(true).scanString() [0x499955]
../../.dub/packages/fast-0.3.0/source/fast/json.d:322 const(char)[] fast.json.Json!(2u, true).Json.borrowString() [0x48ec66]
../../.dub/packages/fast-0.3.0/source/fast/json.d:277 immutable(char)[] fast.json.Json!(2u, true).Json.read!(immutable(char)[]).read(bool) [0x49ce76]
source/app.d:23 int app.main().__foreachbody1(ref const(char[])) [0x48bf33]
../../.dub/packages/fast-0.3.0/source/fast/json.d:1197 bool fast.json.Json!(2u, true).Json.iterationGuts!("{}", const(char)[], int delegate(ref const(char[]))).iterationGuts(ref int, const(char)[], scope int delegate(ref const(char[])), immutable(char)[]) [0x49bbc6]
../../.dub/packages/fast-0.3.0/source/fast/json.d:906 int fast.json.Json!(2u, true).Json.byKeyImpl(scope int delegate(ref const(char[]))) [0x48f200]
source/app.d:19 _Dmain [0x48be79]
??:? _D2rt6dmain211_d_run_mainUiPPaPUAAaZiZ6runAllMFZ9__lambda1MFZv [0x4a3bfe]
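
As a cross-check that the input really is valid UTF-8, the following sketch uses only Phobos. The Λ in the example is assumed to be U+039B (Greek capital lambda), which encodes as the two bytes 0xCE 0x9B, so 0xCE is a legitimate lead byte here. isValidUtf8 is a hypothetical helper wrapped around std.utf.validate.

```d
import std.stdio : writefln;
import std.string : representation;
import std.utf : UTFException, validate;

/// Hypothetical helper: true if `s` passes Phobos' UTF-8 validation.
bool isValidUtf8(string s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    string good = "S\u039BNNO"; // the string value from the report
    string bad = "S\xCEN";      // lead byte 0xCE without a continuation byte

    writefln("%(%02x %)", good.representation); // prints: 53 ce 9b 4e 4e 4f
    assert(isValidUtf8(good));
    assert(!isValidUtf8(bad));
}
```

The error message would be expected for the second string, where 0xCE is followed by an ASCII byte, but not for the first.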
