
Comments (2)

dimroc commented on August 25, 2024

Cool. How long did it take on your computer?

I've got quite a few PRs to go through so I'm all about lightening the load right now 😅.

from etl-language-comparison.

maxgrenderjones commented on August 25, 2024

Turns out the biggest issue was the regex: I was trying to do more than simply match a static string. With the modified version below, it takes 13s on a MacBook Pro.

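from multiprocessing import Pool, cpu_count
from collections import Counter
import streamutils as su
import re

# Precompiled pattern for the static string 'knicks'
KNICKS = re.compile('knicks')

def process(f):
    # Keep rows whose column 3 matches KNICKS and count the values in column 1.
    # Note: re.match only matches at the start of the string; KNICKS.search
    # would find 'knicks' anywhere in the field.
    return su.read(fname=f) | su.split(sep='\t') | su.sfilter(lambda x: KNICKS.match(x[3])) | su.smap(lambda x: x[1]) | su.bag()

if __name__ == '__main__':
    # Process the files in a pool of workers, then merge the per-file counts
    bag = Pool(cpu_count()).map(process, su.find('tmp/tweets/tweets_*')) | su.sreduce(lambda x, y: x + y, Counter())
    # Write tab-separated counts, most common first
    bag.most_common() | su.smap(lambda x: '%s\t%s\n' % (x[0], x[1])) | su.write('tmp/python_parallelstreamoutput')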
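
Since the code above filters with `KNICKS.match`, it may help to see how `re.match`, `re.search`, and a plain substring test differ for a static string. This is only a minimal sketch using the standard library; the sample strings are hypothetical and not taken from the tweet data.

```python
import re

KNICKS = re.compile('knicks')

# Hypothetical sample texts, for illustration only
samples = ['knicks win tonight', 'go knicks!', 'nets game']

# re.match succeeds only if the pattern matches at the start of the string
starts_with = [bool(KNICKS.match(s)) for s in samples]

# re.search succeeds if the pattern matches anywhere in the string
anywhere = [bool(KNICKS.search(s)) for s in samples]

# For a static string, the `in` operator gives the same answer as re.search
substring = ['knicks' in s for s in samples]

print(starts_with, anywhere, substring)
```

Which of the three is appropriate depends on whether the field is expected to start with the term or merely contain it.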

