df's People

Contributors

datafibers, schubertzhu, willddy

df's Issues

Batching files to HDFS

This is a new feature to batch-load files to HDFS.
In this case, the file needs to arrive on the DF server first. DF then just moves or copies the file to HDFS.

DF Active Server Refactoring

The DF server needs to be refactored as follows:

  • Use event bus
  • Split code into different service verticles
  • Redefine/polish the metadata
  • Redefine/polish the communication protocol
  • Use HTTPS
  • Add authentication

Configurable persist layer on HDFS

Make sure the data is archived into Hadoop and/or a file storage web service before it expires from Kafka.
This is a long-term feature. It is listed here as a placeholder.

DF Agent non-blocking issue

DF Agent is now non-blocking with Vert.x.
When the first thread has not finished streaming and a second thread starts, the two threads' data can get mixed up. As a result, Kafka will receive bad data.

The resolution is to make streaming blocking, so that files are streamed one by one.
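The one-by-one behaviour can be illustrated with a minimal sketch. It is not the DF Agent code (which runs on Vert.x): it assumes plain Python threads, and `sent_records`, `stream_file`, and `stream_all` are hypothetical names, with a list standing in for the Kafka producer. A single lock ensures that a file is streamed completely before the next one starts.

```python
import threading

# Hypothetical stand-in for the Kafka producer: records are appended to a list.
sent_records = []

# One lock shared by all streaming threads.
_stream_lock = threading.Lock()


def stream_file(name, lines):
    """Stream all lines of one file while holding the lock."""
    with _stream_lock:
        for line in lines:
            sent_records.append((name, line))


def stream_all(files):
    """Start one thread per file and wait until every file is streamed."""
    threads = [
        threading.Thread(target=stream_file, args=(name, lines))
        for name, lines in files.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The order in which files are streamed is still not fixed, but the records of one file are never interleaved with those of another. A single worker consuming a queue of files would be an alternative with the same effect.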

Stream file needs to watch folder changes

The stream file function requires the following improvements:

  • A while loop to watch folder changes, with a timeout
  • Support for file filters, so that we do not stream files that are still arriving
  • Archiving of the streamed files somewhere, so that they do not get mixed up with new files
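A polling loop of this kind might look like the sketch below. It is only an illustration under assumptions: `watch_folder`, `timeout_s`, `poll_s`, and `accept` are hypothetical names, and the timeout is interpreted here as "stop after this many seconds without a new file".

```python
import os
import time


def watch_folder(folder, timeout_s, poll_s=0.5, accept=lambda name: True):
    """Yield new file names in `folder` until none arrives for `timeout_s` seconds."""
    seen = set()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            # Skip files already reported, sub-folders, and filtered names.
            if name in seen or not os.path.isfile(path) or not accept(name):
                continue
            seen.add(name)
            deadline = time.monotonic() + timeout_s  # activity resets the timeout
            yield name
        time.sleep(poll_s)
```

A file rejected by `accept` is checked again on every poll, so a file that is renamed once it has fully arrived is picked up under its new name.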

df-data-collector needs a 24-hour function

The DF demo df-data-collector can only get updated data when the US stock market is open. For demo purposes, we also need data available when the market is closed. We will consider using spoof data, and also adding the China market as another option.
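As a rough idea of what spoof data could look like, the sketch below generates a small random walk of quotes. The function name `spoof_quotes`, the record fields, and the step size are assumptions made for illustration; they are not the df-data-collector format.

```python
import random


def spoof_quotes(symbol, start_price, n, seed=None, max_step=0.01):
    """Return n fake quotes; each price moves at most max_step (as a fraction) per tick."""
    rng = random.Random(seed)  # a fixed seed makes the demo data repeatable
    price = start_price
    quotes = []
    for i in range(n):
        price = round(price * (1 + rng.uniform(-max_step, max_step)), 2)
        quotes.append({"symbol": symbol, "seq": i, "price": price})
    return quotes
```

Such a generator could be switched on whenever the real feed returns no updates.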

Streaming to HDFS needs improvement

Currently, the streamed data is first saved to a local file on the DF server and then uploaded to HDFS. If the file is too big, we will not see the file in HDFS. A better way is to start writing to HDFS once the block size is reached. Later, we can merge the files together.
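The block-by-block idea can be sketched as follows. This is an assumption-laden illustration: a local directory stands in for HDFS, and `write_in_blocks` and `merge_parts` are hypothetical helpers, not part of DF or of any HDFS client.

```python
import os
import shutil


def write_in_blocks(chunks, out_dir, block_size):
    """Buffer incoming byte chunks and write a part file each time block_size is reached."""
    os.makedirs(out_dir, exist_ok=True)
    parts = []
    buf = bytearray()

    def flush(data):
        # Part files are numbered so that they can be merged in order later.
        path = os.path.join(out_dir, f"part-{len(parts):05d}")
        with open(path, "wb") as f:
            f.write(data)
        parts.append(path)

    for chunk in chunks:
        buf.extend(chunk)
        while len(buf) >= block_size:
            flush(bytes(buf[:block_size]))
            del buf[:block_size]
    if buf:
        flush(bytes(buf))  # final, possibly smaller, part
    return parts


def merge_parts(parts, target):
    """Concatenate the part files, in order, into a single target file."""
    with open(target, "wb") as out:
        for path in parts:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

With this layout each part becomes visible as soon as it is written, instead of only after the whole stream has been saved locally.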

Metadata Logic Improvement

  • Need to update the job status in the metadata
  • For Mongo, we can update it
  • For Kafka, we can send another message

Add filter and move options for stream files

We should add an option to move processed files to an archive folder, so that we know which files have been processed.
We also need to support filtering the files to be processed, such as files with a leading _

In this case, we can work with the stream generator to consume files smoothly.
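A minimal sketch of the two options together is shown below. It assumes that a leading `_` marks a file that should be skipped (for example, one still being written); the issue text leaves that convention open. The names `is_ready` and `process_and_archive` are hypothetical.

```python
import os
import shutil


def is_ready(name, skip_prefix="_"):
    """Assumption: a file is ready unless its name starts with skip_prefix."""
    return not name.startswith(skip_prefix)


def process_and_archive(inbox, archive_dir, handle):
    """Call handle(path) on each ready file, then move the file into archive_dir."""
    os.makedirs(archive_dir, exist_ok=True)
    processed = []
    for name in sorted(os.listdir(inbox)):
        src = os.path.join(inbox, name)
        if not os.path.isfile(src) or not is_ready(name):
            continue
        handle(src)
        # Moving the file out of the inbox marks it as processed.
        shutil.move(src, os.path.join(archive_dir, name))
        processed.append(name)
    return processed
```

A stream generator could then write each file under a `_` name and rename it once it is complete, so that the consumer never picks up a partial file.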