recommendsys's Introduction

recommendSys

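  • Recommender system
  • Offline (batch) and real-time computation

This project consists of three main modules: WEB (which generates the data), offline processing, and real-time processing.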

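WEB (generates the data, i.e. user behavior data)

  1. User actions on items (view, browse, purchase) are written to the ugcLOG log.
  2. Flume collects the ugcLOG logs into HDFS.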
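
As a rough illustration of step 1, the sketch below serializes one user action as a single log line that a collector such as Flume could pick up. The README does not describe the ugcLOG format, so the JSON layout, the field names, and the helper `format_ugc_log` are all assumptions made for this example.

```python
import json
import time

# Action names taken from the README (view, browse, purchase);
# the exact strings are an assumption.
ACTIONS = {"view", "browse", "purchase"}


def format_ugc_log(user_id, item_id, action, ts=None):
    """Serialize one user action as a JSON line (hypothetical ugcLOG format)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    record = {
        "userID": user_id,
        "itemID": item_id,
        "action": action,
        # Use the caller's timestamp if given, otherwise the current time.
        "ts": int(time.time()) if ts is None else ts,
    }
    return json.dumps(record, sort_keys=True)


line = format_ugc_log("u1", "i42", "purchase", ts=1700000000)
print(line)
```

One record per line keeps the file easy to tail and easy to split in a later MapReduce job.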

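Offline processing (Hadoop + Mahout): user-based and item-based collaborative filtering

  1. Scheduled jobs (Oozie, crontab) run MapReduce tasks that process the ugcLOG data on HDFS.

  2. The cleaned data (user ID, itemID, rating) is passed to Mahout.

  3. Mahout's output is a list of recommended items for each user.

  4. The result data is then imported into a MySQL database via Sqoop, or loaded into Hive (for display on the web or for hand-off to data analysts).

  5. Today's data: data from midnight today up to the time the statistics are computed.

  6. Historical data: all data up to midnight today.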
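
The offline steps above can be sketched in plain Python to show the data flow: raw actions are cleaned into (user, item, rating) triples, and an item-based collaborative filter turns them into a per-user item list. This is not Mahout's API and not the project's MapReduce code; the action weights in `ACTION_SCORE`, the cosine similarity, and all function names are assumptions chosen for illustration.

```python
import math
from collections import defaultdict

# Hypothetical weights that turn an action into a rating.
ACTION_SCORE = {"view": 1.0, "browse": 2.0, "purchase": 5.0}


def clean(records):
    """Collapse (user, item, action) rows into {user: {item: rating}}.

    Rows with an unknown action are dropped; for repeated (user, item)
    pairs the highest score is kept.
    """
    ratings = defaultdict(dict)
    for user, item, action in records:
        score = ACTION_SCORE.get(action)
        if score is None:
            continue  # malformed row
        ratings[user][item] = max(ratings[user].get(item, 0.0), score)
    return dict(ratings)


def item_similarity(ratings):
    """Cosine similarity between every pair of co-rated items."""
    dot = defaultdict(float)
    norm = defaultdict(float)
    for items in ratings.values():
        for i, ri in items.items():
            norm[i] += ri * ri
            for j, rj in items.items():
                if i != j:
                    dot[(i, j)] += ri * rj
    return {(i, j): v / math.sqrt(norm[i] * norm[j]) for (i, j), v in dot.items()}


def recommend(user, ratings, sim, n=3):
    """Rank unseen items by similarity-weighted ratings of the user's items."""
    seen = ratings.get(user, {})
    scores = defaultdict(float)
    for (i, j), s in sim.items():
        if i in seen and j not in seen:
            scores[j] += s * seen[i]
    return sorted(scores, key=lambda j: (-scores[j], j))[:n]


records = [
    ("u1", "a", "purchase"), ("u1", "b", "view"), ("u1", "a", "view"),
    ("u2", "a", "purchase"), ("u2", "c", "purchase"),
    ("u3", "b", "browse"), ("u3", "c", "view"),
    ("u9", "z", "???"),  # dropped by clean()
]
ratings = clean(records)
sim = item_similarity(ratings)
print({user: recommend(user, ratings, sim) for user in ratings})
```

In the real pipeline the cleaning would run as a MapReduce job over HDFS and the similarity step would be delegated to Mahout; the sketch only mirrors the shape of the inputs and outputs.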

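Real-time processing (Kafka + Storm): based on user and item tags

  1. Collect: gather the user feature vectors (the user-tag matrix), e.g. (userID tag1 tag2).

  2. Collect: gather the item feature vectors (the item-tag matrix), e.g. (itemID tag1 tag2 tag5).

  3. Compute: from 1 and 2, compute the user-item feature scores (matrix product).

  4. Filter: use the userID's item list to filter out items the user has already acted on; also filter by operational decisions and by user-defined rules.

  5. Rank: topN (including custom weights, for example to promote a particular product over the weekend).

  6. Collect feature behavior data through the web (user tags, comment data).

  7. Stream the collected data into Kafka in real time.

  8. Combine the feature behavior data with the user attribute data (from the database) to assemble the user feature vector.

  9. Compute the matrix product of the user feature vector and the item feature matrix (tags assigned by users and by the system, weights, etc.).

  10. Filter, then compute topN.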
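
The compute, filter, and rank steps above can be sketched as follows. The example uses plain dictionaries instead of Kafka and Storm, and every name in it (`score_items`, `top_n`, the `boost` parameter, the sample tags and weights) is hypothetical; it only shows the arithmetic of multiplying a user tag vector by an item-tag matrix, dropping excluded items, and taking the top N.

```python
def score_items(user_vec, item_matrix):
    """Dot product of the user tag vector with each item's tag vector."""
    return {
        item: sum(w * user_vec.get(tag, 0.0) for tag, w in tags.items())
        for item, tags in item_matrix.items()
    }


def top_n(user_vec, item_matrix, seen=(), blocked=(), boost=None, n=2):
    """Score items, filter out seen/blocked ones, apply boosts, return top n.

    seen:    items the user has already acted on
    blocked: items removed by operations staff or by the user's own rules
    boost:   optional {item: multiplier}, e.g. a weekend promotion
    """
    boost = boost or {}
    excluded = set(seen) | set(blocked)
    scores = {
        item: s * boost.get(item, 1.0)
        for item, s in score_items(user_vec, item_matrix).items()
        if item not in excluded
    }
    # Highest score first; ties broken by item id for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


# (userID tag1 tag2)
user_vec = {"tag1": 1.0, "tag2": 0.5}
# (itemID tag1 tag2 tag5 ...)
item_matrix = {
    "i1": {"tag1": 1.0, "tag2": 1.0},
    "i2": {"tag1": 0.2, "tag5": 1.0},
    "i3": {"tag2": 1.0},
    "i4": {"tag1": 0.8},
}

print(top_n(user_vec, item_matrix, seen={"i1"}))
print(top_n(user_vec, item_matrix, seen={"i1"}, boost={"i2": 5.0}))
```

Keeping the boost as a separate multiplier means a promotion can be switched on or off without touching the tag weights themselves.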

Blog

小小默: http://blog.xiaoxiaomo.com

recommendsys's People

Contributors

jasontangxd

recommendsys's Issues

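A question

If I install Spark, Hadoop, ZooKeeper, and the other components in pseudo-distributed mode, will this code work? Also, where did you learn to build this project? As a newcomer I cannot make sense of it just by reading the code. Any pointers would be appreciated; I am very interested in recommender systems.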
