eric1688 Goto Github PK
Type: User
Type: User
"结巴"中文分词的C++版本
各大电商网站数据抓取分析
hadoop_storm_spark结合实验的例子,模拟淘宝双11节,根据订单详细信息,汇总出总销售量,各个省份销售排行,以及后期的SQL分析,数据分析,数据挖掘等。 --------大概流程------- 第一阶段(storm实时报表) 第二阶段(离线报表)第三阶段(大规模订单即席查询,和多维度查询) 第四阶段(数据挖掘和图计算)
1. 主要分为三个模块,一个爬虫抓取模块,一个是数据处理模块,一个是用户模块。 2. 爬虫抓取模块主要是从直播吧、新浪体育、网易体育上爬取有关足球的新闻和用户关于足球的评论,利用集群HADOOP抓取网页,分析得出URL集,提取特征URL 3. 网页linux脚本过滤得到原始网页,然后二次过滤得到文本,并使用分布式储存。 4. 处理模块主要是根据训练集规则一和规则二,得到分词器,然后对文本进行操作,得出训练结果。 5. 通过特征脚本得到训练结果的特征词分类,然后提取出球队模糊集和球星模糊集。 6. 过滤得到球队精确集和球星精确集,并存入MYSQL数据库。 7. 从数据库中提取球星和球队的信息进行图表分析,并动态显示WIKI信息,调入显示模块中和用户进行交换
High performance chinese tokenizer with both GBK and UTF-8 charset support developed by ANSI C
Giggle是一个开源的淘宝商家展示产品的社区。
商户推荐算法demo。仅仅够参考
Sphinx for Chinese
Sphinx search server
Sreg可对使用者通过输入email、phone、username的返回用户注册的所有互联网护照信息。
yincart is a online shop system which base on yiiframework
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.