
wechat's Introduction

Wallstreetcn (华尔街见闻) and Sina Finance News Scraping Documentation

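I. Technology Choices

1. requests handles the crawler's network requests

2. BeautifulSoup (bs4) parses the DOM of the fetched pages

3. PostgreSQL is the database; browser automation uses chromedriver, the WebDriver for Google Chrome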
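
As a rough illustration of how the first two pieces fit together, the sketch below downloads a page with requests and extracts link text with BeautifulSoup. The function names and the choice of `<a>` elements are assumptions made for illustration; the selectors the project actually uses are in new_cj.py and are not reproduced here.

```python
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its text (hypothetical helper)."""
    import requests  # imported lazily so the parsing part also works offline

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text


def extract_headlines(html):
    """Return the stripped text of every non-empty <a> element."""
    soup = BeautifulSoup(html, "html.parser")
    headlines = []
    for link in soup.find_all("a"):
        text = link.get_text(strip=True)
        if text:  # skip anchors that carry no visible text
            headlines.append(text)
    return headlines


# Offline demonstration on an inline HTML fragment.
sample = (
    '<ul><li><a href="/a"> Markets open higher </a></li>'
    '<li><a href="/b"></a></li></ul>'
)
print(extract_headlines(sample))
```

In a real run, `extract_headlines(fetch_html(url))` would chain the two steps; which URL and which elements to target depends on the site being scraped.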

II. Development Environment

1. Install Python 3.4. The latest Debian 8 release ships with Python 3.4; to check, run python3 in a terminal. If Python 3.4 is not present, install it with:

# apt-get install python3.4

2. Install pip, the Python package manager

# apt-get install python3-pip

3. Install the pandas, python-dateutil, numpy, requests, bs4, selenium, psycopg2, lxml, tushare, forex_python and sqlalchemy packages

# apt-get install python3-pandas
# pip3 install python-dateutil
# pip3 install numpy
# pip3 install requests
# pip3 install bs4
# pip3 install selenium
# pip3 install psycopg2
# pip3 install lxml
# pip3 install tushare
# pip3 install forex_python
# pip3 install sqlalchemy

4. Install Google Chrome by running the following commands in a terminal

# wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
# dpkg -i google-chrome*; sudo apt-get -f install

Download the chromedriver release that matches the installed Chrome version from either of these locations:
https://npm.taobao.org/mirrors/chromedriver/
http://chromedriver.storage.googleapis.com/index.html

Chrome-to-chromedriver version mapping: http://blog.csdn.net/huilan_same/article/details/51896672

5. Set up the PostgreSQL database

# apt-get install -y postgresql-9.4 postgresql-client-9.4 postgresql-contrib-9.4 postgresql-server-dev-9.4
  1. Install pgAdmin III, the graphical PostgreSQL client
# apt-get install pgadmin3

III. Preparation

1. Create the database tables: run the following statements in pgAdmin III

News table:

CREATE TABLE public.news_cj(
        news_date character varying(20),
        spider_data timestamp,
        news_source text,
        news_type text,
        news text
);

Column       Description
news_date    News date
spider_data  Time of the scraping operation
news_source  News source
news_type    News type
news         News content

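
A minimal sketch of writing one scraped item into news_cj, assuming a psycopg2 connection. The helper names, the date format and the sample values are hypothetical; the project's own database access lives in common/pgutils.py and may look different.

```python
from datetime import datetime

# Parameterized statement; %s is the placeholder style psycopg2 expects.
INSERT_NEWS_SQL = (
    "INSERT INTO public.news_cj "
    "(news_date, spider_data, news_source, news_type, news) "
    "VALUES (%s, %s, %s, %s, %s)"
)


def build_news_row(news_date, source, news_type, text, scraped_at=None):
    """Return a tuple in the column order used by INSERT_NEWS_SQL."""
    news_date = str(news_date)
    if len(news_date) > 20:
        # news_date is declared as character varying(20)
        raise ValueError("news_date exceeds character varying(20)")
    if scraped_at is None:
        scraped_at = datetime.now()  # spider_data records when the row was scraped
    return (news_date, scraped_at, source, news_type, text.strip())


def insert_news(conn, row):
    """Insert one row through a DB-API connection and commit."""
    with conn.cursor() as cur:
        cur.execute(INSERT_NEWS_SQL, row)
    conn.commit()


# Usage with psycopg2 (connection parameters are placeholders):
# import psycopg2
# conn = psycopg2.connect(dbname="news", user="scraper", host="localhost")
# insert_news(conn, build_news_row("2017-08-01 09:30", "sina", "finance", "..."))
```

Passing the values as parameters rather than formatting them into the SQL string lets the driver handle quoting of the scraped text.
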
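2. Index information and price-change table for the morning report: middle_news_market is created automatically

Column    Description
code      Index code
name      Index name
change    Percentage change
open      Opening level
preclose  Previous closing level
close     Closing level
high      Highest level
low       Lowest level
volume    Trading volume (lots)
amount    Turnover (100 million CNY)
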
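
The source does not define how `change` is computed. Assuming it is the percentage move of `close` relative to `preclose`, the arithmetic can be sketched as follows; the function name and the rounding to two decimals are illustrative choices.

```python
def percent_change(close, preclose):
    """Percentage move from the previous close to the current close."""
    if preclose == 0:
        raise ValueError("preclose must be non-zero")
    return round((close - preclose) / preclose * 100, 2)


# An index that closed at 3300 after a previous close of 3000 moved by +10%.
print(percent_change(3300.0, 3000.0))
```
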

3. US Dollar Index table

CREATE TABLE public.stock_code (
        stock_time character(20),
        spider_data timestamp,
        stock_name bpchar(5) NULL,
        stock_price numeric(5,4) NULL,
        stock_applies numeric(5,4) NULL
);

Column         Description
stock_time     Quote time
spider_data    Time of the scraping operation
stock_name     Index name
stock_price    Index price
stock_applies  Percentage change
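
Both stock_price and stock_applies are declared numeric(5,4). In PostgreSQL that means five significant digits in total with four after the decimal point, so only values with an absolute value below 10 fit. The sketch below checks a value against such a declaration before inserting it. The helper name is hypothetical, and whether the stored values stay in that range is not stated in the source.

```python
from decimal import Decimal, ROUND_HALF_UP


def fits_numeric(value, precision, scale):
    """Return True if value fits a PostgreSQL numeric(precision, scale) column."""
    quantum = Decimal(1).scaleb(-scale)  # e.g. 0.0001 for scale=4
    # Round to the column's scale first; ties round away from zero.
    rounded = Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)
    # Only (precision - scale) digits are available before the decimal point.
    limit = Decimal(10) ** (precision - scale)
    return abs(rounded) < limit


print(fits_numeric("9.9999", 5, 4))
print(fits_numeric("97.35", 5, 4))
```

A value that fails the check would be rejected by the database with a numeric overflow error; widening the declared precision would be the usual remedy.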

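IV. Project Layout

The modules of the system relate to each other as follows:

  • common/pgutils.py: handles the connection to the PostgreSQL database
  • db_init/morningdata.sql: the table-creation statements
  • config.py: loads the configuration files
  • index_read_time.py: collects index information and price changes through third-party packages
  • new_cj.py: scrapes Sina Finance and Wallstreetcn
  • settings.conf.template: template for the project settings; copy it into the current directory and fill it in for your environment
  • logging.conf.template: template for the project's logging configuration
  • task_morning.sh: shell script that runs the program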
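
The contents of settings.conf.template are not shown here, so the following is only a sketch of how a copied settings file could be read and turned into a connection URL. The `[postgresql]` section and its key names are assumptions, as is the helper name.

```python
import configparser
from urllib.parse import quote_plus


def load_db_url(path):
    """Read database settings from an INI file and build a connection URL."""
    parser = configparser.ConfigParser()
    if not parser.read(path, encoding="utf-8"):
        # read() returns the list of files it parsed; empty means nothing was found
        raise FileNotFoundError(path)
    db = parser["postgresql"]  # hypothetical section name
    return "postgresql+psycopg2://{user}:{password}@{host}:{port}/{name}".format(
        user=quote_plus(db["user"]),
        password=quote_plus(db["password"]),  # escape characters such as '@'
        host=db["host"],
        port=db.getint("port", fallback=5432),
        name=db["database"],
    )


# Usage (file name assumed to be the template copied without its suffix):
# url = load_db_url("settings.conf")
# engine = sqlalchemy.create_engine(url)
```

Keeping credentials in a file copied from a template means the real settings stay out of version control while the template documents which keys are expected.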

wechat's People

Contributors

reainl


wechat's Issues

Running wechat_friend.py fails with "config module not defined"

"C:\Program Files\Anaconda3\envs\VNPY\python.exe" C:/VNPY/study/Python-web-scraping-master/wechat-master/pro_script/wechat_friend.py
Traceback (most recent call last):
  File "C:/VNPY/study/Python-web-scraping-master/wechat-master/pro_script/wechat_friend.py", line 12, in <module>
    from config import logger_path
ImportError: No module named 'config'
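
An ImportError of this kind generally means that no config.py is visible on Python's module search path when the script starts, either because the file lives in a different directory from the script or because it has not been created yet. Assuming config.py sits one directory above pro_script/ (the issue does not confirm the layout), one possible workaround is to add that directory to `sys.path` before the import. The helper name below is hypothetical.

```python
import os
import sys


def add_to_import_path(directory):
    """Put a directory at the front of sys.path if it is not already listed."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory


# In pro_script/wechat_friend.py this might be called before the failing import:
# add_to_import_path(os.path.join(os.path.dirname(__file__), os.pardir))
# from config import logger_path
```

Running the script from the directory that contains config.py, or setting PYTHONPATH to that directory, would have the same effect without changing the code.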
