hanc00l / wooyun_public

This repo is archived. Thanks, WooYun! A crawler and search engine for the public vulnerabilities and knowledge-base articles ("drops") of wooyun.org.

Home Page: http://www.wooyun.org

Python 27.35% JavaScript 6.99% HTML 10.03% Shell 0.14% PHP 46.65% CSS 2.84% Hack 6.00%

wooyun_public's Introduction

wooyun_public

Crawler and search for the public vulnerabilities and knowledge-base articles ("drops") of wooyun.org.

1. WooYun public vulnerabilities: crawler edition

Screenshots: index, search

Roughly 40,000 public vulnerabilities and knowledge-base articles, crawled with Scrapy at the end of June 2016. The virtual machine runs Ubuntu 14.04, using Python 2 + MongoDB + Flask (Tornado) plus the Elasticsearch search engine.
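The search side of this stack essentially forwards user keywords to Elasticsearch as a bool query. A minimal sketch of how a multi-keyword query supporting "search within vulnerability details" might be built (the index and field names here are assumptions for illustration, not taken from the repo):

```python
# Illustrative sketch: build an Elasticsearch bool query in which every
# keyword must match. Field names "title"/"content" are assumed, not
# copied from the wooyun_public code.

def build_search_query(keywords, search_content=False):
    """Return an ES query dict requiring all keywords to match."""
    fields = ["title"] + (["content"] if search_content else [])
    return {
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {"query": kw, "fields": fields}}
                    for kw in keywords
                ]
            }
        }
    }

if __name__ == "__main__":
    # Two keywords, searching titles and full details.
    q = build_search_query(["sql", "injection"], search_content=True)
    print(q)
```

A Flask view would pass such a dict to `Elasticsearch.search(index=..., body=q)` and paginate with `from`/`size`.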

Virtual machine download: https://pan.baidu.com/s/1HkR4ggvAwTikshsjcKYBuA , extraction code: 8wnb (updated 2018-04-23)

Installation and usage guide (click here)

2. WooYun public vulnerabilities: memorial edition

Screenshots: index_final, search_final

The vulnerability data and code come from m0l1ce's wooyun_all_bugs_8.8W, which contains 88,000 vulnerability reports (knowledge base not included). The search and vulnerability-display code lives in wooyun_final; the code was modified as follows:

  • Reworked search to support multiple keywords and searching within vulnerability details
  • Changed the location of the offline image files, so search results use the offline images bundled in the VM directly
  • Updated the code to work with PHP 5.6 and newer

The VM runs Ubuntu 16.04 with PHP 5.6 + MySQL 5.7 + Apache 2. VM download: https://pan.baidu.com/s/1qYRqa3U , password: w8vb (2017-07-04)

The VM username and password are hancool/qwe123.

3. Miscellaneous

  • This project is intended for technical research and personal use only. All components are open-source software; the vulnerability and knowledge-base data come from WooYun's public disclosures, and copyright remains with wooyun.org.

wooyun_public's People

Contributors

hanc00l · secwsstest · z-y00


wooyun_public's Issues

Question about the Scrapy setup

I plan to point the 40K VM's crawler at the 88K VM to restore the data completely. Anything I should watch out for, or any advice? I'm not sure about your Scrapy settings yet; I'm still reading the code.

Can't find the wooyun_public directory

I downloaded the 2017-07-04 virtual machine, but after logging in I can't find those directories. Is it just a matter of going into the flask directory under wooyun_public and running ./app.py to start the web service?
But where is that directory?

Elasticsearch only returns partial results

I followed every step in the markdown guide; everything configured successfully with no errors along the way. But searches only show articles and vulnerabilities from before 2015. After killing the Elasticsearch process, searching becomes slower, yet 2016 content does show up. Could something have gone wrong while syncing data to Elasticsearch? How can I resync?

MongoDB database

Sorry to bother you, but could you upload the Mongo database files separately? The VM image fails a CRC check on extraction, although it still runs. I wanted to export the database files but can't connect; I also can't delete mongod.lock (it reports a read-only file system), and half a day of googling hasn't solved it.

Problems syncing to Elasticsearch with mongo-connector

Without editing /usr/local/lib/python2.7/dist-packages/mongo_connector/doc_managers/elastic2_doc_manager.py, the sync ("Logging to mongo-connector.log") runs for only a few minutes and then stops: just 500 bugs, and not a single drops article.

I then edited the file as described:
sudo vi /usr/local/lib/python2.7/dist-packages/mongo_connector/doc_managers/elastic2_doc_manager.py
changing:
self.elastic = Elasticsearch(hosts=[url],**kwargs.get('clientOptions', {}))

to:
self.elastic = Elasticsearch(hosts=[url],timeout=200, **kwargs.get('clientOptions', {}))

I deleted the directory under the Elasticsearch data path, restarted the service (service mongodb restart), and started Elasticsearch as a normal user (elasticsearch-2.3.4/bin/elasticsearch -d).

When I ran sudo mongo-connector -m localhost:27017 -t localhost:9200 -d elastic2_doc_manager, the sync ("Logging to mongo-connector.log") did not behave as you described. A full sync should take about 30 minutes, but mine ran for only a few minutes and stopped incomplete: only 11,000 bugs and no drops articles.

I then checked the mongo-connector sync log and googled around without finding a fix, so I edited the file again, changing:
self.elastic = Elasticsearch(hosts=[url],timeout=200, **kwargs.get('clientOptions', {}))

to:
self.elastic = Elasticsearch(hosts=[url],timeout=20000, **kwargs.get('clientOptions', {}))

I deleted the directory under the Elasticsearch data path and resynced. Again it stopped after only a few minutes of "Logging to mongo-connector.log"; this time just 9,500 bugs and no drops articles.

The output of cat mongo-connector.log:

cat mongo-connector.log 
2016-11-21 01:44:43,039 [CRITICAL] mongo_connector.oplog_manager:630 - Exception during collection dump
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/oplog_manager.py", line 583, in do_dump
    upsert_all(dm)
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/oplog_manager.py", line 567, in upsert_all
    dm.bulk_upsert(docs_to_dump(namespace), mapped_ns, long_ts)
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/util.py", line 32, in wrapped
    return f(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/doc_managers/elastic2_doc_manager.py", line 229, in bulk_upsert
    for ok, resp in responses:
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 162, in streaming_bulk
    for result in _process_bulk_chunk(client, bulk_actions, raise_on_exception, raise_on_error, **kwargs):
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 87, in _process_bulk_chunk
    resp = client.bulk('\n'.join(bulk_actions) + '\n', **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 785, in bulk
    doc_type, '_bulk'), params=params, body=self._bulk_body(body))
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 327, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 112, in perform_request
    raw_data, duration)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/base.py", line 62, in log_request_success
    body = body.decode('utf-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
MemoryError
2016-11-21 01:44:43,054 [ERROR] mongo_connector.oplog_manager:638 - OplogThread: Failed during dump collection cannot recover! Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, replicaset=u'rs0'), u'local'), u'oplog.rs')
2016-11-21 01:44:44,009 [ERROR] mongo_connector.connector:304 - MongoConnector: OplogThread <OplogThread(Thread-2, started 140037811336960)> unexpectedly stopped! Shutting down


cat oplog.timestamp shows nothing at all.

Any help would be appreciated!
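One way to see how far a sync got is to compare per-collection document counts in MongoDB against the Elasticsearch index. A sketch under stated assumptions (the database name "wooyun" and collections "bugs"/"drops" are guesses based on this thread; `sync_report` and `live_counts` are illustrative helpers, not part of the repo):

```python
# Sketch: check whether a mongo-connector sync finished by comparing
# document counts in MongoDB with the target Elasticsearch index.
# All names ("wooyun", "bugs", "drops") are assumptions.

def sync_report(mongo_counts, es_counts):
    """Pair up counts per collection and flag any that are out of sync."""
    return [(name, m, es_counts.get(name, 0), m == es_counts.get(name, 0))
            for name, m in sorted(mongo_counts.items())]

def live_counts():
    """Query running services (requires pymongo and elasticsearch-py)."""
    from pymongo import MongoClient
    from elasticsearch import Elasticsearch
    db = MongoClient("localhost", 27017)["wooyun"]   # assumed db name
    es = Elasticsearch(["localhost:9200"])
    cols = ("bugs", "drops")
    return ({c: db[c].count() for c in cols},
            {c: es.count(index="wooyun", doc_type=c)["count"] for c in cols})

if __name__ == "__main__":
    # Offline demo with the numbers from this issue; swap in live_counts()
    # when MongoDB and Elasticsearch are actually running.
    for name, m, e, ok in sync_report({"bugs": 40000, "drops": 500},
                                      {"bugs": 11000, "drops": 0}):
        print(name, m, e, "OK" if ok else "RESYNC NEEDED")
```

If the counts diverge, deleting the Elasticsearch data directory and the oplog.timestamp file before re-running mongo-connector forces a full re-dump, which is what this thread is attempting.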

Also, some minor issues

When upgrading in place, the data import step would hang. I raised the timeout (once to 2 hours, once to 6) and it never finished, so I re-downloaded the VM. Not sure whether it's just me; it was running on a server with 8 cores and 8 GB of RAM, so performance shouldn't be the bottleneck.

With the freshly downloaded VM, Elasticsearch complained on startup and refuses to run as root; after chown-ing the whole Elasticsearch directory to hancool, it started fine.

Password problem

Is the Ubuntu login/password hancool/qwe123? If so, why can't I log in?

The VM only has 20 records

"VM 1: all WooYun vulnerabilities and knowledge-base content crawled at the end of June 2016"

That doesn't match the description in the README.

Mongo also holds only a little data: about 100 + 20 records.

Every search fails with status code 500:

Internal Server Error

The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.

What should I do?

A small issue

The step
unzip elasticsearch-analysis-ik-1.9.4.zip
should be
unzip elasticsearch-analysis-ik-1.9.4.zip -d elasticsearch-analysis-ik
otherwise it unpacks a pile of files into the current directory.

Fixing "[Errno 32] Broken pipe"

While app.py is running, the following exception may occur:

Exception happened during processing of request from ('172.16.80.1', 52437)
Traceback (most recent call last):
  File "/usr/lib/python2.7/SocketServer.py", line 593, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 334, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.7/SocketServer.py", line 651, in __init__
    self.finish()
  File "/usr/lib/python2.7/SocketServer.py", line 710, in finish
    self.wfile.close()
  File "/usr/lib/python2.7/socket.py", line 279, in close
    self.flush()
  File "/usr/lib/python2.7/socket.py", line 303, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])

error: [Errno 32] Broken pipe

The cause: when the server takes a while to handle a request and the client breaks the connection before processing finishes (for example, by closing the browser), the Flask server flushes the response data without first checking the connection state, which raises this error. The exception affects only that single request; other requests running in parallel are unaffected.

Those interested can read the two Stack Overflow threads linked below. Flask is a web framework recommended for small products or development environments; for high-concurrency deployments it may not be suitable, and another web framework is needed.

Following the approach described on Stack Overflow, you can use app.run(threaded=True), but this does not fundamentally prevent the Broken pipe exception.

http://stackoverflow.com/questions/12591760/flask-broken-pipe-with-requests
http://stackoverflow.com/questions/31265050/how-to-make-an-exception-for-broken-pipe-errors-on-flask-when-the-client-discon
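The usual defensive pattern is to treat EPIPE as "client went away" rather than letting it propagate. A standard-library sketch of that guard (the `safe_send` helper is illustrative, not part of this repo):

```python
# Illustrative guard for "[Errno 32] Broken pipe": swallow only EPIPE,
# which just means the client disconnected mid-response, and re-raise
# every other socket error.

import errno

def safe_send(sendall, data):
    """Call a sendall-like function; return False if the client is gone."""
    try:
        sendall(data)
        return True
    except (IOError, OSError) as exc:
        if exc.errno == errno.EPIPE:
            return False          # client closed the connection; ignore
        raise                     # any other error is a real problem

if __name__ == "__main__":
    def disconnected_client(_data):
        raise OSError(errno.EPIPE, "Broken pipe")
    print(safe_send(disconnected_client, b"chunk"))  # False: error absorbed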

Fixing the Internal Server Error when browsing past page 500 of vulnerabilities

Elasticsearch's default maximum result window is 10,000 records (that is, 500 pages), so requests beyond page 500 fail. The fix is to run, on the command line:

curl -XPUT "http://localhost:9200/wooyun/_settings" -d '{ "index" : { "max_result_window" : 500000 } }'

to raise the default maximum result window.

Reference: http://stackoverflow.com/questions/35206409/elasticsearch-2-1-result-window-is-too-large-index-max-result-window
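The same setting change can be made from Python's standard library instead of curl (Python 3 shown; the `build_settings_request` helper name is illustrative):

```python
# Build the same PUT request as the curl command above, using only the
# standard library. Uncomment urlopen() to send it to a running
# Elasticsearch instance.

import json
from urllib.request import Request, urlopen

def build_settings_request(base="http://localhost:9200", index="wooyun",
                           max_window=500000):
    """Prepare a PUT /<index>/_settings request raising max_result_window."""
    body = json.dumps({"index": {"max_result_window": max_window}}).encode()
    return Request("%s/%s/_settings" % (base, index), data=body,
                   headers={"Content-Type": "application/json"},
                   method="PUT")

if __name__ == "__main__":
    req = build_settings_request()
    # urlopen(req)  # send it for real against a running Elasticsearch
    print(req.get_method(), req.full_url)
```

Note that raising max_result_window trades memory for deep paging; for very deep scans, Elasticsearch's scroll API is the safer route.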

Elasticsearch search integration almost done

Searching through MongoDB has become unbearably slow, so over the past few days I have been working on fast search with the Elasticsearch engine. The configuration and code changes are now tested; I plan to upload the configuration guide and the virtual machine this weekend.
To use Elasticsearch search, either download the packaged VM (which includes the WooYun site data) or configure it yourself from my guide.
My home connection is very slow, so hopefully the VM upload finishes over the weekend.

Questions and discussion are welcome.
