lz0211 / wedge

A configurable novel downloader and e-book generator.

Languages: JavaScript 76.80%, HTML 21.13%, CSS 1.42%, Batchfile 0.65%
Topics: ebook-downloader, ebook-generator, ebook, crawler, downloader, nodejs-modules, javascript-applications, epub, fb2, rtf

wedge's Issues

Bug report?

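When crawling http://www.23us.com/, the site occasionally has problems and connections fail. The log file still contains an entry for each affected chapter, but the files actually saved are not contiguous (e.g. the files present are 00001.json, 00002.json, 00003.json, 00004.json, 00005.json and 00015.json; the JSON files for 06 through 14 are missing).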
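The gap described above can be detected mechanically from the file names alone. The following is a minimal sketch, assuming chapter files are named with a zero-padded number followed by `.json` as in the example; `findMissingChapters` is a hypothetical helper, not part of wedge.

```javascript
// Hypothetical helper (not part of wedge): given the file names found in a
// book's chapter directory, return the chapter numbers missing between the
// lowest and highest numbered files.
function findMissingChapters(fileNames) {
  const ids = fileNames
    .map((name) => /^(\d+)\.json$/.exec(name)) // keep only NNNNN.json files
    .filter(Boolean)
    .map((m) => parseInt(m[1], 10))
    .sort((a, b) => a - b);

  const missing = [];
  for (let i = 1; i < ids.length; i++) {
    // Every number strictly between two neighbours is a missing chapter.
    for (let n = ids[i - 1] + 1; n < ids[i]; n++) missing.push(n);
  }
  return missing;
}

const files = ['00001.json', '00002.json', '00003.json', '00004.json', '00005.json', '00015.json'];
console.log(findMissingChapters(files).join(', ')); // 6, 7, 8, 9, 10, 11, 12, 13, 14
```

In practice the list of names could come from `fs.readdirSync` on the book's output directory; the directory layout is not described in the report, so that part is left out.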

The chapter download request count ended up negative...

The novel comes from:

http://www.23us.com/

An excerpt from the log:

[getChapters]:-1269052 quests is running, 0 quests is waiting...
[getChapters]:-1269052 quests is running, 0 quests is waiting...
[getChapters]:-1269052 quests is running, 0 quests is waiting...
[getChapters]:-1269052 quests is running, 0 quests is waiting...
[getChapters]:-1271291 quests is running, 0 quests is waiting...
[getChapters]:-1271291 quests is running, 0 quests is waiting...
[getChapters]:-1271291 quests is running, 0 quests is waiting...
[getChapters]:-1271291 quests is running, 0 quests is waiting...
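The report does not show how wedge counts running requests, so the cause of the negative value is unknown. One common way such a counter goes negative is a completion callback that fires more than once for the same request (for example once on error and again on timeout). The sketch below is a hypothetical queue, not wedge's implementation; `TaskQueue` and its fields are names invented for illustration. It shows a guard that counts each completion only once.

```javascript
// Hypothetical concurrency-limited queue (NOT wedge's actual code).
// Each task receives a `done` callback; the guard makes repeated calls harmless.
class TaskQueue {
  constructor(limit) {
    this.limit = limit;   // maximum number of tasks running at once
    this.running = 0;     // tasks currently running
    this.waiting = [];    // tasks not yet started
  }

  push(task) {
    this.waiting.push(task);
    this._next();
  }

  _next() {
    while (this.running < this.limit && this.waiting.length > 0) {
      const task = this.waiting.shift();
      this.running++;
      let finished = false;
      const done = () => {
        if (finished) return; // a second call (e.g. error + timeout) is ignored
        finished = true;
        this.running--;
        this._next();
      };
      task(done);
    }
  }
}

const queue = new TaskQueue(2);
queue.push((done) => { done(); done(); }); // completion reported twice
console.log(queue.running); // 0
```

Without the `finished` flag, the same task would leave `running` at -1, and the error would accumulate with every doubly reported request.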

Qidian seems to fail once a book has too many chapters

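I just tried it: one book with over a thousand chapters failed, and others with one or two thousand chapters failed too...
From the limited-time free list: 神道丹尊, 随身带着女神皇, 抬棺匠
All of them behave this way.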

testRule output:

{ title: '神道丹尊',
  author: '孤单地飞',
  classes: '东方玄幻',
  uuid: 'b2ca589a-c4da-04ee-193c-06dc368bd5da',
  source: 'http://book.qidian.com/info/3607314',
  origin: 'http://book.qidian.com/info/3607314',
  isend: false,
  date: 1492745610721,
  brief: '绝世强者、一代丹帝凌寒为追求成神之路而殒落,万年后携《不灭天经》重生于同名少年,从此风云涌动,与当世无数天才争锋,重启传奇之路,万古诸天我最强!\n普通群:273857096。VIP群(只限付费读者):539195580',
  cover: 'http://qidian.qpic.cn/qdbimg/349573/3607314/180' }
[]

Submitting a rule

Plugin site: http://www.23us.com/

index.js

module.exports = {
    "host":"www.23us.com",
    "match":[
        "www.23us.com"
    ],
    "charset":"gbk",
    "selector":require("./selector"),
    "replacer":require("./replacer")
}

replacer.json

{
  "infoPage": {
    "match": null,
    "indexPage": null,
    "footer": null,
    "bookInfos": {
      "source": null,
      "title": null,
      "author": null,
      "classes": null,
      "isend": null,
      "cover": null,
      "brief": null,
      "keywords": null
    }
  },
  "indexPage": {
    "match": null,
    "infoPage": null,
    "footer": null,
    "bookIndexs": null
  },
  "contentPage": {
    "match": null,
    "footer": null,
    "chapterInfos": {
      "title": null,
      "source": null,
      "content": "[(.]*未完待续.*[)]"
    }
  }
}

selector.json

{
  "infoPage": {
    "match": "/\\/book\\/\\d+/i.test($.location())",
    "indexPage": "$.location($('.btnlinks > a.read').attr('href'))",
    "footer": "$('.footer').length > 0",
    "bookInfos": {
      "source": "$.location()",
      "title": "$('h1').text().replace(' 全文阅读','')",
      "author": "$('th:contains(文章作者)').next('td').text()",
      "classes": "$('th:contains(文章类别)').next('td').text()",
      "isend": "$('th:contains(文章状态)').next('td').text()",
      "cover": "$.location($('a.hst img').attr('src'))",
      "brief": " $('#sidename').prev('p').html()"
    }
  },
  "indexPage": {
    "match": "/\\/html\\/\\d+\\/\\d+/i.test($.location())",
    "infoPage": "$.location($('a:contains(返回书页)').attr('href'))",
    "footer": "$('#a_footer').length > 0",
    "filter": null,
    "bookIndexs": "$('td.L a').map((i,v)=>({href:$.location($.location()+$(v).attr('href')),text:$(v).text()})).toArray()"
  },
  "contentPage": {
    "match": "/\\/html\\/\\d+\\/\\d+\\/\\d+\\.html$/i.test($.location())",
    "footer": "$('#a_footer').length > 0",
    "chapterInfos": {
      "title": "$('dd > h1').text()",
      "source": "$.location()",
      "content": "$('#contents').html()"
    }
  }
}
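The three `match` expressions in selector.json can be checked against sample URLs before submitting the rule. The sketch below assumes that each string is evaluated as a JavaScript expression with `$` in scope, which is how the rules read but is not confirmed by the source. `$` here is a stub that only provides `location()`, `matchingPages` is a hypothetical helper, and the book and chapter numbers in the URLs are made up.

```javascript
// Match expressions copied from selector.json.
const rules = {
  infoPage: "/\\/book\\/\\d+/i.test($.location())",
  indexPage: "/\\/html\\/\\d+\\/\\d+/i.test($.location())",
  contentPage: "/\\/html\\/\\d+\\/\\d+\\/\\d+\\.html$/i.test($.location())",
};

// ASSUMPTION: each rule is evaluated as an expression with `$` in scope.
// The stub `$` only implements location(); the real `$` is not shown here.
function matchingPages(url) {
  const $ = { location: () => url };
  return Object.keys(rules).filter((name) =>
    new Function('$', 'return ' + rules[name])($)
  );
}

// Hypothetical URLs; the numeric ids are invented for illustration.
console.log(matchingPages('http://www.23us.com/book/12345'));             // [ 'infoPage' ]
console.log(matchingPages('http://www.23us.com/html/12/12345/'));         // [ 'indexPage' ]
console.log(matchingPages('http://www.23us.com/html/12/12345/678.html')); // [ 'indexPage', 'contentPage' ]
```

The last line shows that a chapter URL also satisfies the `indexPage` pattern, because that pattern has no end anchor. Whether this matters depends on the order in which wedge tests the page types, which the source does not state.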

Converting EBK to TXT

Paid help wanted: an algorithm for converting EBK files to TXT.

The 书客 (hbooker) rule seems to have stopped working

I just tried this book and found that the content of every crawled chapter is undefined.

{
  "title": "本书中比企谷八幡的资料",
  "id": "00000",
  "source": "http://www.hbooker.com/chapter/book_chapter_detail/100068865",
  "date": 1492498776504,
  "content": "undefined"
}

The tadu rule needs an update

I just tried it and no chapters could be fetched. After looking into it, the rule should probably be changed to this:

"content": "$('#partContent').html()"

This is the www.tadu.com rule in default.

A question about the chapter content rule

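I just tried writing a rule for 8站 (www.8kana.com). Almost all the information that needs to be captured is captured, and the log suggests that the chapter index list is fetched successfully; only the chapter content is not fetched. Could you tell me what is wrong in the rule?
Test novel: 白雪的祭祀 (limited-time free)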

selector.json

{
  "infoPage": {
    "match": "/^http:\\/\\/www\\.8kana\\.com\\/book\\/\\d+\\.html$/i.test($.location())",
    "indexPage": "$.location()",
    "footer": "$('.footer').length > 0",
    "bookInfos": {
      "origin": "$.location()",
      "source": "$.location()",
      "title": "$('h2').text()",
      "author": "$('a.authorName.manColor').text()",
      "classes": "$('ul > li.navTopL_list.type_slide.navTopL_current > a').text()",
      "isend": "$('div.BookInfoCalendar_MainBtn').text()",
      "cover": "$.location($('div.left.bookContainImgBox > a > img').attr('src'))",
      "brief": "$('a > p').html()",
      "keywords": "[this.title($),this.author($),this.classes($)].concat($().map((i,x)=>$(x).text()).toArray()).join(',')" // not sure what this is for; copied from the Qidian rule
    }
  },
  "indexPage": {
    "match": "/^http:\\/\\/www\\.8kana\\.com\\/book\\/\\d+\\.html$/i.test($.location())",
    "infoPage": "$.location()",
    "footer": "$('.footer').length > 0",
    "filter": "$('#btnChapterMore').remove()",
    "bookIndexs": "$('#chapter_con').find('a').map((i,v)=>({href:$.location($(v).attr('href')),text:$(v).text().trim()})).toArray()"
  },
  "contentPage": {
    "match": "/^http:\\/\\/www\\.8kana\\.com\\/read\\/\\d+\\.html$/i.test($.location())",
    "footer": "$('.footer').length > 0",
    "filter": "$('.myContent').find('span').remove()",
    "chapterInfos": {
      "title": "$('h2').text()",
      "source": "$.location()",
      "content": "$('.myContent').html()"
    }
  }
}
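One detail worth checking in the selector.json above is the trailing `//` comment on the `keywords` line. Strict JSON does not allow comments, so if the comment exists in the file itself (and was not added only for this post), a strict JSON parser would reject the file. The source does not say how wedge loads selector.json, so this is only a quick validity check; `jsonError` is a hypothetical helper and the two sample strings are shortened stand-ins for the real file.

```javascript
// Returns null when the text is valid JSON, otherwise the parser's error message.
function jsonError(text) {
  try {
    JSON.parse(text);
    return null;
  } catch (err) {
    return err.message;
  }
}

// Shortened stand-ins for the real file, with and without a trailing comment.
const withComment = '{ "keywords": "x" // copied from the Qidian rule\n}';
const withoutComment = '{ "keywords": "x" }';

console.log(jsonError(withComment) !== null);    // true: the comment is rejected
console.log(jsonError(withoutComment) === null); // true: plain JSON parses
```

If the loader is tolerant of comments, or the comment is not in the real file, this check rules that explanation out and the cause lies elsewhere.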

Partial log:

getBookMeta
createBook
checkBookCover
getBookIndexs
[getBookIndexs]:1 quests is running, 0 quests is waiting...
[getBookIndexs]:0 quests is running, 0 quests is waiting...
getChapters
[getChapters]:1 quests is running, 41 quests is waiting...
getChapter
………………………………
[getChapters]:16 quests is running, 0 quests is waiting...
getChapter
[getChapters]:15 quests is running, 0 quests is waiting...
getChapterContent
[getChapters]:14 quests is running, 0 quests is waiting...
[getChapters]:13 quests is running, 0 quests is waiting...
[getChapters]:12 quests is running, 0 quests is waiting...
[getChapters]:11 quests is running, 0 quests is waiting...
[getChapters]:10 quests is running, 0 quests is waiting...
[getChapters]:9 quests is running, 0 quests is waiting...
[getChapters]:8 quests is running, 0 quests is waiting...
[getChapters]:7 quests is running, 0 quests is waiting...
[getChapters]:6 quests is running, 0 quests is waiting...
[getChapters]:5 quests is running, 0 quests is waiting...
[getChapters]:4 quests is running, 0 quests is waiting...
[getChapters]:3 quests is running, 0 quests is waiting...
[getChapters]:2 quests is running, 0 quests is waiting...
[getChapters]:1 quests is running, 0 quests is waiting...
[getChapters]:0 quests is running, 0 quests is waiting...
end...

Qidian occasionally captures too much chapter content

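From a quick look, nearly all of the over-captured chapters are standalone non-story chapters posted by the author (usually asking for monthly votes, announcing a break, and so on).
JSON example: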

{
  "title": "开单章拉月票!!!",
  "id": "00149",
  "source": "http://vipreader.qidian.com/chapter/1255901/24891099",
  "date": 1492925305318,
  "content": "<img src=\"//qidian.qpic.cn/qdbimg/349573/1255901/300\" width=\"300\" height=\"300\">\n手机阅读\n扫描下载起点读书客户端\n<img src=\"00149/00000.png\">\n最近阅读\n快速导航\n分类频道\n其它\n\n\n\n\n点击书签后,可收藏每个章节的书签,“阅读进度”可以在个人中心书架里查看\n开单章拉月票!!!\n179字\n2009.09.02 10:01\n写书难,写好书更难!!每个VIP的作者拉月票的都很正常!!也都有这个资格。所以,老安就来拉月票了。\n老安素来信奉与人为善的做人原则。但不知自己哪里得罪了一些人,老是挂马甲来我书评区捣乱,对本书进行打压。\n还请各位善意或者恶意的人,对本书“笔下留情”,适当批评是可以的,但谩骂,还外带侮辱作者的智商,就有些太过了。\n老安,以后每天力求9000字保证,在此求月票!!!\n|\n|\n目录\n设置\n手机阅读\n加入书架\n返回书页\n起点游戏\n打赏\n投票\n评论"
}
