ma6254 / fictiondown

Novel downloading | novel scraping | Qidian | Biquge | export Markdown | export txt | convert to EPUB | ad filtering | automatic proofreading

License: GNU General Public License v3.0

Go 99.21% Makefile 0.79%
biquge qidian fiction novels spider crawler golang

fictiondown's Issues

example failed

Download the release and install PhantomJS, then run the example:

$ ./FictionDown --url https://book.qidian.com/info/3249362 d 
2019/03/11 16:11:02 Init PhantomJS
2019/03/11 16:11:03 URL: "https://book.qidian.com/info/3249362"
2019/03/11 16:11:03 Close PhantomJS
2019/03/11 16:11:03 failed
$ phantomjs --version
1.9.8

Building executables for multiple platforms

Flag --rm-dist has been deprecated, please use --clean instead
• starting release...
⨯ release failed after 0s error=yaml: unmarshal errors:
line 41: field replacements not found in type config.Archive

Add a rule

www.zhuishubang.com

Code

package site

import (
	"fmt"
	"io"
	"net/url"
	"strings"

	"github.com/antchfx/htmlquery"
	"github.com/ma6254/FictionDown/store"
	"golang.org/x/text/encoding/simplifiedchinese"
	"golang.org/x/text/transform"
)

type wwwZhuishubangCom struct {
}

func (b *wwwZhuishubangCom) BookInfo(body io.Reader) (s *store.Store, err error) {
	body = transform.NewReader(body, simplifiedchinese.GBK.NewDecoder())
	doc, err := htmlquery.Parse(body)
	if err != nil {
		return
	}

	s = &store.Store{}

	// Book Name
	node_title := htmlquery.Find(doc, `//div[@class="bookPhr"]/h2`)
	if len(node_title) == 0 {
		err = fmt.Errorf("No matching title")
		return
	}
	s.BookName = htmlquery.InnerText(node_title[0])

	// Description
	node_desc := htmlquery.Find(doc, `//*[@class="introCon"]/p`)
	if len(node_desc) == 0 {
		err = fmt.Errorf("No matching desc")
		return
	}
	s.Description = strings.Replace(
		htmlquery.OutputHTML(node_desc[0], false),
		"<br/>", "\n",
		-1)

	// Author
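	var author = htmlquery.Find(doc, `//div[@class="bookPhr"]/dl/dd`)
	// Guard against an empty result before indexing, as the other fields do.
	if len(author) == 0 {
		err = fmt.Errorf("No matching author")
		return
	}
	s.Author = htmlquery.OutputHTML(author[0], false)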

	// Contents
	node_content := htmlquery.Find(doc, `//div[@class="chapterCon"]/ul/li/a`)
	if len(node_content) == 0 {
		err = fmt.Errorf("No matching contents")
		return
	}

	var vol = store.Volume{
		Name:     "正文",
		Chapters: make([]store.Chapter, 0),
	}

	// Walk the chapter links in reverse so they are appended in the
	// opposite of their order on the page.
	for idx := len(node_content) - 1; idx >= 0; idx-- {
		v := node_content[idx]
		chapterURL, err := url.Parse(htmlquery.SelectAttr(v, "href"))
		if err != nil {
			return nil, err
		}

		vol.Chapters = append(vol.Chapters, store.Chapter{
			Name: strings.TrimSpace(htmlquery.InnerText(v)),
			URL:  chapterURL.String(),
		})
	}
	s.Volumes = append(s.Volumes, vol)

	s.CoverURL = htmlquery.SelectAttr(htmlquery.FindOne(doc, `//*[@class="bookImg"]/img`), "src")

	return
}

func (b *wwwZhuishubangCom) Chapter(body io.Reader) ([]string, error) {
	body = transform.NewReader(body, simplifiedchinese.GBK.NewDecoder())
	doc, err := htmlquery.Parse(body)
	if err != nil {
		return nil, err
	}

	M := []string{}
	//list
	// nodeContent := htmlquery.Find(doc, `//div[@id="content"]/text()`)
	nodeContent := htmlquery.Find(doc, `//div[@class="articleCon"]/p/text()`)
	if len(nodeContent) == 0 {
		err = fmt.Errorf("No matching content")
		return nil, err
	}
	for _, v := range nodeContent {
		t := htmlquery.InnerText(v)
		t = strings.TrimSpace(t)

		switch t {
		case
			"本↘书↘首↘发↘追↘书↘帮↘http://m.zhuishubang.com/",
			"":
			continue
		}

		M = append(M, t)
	}

	return M, nil
}

Memory overflow when running the program as administrator on Windows 10

FictionDown.exe -i .\一念永恒-耳根-起点中文网.FictionDown s -k 一念永恒 -p
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x18 pc=0x9acbec]

goroutine 1 [running]:
github.com/ma6254/FictionDown/site.Type1SearchAfter.func1(0xc00007a0c0, 0xc, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/runner/work/FictionDown/FictionDown/site/sites.go:200 +0x24c
github.com/ma6254/FictionDown/site.Search(0xc00007a0c0, 0xc, 0xc000163980, 0xc00007a0c0, 0xc, 0xc0000a57a0, 0x4efadf)
/home/runner/work/FictionDown/FictionDown/site/site.go:238 +0x13c
main.glob..func6(0xc0000b8dc0, 0x100, 0xc0000b8dc0)
/home/runner/work/FictionDown/FictionDown/search.go:33 +0x7f
github.com/urfave/cli.HandleAction(0xa2c980, 0xb279f0, 0xc0000b8dc0, 0xc000163900, 0x0)
/home/runner/go/pkg/mod/github.com/urfave/[email protected]/app.go:490 +0xcf
github.com/urfave/cli.Command.Run(0xaf472e, 0x6, 0x0, 0x0, 0x1118610, 0x1, 0x1, 0xb00290, 0x12, 0x0, ...)
/home/runner/go/pkg/mod/github.com/urfave/[email protected]/command.go:210 +0x99d
github.com/urfave/cli.(*App).Run(0x1121fc0, 0xc0000ae000, 0x7, 0x8, 0x0, 0x0)
/home/runner/go/pkg/mod/github.com/urfave/[email protected]/app.go:255 +0x6b6
main.main()
/home/runner/work/FictionDown/FictionDown/main.go:87 +0x125

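[Enhanced] Dingdian novel site domain updated; XPath needs no changes

www.booktxt.net now 301-redirects to www.ddxstxt8.com

  • The chapter and table-of-contents structure, the XPath expressions, and so on need no changes
  • In https://github.com/ma6254/FictionDown/blob/35edca3576102a93f6c2a894e9b232155cbf92e5/sites/booktxt_net/main.go, the following
		Match: []string{
			`https://www\.booktxt\.net/\d+_\d+/*`,
			`https://www\.booktxt\.net/\d+_\d+/\d+\.html/*`,
			`http://www\.booktxt\.net/book/goto/id/\d+`,
		},

needs to be updated to the domain that the 301 redirect points to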
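
As a rough illustration of the change requested above, the three Match rules could accept both the old and the new host. This is only a sketch: the `matchPatterns` and `matchesAny` names, the `^...$` anchors, and the choice to keep the old domain alongside the new one are assumptions made for the example, not the project's actual code.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchPatterns mirrors the three Match rules quoted in the issue, with the
// host widened to accept either booktxt.net or ddxstxt8.com.
// Assumption: anchors are added here so the sketch is strict on its own.
var matchPatterns = []*regexp.Regexp{
	regexp.MustCompile(`^https://www\.(?:booktxt\.net|ddxstxt8\.com)/\d+_\d+/*$`),
	regexp.MustCompile(`^https://www\.(?:booktxt\.net|ddxstxt8\.com)/\d+_\d+/\d+\.html/*$`),
	regexp.MustCompile(`^http://www\.(?:booktxt\.net|ddxstxt8\.com)/book/goto/id/\d+$`),
}

// matchesAny reports whether u matches at least one of the patterns.
func matchesAny(u string) bool {
	for _, re := range matchPatterns {
		if re.MatchString(u) {
			return true
		}
	}
	return false
}

func main() {
	// Made-up book and chapter IDs, used only to exercise the patterns.
	for _, u := range []string{
		"https://www.booktxt.net/1_2/",
		"https://www.ddxstxt8.com/1_2/",
		"https://www.ddxstxt8.com/1_2/3.html",
		"https://example.com/1_2/",
	} {
		fmt.Printf("%-40s %v\n", u, matchesAny(u))
	}
}
```

Keeping the old host in the alternation lets previously saved URLs continue to match, while a plain replacement of the host would be the more literal reading of the request.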

chromedp has been updated

After the chromedp update, its method names changed, and nearly every place that calls chromedp is now broken

Cannot download from Qidian

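Hello, this is my first time using FictionDown. I want to use it to download "诡秘之主" and put it on my Kindle for a second read.

Since I have never used any Go programs and was afraid of making mistakes, this is how I installed it:
1. Turned on v2rayNG to get past the firewall
2. Downloaded and installed Go (.msi for amd64)
3. go env -w GO111MODULE=on
4. go env -w GOPROXY=https://goproxy.cn,direct
5. go get -v github.com/ma6254/FictionDown@latest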

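The installation seemed to succeed.

However, whether I try searching or try downloading directly by giving it the site URL, it does not seem to run properly. Did I install it incorrectly?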

Platform:
Windows 10 Enterprise LTSC 1809 (OS build 17763.1637)
go version go1.15.6 windows/amd64

2021.1.9

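Thanks for your project, and a small suggestion

q(≧▽≦q) Thank you for your project; it solved a real pain point for me.
One small suggestion, though: could you sign the files when publishing a release?
(Purpose:

  1. To protect your own rights. There are many unscrupulous people in China who steal projects from GitHub, wrap them in a shell, and sell them for a fee...
  2. Have you ever used KMSpico? Its author published it on a foreign forum (blocked by the firewall), and many people then set up copycat sites that hide cryptocurrency-mining ⛏ malware in the files... I hope you can sign your releases, and ideally also provide an MD5 checksum.)
    Finally, thanks again, orz!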

dep ensure failed

$ dep ensure -v
(1/12) Wrote github.com/benbjohnson/phantomjs@master
(2/12) Wrote github.com/gofrs/[email protected]
(3/12) Wrote github.com/bmaupin/[email protected]
(4/12) Wrote golang.org/x/[email protected]
(5/12) Wrote github.com/go-yaml/[email protected]
(6/12) Wrote github.com/mattn/[email protected]
(7/12) Failed to write golang.org/x/net@master
(8/12) Failed to write golang.org/x/sys@master
(9/12) Failed to write gopkg.in/cheggaaa/[email protected]
(10/12) Failed to write github.com/antchfx/[email protected]
(11/12) Failed to write github.com/antchfx/xpath@master
(12/12) Failed to write github.com/urfave/[email protected]
grouped write of manifest, lock and vendor: error while writing out vendor tree: failed to write dep tree: failed to export golang.org/x/net: fatal: failed to unpack tree object 3a22650c66bd7f4fb6d1e8072ffd7b75c8a27898
: exit status 128

$ dep version
dep:
 version     : devel
 build date  : 
 git hash    : 
 go version  : go1.9.4
 go compiler : gc
 platform    : linux/amd64
 features    : ImportDuringSolve=false
$ go version
go version go1.11 linux/amd64

Runtime error when searching sites

R:\down\FictionDown_0.1.3_Windows_x86_64.tar>FictionDown s -k '赛博剑仙铁雨'
2021/09/04 00:37:18 搜索站点: 新八一中文网 https://www.81new.net/ 404 404 Not Found
2021/09/04 00:37:19 搜索站点: 结果: 0 笔趣阁1 https://www.biquge5200.cc/
2021/09/04 00:37:20 搜索站点: 结果: 0 起点中文网 https://www.qidian.com/
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x18 pc=0x9acbec]

goroutine 1 [running]:
github.com/ma6254/FictionDown/site.Type1SearchAfter.func1(0xc00002c340, 0x14, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/runner/work/FictionDown/FictionDown/site/sites.go:200 +0x24c
github.com/ma6254/FictionDown/site.Search(0xc00002c340, 0x14, 0xc00013d9e0, 0xc00002c340, 0x14, 0xc0000897a0, 0x4efadf)
        /home/runner/work/FictionDown/FictionDown/site/site.go:238 +0x13c
main.glob..func6(0xc000092f20, 0x0, 0xc000092f20)
        /home/runner/work/FictionDown/FictionDown/search.go:33 +0x7f
github.com/urfave/cli.HandleAction(0xa2c980, 0xb279f0, 0xc000092f20, 0xc00013d900, 0x0)
        /home/runner/go/pkg/mod/github.com/urfave/[email protected]/app.go:490 +0xcf
github.com/urfave/cli.Command.Run(0xaf472e, 0x6, 0x0, 0x0, 0x1118610, 0x1, 0x1, 0xb00290, 0x12, 0x0, ...)
        /home/runner/go/pkg/mod/github.com/urfave/[email protected]/command.go:210 +0x99d
github.com/urfave/cli.(*App).Run(0x1121fc0, 0xc000054100, 0x4, 0x4, 0x0, 0x0)
        /home/runner/go/pkg/mod/github.com/urfave/[email protected]/app.go:255 +0x6b6
main.main()
        /home/runner/work/FictionDown/FictionDown/main.go:87 +0x125

Software version: v0.1.3
Runtime environment: Windows x64, Linux x64
Network environment: overseas IP

Reproduces consistently

Error when converting to EPUB via Pandoc on Windows

Environment

Software version: commit 1c10eae, tag: v0.1.3
Pandoc version:

PS C:\Users\mjc\git\FictionDown\release> pandoc -v
pandoc.exe 2.9.2
Compiled with pandoc-types 1.20, texmath 0.12.0.1, skylighting 0.8.3.2
Default user data directory: C:\Users\mjc\AppData\Roaming\pandoc
Copyright (C) 2006-2019 John MacFarlane
Web:  https://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.

Operating system: any Windows version

Steps to reproduce

PS C:\Users\mjc\git\FictionDown\release> .\FictionDown.exe -i .\诡秘之主-爱潜水的乌贼-笔趣阁1.FictionDown conv -f epub
2020/02/19 00:05:36 Loading cache file: .\诡秘之主-爱潜水的乌贼-笔趣阁1.FictionDown
2020/02/19 00:05:36 Start Conversion: Format:"epub" OutPath:"诡秘之主.epub"
2020/02/19 00:05:36 Save Cover Image: "C:\\Users\\mjc\\AppData\\Local\\Temp\\book_cover_126653631.jpg"
2020/02/19 00:05:40 中间文件转换完成: "诡秘之主.epub.md"
2020/02/19 00:05:40 调用Pandoc: "C:\\ProgramData\\chocolatey\\bin\\pandoc.exe" []string{"pandoc", "--epub-chapter-level", "2", "-o", "诡秘之主.epub", "诡秘之主.epub.md"}   
pandoc.exe: C:_cover_126653631.jpg: openBinaryFile: does not exist (No such file or directory)
exit status 1

Or

PS C:\Users\mjc\git\FictionDown\release> pandoc -o a.epub 诡秘之主.md
pandoc.exe: C:_cover_703999991.jpg: openBinaryFile: does not exist (No such file or directory)

Metadata section

title: 诡秘之主
description: |-
  蒸汽与机械的浪潮中,谁能触及非凡?历史和黑暗的迷雾里,又是谁在耳语?我从诡秘中醒来,睁眼看见这个世界:
  枪械,大炮,巨舰,飞空艇,差分机;魔药,占卜,诅咒,倒吊人,封印物……光明依旧照耀,神秘从未远离,这是一段“愚者”的传说。
creator: 爱潜水的乌贼
lang: zh-CN
cover-image: C:\Users\mjc\AppData\Local\Temp\book_cover_703999991.jpg

This is presumably caused by a mismatch between the YAML implementations of Pandoc and go-yaml.

An issue has been filed with Pandoc: jgm/pandoc#6150
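
The error output shows the backslash-separated part of the `cover-image` path going missing by the time Pandoc opens the file. One possible workaround, which this issue does not confirm, is to write the path into the metadata with forward slashes, since Windows file APIs generally accept them. The sketch below assumes that approach; `toPandocPath` is a hypothetical helper name, not part of FictionDown.

```go
package main

import (
	"fmt"
	"strings"
)

// toPandocPath rewrites backslashes to forward slashes so the path carries
// no characters that a metadata parser could treat as escapes.
// strings.ReplaceAll is used rather than filepath.ToSlash because ToSlash
// only converts on platforms whose path separator is a backslash.
func toPandocPath(p string) string {
	return strings.ReplaceAll(p, `\`, "/")
}

func main() {
	// The cover path from the metadata block quoted in the issue.
	cover := `C:\Users\mjc\AppData\Local\Temp\book_cover_703999991.jpg`
	fmt.Printf("cover-image: %s\n", toPandocPath(cover))
}
```

Another option under the same assumption would be to copy the cover next to the intermediate Markdown file and reference it by a relative file name, which avoids absolute Windows paths entirely.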

Cannot read Qidian chapters; the content is empty

bookurl: https://book.qidian.com/info/1025813823/
bookname: 仙朝纪元
author: 西城冷月
coverurl: https://bookcover.yuewen.com/qdbimg/349573/1025813823/180
description: |-
  旧世之末,余火回光!
  龙蛇起陆的仙道盛景、缱绻多情的绝代佳人,春色绚烂下,是那腐朽的灰败。
  仙人在沉沦中徘徊,旧神在欲望中复苏……
  建仙朝、铸仙鼎,口含天宪,言出法随,叫那天地换个新纪元!
  这是一个幽幽长夜之内,一点星火乍起,煦照九天十地,三界六道……的故事。
tmap: []
volumes:
- name: 作品相关
  isvip: false
  chapters: []
- name: 潜龙勿用
  isvip: false
  chapters: []
- name: 潜龙勿用
  isvip: true
  chapters: []
- name: 见龙在田
  isvip: true
  chapters: []
- name: 终日乾乾
  isvip: true
  chapters: []

Custom book sources

One feature that is not yet implemented is custom book sources; it would come in handy here.
