
qq-chat-history's Introduction

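QQ Chat History Extractor

Introduction

Extracts chat messages from QQ chat history files. Only chat histories in txt format are supported.

Installation

Install with pip. Python 3.9 or later is required.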

> pip install -U qq-chat-history

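Usage

The simplest way to run it is shown below. It automatically creates output.json in the current directory and writes the output there (if you installed into a virtual environment, make sure it is activated).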

> qq-chat-history /path/to/file.txt

Pass the --help option to see more configuration options.

> qq-chat-history --help
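
Once the CLI has written output.json, the file can be post-processed with the standard library alone. The sketch below counts messages per sender. It assumes output.json is a UTF-8 JSON array of objects with date, id, name, and content keys, as in the sample output quoted in the issue report further down; count_by_sender is a hypothetical helper, not part of qq-chat-history.

```python
import json
from collections import Counter
from pathlib import Path


def count_by_sender(path):
    """Count messages per sender id in a JSON file of message objects."""
    # Assumption: the file holds a JSON array of objects with an "id" key.
    messages = json.loads(Path(path).read_text(encoding='utf-8'))
    return Counter(msg['id'] for msg in messages)
```

For example, count_by_sender('output.json').most_common(3) would list the three most active senders.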

Alternatively, it can be used as a third-party library, as follows:

import qq_chat_history

lines = '''
=========
Pretend this is the file header that QQ generates automatically
=========

1883-03-07 11:22:33 A<[email protected]>
Text A1
Text A2

1883-03-07 12:34:56 B(123123)
Text B

1883-03-07 13:24:36 C(456456)
Text C

1883-03-07 22:00:51 A<[email protected]>
Text D
'''.strip().splitlines()

# Here lines can also be a file object, or a file path given as a string or a Path object.
for msg in qq_chat_history.parse(lines):
    print(msg.date, msg.id, msg.name, msg.content)

Note that parse returns a Body object, which is generally used as an Iterable[Message]. Body also provides a few functions of its own, though they are rarely needed.

Tips

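  • If you use it as a third-party library: the find_xxx methods, for example, look up messages with a given id or name in the data; the save method writes the data to a file in yaml or json format, although that job is usually done by running the CLI directly.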

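  • The parse function accepts several input types.

    • Iterable[str]: an iterable that yields one line at a time, such as a list or tuple.
    • TextIOBase: a text file object, such as a text file opened with open, or an io.StringIO.
    • str, Path: a file path, such as ./data.txt

    Each of these argument types is turned into a Body object by its corresponding method.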
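
To illustrate how a single entry point can accept all three kinds of input, here is a minimal sketch using only the standard library. It is not the library's implementation: read_lines and Source are hypothetical names, and UTF-8 is an assumed file encoding.

```python
import io
from pathlib import Path
from typing import Iterable, Iterator, Union

# Hypothetical alias covering the three kinds of input described above.
Source = Union[str, Path, io.TextIOBase, Iterable[str]]


def read_lines(source: Source) -> Iterator[str]:
    """Yield lines without their trailing newline from any supported source."""
    if isinstance(source, (str, Path)):
        # A str is itself an Iterable[str], so the path check must come first.
        # Assumption: the file is UTF-8 encoded.
        with open(source, encoding='utf-8') as fp:
            for line in fp:
                yield line.rstrip('\n')
    else:
        # Text file objects and plain iterables both yield one line at a time.
        for line in source:
            yield line.rstrip('\n')
```

Checking for str and Path before anything else matters because a string would otherwise be iterated character by character.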

qq-chat-history's People

Contributors

hikariyo


qq-chat-history's Issues

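Some chat records are parsed incorrectly

Hi! First of all, thank you for this project; it has been a real help in processing my own data. While using it, though, I found that some perfectly normal chat records cause errors during parsing. Details below:

  1. If a message consists of nothing but a date, it cannot be parsed correctly

Input

2021-11-30 22:30:36 username_hidden(123456789)
Go's syntax for this is rather unusual: it is month, day, hour, minute, second, year

2021-11-30 22:30:21 username_hidden(123456789)
2006-01-02 15:04:05

Output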

Traceback (most recent call last):
  File "C:\Users\[REMOVED]\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\[REMOVED]\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "[REMOVED]Scripts\qq-chat-history.exe\__main__.py", line 7, in <module>
  File "[REMOVED]lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "[REMOVED]lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "[REMOVED]lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "[REMOVED]lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "[REMOVED]lib\site-packages\qq_chat_history\cli.py", line 18, in run
    messages = [msg.__dict__ for msg in parser.parse(lines)]
  File "[REMOVED]lib\site-packages\qq_chat_history\cli.py", line 18, in <listcomp>
    messages = [msg.__dict__ for msg in parser.parse(lines)]
  File "[REMOVED]lib\site-packages\qq_chat_history\parser.py", line 63, in parse
    extracted_id = self._extract_id(line)
  File "[REMOVED]lib\site-packages\qq_chat_history\parser.py", line 116, in _extract_id
    raise LookupError(f'cannot find id in line {line}')
LookupError: cannot find id in line 2006-01-02 15:04:05

  2. If the nickname contains parentheses, the parse result may be wrong (the id and name fields)

Input

2021-12-22 21:18:57 ( _ ?(123456789)
Message content

2021-12-22 21:20:41 ( * ?(123456789)
Message content

2020-03-15 21:21:45 (o´・ω・`)σ<[email protected]>
Message content

Output

[
  {
    "date": "2021-12-22 21:18:57",
    "id": " _ ?(123456789",
    "name": "",
    "content": "Message content"
  },
  {
    "date": "2021-12-22 21:20:41",
    "id": " * ?(123456789",
    "name": "",
    "content": "Message content"
  },
  {
    "date": "2020-03-15 21:21:45",
    "id": "o´・ω・`",
    "name": "(o´・ω・`)σ<mail@someaddr",
    "content": "Message content"
  }
]

I hope this can be fixed to make the project more robust!
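
One possible way to handle both cases is sketched below. It is an illustration only, not the project's code: a line counts as a message header only when a timestamp is followed by a nickname and an id wrapped in (...) or <...> at the very end of the line. HEADER_RE, Header, and parse_header are hypothetical names, and the e-mail address in the usage example is made up.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical header pattern: timestamp, nickname, then the id in the last
# bracket pair, anchored to the end of the line.
HEADER_RE = re.compile(
    r'^(?P<date>\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}) '
    r'(?P<name>.*)'
    r'(?:\((?P<qq>[^()]+)\)|<(?P<mail>[^<>]+)>)$'
)


class Header(NamedTuple):
    date: str
    name: str
    id: str


def parse_header(line: str) -> Optional[Header]:
    """Return the parsed header, or None if the line is message content."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        # A bare date has no trailing id, so it is treated as content.
        return None
    return Header(m['date'], m['name'], m['qq'] or m['mail'])
```

With this pattern, parse_header('2006-01-02 15:04:05') returns None, and parse_header('2021-12-22 21:18:57 ( _ ?(123456789)') yields the name '( _ ?' and the id '123456789'. A content line that happens to look exactly like a full header would still be misread, so this narrows the problem rather than eliminating it.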
