knadh / tg-archive

A tool for exporting Telegram group chats into static websites like mailing list archives.

License: MIT License

Python 68.71% HTML 19.16% CSS 10.52% JavaScript 1.61%
telegram telegram-api static-site static-site-generator exporter telegram-export

tg-archive's Introduction

tg-archive is a tool for exporting Telegram group chats into static websites, preserving chat history like mailing list archives.

Preview

The @fossunited Telegram group archive.

How it works

tg-archive uses the Telethon Telegram API client to periodically sync messages from a group to a local SQLite database (file), downloading only new messages since the last sync. It then generates a static archive website of messages to be published anywhere.
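
The incremental sync described above can be sketched as follows. This is a minimal illustration, not tg-archive's actual code: the `messages` table, the `fetch_after` callable, and the in-memory data are all assumptions standing in for the real schema and the Telethon client.

```python
import sqlite3

def last_synced_id(conn):
    """Return the highest stored message ID, or 0 if the table is empty."""
    row = conn.execute("SELECT MAX(id) FROM messages").fetchone()
    return row[0] or 0

def sync(conn, fetch_after):
    """Store only messages newer than the last synced ID.

    fetch_after(min_id) is a hypothetical callable standing in for the
    Telegram API client; it yields (id, text) tuples with id > min_id.
    """
    start = last_synced_id(conn)
    rows = list(fetch_after(start))
    conn.executemany("INSERT INTO messages (id, text) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)

# In-memory stand-ins for the local database and the remote group.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, text TEXT)")
remote = [(1, "hello"), (2, "world"), (3, "again")]

def fake_fetch(min_id):
    return (m for m in remote if m[0] > min_id)

first = sync(conn, fake_fetch)    # first run stores all three messages
remote.append((4, "new"))
second = sync(conn, fake_fetch)   # second run stores only message 4
```

Because the starting point is read from the database itself, an interrupted run can resume from whatever was last committed.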

Features

  • Periodically sync Telegram group messages to a local DB.
  • Download user avatars locally.
  • Download and embed media (files, documents, photos).
  • Renders poll results.
  • Use emoji alternatives in place of stickers.
  • Single file Jinja HTML template for generating the static site.
  • Year / Month / Day indexes with deep linking across pages.
  • "In reply to" on replies with links to parent messages across pages.
  • RSS / Atom feed of recent messages.

Install

  • Get Telegram API credentials. These must be for a normal user account, not the Bot API.
    • If this page produces an alert stating only "ERROR", disconnect from any proxy/vpn and try again in a different browser.
  • Install with pip3 install tg-archive (tested with Python 3.8.6).

Usage

  1. tg-archive --new --path=mysite (creates a new site. cd into mysite and edit config.yaml).
  2. tg-archive --sync (syncs data into data.sqlite). Note: The first connection will prompt for your phone number and a Telegram auth code sent to the app. On successful auth, a session.session file is created. DO NOT SHARE this session file publicly as it contains the API authorization for your account.
  3. tg-archive --build (builds the static site into the site directory, which can be published)

Customization

Edit the generated template.html and static assets in the ./static directory to customize the site.

Note

  • The sync can be stopped (Ctrl+C) any time to be resumed later.
  • Set up a cron job to periodically sync messages and re-publish the archive.
  • Downloading large media files and long message history from large groups continuously may run into Telegram API's rate limits. Watch the debug output.
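
As a sketch of the cron-driven workflow in the notes above, a small Python wrapper could run the sync and build steps in order. Only the `--sync` and `--build` flags come from the Usage section; the function names and the `runner` parameter are hypothetical, added so the sequence can be exercised without Telegram credentials.

```python
import subprocess

def archive_commands():
    """Return the two tg-archive invocations from the Usage section, in order."""
    return [
        ["tg-archive", "--sync"],
        ["tg-archive", "--build"],
    ]

def run_archive(site_dir, runner=subprocess.run):
    """Run sync, then build, inside site_dir; check=True stops at the first failure."""
    for cmd in archive_commands():
        runner(cmd, cwd=site_dir, check=True)
```

A cron entry could then invoke a script that calls `run_archive` with the path to the site directory, followed by whatever publishing step is in use.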

Licensed under the MIT license.

tg-archive's People

Contributors

abhinavxd, b1tg, dependabot[bot], djerryz, faraazb, farzat07, harduino, iamcool0090, jcahill, knadh, l3str4nge, microchipq, milahu, scarlion1, seele0oo, thunderbottom

tg-archive's Issues

New Function

Hi owner, could you please add an export date option like Telegram has? As far as I know, Telegram lets group owners enable or disable the data export function in the official app, so I came to this tool, and it works very well. But it exports all messages, so it runs very slowly, and I don't really need the older messages.
So could you please add it? Thank you very much, and sorry if I'm disturbing you.

sqlite3.OperationalError: near "ON": syntax error

When I run tg-archive --sync I get the error:

Traceback (most recent call last):
  File "/usr/local/bin/tg-archive", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/tgarchive/__init__.py", line 119, in main
    Sync(cfg, args.session, DB(args.data)).sync(args.id)
  File "/usr/local/lib/python3.6/dist-packages/tgarchive/sync.py", line 69, in sync
    self.db.insert_user(m.user)
  File "/usr/local/lib/python3.6/dist-packages/tgarchive/db.py", line 174, in insert_user
    """, (u.id, u.username, u.first_name, u.last_name, " ".join(u.tags), u.avatar))
sqlite3.OperationalError: near "ON": syntax error

Maybe I was wrong somewhere with the configuration or is it a bug in the code?
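
One possible cause, offered as an assumption since the traceback does not show the SQL: the failing statement in `insert_user` may use SQLite's `ON CONFLICT ... DO UPDATE` (UPSERT) clause, which SQLite only understands from version 3.24.0 onward. The SQLite library bundled with an older Python install can predate that. A quick check (the helper name is hypothetical):

```python
import sqlite3

UPSERT_MIN_VERSION = (3, 24, 0)  # SQLite release that added ON CONFLICT ... DO UPDATE

def supports_upsert(version_info=sqlite3.sqlite_version_info):
    """True if the given SQLite library version understands UPSERT syntax."""
    return tuple(version_info) >= UPSERT_MIN_VERSION

print("SQLite library version:", sqlite3.sqlite_version)
print("UPSERT supported:", supports_upsert())
```

If this reports an older library, running tg-archive under a Python build linked against a newer SQLite would be the thing to try.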

The phone number is invalid (caused by SendCodeRequest) and no data found to publish site

Running into this error when running tg-archive --sync

Traceback (most recent call last):
  File "/home/lode/.local/bin/tg-archive", line 11, in <module>
    sys.exit(main())
  File "/home/lode/.local/lib/python3.6/site-packages/tgarchive/__init__.py", line 119, in main
    Sync(cfg, args.session, db.DB(args.data)).sync(args.id)
  File "/home/lode/.local/lib/python3.6/site-packages/tgarchive/sync.py", line 33, in __init__
    self.client.start()
  File "/home/lode/.local/lib/python3.6/site-packages/telethon/client/auth.py", line 133, in start
    else self.loop.run_until_complete(coro)
  File "/usr/lib/python3.6/asyncio/base_events.py", line 484, in run_until_complete
    return future.result()
  File "/home/lode/.local/lib/python3.6/site-packages/telethon/client/auth.py", line 189, in _start
    await self.send_code_request(phone, force_sms=force_sms)
  File "/home/lode/.local/lib/python3.6/site-packages/telethon/client/auth.py", line 515, in send_code_request
    phone, self.api_id, self.api_hash, types.CodeSettings()))
  File "/home/lode/.local/lib/python3.6/site-packages/telethon/client/users.py", line 30, in __call__
    return await self._call(self._sender, request, ordered=ordered)
  File "/home/lode/.local/lib/python3.6/site-packages/telethon/client/users.py", line 79, in _call
    result = await future
telethon.errors.rpcerrorlist.PhoneNumberInvalidError: The phone number is invalid (caused by SendCodeRequest)

session.session file is created though, running tg-archive --build gives
2021-05-25 17:02:31,213: building site
2021-05-25 17:02:31,236: no data found to publish site

Do I need to paste the group ID with "@" in the yaml file?
This is my yaml file, do I need to edit something?:

# Telegram API ID and hash from the Telegram dev portal.
# Signup for it here: https://my.telegram.org/auth?to=apps
api_id: "redacted"
api_hash: "redacted"
# Telegram channel / group name to import. Your user account
# that was used to create the API ID should be a member of this group.
group: "redacted"
# Avatars and media will be downloaded into media_dir.
download_media: True
download_avatars: True
avatar_size: [64, 64] # Width, Height.
media_dir: "media"
# These should be configured carefully to not get rate limited by Telegram.
# Number of messages to fetch in one batch.
fetch_batch_size: 2000
# Seconds to wait after fetching one full batch and moving on to the next one.
fetch_wait: 5
# Max number of messages to fetch across all batches before stopping.
# This should be greater than fetch_batch_size.
# Set to 0 to never stop until all messages have been fetched.
fetch_limit: 0
publish_dir: "site"
static_dir: "static"
per_page: 500
show_day_index: True
# URL to link Telegram group names and usernames.
telegram_url: "https://t.me/{id}"
# IMPORTANT: Telegram shows the full name on your (API key holder's)
# phonebook for users who are in your phonebook.
show_sender_fullname: False
publish_rss_feed: True
rss_feed_entries: 100 # Show Latest N messages in the RSS feed.
# Root URL where the site will be hosted. No trailing slash.
site_url: "https://mysite.com"
site_name: "@{group} - Telegram group archive"
site_description: "Public archive of Telegram messages."
meta_description: "@{group} {date} - Telegram message archive."
page_title: "Page {page} - {date} @{group} Telegram message archive."
                                                                                                                                                                                                                          

TypeError: 'NoneType' object is not iterable

Hi

Just tried tg-archive with a random group and stumbled upon the following error:

2021-05-27 11:49:19,763: error downloading avatar: #123456789: cannot identify image file <_io.BytesIO object at 0x6af3b1758900>
Traceback (most recent call last):
  File "/usr/local/bin/tg-archive", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/tgarchive/__init__.py", line 112, in main
    Sync(cfg, args.session, db.DB(args.data)).sync(args.id)
  File "/usr/local/lib/python3.9/site-packages/tgarchive/sync.py", line 60, in sync
    for m in self._get_messages(group_id,
  File "/usr/local/lib/python3.9/site-packages/tgarchive/sync.py", line 121, in _get_messages
    med = self._make_poll(m)
  File "/usr/local/lib/python3.9/site-packages/tgarchive/sync.py", line 182, in _make_poll
    for i, r in enumerate(msg.media.results.results):
TypeError: 'NoneType' object is not iterable

Any idea what's going on?
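
A later report in this tracker ties the same traceback to a poll on which nobody has voted, in which case `msg.media.results.results` is `None`. A defensive guard could look like the sketch below. The `poll_results` helper is hypothetical and the objects are plain stand-ins that only mimic the attribute chain from the traceback, not real Telethon types.

```python
from types import SimpleNamespace

def poll_results(msg):
    """Return the poll's per-option results, or an empty list if none exist yet."""
    outer = getattr(msg.media, "results", None)
    return list(getattr(outer, "results", None) or [])

# Stand-ins mimicking msg.media.results.results for an unvoted and a voted poll.
unvoted = SimpleNamespace(media=SimpleNamespace(results=SimpleNamespace(results=None)))
voted = SimpleNamespace(media=SimpleNamespace(results=SimpleNamespace(results=["a", "b"])))

for r in poll_results(unvoted):   # iterates zero times instead of raising TypeError
    print(r)
```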

Incorrect sort order for years in sidebar

I'm using tg-archive 0.3.9 with a Telegram group which existed for a few years already.

When creating the static website with tg-archive --build, the generated sidebar sorts the years incorrectly (see screenshot).

Sort order in sidebar

AttributeError: 'Channel' object has no attribute 'bot'

tg-archive --sync

This was the second run. On the first run I entered my phone number and the auth code.

2021-03-24 12:04:10,638: starting Telegram sync (batch_size=2000, limit=0, wait=5)
2021-03-24 12:04:10,655: Connecting to 149.154.167.51:443/TcpFull...
2021-03-24 12:04:10,703: Connection to 149.154.167.51:443/TcpFull complete!
Traceback (most recent call last):
  File "/usr/local/bin/tg-archive", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/tgarchive/__init__.py", line 112, in main
    Sync(cfg, args.session, db.DB(args.data)).sync(args.id)
  File "/usr/local/lib/python3.8/site-packages/tgarchive/sync.py", line 57, in sync
    for m in self._get_messages(self.config["group"],
  File "/usr/local/lib/python3.8/site-packages/tgarchive/sync.py", line 137, in _get_messages
    user=self._get_user(m.sender),
  File "/usr/local/lib/python3.8/site-packages/tgarchive/sync.py", line 143, in _get_user
    if u.bot:
AttributeError: 'Channel' object has no attribute 'bot'
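
The traceback shows that a message sender can be a `Channel`, which has no `bot` attribute. A tolerant check, sketched here with stand-in classes rather than the real Telethon types (the `is_bot` helper is hypothetical), could read the attribute with a default:

```python
class User:      # stand-in for a Telegram user entity
    def __init__(self, bot=False):
        self.bot = bot

class Channel:   # stand-in for a channel entity, which has no `bot` attribute
    pass

def is_bot(sender):
    """Treat senders without a `bot` attribute (or no sender at all) as non-bots."""
    return bool(getattr(sender, "bot", False))
```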

Permission denied on example project files

when i package tg-archive with the nix package manager
the example files in result/lib/python3.9/site-packages/tgarchive/example/ are read only

when i run tg-archive --new, the files in example/ are still read only

$ find example/ | xargs stat -c"%a %n"
555 example/
444 example/rss_template.html
444 example/config.yaml
444 example/template.html
555 example/static
444 example/static/thumb.png
444 example/static/main.js
444 example/static/style.css
444 example/static/logo.svg
444 example/static/favicon.png

solution https://stackoverflow.com/a/2853934/10440128
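
One way to handle this, sketched under the assumption that the example tree is copied with `shutil.copytree` (the helper names are hypothetical), is to add the owner-write bit to everything after copying:

```python
import os
import shutil
import stat
import tempfile

def make_owner_writable(path):
    """Add the owner-write bit while keeping all other permission bits."""
    os.chmod(path, os.stat(path).st_mode | stat.S_IWUSR)

def copy_writable(src, dst):
    """Copy a directory tree, then make the copy writable by its owner."""
    shutil.copytree(src, dst)
    make_owner_writable(dst)
    for root, dirs, files in os.walk(dst):
        for name in dirs + files:
            make_owner_writable(os.path.join(root, name))

# Demo on a throwaway directory that mimics the read-only example/ tree.
base = tempfile.mkdtemp()
src = os.path.join(base, "example")
os.mkdir(src)
with open(os.path.join(src, "config.yaml"), "w") as f:
    f.write("group: demo\n")
os.chmod(os.path.join(src, "config.yaml"), 0o444)
os.chmod(src, 0o555)

dst = os.path.join(base, "mysite")
copy_writable(src, dst)
dst_mode = stat.S_IMODE(os.stat(os.path.join(dst, "config.yaml")).st_mode)

os.chmod(src, 0o755)   # restore so the temporary tree can be deleted
shutil.rmtree(base)
```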

Links in message text

Sometimes a part of the message text hides a link to another message or even an external website. For example, the TelegramTips channel is full of such links. However, when the channel is archived and built into a website, these links are shown as regular text.

I am not sure whether the data is saved into the database but not included in the html, or the data is not even saved in the database in the first place. If it is saved in the database, I would appreciate if the relevant field is specified, so that I can include it in the template.

Download all groups additional setting

Have you thought about adding a setting to download all groups instead of just one? It would be False by default. Something like this in the YAML file:

download_all_user_groups: True

tg-archive --sync

2022-03-19 17:16:35,013: Failed to load SSL library: <class 'OSError'> (no library called "ssl" found)
2022-03-19 17:16:35,015: cryptg detected, it will be used for encryption
Traceback (most recent call last):
  File "C:\Users\Chy\AppData\Local\Programs\Python\Python38\Scripts\tg-archive-script.py", line 33, in <module>
    sys.exit(load_entry_point('tg-archive==0.5.4', 'console_scripts', 'tg-archive')())
  File "C:\Users\Chy\AppData\Local\Programs\Python\Python38\lib\site-packages\tg_archive-0.5.4-py3.8.egg\tgarchive\__init__.py", line 124, in main
    cfg = get_config(args.config)
  File "C:\Users\Chy\AppData\Local\Programs\Python\Python38\lib\site-packages\tg_archive-0.5.4-py3.8.egg\tgarchive\__init__.py", line 46, in get_config
    with open(path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'config.yaml'

Get error when catch null poll

The group had a poll on which no user had voted. When the program tries to fetch that message, it crashes.
If a user has voted, there is no error.

2021-10-26 17:20:12,606: fetching from last message id=7898 (2020-11-15 00:00:00)
Traceback (most recent call last):
  File "/usr/local/bin/tg-archive", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/tgarchive/__init__.py", line 112, in main
    Sync(cfg, args.session, db.DB(args.data)).sync(args.id)
  File "/usr/local/lib/python3.9/site-packages/tgarchive/sync.py", line 60, in sync
    for m in self._get_messages(group_id,
  File "/usr/local/lib/python3.9/site-packages/tgarchive/sync.py", line 121, in _get_messages
    med = self._make_poll(m)
  File "/usr/local/lib/python3.9/site-packages/tgarchive/sync.py", line 182, in _make_poll
    for i, r in enumerate(msg.media.results.results):
TypeError: 'NoneType' object is not iterable
Screenshot from 2021-10-26 17-30-32

Screenshot from 2021-10-26 17-33-04

If there is a vote:
Screenshot from 2021-10-26 17-30-52

then it works.

Loosen version requirements

When running tg-archive -b on Arch this error occurs:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 568, in _build_master
    ws.require(__requires__)
  File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 886, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 777, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (Pillow 8.4.0 (/usr/lib/python3.9/site-packages), Requirement.parse('Pillow==8.3.2'), {'tg-archive'})

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/tg-archive", line 33, in <module>
    sys.exit(load_entry_point('tg-archive==0.5.1', 'console_scripts', 'tg-archive')())
  File "/usr/lib/python3.9/site-packages/tgarchive/__init__.py", line 138, in main
    from .build import Build
  File "/usr/lib/python3.9/site-packages/tgarchive/build.py", line 5, in <module>
    import pkg_resources
  File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 3243, in <module>
    def _initialize_master_working_set():
  File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 3226, in _call_aside
    f(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 3255, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 570, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 583, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 772, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'Pillow==8.3.2' distribution was not found and is required by tg-archive

Running tg-archive -s though works perfectly.

Package from which tg-archive was installed: https://aur.archlinux.org/packages/tg-archive-git/
Version of Pillow installed: 8.4.0

Maybe the version requirements should be relaxed a bit? For example by specifying Pillow>=8.3.2 instead. This should help support package installation on rolling release systems.

FEATURE REQUEST - selective download of attachments

Would there be a simple way to not download video content? It's quite easy for groups to have gigs of data per day and would be nice to be able to not download that content.

I put this:

if isinstance(msg.media, telethon.tl.types.MessageMediaPhoto):

before this:

      fpath = self.client.download_media(msg, file=tempfile.gettempdir())
      basename = os.path.basename(fpath)

      newname = "{}.{}".format(msg.id, self._get_file_ext(basename))
      shutil.move(fpath, os.path.join(self.config["media_dir"], newname))

and it seems to work... I'm not sure what I'm excluding other than images though. I was hoping to exclude video, and that seems to work.
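
For context, the `isinstance(..., MessageMediaPhoto)` check above lets only photos through, so documents, audio, and other files are skipped along with video. A narrower filter could decide by MIME type instead. The sketch below is hypothetical: `should_download` and `skip_prefixes` are not existing tg-archive options, and the MIME type is assumed to come from `msg.media.document.mime_type` where a document is present.

```python
def should_download(mime_type, skip_prefixes=("video/",)):
    """Decide whether to download an attachment based on its MIME type.

    mime_type is None for media without a document (for example plain
    photos), which this sketch always downloads.
    """
    if mime_type is None:
        return True
    return not mime_type.startswith(tuple(skip_prefixes))
```

With the default prefixes, only video is skipped; passing `skip_prefixes=("video/", "audio/")` would skip audio as well.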

Thanks for a great program

'NoneType' object has no attribute 'mime_type'

Hi folks!

I'm trying to sync a one-to-one chat and see this error:

2021-05-24 17:34:12,013: downloading media #1122666
2021-05-24 17:34:12,015: Starting direct file download in chunks of 131072 at 0, stride 131072
Traceback (most recent call last):
  File "/usr/local/bin/tg-archive", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/tgarchive/__init__.py", line 112, in main
    Sync(cfg, args.session, db.DB(args.data)).sync(args.id)
  File "/usr/local/lib/python3.7/site-packages/tgarchive/sync.py", line 59, in sync
    ids=ids):
  File "/usr/local/lib/python3.7/site-packages/tgarchive/sync.py", line 112, in _get_messages
    m.media.document.mime_type == "application/x-tgsticker":
AttributeError: 'NoneType' object has no attribute 'mime_type'
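
The traceback indicates that `m.media.document` can be `None` (media without a document, such as a photo). A guarded lookup, sketched with stand-in objects rather than real Telethon messages and with hypothetical helper names, avoids the crash:

```python
from types import SimpleNamespace

def document_mime_type(msg):
    """Return the document MIME type, or None if any link in the chain is missing."""
    media = getattr(msg, "media", None)
    document = getattr(media, "document", None)
    return getattr(document, "mime_type", None)

def is_animated_sticker(msg):
    return document_mime_type(msg) == "application/x-tgsticker"

# Stand-ins: a photo (no document), a sticker, and a text-only message.
photo = SimpleNamespace(media=SimpleNamespace(document=None))
sticker = SimpleNamespace(
    media=SimpleNamespace(document=SimpleNamespace(mime_type="application/x-tgsticker"))
)
text_only = SimpleNamespace(media=None)
```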

[FEATURE REQUEST] Searchengine

Is it possible to add a sitewide search into the static html-page? We got a pretty big archive and finding old posts will be much easier with a searchengine included.

Django Support

Are you planning to add Django support to this script?

Run as bot

telethon.errors.rpcerrorlist.BotMethodInvalidError: The API access for bot users is restricted. The method you tried to invoke cannot be executed as a bot (caused by GetDialogsRequest)

It does not appear to support running as a bot.
The bot has been granted group administrator privileges and has permission to read messages.
Will running as a bot be supported?

Support takeout mode

It's the mode used by Telegram Desktop to export chats. Basically, it wraps every request in a "takeout session", which presumably lowers limits (as in, you can iter_messages with a wait_time=0). I took a quick look at the project files and it doesn't seem to be used yet. See documentation at client.takeout for usage.

Support user comments on posts

Love the tool and I think it's gonna play a very important role in data archiving.

Is it possible to implement downloading the comments on posts, to archive the discussion that people in a group had?
That would be a really important feature for my project.

Thanks a lot.

Option to remove timestamp

Hello,

Thank you so much for this! An option to hide the timestamp from the pages (largely irrelevant, as message ID still serves the purpose for threading / chronology) would do better to preserve privacy of members. We at CashlessConsumer are considering using this and this really unlocks a wealth of information within the community to a broader audience.

Ability to embed video and generate thumbnails

Hello, this script is great, but compared to Telegram's built-in export feature (which is slow and inefficient), I really miss having mp4 videos embedded in the page, by which I mean the thumbnail. I have little knowledge of HTML, but I'm almost sure editing template.html would not help, because the thumbnails are missing :/

It would be cool to see this option added to the script :)

Messages from foreign channel

While archiving messages from a public channel, messages from another channel @DebugSchool were included in the built website. Is this behaviour expected? And how can these messages be removed?

With the introduction of sponsors, I'm afraid that these might be sponsor messages. If that is the case, I believe an option to exclude these when building the website, or at least the rss feed, would be a good idea.

Sync private group

Hey, I'm trying to sync a private group and am seeing some issues. Private groups don't have a public handle like @group_name.
I've tried to retrieve a group id from the telegram web version:

  1. Open chat in web.telegram.org
  2. Link should look like https://web.telegram.org/#/im?p=sXXXXXXXXXX_YYYYYYYYYYYYYYYYYYYY
  3. Group ids usually look like -100XXXXXXXXXX.

I put that ID into the config, but it crashes. I also tried without the -100 prefix, and it crashes as well.

Here is the stacktrace:

2021-03-22 15:10:10,754: starting Telegram sync (batch_size=2000, limit=0, wait=5)
2021-03-22 15:10:10,774: Connecting to 149.154.167.51:443/TcpFull...
2021-03-22 15:10:10,826: Connection to 149.154.167.51:443/TcpFull complete!
Traceback (most recent call last):
  File "/usr/local/bin/tg-archive", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/tgarchive/__init__.py", line 112, in main
    Sync(cfg, args.session, db.DB(args.data)).sync(args.id)
  File "/usr/local/lib/python3.7/site-packages/tgarchive/sync.py", line 59, in sync
    ids=ids):
  File "/usr/local/lib/python3.7/site-packages/tgarchive/sync.py", line 101, in _get_messages
    reverse=True):
  File "/usr/local/lib/python3.7/site-packages/telethon/sync.py", line 39, in syncified
    return loop.run_until_complete(coro)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/base_events.py", line 583, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.7/site-packages/telethon/client/messages.py", line 574, in get_messages
    return await it.collect()
  File "/usr/local/lib/python3.7/site-packages/telethon/requestiter.py", line 113, in collect
    async for message in self:
  File "/usr/local/lib/python3.7/site-packages/telethon/requestiter.py", line 58, in __anext__
    if await self._init(**self.kwargs):
  File "/usr/local/lib/python3.7/site-packages/telethon/client/messages.py", line 26, in _init
    self.entity = await self.client.get_input_entity(entity)
  File "/usr/local/lib/python3.7/site-packages/telethon/client/users.py", line 432, in get_input_entity
    await self._get_entity_from_string(peer))
  File "/usr/local/lib/python3.7/site-packages/telethon/client/users.py", line 570, in _get_entity_from_string
    'Cannot find any entity corresponding to "{}"'.format(string)
ValueError: Cannot find any entity corresponding to "<redacted_group_id>"
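
The traceback ends in `_get_entity_from_string`, which suggests the ID reached Telethon as a string and was looked up like a username. One thing to try, offered as an assumption rather than a confirmed fix, is passing numeric IDs as integers (in YAML, an unquoted `-100XXXXXXXXXX` is read as an integer, a quoted one as a string). The `normalize_group` helper below is hypothetical:

```python
def normalize_group(value):
    """Return an int for numeric IDs and leave usernames as strings."""
    text = str(value).strip()
    try:
        return int(text)
    except ValueError:
        return text
```

Even as an integer, the ID may only resolve if the logged-in account is a member of the group and the session has already seen it.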

Configurable Privacy Options

While discussing this project and chat privacy in general, we (at CashlessConsumer) concluded that it would be useful for this project to have configurable privacy options, so that it can serve varied groups with different privacy needs.

Posting some initial thoughts on this

  1. Option to show / hide timestamp - the template can show / hide timestamp based on this.
  2. Link Mode: Option to pick only links / media and scrub off any comments. This lets the group archive only resources while skipping the chatter. It gives the community freedom to chat and prevents chatty noise in archives, while preserving value. This can be True / False.
  3. Include / Exclude from archives based on select hashtags: If LinkMode is True, a set of hashtags can be added as 'Include' hashtags so those important messages get archived, while leaving out the remaining chats. If LinkMode is False, a set of hashtags can be added as 'Exclude' hashtags (like #DontArchive #KeepThisPrivate), so that the group can still have private non-archived conversations even while in Full Archive mode.
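
The hashtag rule in point 3 could be sketched roughly as below. This covers only the hashtag part of the proposal (link and media detection from point 2 is left out), and `should_archive` and its parameters are hypothetical names, not existing tg-archive options.

```python
import re

HASHTAG = re.compile(r"#(\w+)")

def should_archive(text, link_mode, include=(), exclude=()):
    """Apply the proposed Include / Exclude hashtag rule to one message.

    In link mode, a message is kept only if it carries an 'Include' hashtag.
    In full archive mode, a message is dropped if it carries an 'Exclude' hashtag.
    Hashtags are compared case-insensitively.
    """
    tags = {t.lower() for t in HASHTAG.findall(text)}
    if link_mode:
        return bool(tags & {t.lower() for t in include})
    return not (tags & {t.lower() for t in exclude})
```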

Thoughts?

ERROR: No matching distribution found for cryptg==0.2.post2

Thanks for tg-archive, it looks useful.

Unfortunately, I got an error while installing tg-archive:

$ pip3 -V
pip 21.0.1 from /usr/local/lib/python3.9/site-packages/pip (python 3.9)
$ python -V
Python 2.7.16
$ python3 -V
Python 3.9.1
$ pip3 install tg-archive
Collecting tg-archive
  Using cached tg-archive-0.3.0.tar.gz (24 kB)
Collecting telethon==1.21
  Using cached Telethon-1.21-py3-none-any.whl (515 kB)
Collecting jinja2==2.11.3
  Using cached Jinja2-2.11.3-py2.py3-none-any.whl (125 kB)
Collecting PyYAML==5.4.1
  Using cached PyYAML-5.4.1-cp39-cp39-macosx_10_9_x86_64.whl (259 kB)
Collecting tg-archive
  Downloading tg-archive-0.2.0.tar.gz (23 kB)
ERROR: Cannot install tg-archive==0.2.0 and tg-archive==0.3.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    tg-archive 0.3.0 depends on cryptg==0.2.post2
    tg-archive 0.2.0 depends on cryptg==0.2.post2

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies
$ pip3 install tg-archive==0.3.0
Collecting tg-archive==0.3.0
  Using cached tg-archive-0.3.0.tar.gz (24 kB)
Collecting telethon==1.21
  Using cached Telethon-1.21-py3-none-any.whl (515 kB)
Collecting jinja2==2.11.3
  Using cached Jinja2-2.11.3-py2.py3-none-any.whl (125 kB)
Collecting PyYAML==5.4.1
  Using cached PyYAML-5.4.1-cp39-cp39-macosx_10_9_x86_64.whl (259 kB)
ERROR: Could not find a version that satisfies the requirement cryptg==0.2.post2 (from tg-archive)
ERROR: No matching distribution found for cryptg==0.2.post2

Todo: Incremental builds

For large groups, re-publishing and re-uploading every page isn't ideal. There should be a mechanism to build incrementally, maybe with an optional --incremental flag.

  1. The dynamic date index on the left sidebar is present in every single page. This should become a standalone page and be iframed in all other pages so that it can change independently without modifying older pages.
  2. What about pagination? As the month progresses, new page numbers appear at the bottom of every page for that month. Make this also an iframe? Or let the current month be rebuilt every time anyway as a trade-off?
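
One building block for incremental publishing, sketched here as an assumption and not as how tg-archive builds pages, is to rewrite a page only when its rendered content has changed, so unchanged files keep their modification times and upload tools can skip them. The `write_if_changed` name is hypothetical.

```python
import os
import tempfile

def write_if_changed(path, html):
    """Write html to path only if the content differs; return True if written."""
    data = html.encode("utf-8")
    if os.path.exists(path):
        with open(path, "rb") as f:
            if f.read() == data:
                return False
    with open(path, "wb") as f:
        f.write(data)
    return True

# Demo: the second call is skipped because the content is identical.
with tempfile.TemporaryDirectory() as site:
    page = os.path.join(site, "2021-05.html")
    results = [
        write_if_changed(page, "<p>v1</p>"),
        write_if_changed(page, "<p>v1</p>"),
        write_if_changed(page, "<p>v2</p>"),
    ]
```

This only pays off once the sidebar and pagination concerns above are addressed, since any markup shared by every page still changes every page.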

Why running with crontab showing sqlite3.OperationalError: near "(": syntax error

The script works well from the command line, but when run from cron it shows the following error.

2022-05-08 19:40:02,089: cryptg detected, it will be used for encryption
2022-05-08 19:40:02,596: starting Telegram sync (batch_size=4000, limit=0, wait=5, mode=standard)
2022-05-08 19:40:02,601: Connecting to 91.108.56.108:443/TcpFull...
2022-05-08 19:40:02,606: Connection to 91.108.56.108:443/TcpFull complete!
2022-05-08 19:40:02,688: fetching from last message id=2102 (2022-05-07 00:00:00)
2022-05-08 19:40:02,875: finished. fetched 0 messages. last message = 2022-05-07 00:00:00
2022-05-08 19:40:28,689: building site
Traceback (most recent call last):
  File "/usr/local/bin/tg-archive", line 11, in <module>
    load_entry_point('tg-archive==0.5.4', 'console_scripts', 'tg-archive')()
  File "/usr/local/lib/python3.6/site-packages/tgarchive/__init__.py", line 150, in main
    b.build()
  File "/usr/local/lib/python3.6/site-packages/tgarchive/build.py", line 58, in build
    for d in self.db.get_dayline(month.date.year, month.date.month, self.config["per_page"]):
  File "/usr/local/lib/python3.6/site-packages/tgarchive/db.py", line 128, in get_dayline
    """, (limit, "{}{:02d}".format(year, month)))
sqlite3.OperationalError: near "(": syntax error
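
A syntax error that appears only under cron can mean cron is running a different Python interpreter, and therefore a different bundled SQLite library, than the interactive shell. That is an assumption, as the SQL in `get_dayline` is not shown here. The sketch below (the function name is hypothetical) prints the values worth comparing between the two environments:

```python
import sqlite3
import sys

def runtime_report():
    """Collect the interpreter path and library versions for comparison."""
    return {
        "python": sys.executable,
        "python_version": sys.version.split()[0],
        "sqlite_library": sqlite3.sqlite_version,
    }

for key, value in runtime_report().items():
    print(f"{key}: {value}")
```

Running it once from the shell and once from a cron entry, then comparing the output, would show whether the two environments differ.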

Code

After entering mobile number, no code is sent.

Telegram requires new update

telethon.errors.rpcbaseerrors.AuthKeyError: RPCError 406: UPDATE_APP_TO_LOGIN
As far as I know, Telegram has changed something that causes this. I hope you can fix it soon ^^. Thank you for creating a great app.

sqlite3.OperationalError

I am getting the following error when running tg-archive --sync

# tg-archive --sync
2022-03-03 08:49:22,650: cryptg detected, it will be used for encryption
2022-03-03 08:49:22,901: starting Telegram sync (batch_size=2000, limit=0, wait=5, mode=standard)
2022-03-03 08:49:23,122: Connecting to 149.154.175.60:443/TcpFull...
2022-03-03 08:49:23,165: Connection to 149.154.175.60:443/TcpFull complete!
2022-03-03 08:49:23,629: fetching from last message id=0 (None)
Traceback (most recent call last):
  File "/usr/local/bin/tg-archive", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/tgarchive/__init__.py", line 132, in main
    s.sync(args.id, args.from_id)
  File "/usr/local/lib/python3.6/dist-packages/tgarchive/sync.py", line 65, in sync
    self.db.insert_user(m.user)
  File "/usr/local/lib/python3.6/dist-packages/tgarchive/db.py", line 174, in insert_user
    """, (u.id, u.username, u.first_name, u.last_name, " ".join(u.tags), u.avatar))
sqlite3.OperationalError: near "ON": syntax error

No Images/Movies Downloaded since Oct 2020

I have setup tg-archive to download the channel MARKmobil just to have a backup of all his work.

The website can be found here: https://markmobil.borg.ch/telegram/2020-10.html#2020-10-06

The web pages generated from the downloaded data look fine until September 2020, the last picture i can see is from October 2020, and after that all movies and pictures are missing.

I can see many of the following error messages with differing media number:
2022-02-24 11:13:23,855: downloading media #2690
2022-02-24 11:13:23,856: Starting direct file download in chunks of 131072 at 0, stride 131072
2022-02-24 11:13:24,030: error downloading media: #2690: The file reference has expired and is no longer valid or it belongs to self-destructing media and cannot be resent (caused by GetFileRequest)

If I look into the channel using telegram-desktop I can still see the latest pictures.

skipping admin posts

Hello,

My tg-archive seems to be skipping admin posts ever since a group I follow got multiple admins. The admins post under NAMEOFGROUP (Admin1) or NAMEOFGROUP (Admin2).

Tg-archive doesn't seem to raise an exception; it just says fetched 0 messages when I point to the post in particular.

Any ideas?

Thanks in advance

timeless build

Add an option for a "timeless build": on build, don't update these times:

site/index.xml
<lastBuildDate>Wed, 08 Jun 2022 08:57:59 +0000</lastBuildDate>

site/index.atom
<updated>2022-06-08T08:57:59.732044+00:00</updated>

these updates produce "diff noise" when the build is stored in git

workaround: remove the tags from site/index.xml and site/index.atom

sed -i -E 's|<lastBuildDate>[^<]+</lastBuildDate>||g' site/index.xml
sed -i -E 's|<updated>[^<]+</updated>||g' site/index.atom
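
The same workaround can be expressed in Python, for example as a post-build step. This is a sketch that mirrors the two sed expressions above, and the function name is hypothetical.

```python
import re

def strip_build_times(xml):
    """Remove <lastBuildDate> and <updated> elements, like the sed commands above."""
    xml = re.sub(r"<lastBuildDate>[^<]+</lastBuildDate>", "", xml)
    return re.sub(r"<updated>[^<]+</updated>", "", xml)

rss = "<channel><lastBuildDate>Wed, 08 Jun 2022 08:57:59 +0000</lastBuildDate></channel>"
atom = "<feed><updated>2022-06-08T08:57:59.732044+00:00</updated></feed>"
clean_rss = strip_build_times(rss)
clean_atom = strip_build_times(atom)
```

Note that removing `<updated>` may leave the Atom feed invalid for strict parsers, since Atom expects that element.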

Unicode Error

When building the website, I receive the following error:


  File "C:\Users\PycharmProjects\vsett_backup\venv2\Scripts\tg-archive-script.py", line 33, in <module>
    sys.exit(load_entry_point('tg-archive==0.3.4', 'console_scripts', 'tg-archive')())
  File "c:\users\pycharmprojects\vsett_backup\venv2\lib\site-packages\tgarchive\__init__.py", line 126, in main
    b.build()
  File "c:\users\pycharmprojects\vsett_backup\venv2\lib\site-packages\tgarchive\build.py", line 83, in build
    self._render_page(messages, month, dayline,
  File "c:\users\pycharmprojects\vsett_backup\venv2\lib\site-packages\tgarchive\build.py", line 117, in _render_page
    f.write(html)
  File "C:\Users\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f923' in position 12123: character maps to <undefined>

This seems to be resolved by editing line 116 in build.py to include encoding='utf-8' after "w".
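
The effect of that change can be seen in isolation. Without an explicit `encoding`, `open()` in text mode uses the platform default (cp1252 in the traceback above), which cannot represent the emoji. The sketch below writes the same character with `encoding="utf-8"`; the file name is a stand-in.

```python
import os
import tempfile

html = "<p>\U0001f923</p>"  # the character from the traceback above

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page.html")
    # Explicit UTF-8 makes the write independent of the platform default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    with open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()
```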
