mlomb / chat-analytics

Generate interactive, beautiful and insightful chat analysis reports

Home Page: https://chatanalytics.app

License: GNU Affero General Public License v3.0

TypeScript 89.48% HTML 3.07% Less 5.03% JavaScript 2.19% C++ 0.17% Dockerfile 0.07%

chat analytics analysis analyzer app telegram discord whatsapp chat-analysis chat-analyzer data-visualization cli

chat-analytics's Introduction

Open App • View Demo • Use CLI

A web app that takes chat exports from supported platforms and generates a single HTML file containing information, statistics and interactive graphs about them. Privacy is its main concern; chat data never leaves the device when generating reports. Self-host with Docker!

💬 MESSAGES	🅰️ LANGUAGE	😃 EMOJI	🔗 LINKS	📞 CALLS	🌀 INTERACTION	💙 SENTIMENT	📅 TIMELINE

You can interact with the demo here!

Chat platform support

You can generate reports from the following platforms:

Platform	Formats supported	Text content	Edits & Replies	Attachment Types	Reactions	Profile picture	Mentions	Calls
Discord	`json` from DiscordChatExporter	✅	✅	✅	✅	✅ (until link expires)	✅ (as text)	✅
Messenger	`json` from Facebook DYI export	✅	❌	✅	❌	❌	✅ (as text)	❌
Telegram	`json` from Telegram Desktop	✅	✅	✅	❌ (not provided)	❌	✅ (as text)	✅
WhatsApp	`txt` or `zip` exported from a phone	✅	❌ (not provided)	✅* (if exported from iOS) 🟦 (generic if exported from Android)	❌ (not provided)	❌	✅ (as text)	❌

* not all languages are supported, check WhatsApp.ts.

You can't combine exports from different platforms.
Contributions of new platform parsers are always welcome 🙂

Privacy & Analytics

Since all chat data always stays in the browser, there is zero risk of someone reading your chats. Note that the report HTML file contains sensitive information (one may reconstruct message contents for every message), so share your reports with people you trust.

The site does not use cookies and relies on a self-hosted version of Plausible. No event contains PII, and the information is segmented (e.g. 1MB-10MB, etc.). Since I want full transparency, you can check the site analytics here.

CLI

You can generate reports from the command line using npx chat-analytics:

Usage: chat-analytics -p <platform> -i <input files>

Options:
      --help      Show help                                            [boolean]
      --version   Show version number                                  [boolean]
  -p, --platform  The platform to generate for
   [string] [required] [choices: "discord", "messenger", "telegram", "whatsapp"]
  -i, --inputs    The input file(s) to use (glob)             [array] [required]
  -o, --output    The output HTML filename     [string] [default: "report.html"]
      --demo      Mark the report as a demo           [boolean] [default: false]

For example:

npx chat-analytics -p discord -i "exported/*.json" -o report.html

Docs & Development

You can read docs/README.md for technical details, and docs/DEV.md for development instructions.
In docs/TODO.md you can find ideas and pending stuff to be implemented.

Acknowledgements

FastText, a library by Facebook for efficient sentence classification. MIT licensed.
lid.176.ftz model, provided by FastText developers for language identification. Distributed under CC BY-SA 3.0.
multilang-sentiment, for the translated AFINN database. MIT licensed.
Emoji sentiment data from the work of Kralj Novak, Petra; Smailović, Jasmina; Sluban, Borut and Mozetič, Igor, 2015, Emoji Sentiment Ranking 1.0, Slovenian language resource repository CLARIN.SI, http://hdl.handle.net/11356/1048. Licensed with CC BY-SA 4.0.
stopwords-iso for a collection of stopwords in a variety of languages. MIT licensed.
All the libraries and tools that made this project possible 😀

License

AGPLv3. See LICENSE.

chat-analytics's Issues

[Refactor] Create a new progress as part of downloadFile

Whenever a file is being downloaded, a new progress is created to display it with the title always being the same "Downloading file" and the subject always being the filename which downloadFile can easily extract from filepath. For example, here is an extract from DatabaseBuilder where 4 different files are downloaded. Only 3 of them use progress.new, but it seems reasonable to show a progress for "lid.176" too.

chat-analytics/pipeline/process/DatabaseBuilder.ts

Lines 85 to 120 in 5546201

    
    // load stopwords
    {
        progress.new("Downloading file", "stopwords-iso.json");
        interface StopwordsJSON {
            [lang: string]: string[];
        }
        const data = (await downloadFile("/data/stopwords-iso.json", "json")) as StopwordsJSON;

        // combining all stopwords is a mistake?
        this.stopwords = new Set(
            Object.values(data)
                .reduce((acc, val) => acc.concat(val), [])
                .map((word) => matchFormat(word))
        );
        progress.done();
    }

    // load language detector model
    this.langPredictModel = await loadFastTextModel("lid.176");

    // load emoji data
    {
        progress.new("Downloading file", "emoji-data.json");
        const data = await downloadFile("/data/emoji-data.json", "json");
        this.emojisData = new Emojis(data);
        progress.done();
    }

    // load sentiment data
    {
        progress.new("Downloading file", "AFINN.zip");
        const afinnZipBuffer = await downloadFile("/data/AFINN.zip", "arraybuffer");
        progress.done();
        this.sentiment = new Sentiment(afinnZipBuffer, this.emojisData);
    }

So I think it would make sense to create this progress inside downloadFile itself.

chat-analytics/pipeline/File.ts

Lines 29 to 45 in 5546201

    
    export function downloadFile(filepath: string, responseType: "json"): Promise<any>;
    export function downloadFile(filepath: string, responseType: "text"): Promise<string>;
    export function downloadFile(filepath: string, responseType: "arraybuffer"): Promise<ArrayBuffer>;
    export function downloadFile(filepath: any, responseType: XMLHttpRequestResponseType): Promise<any> {
        return new Promise((resolve, reject) => {
            let xhr = new XMLHttpRequest();
            xhr.responseType = responseType;
            xhr.open("GET", filepath);
            xhr.onload = function () {
                if (xhr.status === 200) resolve(xhr.response);
                else reject(xhr.statusText);
            };
            xhr.onerror = (e) => reject("XHR Error");
            xhr.onprogress = (e) => progress.progress("bytes", e.loaded || 0, e.total <= 0 ? undefined : e.total);
            xhr.send();
        });
    }
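
The refactor could look roughly like the sketch below. It is not the project's implementation: `ProgressReporter`, `fileNameFromPath` and `downloadWithProgress` are hypothetical names, and the actual download function is passed in as a parameter so the example stays self-contained.

```typescript
// Hypothetical minimal shape of the progress object used in the excerpts above.
interface ProgressReporter {
    new: (title: string, subject?: string) => void;
    done: () => void;
}

// Extracts "AFINN.zip" from "/data/AFINN.zip".
function fileNameFromPath(filepath: string): string {
    const parts = filepath.split("/");
    return parts[parts.length - 1];
}

// Wraps any download function so that the progress entry is created and closed
// in one place instead of at every call site.
function downloadWithProgress<T>(
    progress: ProgressReporter,
    filepath: string,
    download: (filepath: string) => Promise<T>
): Promise<T> {
    progress.new("Downloading file", fileNameFromPath(filepath));
    // As in the call sites above, the progress is only closed on success.
    return download(filepath).then((result) => {
        progress.done();
        return result;
    });
}
```

With something like this, call sites such as the "lid.176" model download would get a progress entry without any extra code.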

Don't show legend in 'Timeline' tab for reports with only one guild

[Feature request] Store dates and times as UNIX timestamps

This relates to one of the items listed in TODO.md

chat-analytics/TODO.md

Line 13 in 5546201

    
           * Currently, the timezone is picked from the machine that generates the report, it should be an option.

Currently, dates and times are stored as the number of days since the first date included in the report (dayIndex) and the number of seconds into that day (secondOfDay).

chat-analytics/pipeline/serialization/MessageView.ts

Lines 8 to 9 in 4482358

    
    readonly dayIndex: Index;
    readonly secondOfDay: number;

Of course, these values are timezone-specific, resulting in the need for a config option as listed in the TODO above. However, storing them as a timestamp would allow the date and time shown to the user to be adjusted based on the timezone suggested by their own browser. Most platforms export dates and times as timestamps anyway, so keeping them would slightly improve both the speed at which a report can be generated and the maintainability of the code, since the timestamp would not have to be converted to a dayIndex and secondOfDay.

Now, I understand that the purpose of dayIndex and secondOfDay is to improve compression, but I'd argue a UNIX timestamp could be compressed just as well, if not better. First, I'll talk theoretically, then I'll use more realistic estimates.

Most platforms are designed to support dates between 01/01/1970 and 19/01/2038, so the theoretical maximum of dayIndex is 24855, which takes $\lceil \log_2 24855 \rceil = 15$ bits to store. There are $60 \cdot 60 \cdot 24 = 86400$ seconds in a day, which takes $\lceil \log_2 86400 \rceil = 17$ bits to store, as correctly used below.

chat-analytics/pipeline/serialization/MessageSerialization.ts

Line 31 in 4482358

stream.setBits(17, message.secondOfDay); // 0-2^17 (needed 86400)

This adds up to 32 bits, which is the same number of bits used to store UNIX timestamps. That makes sense, since they store roughly the same information.

Now, realistically, users usually only have something like 3 years' worth of data included in a report, which is 1095 days. This would make dayIndex require $\lceil \log_2 1095 \rceil = 11$ bits, or 28 bits once you include secondOfDay. In contrast, a UNIX timestamp is essentially a 'secondIndex', so it would only require $\lceil \log_2(1095 \cdot 86400) \rceil = 27$ bits. Of course, this is a small difference, but it shows that UNIX timestamps would never require more bits to store and can sometimes use fewer. This is because a single index can use the fractional bits that dayIndex and secondOfDay each round up separately.
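
The bit counts in this issue can be checked with a few lines of TypeScript. This is only a sketch of the arithmetic; `bitsNeeded` is a hypothetical helper and not part of chat-analytics.

```typescript
// Hypothetical helper: smallest number of bits that can represent `count` distinct values.
function bitsNeeded(count: number): number {
    let bits = 0;
    while (Math.pow(2, bits) < count) bits++;
    return bits;
}

const SECONDS_PER_DAY = 60 * 60 * 24; // 86400

// Theoretical maximum: every day between 01/01/1970 and 19/01/2038.
const theoretical = {
    dayIndex: bitsNeeded(24855), // 15 bits
    secondOfDay: bitsNeeded(SECONDS_PER_DAY), // 17 bits
};

// Realistic estimate: about 3 years of messages.
const days = 1095;
const splitBits = bitsNeeded(days) + bitsNeeded(SECONDS_PER_DAY); // 11 + 17 = 28 bits
const singleIndexBits = bitsNeeded(days * SECONDS_PER_DAY); // 27 bits
```

Note that the single index here counts seconds since the first day of the report, not since 1970, which is what makes the comparison with dayIndex fair.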

I would still suggest that a config option be included, because sometimes the user generating a report may not want results to be shown differently based on the timezone the report is shown in since this could cause confusion. However, for most users I don't think this would be an issue.

PS: I don't know why I put so much effort writing this issue considering I already discussed this with mlomb in DMs lol

[Question]: How is sentiment calculated

One of my chats has around 20% negative messages, but I don't understand what this means. How is it determined if a message is "positive" or "negative"?

Less/least order for stats

~~Instead of a positive/negative dropdown,~~ I think there should be a dedicated icon for changing the direction of the sort. This icon could be used ~~elsewhere~~ like:

'Messages sent by channel' to switch to showing the channels with the least messages, which could be useful for identifying stale channels that an owner may want to delete
'Emojis used in [text / reactions]' to switch to showing the least used emojis. This is only really useful for custom emojis, but can again be used for identifying pointless emojis that an owner may want to delete. This is especially useful since Discord servers only have a limited number of custom emoji slots, so if an owner wants to add a new one they may need to find one to replace
'Messages sent by author' to switch to showing the authors who have sent the least messages. This might be useful for small private servers where owners only want to keep active users. It is also useful for staff applications, as an owner could reject a user's application if they aren't active enough

Originally posted by @hopperelec in #26 (comment)

I like this idea that was proposed by @hopperelec in another issue. I have no plans to implement it myself for now, but I'll keep the issue open

DST in Telegram exports

Hello, I get this error when I wait for the report to be generated, a friend told me that the cause might be the summer/winter time change if that helps.

Error parsing file "result.json":

MessagesInterval can only be extended to the future

Big export (>20GB) has broken data

When compacting message data, if there are more than 16,776,540 messages, the program crashes because the Node.js maximum Map size is exceeded. Below is the error (with some information, such as my Windows username, redacted for privacy reasons):

Compacting messages data 16.776.540/18.172.487
C:\Users(redacted)\node_modules\chat-analytics\dist\pipeline\process\DatabaseBuilder.js:306
messageAddresses.set(id, finalMessages.byteLength);
^

RangeError: Map maximum size exceeded
at Map.set ()
at DatabaseBuilder.compactMessagesData (C:\Users(redacted)\node_modules\chat-analytics\dist\pipeline\process\DatabaseBuilder.js:306:34)
at DatabaseBuilder.build (C:\Users(redacted)\node_modules\chat-analytics\dist\pipeline\process\DatabaseBuilder.js:331:40)
at generateDatabase (C:\Users(redacted)\node_modules\chat-analytics\dist\pipeline\index.js:15:20)
at async C:\Users(redacted)\node_modules\chat-analytics\dist\lib\CLI.js:93:16

Node.js v18.10.0

Remove misleading nicks

The exported analytics page seemingly uses the user's nickname instead of username, with their discriminator at the end. This leads to misleading and confusing data, where if a user is nicked, they can appear to be another user.

Hopefully this can be changed to show the proper username instead of the nickname, or possibly to offer a way to choose whether you want to see the nickname or the username.

Incorrect counting of mentions (Discord) QoL

This is an amazing app, your work motivated me to start learning about language parsing and processing.

But I've noticed that in the (Discord) report → Interaction → Most Mentioned, there are mentions that don't exist and were counted from links in sent messages.
E.g. the message:

have you seen this: https://medium.com/@mostsignificant/3-ways-to-parse....

will count mostsignificant as a mention.
I was unable to find a solution or the problematic function (I've never used JavaScript/TypeScript much). I don't find it a big problem; this is just feedback.
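
One possible way to avoid this is to drop URLs before looking for mentions. The sketch below only illustrates the idea; the two patterns and `extractMentions` are assumptions and not how chat-analytics actually tokenizes messages.

```typescript
// Assumed patterns, for illustration only.
const URL_PATTERN = /https?:\/\/\S+/g;
// A mention must start the text or follow whitespace, so "a@b.com" is not matched.
const MENTION_PATTERN = /(^|\s)@([A-Za-z0-9_]+)/g;

function extractMentions(content: string): string[] {
    // Replace links with a space so "@name" fragments inside URLs are never seen.
    const withoutUrls = content.replace(URL_PATTERN, " ");
    const mentions: string[] = [];
    let match: RegExpExecArray | null;
    MENTION_PATTERN.lastIndex = 0; // the regex is global, so reset its state
    while ((match = MENTION_PATTERN.exec(withoutUrls)) !== null) {
        mentions.push(match[2]);
    }
    return mentions;
}
```

With this approach the example message above yields no mentions, while a message such as "hi @alice and @bob" still yields both names.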

Have the most common capitalization shown for top words

Wouldn't it make more sense to have the most common capitalization shown for top words instead of (I assume) the first occurrence it detected?

In my case, the only reason for this would be because it's the ~~only~~ actually there are two top words shown capitalized, making them look just a little out of place next to the others:

Screenshot

Or would that be too performance-intensive?
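
As a rough idea of the cost, picking the most common capitalization only needs one counter per distinct spelling, collected in a single pass. This is a sketch under that assumption; `mostCommonCapitalization` is a hypothetical helper and not part of the project.

```typescript
// Maps each lowercased word to the spelling that was seen most often.
function mostCommonCapitalization(tokens: string[]): Map<string, string> {
    // lowercased word -> (exact spelling -> number of occurrences)
    const variants = new Map<string, Map<string, number>>();
    for (const token of tokens) {
        const key = token.toLowerCase();
        let counts = variants.get(key);
        if (counts === undefined) {
            counts = new Map<string, number>();
            variants.set(key, counts);
        }
        counts.set(token, (counts.get(token) || 0) + 1);
    }

    const display = new Map<string, string>();
    variants.forEach((counts, key) => {
        let best = key;
        let bestCount = 0;
        // On a tie, the spelling that appeared first wins.
        counts.forEach((count, spelling) => {
            if (count > bestCount) {
                best = spelling;
                bestCount = count;
            }
        });
        display.set(key, best);
    });
    return display;
}
```

The extra memory grows with the number of distinct spellings rather than with the number of messages, so the overhead should be modest.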

Options to disable analytics

Hello !

In https://github.com/mlomb/chat-analytics/blob/main/assets/Plausible.ts, if we run this app self-hosted, data is still sent to your servers without any "proper" way to disable it entirely.

If you agree, I can open a PR adding an env variable to disable analytics reporting (enabled by default).

Waiting for your green light or remarks.

Categorize domains by content

People using the 'External' tab are most likely trying to discover what type of content people usually link to.
I think they should be categorized, likely using some sort of website classification API such as Klazify. This information could be displayed as another bar chart or a pie chart.
However, some people may consider using a website classification API as a privacy issue. It might be possible to do this offline using a dataset or machine learning model, but I have not been able to find one available for free.

Negative `editedAfter` value causing serialization to break

I exported 500K Telegram messages and Chat Analytics worked fine. I then tried with 1M from another group, so twice as many, and now I'm getting an error:

msedge_e92VWudO2V.mp4

Messages -> Sentiment -> Timeline

Messages are being counted twice (duplicates)

If you upload two of the same Discord exports, each message is counted twice.
This is important because if you have exports from overlapping time periods, messages will not be handled correctly.
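
One way to avoid double counting is to skip messages whose ID has already been seen. The sketch below assumes every message carries a stable ID, which holds for Discord exports but not for every platform; `ExportedMessage` and `dedupeMessages` are hypothetical names.

```typescript
// Hypothetical, simplified message shape.
interface ExportedMessage {
    id: string;
    channelId: string;
    content: string;
}

// Keeps the first occurrence of every (channel, message ID) pair.
function dedupeMessages(messages: ExportedMessage[]): ExportedMessage[] {
    const seen = new Set<string>();
    const unique: ExportedMessage[] = [];
    for (const message of messages) {
        const key = message.channelId + ":" + message.id;
        if (seen.has(key)) continue; // already counted from another export
        seen.add(key);
        unique.push(message);
    }
    return unique;
}
```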

[Feature Request] RegEx in language filter

This might already be possible, but I'd like the ability to use a regular expression for a more customized search when looking up the most used words.

[Feature request] Use the most up-to-date user information available

This is another niche thing, and I'm honestly not expecting you to want it implemented, but it's something that bothers me with the specific way I happen to be using ChatAnalytics, so I thought I'd mention it.

I like to use exports from some guilds or group chats which have often been deleted since I exported them, meaning I am no longer able to update them. I'm honestly surprised the data is still in a format ChatAnalytics can read. One of the guilds used to have an error relating to some property being missing from the export, but something changed in this PR has caused even this export to work (I have yet to look into which property it is or what fixed it; I'll do this later).

But anyway, it works except for one annoying thing: lots of the user information is outdated in these reports, so if I upload any of them first then I notice lots of old (and nostalgic!) usernames, lots of missing profile pictures, and for some reason deleted users display as 'Deleted User0' instead of 'Deleted User#USER_ID', which I've noticed ChatAnalytics usually does for Webhooks but not users.

What I'd like to suggest is that this type of information is overwritten by more recent exports. So, if one export is read, but then a newer one is read with different information, the information is changed to match that of the newer export instead of staying as the old data.
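
The suggestion could be sketched as follows. It assumes each export can be given a timestamp, here a hypothetical `exportedAt` field; `AuthorSnapshot` and `mergeAuthors` are also hypothetical and not the project's data model.

```typescript
// Hypothetical author record as read from a single export.
interface AuthorSnapshot {
    id: string;
    name: string;
    avatar?: string;
    exportedAt: number; // assumed: UNIX time in ms at which the export was made
}

// Keeps, for every author ID, the snapshot from the most recent export,
// regardless of the order in which the files were read.
function mergeAuthors(snapshots: AuthorSnapshot[]): Map<string, AuthorSnapshot> {
    const latest = new Map<string, AuthorSnapshot>();
    for (const snapshot of snapshots) {
        const current = latest.get(snapshot.id);
        if (current === undefined || snapshot.exportedAt > current.exportedAt) {
            latest.set(snapshot.id, snapshot);
        }
    }
    return latest;
}
```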

Call statistics

Hey; this is an amazing tool and it's really great to look at all the stats. While using it, I've thought that it could be a great addition if it also supported call statistics.
At least the Discord exports save the beginning and ending time of each call (in DMs & group chats). That could be parsed as well, and stats about it could be displayed in a new tab.

I'd imagine it to be similar to the Message statistics, just using the call information instead of the message information:

The time of calls spent over the time by month / week / day (could be used in the same way as Messages sent over time) as the main graph
Activity by week day & hour (split / heat map) (could also work in the same way as the message activity, just using the time on call per day instead of messages per day as the base data)
Call statistics (such as the Message statistics in the message stats) could be
- Total amount of calls
- Total time of all calls
- Average duration of a call
- Median duration of a call
- Three out of four calls were shorter / longer than
- Longest call
- Amount of calls longer / shorter than e.g. an hour
- Total amount of messages sent during calls
- Average amount of messages sent during a call
- Average amount of calls per month
- Average time between calls
- Median time between calls
- Most active… (used just like the message pendant using call time instead of the amount of messages)
  - year ever
  - month ever
  - day ever
Duration of the calls (could be a chart similar to Time between sending and editing, showing how long the calls typically are)
Total amount of messages sent during calls / Average amount of messages sent during a call by author (could be like the Messages sent by author part; maybe also split into an "amount of messages" and a "% of messages" part like the Edited messages by author part)
Amount of calls / Total time of calls by call starter (could work in the same way as Messages sent by author)

All these statistics and graphs could also be filtered by the Call starter (or Call author), meaning the person who started the call. This could also be done just as in the message tab where you can filter all the data by the message author(s).
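
A few of the proposed numbers are easy to derive once calls are available as start/end pairs. The sketch below assumes a hypothetical `Call` shape with times in UNIX seconds; it is not how chat-analytics stores calls.

```typescript
// Hypothetical call record; times are UNIX timestamps in seconds.
interface Call {
    start: number;
    end: number;
}

// Median of an already sorted list (0 for an empty list).
function median(sorted: number[]): number {
    if (sorted.length === 0) return 0;
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function callStats(calls: Call[]) {
    const durations = calls.map((call) => call.end - call.start).sort((a, b) => a - b);
    const totalSeconds = durations.reduce((sum, duration) => sum + duration, 0);
    return {
        count: durations.length,
        totalSeconds,
        averageSeconds: durations.length === 0 ? 0 : totalSeconds / durations.length,
        medianSeconds: median(durations),
        longestSeconds: durations.length === 0 ? 0 : durations[durations.length - 1],
    };
}
```

Filtering by call starter would then just mean filtering the `calls` array before computing the statistics.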

What do you think about this?

Identify words which only/mostly appear together and group them together

For example, I don't know what DefleMask is, but I'd imagine most people using the words 'Defle' and 'Mask' are usually using them to say 'Defle Mask', so instead of being listed as two separate words they could be joined together. This would make 'Most used words' and 'Top reacted messages' easier to read, I think with minimal effect on the data size.

Error message when running locally on linux

Tried compiling a report locally using the CLI:

npx chat-analytics -p discord -i <json_glob> -o report.html

And received the following error:

npx: installed 38 in 3.662s
Unexpected token '?'

The report gets generated successfully using the online app.

Tool to obfuscate exports

Originally mentioned in #79

It is often useful for people to share exports as a sample for testing or debugging Chat Analytics, however it can be tedious to obfuscate them to remove any personal data while keeping enough data for the report to appear realistic. There should be a tool which does this automatically, replacing any potentially personal information (message contents, IDs, author information, URLs) with gibberish (could just be some fixed text such as "REDACTED" or could be something like Lorem Ipsum). This tool should also compress the file to make it easier to share.

Add 'by author' and 'in channel' dropdown to 'Conversations started'

Currently, this card only shows the top authors for conversations started, but it could also be useful to see the top channels for conversations

URL tokenization does not match GET params

The Language tab also counts words that appear in links, so if many links have been sent on the server, the results end up full of link metadata. There is already a filtering option, but it would also be helpful to have a button to exclude links from the count.

[WhatsApp] Bug with some "language words"

Hey! I found a bug analyzing my WhatsApp chat.

In the "Language" tab, if I use the search bar to look for a very specific type of word, like "Ana", it doesn't work. I have a chat with 600 words literally saying "Ana", and it isn't counted.

Thanks!

Most positive/negative users based on sentiment

[Discord][Feature request] Show guild name next to channel name when multiple guilds are included in the report

I understand most users of chat-analytics are using it for researching an individual guild they own, but it is also really useful for people interested in researching themselves across guilds (the quantified self). Since this means including several guilds in the report, which often use many similar channel names (general, off-topic, vc-chat, chat-1, chat-2...), it can be hard to differentiate between some channels. If a user includes multiple guilds in a report, it would be useful to show the guild's name next to any listed channel names.

[Discord][Feature request] Expand emoji usage list

Hello! I have to say that this is a fantastic tool, especially for solving one's need for statistical data.

Recently, I've run out of custom emoji slots in my Discord server, so I wanted to remove the least used ones. I tried using this tool since no other tool had historical or reaction data, but I ran into the problem that it only shows the most used emotes. So I'd be very grateful if the option to expand the list to show all emotes was added or, alternatively, a button to reverse the sorting of the list (or of all lists, for that matter).

Thanks.

Add demo screenshots to README

Provide a docker image

Hello !

It would be nice to have a ready to use docker image.

If you agree, I can create a PR with a Dockerfile and github actions to build on every push on main and tags and push to the github registry.

Waiting for your green light or any remarks !

[Feature request] Estimated time remaining when generating report

Generation time is already being tracked, and combined with file sizes, number of messages and average time spent parsing each message, it should be possible to estimate how long it will take to generate a report
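
The simplest version of such an estimate is a linear extrapolation from the bytes processed so far. This is only a sketch; `estimateRemainingMs` is a hypothetical helper, and a real estimate would probably also need to account for stages that do not scale with input size.

```typescript
// Linear extrapolation: assumes the remaining bytes take as long per byte
// as the bytes processed so far. Returns undefined when nothing is known yet.
function estimateRemainingMs(
    elapsedMs: number,
    processedBytes: number,
    totalBytes: number
): number | undefined {
    if (processedBytes <= 0 || totalBytes <= 0) return undefined;
    const remainingBytes = Math.max(totalBytes - processedBytes, 0);
    return (elapsedMs / processedBytes) * remainingBytes;
}
```

For example, if 25 of 100 bytes took 2 seconds, the remaining 75 bytes are estimated at 6 seconds.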

Normalize word usage over time

Currently, the "[word] Usage over time" graph is pretty much no different to "Messages sent over time by [month]". Instead of showing the number of usages, it should show a relative quantity. Percentage of words might not be very good because this would reach sub-1% very quickly, but what could work is a percentage relative to the most used word (but don't show it as a percentage because this would be confusing). For example, if "lol" is the most used word and written 162 times in one month and "lmao" is the second most used word and written 41 times in that same month, "lol" should show as 100 and "lmao" should show as $100 \cdot 41/162 \approx 25$. One problem I can see with this is that the top word would likely show as 100 for all months, but there might be a way to mix some of these methods to provide a more useful number.
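
The proposed scaling for a single month can be sketched like this. `normalizeToTopWord` is a hypothetical helper, and the input is assumed to be the raw per-word counts for that month.

```typescript
// Scales every count so that the most used word of the month becomes 100.
function normalizeToTopWord(counts: Map<string, number>): Map<string, number> {
    let max = 0;
    counts.forEach((count) => {
        if (count > max) max = count;
    });

    const normalized = new Map<string, number>();
    counts.forEach((count, word) => {
        normalized.set(word, max === 0 ? 0 : Math.round((100 * count) / max));
    });
    return normalized;
}
```

With the numbers from the example, "lol" (162 uses) maps to 100 and "lmao" (41 uses) maps to 25.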

Full offline use

Hello !

In this line

chat-analytics/assets/app.html

Line 8 in 452037f

we still refer to your main server to get the favicon, whereas we should use the server we deployed ourselves (so the user's IP is not leaked to your servers).

chat-analytics/assets/app.html

Line 12 in 452037f

    
           <link href="https://fonts.googleapis.com/css2?family=Fira+Sans:wght@400;500;600&display=swap" rel="stylesheet">

we refer to external resources, whereas we could serve them locally (better privacy).

If you agree I can open a PR and try to fix that.

W

Filtering with a word list
Sorry, my English is too poor, so I originally wrote this in German: as part of a master's thesis, I would like to search the content of selected Telegram chats for specific words. I have compiled a relevant word list that I would like to run over the chats in a single search pass. But apparently Chat Analytics cannot do this. Or am I doing something wrong?
Second question: by which parameters does Chat Analytics assign content in the "Sentiment" section to the labels "positive", "negative" and "neutral"? This feature is also very important for my master's thesis, but I need to be able to explain the results.
Many thanks for your help
Hartmut

Names of Reaction Users are Missing

Noticed that the names of users who react to messages are not downloaded, just the reaction and total count of users who reacted to them. Would this be a feature that could be possible to add?

Server/groups comparisons

Use examples:

Compare which guild sends more messages
Compare sentiment between guilds
Compare in which guild a user is more active
Individual 'Server/group growth' per server (right now, this takes into account all servers in the export). These could be overlaid for a better comparison

[Discord][Feature request] Show ID for deleted users

Deleted users have duplicate usernames and discriminators but their user ID does not change so this can be used to still differentiate (and possibly identify) these users. For identification (such as if the report includes multiple guilds and you don't know which guild a user is from), it could be useful to just show this for all users. This doesn't need to be on display at all times and could just be something shown on hover (title attribute, which currently just shows the username) or when toggled.

[Discord] Implement DiscordChatExporter's support for stickers

DiscordChatExporter didn't previously support stickers, so they are currently not supported by ChatAnalytics either. However, DiscordChatExporter has since fixed this, so they can now be supported by ChatAnalytics

chat-analytics/pipeline/parse/parsers/DiscordParser.ts

Lines 64 to 65 in 9e7a0f8

    
    // NOTE: stickers right now are messages with empty content
    //       see https://github.com/Tyrrrz/DiscordChatExporter/issues/638

chat-analytics/pipeline/Platforms.ts

Lines 17 to 19 in 9e7a0f8

    
    name: "Discord",
    support: {
        stickers: false,

More text classification data

Currently, text is only being classified by sentiment and language, but there's quite a few other ways to classify text. I propose text should be classified by formality, controversialness (this can be similar to sentiment, but most negative messages wouldn't be considered controversial and not all controversial messages would be considered negative), spamminess (this kind of goes hand-in-hand with formality) and confidence (so a question would be considered unconfident, a response like "I think", "maybe", "probably" or "I don't know" would be slightly more confident and a statement-like response would be much more confident). Of course, this could increase generation time and finding useful models for these classifications could be difficult, but I think they would be very useful so it's worth looking into.

I cannot reduce the timeline from the left on Brave

GM guys.

Sick tool. I have a small issue: I cannot shrink the timeline from the starting days on Brave:

It works on Opera though, I just tried. Have a nice day :)

[Feature request] Compress text using the letters they can contain

I've not looked very far into this just yet, but I believe all text is being stored in the database as Unicode, even though not all fields can use every Unicode character (most only require 6 bits per character, whereas Unicode text requires a minimum of 8). For example, Discord user avatars can only contain lowercase letters or digits (36 characters, or 6 bits); domain names can only contain letters, digits, hyphens and periods (special characters are produced with domain-specific codes, so only 64 characters, or 6 bits); and I believe emoji names can only contain letters, spaces and colons (the colons are only used for the start and end, so this could be used for further compression; only 54 characters, or 6 bits).
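
To make the idea concrete, here is a sketch of 6-bit packing for the 36-character case (lowercase letters and digits). `ALPHABET`, `pack6` and `unpack6` are hypothetical; the project's own bit stream would presumably be used instead, and the string length has to be stored separately.

```typescript
// 36 symbols fit in 6 bits (2^6 = 64).
const ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

// Packs each character as a 6-bit code, most significant bit first.
function pack6(text: string): Uint8Array {
    const out = new Uint8Array(Math.ceil((text.length * 6) / 8));
    let bitPos = 0;
    for (let i = 0; i < text.length; i++) {
        const code = ALPHABET.indexOf(text.charAt(i));
        if (code < 0) throw new Error("Character not in alphabet: " + text.charAt(i));
        for (let b = 5; b >= 0; b--) {
            if ((code >> b) & 1) out[bitPos >> 3] |= 0x80 >> (bitPos & 7);
            bitPos++;
        }
    }
    return out;
}

// Reverses pack6; `length` is the number of characters that were packed.
function unpack6(bytes: Uint8Array, length: number): string {
    let text = "";
    let bitPos = 0;
    for (let i = 0; i < length; i++) {
        let code = 0;
        for (let b = 0; b < 6; b++) {
            const bit = (bytes[bitPos >> 3] >> (7 - (bitPos & 7))) & 1;
            code = (code << 1) | bit;
            bitPos++;
        }
        text += ALPHABET.charAt(code);
    }
    return text;
}
```

An 8-character string then takes 6 bytes instead of 8, a 25% saving before any other compression is applied.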

Avoid rounding sentiment scores (to positive, neutral and negative)

Currently, sentiment data for a message is simplified to positive, neutral or negative, but sentiment is much more specific than that and being able to see how extreme the positivity/negativity is could be useful. For example, the pie chart could be a gradient where greener colors mean more positive messages and redder colors mean more negative messages. Sentiment over time could also be changed to (or have another dropdown option for) showing the average or total sentiment score.

[Discord][Feature request] Use a bot for live updates

I think it would be quite nice to be able to host a website which uses chat-analytics to share analytics about a guild. This is already possible by just generating a report using all the channels in the guild and hosting the report, but a Discord bot could be used to live-update the analytics too.

One downside to this idea is that it could encourage spam if people want to show on the page, but I don't think that would be for chat-analytics to worry about and instead the moderators of the guild it is being used (the website wouldn't necessarily need to be public anyways and be made accessible only to moderators).

I'd imagine the main issue implementing this wouldn't be creating the bot but instead how to handle the database and compression. However, compression is most important for downloading and sharing the report with others, but if the data is being stored on a web server then the data could be kept server-side and the report would be dynamically generated depending on what the client is viewing. If doing this, it might also make sense to use a SQL database for storing the data so the server can request only the data needed for an individual page on the report.

Allow quicker navigation of the generation instructions (primarily for development or debugging)

I find it mildly tedious having to go through the generation instructions every time I make a minor change that I want to quickly test or debug.

It would be useful to include sample exports which can quickly be used from the home page (if isDev). Having standardized exports is also useful for ensuring consistency between updates (for example, ensuring an update didn't change the number of messages counted for whatever reason). It's also useful for people like me who mostly use just one of the supported platforms, and so don't have much data to test with, but would still like to test or work on other platforms. Finally, these sample exports could also be used for the demo(s), allowing developers to update the demo(s) to match changes.

For when a different report does still need to be uploaded, the 'Export your chats' window could be skipped, or have the 'Continue' button moved further up.

During stages 1, 2 and 3, everything needed (other than the exports) could start getting downloaded. During stage 3, uploaded exports could also start being parsed.

Finally, and this one may also apply to non-developers, once files are uploaded, the 'Generate report!' button should be automatically highlighted so that you can just press 'Enter' to continue.

last day cuts off graph

If your data runs up to and including, for example, the 7th, the graph only shows up to the 6th.

In the example photo you can see a very thin sliver (about 1px) of the 7th day on the right.

[Feature request] User aliases

Many people use alt accounts, and if you're trying to compare a few specific users who use such alt accounts you might want to combine the data (such as messages sent by author, emojis sent by author, conversations started, authors that reply the most messages...) for each of their accounts. This could be implemented as a config option for generating the report, but since this data exists already (i.e. it can be manually compared, but this isn't ideal) it could be something selected while viewing the report.
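
Since the per-author numbers already exist in the report, combining them at view time could be as simple as the sketch below. The `mergeAliases` function and the shape of the alias map (alt account id to main account id) are assumptions made for this example, not existing chat-analytics code.

```typescript
type AuthorId = string;

// Sum per-author counts (messages, emojis, ...) into the main account of each alias.
// Authors without an alias entry keep their own id.
function mergeAliases(
    counts: Map<AuthorId, number>,
    aliases: Map<AuthorId, AuthorId>
): Map<AuthorId, number> {
    const merged = new Map<AuthorId, number>();
    counts.forEach((count, author) => {
        const alias = aliases.get(author);
        const canonical = alias !== undefined ? alias : author;
        const previous = merged.get(canonical);
        merged.set(canonical, (previous !== undefined ? previous : 0) + count);
    });
    return merged;
}
```

For example, with counts `alice: 10`, `alice_alt: 5`, `bob: 7` and the alias `alice_alt -> alice`, the merged result is `alice: 15`, `bob: 7`.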

Word/Character count

Word/character count would be slightly more useful than message count since some of my chats tend to have longer paragraph-style messages with others having mostly one-word messages. Great tool otherwise!
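
For illustration, a naive way to count words and characters per message might look like the sketch below. `countWords` and `countCharacters` are hypothetical helpers; splitting on whitespace is an assumption that does not hold for languages written without spaces, so a real implementation would probably reuse the tokenizer the report already has.

```typescript
// Count whitespace-separated words; an empty or blank message has 0 words.
function countWords(text: string): number {
    const trimmed = text.trim();
    return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Count Unicode code points rather than UTF-16 units, so an emoji such as 😃 counts once.
function countCharacters(text: string): number {
    return Array.from(text).length;
}
```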

Message statistics mostly blank.

LMK if there's more information I can or need to provide! :)

Inconsistency with author/channel dropdown

In the 'Messages' tab, 'Messages sent by author' and 'Messages sent by channel' are separate cards.
However, in the 'Emojis' tab, 'Emojis sent by author' and 'Emojis sent in channel' are one card with a dropdown, as with 'Authors that get the most reactions' and 'Channels that get the most reactions' in the 'Interaction' tab.
I'm not sure which of these formats is better, but I think it would be best to stick to one.

	// load stopwords
	{
	    progress.new("Downloading file", "stopwords-iso.json");
	    interface StopwordsJSON {
	        [lang: string]: string[];
	    }
	    const data = (await downloadFile("/data/stopwords-iso.json", "json")) as StopwordsJSON;

	    // combining all stopwords is a mistake?
	    this.stopwords = new Set(
	        Object.values(data)
	            .reduce((acc, val) => acc.concat(val), [])
	            .map((word) => matchFormat(word))
	    );
	    progress.done();
	}

	// load language detector model
	this.langPredictModel = await loadFastTextModel("lid.176");

	// load emoji data
	{
	    progress.new("Downloading file", "emoji-data.json");
	    const data = await downloadFile("/data/emoji-data.json", "json");
	    this.emojisData = new Emojis(data);
	    progress.done();
	}

	// load sentiment data
	{
	    progress.new("Downloading file", "AFINN.zip");
	    const afinnZipBuffer = await downloadFile("/data/AFINN.zip", "arraybuffer");
	    progress.done();

	    this.sentiment = new Sentiment(afinnZipBuffer, this.emojisData);
	}
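
The "combining all stopwords is a mistake?" comment above suggests an alternative: keep one set per language and look words up under the language detected for each message. The sketch below is only an illustration of that option. `buildStopwordsByLanguage`, `isStopword` and the default lowercase `normalize` function are assumptions made here; the excerpt uses its own `matchFormat`, whose behaviour is not shown.

```typescript
interface StopwordsJSON {
    [lang: string]: string[];
}

type Normalize = (word: string) => string;

// Stand-in for the excerpt's matchFormat, which is not shown in the source.
const lowercase: Normalize = (word) => word.toLowerCase();

// Build one stopword set per language code instead of a single combined set.
function buildStopwordsByLanguage(
    data: StopwordsJSON,
    normalize: Normalize = lowercase
): Map<string, Set<string>> {
    const byLang = new Map<string, Set<string>>();
    for (const lang of Object.keys(data)) {
        byLang.set(lang, new Set(data[lang].map((word) => normalize(word))));
    }
    return byLang;
}

// A word is a stopword only in the given language; unknown languages have none.
function isStopword(
    byLang: Map<string, Set<string>>,
    lang: string,
    word: string,
    normalize: Normalize = lowercase
): boolean {
    const set = byLang.get(lang);
    return set !== undefined && set.has(normalize(word));
}
```

The trade-off is that a misdetected language would let that message's stopwords through, which may be why a combined set was used in the first place.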

	export function downloadFile(filepath: string, responseType: "json"): Promise<any>;
	export function downloadFile(filepath: string, responseType: "text"): Promise<string>;
	export function downloadFile(filepath: string, responseType: "arraybuffer"): Promise<ArrayBuffer>;
	export function downloadFile(filepath: string, responseType: XMLHttpRequestResponseType): Promise<any> {
	    return new Promise((resolve, reject) => {
	        const xhr = new XMLHttpRequest();
	        xhr.responseType = responseType;
	        xhr.open("GET", filepath);
	        xhr.onload = function () {
	            if (xhr.status === 200) resolve(xhr.response);
	            else reject(xhr.statusText);
	        };
	        xhr.onerror = () => reject("XHR Error");
	        // e.total is 0 when the total size is not known, so report it as undefined
	        xhr.onprogress = (e) => progress.progress("bytes", e.loaded || 0, e.total <= 0 ? undefined : e.total);
	        xhr.send();
	    });
	}

	// NOTE: stickers right now are messages with empty content
	// see https://github.com/Tyrrrz/DiscordChatExporter/issues/638


PS: I don't know why I put so much effort writing this issue considering I already discussed this with mlomb in DMs lol
