cclgroupltd / ccl_chrome_indexeddb

(Sometimes partial) Python re-implementations of the technologies involved in reading various data sources in Chrome-esque applications.

License: MIT License


ccl_chrome_indexeddb's Introduction

ccl_chromium_reader

This repository contains a Python package of (sometimes partial) re-implementations of the technologies used by Chrome/Chromium/Chrome-esque applications to store data in a range of data-stores. These libraries provide programmatic access to those data-stores with a digital forensics slant (e.g. for most artefacts, offsets or IDs are provided so that the data can be located and manually checked).

The technologies supported are:

  • Snappy decompression
  • LevelDB
  • Protobuf
  • Pickles
  • V8 object deserialization
  • Blink object deserialization
  • IndexedDB
  • Web Storage (Local Storage and Session Storage)
  • Cache (both Block File and Simple formats)
  • SNSS Session files (partial support)
  • FileSystem API
  • Notifications API (Platform Notifications)
  • Downloads (from shared_proto_db)
  • History

Additionally, there are a number of utility scripts included such as:

  • ccl_chromium_cache.py - used as a command-line tool, dumps the cache and all HTTP header information.
  • ccl_chrome_audit.py - a tool which can be used to scan the data-stores supported by the included libraries, plus a couple more, for records related to a host - designed as a research tool into data stored by web apps.

Python Versions

The code in this library was written and tested using Python 3.10. It should work with 3.9, but uses language features which were not present in earlier versions. Some parts of the library will probably work OK going back a few versions, but if you report bugs related to any version before 3.10, the first question will be: can you upgrade to 3.10?

A Note On Requirements

This repository contains a requirements.txt in the pip format. Other than Brotli, the dependencies listed are only required for the ccl_chrome_audit.py script or when using the ccl_chromium_cache module as a script for dumping the cache; otherwise the libraries work using only the other modules in this repository and the Python standard library.
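
Installing those dependencies is the usual pip step:

pip install -r requirements.txt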

Documentation

The documentation in the libraries is currently sparser than ideal, but some recent work has been undertaken to add more docstrings and fill in some gaps in the type hints. We welcome pull requests to fill in gaps in the documentation.

ccl_chrome_audit

This script audits multiple data stores in a Chrom(e|ium) profile folder based on a fragment (regex) of a host name. It is designed to aid research into web apps by quickly highlighting what data related to that domain is stored where (it is also of use with Electron apps etc.).

Caveats

At the moment, the script is designed primarily for use on Windows and on the host where the data was populated (this is because cookie decryption is achieved using DPAPI).

Usage

ccl_chrome_audit.py <chrome profile folder> [cache folder (for mobile)]

Current Supported Data Sources

  • Bookmarks
  • History
  • Downloads (from History)
  • Downloads (from shared_proto_db)
  • Favicons
  • Cache
  • Cookies
  • Local Storage
  • Session Storage
  • IndexedDb
  • File System API
  • Platform Notifications
  • Logins
  • Sessions (SNSS)

ChromiumProfileFolder

The ChromiumProfileFolder class is intended to act as a convenient entry-point to much of the useful functionality in the package. It performs on-demand loading of data, so the "start-up cost" of using this object over the individual modules is near-zero, while gaining better searching and filtering functionality built in and an easier interface for bringing together data from these different sources.

In this version ChromiumProfileFolder supports the following data-stores:

  • History
  • Cache
  • IndexedDB
  • Local Storage
  • Session Storage

To use the object, simply pass the path of the profile folder into the constructor (the object supports the context manager interface):

import pathlib
from ccl_chromium_reader import ChromiumProfileFolder

profile_path = pathlib.Path("profile path goes here")

with ChromiumProfileFolder(profile_path) as profile:
    ...  # do things with the profile

Most of the methods of the ChromiumProfileFolder object which retrieve data can search/filter using a KeySearch interface, which in essence is one of:

  • a str, in which case the search will try to exactly match the value
  • a collection of str (e.g., list or tuple), in which case the search will try to exactly match one of the values contained therein
  • a re.Pattern, in which case the search attempts to match the pattern anywhere in the value (as with re.search)
  • a function which takes a str and returns a bool indicating whether it's a match.

For example:
import re
import pathlib
from ccl_chromium_reader import ChromiumProfileFolder

profile_path = pathlib.Path("profile path goes here")

with ChromiumProfileFolder(profile_path) as profile:
    # Match one of two possible hosts exactly, then a regular expression for the key
    for ls_rec in profile.iter_local_storage(
            storage_key=["http://not-a-real-url1.com", "http://not-a-real-url2.com"], 
            script_key=re.compile(r"message\d{1,3}?-text")):
        print(ls_rec.value)
        
    # Match all urls which end with "&read=1"
    for hist_rec in profile.iterate_history_records(url=lambda x: x.endswith("&read=1")):
        print(hist_rec.title, hist_rec.url)

IndexedDB

The ccl_chromium_indexeddb.py library processes IndexedDB data found in Chrome et al.

Blog

Read a blog on the subject here: https://www.cclsolutionsgroup.com/post/indexeddb-on-chromium

Caveats

There is a fair amount of work yet to be done in terms of documentation, but the modules should be fine for pulling data out of IndexedDB, with the following caveats:

LevelDB deleted data

The LevelDB module will spit out live and deleted/old versions of records indiscriminately; it's possible to differentiate between them with some work, but that hasn't really been baked into the modules as they currently stand. So you are getting deleted data "for free" currently...whether you want it or not.
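
If you do need to separate live data from deletions, one option is to drop down to the ccl_leveldb module and check each raw record's state. The sketch below is a minimal illustration only: the record attributes (key, value, seq, state) match how records are used elsewhere in this repository, but the import path, iterate_records_raw and the KeyState enum are assumptions, so verify them against the version of ccl_leveldb you have:

import sys
import pathlib
import ccl_leveldb  # adjust the import path to match your packaging/layout

leveldb_dir = pathlib.Path(sys.argv[1])

db = ccl_leveldb.RawLevelDb(leveldb_dir)
for record in db.iterate_records_raw():
    # each raw record carries a state flag alongside its key, value and
    # sequence number; skipping deletions leaves only live data
    if record.state == ccl_leveldb.KeyState.Deleted:
        continue
    print(record.seq, record.key, record.value)
db.close()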

Blink data types

I am fairly satisfied that all the possible V8 object types are accounted for (but I'm happy to be shown otherwise and get that fixed of course!), but it is likely that the hosted Blink objects aren't all there yet; so if you hit upon an error coming from inside ccl_blink_value_deserializer and can point me towards test data, I'd be very thankful!

Cyclic references

It is noted in the V8 source that recursive referencing is possible in the serialization; we're not yet accounting for that, so if Python throws a RecursionError, that's likely what you're seeing. The plan is to use a similar approach to ccl_bplist, where the collection types are subclassed and do just-in-time resolution of the items, but that isn't done yet.

Using the modules

There are two methods for accessing records - a more pythonic API using a set of wrapper objects and a raw API which doesn't mask the underlying workings. There is unlikely to be much benefit to using the raw API in most cases, so the wrapper objects are recommended unless you have a compelling reason otherwise.

Wrapper API

import sys
from ccl_chromium_reader import ccl_chromium_indexeddb

# assuming command line arguments are paths to the .leveldb and .blob folders
leveldb_folder_path = sys.argv[1]
blob_folder_path = sys.argv[2]

# open the indexedDB:
wrapper = ccl_chromium_indexeddb.WrappedIndexDB(leveldb_folder_path, blob_folder_path)

# You can check the databases present using `wrapper.database_ids`

# Databases can be accessed from the wrapper in a number of ways:
db = wrapper[2]  # accessing database using id number
db = wrapper["MyTestDatabase"]  # accessing database using name (only valid for single origin indexedDB instances)
db = wrapper["MyTestDatabase", "file__0@1"]  # accessing the database using name and origin
# NB using name and origin is likely the preferred option in most cases

# The wrapper object also supports checking for databases using `in`

# You can check for object store names using `db.object_store_names`

# Object stores can be accessed from the database in a number of ways:
obj_store = db[1]  # accessing object store using id number
obj_store = db["store"]  # accessing object store using name

# Records can then be accessed by iterating the object store in a for-loop
for record in obj_store.iterate_records():
    print(record.user_key)
    print(record.value)

    # if this record contained a FileInfo object somewhere linking
    # to data stored in the blob dir, we could access that data like
    # so (assume the "file" key in the record value is our FileInfo):
    with record.get_blob_stream(record.value["file"]) as f:
        file_data = f.read()

# By default, any errors in decoding records will bubble up as an exception,
# which might be painful when iterating records in a for-loop; by passing
# True for the errors_to_stdout argument and/or passing an error handler
# function as bad_deserializer_data_handler, you can log errors rather
# than crashing:

for record in obj_store.iterate_records(
        errors_to_stdout=True, 
        bad_deserializer_data_handler=lambda k, v: print(f"error: {k}, {v}")):
    print(record.user_key)
    print(record.value)

Raw access API

import sys
from ccl_chromium_reader import ccl_chromium_indexeddb

# assuming command line arguments are paths to the .leveldb and .blob folders
leveldb_folder_path = sys.argv[1]
blob_folder_path = sys.argv[2]

# open the database:
db = ccl_chromium_indexeddb.IndexedDb(leveldb_folder_path, blob_folder_path)

# there can be multiple databases, so we need to iterate through them (NB 
# DatabaseID objects contain additional metadata, they aren't just ints):
for db_id_meta in db.global_metadata.db_ids:
    # and within each database, there will be multiple object stores so we
    # will need to know the maximum object store number (this process will be
    # cleaned up in future releases):
    max_objstore_id = db.get_database_metadata(
            db_id_meta.dbid_no, 
            ccl_chromium_indexeddb.DatabaseMetadataType.MaximumObjectStoreId)
    
    # if the above returns None, then there are no stores in this db
    if max_objstore_id is None:
        continue

    # there may be multiple object stores, so again, we iterate through them
    # this time based on the id number. Object stores start at id 1 and the
    # max_objstore_id is inclusive:
    for obj_store_id in range(1, max_objstore_id + 1):
        # now we can ask the indexeddb wrapper for all records for this db
        # and object store:
        for record in db.iterate_records(db_id_meta.dbid_no, obj_store_id):
            print(f"key: {record.user_key}")
            print(f"key: {record.value}")

            # if this record contained a FileInfo object somewhere linking
            # to data stored in the blob dir, we could access that data like
            # so (assume the "file" key in the record value is our FileInfo):
            with record.get_blob_stream(record.value["file"]) as f:
                file_data = f.read()

Local Storage

ccl_chromium_localstorage contains functionality to read the Local Storage data from a Chromium/Chrome profile folder.

Blog

Read a blog on the subject here: https://www.cclsolutionsgroup.com/post/chromium-session-storage-and-local-storage

Using the module

An example showing how to iterate all records, grouped by host is shown below:

import sys
import pathlib
from ccl_chromium_reader import ccl_chromium_localstorage

level_db_in_dir = pathlib.Path(sys.argv[1])

# Create the LocalStoreDb object which is used to access the data
with ccl_chromium_localstorage.LocalStoreDb(level_db_in_dir) as local_storage:
    for storage_key in local_storage.iter_storage_keys():
        print(f"Getting records for {storage_key}")
      
        for record in local_storage.iter_records_for_storage_key(storage_key):
            # we can attempt to associate this record with a batch, which may
            # provide an approximate timestamp (within 5-60 seconds) for this
            # record.
            batch = local_storage.find_batch(record.leveldb_seq_number)
            timestamp = batch.timestamp if batch else None
            print(record.leveldb_seq_number, timestamp, record.script_key, record.value, sep="\t")

Session Storage

ccl_chromium_sessionstorage contains functionality to read the Session Storage data from a Chromium/Chrome profile folder.

Blog

Read a blog on the subject here: https://www.cclsolutionsgroup.com/post/chromium-session-storage-and-local-storage

Using the module

An example showing how to iterate all records, grouped by host is shown below:

import sys
import pathlib
from ccl_chromium_reader import ccl_chromium_sessionstorage

level_db_in_dir = pathlib.Path(sys.argv[1])

# Create the SessionStoreDb object which is used to access the data
with ccl_chromium_sessionstorage.SessionStoreDb(level_db_in_dir) as session_storage:
    for host in session_storage.iter_hosts():
        print(f"Getting records for {host}")
        for record in session_storage.iter_records_for_host(host):
            print(record.leveldb_sequence_number, record.key, record.value)

Cache

ccl_chromium_cache contains functionality for reading Chromium cache data (both block file and simple cache formats). It can be used to programmatically access cache data and metadata (including http headers).

CLI

Executing the module as a script allows you to dump a cache (either format) and collate all metadata into a csv file.

USAGE: ccl_chromium_cache.py <cache input dir> <out dir>

Using the module

The main() function (which provides the CLI) in the module shows the full process of detecting the cache type and reading data and metadata from the cache.
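
As a rough sketch of what programmatic access can look like (guess_cache_class and the method names below are assumptions inferred from the CLI code path, so check them against the module before relying on them):

import sys
import pathlib
from ccl_chromium_reader import ccl_chromium_cache

cache_dir = pathlib.Path(sys.argv[1])

# guess_cache_class inspects the folder and returns the reader class for
# whichever format is in use (block file or simple cache) - assumed API
cache_class = ccl_chromium_cache.guess_cache_class(cache_dir)
cache = cache_class(cache_dir)

for key in cache.get_cache_keys():
    # keys are the cached URLs; metadata (including HTTP headers) and the
    # cached data itself can then be requested per key
    print(key)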

ccl_chrome_indexeddb's People

Contributors

cclgroupltd, lxndrblz, obsidianforensics


ccl_chrome_indexeddb's Issues

Can you write to indexeddb?

I have been able to read the indexeddb file, but I don't know how I can write to it. Any suggestions?

Thank you.

Why am I not getting all the global meta keys with type 201 (0xC9)?

Sorry if this is not related to this library, but I'm having problems doing the same for IndexedDB in C++: I'm not getting all the global metadata records, especially those with type 201 (0xC9), which correspond to database names and origins. However, I do get all the database metadata records and the global metadata max database ID record with the correct max database ID.

Do I need some special flags when opening the database? I tested with this Python script and it lists all the databases with their names, but it seems to use a custom implementation of LevelDB.

Error 22 Invalid Argument when '.ldb' file is zero length.

Firstly thank you for the work you put in here - it is huge.

The error first occurs in ccl_leveldb.py > class LdbFile > self._f.seek(-LdbFile.FOOTER_SIZE, os.SEEK_END).

I added a check for the file size here: ccl_leveldb.py > class RawLevelDb > self._files.append(LdbFile(file))

I was hoping to use your work to extract key/value pairs for a website from the .ldb files, but it seems the files are not updated in real time.

Using Chrome DevTools (Local Storage > desired website), the .ldb last-changed dates, and the values of the extracted pairs (and their source files) from your dump_leveldb.py, I can see that the key/value pairs are from several hours ago.

Your fine article (https://www.cclsolutionsgroup.com/post/hang-on-thats-not-sqlite-chrome-electron-and-leveldb) implies that Chrome updates the files in LevelDB in real time, if I understood it correctly.

So it seems Chrome keeps things in memory before writing to disk?

ValueError: Blink type tag not present

Hi,

Thanks for your efforts in developing this code! Much appreciated.

I have a problem with the code stopping during the iterate_records function within ccl_chromium_indexeddb.py.

The code stops here:

if blink_type_tag != 0xff:
    # TODO: probably don't want to fail hard here long term...
    raise ValueError("Blink type tag not present")

I'm finding that some records in my Skype IndexedDB have a blink_type_tag of 16, causing the code to stop at the above location.

I've implemented a temporary workaround to allow the code to print a message and continue rather than stopping at this point:

if blink_type_tag != 0xff:
    # TODO: probably don't want to fail hard here long term...
    # raise ValueError("Blink type tag not present")
    print("***** Skipping record with unknown blink_type_tag: " + str(blink_type_tag) + " *****")
    continue

After running the above modified code, I found that for my Skype IndexedDB, I get approximately 5,000 affected records with a blink_type_tag of 16, out of a total of 20,000 records.

On a potentially related matter, I'm also seeing 'ccl_v8_value_deserializer._Undefined object at 0xXXX' entries when printing out the values (yes, this might be a direct consequence of my code modification above, but I thought you should know):

print(f"key: {record.value}")

key: {'_serverMessages': [{'id': '1525335581949', 'originalarrivaltime': '2018-05-03T08:19:38.846Z', 'messagetype': 'RichText', 'version': '1525335581949', 'composetime': '2018-05-03T08:19:38.846Z', 'clientmessageid': '10187620813155729536', 'conversationLink': 'https://hk2-2-client-s.gateway.messenger.live.com/v1/users/ME/conversations/[email protected]', 'content': 'OK. I&apos;ll provide you the photos that we could do for the embossed texture. ', 'type': 'Message', 'conversationid': '[email protected]', 'from': 'https://hk2-2-client-s.gateway.messenger.live.com/v1/users/ME/contacts/8:removed'}], 'cuid': '10187620813155729536', 'conversationId': '[email protected]', 'createdTime': 1525335581949.0, 'creator': '8:removed', 'content': 'OK. I&apos;ll provide you the photos that we could do for the embossed texture. ', 'messagetype': 'RichText', 'contenttype': <ccl_v8_value_deserializer._Undefined object at 0x7eff208b68b0>, 'properties': <ccl_v8_value_deserializer._Undefined object at 0x7eff208b68b0>, '_isEphemeral': False, '_fileEncryptionKeys': <ccl_v8_value_deserializer._Undefined object at 0x7eff208b68b0>, '_countsType': 1, '_isMyMessage': 0}

Any assistance would be greatly appreciated!

Exception "invalid magic number" in Microsoft Teams Leveldb files

Hi, when I try to parse an MS Teams IndexedDB folder I run into this exception:

  File "ccl_chrome_indexeddb\ccl_leveldb.py", line 554, in __init__
  File "ccl_chrome_indexeddb\ccl_leveldb.py", line 221, in __init__
ValueError: Invalid magic number in ******\******\IndexedDB\https_teams.microsoft.com_0.indexeddb.leveldb\002818.ldb

MS Teams version too recent, maybe?

Investigate orphaned records

Keys with the (undocumented) key prefix 00 00 00 00 32 appear to sometimes contain deleted data records.

Need to look at how the records can be recovered (when they are records, because they aren't always).

Question regarding object decoding

Hi,

Thanks for your efforts in developing this code and your blog posts! It is much appreciated.

I am using your code in one of my forensics projects to extract conversation artefacts from an Electron-based communication platform. While the enumeration works incredibly reliably, I have the impression that record.value is not fully decoded.

My response looks like this right now:

b'!\xff\x13\xff\ro"\x0econversationId"[19:54dd27a7-fbb0-4bf0-8208-a4b31a578a3f_e62b7cec-7379-4d6f-aed7-24b48be68a74@unq.gbl.spaces"\x0fparentMessageId"\r1622368092916"\x08messageso"@4235357803446472000,8:orgid:e62b7cec-7379-4d6f-aed7-24b48be68a74o"\x0bmessagetype"\rRichText/Html"\x0bcontenttype"\x04text"\x07content"6<div>To Sherlock Holmes she is always the woman.</div>"\rrenderContent"6<div>To Sherlock Holmes she is always the woman.</div>"\x0fclientmessageid"\x134235357803446472000"\ramsreferencesA\x00$\x00\x00"\rimdisplayname"\x08Jane Doe"\npropertieso{\x00"\x02id"\r1622368092916"\x04type"\x07Message"\nsequenceIdI\x04"\x0bmessageKind"\x11skypeMessageLocal"\x0bcomposetime"\x1c2021-05-30T09:48:12.9160000Z"\x13originalarrivaltime"\x1c2021-05-30T09:48:12.9160000Z"\x11clientArrivalTime"\x182021-05-30T10:08:31.218Z"\x10conversationLink"\x9b\x01https://uk.ng.msg.teams.microsoft.com/v1/users/ME/conversations/19:54dd27a7-fbb0-4bf0-8208-a4b31a578a3f_e62b7cec-7379-4d6f-aed7-24b48be68a74@unq.gbl.spaces"\x04from"ghttps://uk.ng.msg.teams.microsoft.com/v1/users/ME/contacts/8:orgid:e62b7cec-7379-4d6f-aed7-24b48be68a74"\x06sourceI\x02"\x07idUnion"\x134235357803446472000"\x0econversationId"[19:54dd27a7-fbb0-4bf0-8208-a4b31a578a3f_e62b7cec-7379-4d6f-aed7-24b48be68a74@unq.gbl.spaces"\rversionNumberN\x00@/\xc8\xca\x9bwB"\x07version"\r1622368092916"\x13messageStorageStateI\x02"\x15isActionExecuteUpdateF"\x1d_conversationIdMessageIdUnion"i19:54dd27a7-fbb0-4bf0-8208-a4b31a578a3f_e62b7cec-7379-4d6f-aed7-24b48be68a74@unq.gbl.spaces_1622368092916"\x0fparentMessageId"\r1622368092916"\x0bcreatedTimeN\x00@/\xc8\xca\x9bwB"\x07creator",8:orgid:e62b7cec-7379-4d6f-aed7-24b48be68a74"\x0ecreatorProfileo"\x03mri",8:orgid:e62b7cec-7379-4d6f-aed7-24b48be68a74"\x11userPrincipalName"5JaneDoe_forensics.im#EXT#@Forensicsim.onmicrosoft.com"\x0bdisplayName"\x08Jane Doe"\tgivenName"\x08Jane Doe"\x08objectId"$e62b7cec-7379-4d6f-aed7-24b48be68a74"\x04type"\x06person{\x06"\x08isFromMeT"\x0euserHasStarredF"\x1creplyChainLatestDeliveryTimeN\x00@/\xc8\xca\x9bwB"\x05stateI\x04"\x11notificationLevelI\x02"\x08mentionsA\x00$\x00\x00"\nhyperLinksA\x00$\x00\x00"\x0battachmentsA\x00$\x00\x00"\x19inputExtensionAttachmentsA\x00$\x00\x00"\x15trimmedMessageContent"+To Sherlock Holmes she is always the woman."\x1bmessageContentContainsImage0"\x1bmessageContentContainsVideoF"\x0bisSanitizedT"\x1aisPlainTextConvertedToHtmlT"\x16isRichContentProcessedT" isRichMessagePropertiesProcessedT"&isRenderContentWithGiphyDisplayEnabledT"\risForceDeleteF"\x16isSfBGroupConversationF"\x11messageLayoutTypeI\x00"\x0ccallDurationI\x00"\x14callParticipantsMrisA\x00$\x00\x00"\x16cachedDeduplicationKey"?8:orgid:e62b7cec-7379-4d6f-aed7-24b48be68a744235357803446472000"\x19cachedOriginalArrivalTime"\x1c2021-05-30T09:48:12.9160000Z"\x1ccachedOriginalArrivalTimeUtcN\x00@/\xc8\xca\x9bwB"\x0e_callRecording0"\x0f_callTranscript0"\x0f_meetingObjects0"\x15callParticipantsCountI\x01"\t_pinStateo"\x08isPinnedF{\x01{;{\x01"\x0cisInsideChat"\x04true"\x12latestDeliveryTime"\x100001622368092916{\x05'

From your blog post I took away that each of the recurring tags, such as " or {, gives an indication of the data that follows and its data type. I am now wondering if this object encoding has already been implemented or if I did something wrong?

Currently, I am working around this issue by splitting the record's value on the " character and ignoring the first byte after the split. While this works in most cases, it does not seem ideal, as it fails if a record contains a nested JSON array.

Please let me know if you need any additional details. I would be willing to share my test leveldb, as it contains only staged entries and nothing secretive.

Snappy decompression performance

Ran a few tests and found that the Snappy decompress function is a bit of a bottleneck.

Using python-snappy instead helped me a lot:

import typing
import snappy

def decompress(data: typing.BinaryIO) -> bytes:
    return snappy.uncompress(data.read())

Retrieve the ldb file of a record

Hi Alex,

Is there a clean and easy way to retrieve the .ldb/.log file of a certain record?

I currently have a code snippet that looks like this:

    extracted_values = []

    for db_info in wrapper.database_ids:
        # Skip databases without a valid dbid_no
        if db_info.dbid_no is None:
            continue

        db = wrapper[db_info.dbid_no]

        for obj_store_name in db.object_store_names:
            # Skip empty object stores
            if obj_store_name is None:
                continue
            if obj_store_name in TEAMS_DB_OBJECT_STORES or do_not_filter is False:
                obj_store = db[obj_store_name]
                records_per_object_store = 0
                for record in obj_store.iterate_records():
                    records_per_object_store += 1
                    extracted_values.append({
                        "key": record.key.raw_key,
                        "value": record.value,
                        "store": obj_store_name,
                    })
                print(
                    f"{obj_store_name} {db.name} (Records: {records_per_object_store})"
                )

In addition to the key, value and store, I'd also like to retrieve the name of the file where the record was found (i.e. 000114.ldb). I tried record.database_origin, but this gives me something else.

In my own fork, I had implemented a custom iterate_records function, but I was wondering if there is an easier way to implement it?

    def iterate_records(self, do_not_filter=False):

        blink_deserializer = ccl_blink_value_deserializer.BlinkV8Deserializer()
        # Loop through the databases and object stores based on their ids
        for global_id in self.global_metadata.db_ids:
            # print(f"Processing database: {global_id.name}")
            if global_id.dbid_no is None:
                print(f"WARNING: Skipping database {global_id.name}")
                continue

            for object_store_id in range(1, self.database_metadata.get_meta(global_id.dbid_no,
                                                                            DatabaseMetadataType.MaximumObjectStoreId) + 1):

                datastore = self.object_store_meta.get_meta(global_id.dbid_no, object_store_id,
                                                            ObjectStoreMetadataType.StoreName)

                # print(f"\t Processing object store: {datastore}")
                records_per_object_store = 0
                if datastore in TEAMS_DB_OBJECT_STORES or do_not_filter:
                    prefix = bytes([0, global_id.dbid_no, object_store_id, 1])
                    for record in self._fetched_records:
                        if record.key.startswith(prefix):
                            records_per_object_store += 1
                            # Skip records with empty values as these can't be properly decoded
                            if record.value == b'':
                                continue
                            value_version, varint_raw = ccl_chromium_indexeddb.custom_le_varint_from_bytes(record.value)
                            val_idx = len(varint_raw)
                            # read the blink envelope
                            blink_type_tag = record.value[val_idx]
                            if blink_type_tag != 0xff:
                                print("Blink type tag not present")
                            val_idx += 1

                            blink_version, varint_raw = ccl_chromium_indexeddb.custom_le_varint_from_bytes(
                                record.value[val_idx:])

                            val_idx += len(varint_raw)

                            # read the raw value of the record.
                            obj_raw = io.BytesIO(record.value[val_idx:])
                            try:
                                # Initialize deserializer and try deserialization.
                                deserializer = ccl_v8_value_deserializer.Deserializer(
                                    obj_raw, host_object_delegate=blink_deserializer.read)
                                value = deserializer.read()
                                yield {'key': record.key, 'value': value, 'origin_file': record.origin_file,
                                       'store': datastore, 'state': record.state, 'seq': record.seq}
                            except Exception as e:
                                # TODO Some proper error handling wouldn't hurt
                                continue

Thanks for your support.

ValueError: Didn't get version tag in the header

Hello,

I've been trying to retrieve the key/value information from a specific leveldb on Chrome IndexedDB and I keep getting a ValueError exception.

ValueError: Didn't get version tag in the header

This happens during the records iteration and it crashes on the following check:

def _read_header(self) -> int:
    tag = self._read_tag()
    if tag != Constants.token_kVersion:
        raise ValueError("Didn't get version tag in the header")
    version = self._read_le_varint()[0]
    return version

Apparently my version tag value is 0x01 and it should be 0xff.

This only appears with one specific LevelDB. Do you know why the version tag has this value? Shouldn't it be 0xff instead of 0x01?

Certain Database IDs get missed

Hi Alex,

I hope you are doing fine.

During a recent forensic investigation I noticed that ccl_chromium_indexeddb.py wouldn't process any database IDs higher than 127. In real-life scenarios I have seen much higher IDs, such as 544:

DatabaseId(dbid_no=543, origin='https_teams.microsoft.com_0@1', name='Teams:app-device-permissions-manager:e2737957-fab8-4d7e-94f6-9bd6af9f7158:228fbaa3-4bee-4598-9980-8fcebd19df2d')

If I understand the Google documentation correctly, the dbid_no can be 1-8 bytes long.

I am currently investigating a way to cleverly collect the global metadata for all records. Please let me know if you are already working on something yourself, so we could join forces.

In particular, I am talking about these lines of code:
https://github.com/cclgroupltd/ccl_chrome_indexeddb/blob/c3fcb3876b9aadf375536cbdf437c94df357d276/ccl_chromium_indexeddb.py#L378-L407
