Comments (15)

peter-wangxu commented on July 28, 2024

Looks like the error occurred while writing to the middle of the file rather than to its tail.

Will look into it.

from persist-queue.

EverWinter23 commented on July 28, 2024

Hey @peter-wangxu, the way it works right now is that you write the data to a temporary file and, once it has been written successfully, rename that file to the destination file for an atomic write. Or am I wrong?
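
For readers unfamiliar with the pattern described above, here is a minimal sketch of a write-to-temp-then-rename update. It is not taken from persist-queue; the function name atomic_write is hypothetical, and the fsync call before the rename is an assumption about what a careful implementation would do.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` (bytes) via a temp file and rename.

    Hypothetical helper for illustration; not the persist-queue implementation.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the rename never
    # crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # the rename itself is atomic on POSIX
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A reader of the destination file sees either the complete old content or the complete new content, never a partial write.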

grantjenks commented on July 28, 2024

Do you have a reliable way to reproduce the hard reboots? I would be interested in adding this testing to my Disk Cache project. Disk Cache uses SQLite for Deque (queue-like) and Index (dict-like) objects which are persistent, thread-safe and process-safe.

peter-wangxu commented on July 28, 2024

@dugdmitry can you please send me the output of ls -l under the queue dir?
Also, please share the content of the info file.

peter-wangxu commented on July 28, 2024

@EverWinter23 Yes and no. Yes, writes to the info file are atomic, but no, writes to the data queue file are not.
By design, if data in the queue file is corrupted, the queue will try to remove the last unfinished write.

dugdmitry commented on July 28, 2024

@peter-wangxu here is the output of ls -l:

total 3408
-rw------- 1 root root 120 Jul 2 12:46 info
-rw-rw-rw- 1 root root 70245 Jul 1 23:12 q01932
-rw-rw-rw- 1 root root 70712 Jul 1 23:30 q01933
-rw-rw-rw- 1 root root 70707 Jul 1 23:49 q01934
-rw-rw-rw- 1 root root 70806 Jul 2 00:07 q01935
-rw-rw-rw- 1 root root 70808 Jul 2 00:25 q01936
-rw-rw-rw- 1 root root 70815 Jul 2 00:44 q01937
-rw-rw-rw- 1 root root 70811 Jul 2 01:03 q01938
-rw-rw-rw- 1 root root 70808 Jul 2 01:21 q01939
-rw-rw-rw- 1 root root 70809 Jul 2 01:39 q01940
-rw-rw-rw- 1 root root 70809 Jul 2 01:58 q01941
-rw-rw-rw- 1 root root 70807 Jul 2 02:16 q01942
-rw-rw-rw- 1 root root 70807 Jul 2 02:35 q01943
-rw-rw-rw- 1 root root 70814 Jul 2 02:53 q01944
-rw-rw-rw- 1 root root 70808 Jul 2 03:12 q01945
-rw-rw-rw- 1 root root 70808 Jul 2 03:30 q01946
-rw-rw-rw- 1 root root 70807 Jul 2 03:49 q01947
-rw-rw-rw- 1 root root 70811 Jul 2 04:07 q01948
-rw-rw-rw- 1 root root 70908 Jul 2 04:26 q01949
-rw-rw-rw- 1 root root 70258 Jul 2 04:44 q01950
-rw-rw-rw- 1 root root 71008 Jul 2 05:03 q01951
-rw-rw-rw- 1 root root 70930 Jul 2 05:21 q01952
-rw-rw-rw- 1 root root 70908 Jul 2 05:40 q01953
-rw-rw-rw- 1 root root 71002 Jul 2 05:58 q01954
-rw-rw-rw- 1 root root 71100 Jul 2 06:17 q01955
-rw-rw-rw- 1 root root 71179 Jul 2 06:35 q01956
-rw-rw-rw- 1 root root 71209 Jul 2 06:54 q01957
-rw-rw-rw- 1 root root 71207 Jul 2 07:12 q01958
-rw-rw-rw- 1 root root 71207 Jul 2 07:31 q01959
-rw-rw-rw- 1 root root 70650 Jul 2 07:51 q01960
-rw-rw-rw- 1 root root 70567 Jul 2 08:11 q01961
-rw-rw-rw- 1 root root 71217 Jul 2 08:31 q01962
-rw-rw-rw- 1 root root 71308 Jul 2 08:49 q01963
-rw-rw-rw- 1 root root 71307 Jul 2 09:08 q01964
-rw-rw-rw- 1 root root 71281 Jul 2 09:26 q01965
-rw-rw-rw- 1 root root 71199 Jul 2 09:45 q01966
-rw-rw-rw- 1 root root 71107 Jul 2 10:03 q01967
-rw-rw-rw- 1 root root 71108 Jul 2 10:22 q01968
-rw-rw-rw- 1 root root 71109 Jul 2 10:40 q01969
-rw-rw-rw- 1 root root 71157 Jul 2 10:58 q01970
-rw-rw-rw- 1 root root 71207 Jul 2 11:17 q01971
-rw-rw-rw- 1 root root 71208 Jul 2 11:36 q01972
-rw-rw-rw- 1 root root 71207 Jul 2 11:54 q01973
-rw-rw-rw- 1 root root 71207 Jul 2 12:13 q01974
-rw-rw-rw- 1 root root 71207 Jul 2 12:31 q01975
-rw-rw-rw- 1 root root 56254 Jul 2 12:46 q01976

And here is the content of the info file:

cat info:

(dp0
S'tail'
p1
(lp2
I1932
aI6
aL4254L
asS'head'
p3
(lp4
I1976
aI79
aL56254L
asS'chunksize'
p5
I100
sS'size'
p6
I4473
s.

I also want to note that the corrupted file in this case is q01932, which supposedly is not the 'tail' of the file queue.
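
As a cross-check, the info dump above can be decoded with the standard pickle module. The sketch below assumes each of 'head' and 'tail' is a [file number, item count, byte offset] triple; that layout is an assumption, not something stated in the thread. Under that reading, the recorded tail points at file 1932 (q01932), the head offset 56254 matches the listed size of q01976, and the item counts add up to the recorded size. Only unpickle data from a source you trust.

```python
import pickle

# The info file content exactly as posted above (pickle protocol 0).
INFO_DUMP = b"""(dp0
S'tail'
p1
(lp2
I1932
aI6
aL4254L
asS'head'
p3
(lp4
I1976
aI79
aL56254L
asS'chunksize'
p5
I100
sS'size'
p6
I4473
s."""


def expected_size(info):
    """Items between tail and head, assuming [file number, count, offset] triples."""
    tail_file, tail_count, _tail_offset = info["tail"]
    head_file, head_count, _head_offset = info["head"]
    return (head_file - tail_file) * info["chunksize"] + head_count - tail_count


info = pickle.loads(INFO_DUMP)
# (1976 - 1932) * 100 + 79 - 6 = 4473, which equals info["size"]
computed = expected_size(info)
```

So the metadata itself looks internally consistent, which suggests the damage is in the data file rather than in the info file.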

peter-wangxu commented on July 28, 2024

@dugdmitry The tail file should not be corrupted by any reboot, since the queue only reads from it (it is opened with rb) when calling get. Is it possible that the file system itself corrupted the file in some way, for example through the file system cache?

dugdmitry commented on July 28, 2024

@peter-wangxu We're using an ext4 filesystem, without disk cache:

cat /etc/fstab
UUID=e9870704-b9e8-417c-a6c3-d8fd280d0307 / ext4 noatime,errors=remount-ro 0 1
debugfs /sys/kernel/debug debugfs defaults 0 0

The persistent queue file is located under the /var directory, which is on /.
Maybe it has something to do with the ext4 filesystem, which uses journalling? I'm not a specialist in Linux filesystems, so it's just a guess.

Also, one more thing. Instead of the file-based Queue, I tried the SQLite-based queue, FIFOSQLiteQueue.
After a hard reset, the following exception appeared:

message = PERSISTENT_QUEUE.get()
File "build/bdist.linux-armv7l/egg/persistqueue/sqlqueue.py", line 84, in get
pickled = self._pop()
File "build/bdist.linux-armv7l/egg/persistqueue/sqlqueue.py", line 61, in _pop
row = self._select()
File "build/bdist.linux-armv7l/egg/persistqueue/sqlbase.py", line 146, in _select
return self._getter.execute(self._sql_select, args).fetchone()
OperationalError: Could not decode to UTF-8 column 'data' with text '��(module.objects
Object
q'

It seems that the data.db database was filled with some corrupted data during the hard reset.
Anyway, I managed to avoid this exception by manually setting sqlite_conn.text_factory = str, as described here:
https://stackoverflow.com/questions/22751363/sqlite3-operationalerror-could-not-decode-to-utf-8-column
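
A minimal sketch of that workaround, written for Python 3, where bytes plays the role that str had in Python 2.7. The table name queue and the helper fetch_raw are hypothetical; only the column name data comes from the traceback above. The CAST is used here purely to store non-UTF-8 bytes in a TEXT column for the demonstration.

```python
import sqlite3


def fetch_raw(conn, sql, args=()):
    """Fetch one row with TEXT columns returned as raw bytes instead of decoded text."""
    conn.text_factory = bytes  # in Python 2.7 this would be `str`
    return conn.execute(sql, args).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE queue (data TEXT)")  # hypothetical schema

# Bytes that are not valid UTF-8, similar to the start of a binary pickle.
payload = b"\x80\x02(module.objects"
conn.execute("INSERT INTO queue VALUES (CAST(? AS TEXT))", (payload,))

# With the bytes text_factory the row is returned without any UTF-8 decoding.
row = fetch_raw(conn, "SELECT data FROM queue")
```

Note that this only avoids the decoding error; it does not repair data that was actually damaged on disk.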

Maybe it makes sense to wrap the text_factory attribute in a dedicated method in your library?
Also, what do you think about ext4? Might it be the cause of the file corruption?
I'm thinking that I might be able to set up a number of Linux machines to run tests with different queues and filesystem types.

dugdmitry commented on July 28, 2024

@grantjenks regarding reproducing the hard reboots: I will try to set up a test bed based on our project.
Most probably, the test bed will be based on a BeagleBone Linux SoC (https://en.wikipedia.org/wiki/BeagleBoard) with Debian 7, Linux kernel 3.8.13, and an ext4 filesystem.
The reboots will be reproduced by either triggering the reset pin or manually switching off the input power.
I could try to test your library as well.

peter-wangxu commented on July 28, 2024

@dugdmitry thanks for the feedback. What are your Python and sqlite3 versions?

dugdmitry commented on July 28, 2024

@peter-wangxu
Here is the Python version we use:

python
Python 2.7.3 (default, Jun 21 2016, 21:00:47)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

sqlite3 version:

import sqlite3
sqlite3.version
'2.6.0'

grantjenks commented on July 28, 2024

@dugdmitry I would love some testing on Beagle boards. Try replacing the persistent queue with a diskcache.Deque object: http://www.grantjenks.com/docs/diskcache/tutorial.html#deque The API is the same as collections.deque in the standard library.

peter-wangxu commented on July 28, 2024

@dugdmitry can you please share the mount options for the ext4 file system?

peter-wangxu commented on July 28, 2024

@dugdmitry this should be an issue with the underlying filesystem (for example, if you mounted it with data=journal).

I am thinking about adding an fsync at some point around writing.

Thanks
Peter
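
To illustrate the fsync idea mentioned above, here is a minimal sketch of an append that is flushed and synced before the call returns. The function name durable_append is hypothetical, and this is not necessarily how persist-queue (or the later fix) does it.

```python
import os


def durable_append(path, data):
    """Append bytes to `path` and ask the OS to commit them to stable storage.

    Hypothetical helper for illustration only.
    """
    with open(path, "ab") as f:
        f.write(data)
        f.flush()              # move Python's buffer into the OS page cache
        os.fsync(f.fileno())   # ask the kernel to write the page cache to disk
```

Calling fsync on every write trades throughput for durability, so a queue might reasonably do it only at chosen points, such as when the metadata is updated.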

peter-wangxu commented on July 28, 2024

@dugdmitry This should be fixed by GH-65. If the issue still exists, just reopen it.
