
I encountered several errors while uploading to the PikPak cloud drive (investigated for over a week, identified the issues, hoping to assist in fixing) · rclone · OPEN · 27 comments

cj-neo commented on May 26, 2024
I encountered several errors while uploading to the PIKPAK cloud drive (investigated for over a week, identified the issues, hoping to assist in fixing).

Comments (27)

wiserain commented on May 26, 2024

Thanks for the test. I will fix it soon.

wiserain commented on May 26, 2024

Can you try with v1.67.0-beta.7930.223ba8626.fix-7787-pikpak-upload-conflict?

We've made some adjustments in this version to make uploads more reliable.

Checking upload status

By implementing a getTask() API call that uses the task_id assigned to each upload, we verify that uploads have completed successfully on the server side.
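
Conceptually, the check behaves like the Python sketch below. This is an illustration only: the real backend does this in Go, and the endpoint path and phase strings here are assumptions based on the attributes discussed in this thread.

import time

import requests

API = "https://api-drive.mypikpak.com"  # assumed base URL, illustration only

def wait_for_upload(session: requests.Session, task_id: str, timeout: float = 30.0) -> bool:
    """Poll an upload task until the server reports a terminal phase."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = session.get(f"{API}/drive/v1/tasks/{task_id}")  # assumed path
        resp.raise_for_status()
        phase = resp.json().get("phase", "")
        if phase == "PHASE_TYPE_COMPLETE":
            return True   # the server has fully registered the file
        if phase == "PHASE_TYPE_ERROR":
            return False  # the caller cancels the task to clean up residue
        time.sleep(0.5)   # give the server time to settle
    return False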

Cancel uploads

If an upload encounters an error, we now cancel it so that residual files (hidden or unreadable ones) are removed.

Force sleep and min sleep

Based on an experiment using 1000 small files, as @cj-neo described, with the following commands to upload and check them,

rclone copy ./test test:test --transfers=16 --log-file=test.log --log-level=DEBUG
rclone check ./test test:test --download --log-file=test.log --log-level=DEBUG

the number of files hitting the first, second, and third low-level retry in each run of 1000 files was:

1st retry   2nd retry   3rd retry
  826          59          28
  728         135         102
  662          81          63
  592          85          65
  517          44          34
  429          68          59

meaning that roughly 5% (~50 of 1000) of files require at least three retries, i.e. an extra ~150 ms. Therefore, introducing a forced delay (sleep) after uploading is a reasonable way to ensure server-side updates take effect. Note that this doesn't impact total execution time much.

Moreover, the pacer's minimum sleep has been increased from 10 ms to 100 ms to resolve the following server error:

http2: server sent GOAWAY and closed the connection; LastStreamID=1999, ErrCode=NO_ERROR, debug=""
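
The idea behind the change is to enforce a larger floor on the delay between API calls, and to back off when the server pushes back. A simplified Python sketch of that concept follows; it is not rclone's actual lib/pacer implementation.

import time

class SimplePacer:
    """Keep API calls at least `sleep` seconds apart; back off on errors."""

    def __init__(self, min_sleep: float = 0.1, max_sleep: float = 2.0):
        self.min_sleep = min_sleep  # the fix raises this floor from 0.01 s to 0.1 s
        self.max_sleep = max_sleep
        self.sleep = min_sleep
        self.last = 0.0

    def call(self, fn, *args, **kwargs):
        gap = self.last + self.sleep - time.monotonic()
        if gap > 0:
            time.sleep(gap)  # enforce the minimum spacing between calls
        self.last = time.monotonic()
        try:
            result = fn(*args, **kwargs)
        except ConnectionError:
            # e.g. an http2 GOAWAY: slow down before the caller retries
            self.sleep = min(self.sleep * 2, self.max_sleep)
            raise
        self.sleep = max(self.sleep / 2, self.min_sleep)
        return result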

Stop using uploadByForm()

We stopped using uploadByForm() because it is not as reliable as uploadByResumable() for a large number of small files.

ncw commented on May 26, 2024

If you can work out why these problems are happening, preferably with an rclone log showing the problem, we can work on fixing them.

Ideally we'd have a reliable way of reproducing the problem too. With that most bugs become easy to fix.

cj-neo commented on May 26, 2024

If you can work out why these problems are happening, preferably with an rclone log showing the problem, we can work on fixing them.

Ideally we'd have a reliable way of reproducing the problem too. With that most bugs become easy to fix.

Thank you for your response. I actually wanted to include the log files, but there are a few issues:

1. Due to the large number of uploaded files, the log files are too large to include. There are no obvious error messages, and I'm not sure how to extract the relevant parts.
2. There are personal privacy concerns, and I prefer not to disclose what I have uploaded to others.

However, it's not a problem. Later, I'll try uploading some unrelated items to see if I can reproduce these errors, and then I'll include the log files. Please feel free to attend to other matters in the meantime; I've already written a repair program myself, so this isn't urgent for me.

cj-neo commented on May 26, 2024

I generated a thousand test text files using the following Python script:

from faker import Faker  # pip install Faker
import os

fake = Faker()

folder_path = "test_folder"
os.makedirs(folder_path, exist_ok=True)

# Write 1000 small text files with random contents.
num_files = 1000
for i in range(num_files):
    file_name = f"file_{i}.txt"
    file_path = os.path.join(folder_path, file_name)
    with open(file_path, "w") as file:
        file.write(fake.text())

I generated log files using the following command.

rclone copy --transfers=16  --drive-chunk-size=64M --log-file=rclone_log.txt --log-level=DEBUG ./test_folder p11:/test_folder

Log file:

rclone_log.txt

This time, I found 28 files that couldn't be read properly, along with many additional files whose names end in (1). Unfortunately, I didn't find the hidden files mentioned before (even after retrying the uploads several times; perhaps the sample size is still too small, or it may be related to file size). If you're interested, you can run the Python program above to generate more test files; perhaps you'll encounter the incorrect-status or unreadable-file issues. It's possible that fixing the unreadable-file issue would also resolve the status-error problem!

Here, you can see many files ending with (1), which shouldn't exist. Moreover, some original files were uploaded successfully yet still produced another file with (1) appended. Even stranger, many of the (1) files are themselves unreadable, corrupted files.

In addition to these 28 unreadable files, there are more cases where no original file exists at all, only a derivative file ending in (1); additionally, some files were uploaded directly into the recycle bin.

Below are the 28 corrupted files that couldn't be read:

/test_folder/file_945.txt
/test_folder/file_962.txt
/test_folder/file_995.txt
/test_folder/file_593(1).txt
/test_folder/file_642(1).txt
/test_folder/file_666(1).txt
/test_folder/file_667(1).txt
/test_folder/file_673(1).txt
/test_folder/file_683(1).txt
/test_folder/file_686(1).txt
/test_folder/file_70(1).txt
/test_folder/file_744(1).txt
/test_folder/file_76(1).txt
/test_folder/file_762(1).txt
/test_folder/file_768(1).txt
/test_folder/file_786(1).txt
/test_folder/file_822(1).txt
/test_folder/file_824(1).txt
/test_folder/file_827(1).txt
/test_folder/file_838(1).txt
/test_folder/file_84(1).txt
/test_folder/file_841(1).txt
/test_folder/file_847(1).txt
/test_folder/file_850(1).txt
/test_folder/file_891(1).txt
/test_folder/file_896(1).txt
/test_folder/file_908(1).txt
/test_folder/file_922(1).txt

If you need more information or assistance, please let me know. Thank you very much for your help.

cj-neo commented on May 26, 2024

To provide more observation data, I re-uploaded three thousand test files. Unfortunately, the status error issue still did not occur.

I suspect the files hidden due to status errors mentioned earlier are related to file size, or to whether they are media files (I know some platforms transcode media files, and if that process fails it could leave the file with an incomplete status). In the past I often encountered status errors when uploading large files, but this time, despite uploading thousands of small files, I haven't hit any, which is very unusual.

Rclone log file
rclone_log.txt

In total there are:
10 files that cannot be read,
13 duplicate files (excluding the originals), and
40 files that went directly to the trash after uploading.

The following files cannot be read:
file_647.txt
file_665.txt
file_703.txt
file_733.txt
file_786.txt
file_826.txt
file_828.txt
file_834.txt
file_863.txt
file_864.txt

The following files are duplicates:
file_61(1).txt
file_697(1).txt
file_717(1).txt
file_735(1).txt
file_775(1).txt
file_787(1).txt
file_802(1).txt
file_818(1).txt
file_840(1).txt
file_844(1).txt
file_860(1).txt
file_927(1).txt
file_991(1).txt

The following files were uploaded directly to the trash:
file_1433.txt
file_1641.txt
file_175.txt
file_1752.txt
file_1755.txt
file_1813.txt
file_1859.txt
file_221.txt
file_2364.txt
file_2502.txt
file_2522.txt
file_2627.txt
file_2702.txt
file_2709.txt
file_2727.txt
file_2777.txt
file_2831.txt
file_2897.txt
file_2921.txt
file_2925.txt
file_2926.txt
file_2931.txt
file_317.txt
file_328.txt
file_381.txt
file_399.txt
file_465.txt
file_502.txt
file_533.txt
file_532.txt
file_647.txt
file_665.txt
file_703.txt
file_733.txt
file_786.txt
file_828.txt
file_826.txt
file_834.txt
file_863.txt
file_864.txt

If you need more information or assistance, please let me know. Thank you very much for your help.

wiserain commented on May 26, 2024

I have started looking into this. Thanks for the detailed report.

wiserain commented on May 26, 2024

Did you successfully handle those invalid files? In my drive, there are two types of invalids:

  1. Unreadable files
  2. Hidden files

As for the unreadable files, they actually become readable after manually untrashing them (setting "trashed": false). I haven't tested all of them, but at least one does. I couldn't delete these invalid files from the server.
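
For anyone wanting to script that manual untrash, a hedged Python sketch follows; the batchUntrash endpoint is a guess patterned on the Drive-style API and may not match PikPak exactly.

import requests

API = "https://api-drive.mypikpak.com"  # assumed base URL

def untrash(session: requests.Session, file_ids: list[str]) -> None:
    """Clear the "trashed": true flag on files (hypothetical endpoint)."""
    resp = session.post(f"{API}/drive/v1/files:batchUntrash",
                        json={"ids": file_ids})
    resp.raise_for_status()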

cj-neo commented on May 26, 2024

Did you successfully handle those invalid files? In my drive, there are two types of invalids:

  1. Unreadable files
  2. Hidden files

As for the unreadable files, they actually become readable after manually untrashing them (setting "trashed": false). I haven't tested all of them, but at least one does. I couldn't delete these invalid files from the server.

First of all, thank you for your help. The unreadable files you mention are probably the ones I described earlier as being uploaded directly to the trash, rather than truly unreadable files. Truly unreadable files can only be deleted and cannot be repaired! However, when unreadable files appear, duplicate files are often generated at the same time (I speculate this is because rclone cannot overwrite during retries), and I simply replace the original file with the duplicate. If there is no duplicate, I can only re-upload through rclone.

Let me summarize the issues I've encountered:

1. Files may go directly to the trash, as you mentioned; the only difference from a normal file is one parameter, trashed: true.

2. Unreadable files. These can also be found by filtering on parameters: platform is Upload but there is no task_id. Normally, files uploaded through the platform are marked as Upload and always have a task_id.

3. Hidden files. These can be found via the parameter phase: PHASE_TYPE_PENDING.

4. Duplicate files. I search directly by hash, but the files may also need to be compared, because sometimes duplicates in the same folder have different filenames, which may need special handling.

These are roughly the situations I've encountered (a sketch of these checks follows below). If there's anything else you need help with, please let me know. Many thanks.
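
For reference, a minimal Python sketch of these checks, assuming hypothetical metadata key names taken from the parameters quoted above:

def classify(meta: dict) -> str:
    """Classify one PikPak file entry; the exact key nesting is an assumption."""
    if meta.get("trashed"):
        return "uploaded straight to trash"                    # case 1
    if meta.get("platform") == "Upload" and not meta.get("params", {}).get("task_id"):
        return "unreadable (platform Upload but no task_id)"   # case 2
    if meta.get("phase") == "PHASE_TYPE_PENDING":
        return "hidden (pending phase)"                        # case 3
    if "(1)" in meta.get("name", ""):
        return "possible duplicate (confirm via hash)"         # case 4
    return "ok"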

cj-neo commented on May 26, 2024

Additional notes:

  1. The probability of generating unreadable files on my end is quite high. It's sometimes noticeable in the official software too: normal media files show thumbnail icons, but unreadable files cannot display icons, and clicking them plays nothing. It feels as if the files were never fully uploaded; they may only be assigned a task_id after a complete upload.
  2. Hidden files: I used to see them frequently, but they have been rare lately. Still, it's good that you've found them.
  3. Duplicate files can be found directly with the official software's search function by querying (1). Unless the original filename already ends in (1), a match should be a duplicate file.

In fact, many of these issues were discovered through comparing file parameters, and I'm not sure of the exact cause of the problems. These are just personal observations, so they may not always be accurate. If there's anything else you need help with, please let me know.

wiserain commented on May 26, 2024

Can you please try using v1.67.0-beta.7928.2e582d73f.fix-7787-pikpak-upload-conflict?

cj-neo commented on May 26, 2024

Can you please try using v1.67.0-beta.7928.2e582d73f.fix-7787-pikpak-upload-conflict?

Sure, I'll give it a try later, but it might take some time. Thank you for your help.

cj-neo commented on May 26, 2024

I used the same approach as before, uploading one thousand files for testing, but shortly after starting I encountered the error:

panic: runtime error: invalid memory address or nil pointer dereference

Then it was forcibly terminated.

However, when I used rclone version 1.66 or 1.65 with the same command to upload the same files, this error did not occur.

Command:

rclone copy --transfers=16 --drive-chunk-size=64M --log-file=rclone_log.txt --log-level=DEBUG ./test_folder pr3:/test_folder

Log file:
rclone_log.txt

My operating system is Win10. I haven't tested on other platforms yet, so I'm not sure whether the same error occurs there.

cj-neo commented on May 26, 2024

Thank you for putting in so much effort to test and fix these issues. I've just uploaded a thousand files on my Win10 system without encountering any of the errors that occurred before. Over the next couple of days I'll test uploading more types of files, and I'll report back here whether or not any issues are found.

cj-neo commented on May 26, 2024

Hello,

Today, I encountered two more issues during testing:

  1. I found that some specific files can never be uploaded successfully with rclone. I tried dozens of times, but they always ended up in the same unreadable state. (The upload reports success with no errors, but the file cannot be read; its properties are missing the task_id.) Uploading them once through the official website succeeded, however. I tested version 1.66 and the modified 1.67 beta you provided and hit the same problem. That said, this occurs very rarely, roughly once in twenty thousand files.

Below are two files that cause problems on my end. Could you please test them?

https://drive.google.com/drive/folders/1pvHTbi0uEuI8OmEWvF6TKZtJPG_fTydX?usp=drive_link

  2. Additionally, I noticed a strange phenomenon where rclone uploads some files repeatedly. After an upload command finishes, running the same command again uploads the same files once more; no matter how many times I execute it, some files are re-uploaded. When I checked the folder, the file names had "(1)" appended, but I couldn't see the original file names, nor could I remove the "(1)". They seem to be hidden files, but my previously written program cannot detect them, so there may be another cause of the hiding. I'm still unsure and will need to observe and test further.

Actually, there's more. A few days ago something strange happened: I uploaded a folder named "abc", but instead of getting just one folder named "abc", I ended up with two folders, one named "abc" and another with random characters appended. The original contents were duplicated into both.

Initially, I didn't explicitly indicate that the source was a folder, e.g. "c:\abc". I've since changed it to "c:\abc\", and I haven't encountered this strange issue again.

However, I remember encountering similar issues when uploading to Google Drive and OneDrive before, but they were very rare and difficult to reproduce. Just thought I'd mention it since it happened again here.

Please take a look first at why some files cannot be uploaded successfully.
As for the other issues, I'll let you know once I have clear findings.

Thank you very much for your generous assistance.

cj-neo commented on May 26, 2024

Sorry, the second issue I mentioned, about repeated uploads and hidden files, is the same issue as before. It's due to a shortcoming in my self-developed detection program, which failed to detect it.

I had overlooked the official restriction that at most five hundred files or folders can be read at a time, so the excess portion was left unprocessed.

Please focus on fixing the specific files that cannot be read. Thanks!

cj-neo commented on May 26, 2024

For testing purposes, I found another video file that cannot be uploaded using rclone, similar to the two image files I mentioned earlier. After uploading, the file cannot be read; I noticed there was no upload process at all, meaning the file was never actually transferred.

This video file is nearly 900MB in size, but the transfer process completes instantly without transferring any data, and there are no error messages. This is very strange.

https://drive.google.com/drive/folders/1pvHTbi0uEuI8OmEWvF6TKZtJPG_fTydX?usp=sharing

Test on Win10:

C:\Users\NEO\Desktop\test>rclone copy -vv 3.wmv pr9:
2024/05/15 21:41:22 DEBUG : rclone: Version "v1.67.0-beta.7930.223ba8626.fix-7787-pikpak-upload-conflict" starting with parameters ["rclone" "copy" "-vv" "3.wmv" "pr9:"]
2024/05/15 21:41:22 DEBUG : Creating backend with remote "3.wmv"
2024/05/15 21:41:22 DEBUG : Using config file from "C:\\Users\\NEO\\Desktop\\test\\rclone.conf"
2024/05/15 21:41:22 DEBUG : fs cache: adding new entry for parent of "3.wmv", "//?/C:/Users/NEO/Desktop/test"
2024/05/15 21:41:22 DEBUG : Creating backend with remote "pr9:"
2024/05/15 21:41:22 DEBUG : 3.wmv: Need to transfer - File not found at Destination
2024/05/15 21:41:25 DEBUG : 3.wmv: Dst hash empty - aborting Src hash check
2024/05/15 21:41:25 INFO  : 3.wmv: Copied (new)
2024/05/15 21:41:25 INFO  :
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Transferred:            1 / 1, 100%
Elapsed time:         3.1s

2024/05/15 21:41:25 DEBUG : 4 go routines active

Test on Ubuntu 22.04:

neo@arm:~/test$ ./rclone copy -vv 3.wmv pr9:
2024/05/15 22:05:40 DEBUG : rclone: Version "v1.67.0-beta.7930.223ba8626.fix-7787-pikpak-upload-conflict" starting with parameters ["./rclone" "copy" "-vv" "3.wmv" "pr9:"]
2024/05/15 22:05:40 DEBUG : Creating backend with remote "3.wmv"
2024/05/15 22:05:40 DEBUG : Using config file from "/home/neo/test/rclone.conf"
2024/05/15 22:05:40 DEBUG : fs cache: adding new entry for parent of "3.wmv", "/home/neo/test"
2024/05/15 22:05:40 DEBUG : Creating backend with remote "pr9:"
2024/05/15 22:05:40 DEBUG : 3.wmv: Need to transfer - File not found at Destination
2024/05/15 22:05:42 DEBUG : 3.wmv: Dst hash empty - aborting Src hash check
2024/05/15 22:05:42 INFO  : 3.wmv: Copied (new)
2024/05/15 22:05:42 INFO  :
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Transferred:            1 / 1, 100%
Elapsed time:         2.4s

2024/05/15 22:05:42 DEBUG : 7 go routines active

wiserain commented on May 26, 2024

Pikpak skips uploading traffic in two cases:

  • zero-byte files
  • rapid/fast uploads: when a file already exists in Pikpak storage based on a hash, Pikpak can reference the existing copy instead of re-uploading it.

However, our current implementation uses an incorrect hash, which can point to the wrong file: there may be a discrepancy between the hash and the referenced content. See #7838. Let's revisit the "no upload process" problem once the hash issue is resolved.
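
For context, a rapid upload conceptually works like the Python sketch below: the client sends the file's hash and size, and the server links an existing copy if it recognizes them. This is a simplified illustration; PikPak actually uses its block-wise "gcid" hash rather than a plain whole-file SHA-1, and the field names are assumptions.

import hashlib
import os

def rapid_upload_metadata(path: str) -> dict:
    """Metadata the server can use to match content it already holds."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return {
        "name": os.path.basename(path),
        "size": str(os.path.getsize(path)),
        "hash": h.hexdigest().upper(),
    }

# If the server recognizes the hash, it links its existing copy and no
# bytes are transferred, which is also why a wrong hash can silently
# attach the wrong content to the new file entry.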

Btw, have you noticed any changes due to increased minimum sleep and a forced sleep? Does it affect you in any way?

cj-neo commented on May 26, 2024

Thank you for your response. I've previously noticed the peculiarity of 0-byte files, so I skip them when checking for "unreadable files". Regarding hash checks, as far as I understand, most cloud storage providers check and deduplicate after files are uploaded, to keep upload processing and the user experience consistent. For example, when uploading via a web browser the hash cannot be known in advance, and PikPak also provides a web-based upload method. Therefore, for problematic files we may need to use different upload methods and run more checks.

Additionally, the increased minimum sleep and forced sleep have minimal impact on me. Our previous tests involved a large number of small files, but in general file sizes are not that small; most of the time is spent on transmission, so the impact is even smaller. Repeatedly checking for file errors may take more time, and I agree that investment is worthwhile.

I appreciate your continued assistance in modifying the program. Looking forward to your updates.

cj-neo commented on May 26, 2024

By the way, I wanted to report an issue I encountered a few days ago while uploading a large file. The file is 134GB, but I encountered the following error before even reaching 100GB:

2024/05/08 20:56:55 ERROR : /Marvels.Avengers.zip: Failed to copy: failed to upload: MultipartUpload: upload multipart failed
        upload id: C278C4F1FAD2486782772961F3CA663A
caused by: TotalPartsExceeded: exceeded total allowed configured MaxUploadParts (10000). Adjust PartSize to fit in this limit

It says that adjusting the PartSize parameter would allow the upload. Is the corresponding parameter max-upload-parts? I couldn't find such a parameter for PikPak; currently I only see two provider-specific flags, --oos-max-upload-parts and --s3-max-upload-parts, and neither seems to apply to PikPak.

I'm not sure what the maximum file size limit for Pikpak is, but currently, uploading large files to Pikpak using rclone seems to be problematic.

If possible, I think rclone could handle large file uploads better. For instance, it could check whether the file size fits within the default PartSize before starting the upload; if it doesn't, the upload shouldn't start at all, because uploading large files takes a long time, and hitting a PartSize error halfway through, then adjusting settings and re-uploading, wastes all the time already spent.

Additionally, rclone could automatically adjust the PartSize based on the file size and complete the upload without errors (a sketch of that idea follows below). It's hard for an average user to know at what file size the PartSize needs adjusting for a successful upload. Of course, these are just my humble suggestions and might be a bit demanding; I'm only offering possible directions for improvement.
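
To put numbers on that suggestion: with a 10,000-part cap, a 134 GB file needs parts of at least 134 GB / 10,000 = 13.4 MB. A hedged Python sketch of the auto-adjustment idea (the 5 MiB floor is an assumption borrowed from the common S3 minimum, not a documented PikPak limit):

import math

MAX_PARTS = 10_000  # cap from the TotalPartsExceeded error above
MIB = 1024 ** 2
MIN_PART = 5 * MIB  # assumed floor

def choose_part_size(file_size: int) -> int:
    """Smallest whole-MiB part size keeping the upload within MAX_PARTS."""
    needed = math.ceil(file_size / MAX_PARTS)
    return max(MIN_PART, math.ceil(needed / MIB) * MIB)

print(choose_part_size(134 * 10**9) // MIB, "MiB")  # -> 13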

Thank you again for your help.

wiserain commented on May 26, 2024

By the way, I wanted to report an issue I encountered a few days ago while uploading a large file. The file is 134GB, but I encountered the following error before even reaching 100GB:

I am aware of this issue. It will be fixed soon, along with a user-configurable upload part size. Would you please open a separate issue for this?

cj-neo commented on May 26, 2024

By the way, I wanted to report an issue I encountered a few days ago while uploading a large file. The file is 134GB, but I encountered the following error before even reaching 100GB:

I am aware of this issue. It will be fixed soon, along with a user-configurable upload part size. Would you please open a separate issue for this?

OK #7850

cj-neo commented on May 26, 2024

After these few days of testing, unfortunately, most of the previously mentioned issues still exist:

  1. Files cannot be read after uploading.
  2. Status errors causing hidden files.
  3. Duplicate files.

All of these problems persist; the only issue resolved is that files are no longer uploaded directly to the trash.

Additionally, because I need to monitor progress while uploading, it is difficult to provide DEBUG logs. I also do not want to publicly disclose what I have uploaded. However, I believe that if you upload a large number of files of various types and sizes, as I do, you will see these issues as well.

wiserain commented on May 26, 2024

Are the problematic files still small ones, or different this time? What are you using for --transfers? What happens if you reduce the value, in case it is too high?

cj-neo commented on May 26, 2024

Not all of them are small files. As mentioned before, I tested uploading a thousand small files without any issues, and initially, I thought the problem was resolved.

However, I later continued with the original upload tasks. I am transferring a large amount of data previously stored on Google Drive to PIKPAK, and the file sizes and types vary.

The problems are exactly the same as at the beginning, including the issue where certain files cannot be uploaded using rclone at all. There are two differences:

1. Deleting the problematic files and re-uploading them has a high chance of success.
2. For the other type of issue, even deleting and re-uploading doesn't work; there is no transfer process at all.

As for the --transfers parameter, I currently have it set to 8 and the problem still occurs. With a lower value, transfers become much slower, but if needed I can help test with a lower setting.

cj-neo commented on May 26, 2024

By the way, if you need to test uploading a large amount of data, you can use Google Colab. It doesn't consume your own server or VPS bandwidth and is very convenient. The only limitation is that it disconnects after a period of inactivity, but this doesn't significantly affect rclone.

cj-neo commented on May 26, 2024

I am currently using a detection program I previously wrote in Python. It logs problematic files and deletes them; after detection completes, it calls rclone to re-upload those files, then re-runs the detection, repeating this process several times until only the files that rclone cannot upload remain, which I then upload manually. (A sketch of this loop follows below.)
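
Driven externally, that loop looks roughly like the Python sketch below; detect_bad_files() is a hypothetical stand-in for the attribute checks (trashed, task_id, phase) discussed earlier in this thread.

import subprocess

def repair_loop(src: str, remote: str, max_rounds: int = 5) -> list[str]:
    """Delete problematic remote files and re-upload until clean."""
    for _ in range(max_rounds):
        bad = detect_bad_files(remote)  # hypothetical helper
        if not bad:
            return []
        for path in bad:
            # rclone deletefile removes a single file on the remote
            subprocess.run(["rclone", "deletefile", f"{remote}{path}"], check=True)
        subprocess.run(["rclone", "copy", src, remote, "--transfers=8"], check=True)
    return detect_bad_files(remote)  # leftovers need manual upload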

I have noticed that issues tend to come in bursts. If no errors occur, things tend to stay that way, but once an error appears it often recurs in the same directory, and even several consecutive files can fail. I therefore suspect these issues are due to network or server overload.

Since a lot of issues have accumulated, I'm concerned about wasting too much of your time. We already know which files have abnormal attributes, so could you consider a temporary workaround? Specifically, rclone could recheck the files once more before finishing the transfer, similar to my initial approach, but handled internally rather than by an external program. The root cause could then be addressed later, when more time is available. Please consider this suggestion.
