Comments (27)
Thanks for the test. I will fix it soon.
from rclone.
Can you try with v1.67.0-beta.7930.223ba8626.fix-7787-pikpak-upload-conflict?
We've made some adjustments in this version to make uploads more reliable.
Checking upload status

By implementing a `getTask()` API call and using the `task_id` assigned to each upload, we verify that uploads have completed successfully on the server side.
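The status check described above can be sketched as a simple polling loop. Note this is illustrative Python, not the backend's actual Go code: `get_task` is a hypothetical stand-in for the `getTask()` call, and the `PHASE_TYPE_COMPLETE` value is an assumption (this thread only confirms `PHASE_TYPE_PENDING` as a phase value).

```python
import time

def wait_for_upload(get_task, task_id, timeout=30.0, interval=0.5):
    """Poll a task-status endpoint until the upload task completes.

    `get_task` is a hypothetical stand-in for the backend's getTask()
    API call; it is assumed to return a dict with a "phase" field such
    as "PHASE_TYPE_PENDING" or "PHASE_TYPE_COMPLETE".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        task = get_task(task_id)
        if task.get("phase") == "PHASE_TYPE_COMPLETE":
            return task          # server confirms the upload finished
        time.sleep(interval)     # still pending: poll again shortly
    raise TimeoutError(f"upload task {task_id} did not complete")
```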
Cancel uploads
If an upload encounters an error, we cancel it to remove residual files (hidden or unreadable).
Force sleep and min sleep
Based on an experiment using 1000 small files as @cj-neo described, with the following commands to upload and verify them:

```
rclone copy ./test test:test --transfers=16 --log-file=test.log --log-level=DEBUG
rclone check ./test test:test --download --log-file=test.log --log-level=DEBUG
```

the counts of first, second, and third low-level retries for 1000 files were:
| 1 | 2 | 3 |
| --- | --- | --- |
| 826 | 59 | 28 |
| 728 | 135 | 102 |
| 662 | 81 | 63 |
| 592 | 85 | 65 |
| 517 | 44 | 34 |
| 429 | 68 | 59 |
meaning that roughly 5% (~50 of 1000) of files require at least three retries, which adds at least 150 ms. Therefore, introducing a forced delay (sleep) after uploading is a reasonable way to ensure server-side updates take effect. Note that this doesn't noticeably affect total execution time.
Moreover, the pacer's minimum sleep is increased from 10 ms to 100 ms to resolve the following server error:

```
http2: server sent GOAWAY and closed the connection; LastStreamID=1999, ErrCode=NO_ERROR, debug=""
```
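The two changes, a higher minimum sleep plus backoff after errors like the GOAWAY above, can be illustrated with a minimal pacer sketch. This is illustrative Python, not rclone's actual Go pacer:

```python
import time

class Pacer:
    """Minimal illustration of rclone-style pacing: wait at least
    `min_sleep` between API calls and back off exponentially after
    retryable errors. Not rclone's actual implementation."""

    def __init__(self, min_sleep=0.1, max_sleep=2.0):
        self.min_sleep = min_sleep
        self.max_sleep = max_sleep
        self.sleep = min_sleep  # current delay before the next call

    def call(self, fn, retries=10):
        for _ in range(retries):
            time.sleep(self.sleep)
            try:
                result = fn()
            except ConnectionError:
                # e.g. "http2: server sent GOAWAY": double the delay
                self.sleep = min(self.sleep * 2, self.max_sleep)
                continue
            self.sleep = self.min_sleep  # success: drop back to the floor
            return result
        raise RuntimeError("low-level retries exhausted")
```

Raising the floor from 10 ms to 100 ms spaces requests out enough that the server stops closing connections under bursts of small uploads.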
Stop using uploadByForm()

We stopped using `uploadByForm()` because it is not as reliable as `uploadByResumable()` for a large number of small files.
---
If you can work out why these problems are happening, preferably with an rclone log showing the problem we can work on fixing them.
Ideally we'd have a reliable way of reproducing the problem too. With that most bugs become easy to fix.
---
> If you can work out why these problems are happening, preferably with an rclone log showing the problem we can work on fixing them.
> Ideally we'd have a reliable way of reproducing the problem too. With that most bugs become easy to fix.
Thank you for your response. I actually wanted to include the log files, but there are a few issues:
1. Due to the large number of uploaded files, the log files are too large to include. There are no obvious error messages, and I'm not sure how to extract the relevant parts.
2. There are privacy concerns, and I'd prefer not to disclose what I have uploaded.
However, it's not a problem. Later, I'll try uploading some unrelated files to see if I can reproduce these errors, and then I'll include the log files. Please feel free to attend to other matters in the meantime; I've already written a repair program myself, so this isn't urgent for me.
---
I generated a thousand test text files using the following Python script:

```python
import os

from faker import Faker

fake = Faker()
folder_path = "test_folder"
os.makedirs(folder_path, exist_ok=True)

num_files = 1000
for i in range(num_files):
    file_name = f"file_{i}.txt"
    file_path = os.path.join(folder_path, file_name)
    with open(file_path, "w") as file:
        file.write(fake.text())
```
I generated the log file using the following command:

```
rclone copy --transfers=16 --drive-chunk-size=64M --log-file=rclone_log.txt --log-level=DEBUG ./test_folder p11:/test_folder
```
log files here
This time, I found 28 files that couldn't be read properly, along with many extra files whose names end in "(1)". Unfortunately, I didn't find the hidden files mentioned before, even after retrying the uploads several times; perhaps the sample size is still too small, or it may be related to file size. If you're interested, you can run the Python program above to generate more test files; perhaps you'll encounter the incorrect-status or unreadable-file issues yourself. It's possible that fixing the unreadable-file issue would also resolve the status-error problem!
Here you can see many files ending in "(1)", which shouldn't exist. Moreover, some original files were uploaded successfully yet still produced another file with "(1)" appended. Stranger still, many of those "(1)" files are themselves unreadable, corrupted files.
In addition to these 28 unreadable files, there are even more cases where the original file is missing and only the derivative "(1)" file exists. Additionally, some files were uploaded directly into the recycle bin.
Below are the 28 corrupted files that couldn't be read:
/test_folder/file_945.txt
/test_folder/file_962.txt
/test_folder/file_995.txt
/test_folder/file_593(1).txt
/test_folder/file_642(1).txt
/test_folder/file_666(1).txt
/test_folder/file_667(1).txt
/test_folder/file_673(1).txt
/test_folder/file_683(1).txt
/test_folder/file_686(1).txt
/test_folder/file_70(1).txt
/test_folder/file_744(1).txt
/test_folder/file_76(1).txt
/test_folder/file_762(1).txt
/test_folder/file_768(1).txt
/test_folder/file_786(1).txt
/test_folder/file_822(1).txt
/test_folder/file_824(1).txt
/test_folder/file_827(1).txt
/test_folder/file_838(1).txt
/test_folder/file_84(1).txt
/test_folder/file_841(1).txt
/test_folder/file_847(1).txt
/test_folder/file_850(1).txt
/test_folder/file_891(1).txt
/test_folder/file_896(1).txt
/test_folder/file_908(1).txt
/test_folder/file_922(1).txt
If you need more information or assistance, please let me know. Thank you very much for your help.
---
To provide more observation data, I re-uploaded three thousand test files. Unfortunately, the status error issue still did not occur.
I suspect that the files hidden due to status errors are related to file size, or to whether they are media files (some platforms transcode media files, and when that process fails it can leave the status incomplete). In the past I often encountered status errors when uploading large files, but this time, despite uploading thousands of small files, I haven't hit any, which is very unusual.
Rclone log file
rclone_log.txt
In total there are:
- 10 files that cannot be read
- 13 duplicate files (excluding originals)
- 40 files that went straight to the trash bin after uploading
The following files cannot be read:
file_647.txt
file_665.txt
file_703.txt
file_733.txt
file_786.txt
file_826.txt
file_828.txt
file_834.txt
file_863.txt
file_864.txt
The following files are duplicated.
file_61(1).txt
file_697(1).txt
file_717(1).txt
file_735(1).txt
file_775(1).txt
file_787(1).txt
file_802(1).txt
file_818(1).txt
file_840(1).txt
file_844(1).txt
file_860(1).txt
file_927(1).txt
file_991(1).txt
The following files were uploaded directly to the trash bin:
file_1433.txt
file_1641.txt
file_175.txt
file_1752.txt
file_1755.txt
file_1813.txt
file_1859.txt
file_221.txt
file_2364.txt
file_2502.txt
file_2522.txt
file_2627.txt
file_2702.txt
file_2709.txt
file_2727.txt
file_2777.txt
file_2831.txt
file_2897.txt
file_2921.txt
file_2925.txt
file_2926.txt
file_2931.txt
file_317.txt
file_328.txt
file_381.txt
file_399.txt
file_465.txt
file_502.txt
file_533.txt
file_532.txt
file_647.txt
file_665.txt
file_703.txt
file_733.txt
file_786.txt
file_828.txt
file_826.txt
file_834.txt
file_863.txt
file_864.txt
If you need more information or assistance, please let me know. Thank you very much for your help.
---
I have started looking into this. Thanks for the detailed report.
---
Did you successfully handle those invalid files? In my drive, there are two types of invalids:
- Unreadable files
- Hidden files
For unreadable files: they actually become readable after manually untrashing them (setting `"trashed": false`). I haven't tested all of them, but at least one is. I couldn't delete these invalid files from the server.
---
> Did you successfully handle those invalid files? In my drive, there are two types of invalids:
> - Unreadable files
> - Hidden files
> For unreadable files: they actually become readable after manually untrashing them (setting `"trashed": false`). I haven't tested all of them, but at least one is. I couldn't delete these invalid files from the server.
First of all, thank you for your help. The unreadable files you mention are probably the ones I said were uploaded directly to the trash, rather than truly unreadable files. Truly unreadable files can only be deleted; they cannot be repaired! However, when unreadable files appear, duplicate files are often generated at the same time (I speculate this may be due to rclone retrying after a failed write). I simply replace the original file with the duplicate. If there is no duplicate, I can only re-upload through rclone.
Let me summarize the issues I've encountered:
1. Files may be uploaded directly to the trash; as you mentioned, the only difference from normal files is the parameter `"trashed": true`.
2. Unreadable files can also be found by filtering on parameters: they have `platform: Upload` but no `task_id`. Normally, files uploaded through the platform are marked `Upload` and always have a `task_id`.
3. Hidden files can be found through the file parameter `phase: PHASE_TYPE_PENDING`.
4. Duplicate files: I search directly by hash, but file contents may also need comparing, because sometimes duplicates in the same folder have different filenames and need special handling.
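The attribute patterns above can be turned into a small classifier. The field names (`trashed`, `phase`, `platform`, `task_id`) are taken from the observations in this thread and are illustrative, not an official schema:

```python
def classify(file_meta):
    """Classify a remote file record using the attribute patterns
    listed above. `file_meta` is assumed to be a dict of the file's
    parameters as returned by the server."""
    if file_meta.get("trashed"):
        return "in-trash"        # case 1: uploaded straight to trash
    if file_meta.get("phase") == "PHASE_TYPE_PENDING":
        return "hidden"          # case 3: stuck in a pending phase
    params = file_meta.get("params", {})
    if params.get("platform") == "Upload" and not params.get("task_id"):
        return "unreadable"      # case 2: platform Upload but no task_id
    return "ok"
```

Duplicates (case 4) need a hash comparison across the whole listing rather than a per-file check, so they are not shown here.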
These are roughly the situations I've encountered. If there's anything else you need help with, please let me know. Many thanks.
---
Additional notes:
- The probability of generating unreadable files on my end is quite high. Sometimes it's also noticeable in the official software, as normal media files have icons, but unreadable files cannot display icons and clicking on them does not play anything. It feels like the files were not fully uploaded, and they may only be marked with a task_id after a complete upload.
- Hidden files: I used to see them frequently, but they have been rare lately. However, it's good that you've found them.
- Duplicate files can actually be directly searched using the official software search function by entering (1) to query. Unless the original filename already has (1) at the end, it should be a duplicate file.
In fact, many of these issues were discovered through comparing file parameters, and I'm not sure of the exact cause of the problems. These are just personal observations, so they may not always be accurate. If there's anything else you need help with, please let me know.
---
Can you please try using v1.67.0-beta.7928.2e582d73f.fix-7787-pikpak-upload-conflict?
---
> Can you please try using v1.67.0-beta.7928.2e582d73f.fix-7787-pikpak-upload-conflict?
Sure, I'll give it a try later, but it might take some time. Thank you for your help.
---
I used the same approach to upload one thousand test files, but shortly after starting, I encountered the following error and the process was forcibly terminated:

```
panic: runtime error: invalid memory address or nil pointer dereference
```
However, when I used rclone version 1.66 or 1.65 with the same command to upload the same files, this error did not occur.
Command:

```
rclone copy --transfers=16 --drive-chunk-size=64M --log-file=rclone_log.txt --log-level=DEBUG ./test_folder pr3:/test_folder
```
Log file:
rclone_log.txt
My operating system is Win10. I haven't tested other platforms yet, so I'm not sure whether the same error occurs there.
---
Thank you for putting in so much effort to test and fix these issues. I've just uploaded a thousand files on my Win10 system without encountering any of the errors that occurred before. Over the next couple of days, I'll test uploading more types of files and will report back whether any issues are found.
---
Hello,
Today, I encountered two more issues during testing:
- I found that some specific files cannot be uploaded successfully using rclone. I tried uploading them dozens of times, but they always ended up in the same unreadable state. (The upload shows as successful with no errors, but the file cannot be read, and its properties are missing the `task_id`.) However, uploading them once through the official website succeeded. I tested version 1.66 and the modified 1.67 beta you provided and hit the same problem. This issue is very rare, though, occurring roughly once in twenty thousand files.
Below are two files that cause problems on my end. Could you please test them?
https://drive.google.com/drive/folders/1pvHTbi0uEuI8OmEWvF6TKZtJPG_fTydX?usp=drive_link
- Additionally, I noticed a strange phenomenon where rclone uploads some files repeatedly. After an upload command finishes, running the same command again uploads the same files once more, no matter how many times I repeat it. Checking the folder, I found filenames with "(1)" appended, but I couldn't see the original filenames or remove the "(1)". They seem to be hidden files, yet my previously written program cannot detect them, so there may be another cause. I'm still unsure and will need to observe and test further.
Actually, there's more. A few days ago, something strange happened: I uploaded a folder named "abc", but instead of one folder named "abc", I ended up with two, one named "abc" and another with random characters appended. The original contents were duplicated into both folders.
Initially, when I uploaded the folder, I didn't explicitly indicate that it was a folder, for example: "c:\abc". Now I've changed it to "c:\abc\" and I haven't encountered this strange issue again.
However, I remember encountering similar issues when uploading to Google Drive and OneDrive before, but they were very rare and difficult to reproduce. Just thought I'd mention it since it happened again here.
Please take a look first at why some files cannot be uploaded successfully.
As for the other issues, I'll let you know once I have clear findings.
Thank you very much for your generous assistance.
---
Sorry, the second issue I mentioned, about repeated uploads and hidden files, turns out to be the same issue as before. My self-developed detection program failed to catch it: I had overlooked the official limit of reading at most five hundred files or folders per request, so the excess went unchecked.
Please focus on fixing the specific files that cannot be read. Thanks!
---
For testing purposes, I found another video file that cannot be uploaded using rclone, similar to the two image files I mentioned earlier. After uploading, the file cannot be read; I noticed there was no upload process at all, meaning the file was never actually transferred.
This video file is nearly 900 MB, yet the transfer completes instantly without moving any data, and there are no error messages. This is very strange.
https://drive.google.com/drive/folders/1pvHTbi0uEuI8OmEWvF6TKZtJPG_fTydX?usp=sharing
Test on Win10:

```
C:\Users\NEO\Desktop\test>rclone copy -vv 3.wmv pr9:
2024/05/15 21:41:22 DEBUG : rclone: Version "v1.67.0-beta.7930.223ba8626.fix-7787-pikpak-upload-conflict" starting with parameters ["rclone" "copy" "-vv" "3.wmv" "pr9:"]
2024/05/15 21:41:22 DEBUG : Creating backend with remote "3.wmv"
2024/05/15 21:41:22 DEBUG : Using config file from "C:\\Users\\NEO\\Desktop\\test\\rclone.conf"
2024/05/15 21:41:22 DEBUG : fs cache: adding new entry for parent of "3.wmv", "//?/C:/Users/NEO/Desktop/test"
2024/05/15 21:41:22 DEBUG : Creating backend with remote "pr9:"
2024/05/15 21:41:22 DEBUG : 3.wmv: Need to transfer - File not found at Destination
2024/05/15 21:41:25 DEBUG : 3.wmv: Dst hash empty - aborting Src hash check
2024/05/15 21:41:25 INFO : 3.wmv: Copied (new)
2024/05/15 21:41:25 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Transferred: 1 / 1, 100%
Elapsed time: 3.1s
2024/05/15 21:41:25 DEBUG : 4 go routines active
```
Test on Ubuntu 22.04:

```
neo@arm:~/test$ ./rclone copy -vv 3.wmv pr9:
2024/05/15 22:05:40 DEBUG : rclone: Version "v1.67.0-beta.7930.223ba8626.fix-7787-pikpak-upload-conflict" starting with parameters ["./rclone" "copy" "-vv" "3.wmv" "pr9:"]
2024/05/15 22:05:40 DEBUG : Creating backend with remote "3.wmv"
2024/05/15 22:05:40 DEBUG : Using config file from "/home/neo/test/rclone.conf"
2024/05/15 22:05:40 DEBUG : fs cache: adding new entry for parent of "3.wmv", "/home/neo/test"
2024/05/15 22:05:40 DEBUG : Creating backend with remote "pr9:"
2024/05/15 22:05:40 DEBUG : 3.wmv: Need to transfer - File not found at Destination
2024/05/15 22:05:42 DEBUG : 3.wmv: Dst hash empty - aborting Src hash check
2024/05/15 22:05:42 INFO : 3.wmv: Copied (new)
2024/05/15 22:05:42 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Transferred: 1 / 1, 100%
Elapsed time: 2.4s
2024/05/15 22:05:42 DEBUG : 7 go routines active
```
---
Pikpak skips uploading traffic in two cases:
- zero-byte files
- rapid/fast uploads: when a file already exists in Pikpak storage based on a hash, Pikpak can reference the existing copy instead of re-uploading it.
However, our current implementation uses an incorrect hash, which can point to the wrong file, so there may be a discrepancy between the hash and the referenced file. See #7838. Let's revisit the "no upload process" problem once the hash issue is resolved.
Btw, have you noticed any changes from the increased minimum sleep and the forced sleep? Do they affect you in any way?
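The rapid-upload idea can be sketched as follows. This is a hedged illustration: `server_lookup` is a hypothetical dedup query, and plain SHA-1 stands in for whatever content hash the provider actually uses (the PikPak backend uses its own hash, which is what #7838 is about):

```python
import hashlib

def try_rapid_upload(path, server_lookup):
    """Sketch of a hash-based 'rapid upload': hash the local file and
    ask the server whether a copy already exists. `server_lookup` is a
    hypothetical stand-in for the provider's dedup query; it returns an
    existing file record, or None if the content is unknown."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    existing = server_lookup(h.hexdigest())
    if existing is not None:
        return existing  # server links the existing copy: no traffic sent
    return None          # caller must fall back to a normal upload
```

If the hash computed locally does not match what the server indexed, the lookup can link the wrong file, which would explain an instant "transfer" that produces an unreadable result.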
---
Thank you for your response. I've previously noticed the peculiarity of 0-byte files, so I skip them when checking for "unreadable files." Regarding hash checks: as far as I understand, most cloud storage providers check for and remove duplicates after files are uploaded, to keep upload processing and the user experience consistent. For example, when uploading via a web browser, the hash cannot be known in advance, and PikPak also provides a web-based upload method. Therefore, for problematic files, we may need to use different upload methods and perform more checks.
Additionally, the increased minimum sleep and forced sleep have minimal impact on me. Our previous tests involved a large number of small files, but in general file sizes aren't that small; most of the time is spent on transmission, so the impact is even smaller. Repeatedly checking for file errors may take more time, but I agree this investment is worthwhile.
I appreciate your continued assistance in modifying the program. Looking forward to your updates.
---
By the way, I wanted to report an issue I encountered a few days ago while uploading a large file. The file is 134GB, but I encountered the following error before even reaching 100GB:
```
2024/05/08 20:56:55 ERROR : /Marvels.Avengers.zip: Failed to copy: failed to upload: MultipartUpload: upload multipart failed
upload id: C278C4F1FAD2486782772961F3CA663A
caused by: TotalPartsExceeded: exceeded total allowed configured MaxUploadParts (10000). Adjust PartSize to fit in this limit
```
It says that adjusting the PartSize parameter would allow the upload. Is the corresponding parameter max-upload-parts? However, I couldn't find such a parameter for PikPak; I only see the backend-specific flags --oos-max-upload-parts and --s3-max-upload-parts, and neither seems to apply to PikPak.
I'm not sure what the maximum file size limit for Pikpak is, but currently, uploading large files to Pikpak using rclone seems to be problematic.
If possible, I think rclone could handle large file uploads better. For instance, it could check whether the file size fits within the default PartSize before starting the upload; if it doesn't, the upload shouldn't start at all, because uploading large files takes a long time, and hitting a PartSize error halfway through wastes all of it on a settings change and re-upload.
Additionally, rclone could automatically adjust the PartSize based on file size, so uploads complete without errors. It's hard for the average user to know how large a file must be before PartSize needs adjusting. Of course, these are just my humble suggestions and might be a bit demanding; I'm only offering possible directions for improvement.
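The suggested pre-flight check is simple arithmetic: before starting, pick the smallest part size that fits the file into the allowed number of parts. A sketch, using the MaxUploadParts value of 10000 from the error above:

```python
MIB = 1 << 20  # one mebibyte

def min_part_size(file_size, max_parts=10000):
    """Smallest whole-MiB part size that fits `file_size` into at most
    `max_parts` parts, so a TotalPartsExceeded error cannot occur."""
    per_part = -(-file_size // max_parts)  # ceiling division
    return -(-per_part // MIB) * MIB       # round up to a MiB boundary
```

For a 134 GiB file this gives a 14 MiB part size; with a single-digit-MiB default chunk size, the 10,000-part cap would instead be hit partway through, which matches the failure described above.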
Thank you again for your help.
---
> By the way, I wanted to report an issue I encountered a few days ago while uploading a large file. The file is 134GB, but I encountered the following error before even reaching 100GB:
I am aware of this issue. It will be fixed soon, along with a user-configurable upload part size. Would you please open a separate issue for this?
---
> By the way, I wanted to report an issue I encountered a few days ago while uploading a large file. The file is 134GB, but I encountered the following error before even reaching 100GB:

> I am aware of this issue. It will be fixed soon, along with a user-configurable upload part size. Would you please open a separate issue for this?
OK #7850
---
After these few days of testing, unfortunately, most of the previously mentioned issues still exist:
- Files cannot be read after uploading.
- Status errors causing hidden files.
- Duplicate files.
All of these problems persist; the only issue resolved is that files are no longer uploaded directly to the trash.
Additionally, because I need to monitor progress while uploading, it's difficult to provide DEBUG logs, and I also don't want to publicly disclose what I've uploaded. However, I believe that if you upload a large number of files of various types and sizes as I do, you will see these issues too.
---
Are the problematic files still small ones, or different this time? What are you using for --transfers? What happens if you reduce the value, in case it is too high?
---
Not all of them are small files. As mentioned before, I tested uploading a thousand small files without any issues, and initially, I thought the problem was resolved.
However, I later continued with the original upload tasks. I am transferring a large amount of data previously stored on Google Drive to PIKPAK, and the file sizes and types vary.
The problems are exactly the same as at the beginning, including the issue where files cannot be uploaded using rclone. There are two differences:
1. Deleting the problematic files and re-uploading them has a high chance of success.
2. For the other type of issue, even deleting and re-uploading doesn't work; there is no transfer process at all.
As for --transfers, I have it set to 8 and the problem still occurs. Setting it lower makes transfers much slower, but if needed I can help test with a lower value.
---
By the way, if you need to test uploading a large amount of data, you can use Google Colab. It doesn't use up your server or VPS bandwidth and is very convenient. The only limitation is that it disconnects after a period of inactivity, but this doesn't significantly affect RCLONE.
---
I am currently using a detection program I previously wrote in Python. It logs problematic files and deletes them. After the detection is complete, it calls RCLONE to re-upload those files, then re-runs the detection... repeating this process several times until only the files that cannot be uploaded with RCLONE remain, which are then uploaded manually.
I have noticed that issues tend to be continuous. If no errors occur, they tend to stay that way, but once an error appears, it often happens in the same directory, and even several consecutive files can fail. Therefore, I believe these issues are likely due to network or server overload.
Since a lot of issues have accumulated, I'm concerned about wasting too much of your time. We already know which files have abnormal attributes, so I wonder if you could consider a temporary workaround: recheck the files once more before rclone finishes the transfer, similar to my initial approach, but handled internally rather than by an external program. Later, when more time is available, the root cause can be addressed. Please evaluate this suggestion.
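The workflow described in this thread (scan the remote, delete invalid files, re-upload, repeat) can be sketched as below. All four callables are hypothetical hooks, not rclone APIs; for instance, `reupload` could shell out to `rclone copy` and `is_invalid` could apply the attribute checks listed earlier:

```python
def repair_loop(list_remote, is_invalid, delete, reupload, max_rounds=5):
    """Sketch of a recheck-and-reupload workflow: scan the remote,
    delete files that look invalid, re-upload them, and repeat until a
    scan comes back clean or the round limit is reached."""
    bad = []
    for _ in range(max_rounds):
        bad = [f for f in list_remote() if is_invalid(f)]
        if not bad:
            return []        # clean scan: nothing left to fix
        for f in bad:
            delete(f)
        reupload(bad)
    return bad               # still failing after max_rounds
```

Capping the rounds matters because, as noted above, a few files fail every time regardless of retries and must be uploaded manually.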
---