
Corruption with some mime types (sharry, CLOSED)

eikek commented on May 22, 2024
Corruption with some mime types

from sharry.

Comments (14)

exedore6 commented on May 22, 2024

Update: it was the ProxyHTMLEnable directive. From the documentation it appears to be required, but commenting it out has resolved the issue.

Your Apache wouldn't start with it because the directive belongs to a module (mod_proxy_html) that probably wasn't enabled on your test system.

It also has a known bug that matches our issue: https://bz.apache.org/bugzilla/show_bug.cgi?id=64339

I'm going to do some more Apache research; you can close this issue. Expect a pull request with some updated proxy documentation (which is the only change I think is needed here).

exedore6 commented on May 22, 2024

Other notes: this is the Docker image, using PostgreSQL for the backend.

eikek commented on May 22, 2024

Oh my…. Thank you for reporting! I'll take a look. I'm really curious about the reason for this, especially since it seems to be related to the content type.

exedore6 commented on May 22, 2024

I haven't tested this hypothesis yet, but could it be four-letter extensions? (I'll upload a jpeg to see if it gets twisted.) Update: tested it, and a JPEG uploaded with the extension .jpeg isn't corrupted.
I can provide shares and aliases for testing if that helps.

exedore6 commented on May 22, 2024

Renaming a Word document to .zip, then uploading and downloading it, leaves it uncorrupted.

exedore6 commented on May 22, 2024

Here is a link to two identical files, uploaded with different extensions. One downloads clean, the other corrupt.

https://sharry.kent-school.edu/app/open/6jDQxbREegR-d76TEmfuNeN-zx15TBwVC1e-xyaxffbbxCe

exedore6 commented on May 22, 2024

One more important fact: this instance is behind an Apache reverse proxy. Below is the VirtualHost entry. I haven't replicated the issue in a different environment.

<VirtualHost sharry.kent-school.edu:443>
        # The ServerName directive sets the request scheme, hostname and port that
        # the server uses to identify itself. This is used when creating
        # redirection URLs. In the context of virtual hosts, the ServerName
        # specifies what hostname must appear in the request's Host: header to
        # match this virtual host. For the default virtual host (this file) this
        # value is not decisive as it is used as a last resort host regardless.
        # However, you must set it for any further virtual host explicitly.
        #ServerName www.example.com

        ServerAdmin webmaster@localhost

        # Available loglevels: trace8, ..., trace1, debug, info, notice, warn,
        # error, crit, alert, emerg.
        # It is also possible to configure the loglevel for particular
        # modules, e.g.
        #LogLevel info ssl:warn

        ErrorLog ${APACHE_LOG_DIR}/error.log
        CustomLog ${APACHE_LOG_DIR}/access.log combined
        ProxyBadHeader Ignore
        SetEnv no-gzip On
        RewriteEngine On
        RewriteRule ^/app/assets/sharry-webapp/1.2.0i/favicon/.*.png https://www.kent-school.edu/favicon.ico [R]
        SetEnv proxy-sendchunks On
        ProxyPass "/" "http://localhost:4090/" timeout=2400 keepalive=on
        ProxyPassReverse "/" "http://localhost:4090/"
        ProxyHTMLEnable On


        # For most configuration files from conf-available/, which are
        # enabled or disabled at a global level, it is possible to
        # include a line for only one particular virtual host. For example the
        # following line enables the CGI configuration for this host only
        # after it has been globally disabled with "a2disconf".
        #Include conf-available/serve-cgi-bin.conf


ServerName sharry.kent-school.edu
SSLCertificateFile /etc/letsencrypt/live/sharry.kent-school.edu/fullchain.pem
SSLCertificateKeyFile /etc/letsencrypt/live/sharry.kent-school.edu/privkey.pem
Include /etc/letsencrypt/options-ssl-apache.conf
</VirtualHost>

eikek commented on May 22, 2024

Great - Thank you for the test files!

exedore6 commented on May 22, 2024

I'm not comfortable enough with psql to get you the records from the chunkdata table as an example (I can run a query, but I don't know how to get the result into a file for you).

eikek commented on May 22, 2024

Don't worry! I should be able to reproduce it with your files, and then I'll see where it leads me. Thank you for all the info. If I get stuck, I'll come back to you with questions :-)

eikek commented on May 22, 2024

Hello, I just tried it quickly with my local setup (no reverse proxy) and couldn't reproduce it. I downloaded your files and they do show the problem: one is 350K, the other 234K. Really strange!

Could you maybe try the following to check whether the database contains the correct content? (It seems so, judging by the lengths.) Edit the share description and put in this:

{{#files}}
- {{name}}: `{{checksum}}`
{{/files}}

When you save the share, it should show all attachment filenames and their sha256 checksums as stored in the db (you could also query the filemeta database table). They should be the same for the two files in the example above (for me it is a2af46e7…c2e2c64). If they are the same, could you then try to download without the Apache proxy in front? If they are different, there is an upload problem. I'm going to set up an Apache here and see if I can reproduce it.
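
To check a downloaded copy against the checksum shown in the share description, something like the following could be used. This is only a minimal sketch, assuming Python 3 is available on the machine holding the download; the helper names sha256_of_file and matches_stored_checksum are hypothetical and not part of sharry.

```python
import hashlib


def sha256_of_file(path, block_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read block by block."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size blocks so large attachments need not fit in memory.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_stored_checksum(path, stored_checksum):
    """Compare a local file's digest with a checksum copied from the share.

    Surrounding whitespace and double quotes are ignored, and the
    comparison is case-insensitive.
    """
    expected = stored_checksum.strip().strip('"').lower()
    return sha256_of_file(path) == expected
```

Calling matches_stored_checksum with the path of a downloaded file and the checksum copied from the share description should return False if the file was altered somewhere between the database and the browser.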

eikek commented on May 22, 2024

I just remembered that the checksum is also sent in an ETag header in the response. The ETags of both files are equal, so I now assume that the db contains the correct data and that the problem is related to downloading. When I run the downloads via curl -v I see this:

"bad" file:

curl -v --output /dev/null https://sharry.kent-school.edu/api/v2/open/share/6jDQxbREegR-d76TEmfuNeN-zx15TBwVC1e-xyaxffbbxCe/file/BuJtZYi82Gg-1mzogfXtLG8-wxwED3XiuWG-LzHXMdt9UbE
< HTTP/1.1 200 OK
< Date: Thu, 07 May 2020 19:24:51 GMT
< Server: Apache/2.4.29 (Ubuntu)
< Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document;charset=utf-8
< Accept-Ranges: bytes
< Last-Modified: Thu, 07 May 2020 13:30:29 GMT
< Content-Disposition: inline; filename="Document.docx"
< ETag: "a2af46e7745897f3a830e9b0b2de90d395d6a759e7049c663c7f155bfc2e2c64"
< Transfer-Encoding: chunked

"good" file:

curl -v --output /dev/null https://sharry.kent-school.edu/api/v2/open/share/6jDQxbREegR-d76TEmfuNeN-zx15TBwVC1e-xyaxffbbxCe/file/E7PkezZjVLP-cMvKnDTUW52-LEzymkU1ho6-imJpkZoRs4c
< HTTP/1.1 200 OK
< Date: Thu, 07 May 2020 19:32:20 GMT
< Server: Apache/2.4.29 (Ubuntu)
< Content-Type: application/zip
< Accept-Ranges: bytes
< Last-Modified: Thu, 07 May 2020 13:28:21 GMT
< Content-Disposition: inline; filename="document.zip"
< ETag: "a2af46e7745897f3a830e9b0b2de90d395d6a759e7049c663c7f155bfc2e2c64"
< Content-Length: 240461

For the first file, the ;charset=utf-8 looks suspicious, but I don't think it is related. It also seems that Apache sends a chunked response for this file. Sharry may also send chunks, so maybe Apache messes it up somehow? It is really strange that it only affects the docx file and not the zip version, since the two are identical…. Unfortunately, I'm not at all familiar with Apache configuration. Another thing I could imagine is that Apache won't compress already compressed files (like zip or jpg) but tries to compress other files it thinks are not compressed, which then results in chunked transfers (which should work, actually, but may cause this…). If you could verify whether the error also occurs without Apache, that would be helpful, I think.
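
To make the difference between the two responses explicit, here is a minimal Python sketch that summarizes the relevant headers. The helper name summarize_download is hypothetical, and the header values are copied from the curl output above; nothing here talks to the server.

```python
def summarize_download(headers):
    """Summarize how a response body is framed, based on its headers.

    `headers` maps header names to values; names are matched
    case-insensitively.
    """
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    content_length = lowered.get("content-length")
    return {
        "etag": lowered.get("etag", "").strip('"'),
        "chunked": "chunked" in lowered.get("transfer-encoding", "").lower(),
        "content_length": int(content_length) if content_length is not None else None,
        "content_type": lowered.get("content-type", ""),
    }


# Header values taken from the two curl runs against the proxied instance.
bad = summarize_download({
    "Content-Type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document;charset=utf-8",
    "ETag": '"a2af46e7745897f3a830e9b0b2de90d395d6a759e7049c663c7f155bfc2e2c64"',
    "Transfer-Encoding": "chunked",
})
good = summarize_download({
    "Content-Type": "application/zip",
    "ETag": '"a2af46e7745897f3a830e9b0b2de90d395d6a759e7049c663c7f155bfc2e2c64"',
    "Content-Length": "240461",
})

print(bad["etag"] == good["etag"])  # True: both responses carry the same stored checksum
print(bad["chunked"], good["chunked"])  # True False: only the docx response is chunked
```

The equal ETags with different framing are what point to the download path rather than the stored data.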

For comparison, I uploaded the same file here: https://box.daheim.site/app/open/4qPTg3UhYkT-z4rw2X9qi5g-BTgpe1jb3HB-x2Jb5oT6RLo

I tried with an Apache reverse proxy here, too, using Apache 2.4.43. I used your config where possible (there is a huge default Apache config before this virtual host, and I tested without TLS). I had to remove the line ProxyHTMLEnable On, because my Apache wouldn't start otherwise. With this setup I could not reproduce it either.

< HTTP/1.1 200 OK
< Date: Thu, 07 May 2020 20:49:00 GMT
< Server: Apache/2.4.43 (Unix)
< Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
< Accept-Ranges: bytes
< Last-Modified: Thu, 07 May 2020 20:46:18 GMT
< Content-Disposition: inline; filename="document_orig.docx"
< ETag: "a2af46e7745897f3a830e9b0b2de90d395d6a759e7049c663c7f155bfc2e2c64"
< Content-Length: 240461

exedore6 commented on May 22, 2024

When not going through the reverse proxy, it downloads clean, with no chunking.
If I tell Apache not to chunk its transfers, it still gets chunked (which leaves me suspecting that it's being chunked by Sharry in that configuration).

If your Apache doesn't have the ProxyHTMLEnable directive set, then URLs in the returned web page don't get rewritten (I expect that in your test environment it's still making API calls to the unproxied server).

At least we know that the x-factor is definitely the Apache + SSL + reverse proxy combination. Additionally, I take comfort in the fact that it's only showing up with downloads at the moment.

eikek commented on May 22, 2024

That's good to hear! As far as I understand this directive, it is not necessary for Sharry if you deploy Sharry at the root path. The only HTML returned from the server is a single page that loads the JavaScript application. The links in there don't include the hostname, e.g. /app/assets/sharry-webapp/1.3.0/sharry-app.js. Everything else is covered by setting the base-url to the "outside" URL.

A PR with updated docs would be great, of course. Thank you.
