anticompositenumber / signatures

Validates user signatures, checking for technical and policy issues

Home Page: https://signatures.toolforge.org

License: GNU Affero General Public License v3.0

Python 83.97% HTML 15.60% Shell 0.43%

Contributors: anticompositenumber, dependabot[bot]

signatures's Issues

User site report submission

  1. Identify bottlenecks and try to decrease query time
  2. Run jobs from a Celery worker
  3. Rate-limit requests for the same site when query times are long.
    • One report per site/config per day seems reasonable.
    • Queries that take under a minute could be cached less heavily, maybe one report per hour.
  4. Decide on level of configuration available, and how that should be presented in the list of reports
    • Currently, there is only one test suite and configuration parameters remain constant.
    • If users can configure the query (#1), and those reports are cached, they should be identified in the query list.
  5. Store data in a database instead of writing to JSON?
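
The caching idea in item 3 can be sketched as a small freshness check. This is only an illustration under stated assumptions: the function name `can_reuse_report`, the constants, and the idea of recording how long the last query took are all hypothetical and not part of the tool. The one-minute threshold and the daily and hourly limits are taken from the suggestions in the list.

```python
from datetime import datetime, timedelta, timezone

# Thresholds taken from the suggestions above; the names are hypothetical.
SLOW_QUERY_THRESHOLD = timedelta(minutes=1)
SLOW_QUERY_MAX_AGE = timedelta(days=1)   # one report per site/config per day
FAST_QUERY_MAX_AGE = timedelta(hours=1)  # sub-minute queries: roughly 1/hr


def can_reuse_report(generated_at, query_duration, now=None):
    """Return True if a cached report is still fresh enough to serve.

    generated_at: timezone-aware datetime when the cached report was built.
    query_duration: timedelta the query took when that report was built.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if query_duration < SLOW_QUERY_THRESHOLD:
        max_age = FAST_QUERY_MAX_AGE
    else:
        max_age = SLOW_QUERY_MAX_AGE
    return now - generated_at < max_age
```

If users can configure the query, the cache key would presumably need to include the configuration as well as the site, so that differently configured reports are not served in place of each other.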

Store reports on ToolsDB instead of as JSON files

JSON files are easy, but storing them on NFS is bad, and it's hard to combine data from different sites.

I'm thinking of doing two tables -- one to keep the signatures as-scanned, and one to keep the errors.

sig_id  timestamp       signature
1       20210320200805  foo bar

username  wiki    sig_id  timestamp       error
foo       enwiki  1       20210320200805  no-user-links
foo       enwiki  1       20210320200805  plain-fancy-sig
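
A minimal sketch of the two-table layout, using Python's built-in `sqlite3` as a stand-in so it runs anywhere. ToolsDB is a MariaDB service, so the real DDL and driver would differ. The table names `signature` and `sig_error` and the helper functions are assumptions for illustration; only the column names come from the example rows above.

```python
import sqlite3


def create_schema(conn):
    # Column names follow the example rows; table names are hypothetical.
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS signature (
            sig_id    INTEGER PRIMARY KEY,
            timestamp TEXT NOT NULL,
            signature TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS sig_error (
            username  TEXT NOT NULL,
            wiki      TEXT NOT NULL,
            sig_id    INTEGER NOT NULL REFERENCES signature (sig_id),
            timestamp TEXT NOT NULL,
            error     TEXT NOT NULL
        );
        """
    )


def store_result(conn, username, wiki, timestamp, signature, errors):
    """Store one scanned signature and one row per error found in it."""
    cur = conn.execute(
        "INSERT INTO signature (timestamp, signature) VALUES (?, ?)",
        (timestamp, signature),
    )
    sig_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO sig_error (username, wiki, sig_id, timestamp, error) "
        "VALUES (?, ?, ?, ?, ?)",
        [(username, wiki, sig_id, timestamp, error) for error in errors],
    )
    conn.commit()
    return sig_id


def errors_by_wiki(conn):
    """Count errors per wiki, the cross-site query that JSON files make hard."""
    return conn.execute(
        "SELECT wiki, error, COUNT(*) FROM sig_error "
        "GROUP BY wiki, error ORDER BY wiki, error"
    ).fetchall()
```

Keeping errors in their own table means a signature with several problems is stored once and referenced by `sig_id`, as in the example rows.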

Extended validation

Many large wikis have policies and guidelines for signatures that go above and beyond the proposal for technical requirements. These requirements should be supported in an extended validation mode, possibly with granular test selection.

  • Images in signatures
  • Transclusions in signatures
  • Post-subst wikitext size
  • Impersonation through link name
  • Unescaped pipe characters
  • External links
  • Visual size
  • Line breaks
  • Horizontal rule
  • Insufficient contrast (with background and shadow)
  • Text shadow size
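
Granular test selection could look something like the sketch below. Everything here is hypothetical: the registry, the check names, and the regular expressions are illustrations, not the tool's actual tests. The patterns are rough approximations that do not parse wikitext properly, and only three of the listed checks are shown.

```python
import re

# Hypothetical registry: check name -> function returning True when the
# signature wikitext has that problem.
EXTENDED_CHECKS = {}


def extended_check(name):
    """Decorator that registers a check under the given name."""
    def register(func):
        EXTENDED_CHECKS[name] = func
        return func
    return register


@extended_check("line-breaks")
def has_line_break(sig):
    # Matches <br>, <br/>, <br /> or a literal newline.
    return bool(re.search(r"<br\s*/?>|\n", sig, flags=re.IGNORECASE))


@extended_check("horizontal-rule")
def has_horizontal_rule(sig):
    # Matches <hr> or four or more hyphens at the start of a line.
    return bool(
        re.search(r"<hr\s*/?>|^-{4,}", sig, flags=re.IGNORECASE | re.MULTILINE)
    )


@extended_check("external-links")
def has_external_link(sig):
    # Matches bracketed links such as [https://...] or [//...].
    return bool(re.search(r"\[(?:https?:)?//", sig, flags=re.IGNORECASE))


def run_extended_checks(sig, selected=None):
    """Run all registered checks, or only those named in `selected`."""
    names = EXTENDED_CHECKS if selected is None else selected
    return sorted(name for name in names if EXTENDED_CHECKS[name](sig))
```

A wiki that only forbids line breaks could then call `run_extended_checks(sig, selected=["line-breaks"])` and ignore the rest.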

Alternate user/sig source

Currently, only signatures stored in the replica database can be accessed.

  • On a single user query, add option to supply new signature.
  • On a batch query, add option to supply user and signature data in JSON form.

A MediaWiki site with a public API would still be required for i18n and template markup checks. That site would have to expose the Linter API in the same manner as public WMF sites to do lint checks.
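
For the batch option, the supplied JSON could be validated along these lines. The payload shape (an array of objects with `username` and `signature` keys) and the function name are assumptions made for this sketch; no input format has been decided.

```python
import json


def load_supplied_sigs(payload):
    """Parse user-supplied JSON into a list of (username, signature) pairs.

    Assumed (hypothetical) format:
        [{"username": "Foo", "signature": "[[User:Foo|Foo]]"}, ...]
    """
    data = json.loads(payload)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of objects")
    pairs = []
    for index, entry in enumerate(data):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {index} is not an object")
        username = entry.get("username")
        signature = entry.get("signature")
        if not isinstance(username, str) or not isinstance(signature, str):
            raise ValueError(
                f"entry {index} needs string 'username' and 'signature' fields"
            )
        pairs.append((username, signature))
    return pairs
```

The resulting pairs could then be fed to the same checks that currently consume rows from the replica database.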

Capitalization of the first letter of username

Searching for a username that starts with a lowercase letter returns a "User does not exist" message. Since MediaWiki does not allow the creation of usernames that start with a lowercase letter, the first letter could be converted to uppercase automatically.
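
A minimal sketch of the conversion, with a hypothetical function name. It uppercases only the first character and leaves the rest untouched (unlike `str.capitalize()`, which would lowercase the remainder). MediaWiki's own first-letter handling may differ for some non-ASCII characters, so treat this as an approximation.

```python
def capitalize_first_letter(username):
    """Uppercase the first character of a username, leaving the rest as-is."""
    username = username.strip()
    return username[:1].upper() + username[1:]
```

For example, `capitalize_first_letter("fooBar")` returns `"FooBar"`.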

Improve performance?

>>> p.sort_stats(pstats.SortKey.TIME).print_stats(30)
Mon Mar 23 02:27:58 2020    sigprobs_profile

         5200087 function calls (5155274 primitive calls) in 495.606 seconds

   Ordered by: internal time
   List reduced from 1999 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      104  402.013    3.866  402.013    3.866 {method 'recv_into' of '_socket.socket' objects}
     2036   85.032    0.042   85.032    0.042 {method 'read' of '_ssl._SSLSocket' objects}
     5979    0.262    0.000    0.262    0.000 {built-in method posix.stat}
     4189    0.183    0.000    0.264    0.000 url.py:210(_encode_invalid_chars)
        1    0.167    0.167  495.195  495.195 sigprobs.py:378(main)
      205    0.150    0.001    0.172    0.001 <frozen importlib._bootstrap_external>:914(get_data)
    91358    0.149    0.000    0.364    0.000 os.py:673(__getitem__)
   108346    0.141    0.000    0.647    0.000 _collections_abc.py:742(__iter__)
     1901    0.141    0.000    0.141    0.000 {method 'write' of '_ssl._SSLSocket' objects}
      110    0.135    0.001    0.137    0.001 {built-in method io.open}
   401706    0.129    0.000    0.171    0.000 {built-in method builtins.isinstance}
   534880    0.127    0.000    0.127    0.000 {method 'lower' of 'str' objects}
     1015    0.116    0.000    0.308    0.000 feedparser.py:471(_parse_headers)
    21322    0.104    0.000    0.312    0.000 parse.py:361(urlparse)
   311422    0.102    0.000    0.102    0.000 {method 'decode' of 'bytes' objects}
    91358    0.088    0.000    0.138    0.000 os.py:751(encode)
    48898    0.086    0.000    0.086    0.000 {method 'match' of 're.Pattern' objects}
     2030    0.085    0.000    0.667    0.000 request.py:2456(getproxies_environment)
     1015    0.084    0.000    0.890    0.001 client.py:203(parse_headers)
   166462    0.083    0.000    0.148    0.000 os.py:755(decode)
    49753    0.080    0.000    0.092    0.000 parse.py:109(_coerce_args)
    24371    0.080    0.000    0.140    0.000 parse.py:412(urlsplit)
     9135    0.077    0.000    0.142    0.000 message.py:462(get)
     5077    0.076    0.000    0.220    0.000 _collections_abc.py:824(update)
    31322    0.076    0.000    0.128    0.000 _policybase.py:293(header_source_parse)
     2030    0.070    0.000    0.577    0.000 feedparser.py:218(_parsegen)
    85260    0.068    0.000    0.140    0.000 os.py:696(__iter__)
166141/166140    0.067    0.000    0.067    0.000 {method 'encode' of 'str' objects}
 6877/985    0.059    0.000    0.496    0.001 utils.py:36(parse_anything)
31440/11160    0.059    0.000    0.109    0.000 cookiejar.py:1214(deepvalues)
>>> p.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(30)
Mon Mar 23 02:27:58 2020    sigprobs_profile

         5200087 function calls (5155274 primitive calls) in 495.606 seconds

   Ordered by: cumulative time
   List reduced from 1999 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    219/1    0.002    0.000  495.638  495.638 {built-in method builtins.exec}
        1    0.000    0.000  495.638  495.638 sigprobs.py:21(<module>)
        1    0.167    0.167  495.195  495.195 sigprobs.py:378(main)
     2140    0.016    0.000  487.096    0.228 socket.py:575(readinto)
      902    0.008    0.000  402.222    0.446 sigprobs.py:60(iter_active_user_sigs)
     1410    0.021    0.000  402.091    0.285 connections.py:648(_read_packet)
      100    0.001    0.000  402.068    4.021 cursors.py:151(execute)
      100    0.001    0.000  402.061    4.021 cursors.py:451(_query)
      100    0.001    0.000  402.058    4.021 connections.py:508(query)
      100    0.001    0.000  402.047    4.020 connections.py:720(_read_query_result)
      100    0.002    0.000  402.046    4.020 connections.py:1086(init_unbuffered_query)
     2820    0.012    0.000  402.045    0.143 connections.py:687(_read_bytes)
     2861    0.006    0.000  402.021    0.141 {method 'read' of '_io.BufferedReader' objects}
      104  402.013    3.866  402.013    3.866 {method 'recv_into' of '_socket.socket' objects}
      886    0.039    0.000   92.566    0.104 sigprobs.py:197(check_sig)
     1015    0.027    0.000   91.829    0.090 sessions.py:463(request)
     1015    0.041    0.000   89.084    0.088 sessions.py:614(send)
     1015    0.034    0.000   88.646    0.087 adapters.py:394(send)
     1015    0.048    0.000   87.597    0.086 connectionpool.py:494(urlopen)
     1015    0.046    0.000   87.008    0.086 connectionpool.py:351(_make_request)
      886    0.012    0.000   86.469    0.098 sigprobs.py:218(get_lint_errors)
      886    0.009    0.000   86.334    0.097 sessions.py:567(post)
     1015    0.012    0.000   86.136    0.085 client.py:1292(getresponse)
     1015    0.031    0.000   86.081    0.085 client.py:299(begin)
     1015    0.035    0.000   85.076    0.084 client.py:266(_read_status)
    33404    0.041    0.000   85.059    0.003 {method 'readline' of '_io.BufferedReader' objects}
     2036    0.012    0.000   85.055    0.042 ssl.py:1041(recv_into)
     2036    0.010    0.000   85.042    0.042 ssl.py:902(read)
     2036   85.032    0.042   85.032    0.042 {method 'read' of '_ssl._SSLSocket' objects}
      129    0.001    0.000    5.505    0.043 sessions.py:534(get)
>>> p.strip_dirs().sort_stats(pstats.SortKey.FILENAME, pstats.SortKey.CUMULATIVE).print_stats()
Mon Mar 23 02:27:58 2020    sigprobs_profile

         5200087 function calls (5155274 primitive calls) in 495.606 seconds

   Ordered by: file name, cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

(...)
        1    0.000    0.000  495.638  495.638 sigprobs.py:21(<module>)
        1    0.167    0.167  495.195  495.195 sigprobs.py:378(main)
      902    0.008    0.000  402.222    0.446 sigprobs.py:60(iter_active_user_sigs)
      886    0.039    0.000   92.566    0.104 sigprobs.py:197(check_sig)
      886    0.012    0.000   86.469    0.098 sigprobs.py:218(get_lint_errors)
      128    0.002    0.000    5.436    0.042 sigprobs.py:317(evaluate_subst)
      886    0.011    0.000    4.599    0.005 sigprobs.py:236(check_links)
      886    0.002    0.000    1.447    0.002 sigprobs.py:345(check_tildes)
      985    0.028    0.000    0.596    0.001 sigprobs.py:255(compare_links)
        1    0.001    0.001    0.085    0.085 sigprobs.py:122(get_site_data)
     2724    0.010    0.000    0.014    0.000 sigprobs.py:189(normal_name)
      886    0.003    0.000    0.003    0.000 sigprobs.py:333(check_fanciness)
        1    0.000    0.000    0.003    0.003 sigprobs.py:39(load_config)
      886    0.001    0.000    0.002    0.000 sigprobs.py:370(check_length)
        1    0.000    0.000    0.000    0.000 sigprobs.py:159(<dictcomp>)
        1    0.000    0.000    0.000    0.000 sigprobs.py:163(<dictcomp>)
        1    0.000    0.000    0.000    0.000 sigprobs.py:168(<setcomp>)
        1    0.000    0.000    0.000    0.000 sigprobs.py:173(<listcomp>)
        1    0.000    0.000    0.000    0.000 sigprobs.py:174(<listcomp>)
(...)
>>> p.strip_dirs().sort_stats(pstats.SortKey.FILENAME, pstats.SortKey.TIME).print_stats()
Mon Mar 23 02:27:58 2020    sigprobs_profile

         5200087 function calls (5155274 primitive calls) in 495.606 seconds

   Ordered by: file name, internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
(...)
        1    0.167    0.167  495.195  495.195 sigprobs.py:378(main)
      886    0.039    0.000   92.566    0.104 sigprobs.py:197(check_sig)
      985    0.028    0.000    0.596    0.001 sigprobs.py:255(compare_links)
      886    0.012    0.000   86.469    0.098 sigprobs.py:218(get_lint_errors)
      886    0.011    0.000    4.599    0.005 sigprobs.py:236(check_links)
     2724    0.010    0.000    0.014    0.000 sigprobs.py:189(normal_name)
      902    0.008    0.000  402.222    0.446 sigprobs.py:60(iter_active_user_sigs)
      886    0.003    0.000    0.003    0.000 sigprobs.py:333(check_fanciness)
      128    0.002    0.000    5.436    0.042 sigprobs.py:317(evaluate_subst)
      886    0.002    0.000    1.447    0.002 sigprobs.py:345(check_tildes)
      886    0.001    0.000    0.002    0.000 sigprobs.py:370(check_length)
        1    0.001    0.001    0.085    0.085 sigprobs.py:122(get_site_data)
        1    0.000    0.000    0.003    0.003 sigprobs.py:39(load_config)
        1    0.000    0.000    0.000    0.000 sigprobs.py:159(<dictcomp>)
        1    0.000    0.000    0.000    0.000 sigprobs.py:163(<dictcomp>)
        1    0.000    0.000  495.638  495.638 sigprobs.py:21(<module>)
        1    0.000    0.000    0.000    0.000 sigprobs.py:168(<setcomp>)
        1    0.000    0.000    0.000    0.000 sigprobs.py:173(<listcomp>)
        1    0.000    0.000    0.000    0.000 sigprobs.py:174(<listcomp>)
(...)
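
For reference, listings like the ones above can be produced with the standard `cProfile` and `pstats` modules. The helper below is a sketch with a hypothetical name; it profiles a single call in-process and returns the report as text instead of loading a saved `sigprobs_profile` file.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, limit=30, **kwargs):
    """Profile one call and return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # strip_dirs() shortens file paths, as in the later listings above.
    stats.strip_dirs().sort_stats(pstats.SortKey.CUMULATIVE).print_stats(limit)
    return result, out.getvalue()
```

Swapping `pstats.SortKey.CUMULATIVE` for `pstats.SortKey.TIME` gives the "internal time" ordering used in the first listing.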

Write complete tests and documentation

  • API documentation
  • User-facing explanation of signature tests
  • Operation and maintenance docs
  • Unit testing
  • Docstrings and comments
  • Static type checking?
