
deployment's People

Contributors

abhih1, amaltaro, andrius-k, arooshap, brunocoimbra, esmaeeleskandari, evansde77, fiorintu, geneguvo, giffels, goughes, h4d4, hdelanno, hufnagel, jfernan2, mbandrews, mmusich, muhammadimranfarooqi, nothingface0, panos512, quark2, rovere, stuartw, tanmaymudholkar, ticoann, todor-ivanov, vanbesien, vkhristenko, vkuznet, yuyiguo


deployment's Issues

Fix broken binding for Rucio default account

We have a broken binding of the default Rucio account for the MSRuleCleaner microservice. We are wrongly associating the default with wma_{test,prod}, whereas it should be bound to wmcore_transferror instead.

And we are also missing the Rucio enable parameter for the service.

Problem with pep8 in WMAgent Dev

Hi @BrunoCoimbra @ticoann

When I try to run "pep8" within our deployment, I get an error:

[dmwm@72d561e8d903 WMCore]$ pep8
bash: /home/dmwm/unittestdeploy/wmagent/1.1.3.pre5/sw/slc6_amd64_gcc493/external/py2-pep8/1.7.0-comp2/bin/pep8: /build/dmwmbld/srv/state/dmwmbld/builds/comp_gcc493/w/slc6_amd64_gcc493/extern: bad interpreter: No such file or directory

This happens because the shebang line at the top of the pep8 script (the one found by `which pep8`) points to a Python interpreter that only exists on the build machine:
#!/build/dmwmbld/srv/state/dmwmbld/builds/comp_gcc493/w/slc6_amd64_gcc493/external/python/2.7.13/bin/python

If I replace this with the standard /usr/bin/env python , everything is OK. Can you fix the RPMs please?
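Until the RPMs are rebuilt, the manual workaround described above could be scripted. The following is only a sketch: `fix_shebang` is a hypothetical helper, not part of the deployment scripts, and it assumes the broken interpreter path sits on the first line of the script.

```shell
# Hypothetical helper: rewrite a script's first line to use the python found on PATH.
fix_shebang() {
  script=$1
  # Only touch files whose first line is a shebang that mentions python.
  if head -n 1 "$script" | grep -q '^#!.*python'; then
    tmp=$(mktemp) || return 1
    # Emit the portable shebang, then the rest of the script unchanged.
    { echo '#!/usr/bin/env python'; tail -n +2 "$script"; } > "$tmp" &&
      cat "$tmp" > "$script"   # overwrite contents in place, keeping permissions
    rm -f "$tmp"
  fi
}
```

It would be called as `fix_shebang "$(which pep8)"`, assuming the file is writable by the current user.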

Removing very old IP exceptions from the frontends.

We are looking into improving the security of the services provided via the 'cmsweb' infrastructure at CERN.

Quite some time ago, back in early 2011 according to the records, we set up some exceptions in our services so that a number of machines from institutes such as DESY, FNAL, and RWTH could access the services without an X509 certificate, by explicitly allowing access from their IP addresses.

With the ever-increasing level of attacks on web services, and the improvements in browsers, we would like to know whether we can revert that decision, so that we can remove these exceptions and have the browsers use standard X509 certificates to contact the services behind cmsweb.

Be able to specify the MySQL socket filename

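Socket filenames (including the path) on Linux can be at most 107 characters long, because the sun_path field of sockaddr_un holds 108 bytes including the terminating NUL. Burying the socket file deep in the deployment tree can exceed this limit. Instead, allow the socket file location to be set by an optional environment variable.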
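A minimal sketch of the optional override, assuming a variable named `MYSQL_SOCKET` and an illustrative default location under the install directory (neither name comes from the deployment scripts):

```shell
# MYSQL_SOCKET is a hypothetical variable name; the default path is illustrative only.
mysql_socket_path() {
  install_dir=$1
  # Use the environment override when set and non-empty, else the in-tree default.
  echo "${MYSQL_SOCKET:-$install_dir/mysql/logs/mysql.sock}"
}
```

Exporting something like `MYSQL_SOCKET=/tmp/wmagent-mysql.sock` before starting the services would then keep the socket path short regardless of how deep the deployment directory is.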

Update of deployment and required packages for RHEL8?

Hello,
We are working on the upgrade of the DQM machines. The physical machines are being replaced with more powerful ones, and the operating system is also moving from CentOS 7 to RHEL8. We have a couple of questions and wonder if the experts could help us.

  1. Is there a plan to update the code (for example, the Deploy script) to include the new SCRAM_ARCH el8_amd64_gcc11? Sorry, I am still new to the code here and not sure how much the SCRAM_ARCH matters.

  2. While trying to create a temporary DQMGUI, we ran the command below on our new DQMGUI servers and found that many of the packages needed for bootstrap were missing. Most of them could be installed. However, one of them is not supported by RHEL8: compat-libstdc++-33

Do you know if there is a way to resolve this? Thanks!!

$PWD/deployment/Deploy -A slc7_amd64_gcc630 -r "comp=comp" -R comp@HG2209d -t MYDEV -s "prep sw post" $PWD dqmgui/bare

Shin-Shan Eiko Yu for DQM-DC

add PopConLogDB and wmstats support to Tier0 manage script

Two changes to the Tier0 manage script. The first adds support for configuring another database used by the Tier0: the PopConLogDB, which is used to detect when express conditions have been transferred to ORCOFF. The second adds support for wmstats, which is now mandatory and needs to be configured when setting up the Tier0/WMAgent. The Tier0 manage script is also cleaned up somewhat by removing some redundant settings.

Incorrect couchIP is pulled

Using the following in my secrets:

MYSQL_USER=xxx
MYSQL_PASS=xxx
COUCH_USER=xxx
COUCH_PASS=xxx
COUCH_PORT=5984
COUCH_HOST=127.0.0.1
REQMGR_HOSTNAME=127.0.0.1
REQMGR_PORT=8684
WORKLOAD_SUMMARY_HOSTNAME=127.0.0.1
WORKLOAD_SUMMARY_PORT=5984
WORKLOAD_SUMMARY_DBNAME=workload_summary
CS_HOSTNAME=127.0.0.1
CS_PORT=8888
SB_HOSTNAME=127.0.0.1
UFC_CACHEDIR=/home/meloam/wmagent-new/ufc_cache
UFC_PORT=7778
HOST_DN=/DC=org/DC=doegrids/OU=Services/CN=se2.accre.vanderbilt.edu

My couch server binds correctly to 127.0.0.1, but the wmagent and other configurations then point to

config.JobStateMachine.couchurl = 'http://_:_*@se2.accre.vanderbilt.edu:5984'

Which fails :/ Are the manage scripts supposed to use COUCH_HOST for that value?
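If the intent is for COUCH_HOST to drive the generated URL, the construction could look roughly like the sketch below. This is not the manage script's actual code: `couch_url` is a hypothetical helper, and the fallback to `hostname -f` is an assumption about what currently produces the se2.accre.vanderbilt.edu value.

```shell
# Hypothetical helper: build the CouchDB URL from the secrets variables,
# falling back to the machine's hostname only when COUCH_HOST is unset or empty.
couch_url() {
  host=${COUCH_HOST:-$(hostname -f)}
  echo "http://${COUCH_USER}:${COUCH_PASS}@${host}:${COUCH_PORT:-5984}"
}
```

With the secrets file above, this would yield a URL pointing at 127.0.0.1:5984, matching the address the couch server is actually bound to.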

Kerberos/AFS token renewal for WMAgent submitting to LSF

A WMAgent configured for local CERN submission would need to wrap the call that starts the agent in k5reauth to renew Kerberos/AFS tokens automatically. As this is only functional at CERN, it needs to be optional. Last time Samir looked at it, there was a problem getting getopts option parsing to work within the manage script.

Not very urgent as we do not have many WMAgents submitting locally to CERN and it's a one line patch that can easily be applied manually after deployment.

Still, would be good if this can be integrated into the manage script.
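One way the optional wrapping could be parsed with getopts is sketched below. The `-k` flag name and the `start_agent_cmd` function are hypothetical, and the arguments that k5reauth itself needs are deliberately left out; the function only prints the command line that would be run.

```shell
# Hypothetical sketch: -k is an assumed flag name; real k5reauth arguments are omitted.
# The function body runs in a subshell so OPTIND and the variables do not leak out.
start_agent_cmd() (
  OPTIND=1
  wrapper=""
  while getopts "k" opt; do
    case $opt in
      k) wrapper="k5reauth" ;;
      *) exit 2 ;;
    esac
  done
  shift $((OPTIND - 1))
  # Prepend the wrapper (and a separating space) only when -k was given.
  echo "${wrapper:+$wrapper }$*"
)
```

Resetting OPTIND at the top of the function is the detail that most often trips up getopts inside a larger script, since the shell otherwise keeps the index from a previous parse.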

Potential Misconfiguration

We are a group of researchers from Yale University building a tool for finding bugs in configuration files. To evaluate the effectiveness of our tool, we randomly selected 1000 open source configuration files to verify and are requesting feedback on our error report. Your file https://raw.githubusercontent.com/dmwm/deployment/621a578dc4a664af435962e057a0ae2e90d5cb7c/wmagent/my.cnf was among those files. When we ran our tool on your file, it reported the following potential errors.

[MISSING ERROR: Expected "user[mysql.server]" in the same file as "[mysql.server]"
In the training set we saw: True 15 times, False 1 times
]

The training set referenced in the report is taken from the industrial configuration files at https://github.com/tianyin/configuration_datasets. Note that even if your system is currently working, these bugs may manifest themselves only under large traffic loads or in different system environments.

If you feel that any of the above errors may indeed cause problems, either on your system or on a different system, please comment on this issue report in the space below. This will help us to improve our tool. If you do not believe this is a potential bug, please feel free to close this issue. If possible, we would appreciate your feedback before July 27.

If you would like to find out more about how we detected these bugs, you can find the open source tool at https://github.com/santolucito/ConfigV. For a quick overview of this tool, you can watch this video at https://youtu.be/plliEh-5MpM. If you have further questions, or would like to get involved with this project, feel free to reach out over email at [email protected].

Thank you for your time!

devvm deployment seems broken

Deployment gets stuck when running the Deploy script for system/devvm, at the point where it tries to create the host certificate (according to the logs, it is /tmp/foo/cfg/system/careq-cern-ch that is failing, possibly because the certificates website was redesigned with a new "beautiful" look).

Ran this on a fresh SLC5 64bit VM:

sudo yum install git

set up machine

kinit
mkdir -p /tmp/foo
cd /tmp/foo
git clone git://github.com/dmwm/deployment.git cfg
sudo -l
cfg/Deploy -t dummy -s post $PWD system/devvm

[comp-das-dev2] /tmp/foo $ cfg/Deploy -t dummy -s post $PWD system/devvm
INFO: 20121115101626: starting deployment of: system/devvm
INFO: deploying system - variant: devvm, version: default
NOTE: no /data/certs/hostcert.pem
NOTE: no /etc/grid-security/hostcert.pem
WARNING: requesting now a new host certificate via ca.cern.ch
WARNING: you will prompted for your login or grid certificate password
WARNING: by supplying your password you agree to CERN computing rules
Using SSL X509 key /afs/cern.ch/user/z/zemleris/.globus/userkey.pem certificate /afs/cern.ch/user/z/zemleris/.globus/usercert.pem
Traceback (most recent call last):
  File "/tmp/foo/cfg/system/careq-cern-ch", line 150, in ?
    assert m, "No DER download link at <%s>" % url
AssertionError: No DER download link at https://login.cern.ch/adfs/ls/?wa=wsignin1.0&wtrealm=https%3a%2f%2fca.cern.ch%2fca%2f&wctx=rm%3d0%26id%3dpassive%26ru%3d%252fca%252fHostCertificates%252fRequestHostCertificate.aspx&wct=2012-11-15T09%3a16%3a48Z
INFO: installation log can be found in /tmp/foo/.deploy/20121115-101627-19290-post.log
ERROR: installation failed with exit code 1

replace hardcoded hostnames with environment variables

We have hardcoded host names, such as cmsweb.cern.ch and cmsweb-testbed.cern.ch, in various places. For instance, these names are used in the deploy scripts. In the k8s setup, however, the deploy scripts run while the image is being built, which means these names are baked into the image. We should remove all hardcoded host names and replace them as follows:

cmsweb_prod=${CMSWEB_HOSTNAME:-cmsweb.cern.ch}
cmsweb_preprod=${CMSWEB_HOSTNAME:-cmsweb-testbed.cern.ch}

and then replace every use of the host names with the $cmsweb_prod and $cmsweb_preprod variables. This will allow us to set the host names during k8s deployment via environment variables. Then we can adjust the cmsweb docker images to rely on this environment variable.
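As a small illustration of how the `${VAR:-default}` expansion behaves in a deploy script, the sketch below builds a frontend URL from the variable. The `frontend_url` function is hypothetical and exists only to show the default and the override side by side.

```shell
# Hypothetical helper showing the proposed default-with-override pattern.
frontend_url() {
  # Falls back to the production host when CMSWEB_HOSTNAME is unset or empty.
  cmsweb_prod=${CMSWEB_HOSTNAME:-cmsweb.cern.ch}
  echo "https://$cmsweb_prod/"
}
```

With CMSWEB_HOSTNAME unset the function prints the production URL; setting the variable in the k8s pod environment replaces the host without rebuilding the image, provided the value is read at run time rather than at image build time.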

Remove any user requests from googleusercontent.com from CMSWEB side.

This feature can be implemented using the mod_evasive.conf file that we already have.

Here is how we define this file:
The configuration file for mod_evasive, an Apache module designed to provide protection against Distributed Denial of Service (DDoS) attacks, has the following parameters:

DOSHashTableSize: The size of the hash table used to track IP addresses and URLs. The default value is 3097, but you can adjust this value based on the size of your site and the number of clients you expect to handle.

DOSPageInterval: The interval, in seconds, over which requests for the same page from the same client are counted against DOSPageCount. A client that exceeds that count within the interval will be blocked. In this configuration, the interval is set to 1 second.

DOSSiteInterval: The interval, in seconds, over which all requests from the same client to the site are counted against DOSSiteCount. A client that exceeds that count within the interval will be blocked. In this configuration, the interval is set to 1 second.

DOSBlockingPeriod: The amount of time in seconds that a client will be blocked after triggering the module's protection. In this configuration, the blocking period is set to 10 seconds.

DOSPageCount: The number of requests for the same page from the same client that will trigger the module's protection. In this configuration, the limit is set to 2 requests.

DOSSiteCount: The number of requests for different pages from the same client that will trigger the module's protection. In this configuration, the limit is set to 50 requests.

DOSSystemCommand: A command that will be executed when a client is blocked. In this configuration, the command will write a log message to a file.

DOSWhitelist: A list of IP addresses that should not be blocked by the module. In this configuration, several CERN IP addresses are whitelisted to ensure that they are not blocked.

Action items for the CMSWeb team:

  • #1267
  • Set the following configuration,
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*Googleusercontent.*$ [NC]
RewriteRule .* - [F,L]
</IfModule>

Alternatively, we could simply use fail2ban. In that case, the action items would be:

  • Install fail2ban in docker.
  • Create a file in the jail.d directory to configure fail2ban to monitor Apache logs.
  • In the same directory, create a file defining the fail regex.
  • Make sure that the log path is properly defined and logs are mounted properly there.

Further tasks:

  • Carry out extensive testing and document the procedure.
  • Explore the possibility of further vulnerabilities.

UID numbers clash with cern accounts in cfg/system/deploy

The devvm deployment fails with:
useradd: UID 100001 is not unique

Currently there are 13 UID numbers that clash with CERN user IDs:

-bash-4.2$ for uid in $(grep add_local_user /tmp/foo/cfg/system/deploy.orig | cut -d" " -f4); do getent passwd $uid; done | wc -l
13

Rendering problems of BeamMonitor fitted plots

After deploying PR #1269 during beams, the d0 vs phi0 and track z0 plots are blacklisted again. We have cleared the blacklist and restarted our playback DQM GUI server (srv-c2f11-29-03). We recorded the logs right after the restart, which might provide more information; they can be found here.

Frontend disables keepalive for Chrome on OSX

The frontend disables keepalive for Chrome on OSX, which I just noticed. You can test it yourself via:

curl -v -s -o /dev/null -k --key /tmp/x509up_u112870 --cert /tmp/x509up_u112870 'https://cmsweb.cern.ch/' -H "Connection: keep-alive"

curl -v -s -o /dev/null -k --key /tmp/x509up_u112870 --cert /tmp/x509up_u112870 'https://cmsweb.cern.ch/' -H "Connection: keep-alive" -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'

These yield:

No user-agent:
> GET / HTTP/1.1
> Host: cmsweb.cern.ch
> Accept: */*
> Connection: keep-alive

**snip**

< HTTP/1.1 200 OK
< Date: Tue, 01 Mar 2016 17:33:51 GMT
< Server: Apache
< Last-Modified: Mon, 25 Jan 2016 14:49:41 GMT
< ETag: "8d7-52a29adc5cb40"
< Accept-Ranges: bytes
< Content-Length: 2263
< CMS-Server-Time: D=678 t=1456853631811320
< Keep-Alive: timeout=5, max=100
< Connection: Keep-Alive

Chrome user-agent:
> GET / HTTP/1.1
> Host: cmsweb.cern.ch
> Accept: */*
> Connection: keep-alive
> User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36

**snip**

< HTTP/1.1 200 OK
< Date: Tue, 01 Mar 2016 17:33:58 GMT
< Server: Apache
< Last-Modified: Mon, 25 Jan 2016 14:49:41 GMT
< ETag: "8d7-52a29adc5cb40"
< Accept-Ranges: bytes
< Content-Length: 2263
< CMS-Server-Time: D=1290 t=1456853638447423
< Connection: close

This appears to be related to the following line

# Disable keep-alive with Safari. See various bugs on Google.
which matches against Safari in Chrome's user-agent of User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36

A simple solution would be to add BrowserMatch Chrome !nokeepalive to the subsequent line, though it's not clear that would do 100% the right thing. I'd test it myself, but my dev frontend on my machine isn't cooperating right now 😦. It'd be neat (?) to get a list of unique user-agents to work with. That'd help figure out the right matches needed.

subscribe to /deployment PRs

I would like to be notified when a new PR is submitted for /deployment. Would it be possible to add me to the list of subscribers?
Thanks!
Tomas (CMS DQM core)

Deploy should inform if something bad happens

Running the following command

[se2] ~/wmagent/quickinstall $ ./cfg/Deploy -R [email protected] -s sw -A slc5_amd64_gcc461 -t v01 /fs1/home/meloam/wmagent/quickinstall wmagent
INFO: 20120907153151: starting deployment of: wmagent
INFO: deploying wmagent - variant: default, version: default
INFO: bootstrapping comp software area in /fs1/home/meloam/wmagent/quickinstall/v01/sw
INFO: bootstrap successful

No error message is displayed, even though the deploy failed. You have to look in .deploy to see what happened:

<snip>
Get:17 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/external external+mysql+5.1.58-comp 1-1 [16.2MB]
Get:18 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/external external+py2-mysqldb+1.2.3c1-comp8 1-1 [106kB]
Get:19 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/cms cms+wmcore-db-mysql+1-comp55 1-1 [7779B]
Get:20 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/external external+oracle+11.2.0.3.0__10.2.0.4.0 1-1 [64.3MB]
Get:21 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/external external+py2-cx-oracle+5.1-comp8 1-1 [157kB]
Get:22 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/cms cms+wmcore-db-oracle+1-comp50 1-1 [7929B]
Get:23 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/external external+cherrypy+3.1.2-comp5 1-1 [585kB]
Get:24 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/external external+py2-cheetah+2.4.0-comp5 1-1 [413kB]
Get:25 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/external external+yui+2.9.0 1-1 [2492kB]
Get:26 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/cms cms+wmcore-webtools+1-comp54 1-1 [8160B]
Get:27 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/external external+py2-psutil+0.3.0-comp5 1-1 [76.7kB]
Get:28 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/external external+zeromq+2.1.9-comp 1-1 [735kB]
Get:29 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/external external+py2-zmq+2.1.9-comp5 1-1 [639kB]
Get:30 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/cms cms+wmagent+0.9.13 1-1 [9259B]
Fetched 157MB in 1m39s (1570kB/s)
Executing RPM (/fs1/home/meloam/wmagent/quickinstall/v01/sw/slc5_amd64_gcc461/external/apt/429-comp/bin/rpm-wrapper -Uvh -r /fs1/home/meloam/wmagent/quickinstall/v01/sw --force --prefix /fs1/home/meloam/wmagent/quickinstall/v01/sw --ignoreos --ignorearch --force --prefix /fs1/home/meloam/wmagent/quickinstall/v01/sw --ignoreos --ignorearch --oldpackage)...
error: Failed dependencies:
    libuuid.so.1()(64bit) is needed by external+zeromq+2.1.9-comp-1-1.x86_64
Executing,: Sub-process /fs1/home/meloam/wmagent/quickinstall/v01/sw/slc5_amd64_gcc461/external/apt/429-comp/bin/rpm-wrapper returned an error code (30)    

That it failed is a separate matter. (I'll have to hunt that down.)
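A minimal sketch of the behaviour being asked for: run a step, keep its output in the log, and print an explicit error with the log location when it exits non-zero. `run_step` is a hypothetical wrapper and not how Deploy is actually structured.

```shell
# Hypothetical wrapper: run one deployment step, append its output to a log file,
# and report failure on stderr instead of staying silent.
run_step() {
  log=$1
  shift
  if "$@" >> "$log" 2>&1; then
    echo "INFO: step succeeded: $*"
  else
    rc=$?   # exit status of the failed step
    echo "ERROR: step failed with exit code $rc; see $log" >&2
    return "$rc"
  fi
}
```

Propagating the step's own exit code lets a calling script stop at the first failing step rather than printing a success message for a later one.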

readlink -f doesn't work on OSX

This is probably a minor thing, but OSX doesn't support the -f option for readlink, which means the wmagent manage script bombs out at the very first line. I'm not sure what shell trickery would make this work conditionally.
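One portable approach is to avoid `-f` entirely and follow symlinks one hop at a time, since plain `readlink` (without options) is available on both Linux and OSX. The sketch below assumes the path exists; `resolve_path` is a hypothetical helper, not something already in the manage script.

```shell
# Hypothetical replacement for `readlink -f`, using only plain readlink and cd -P.
# The body runs in a subshell so the cd calls do not affect the caller.
resolve_path() (
  target=$1
  # Follow the chain of symlinks until we reach a real file or directory.
  while [ -L "$target" ]; do
    dir=$(cd -P "$(dirname "$target")" && pwd -P)
    link=$(readlink "$target")
    case $link in
      /*) target=$link ;;        # absolute link target
      *)  target=$dir/$link ;;   # relative to the directory holding the link
    esac
  done
  # Canonicalise the containing directory as well.
  dir=$(cd -P "$(dirname "$target")" && pwd -P)
  echo "$dir/$(basename "$target")"
)
```

The first line of the manage script could then call `resolve_path "$0"` where it currently calls `readlink -f`.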
