CMS DMWM Deployment
Disable reqmgr2 cherrypy threads for the testbed instance. This will allow the transition of reqmgr2 to the k8s testbed.
We have a broken binding of the default Rucio account for the MSRuleCleaner microservice. We are wrongly associating the default with wma_{test,prod}, while it should be bound to wmcore_transferror instead.
And we are also missing the Rucio enable parameter for the service.
When I try to run "pep8" within our deployment I get an error
[dmwm@72d561e8d903 WMCore]$ pep8
bash: /home/dmwm/unittestdeploy/wmagent/1.1.3.pre5/sw/slc6_amd64_gcc493/external/py2-pep8/1.7.0-comp2/bin/pep8: /build/dmwmbld/srv/state/dmwmbld/builds/comp_gcc493/w/slc6_amd64_gcc493/extern: bad interpreter: No such file or directory
This comes from the fact that the interpreter named on the first line of the script found by 'which pep8' is this:
#!/build/dmwmbld/srv/state/dmwmbld/builds/comp_gcc493/w/slc6_amd64_gcc493/external/python/2.7.13/bin/python
If I replace this with the standard /usr/bin/env python, everything is OK. Can you fix the RPMs please?
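As a stopgap until the RPMs are fixed, the manual replacement described above could be scripted roughly as follows. This is only a sketch: fix_shebang is a hypothetical helper, not part of the deployment scripts, and it assumes the stale interpreter line always starts with #!/build/ and points at a python binary.

```shell
#!/bin/bash
# Sketch: replace a stale build-time interpreter line with a portable one.
# fix_shebang is a hypothetical helper, not part of the deployment scripts.
fix_shebang() {
  local script="$1"
  local first_line
  # Read only the first line of the script.
  IFS= read -r first_line < "$script" || true
  case "$first_line" in
    '#!/build/'*python*)
      # Rewrite line 1 only; the original file is kept as <script>.bak.
      sed -i.bak '1s|^#!.*$|#!/usr/bin/env python|' "$script"
      ;;
  esac
}
```

It could be applied with something like fix_shebang "$(which pep8)", given write access to the installed file.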
We are looking into improving the security of the services provided via the 'cmsweb' infrastructure at CERN.
Quite some time ago, back in early 2011 according to the records, we set up some exceptions in our services such that a number of machines from institutes like DESY, FNAL, and RWTH could access the services without an X509 certificate, by explicitly allowing access from their IP addresses.
With the ever-increasing level of attacks on web services, and the improvements in browsers, we would like to know whether we could revert that decision, so that we can remove these exceptions and have browsers use standard X509 certificates to contact the services behind cmsweb.
Socket filenames (including the path) on Linux can be at most 107 characters long, because sun_path in struct sockaddr_un is 108 bytes including the terminating null. Burying the socket file deep in the deployment can blow this limit. Instead, allow the socket file location to be set by an optional environment variable.
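A minimal sketch of the idea, assuming a hypothetical variable name SERVICE_SOCKET_FILE and a made-up default layout under the install directory (neither comes from the deployment scripts). The length limit is passed in as an argument rather than hardcoded:

```shell
#!/bin/bash
# Sketch: pick the socket file location, preferring an optional override.
# SERVICE_SOCKET_FILE and the default path layout are hypothetical names.
socket_path() {
  local install_dir="$1" max_len="$2"
  # Use the environment variable when set, else fall back to the default.
  local path="${SERVICE_SOCKET_FILE:-$install_dir/var/run/service.sock}"
  if [ "${#path}" -gt "$max_len" ]; then
    echo "socket path too long (${#path} > $max_len): $path" >&2
    return 1
  fi
  printf '%s\n' "$path"
}
```

Because the override is optional, existing deployments that never set the variable keep their current socket location.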
Hello,
We are working on the upgrade of the DQM machines. The physical machines are being replaced with more powerful ones, and the operating system is moving from CentOS 7 to RHEL8. We have a couple of questions and wonder if the experts could help us.
Is there a plan to update the code (for example, the Deploy script) to include the new SCRAM_ARCH el8_amd64_gcc11? Sorry, I am still new to the code here and not sure how much the SCRAM_ARCH matters.
While trying to create a temporary DQMGUI, we ran the command below on our new DQMGUI servers and found that a lot of the packages needed for bootstrap are missing. Most of them could be installed. However, one of them is not supported by RHEL8: compat-libstdc++-33
Do you know if there is a way to resolve this? Thanks!!
$PWD/deployment/Deploy -A slc7_amd64_gcc630 -r "comp=comp" -R comp@HG2209d -t MYDEV -s "prep sw post" $PWD dqmgui/bare
Shin-Shan Eiko Yu for DQM-DC
Disable CherryPy threads from t0reqmon deployment.
Reason: These will be running from the k8s deployment instead (and we can't run 2 at the same time).
https://github.com/dmwm/deployment/blob/master/t0_reqmon/config.py#L74
Two changes to the Tier0 manage script. The first adds support for configuring another database used by the Tier0: the PoConLogDB, which is used to detect when express conditions have been transferred to ORCOFF. The second adds support for wmstats, which is now mandatory and needs to be configured when setting up the Tier0/WMAgent. Also clean up the Tier0 manage script somewhat, removing some redundant settings.
Using the following in my secrets:
MYSQL_USER=xxx
MYSQL_PASS=xxx
COUCH_USER=xxx
COUCH_PASS=xxx
COUCH_PORT=5984
COUCH_HOST=127.0.0.1
REQMGR_HOSTNAME=127.0.0.1
REQMGR_PORT=8684
WORKLOAD_SUMMARY_HOSTNAME=127.0.0.1
WORKLOAD_SUMMARY_PORT=5984
WORKLOAD_SUMMARY_DBNAME=workload_summary
CS_HOSTNAME=127.0.0.1
CS_PORT=8888
SB_HOSTNAME=127.0.0.1
UFC_CACHEDIR=/home/meloam/wmagent-new/ufc_cache
UFC_PORT=7778
HOST_DN=/DC=org/DC=doegrids/OU=Services/CN=se2.accre.vanderbilt.edu
My couch server binds correctly to 127.0.0.1, but then the wmagent and other configurations point to
config.JobStateMachine.couchurl = 'http://_:_*@se2.accre.vanderbilt.edu:5984'
which fails. Are the manage scripts supposed to use COUCH_HOST for that value?
A WMAgent configured for local CERN submission needs to wrap the call that starts the agent in k5reauth to automatically renew Kerberos/AFS tokens. As this is only functional at CERN, it needs to be optional. Last time Samir looked at it, there was a problem getting getopts option parsing to work within the manage script.
Not very urgent as we do not have many WMAgents submitting locally to CERN and it's a one line patch that can easily be applied manually after deployment.
Still, would be good if this can be integrated into the manage script.
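One way the optional wrapping could look with getopts is sketched below. Everything here is illustrative: the -k flag, the build_start_cmd helper, and the REAUTH_WRAPPER variable are hypothetical, and how k5reauth expects to receive the command (including any arguments it needs) is not shown in this report and is left out.

```shell
#!/bin/bash
# Sketch: parse an optional -k flag and prefix the start command accordingly.
# The -k flag, build_start_cmd and REAUTH_WRAPPER are hypothetical; any
# arguments the wrapper itself needs are site-specific and omitted here.
build_start_cmd() {
  local use_wrapper=0 opt
  local OPTIND=1   # reset so the function can be called more than once
  while getopts ":k" opt; do
    case "$opt" in
      k) use_wrapper=1 ;;
      *) echo "unknown option: -$OPTARG" >&2; return 2 ;;
    esac
  done
  shift $((OPTIND - 1))
  if [ "$use_wrapper" -eq 1 ]; then
    # Only wrap when explicitly requested, so non-CERN sites are unaffected.
    echo "${REAUTH_WRAPPER:-k5reauth} $*"
  else
    echo "$*"
  fi
}
```

Resetting OPTIND inside the function is a common source of getopts trouble when parsing happens inside a function rather than at the top level of a script; whether that was the problem seen here is not known.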
We are a group of researchers from Yale University building a tool for finding bugs in configuration files. To evaluate the effectiveness of our tool, we randomly selected 1000 open source configuration files to verify and are requesting feedback on our error report. Your file https://raw.githubusercontent.com/dmwm/deployment/621a578dc4a664af435962e057a0ae2e90d5cb7c/wmagent/my.cnf was among those files. When we ran our tool on your file, it reported the following potential errors.
[MISSING ERROR: Expected "user[mysql.server]" in the same file as "[mysql.server]"
In the training set we saw: True 15 times, False 1 times
]
The training set referenced in the report is taken from the industrial configuration files at https://github.com/tianyin/configuration_datasets. Note that even if your system is currently working, these bugs may manifest themselves only under large traffic loads or in different system environments.
If you feel that any of the above errors may indeed cause problems, either on your system or on a different system, please comment on this issue report in the space below. This will help us to improve our tool. If you do not believe this is a potential bug, please feel free to close this issue. If possible, we would appreciate your feedback before July 27.
If you would like to find out more about how we detected these bugs, you can find the open source tool at https://github.com/santolucito/ConfigV. For a quick overview of this tool, you can watch this video at https://youtu.be/plliEh-5MpM. If you have further questions, or would like to get involved with this project, feel free to reach out over email at [email protected].
Thank you for your time!
Hi Lina,
I was trying to install the slc6 version, but it picks up the wrong architecture (gcc481).
https://github.com/dmwm/deployment/blob/master/Deploy#L52
Shouldn't it be gcc493?
Deployment gets stuck when running the Deploy script for system/devvm at the point where it tries to create the host certificate (looking at the logs, it is /tmp/foo/cfg/system/careq-cern-ch that is failing, possibly because the certificates website was improved with a new "beautiful" look).
Ran this on a fresh SLC5 64bit VM:
sudo yum install git
kinit
mkdir -p /tmp/foo
cd /tmp/foo
git clone git://github.com/dmwm/deployment.git cfg
sudo -l
cfg/Deploy -t dummy -s post $PWD system/devvm
[comp-das-dev2] /tmp/foo $ cfg/Deploy -t dummy -s post $PWD system/devvm
INFO: 20121115101626: starting deployment of: system/devvm
INFO: deploying system - variant: devvm, version: default
NOTE: no /data/certs/hostcert.pem
NOTE: no /etc/grid-security/hostcert.pem
WARNING: requesting now a new host certificate via ca.cern.ch
WARNING: you will prompted for your login or grid certificate password
WARNING: by supplying your password you agree to CERN computing rules
Using SSL X509 key /afs/cern.ch/user/z/zemleris/.globus/userkey.pem certificate /afs/cern.ch/user/z/zemleris/.globus/usercert.pem
Traceback (most recent call last):
File "/tmp/foo/cfg/system/careq-cern-ch", line 150, in ?
assert m, "No DER download link at <%s>" % url
AssertionError: No DER download link at https://login.cern.ch/adfs/ls/?wa=wsignin1.0&wtrealm=https%3a%2f%2fca.cern.ch%2fca%2f&wctx=rm%3d0%26id%3dpassive%26ru%3d%252fca%252fHostCertificates%252fRequestHostCertificate.aspx&wct=2012-11-15T09%3a16%3a48Z
INFO: installation log can be found in /tmp/foo/.deploy/20121115-101627-19290-post.log
ERROR: installation failed with exit code 1
We have hardcoded host names, such as cmsweb.cern.ch and cmsweb-testbed.cern.ch, in various places. For instance, these names are used in deploy scripts. But for the k8s setup, deploy scripts are used during the build of the image, which implies that these names are propagated into the image. We should remove all hardcoded host names and replace them as follows:
cmsweb_prod=${CMSWEB_HOSTNAME:-cmsweb.cern.ch}
cmsweb_preprod=${CMSWEB_HOSTNAME:-cmsweb-testbed.cern.ch}
and then replace usage of host names with the $cmsweb_prod and $cmsweb_preprod variables. This will allow us to manipulate host names during k8s deployment, where we can set the appropriate host name via environment variables. Then we can adjust the cmsweb docker images to rely on this environment variable.
This feature can be implemented using the mod_evasive.conf file that we already have.
Here is how we define this file:
The configuration file for mod_evasive, an Apache module designed to provide protection against Distributed Denial of Service (DDoS) attacks, has the following parameters:
DOSHashTableSize: The size of the hash table used to track IP addresses and URLs. The default value is 3097, but you can adjust this value based on the size of your site and the number of clients you expect to handle.
DOSPageInterval: The minimum time in seconds between page requests from the same client. A client that requests the same page more frequently than this value will be blocked. In this configuration, the interval is set to 1 second.
DOSSiteInterval: The minimum time in seconds between requests to different pages from the same client. A client that requests multiple pages more frequently than this value will be blocked. In this configuration, the interval is set to 1 second.
DOSBlockingPeriod: The amount of time in seconds that a client will be blocked after triggering the module's protection. In this configuration, the blocking period is set to 10 seconds.
DOSPageCount: The number of requests for the same page from the same client that will trigger the module's protection. In this configuration, the limit is set to 2 requests.
DOSSiteCount: The number of requests for different pages from the same client that will trigger the module's protection. In this configuration, the limit is set to 50 requests.
DOSSystemCommand: A command that will be executed when a client is blocked. In this configuration, the command will write a log message to a file.
DOSWhitelist: A list of IP addresses that should not be blocked by the module. In this configuration, several CERN IP addresses are whitelisted to ensure that they are not blocked.
Action item for CMSWeb team,
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*Googleusercontent.*$ [NC]
RewriteRule .* - [F,L]
</IfModule>
Or simply use fail2ban. In that case, the action items would be:
fail regex
Further tasks:
Something strange happens with wmagent's manage script. Not passing any options gives:
http://http://se9.accre.vanderbilt.edu:5984:5984/workloadsummary
to wmagent-mod-config, which fails because it's unparsable. Something is getting matched improperly.
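The doubled scheme and port suggest that a value which already contains http:// and :5984 is being combined with a scheme and port a second time. A defensive way to assemble such a URL is sketched below; build_url is a hypothetical helper, not the manage script's actual code, and it assumes a plain hostname or IPv4 address and always emits an http URL.

```shell
#!/bin/bash
# Sketch: build a URL from a host value that may already carry a scheme
# and/or a port. build_url is a hypothetical helper, not the manage script.
build_url() {
  local host="$1" port="$2" db="$3"
  host="${host#http://}"    # drop a leading scheme if present
  host="${host#https://}"
  host="${host%%/*}"        # drop any path component
  host="${host%%:*}"        # drop any port already attached
  printf 'http://%s:%s/%s\n' "$host" "$port" "$db"
}
```

With this, both a bare hostname and a full http://host:port value produce a single well-formed URL.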
Do not run CherryPy threads in deployment config
Reason: These will be running from the k8s deployment instead (and we can't run 2 at the same time).
https://github.com/dmwm/deployment/blob/master/reqmon/config.py#L93
The devvm deployment fails with :
useradd: UID 100001 is not unique
Currently there are 13 UIDs that clash with CERN user IDs:
-bash-4.2$ for uid in $(grep add_local_user /tmp/foo/cfg/system/deploy.orig | cut -d" " -f4); do getent passwd $uid; done | wc -l
13
The frontend disables keepalive for Chrome on OSX, which I just noticed. You can test it yourself via:
curl -v -s -o /dev/null -k --key /tmp/x509up_u112870 --cert /tmp/x509up_u112870 'https://cmsweb.cern.ch/' -H "Connection: keep-alive"
curl -v -s -o /dev/null -k --key /tmp/x509up_u112870 --cert /tmp/x509up_u112870 'https://cmsweb.cern.ch/' -H "Connection: keep-alive" -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'
These yield:
No user-agent:
> GET / HTTP/1.1
> Host: cmsweb.cern.ch
> Accept: */*
> Connection: keep-alive
**snip**
< HTTP/1.1 200 OK
< Date: Tue, 01 Mar 2016 17:33:51 GMT
< Server: Apache
< Last-Modified: Mon, 25 Jan 2016 14:49:41 GMT
< ETag: "8d7-52a29adc5cb40"
< Accept-Ranges: bytes
< Content-Length: 2263
< CMS-Server-Time: D=678 t=1456853631811320
< Keep-Alive: timeout=5, max=100
< Connection: Keep-Alive
Chrome user-agent:
> GET / HTTP/1.1
> Host: cmsweb.cern.ch
> Accept: */*
> Connection: keep-alive
> User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36
**snip**
< HTTP/1.1 200 OK
< Date: Tue, 01 Mar 2016 17:33:58 GMT
< Server: Apache
< Last-Modified: Mon, 25 Jan 2016 14:49:41 GMT
< ETag: "8d7-52a29adc5cb40"
< Accept-Ranges: bytes
< Content-Length: 2263
< CMS-Server-Time: D=1290 t=1456853638447423
< Connection: close
This appears to be related to the following line (deployment/frontend/frontend.conf, line 1 in dfb5840), which seems to match the string Safari in Chrome's user-agent of User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36
A simple solution would be to add BrowserMatch Chrome !nokeepalive to the subsequent line, though it's not clear that would do 100% the right thing. I'd test it myself, but my dev frontend on my machine isn't cooperating right now. It'd be neat (?) to get a list of unique user-agents to work with. That'd help figure out the right matches needed.
Hi Diego, Alan,
Is there any reason why -o PubkeyAuthentication=no is used in
https://github.com/dmwm/deployment/blob/master/admin/InstallDev#L71 ? We (DBS 3) have set up an environment that allows us to install dbs3 VMs running under the dbs3 service account without knowing the password, by using public keys. Unfortunately, that is not working anymore. Do you think we can remove that option again, at least for InstallDev?
Thanks,
Manuel
I would like to be notified when a new PR is submitted for /deployment. Would it be possible to add me to the list of subscribers?
Thanks!
Tomas (CMS DQM core)
False alarm. It failed because another couch was running; closing the issue.
@BrunoCoimbra can you update https://github.com/dmwm/deployment/blob/master/frontend/backends-prod.txt and include redirect rules for vocms0106 and vocms021 please?
Running the following command
[se2] ~/wmagent/quickinstall $ ./cfg/Deploy -R [email protected] -s sw -A slc5_amd64_gcc461 -t v01 /fs1/home/meloam/wmagent/quickinstall wmagent
INFO: 20120907153151: starting deployment of: wmagent
INFO: deploying wmagent - variant: default, version: default
INFO: bootstrapping comp software area in /fs1/home/meloam/wmagent/quickinstall/v01/sw
INFO: bootstrap successful
No error message is displayed, even though the deploy failed. You have to look in .deploy to see what happened:
<snip>
Get:17 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/external external+mysql+5.1.58-comp 1-1 [16.2MB]
Get:18 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/external external+py2-mysqldb+1.2.3c1-comp8 1-1 [106kB]
Get:19 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/cms cms+wmcore-db-mysql+1-comp55 1-1 [7779B]
Get:20 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/external external+oracle+11.2.0.3.0__10.2.0.4.0 1-1 [64.3MB]
Get:21 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/external external+py2-cx-oracle+5.1-comp8 1-1 [157kB]
Get:22 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/cms cms+wmcore-db-oracle+1-comp50 1-1 [7929B]
Get:23 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/external external+cherrypy+3.1.2-comp5 1-1 [585kB]
Get:24 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/external external+py2-cheetah+2.4.0-comp5 1-1 [413kB]
Get:25 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/external external+yui+2.9.0 1-1 [2492kB]
Get:26 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/cms cms+wmcore-webtools+1-comp54 1-1 [8160B]
Get:27 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/external external+py2-psutil+0.3.0-comp5 1-1 [76.7kB]
Get:28 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/external external+zeromq+2.1.9-comp 1-1 [735kB]
Get:29 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/external external+py2-zmq+2.1.9-comp5 1-1 [639kB]
Get:30 http://cmsrep.cern.ch cmssw/comp/apt/slc5_amd64_gcc461/cms cms+wmagent+0.9.13 1-1 [9259B]
Fetched 157MB in 1m39s (1570kB/s)
Executing RPM (/fs1/home/meloam/wmagent/quickinstall/v01/sw/slc5_amd64_gcc461/external/apt/429-comp/bin/rpm-wrapper -Uvh -r /fs1/home/meloam/wmagent/quickinstall/v01/sw --force --prefix /fs1/home/meloam/wmagent/quickinstall/v01/sw --ignoreos --ignorearch --force --prefix /fs1/home/meloam/wmagent/quickinstall/v01/sw --ignoreos --ignorearch --oldpackage)...
error: Failed dependencies:
libuuid.so.1()(64bit) is needed by external+zeromq+2.1.9-comp-1-1.x86_64
Executing,: Sub-process /fs1/home/meloam/wmagent/quickinstall/v01/sw/slc5_amd64_gcc461/external/apt/429-comp/bin/rpm-wrapper returned an error code (30)
That it failed is a separate matter (I'll have to hunt that down).
This is probably a minor thing, but OSX doesn't support the -f option for readlink, which means the wmagent manage script bombs out at the very first line. I'm not sure of the shell trickery needed to make this work conditionally.
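One possible approach is to avoid readlink -f altogether and resolve the path with tools available on both systems. The sketch below is an assumption about what would suit the manage script, not its actual code; resolve_path is a hypothetical helper name, and it does not guard against symlink loops.

```shell
#!/bin/bash
# Sketch: canonicalize a path without relying on readlink -f.
# resolve_path is a hypothetical helper; symlink loops are not handled.
resolve_path() {
  local target="$1" dir base link
  # Follow symlinks one hop at a time; plain readlink (no -f) is enough.
  while [ -L "$target" ]; do
    link="$(readlink "$target")"
    case "$link" in
      /*) target="$link" ;;                        # absolute link target
      *)  target="$(dirname "$target")/$link" ;;   # relative to link's dir
    esac
  done
  # Resolve the containing directory physically, then re-attach the name.
  dir="$(cd -P "$(dirname "$target")" && pwd -P)" || return 1
  base="$(basename "$target")"
  if [ "$dir" = "/" ]; then dir=""; fi
  printf '%s/%s\n' "$dir" "$base"
}
```

A script could then use resolve_path "$0" where it currently calls readlink -f on its own path.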