cloudfoundry / routing-release

This is the BOSH release for cloud foundry routers

License: Apache License 2.0

Routing Release

This repository is a BOSH release for deploying Gorouter, TCP Routing, and other associated tasks that provide HTTP and TCP routing in Cloud Foundry foundations.

Downloads

Our BOSH release is available on bosh.io and on our GitHub Releases page.

Getting Help

If you have a concrete issue to report or a change to request, please create a GitHub issue on routing-release.

Issues with any related submodules (Gorouter, Routing API, Route Registrar, CF TCP Router) should be created here instead.

You can also reach us on Slack at cloudfoundry.slack.com in the #cf-for-vms-networking channel.

Contributing

See the Contributing.md for more information on how to contribute.

Routing Operator Resources
Routing App Developer Resources
Routing Contributor Resources

Routing Operator Resources

High Availability

The TCP Router and Routing API are stateless and horizontally scalable. The TCP Routers must be fronted by a load balancer for high availability. The Routing API depends on a database, which can be clustered for high availability. To achieve high availability, deploy multiple instances of each job, distributed across regions of your infrastructure.

Routing API

For details refer to Routing API.

Metrics

For documentation on metrics available for streaming from Routing components through the Loggregator Firehose, visit the Cloud Foundry documentation. You can use the NOAA Firehose sample app to quickly consume metrics from the Firehose.

Routing App Developer Resources

Session Affinity

For more information on how Routing release accomplishes session affinity, i.e. sticky sessions, refer to the Session Affinity document.

Headers

X-CF Headers describes the X-CF headers that are set on requests and responses inside of CF.

routing-release's Issues

Error filling in template `router_configurer.yml.erb' for `tcp_router_z1/0'

When I run bosh -n deploy, I get the following error. I tried both the latest available final release and my own release and got the same error. How can I resolve it?

  Started preparing configuration > Binding configuration. Failed: Error filling in template `router_configurer.yml.erb' for `tcp_router_z1/0' (line 4: Can't find property `["router.router_configurer.tcp_router_secret"]') (00:00:00)

Error 100: Error filling in template `router_configurer.yml.erb' for `tcp_router_z1/0' (line 4: Can't find property `["router.router_configurer.tcp_router_secret"]')

Configure router for in isolation segment on bosh v1

Issue

According to the guideline for creating a routing isolation segment
Ref: https://docs.cloudfoundry.org/adminguide/routing-is.html

it is necessary to configure a different network/subnet and router for each isolation segment.
This can be done by using cloud-config.
I also learned that all deployments must be converted to the new format in order to use cloud config.
https://bosh.io/docs/cloud-config.html

But my current deployment uses the old format (v1).
Is it possible to configure routers for isolation segments by specifying the information directly in a v1 deployment manifest?

As a standalone release, how do I deploy it?

The built in scripts seem to assume all sorts of existing systems are running (diego, consul, cf).

For the standalone case - where I want to run cf-tcp-router in front of a cluster of service instances - how do I run this BOSH release?

TCP v164 issue

We upgraded tcp-routing from v0.146.0 to v0.163.0 with cf v283 without any issues.
But after upgrading tcp-routing from v0.163.0 to v0.164.0, we can no longer curl the apps.
We get a "connection refused" error with cf v283.
We have commented out the "tcp_emitter" jobs and only kept the oauth_secret property for tcp_emitter in the routing manifest.
We also checked the logs on tcp_router and found the error "tcp-router.watcher.failed-to-get-next-routing-api-event".
Can someone please suggest a solution?

Use json file extension instead of yml

The route registrar depends on registrar_settings.yml.erb. While I know that YAML is a superset of JSON, the internals of the template specifically use JSON. Why not just name the file with a json extension?

Working of mTLS

Hi team,

This is a question rather than an issue, about how mTLS works.

I would like to know in how many ways mTLS is used in Cloud Foundry.
I have some use cases; can you please explain how I can use mTLS in the cases below?

1. If I push an application to Cloud Foundry, do I need to add certs to the application itself? If yes, what changes would I need to make to the Cloud Foundry deployment? Is CredHub a mandatory requirement, and how would it work?
2. From the docs, I understand that mTLS is also used to secure communication inside the platform. So https://github.com/cloudfoundry/cf-release/blob/master/on-tls-certificates.md is for securing internal communication between components.
3. If I want third-party apps to communicate with an application running on Cloud Foundry, how can mTLS help?
4. If two apps running on Cloud Foundry under the same domain name communicate with each other, is mTLS required to make the connection secure?
5. If two apps running on Cloud Foundry under different domain names communicate with each other, is mTLS required to make the connection secure?

thanks in advance

cf router-groups command returns 404.

Hi all,
I am new to Cloud Foundry.
I installed the TCP router and Routing API components along with Gorouter.
Whenever I execute cf router-groups I get the error message below. Any guidance is appreciated.
Getting router groups as admin ...

FAILED
Failed fetching router groups.
Server error, status code: 404, error code: , message:

cf curl v2/info output is:
$ cf curl v2/info
{
  "name": "",
  "build": "",
  "support": "",
  "version": 0,
  "description": "Cloud Foundry sponsored by Pivotal",
  "authorization_endpoint": "XXX",
  "token_endpoint": "XXXXX",
  "min_cli_version": null,
  "min_recommended_cli_version": null,
  "api_version": "2.78.0",
  "app_ssh_endpoint": "XXXXX",
  "app_ssh_host_key_fingerprint": "XXXX",
  "app_ssh_oauth_client": "ssh-proxy",
  "routing_endpoint": "https://XXXXX/routing",
  "logging_endpoint": "XXXXX",
  "doppler_logging_endpoint": "XXXXX",
  "user": "efe860d4-f640-4d0d-87bb-d2c362f0ee59"
}

Another unused blob

Similar to #60, the blob go1.6.3.linux-amd64.tar.gz is unused - go1.7.4.linux-amd64.tar.gz is referenced instead.

An issue with 'client_cert_validation: require'

Hi,

I'm trying the scenario where there are a couple of routers set to client_cert_validation: require and ELB is configured to use those routers for a particular app domain that needs to force mTLS. The problem is that when providing a wrong cert I can still see the page (no cert provided seems to correctly lead to a cert issue denying the page). I'm facing the same issue when all routers are centrally configured with properties.router.client_cert_validation: require.

The relevant parts of the manifest used look like this, please let me know if there are other dependencies:

- instances: 1
  name: router_z1
  properties:
    router:
      client_cert_validation: require

- instances: 1
  name: router_z2
  properties:
    router:
      client_cert_validation: require
    
- instances: 1
  name: router_z3
  properties:
    router:
      client_cert_validation: request
    
properties:
  router:
    cipher_suites: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256:TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384:TLS_RSA_WITH_AES_128_CBC_SHA
    ca_certs: |
      -----BEGIN CERTIFICATE-----
 
      -----END CERTIFICATE-----
    disable_http: true
    enable_ssl: true
    forwarded_client_cert: sanitize_set
    min_tls_version: TLSv1.1
    ssl_skip_validation: false

router_z1 also keeps logging these kinds of repeating messages: tls: client offered an unsupported, maximum protocol version of 301, remote error: tls: bad certificate, tls: client didn't provide a certificate, though they do not necessarily correspond to me hitting the page in the browser. Any ideas?

Thanks and Regards,
Georgi

release and system logrotate rules result in file duplication

Issue

The routing-release installs a cron job that runs logrotate on the files in /var/vcap/sys/log/gorouter/*.log. At the same time, the system-wide logrotate runs on the same files. The latter is installed by the bosh-agent. The two logrotates are also configured slightly differently: the one run by this cron job has the delaycompress flag, whereas the system-wide logrotate does not.

Since the last stemcell update that introduces this fix to logrotate this causes a problem:
The logrotate job of the routing-release leaves a file access.log.1 in the monitored directory and the other logrotate job (configured through the bosh-agent) will copy that file to a file named access.log.1-2017xxxxxx.backup when it also does the rotation (this will happen if the access.log grew above the size threshold of 50MiB within the last 5 minutes). This file will remain in the directory and eventually the ephemeral disk of the VM will fill up. That in turn causes the router to halt.

Context

routing-release installs a crontab job with this gorouter_logrotate.cron.erb and configures logrotate with this logrotate.conf.erb.

Bosh-agent uses this template to setup logrotate for all logfiles of jobs.

Behavior of logrotate has been changed lately with this change.

Steps to Reproduce

Deploy routing-release as it is used within cf-release v265 on stemcell 3421.11.
Switch on access log in the gorouter job
Create load on the deployed landscape in such a way that the access.log file grows by more than 50 MiB in 5 minutes.

Expected result

No files named *.backup are created in /var/vcap/sys/log/gorouter/ and the ephemeral disk does not fill up. The gorouter continues to work normally.

Current result

In the directory /var/vcap/sys/log/gorouter files with names *.backup are being created every now and then.
And as soon as the ephemeral disk is full, the gorouter will answer requests with http status code 503.

Possible Fix

Remove the installation of the cron job and the call to logrotate from this release.

`name of issue` Output Results

Here is a screenshot showing the ephemeral disks of four router nodes filling up until I applied a workaround:

Workaround

Configure routing-release in such a way that it doesn't do log rotation. That is, set the property properties.router.logrotate.size to a very large value, e.g. 4G.

hard-wired paths in scripts/generate-bosh-lite-manifest

scripts/generate-bosh-lite-manifest assumes default paths are in $HOME/workspace/...

There's no reason to impose parent pathnames on other developers. One could use $(git rev-parse --show-toplevel) to find the location of this directory, or better still, provide an option to specify the cloudfoundry dir (I for one keep my cloudfoundry/ and cloudfoundry-incubator/ repos in separate directories).
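
One way to drop the hard-wired $HOME/workspace assumption can be sketched as follows, in Python rather than the script's own shell. An explicit override is tried first, then git is asked for the checkout root. The environment variable CF_RELEASE_DIR, the function name, and the "cf-release is a sibling directory" fallback are all hypothetical; only git rev-parse --show-toplevel comes from the suggestion above.

```python
import os
import subprocess
from pathlib import Path


def find_cf_release_dir(env=None, cwd=None):
    """Locate a cf-release checkout without assuming $HOME/workspace.

    CF_RELEASE_DIR is a hypothetical override variable; the real script
    does not read it.
    """
    env = os.environ if env is None else env

    # 1. An explicit override wins, so repos can live anywhere.
    override = env.get("CF_RELEASE_DIR")
    if override:
        return Path(override).expanduser()

    # 2. Otherwise ask git where the current checkout starts and assume
    #    (for this sketch only) that cf-release sits next to it.
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            cwd=cwd, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        raise RuntimeError("not in a git checkout; set CF_RELEASE_DIR") from exc
    return Path(result.stdout.strip()).parent / "cf-release"
```

The override keeps developers with unusual directory layouts unblocked, while the git fallback still works from any subdirectory of the checkout.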

Docs refer to a non-existent directory: cf/cf-release/bosh-lite/deployments

Probably something obvious is missing between steps 4 and 5 of https://github.com/cloudfoundry-incubator/cf-routing-release/blob/develop/README.md:

4. Install spiff, a tool for generating BOSH manifests. spiff is required for running the scripts in later 
steps. Stable binaries can be downloaded from Spiff Releases.
...
5. Generate and target router's manifest:

cd ~/workspace/cf-routing-release
./scripts/generate-bosh-lite-manifest

It expects cf.yml to be present in ~/workspace/cf-release/bosh-lite/deployments...

Except there is no deployments directory in cf-release/bosh-lite, just a directory called stubs. I don't see a cf.yml in any of the cf directories I've downloaded. Should I be using spiff to generate the yaml files for ./scripts/generate-bosh-lite-manifest ?

hard-wired endpoints and ports in specs need to be configurable

https://github.com/cloudfoundry-incubator/routing-release/blob/74bb5adc658d0df6104d6fe2ba747254b2ab0d46/jobs/router_configurer/templates/router_configurer.yml.erb#L12-L13 and https://github.com/cloudfoundry-incubator/routing-release/blob/74bb5adc658d0df6104d6fe2ba747254b2ab0d46/jobs/tcp_emitter/templates/tcp_emitter.yml.erb#L11-L12 hardwire the endpoint for the routing-api service.

These values should be added to the jobs' respective specs, and the constant values replaced with ERB references to the properties.

[Feature Request] allow the route-registrar to be stopped on drain

Issue

Route-registrar cannot be shut down on drain.

Context

In my BOSH release I have two jobs, my application and the route-registrar. The application job has a drain script that calls into the application to prepare it for shutdown. The problem is that the drain process will not complete as long as there are active users accessing the app running on the VM. I would like a way of configuring route-registrar to unregister the route on drain, or suggestions about other ways of avoiding this problem.

Steps to Reproduce

Have a job with a drain script that will not terminate while there are active requests to the job
Have a route-registrar job that will register the route with the go router.
Restart the bosh vm
Access to the vm through the registered route is still available while drain is run. New requests keep coming in and the drain script cannot complete.

Expected result

I would like a way of unregistering the route on drain so that the application becomes inaccessible through the go router. If that is not feasible for some reason, I would appreciate a suggestion of other ways to remove the route.

Current result

The drain script does not complete since the route is still active.

Possible Fix

To make sure this was my problem I implemented an optional drain script for route-registrar

https://github.com/malmanzor/routing-release/tree/optional-drain

I can create a PR if my solution is acceptable.

Thank you!

PCRE version outdated

Hello, the PCRE version used[1] 8.37 has known vulnerabilities [2] and should be updated to the latest version (now 8.39)

[1] https://github.com/cloudfoundry-incubator/routing-release/search?utf8=%E2%9C%93&q=pcre
[2] https://www.cvedetails.com/cve/CVE-2016-3191/

A problem with the acceptance tests

Hi,

Is it possible to run the acceptance tests errand when the CF and Routing deployments are on AWS, not bosh-lite? I'm running routing release 0.143.0 and CF release 250 and the actual TCP routing works as expected, however the acceptance tests bosh errand fails with this:

• Failure [3.093 seconds]
Registration
/var/vcap/packages/acceptance_tests/src/code.cloudfoundry.org/routing-acceptance-tests/http_routes/registration_test.go:86
  HTTP Route
  /var/vcap/packages/acceptance_tests/src/code.cloudfoundry.org/routing-acceptance-tests/http_routes/registration_test.go:85
    can register, list, subscribe to sse and unregister routes [It]
    /var/vcap/packages/acceptance_tests/src/code.cloudfoundry.org/routing-acceptance-tests/http_routes/registration_test.go:84

    Expected error:
        <*errors.errorString | 0xc4201d1060>: {
            s: "attempt to read from closed buffer",
        }
        attempt to read from closed buffer
    not to have occurred

Summarizing 1 Failure:

[Fail] Registration HTTP Route [It] can register, list, subscribe to sse and unregister routes
/var/vcap/packages/acceptance_tests/src/code.cloudfoundry.org/routing-acceptance-tests/http_routes/registration_test.go:42

Ran 1 of 1 Specs in 3.098 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 0 Skipped

I've redacted the private details, the acceptance tests properties section in the manifest looks like this:

  acceptance_tests:
    addresses:
    ----
    cloud_controller:
      admin_password: ---
      admin_user: ---
      api: api.---
      apps_domain: ---
      use_http: false
    skip_ssl_validatio: true
    system_domain: ----
    include_http_routes: true

Any clues what I'm doing wrong? Thanks in advance.

Georgi

high CPU load due to prepend-timestamp

Issue

prepend-datetime in syslog_utils.sh in our testing can use up to 15% of CPU time and cause high loadavg under heavy traffic.

Context

prepend-timestamp uses read to read lines from stdin, and this causes very high loadavg because read does one syscall per input byte.

Steps to Reproduce

cause gorouter to emit many error logs (e.g. by sending traffic to misbehaving applications)
notice how the bash script run_gorouter uses a sizable amount of CPU time

Expected result

lower CPU loadavg

Current result

high CPU loadavg due to prepend-datetime

Possible Fix

replace prepend-timestamp with ts or awk (see first two answers from https://stackoverflow.com/questions/21564/is-there-a-unix-utility-to-prepend-timestamps-to-stdin).
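
The idea behind the proposed fix (read input in large buffered chunks, then stamp each line) can be illustrated with a minimal Python sketch. This is not the release's prepend-timestamp implementation, and the timestamp format and function name are assumptions chosen for the example.

```python
import time


def prepend_timestamps(src, dst, clock=time.time):
    """Copy lines from src to dst, prefixing each with a UTC timestamp.

    The "[YYYY-MM-DDTHH:MM:SSZ] " prefix is an assumed format, not
    necessarily what syslog_utils.sh emits.
    """
    # Iterating a buffered text stream pulls large chunks from the
    # underlying file descriptor and splits them into lines in memory,
    # rather than issuing one read() syscall per input byte.
    for line in src:
        stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(clock()))
        dst.write(f"[{stamp}] {line}")
    dst.flush()
```

As a filter it would be called as prepend_timestamps(sys.stdin, sys.stdout); the clock parameter exists only to make the function testable.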

`X-Cf-App-Instance` header can not be used on DEA applications.

Issue

X-Cf-App-Instance header can not be used on DEA applications.

As a feature introduced in 0.139.0 (ticket), developers can supposedly use X-Cf-App-Instance to target a specific backend instance. We've just upgraded our deployment to cf-release 246 but found that this feature only works for Diego applications.

Our deployment contains both DEA and Diego, so this confused us. I'm not sure whether the "doesn't work on DEA apps" part is a bug or a feature. But since there are many CF deployments other than PCF that are probably still running DEA, this behavior could really confuse users.

Steps to Reproduce

Push an application to CF (> v245); the app should be hosted on a DEA.
Follow the DOC-1 or DOC-2, curl with -H "X-Cf-App-Instance: APP_GUID:APP_INDEX"

Expected result

Request should reach APP:APP_INDEX, then correct result returned to developer.

Current result

Get a 404 Not Found: Requested route ('app_url') does not exist.

Possible Fix

Add support for this feature on DEA applications.
If it is decided that DEA applications should not support this feature, this should be documented in both docs mentioned above.

Intermittent RATS failure on BOSH lite environments

Issue

The release integration team sees that Tcp Routing / multiple-app ports / single external port with multiple app ports / should switch between ports fails in roughly 1 in 10 runs on our BOSH lite environments, with a "connection reset by peer" error.

This issue does not occur on our full BOSH GCP environments. We've yet to conduct a more in-depth exploration.

`RATS Failure` Output Results

https://release-integration.ci.cf-app.com/teams/main/pipelines/cf-deployment/jobs/lite-rats/builds/362
https://release-integration.ci.cf-app.com/teams/main/pipelines/cf-deployment/jobs/lite-rats/builds/351
https://release-integration.ci.cf-app.com/teams/main/pipelines/cf-deployment/jobs/lite-rats/builds/335
https://release-integration.ci.cf-app.com/teams/main/pipelines/cf-deployment/jobs/lite-rats/builds/328
https://release-integration.ci.cf-app.com/teams/main/pipelines/cf-deployment/jobs/lite-rats/builds/315

Unused blob

cf-cli/cf-linux-amd64.tgz is unused as of c895216 but it was not removed from the blobs file.

https://github.com/cloudfoundry-incubator/routing-release/blob/release-candidate/config/blobs.yml#L10-L13

Documentation for surgical routing is wrong

Issue

Doc implies APP_INDEX starts at 1, actually starts at 0.

Context

See http://docs.cloudfoundry.org/devguide/deploy-apps/routes-domains.html#surgical-routing

Steps to Reproduce

Doc says:

APP_INDEX, for example 1, 2, or 3, is an identifier for a particular app instance. Use the CLI command cf app APP-NAME to get statistics on each instance of a particular app.

Expected result

Trying to use an APP_INDEX equal to the number of instances, e.g. APP_INDEX 10 with 10 instances, produces an error. APP_INDEX goes from zero to (number of instances - 1). In fact, cf app <app name>, which the doc references, even shows the instance numbering starting at 0.
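
The zero-based range described here can be made concrete with a small sketch that splits an APP_GUID:APP_INDEX value and checks the index. The function name and error wording are hypothetical and this is not Gorouter's code; it only illustrates that valid indices run from 0 to (number of instances - 1).

```python
def parse_app_instance(header_value, instance_count):
    """Split an X-Cf-App-Instance value and check the index range.

    Returns (app_guid, app_index). Raises ValueError for a malformed
    value or an index outside 0 .. instance_count - 1.
    """
    guid, sep, raw_index = header_value.partition(":")
    if not sep or not guid or not raw_index.isdigit():
        raise ValueError(f"expected APP_GUID:APP_INDEX, got {header_value!r}")
    index = int(raw_index)
    # Indices are zero-based, so index == instance_count is already invalid.
    if index >= instance_count:
        raise ValueError(
            f"index {index} out of range; valid indices are 0..{instance_count - 1}"
        )
    return guid, index
```

With 10 instances, "some-guid:9" is accepted and "some-guid:10" is rejected, which matches the behavior reported above.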

Current result

Doc is wrong (or implies incorrect information)

Possible Fix

Fix docs

Status endpoint only enabled if status.user and status.password specified

It also appears that status.user and status.password are not actually used to get status anymore.

Issue while using routing release version 0.168.0

Issue

We are facing an issue with this routing release version: we are not able to access the app and get a "connection refused" error while connecting.

We are using cf release version 283.

Thanks in advance.

Add support for connecting to Routing API DB over TLS

We have a few environments which have strict security policies in place which only allow secure database connections regardless of the data classification or even if the data itself is encrypted.

A recent commit for the UAA release has enabled TLS connections for the UAA DB.
https://www.pivotaltracker.com/n/projects/997278/stories/145563953

Jon Price
Intel Corp

Error message on invalid app index is wrong

Issue

Error message on invalid app index is wrong.

Context

Giving an invalid app index returns a 404 with "Not Found: Requested route ('URL to apps') does not exist."

Steps to Reproduce

Make a request to an app with X-CF-APP-INSTANCE: APP_GUID:APP_INDEX. (Per the docs here: http://docs.cloudfoundry.org/devguide/deploy-apps/routes-domains.html#surgical-routing)

Expected result

Error message should say "requested instance does not exist".

Current result

If the app index is invalid, instead of returning a message saying so, you get an error message that says the requested route (i.e. the URL) does not exist.

Possible Fix

Fix the error message. :)

Move to the cloudfoundry GitHub org

That was kinda confusing, but I'm glad I found you!

Garden-runc support

Hello,

When deploying CF on bosh lite using garden-runc, we discovered a problem with the router job.

/var/vcap/jobs/gorouter/bin/gorouter_ctl: line 35:
/proc/sys/net/ipv4/tcp_fin_timeout: No such file or directory

It seems the start script for the router is not detecting that it is running in the container and failing to write a proc file. The helper script to find out if you're running in a container src/routing_utils/pidutils.sh exposes a function which doesn't support detecting garden-runc container names.

This is what /proc/self/cgroup looks like in garden-linux:

/:~$ cat /proc/self/cgroup
12:pids:/
11:hugetlb:/
10:net_prio:/
9:perf_event:/
8:net_cls:/
7:freezer:/
6:devices:/instance-3g3mjd8vjti
5:memory:/instance-3g3mjd8vjti
4:blkio:/
3:cpuacct:/instance-3g3mjd8vjti
2:cpu:/instance-3g3mjd8vjti
1:cpuset:/instance-3g3mjd8vjti

This is what /proc/self/cgroup looks like in garden-runc:

/:~$ cat /proc/self/cgroup
12:pids:/c2014f15-e490-4124-75e4-e5b5a08ca89f
11:hugetlb:/c2014f15-e490-4124-75e4-e5b5a08ca89f
10:net_prio:/c2014f15-e490-4124-75e4-e5b5a08ca89f
9:perf_event:/c2014f15-e490-4124-75e4-e5b5a08ca89f
8:net_cls:/c2014f15-e490-4124-75e4-e5b5a08ca89f
7:freezer:/c2014f15-e490-4124-75e4-e5b5a08ca89f
6:devices:/c2014f15-e490-4124-75e4-e5b5a08ca89f
5:memory:/c2014f15-e490-4124-75e4-e5b5a08ca89f
4:blkio:/c2014f15-e490-4124-75e4-e5b5a08ca89f
3:cpuacct:/c2014f15-e490-4124-75e4-e5b5a08ca89f
2:cpu:/c2014f15-e490-4124-75e4-e5b5a08ca89f
1:cpuset:/c2014f15-e490-4124-75e4-e5b5a08ca89f
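
A name-independent check could look at the cgroup paths instead of matching a container naming scheme. The Python sketch below is only a heuristic and is not the logic in src/routing_utils/pidutils.sh: it assumes that any path other than "/" means the process is in a container. That assumption fits the two listings above but may give false positives on hosts where an init system places processes in nested cgroups. The sample strings are abbreviated from the listings.

```python
def in_container(cgroup_text):
    """Guess from /proc/self/cgroup contents whether we run in a container.

    Each line has the form "hierarchy-id:controllers:path". Heuristic
    (an assumption for this sketch): a path other than "/" means the
    process sits in a nested cgroup, i.e. a container.
    """
    for line in cgroup_text.splitlines():
        parts = line.strip().split(":", 2)
        if len(parts) == 3 and parts[2] != "/":
            return True
    return False


# Abbreviated samples taken from the two listings above.
GARDEN_LINUX = "4:blkio:/\n3:cpuacct:/instance-3g3mjd8vjti\n"
GARDEN_RUNC = "4:blkio:/c2014f15-e490-4124-75e4-e5b5a08ca89f\n"
NO_CONTAINER = "4:blkio:/\n3:cpuacct:/\n"
```

Both Garden samples are classified as containers without relying on the "instance-" prefix, which is the part that differs between garden-linux and garden-runc.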

Is support for garden-runc on the roadmap for this release?

Regards

Alex and @Samze
CC: @teddyking

HTTP apps not getting pushed on cf once tcp is down.

Hi team,

We faced an issue pushing HTTP apps on cf while TCP routing was down.
At first we were using cf v281 and tcp-routing v146. We simply set routing_api to 'true' in the cf manifest to enable routing for pushing TCP apps.
For some reason tcp-routing went down, so we were not able to push TCP apps on cf.
We also got errors while pushing HTTP apps on cf.

As suggested by the community, we upgraded cf to v283 and tcp_routing to v163.
Now, if TCP routing is unavailable or down for some reason, is there a way to still push HTTP apps on cf without it?

Thanks,

[Feature request] We would like to be able to override the release version while generating manifest

Hi:

It would be great if we could have a separate stub file in which we can specify versions for the routing/cf releases, instead of always using the latest version.

This will make the automation process easier to dictate the version it wants to deploy.

Thoughts?

Two questions about tcp router

Hello,
I have the following questions for tcp router, please help to answer:
(1) When I execute the command "cf push app1 --no-start -d tcp.bosh-lite.com --random-route", I get the following error. What do I need to configure for this?

Creating random route for tcp.bosh-lite.com...
FAILED
Server error, status code: 400, error code: 310009, message: You have exceeded the total reserved route ports for your organization's quota.

(2) I only see the command to create a shared domain for the TCP router group. Is there a way to update an existing shared domain (for example bosh-lite.com) for the TCP router group?

General questions about routing-release

Hi,
I am not sure if I should have put this on the mailing list instead, but I will ask the question here. Is there a timeline/roadmap for the routing release to be integrated into cf-release, regarding HA Proxy and the PostgreSQL release? Or will it always be a separate release in the future?
Does the current version also run with Postgres if I configure it correctly in the properties section?

properties:
  routing_api:
    sqldb:
      host: <IP of SQL Host>
      port: <Port for SQL Host>
      type: mysql
      schema: <Schema name>
      username: <Username for SQL DB>
      password: <Password for SQL DB>

Is the manual configuration of the database stable across upgrades to future releases?

gorouter closing connection to client before writing the whole response body

we've traced down a possible unexpected behavior (in our eyes) to gorouter.
@jriguera can provide more details about our cloudfoundry/gorouter env if needed.

the scenario is a client uploading a large file, with the server limiting to a max file size (say 2 MB) by reading up to 2 MB from the upstream.
if this limit is reached, the server immediately responds with 413, a connection: close header and some response body (of around 50k size).
this results in closing the conn from the server side, AFTER writing the whole response body (so it's available to the client side).

when making these kinds of requests to the server directly (via curl or a minimal client impl) we get the whole response on the client side, even if the client is still in the middle of uploading (didn't finish the req).

when doing the same with gorouter in the middle, we either get an err or a partial response body.
the errors vary but are usually either curl: (55) Send failure: Broken pipe or curl: (56) Recv failure: Connection reset by peer.

I've posted the above ^^ on slack and was prompted by @rosenhouse to open an issue here.
he also mentioned:

I believe this is known behavior in Gorouter (and other HTTP reverse proxies, for that matter envoyproxy/envoy#2067 )

http routing failing when routing_api vms go down

Hi,

Is there any automatic backup option in place for the situation when all routing_api vms are down, so that the core http routing and domain functionality of CF can remain working instead of failing due to the unavailable Routing API?

Thanks.

Georgi

curl with `X-Cf-App-Instance` header but non-corresponding `Host` header would cause panic.

Issue

curl with X-Cf-App-Instance header but non-corresponding Host header would cause panic.

This is reproduced on cf-release v246 which delivered with routing-release v0.141.0

Context

Seems gorouter doesn't validate the Host header when X-Cf-App-Instance is provided in the header.

Steps to Reproduce

curl <gorouter_ip>:<gorouter_port> -H "X-CF-APP-INSTANCE: APP_GUID:APP_INDEX" -H "Host: A_WRONG_URL"

If APP_GUID:APP_INDEX is valid, any Host that is not bound to that app will trigger the panic.
If APP_GUID:APP_INDEX is invalid, any Host with no existing route will trigger the panic.

Expected result

gorouter should not panic. Instead, it should return a 404.

Current result

gorouter panics.

Possible Fix

The validation of Host should be performed regardless of whether X-CF-APP-INSTANCE is set.
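
The ordering suggested by this fix can be sketched as a lookup that always validates Host first and only then considers the instance header. This Python sketch is not Gorouter's implementation; the route-table shape, addresses, and names are invented for illustration. A return value of None stands for "respond with 404" rather than panicking.

```python
# Hypothetical route table: Host -> {"APP_GUID:APP_INDEX": backend address}.
ROUTES = {
    "app-a.example.com": {
        "guid-a:0": "10.0.0.1:61001",
        "guid-a:1": "10.0.0.2:61001",
    },
}


def select_backend(routes, host, app_instance=None):
    """Return a backend address, or None when the caller should send a 404."""
    endpoints = routes.get(host)
    # Validate Host first, whether or not X-CF-APP-INSTANCE is present.
    if not endpoints:
        return None
    if app_instance is None:
        # No specific instance requested: any endpoint of the route will do.
        return next(iter(endpoints.values()))
    # The instance must belong to the route selected by Host.
    return endpoints.get(app_instance)
```

Both failure cases from the steps above (a valid instance with an unrelated Host, and an invalid instance with an unknown Host) end in None here instead of an unhandled lookup.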

go test on route-registrar fails on missing package code.cloudfoundry.org/localip

Issue

I was attempting to build route-registrar from the GitHub release tar file. The go build command worked just fine. When I tried using go test I got a message about a missing package. The package code.cloudfoundry.org/localip should be included in the release for route-registrar.

Context

I was expecting all the pieces needed to build route-registrar to exist in the GitHub release.

Steps to Reproduce

go install code.cloudfoundry.org/route-registrar

go test code.cloudfoundry.org/route-registrar

code.cloudfoundry.org/route-registrar

src/code.cloudfoundry.org/route-registrar/main_test.go:13:2: cannot find package "code.cloudfoundry.org/localip" in any of:
/goroot/src/code.cloudfoundry.org/localip (from $GOROOT)
/tmp/build/a94a8fe5/gopath/src/code.cloudfoundry.org/localip (from $GOPATH)
/gopath/src/code.cloudfoundry.org/localip
FAIL code.cloudfoundry.org/route-registrar [setup failed]

go get code.cloudfoundry.org/localip

go test code.cloudfoundry.org/route-registrar

ok code.cloudfoundry.org/route-registrar 6.173s

Expected result

The route-registrar should build in both go test and go build modes.

Current result

The route-registrar program does not build in go test mode.

CF does not recover from full shut down due to route_registrar

This issue was originally raised here.

The mkdir -p ${RUN_DIR} should probably be done in both pre-start and in whatever monit start route_registrar invokes.

This is an issue for recovery from an unplanned full stop, and perhaps also for intentional stops and starts. /cc @cppforlife @mkocher @dsboulder

Thanks to @neilhwatson for originally reporting this issue.

Please allow deployment manifest to expose UAA clients via BOSH links

Issue

As a BOSH operator, I want to deploy CFCR (Kubernetes on BOSH, formerly called Kubo). It has a routing option that allows Kubernetes Services to be exposed to the CF TCP router, enabled via a BOSH operator file [1]. Currently a BOSH operator must use the credhub CLI, a Post-It note, and a local vars.yml file on their computer to transfer the UAA client/secret from their cf deployment across to their kubo deployment. This is counter to the goals and dreams of BOSH links, credhub, and Post-It notes.

[1] https://github.com/cloudfoundry-incubator/kubo-deployment/blob/master/manifests/ops-files/cf-routing.yml#L46-L47

Solution

Instead, kubo-deployment needs to be able to use BOSH links to discover the UAA client/secret to be used to connect to the TCP routing API + UAA.

In order to allow BOSH links to expose additional information/properties/credentials, we will need to add a uaa_clients property to the tcp_router job spec file. It does not need to be used within tcp_router itself; rather, the job property exists only to be passed through from the cf deployment manifest to other deployments via shared BOSH links.

After this issue is resolved in tcp_router, BOSH operators will be able to expose their routing UAA clients from the cf deployment to their other BOSH deployments without manually touching credhub or using Post-It notes.

Forwarding XFCC header to route-service.

Issue

Client certificates received in a request are not forwarded to the route service by the router.

Context

After setting the newly introduced property router.forwarded_client_cert, I found that
the XFCC header is present in the request headers sent directly to an app:

client --(cert)--> Router--(XFCC header)--> end_app

But gorouter doesn't set the XFCC header when proxying a request through a route service bound to an application:

client --(cert)--> Router--(no XFCC header)--> Route-service---> end_app

Steps to Reproduce

Router configuration (using v0.168.0)

router:
  client_cert_validation: require
  disable_http: true
  enable_ssl: true
  forwarded_client_cert: sanitize_set
  min_tls_version: TLSv1.1
  ssl_skip_validation: false

Push a demo_app that lists all headers on each request.
Push any route service and bind it to the demo app. List the headers arriving at the route service after requesting demo_app.

It seems the router sets only the following 3 headers when forwarding a request to a route service. A conditional XFCC header probably needs to be added.
https://godoc.org/code.cloudfoundry.org/gorouter/routeservice#pkg-constants

Thanks,
Mandar K.

Trailing slashes in submodule URLs

git complains about the trailing slashes in the url for submodules.

For example:

https://github.com/cloudfoundry-incubator/trace-logger/

git --version
git version 2.6.4

Force client Cert validation.

I'm using Cloud Foundry v276 and want to enforce client cert validation for all requests.
I can see that similar properties were added to the develop branch in the last few days. How can I incorporate this functionality in CF v276?

In addition, can someone shed light on the XFCC property and how it works?
I want my router to validate the client SSL certificate (for all requests), but the application behind the router can be a simple hello-world app that has nothing to do with the client cert. Is it necessary to set the XFCC property at all in that case?

Thanks,
Mandar

TCP Routing Ports

We have a need for a few of our applications to use client certificate authentication. TCP routes work great allowing us to delegate the SSL handshake to the servlet container.

However, we'd like to create "vanity" domains that map to the TCP domain:port so clients don't have to know the port number. Is this feature on the roadmap or does this capability already exist?

How to use another port rather than 443 when the ssl is enabled for router?

In the current router release, when SSL is enabled, the port is 443 by default. I want to use another port but could not find how to configure it.

Pretty format timestamps in router logs

Currently, timestamps in the gorouter logs appear to be formatted in epoch time. This is mildly inconvenient for humans, as we have to convert the timestamps. It would be nice to have them formatted in ISO 8601 notation, as that is easily parseable by computers and humans alike.

Most CF components use this format.
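
For reference, the conversion being asked for is small. The Go sketch below assumes the log timestamp is a Unix epoch value in (possibly fractional) seconds; `epochToISO8601` is a hypothetical helper, not part of gorouter.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// epochToISO8601 converts a Unix timestamp in seconds (fractions allowed)
// to an RFC 3339 / ISO 8601 string in UTC. Sub-second precision is dropped
// by the chosen layout.
func epochToISO8601(epoch float64) string {
	sec, frac := math.Modf(epoch)
	return time.Unix(int64(sec), int64(frac*1e9)).UTC().Format(time.RFC3339)
}

func main() {
	fmt.Println(epochToISO8601(1500000000)) // 2017-07-14T02:40:00Z
}
```

Switching the layout to time.RFC3339Nano would keep the fractional part if sub-second ordering of log lines matters.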

rats fail with cf cli 6.33.1

Issue

We noticed that rats is failing with the latest cf cli 6.33.1.

Context

• Failure in Spec Setup (BeforeEach) [2.233 seconds]
Tcp Routing
/tmp/build/ddc81735/acceptance-tests/src/code.cloudfoundry.org/routing-acceptance-tests/tcp_routing/tcp_routing_test.go:19
  single app port
  /tmp/build/ddc81735/acceptance-tests/src/code.cloudfoundry.org/routing-acceptance-tests/tcp_routing/tcp_routing_test.go:24
    when multiple external ports are mapped to a single app port [BeforeEach]
    /tmp/build/ddc81735/acceptance-tests/src/code.cloudfoundry.org/routing-acceptance-tests/tcp_routing/tcp_routing_test.go:112
      routes traffic from two external ports to the app
      /tmp/build/ddc81735/acceptance-tests/src/code.cloudfoundry.org/routing-acceptance-tests/tcp_routing/tcp_routing_test.go:122

      Expected
          <int>: 0
      to equal
          <int>: 2

      /tmp/build/ddc81735/acceptance-tests/src/code.cloudfoundry.org/cf-routing-test-helpers/helpers/routes.go:65

The assertion line is: https://github.com/cloudfoundry/cf-routing-test-helpers/blob/9b3553a85dfb2662ac94f78cab32d801009d8f3c/helpers/routes.go#L65

Our previous run with cf cli 6.33.0 passed. We believe this is due to a change in the cf cli, which now prints the stdout response of the create route command with ANSI colors. The regex no longer matches.

This might be the CF CLI story that introduced the issue: https://www.pivotaltracker.com/n/projects/892938/stories/153200724

Steps to Reproduce

Run RATs with CF CLI 6.33.1

Expected result

RATs pass.

Current result

RATs fail.

Possible Fix

Not a fix but it might be a good idea to also escape the special chars in the domain when building the regex.
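
A sketch of both ideas in Go, under assumptions: the real helper in cf-routing-test-helpers is not reproduced here, and `countRoute`, the sample CLI output, and the `domain:port` shape of the match are all hypothetical. The output is stripped of ANSI color sequences first, and the domain is escaped with regexp.QuoteMeta so that its dots match only literal dots.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// ansiEscape matches ANSI color (SGR) sequences such as "\x1b[36;1m".
var ansiEscape = regexp.MustCompile(`\x1b\[[0-9;]*m`)

// countRoute counts occurrences of "<domain>:<port>" in CLI output after
// removing color codes. The domain is escaped before being placed in the
// pattern.
func countRoute(output, domain string, port int) int {
	clean := ansiEscape.ReplaceAllString(output, "")
	pattern := regexp.QuoteMeta(domain) + ":" + strconv.Itoa(port) + `\b`
	return len(regexp.MustCompile(pattern).FindAllString(clean, -1))
}

func main() {
	// Hypothetical colored output, for illustration only.
	out := "Route \x1b[36;1mtcp.example.com:1024\x1b[0m has been created"
	fmt.Println(countRoute(out, "tcp.example.com", 1024)) // 1
}
```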

cc @acrmp

Expose registration_interval as link

The identity project is implementing bbr for uaa-release.

In the scenario where a restore takes place, the following sequence happens:

bbr pre-restore-lock -> shuts down the uaa process
wait for the route registrar to remove the route
restore happens
bbr post-restore-unlock -> starts the UAA process
wait for the route registrar to add the route

The clear flaw in this approach is that if the registration_interval in the manifest changes, these scripts are still hard-coded.

We are wondering if the registration_interval property can be:

exposed as a link so that the colocated UAA job can consume it
documented with its possible time values. The documentation mentions s for seconds, but not any other time units.

Package name collision detected while running 'bosh -n deploy'

I get this error while running bosh -n deploy:

Started preparing deployment > Preparing deployment.
Failed: Package name collision detected in job routing_api_z1: template cf-routing/routing-api depends on package cf-routing/common, template cf/consul_agent depends on cf/common. BOSH cannot currently collocate two packages with identical names from separate releases. (00:00:00)

Here are the commands I ran after cloning:

. .envrc
./scripts/generate-bosh-lite-manifest ../../cloudfoundry/cf-release/bosh-lite/deployments/cf.yml ../diego-release/bosh-lite/deployments/diego.yml

# => Deployment set to `/home/ericp/git/cloudfoundry-incubator/cf-routing-release/bosh-lite/deployments/cf-routing-manifest.yml'

So try this:
bosh create release --force
bosh -n upload release
bosh -n deploy

I'm working on cf v224, so here's the version/compatibility report I'm working with:

cf-release v224
commit_hash: 65621dd0
DATE: Mon Nov 9 13:39:49 UTC 2015
DIRECTOR_VERSION:1.2992.0
CF_DEPLOYMENT_STEMCELL:bosh-aws-xen-hvm-ubuntu-trusty-go_agent/3126
DIEGO_RELEASE_VERSION:0.1440.0
DIEGO_RELEASE_COMMIT_SHA:deb51e25
DIEGO_DEPLOYMENT_STEMCELL:bosh-aws-xen-hvm-ubuntu-trusty-go_agent/3126
GARDEN_LINUX_RELEASE_VERSION:0.326.0
ETCD_RELEASE_VERSION:18

I don't see "common" anywhere in bosh-lite/deployments/cf-routing-manifest.yml (assuming that's the file 'bosh create release' uses). Do I need to move to a newer cf-release version? Or is there something else I can do?

Double logging happens when using the syslog-release

Issue

Due to the teeing of output to the logger command (via the tee_output_to_sys_log function), when syslog-release is used to send all log files to syslog, we get double entries in syslog.

Possible Fix

Stop teeing output to logger?

cf router-groups can't work

I just deployed cf-routing-release version 0.126.0 to bosh-lite following the steps in README.md, but ran into an issue. When I run the command "cf router-groups", the following error is shown:
FAILED This command requires the Routing API. Your targeted endpoint reports it is not enabled.

But I have enabled routing_api by setting "routing_api: enabled: true" in cf.yml. What else do I need to configure?

CF release version: v235
Diego version: v0.1467.0

Router VM starts to be very slow when gorouter generates a lot of logs

Issue

The router VM becomes very slow when gorouter generates a lot of logs. This can lead to a situation where a single broken application makes the routing layer fail.

Context

This occurs whenever gorouter generates a lot of logs.

Steps to Reproduce

Deploy an application that is slow to accept requests (it sleeps 5 seconds before accepting a TCP connection).
Run a load test against this app.

Expected result

Load on the router VM increases slightly.

Current result

Load on the router VM increases sharply.

Possible Fix

This happens because of this part of the Bash code: https://github.com/cloudfoundry-incubator/routing-release/blob/0.152.0/src/routing_utils/syslog_utils.sh#L15-L16. There, read makes a syscall for every single character, and date is run for every log line.

To solve the issue prepend_datetime should be replaced by something like ts, awk or anything else which is faster than Bash's read.

How to configure tcp router?

I deployed two applications, app1 and app2, to Diego. In app1 I started an ActiveMQ server on port 61616. In app2 I created a Java client to connect to the ActiveMQ server in app1, but it fails with the error below. I can telnet to app1.bosh-lite.com:61616 from the command console on my laptop, but not from app2. I also tried the IP address of app1 and got the same error.

message: Could not connect to broker URL: tcp://app1.bosh-lite.com:61616. Reason: java.net.ConnectException: Connection refused

I also deployed cf-routing-release to bosh-lite, but it didn't work; I got the same error. I have two questions:
(1) I saw that cf-release already has cf-routing-release as a submodule. Does cf-release support the TCP router if I don't git clone cf-routing-release and deploy it to bosh-lite?
(2) I also tried an API call to map an external port on the router to app1.bosh-lite.com:61616:

curl 10.24.0.134:3000/routing/v1/tcp_routes/create -X POST -d '[{"route":{"router_group_guid": "tcp-default", "port": 61616}, "backend_ip": "app1.bosh-lite.com", "backend_port": 61616}]'

But got the following error:

curl: (52) Empty reply from server

The result of command bosh vms is as below:

Deployment `cf-warden'

Director task 30

Task 30 done

+------------------------------------+---------+---------------+--------------+
| Job/index                          | State   | Resource Pool | IPs          |
+------------------------------------+---------+---------------+--------------+
| api_z1/0                           | running | large_z1      | 10.244.0.134 |
| consul_z1/0                        | running | small_z1      | 10.244.0.54  |
| doppler_z1/0                       | running | medium_z1     | 10.244.0.142 |
| etcd_z1/0                          | running | medium_z1     | 10.244.0.42  |
| ha_proxy_z1/0                      | running | router_z1     | 10.244.0.34  |
| hm9000_z1/0                        | running | medium_z1     | 10.244.0.138 |
| loggregator_trafficcontroller_z1/0 | running | small_z1      | 10.244.0.146 |
| nats_z1/0                          | running | medium_z1     | 10.244.0.6   |
| postgres_z1/0                      | running | medium_z1     | 10.244.0.30  |
| router_z1/0                        | running | router_z1     | 10.244.0.22  |
| runner_z1/0                        | running | runner_z1     | 10.244.0.26  |
| uaa_z1/0                           | running | medium_z1     | 10.244.0.130 |
+------------------------------------+---------+---------------+--------------+

VMs total: 12

Deployment `cf-warden-diego'

Director task 31

Task 31 done

+--------------------+---------+------------------+---------------+
| Job/index          | State   | Resource Pool    | IPs           |
+--------------------+---------+------------------+---------------+
| access_z1/0        | running | access_z1        | 10.244.16.6   |
| brain_z1/0         | running | brain_z1         | 10.244.16.134 |
| cc_bridge_z1/0     | running | cc_bridge_z1     | 10.244.16.142 |
| cell_z1/0          | running | cell_z1          | 10.244.16.138 |
| database_z1/0      | running | database_z1      | 10.244.16.130 |
| route_emitter_z1/0 | running | route_emitter_z1 | 10.244.16.146 |
+--------------------+---------+------------------+---------------+

VMs total: 6

Deployment `cf-warden-routing'

Director task 32

Task 32 done

+-----------------+---------+---------------+------------+
| Job/index       | State   | Resource Pool | IPs        |
+-----------------+---------+---------------+------------+
| tcp_router_z1/0 | running | tcp_router_z1 | 10.244.8.2 |
+-----------------+---------+---------------+------------+
