epics-extensions / ca-gateway

Channel Access PV Gateway

Home Page: http://www.aps.anl.gov/epics/extensions/gateway/

License: Other

Makefile 1.37% Shell 0.17% C++ 82.84% C 0.57% Python 15.05%

epics channel-access

ca-gateway's People

ralphlange nevstop christianfehlinger bfrk krisztianloki dls-controls anjohnson bhill-slac freddieakeroyd goetzpf acidburn0zzz smarsching mfurseman sangwonx klauer minijackson dirk-zimoch

ca-gateway's Issues

Make Creation of gateway.restart gateway.killer File Optional

When the gateway binary starts, it always tries to create a gateway.restart and a gateway.killer file.
This produces error/warning messages on stdout/journald and is not needed when the gateway is started via systemd.
(One could set the -home option to /tmp, but the files are useless in the systemd scenario anyway.)

It would be nice if creating these files were optional, or if there were an option to disable it.

Undefined Timestamp with ca-gateway >= R2-0-6-0

Bug Report

Recently at SLAC we started to notice some issues with the Archiver Appliance not properly archiving PVs.

When tracking it down we noticed that some PVs coming through the Gateway were reporting the timestamp as <undefined> after the first time through the gateway.

We were running the latest from ca-gateway after an upgrade from 2.0.6.0.

Here is what it looks like:

$ EPICS_CA_SERVER_PORT=5124 caget -a KFE:TEST:HUGO:ENUM
KFE:TEST:HUGO:ENUM             <undefined> NO

I was able to isolate the issue by moving back to 2.0.6.0, cherry-picking every commit between 2.0.6 and 2.1.0, and testing after each change.
The commit that introduces the issue is this one: c6dc159

I am not very familiar with GDD and the ca-gateway but I am trying to investigate the root cause.

Please let me know if you have some insight or guidance or comments that could help me in finding a fix for it.
I understand if not since it is a change introduced more than 5 years ago. (I guess folks run some very old gateways =) ).

Thank you!

Steps to Reproduce

In a host start a simple IOC. In my case I am running one with softIoc -d static.db where static.db is:

record(mbbo, "KFE:TEST:HUGO:ENUM")
{
   field(ZRST, "YES")
   field(ZRVL, "0")
   field(ONST, "NO")
   field(ONVL, "1")
   field(TWST, "MAYBE")
   field(TWVL, "2")
   field(VAL,  "0")
   field(PINI, "YES")
}
record(ai, "KFE:TEST:HUGO:AI")
{
   field(VAL, "0.0")
   field(PREC, 3)
   field(PINI, "YES")
}

Start a ca-gateway running 2.0.6.0 allowing only the PVs from this IOC to pass and using a different port for clients to connect to it (in my case I chose 5124).
From another host, try to caget this PV with the -a option to fetch the whole timestamp information as well. Notice that it will work…
Run again caget with the -a option and it will still work.
Now upgrade the ca-gateway to 2.1.0 or greater and repeat the test above.
On the first caget it will work but on the second it will display the timestamp as undefined.

Note: camonitor will have the same effect in case you are monitoring from the second host and update the PV via the first. The monitor update will display the new value but undefined for the timestamp.

Gateway sigsegv's when cleaning up channels using ca_clear_channel

Original LaunchPad Bug #1279147 reported by Murali Shankar on 2014-02-12:

At LCLS, the archiver appliances connect to the IOCs through a CA gateway. The gateway crashes once in a while. This does not seem to be related to an “out-of-memory” issue or a “Gateway has been running for a long time” issue. Instead, it seems to be related to the gateway cleaning up PVs (Feb 07 04:42) from an IOC that is CPU overloaded and keeps disconnecting (Feb 07 02:41).

From the gateway logs...

>> Unexpected problem with CA circuit to server "eioc-und1-mp01.slac.stanford.edu:5068" was "Connection reset by peer" - disconnecting
>> Feb 07 02:21:23 Warning: Virtual circuit disconnect eioc-und1-mp01.slac.stanford.edu:5068

>> Feb 07 02:21:23 !!! Errlog message received (message is above)
>> Unexpected problem with CA circuit to server "eioc-und1-mp01.slac.stanford.edu:5068" was "Connection reset by peer" - disconnecting

>> Feb 07 02:41:49 !!! Errlog message received (message is above)
>> Feb 07 02:41:49 Warning: Virtual circuit disconnect eioc-und1-mp01.slac.stanford.edu:5068
>> Feb 07 04:42:32 PV Gateway Aborting (SIGSEGV)

I have core dumps and I am able to examine the variables etc and indeed the gateway is trying to clean up the PVs from this IOC using ca_clear_channel. However, the place where this crashes is in a fundamental place (tsDLList.h:238) in EPICS base. I can provide more details/core if needed.

Regards,
Murali

(gdb) bt
#0 0x0016c410 in __kernel_vsyscall ()
#1 0x0086de30 in raise () from /lib/libc.so.6
#2 0x0086f741 in abort () from /lib/libc.so.6
#3 0x080513a4 in sig_end (sig=11) at ../gateway.cc:300
#4 <signal handler called>
#5 0x0075a8c9 in remove (this=0xaf728260, guard=..., chan=...) at ../../../include/tsDLList.h:238
#6 tcpiiu::uninstallChan (this=0xaf728260, guard=..., chan=...) at ../tcpiiu.cpp:1981
#7 0x007512b7 in nciu::destroy (this=0x17e24b88, guard=...) at ../nciu.cpp:93
#8 0x00768347 in oldChannelNotify::destructor (this=0x17e179f0, guard=...) at ../oldChannelNotify.cpp:71
#9 0x00749039 in ca_clear_channel (pChan=0x17e179f0) at ../access.cpp:386
#10 0x080582e0 in gatePvData::~gatePvData (this=0x157f79b0, __in_chrg=<value optimized out>) at ../gatePv.cc:240
#11 0x08062064 in gatePvNode::destroy (this=0x1ca02110) at ../gateServer.h:69
#12 0x0805d6e7 in gateServer::inactiveDeadCleanup (this=0x925af40) at ../gateServer.cc:1490
#13 0x08060fc8 in gateServer::mainLoop (this=0x925af40) at ../gateServer.cc:285
#14 0x0804ef18 in startEverything (prefix=0xbfd7bbe2 "GWLCLSARCH") at ../gateway.cc:656
#15 0x080511a8 in main (argc=16, argv=0xbfd7b494) at ../gateway.cc:1299
……
(gdb) up
#4 <signal handler called>
(gdb) up
#5 0x0075a8c9 in remove (this=0xaf728260, guard=..., chan=...) at ../../../include/tsDLList.h:238
238 prevNode.pNext = theNode.pNext;
(gdb) print theNode
$1 = (tsDLNode<nciu> &) @0x17e24b98: {pNext = 0x17d44d68, pPrev = 0x0}
(gdb) up
#6 tcpiiu::uninstallChan (this=0xaf728260, guard=..., chan=...) at ../tcpiiu.cpp:1981
1981    this->createReqPend.remove ( chan );
(gdb) print chan
$2 = (nciu &) @0x17e24b88: {<cacChannel> = {_vptr.cacChannel = 0x781168, static priorityMax = 99, static priorityMin = 0, static priorityDefault = 0, static priorityLinksDB = 99,
    static priorityArchive = 49, static priorityOPI = 0, callback = @0x17e179f0}, <chronIntIdRes<nciu>> = {<chronIntId> = {<intId<unsigned int, 8u, 32u>> = {
        id = 833073}, <No data fields>}, <tsSLNode<nciu>> = {pNext = 0x0}, <No data fields>}, <channelNode> = {<tsDLNode<nciu>> = {pNext = 0x17d44d68, pPrev = 0x0},
    listMember = cs_createReqPend}, <privateInterfaceForIO> = {_vptr.privateInterfaceForIO = 0x7811d8}, eventq = {pFirst = 0x0, pLast = 0x0, itemCount = 0}, accessRightState = {
    f_readPermit = false, f_writePermit = false, f_operatorConfirmationRequest = false}, cacCtx = @0x925e2d8, pNameStr = 0x1c5838a8 "BLM:UND1:MP01:XILINX_CELS.LOW", piiu = 0xaf728260,
  sid = 4294967295, count = 0, retry = 1, nameLength = 30, typeCode = 65535, priority = 0 '\000'}
(gdb) quit

More information
This is PV Gateway Version 2.0.3.0 [Mar 2 2012 09:46:57]
Gateway is built against base-R3-14-12 with a few patches applied (I can provide a full list if needed).
IOC eioc-und1-mp01 runs on RTEMS-4.9.4-slac_p0 on top of EPICS R3.14.12-SLAC_1 $Date 2010/11/27

Our CA gateways send no beacons

The manual says: CA gateway sends beacons and also creates beacon anomalies when it comes up (and in certain other circumstances, too).

However, we do not see any beacons coming from CA gateways, none at all.

I tested this using tcpdump 'udp port 5065'; Götz uses a self-authored Python script.

subscriptions to waveforms suddenly break

After 4 weeks of running without problems, an increasing number of subscriptions to waveforms (as well as subArrays, compress, ...) suddenly stopped working properly and only provided the client with the first value.

We suspected the problem to be related to our update to ca-gateway 2.1.2 and EPICS base 3.15.8.

At first we could not reproduce the problem, but today Götz Pfeiffer ran a thorough test.
He finally found that the problem was introduced by patch 90b24a7, which fixes #11:

Store the app type for DBR_CTRL requests then post complete GDDs (fixes #11)

Once we knew that it must be related to the DBR type requested by a client, we managed to reproduce the problem:

1. Create a test PV with random waveform data, updating e.g. once per second.
2. Run a camonitor on that PV in one terminal.
3. Do a caget -d DBR_STSACK_STRING of this PV in another terminal.

From step 3 on, the monitor will only post the first element of the waveform.
Setting highestDbrType to DBR_STSACK_STRING breaks all active subscriptions.

There's no reasonable higher DBR type that would fix this at runtime.
I have no idea which client uses this DBR type, but it's not forbidden ;-)

Make sure a large gp hash table is used

Reported by Lana Abadie (ITER):

Investigating long reconnect times through a CA Gateway, profiling found that the Gateway spends a lot of time inside GPHENTRY * epicsStdCall gphFindParse(gphPvt *pgphPvt, const char *name, size_t len, void *pvtid), namely doing string comparisons.

In that general purpose hash table, linear search and string comparisons are done in the case of collisions, i.e., when hash buckets are found not empty.

The CA Gateway – being an application that may serve many PVs and always runs on a virtual memory system – should use a large gp hash table.
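
As a rough illustration of why table size matters here, the following Python sketch hashes a set of random PV-like names into tables of different sizes and reports how many entries share a bucket with a stored name on average. Everything in it is an assumption for illustration: the function name, the CRC32 hash and the random names are not taken from gpHash or from the gateway.

```python
import random
import string
import zlib


def average_bucket_occupancy(num_names, num_buckets, seed=0):
    """Mean number of entries in the bucket that a stored name lands in.

    With linear search on collisions, this is proportional to the number
    of string comparisons a lookup of an existing name has to make.
    """
    rng = random.Random(seed)
    alphabet = string.ascii_uppercase + string.digits
    names = set()
    while len(names) < num_names:
        names.add("".join(rng.choices(alphabet, k=12)))

    buckets = [0] * num_buckets
    for name in names:
        # CRC32 is only a stand-in for the real hash function.
        buckets[zlib.crc32(name.encode()) % num_buckets] += 1

    # Each name in a bucket holding n entries "sees" n entries, so the
    # per-name average is sum(n * n) / num_names.
    return sum(n * n for n in buckets) / num_names


if __name__ == "__main__":
    for size in (256, 4096, 65536):
        print(size, round(average_bucket_occupancy(20000, size), 2))
```

For a fixed number of names, the occupancy falls roughly in proportion to the table size, which is the effect the report asks the Gateway to take advantage of.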

Access rules based on (sub-)domain

Hi,

I am running epics with a gateway server in a small lab with an internal network and the gateway server publishing the PVs into an external network as read-only.

I was wondering whether one can write an access rule that allows write access depending on the domain of the host. Right now I need to put the hostnames of all clients manually into the *.access file to allow writing PVs.
I have a laptop that is sometimes in the lab (internal network) and sometimes in the office (external network). How can I ensure that it has read/write access when it is in the internal lab network and only read access when it is in the external network? Again, I would like to do that based on its different domain names ...

Best

Daniel

crash

I frequently see the gateway executable crash in our system. Unfortunately, I cannot see a pattern in when exactly it crashes. I enabled core dumps and finally caught a crash when it happened:

(gdb) bt
#0  0x00007f67e7a3ed26 in assertIdenticalMutex (this=0x0, guard=..., chan=..., sidIn=4294967295, typeIn=65535, countIn=0) at ../../../include/epicsGuard.h:81
#1  tcpiiu::installChannel (this=0x0, guard=..., chan=..., sidIn=4294967295, typeIn=65535, countIn=0) at ../tcpiiu.cpp:1911
#2  0x00007f67e7a2c2bb in cac::transferChanToVirtCircuit (this=<value optimized out>, cid=<value optimized out>, sid=4294967295, typeCode=65535, count=0, minorVersionNumber=13, 
    addr=..., currentTime=...) at ../cac.cpp:639
#3  0x00007f67e7a3a4a0 in udpiiu::searchRespAction (this=<value optimized out>, msg=<value optimized out>, addr=<value optimized out>, currentTime=<value optimized out>)
    at ../udpiiu.cpp:690
#4  0x00007f67e7a3a5c2 in udpiiu::postMsg (this=0x242d760, net_addr=..., pInBuf=<value optimized out>, blockSize=48, currentTime=...) at ../udpiiu.cpp:857
#5  0x00007f67e7a3c681 in udpRecvThread::run (this=0x243db88) at ../udpiiu.cpp:394
#6  0x00007f67e77dd249 in epicsThreadCallEntryPoint (pPvt=0x243dba8) at ../../../src/libCom/osi/epicsThread.cpp:83

Can you make any sense of this?

Add pvname -> real_name cache

Original LaunchPad Bug #1404307 reported by Ralph Lange on 2014-12-19:

The Gateway checks a new pvname against the regexps twice:
Once in pvExistTest() and then again in createPV().

Maybe there should be some kind of cache that keeps the information and prevents the gateway from matching and creating the real_name twice.
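
A minimal sketch of such a cache, in Python rather than the gateway's C++, might look as follows. The class name, the rule format and the counter are hypothetical and only show the idea: the regexp list is walked once per pvname, and a second lookup (standing in for createPV() after pvExistTest()) is served from a dictionary.

```python
import re


class NameResolver:
    """Hypothetical helper that maps a client pvname to a real_name once."""

    def __init__(self, rules):
        # rules: list of (pattern, replacement template); first match wins.
        self._rules = [(re.compile(p), r) for p, r in rules]
        self._cache = {}
        self.match_passes = 0  # number of full passes over the rule list

    def resolve(self, pvname):
        """Return the real_name for pvname, or None if no rule matches."""
        if pvname in self._cache:
            return self._cache[pvname]

        self.match_passes += 1
        real_name = None
        for pattern, replacement in self._rules:
            match = pattern.fullmatch(pvname)
            if match:
                real_name = match.expand(replacement)
                break

        # Negative results are cached too, so unknown names are cheap.
        self._cache[pvname] = real_name
        return real_name


if __name__ == "__main__":
    resolver = NameResolver([(r"gw:(.*)", r"\1")])
    print(resolver.resolve("gw:ioc:ai"))   # first call walks the rules
    print(resolver.resolve("gw:ioc:ai"))   # second call hits the cache
    print(resolver.match_passes)
```

This sketch never evicts entries; a real implementation would presumably need a size bound and would have to drop the cache whenever the pattern list is reloaded.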

Created from Mantis issue 10

CTRL and GR subscriptions get incorrect data

This problem was first reported at pyepics/pyepics#40
There are some test results at the above link as well.

It had an intermediate life as an EPICS Base bug at https://bugs.launchpad.net/epics-base/+bug/1510955 - that ticket has some more thoughts.

The Original Problem:

DBR_CTRL variables (units, etc) are reported correctly only on the initial connection for a channel subscription when the channel is hosted by either a Gateway or via pcaspy. This hints at a problem in libcas or libgdd.

Test code may be found here - https://github.com/dchabot/ca_client_test

This is simply a version of the caClient example code generated via makeBaseApp, modified to print the units of a channel.

To test, point the caMonitor executable at a channel hosted by a Gateway or by pcaspy.

misconfigured softioc causes gateway crash

A softioc with a waveform record, run without EPICS_CA_MAX_ARRAY_BYTES set, causes the gateway to crash with "PV Gateway Aborting (SIGSEGV)".
The bug was seen with EPICS base 3.15.5 and CA-gateway R2-1-0-0.
It seems to happen when the gateway is started with option "-no_cache".

I have attached a recipe that shows how to reproduce the problem:
Unpack the tar file and follow the instructions in README.rst. You need to have pyepics or edm installed to produce this error.

GATEWAY-ERROR.tar.gz

Using PCRE breaks the tests

Using the Perl compatible regular expression library by setting USE_PCRE=YES breaks all tests.

ca-gateway doesn't compile on macOS mojave

I get the following error message:
mac-130048:gateway2_0_6_0 odell$ make
/Library/Developer/CommandLineTools/usr/bin/make -C ./configure install
/Library/Developer/CommandLineTools/usr/bin/make -C O.darwin-x86 -f ../Makefile TOP=../..
T_A=darwin-x86 install
perl -CSD /Users/odell/base-3.15.7/bin/darwin-x86/convertRelease.pl checkRelease
/Library/Developer/CommandLineTools/usr/bin/make -C ./src install
/Library/Developer/CommandLineTools/usr/bin/make -C O.darwin-x86 -f ../Makefile TOP=../..
T_A=darwin-x86 install
c++ -DUNIX -Ddarwin -O3 -g -Wall -DSTAT_PVS -DRATE_STATS -DCONTROL_PVS -DCAS_DIAGNOSTICS -DHANDLE_EXCEPTIONS -DUSE_DENYFROM -arch x86_64 -fno-common -I. -I../O.Common -I. -I. -I.. -I../../include/compiler/clang -I../../include/os/Darwin -I../../include -I/Users/odell/base-3.15.7/include/compiler/clang -I/Users/odell/base-3.15.7/include/os/Darwin -I/Users/odell/base-3.15.7/include -I/Users/odell/base-3.15.7/src/cas/generic -I/Users/odell/base-3.15.7/src/ca/legacy/pcas/generic -c ../gatePv.cc
In file included from ../gatePv.cc:71:
../gateAs.h:138:27: error: field has incomplete type 'struct re_pattern_buffer'
struct re_pattern_buffer pat_buff;
^
../gateAs.h:138:9: note: forward declaration of 're_pattern_buffer'
struct re_pattern_buffer pat_buff;
^
../gateAs.h:139:22: error: field has incomplete type 'struct re_registers'
struct re_registers regs;
^
../gateAs.h:139:9: note: forward declaration of 're_registers'
struct re_registers regs;
^
2 errors generated.
make[2]: *** [gatePv.o] Error 1
make[1]: *** [install.darwin-x86] Error 2
make: *** [src.install] Error 2
mac-130048:gateway2_0_6_0 odell$

I get the same error message with more recent versions of ca-gateway as well. I'm using EPICS 3.15.7.
This version of gateway compiles fine on the beagle bone. Is there an obvious fix for this?

Statistics Channels can't support alarms (ALH)

Original LaunchPad bug #1404310 reported by Ralph Lange on 2014-12-19:

From Andrew (@anjohnson) :

Apparently pointing ALH to the stats channel of a gateway results in an ALH crash. Dirk Zimoch has more details...

It would be nice to be able to configure the gateway stats channels to generate alarms when the rates get too large, where the limits are configurable somehow.

Created from Mantis issue 355

Gateway tests leave gateway processes running

I ran the new gateway self-tests some time back, apparently on May 11th. My workstation has not been rebooted since then, and I just noticed a whole load of gateway processes still running that were started on that date. If the test programs can't manage to stop their own gateway processes at the end of testing, they should at least inform the user so they can be stopped manually.
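
One way a Python test harness could guarantee this is a context manager that always stops the child process and tells the user when it had to resort to a hard kill. The sketch below uses only the standard library; the name managed_process and the message text are made up for illustration and are not part of the existing test suite.

```python
import contextlib
import subprocess
import sys


@contextlib.contextmanager
def managed_process(argv, timeout=5.0):
    """Start a child process and make sure it is gone when the block exits."""
    proc = subprocess.Popen(argv)
    try:
        yield proc
    finally:
        if proc.poll() is None:
            proc.terminate()  # polite request first (SIGTERM on POSIX)
            try:
                proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                # Inform the user, then force the process down.
                print(
                    "process %d ignored terminate(); killing it" % proc.pid,
                    file=sys.stderr,
                )
                proc.kill()
                proc.wait()
```

A test would wrap each gateway or IOC it starts in `with managed_process([...]) as proc:`, so the process is stopped even when an assertion inside the block fails.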

Occasional crashes and memory corruption on exit

Original LaunchPad bug #1466776 reported by Ralph Lange on 2015-06-19:

Running the new tests on the dbe_props branch (soon to be merged) with blown-up verbosity, I occasionally see the following:

[...]
gateVcData::interestDelete() name=ioc:gwcachetest
gateVcData::destroy()
gateVcData::vcRemove() name=ioc:gwcachetest
gateVcData::vcRemove() connect/ready -> clear
gatePvData::deactivate() name=ioc:gwcachetest
gatePvData::deactivate() active PV
gatePvData::unmonitor() name=ioc:gwcachetest
gatePvData::logUnmonitor() name=ioc:gwcachetest
gatePvData::propUnmonitor() name=ioc:gwcachetest
gatePvData::alhUnmonitor() name=ioc:gwcachetest
~gateVcData()
Jun 19 10:43:22 PV Gateway Ending (SIGTERM)
~gateServer()
~gatePvData() name=ioc:gwcachetest
gatePvData::unmonitor() name=ioc:gwcachetest
gatePvData::logUnmonitor() name=ioc:gwcachetest
gatePvData::propUnmonitor() name=ioc:gwcachetest
gatePvData::alhUnmonitor() name=ioc:gwcachetest
~gatePvData() name=��F
gatePvData::unmonitor() name=��F
gatePvData::logUnmonitor() name=��F
gatePvData::propUnmonitor() name=��F
gatePvData::alhUnmonitor() name=��F
Jun 19 10:43:22 PV Gateway Aborting (SIGSEGV)

Obviously, the gatePvData destructor is called for a questionable object.

Allow hierarchical regexp patterns

Original LaunchPad Bug #1404308 reported by Ralph Lange on 2014-12-19:

To further optimize the pattern matching process, the configuration should allow creating hierarchies of general and specialized patterns instead of a simple list.

Created from Mantis issue 12

Documentation suggestions

Suggestion: Rename gateway.notes to make it more visible and clear that it contains the release notes. It would probably work as a .md file too.

In the testTop/README, move the Python prerequisites higher up in the file to make them more visible too. This document could also mention that the version of Base used should be built with SHARED_LIBRARIES=YES since the pyepics module has to be able to find the libca.so (or the user could set the pyepics variable whose name escapes me to point to the libca.so in a different Base).

My earlier reported issue with the nosetests --with-tap command is moot: I have tap.py installed in my personal PYTHONPATH but was running these tests from a different account.

dynamically sized array subscriptions

When I monitor an array PV via gateway I always get the maximum number of elements back. For instance, a waveform record with NELM=1000 and NORD=5 will give me 5 elements when requested directly (same subnet) but when the request goes via gateway I get 1000 elements back. Gateway version is 2.0.6 with base-3.14.12.3 (these are the versions that we can install as debian packages).

Connect requests for unknown channels don't trigger a client-side search

When a CA name-server sits in front of a CA Gateway, clients that use the name-server don't make new connections to PVs the gateway should be serving. It seems that the gateway only searches for and sets up a new VC for PV names that come in from a UDP search.

Our name-server knows the names published by a set of IOCs that live behind our "RBB" gateway, which has its client side on the RBB subnet and its server side on our "EP" subnet. CA clients from machines on the EP subnet can connect to the RBB IOC PVs just fine through the gateway. However we have CA clients in our "MCR" subnet which are being told (correctly) by the name-server to contact the RBB gateway for PVs from those IOCs, and some connect fine but others don't. The PVs that don't connect are the ones that no client in the EP subnet has tried to connect to yet, so there is no VC for them in the gateway. Running a caget for those PVs on an EP client sends a broadcast search which triggers the creation of the VC for that PV, and the MCR clients can then connect.

I plan to have the RBB gateway serve the MCR subnet as well to work around the problem, so this isn't urgent. When I can find someone with spare time I'll try to have them take a look to see if it can be resolved, though.

Enable tests to verify compatibility across versions of Base

The tests could work as compatibility tests (run from a separate Jenkins job) if they allowed running IOC, Gateway and client from different versions of Base.

Using statically linked binaries would probably help.

Gateway crashes when accessing PCAS(Py) array PV where the actual count is over the configured count

The backtrace for this crash is included below. It happens when we try to access a char array PV that's served by a PCASPy IOC, where 'count' was configured to 5000, but the actual count was 6270. It prints the following error message:

*** Error in '/opt/epics-7.0.4/modules/ca-gateway-2.1.2/bin/linux-x86_64/gateway': free(): invalid next size (normal): 0x00007fffc8001d00 ***

Interestingly, caget only requests the first 5000 bytes and doesn't crash, while pyepics requests all bytes and stores them properly, so it doesn't corrupt memory either. It seems that each client has to handle this case in its own way, and that ca-gateway doesn't handle it.

The crash location and error message indicate to me that there was memory corruption which libc detected and, for security reasons, abort()ed on.

(gdb) bt
#0  0x00007ffff68d0387 in raise () from /lib64/libc.so.6
#1  0x00007ffff68d1a78 in abort () from /lib64/libc.so.6
#2  0x00007ffff6912ed7 in __libc_message () from /lib64/libc.so.6
#3  0x00007ffff691b299 in _int_free () from /lib64/libc.so.6
#4  0x00007ffff798e011 in gddDestructor::destroy(void*) ()
   from /opt/epics-7.0.4/modules/pcas-4.13.2/lib/linux-x86_64/libgdd.so.4.13.0
#5  0x00007ffff797c1f9 in gdd::~gdd() () from /opt/epics-7.0.4/modules/pcas-4.13.2/lib/linux-x86_64/libgdd.so.4.13.0
#6  0x00007ffff7bc1823 in casAsyncReadIOI::~casAsyncReadIOI() ()
   from /opt/epics-7.0.4/modules/pcas-4.13.2/lib/linux-x86_64/libcas.so.4.13.0
#7  0x00007ffff7bc1869 in casAsyncReadIOI::~casAsyncReadIOI() ()
   from /opt/epics-7.0.4/modules/pcas-4.13.2/lib/linux-x86_64/libcas.so.4.13.0
#8  0x00007ffff7bc1089 in casAsyncIOI::cbFunc(casCoreClient&, epicsGuard<casClientMutex>&, epicsGuard<evSysMutex>&) ()
   from /opt/epics-7.0.4/modules/pcas-4.13.2/lib/linux-x86_64/libcas.so.4.13.0
#9  0x00007ffff7bc28d5 in casEventSys::process(epicsGuard<casClientMutex>&) ()
   from /opt/epics-7.0.4/modules/pcas-4.13.2/lib/linux-x86_64/libcas.so.4.13.0
#10 0x00007ffff7bc7a5e in casStreamEvWakeup::expire(epicsTime const&) ()
   from /opt/epics-7.0.4/modules/pcas-4.13.2/lib/linux-x86_64/libcas.so.4.13.0
#11 0x00007ffff74d386e in timerQueue::process(epicsTime const&) ()
   from /opt/epics-7.0.4/base/lib/linux-x86_64/libCom.so.3.18.0
#12 0x00007ffff74b527f in fdManager::process(double) () from /opt/epics-7.0.4/base/lib/linux-x86_64/libCom.so.3.18.0
#13 0x000000000041a9ee in gateServer::mainLoop() ()
#14 0x000000000040d7f1 in main ()

Gateway treats int/float values as hex

This is a followup to the tech-talk thread:
https://epics.anl.gov/tech-talk/2021/msg01394.php

When writing numeric values to a PV, Gateway will treat the input as hex if the value is followed by excess (non-digit) characters. This does not happen when writing directly to an IOC.

If I caput to an IOC (no Gateway involved):
$ EPICS_CA_ADDR_LIST=ioc_address caput x:ai 10extra
or
$ EPICS_CA_ADDR_LIST=ioc_address caput x:ai 10 extra
The "extra" characters are ignored and the value 10 is stored in the record, which is OK.

Doing the same through CA Gateway:
$ EPICS_CA_ADDR_LIST=gateway_address caput x:ai 10extra
or
$ EPICS_CA_ADDR_LIST=gateway_address caput x:ai 10 extra
The "extra" characters at the end cause the record to store the value 16 (10 hex), which I don't expect.

The problem originated when an operator habitually added 'w' when caput-ing to a setpoint record controlling some output in Watts. In that case the caput PS-Power 100w resulted in PS-Power being set to 256 instead of the expected 100 (Watts).

I tried the change proposed by @anjohnson (thanks!) in his tech-talk reply, substituting the call to epicsScanDouble() with sscanf() in aitConvert.cc:getStringAsDouble() and it seems to work: caput x:ai 10w stores 10 into the record, as expected. However I'm not sure if that will break any other functionality relying on the use of epicsScanDouble().
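
The numbers in this report can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, not the code in aitConvert.cc: both function names are made up, and the second one deliberately takes just the leading decimal digits and reads them in base 16, because that is what matches the observed values.

```python
import re


def leading_decimal(text):
    """Parse the leading decimal number and ignore any trailing characters."""
    match = re.match(r"\s*([+-]?\d+(?:\.\d*)?)", text)
    if match is None:
        raise ValueError("no leading number in %r" % text)
    return float(match.group(1))


def leading_digits_as_hex(text):
    """Read the leading digits 0-9 as if they were a hexadecimal number."""
    match = re.match(r"\s*(\d+)", text)
    if match is None:
        raise ValueError("no leading digits in %r" % text)
    return float(int(match.group(1), 16))


if __name__ == "__main__":
    for value in ("10extra", "100w"):
        print(value, leading_decimal(value), leading_digits_as_hex(value))
```

The first function gives the behaviour seen when writing directly to the IOC (10 and 100); the second gives the values seen through the Gateway (16 and 256).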

What's the best way to handle this issue?

Versions used:
OS: RHEL8
base-7.0.6
ca-gateway-2.1.2
pcas-4.13.2

caPutLog JSON support

Recently the caPutLog module added JSON logging support, which needs different calls to initialize it instead of the old format. Support for this should be added to ca-gateway at some point.

String representation of ENUMs with more than 16 states

Tom Fors pointed out:

Running a caget command through the gateway returns the ENUM value rather than the string while a direct connection returns the string:
phoebus $ caget B:SD:CurrentRefWF.DTYP
B:SD:CurrentRefWF.DTYP         19

helios $ caget B:SD:CurrentRefWF.DTYP
B:SD:CurrentRefWF.DTYP         Xycom 566 Waveform

He didn't realize the significance of the enum value 19, which I have explained.

However it might be feasible for the Gateway to fetch the string representation of an ENUM when its integer value is ≥16 (it could even cache the representations of such higher values, although cache invalidation might be a problem) and hence return the correct string when asked. I don't know if gdd can handle more than 16 enum strings though, and I'm not sure that implementing this would be worth the effort, but I'm just filing this as a future possibility.
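
The idea can be sketched in Python as follows. The 16-state limit is the size of the string table that CA enum metadata carries; everything else (the function name, the fetch_string callback standing in for a string request to the IOC, and the unbounded cache) is hypothetical.

```python
MAX_ENUM_STATES = 16  # number of state strings CA enum metadata can carry


def enum_as_string(index, choices, fetch_string=None, cache=None):
    """Return the string for an enum index, given a 16-entry string table.

    choices      -- all state strings known to the IOC (may exceed 16)
    fetch_string -- optional callback asking the IOC for the string form
    cache        -- optional dict remembering fetched strings by index
    """
    table = choices[:MAX_ENUM_STATES]  # what fits in the metadata
    if 0 <= index < len(table):
        return table[index]
    if cache is not None and index in cache:
        return cache[index]
    if fetch_string is None:
        return str(index)  # current behaviour: fall back to the number
    text = fetch_string(index)
    if cache is not None:
        cache[index] = text
    return text


if __name__ == "__main__":
    choices = ["choice %d" % i for i in range(19)] + ["Xycom 566 Waveform"]
    print(enum_as_string(19, choices))
    print(enum_as_string(19, choices, fetch_string=lambda i: choices[i]))
```

As the report notes, a cache like this would need invalidating if the string behind an index could change.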

testTop/configure contents

Original LaunchPad bug #1461687 reported by Andrew Johnson (@anjohnson) 2015-06-03:

Building a stand-alone (static) gateway without creating any RELEASE.local files [which are good for developers but not for operational installation builds], my build failed under testTop because EPICS_BASE was not set. One of the uncommented lines in the testTop/configure/RELEASE file should be

    -include $(GATEWAY)/configure/RELEASE

to get the definition of EPICS_BASE from the parent module.

Thus as with other standard support modules I should only have to edit the top-level configure/RELEASE file and the whole thing should then build successfully.

Does this reorganized gateway still build using the gnuregex extension or an equivalent Linux library? There's nothing in either the RELEASE or CONFIG_SITE files explaining how to configure this. I would expect to see a switch in the CONFIG_SITE file, and maybe another path in the RELEASE file for it; this wasn't previously needed when the EPICS gnuregex extension installed its files into the extensions install directories, but if Linux distributions come with a suitable library already built maybe it's not needed.

Gateway build is not EPICS 7 ready

The ca-gateway does not yet have a configuration for the EPICS7 structure.

"gateServer::pvAttach() bad PV" when using Name Server

We've been chasing an issue with PVs not connecting through a gateway from a client (which actually happens to be another gateway but I now don't think that really matters) when the client uses a CA name-server to find the server.

We're still developing a working example, but what's happening is that the gateway is returning CA_PROTO_CREATE_CH_FAIL (26) through TCP for channel names that end in .VAL but is succeeding for the same name without the .VAL. One important thing to note from our packet captures is that there are no UDP searches for these channel names being sent to the gateway; the name server already knows where the .VAL PV can be found from the other channel name.

Turning on debug in the gateway we see the "bad PV" error messages in the log for the .VAL connection attempts. However as soon as we try connecting to the .VAL channel from a client that is not behind the name-server, the gateway finds the channel, and now the name-server client can connect to it as well.

Here's the "bad PV" error, from gateServer.cc's gateServer::pvAttach() method:

	// See if we have a gateVcData
	if(vcFind(real_name,rc) < 0)
	{
		// We don't have it, create a new one
		rc=new gateVcData(this,real_name);

		if(rc->getStatus())
		{
			gateDebug1(5,"gateServer::pvAttach() bad PV %s\n",real_name);

The error implies that the gateVcData constructor is returning an object with a non-zero status, implying construction failed. Looking in gateVc.cc at that constructor I see this comment:

	// Important Note: The exist test should have been performed for this
	// PV already, which means that the gatePvData exists and is connected
	// at this point, so it should be present on the pv list

I believe our use of the name server invalidates the assumption described in the above note. It causes clients that have not previously sent a UDP search request to connect to the gateway over TCP, which I guess means the gateServer::pvExistTest() method has not been called for the name. It may have been called for the name without the .VAL, but the name-server knows that both must be on the same CA server.

The notes for the caServer::pvAttach() method in casdef.h do say "The request is allowed to complete asynchronously" so it should be possible to support our use-case, but it looks like the current code isn't prepared to do so.
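
The difference between the current behaviour and the one we would need can be modelled with a toy Python class. This is not the gateway's code: the class and method names are invented, the lookup is synchronous, and the set of upstream names stands in for whatever the gateway can find on its client side.

```python
class GatewaySketch:
    """Toy model of the exist-test / attach interplay described above."""

    def __init__(self, upstream_names):
        self._upstream = set(upstream_names)  # names findable on the IOC side
        self._pv_list = set()                 # names already set up by a search

    def exist_test(self, name):
        """Models a UDP search: on success the name joins the pv list."""
        if name in self._upstream:
            self._pv_list.add(name)
            return True
        return False

    def attach_strict(self, name):
        """Models the current assumption: attach only works after a search."""
        return name in self._pv_list

    def attach_with_fallback(self, name):
        """Models the desired behaviour: run the exist test on demand."""
        return name in self._pv_list or self.exist_test(name)


if __name__ == "__main__":
    gw = GatewaySketch({"x:ai", "x:ai.VAL"})
    gw.exist_test("x:ai")                       # search seen for the bare name
    print(gw.attach_strict("x:ai.VAL"))         # no search for .VAL yet
    print(gw.attach_with_fallback("x:ai.VAL"))  # fallback performs the lookup
    print(gw.attach_strict("x:ai.VAL"))         # now known to the gateway
```

In the real server the fallback would have to complete asynchronously, since the upstream search takes time.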

BTW I am currently reorganizing the new ca-nameserver git module which I only recently converted from CVS, to build stand-alone. I am also not finished with the investigation, although ideas and comments welcome...

Config files ignore last line without \n

If the pvlist file (maybe other config files) does not end with a newline (\n) character, the last line is not parsed and its content is ignored.
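
The symptom is typical of a reader that only accepts newline-terminated lines. The Python sketch below contrasts such a reader with a tolerant one; it does not reproduce the gateway's parser, and the sample rules are placeholders rather than a tested pvlist.

```python
def lines_requiring_newline(text):
    """Mimic a reader that drops a final line lacking a trailing newline."""
    return [
        line[:-1]
        for line in text.splitlines(keepends=True)
        if line.endswith("\n")
    ]


def lines_tolerant(text):
    """Return every line, whether or not the last one ends in a newline."""
    return text.splitlines()


if __name__ == "__main__":
    sample = "ioc:.* ALLOW\nsecret:.* DENY"  # no newline after the last rule
    print(lines_requiring_newline(sample))
    print(lines_tolerant(sample))
```

With the sample above, the first reader silently loses the last rule, while the second keeps both.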