
tbotnz / netpalm

ReST based network device broker

License: GNU Lesser General Public License v3.0

Python 63.99% CSS 35.39% Shell 0.20% Jinja 0.42%
jinja2-templates webhook napalm ncclient netmiko network-programming network-automation docker juniper cisco

netpalm's Introduction



The Open API Platform for Network Devices


netpalm makes it easy to push and pull state between your apps and your network by providing multiple southbound drivers, abstraction methods and modern northbound interfaces such as OpenAPI 3 and REST webhooks.


Supporting netpalm

Apcela

Because Enterprise Speed Matters
Bandwidth


Delivering the power to communicate
Support
Maybe you?


What is netpalm?

Leveraging best-of-breed open source network components such as napalm, netmiko, ncclient and requests, netpalm makes it easy to abstract any network device's native telnet, SSH, NETCONF or RESTCONF interface into a modern, model-driven OpenAPI 3 interface.

Taking a platform-based approach means netpalm allows you to bring your own Jinja2 config, service and webhook templates, Python scripts and webhooks for quick adoption into your existing DevOps workflows.

Built on a scalable, microservice-based architecture, netpalm provides scalable API access into your network.

Features

  • Speaks REST and JSON-RPC northbound, and CLI over SSH or Telnet, or NETCONF/RESTCONF, southbound to your network devices
  • Turns any Python script into an easy-to-consume, asynchronous and documented API with webhook support
  • Large number of supported network device vendors thanks to napalm, netmiko, ncclient and requests
  • Built-in multi-level abstraction interface for network service lifecycle functions: create, retrieve, delete and validate
  • In-band service inventory
  • Ability to write your own service models and templates using your existing Jinja2 templates
  • Well-documented API with a Postman collection full of examples; every instance gets its own self-documenting OpenAPI 3 UI
  • Supports pre- and post-checks across CLI devices, raising exceptions and withholding config deployment as required
  • Multiple ways to queue jobs to devices: either pinned strict (prevents connection pooling at the device) or pooled first-in-first-out
  • Modern, container-based scale-out architecture supported by every component
  • Highly configurable across all aspects of the platform
  • Leverages an encrypted Redis layer providing caching and queueing of jobs to and from devices

Concepts

Basic Concepts

netpalm acts as a ReST broker and abstraction layer for NAPALM, Netmiko, ncclient or a Python script. netpalm uses TextFSM or Jinja2 to model and transform both ingress and egress data, if required.
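As a minimal sketch of this broker pattern (assuming a local instance on port 9000 and hypothetical device details), a job payload for the netmiko driver can be built and submitted like this:

```python
import json

# Hypothetical host/credentials; the body shape follows the request
# examples shown later in this README.
def build_getconfig_payload(host, username, password, command):
    """Return a request body for netpalm's /getconfig route (netmiko driver)."""
    return {
        "library": "netmiko",
        "connection_args": {
            "device_type": "cisco_ios",
            "host": host,
            "username": username,
            "password": password,
        },
        "command": command,
        "queue_strategy": "fifo",
    }

payload = build_getconfig_payload("10.0.0.1", "admin", "secret", "show ip int brief")
print(json.dumps(payload, indent=2))

# Submitting it requires a running instance, e.g.:
#   import requests
#   requests.post("http://localhost:9000/getconfig",
#                 headers={"x-api-key": "<key>"}, json=payload)
```

The POST returns immediately with a task id rather than the device output, since all work is queued asynchronously.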

Component Concepts

netpalm is underpinned by a container-based scale-out architecture for all components.

Queueing Concepts

netpalm provides a domain-focused queueing strategy for task execution on network equipment.
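The strategy is selected per job via the request's queue_strategy field; a small illustrative helper (not part of netpalm itself) makes the trade-off explicit:

```python
def with_queue_strategy(payload: dict, pinned: bool) -> dict:
    """Tag a netpalm job payload with a queueing strategy.

    "pinned": a dedicated queue and worker per device; tasks for that
              device are processed strictly in order (no connection pooling).
    "fifo":   a shared pool of workers; tasks are processed first in, first out.
    """
    return {**payload, "queue_strategy": "pinned" if pinned else "fifo"}
```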

Scaling Concepts

Every netpalm container can be scaled in and out as required. Kubernetes or Swarm is recommended for any large scale deployments.

To scale out the basic included compose deployment, use the docker-compose command:

docker-compose scale netpalm-controller=1 netpalm-worker-pinned=2 netpalm-worker-fifo=3

Additional Features

  • Jinja2

  • Parsers

    • TextFSM support via netmiko
    • NTC-templates for parsing/structuring device data (included)
    • TTP Template Text Parser - Jinja2-like parsing of semi-structured CLI data
    • Napalm getters
    • Genie support via netmiko
    • Automated download and installation of TextFSM templates from the http://textfsm.nornir.tech online TextFSM development tool
    • Optional dynamic rendering of Netconf XML data into JSON
  • Webhooks

    • Comes with a standard REST webhook that supports data transformation via your own Jinja2 template
    • Lets you bring your own (BYO) webhook scripts
  • Scripts

    • Execute any Python script asynchronously via the ReST API, including passing in parameters
    • Supports pydantic models for data validation and documentation
  • Queueing

    • Supports a "pinned" queueing strategy, where a dedicated process and queue is established for your device and tasks are queued and processed synchronously for that device
    • Supports a "fifo" pooled queueing strategy, where a pool of workers processes tasks first in, first out
    • Supports on-the-fly changes to the async queue strategy for a device
  • Caching

    • Can cache responses from devices so that the same request doesn't have to go back to the device
    • Automated cache poisoning on config changes to devices
  • Scaling

    • Horizontal container based scale out architecture supported by each component

Examples

We could show you examples for days, but we recommend playing with the online postman collection to get a feel for what can be done. We also host a public instance where you can test netpalm via the Swagger UI.

getconfig method

netpalm also supports all arguments for the transport libraries; simply pass them in as below.

(screenshot: getconfig request example)

check response

(screenshot: checking the task response)
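Because jobs are asynchronous, the submission response carries a task_id that is then polled on the /task/{id} route (the same route shown in the API examples later in this document). A minimal polling sketch, assuming a requests-style session and a local instance:

```python
import time

# Illustrative polling loop for an async netpalm task. The /task/{id}
# route and response shape follow the examples in this README; the base
# URL is a placeholder for your own instance.
def wait_for_task(session, base_url, task_id, timeout=30, interval=1.0):
    """Poll /task/{task_id} until the task finishes, fails, or times out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        data = session.get(f"{base_url}/task/{task_id}").json()["data"]
        if data["task_status"] in ("finished", "failed"):
            return data
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not complete within {timeout}s")
```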

ServiceTemplates

netpalm supports model-driven service templates. These self-render an OpenAPI 3 interface and provide abstraction and orchestration of tasks across many devices using the get/setconfig or script methods.

The example below demonstrates basic SNMP state orchestration across multiple devices for create, retrieve and delete.

(screenshot: SNMP service template example)

Template Development and Deployment

netpalm is integrated into http://textfsm.nornir.tech so you can ingest your templates with ease

(screenshot: automatic template ingestion via textfsm.nornir.tech)

API Docs

netpalm comes with a Postman Collection and an OpenAPI based API with a SwaggerUI located at http://localhost:9000/ after starting the container.

(screenshot: Swagger UI)

Caching

  • Supports the following per-request configuration (/getconfig routes only for now):

    • permit the result of this request to be cached (default: false), and permit this request to return cached data
    • hold the cache for 30 seconds (default: 300; should not be set above redis_task_result_ttl, which defaults to 500)
    • do NOT invalidate any existing cache for this request (default: false)
      {
        "cache": {
          "enabled": true,
          "ttl": 30,
          "poison": false
        }
      }
  • Supports the following global configuration:

    • Enable/disable caching: "redis_cache_enabled": true. For caching to apply, it must be enabled BOTH globally and in the request itself
    • Default TTL: "redis_cache_default_timeout": 300
  • Any change to the request payload will result in a new cache key EXCEPT:

    • JSON formatting. { "x": 1, "y": 2 } == {"x":1,"y":2}
    • Dictionary ordering: {"x":1,"y":2} == {"y":2,"x":1}
    • changes to cache configuration (e.g. changing the TTL, etc)
    • fifo vs pinned queueing strategy
  • Any call to any /setconfig route for a given host:port will poison ALL cache entries for that host:port

    • Except /setconfig/dry-run of course
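The normalization rules above can be illustrated with a sketch (this is not netpalm's actual implementation): drop the fields that are excluded from the key, canonicalize the remaining payload, and hash it, so formatting and key order cannot change the key:

```python
import hashlib
import json

# Illustrative only: a cache key that is insensitive to JSON formatting,
# dictionary ordering, and the "cache" / "queue_strategy" fields, matching
# the rules described above.
def cache_key(req: dict) -> str:
    relevant = {k: v for k, v in req.items()
                if k not in ("cache", "queue_strategy")}
    # sort_keys + fixed separators give one canonical serialization
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()
```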

Configuration

Edit the config/config.json file to change any parameters (see defaults.json for examples).

Installation

  1. Ensure you first have docker installed
sudo apt-get install docker.io
sudo apt-get install docker-compose
  2. Clone this repository
git clone https://github.com/tbotnz/netpalm.git
cd netpalm
  3. Build the container
sudo docker-compose up -d --build
  4. After the container has been built and started, you're good to go! netpalm will be available on port 9000 under your docker host's IP.
http://$(yourdockerhost):9000

Further Reading

Contributing

We are open to contributions; before making a PR, please make sure you've read our CONTRIBUTING.md document.

You can also find us in the channel #netpalm on the networktocode Slack.

netpalm's People

Contributors

brobare, diorgesl, jefvantongerloo, lamiskin, lboue, ndom91, tbotnz, wrgeorge1983


netpalm's Issues

FR: new extensibles endpoints

Would certainly be nice to have a few new extensibles endpoints:

  1. Bulk upload. Reload using the /reload-extensibles endpoint once at the end of processing the payload.
  2. Delete extensibles. Bulk delete, maybe?

Issue Huawei netconf

I'm having this problem with Huawei NETCONF; is this an ncclient problem?
Task error:

  "Invalid tag name '<Element {urn:ietf:params:xml:ns:netconf:base:1.0}rpc at 0x7fd021d7d380>'",
  "failed: Invalid tag name '<Element {urn:ietf:params:xml:ns:netconf:base:1.0}rpc at 0x7fd021d7d380>'"

My body:

{
  "library": "ncclient",
  "connection_args": {
    "host": "1.1.2.3",
    "username": "user",
    "password": "pass",
    "port": 830,
    "hostkey_verify": false,
    "device_params" : { "name": "huawei"}
  },
  "args": {
    "rpc": "<get-config></get-config>",
    "render_json":true
  },
  "queue_strategy": "fifo"
}

Out of process by orphan processes

When I deploy the project on a physical machine with the pinned worker model, a network worker like "rq:worker:11.0.2.2_4a06654f-0a23-4ee3-8ae6-dceb8d2b6f54" becomes an orphan process when the pinned worker process goes down; it should be killed.

I looked through the source code. The RQ module uses os.fork to create a subprocess in fork_work_horse; if that subprocess exits after execution, all is well. However, here the subprocess is pinned_worker_constructor, which itself spawns a subprocess, pinned_worker, that loops and does not exit. When pinned_worker_constructor completes it exits, so pinned_worker becomes an orphan process. The orphan keeps sending heartbeat messages after default_worker_ttl, so the pinned worker state has only one field, "last_heartbeat".

Running out of processes is caused by the pinned worker restarting multiple times. A container environment doesn't have this problem, because the container only monitors the init process, which also reaps orphan processes.

Ability to set TTL on a per-task basis

Per slack conversation, logging a FR to request a "per-task TTL" hook to allow long running jobs (think config backups for lots of devices) to influence the task TTL specifically for that task.

Scripts to support model definition in same file

Opening issue in regards to a slack conversation. Would like the ability to define both the script content and the Model in the same file. Currently, the model and code are in separate files, which could get hairy at scale. Would be sweet to be able to define in the same file, following the "services" pattern.

Failed to Connect

I think I missed something during the install. I can run a getconfig command, but the task tells me it cannot connect to the device. I am connecting to a Cisco 3750 using cisco_ios. I can SSH into that unit just fine from the machine Docker is running on. What did I miss?

Issue with netmiko setconfig multiple

I'm using the following in the body of the netmiko setconfig:
{
  "library": "netmiko",
  "connection_args": {
    "device_type": "cisco_xe",
    "host": "10.0.2.1",
    "username": "",
    "password": ""
  },
  "config": ["interface GigabitEthernet1\n", "description test\n"]
}
I get back the following (on the setconfig):
{
  "data": {
    "created_on": "Tue, 14 Apr 2020 18:20:43 GMT",
    "task_id": "6bd59805-0fdc-4f88-8603-0e1242f6257d",
    "task_queue": "10.0.2.1",
    "task_result": null,
    "task_status": "queued"
  },
  "status": "success"
}

Debugging / Timeline

Hey first of all I really appreciate all the effort you have put into this project.

That being said, I am trying to get some more debugging info on what's happening in the containers for a particular restconf request I'm running. I wrote a Python module that interacts with the NX-OS REST API, and I'm trying to drop a lot of the work I've already done in order to use netpalm.

This is an example request I'm sending to /getconfig

{
	"library": "restconf",
	"connection_args": {
		"host": "{{ device_ip_address }}",
		"port": 8443,
		"username": "admin",
		"password": "admin",
		"verify": false,
		"timeout": 10,
		"transport": "https",
		"headers": {
			"Content-Type": "application/json",
			"Accept": "*/*"
		}
	},
	"args": {
		"uri": "/api/mo/aaaLogin.json",
		"action": "post",
		"payload": {
			"aaaUser": {
				"attributes": {
					"name": "{{ device_username }}",
					"pwd": "{{ device_password }}"
				}
			}
		}
	},
	"queue_strategy": "fifo"
}

However on the task result I'm seeing this.

{
  "status": "success",
  "data": {
    "task_id": "eb0bae87-d4d2-4530-8569-4c2804073773",
    "created_on": "2020-08-21 13:48:44.666103",
    "task_queue": "fifo",
    "task_meta": {
      "enqueued_at": "2020-08-21 13:48:44.666502",
      "started_at": "2020-08-21 13:48:44.740164",
      "ended_at": "2020-08-21 13:48:45.400580",
      "enqueued_elapsed_seconds": "0",
      "total_elapsed_seconds": "0"
    },
    "task_status": "finished",
    "task_result": {
      "https://10.254.0.101:8443/api/mo/aaaLogin.json": {
        "status_code": 400,
        "result": {
          "imdata": [
            {
              "error": {
                "attributes": {
                  "code": "400",
                  "text": "Failed to parse login request"
                }
              }
            }
          ]
        }
      }
    },
    "task_errors": []
  }
}

I've connected to the docker containers but I don't see any logs of requests etc. I'd like to be able to see the request sent to the device, or some logging related to it.

Thanks!

Error installing netpalm

Hi team,

I've been using netpalm for a project recently. Today I was trying a fresh installation and got this when running docker-compose build:

W: GPG error: http://deb.debian.org/debian bookworm InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 648ACFD622F3D138 NO_PUBKEY 0E98404D386FA1D9 NO_PUBKEY F8D2585B8783D481
E: The repository 'http://deb.debian.org/debian bookworm InRelease' is not signed.
W: GPG error: http://deb.debian.org/debian bookworm-updates InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 0E98404D386FA1D9 NO_PUBKEY 6ED0E7B82643E131
E: The repository 'http://deb.debian.org/debian bookworm-updates InRelease' is not signed.
W: GPG error: http://deb.debian.org/debian-security bookworm-security InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 54404762BBB6E853 NO_PUBKEY BDE6D2B9216EC7A8
E: The repository 'http://deb.debian.org/debian-security bookworm-security InRelease' is not signed.
E: Problem executing scripts APT::Update::Post-Invoke 'rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true'
E: Sub-process returned an error code

Docker containers keep restarting

I have done a fresh install of netpalm but the containers do not stay UP.

(py3venv) [developer@devbox ~]$ docker ps
CONTAINER ID   IMAGE                           COMMAND                  CREATED             STATUS                          PORTS                                       NAMES
802b12dd5126   netpalm_netpalm-worker-fifo     "python3 worker.py f…"   About an hour ago   Restarting (1) 26 seconds ago                                               netpalm_netpalm-worker-fifo_1
ce39b3e672e8   netpalm_netpalm-worker-pinned   "python3 worker.py p…"   About an hour ago   Restarting (1) 6 seconds ago                                                netpalm_netpalm-worker-pinned_1
706c38ed13ec   netpalm_netpalm-controller      "/bin/sh -c 'gunicor…"   About an hour ago   Up 7 seconds                    0.0.0.0:9000->9000/tcp, :::9000->9000/tcp   netpalm_netpalm-controller_1
300a7e0bfa37   netpalm_redis                   "docker-entrypoint.s…"   About an hour ago   Up About an hour                6379/tcp                                    netpalm_redis_1

This is the error I am getting when running docker-compose in the foreground:

netpalm-worker-fifo_1    | TypeError: To define root models, use `pydantic.RootModel` rather than a field called '__root__'

pinned worker bug

  • The current pinned worker is being duplicated due to a bug in the worker_is_alive method on the rediz class

Genie not available as parser

I checked requirements.txt and genie is in the list of modules to be installed. But if I try to use it in a getconfig request, I get an error saying it is not installed.

"args": {
   "use_genie": "true"
  },


"exception_args": [
          "\nGenie and PyATS are not installed. Please PIP install both Genie and PyATS:\npip install genie\npip install pyats\n"
        ]

read_timeout not working in setconfig

After setting the read_timeout key on a setconfig request, it doesn't seem to be passed down to netmiko, for some reason.

I've debugged that it is getting through to the netmiko.drvr config function, but netmiko doesn't seem to be picking up on it.

I can't seem to figure out what's going on, but was looking at netmiko's docs, and everything lines up correctly.

adding kafka support

hello,

I mentioned this in Slack today, but perhaps it is worth opening an issue here for discussion and posterity.

I think netpalm having a native ability to publish task result data to Kafka topics would be very handy for various downstream-consumption use cases, especially when combined with scheduled tasks. At least functionally, I've been able to add it fairly easily by adding the kafka-python library (https://pypi.org/project/kafka-python/) to the worker containers; I believe this brings no additional dependencies.

I've also been able to add a fairly simple webhook/callback script that produces the task results to a topic, and I updated the docker-compose file to include Kafka and ZooKeeper (though, to be honest, I'm not sure whether ZooKeeper is still required). I'm planning to clean up and improve the script this week after a bit more testing.

If there is interest, I'll be happy to open a PR with what I've done after I clean it up and test a little more, to get the process started.

Thanks,

Will
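For reference, the kind of callback script described above might look roughly like this sketch using kafka-python; the broker address and topic name are placeholders, and the serialization is only one plausible choice:

```python
import json

def serialize_task_result(task_data: dict) -> bytes:
    """Serialize a netpalm task result payload for publishing."""
    # default=str tolerates non-JSON types such as datetimes
    return json.dumps(task_data, default=str).encode("utf-8")

def publish_task_result(task_data: dict,
                        brokers: str = "kafka:9092",
                        topic: str = "netpalm-results") -> None:
    """Publish a task result to a Kafka topic (hypothetical broker/topic)."""
    from kafka import KafkaProducer  # pip install kafka-python
    producer = KafkaProducer(bootstrap_servers=brokers,
                             value_serializer=serialize_task_result)
    producer.send(topic, task_data)
    producer.flush()
```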

Can't retrieve response

On a fresh deployment using: ./redis_gen_new_certs.sh && docker-compose up -d --build

  1. Create a getconfig task:
POST /getconfig HTTP/1.1
Host: xxxx.com:9000
x-api-key: xxxxx
Content-Type: application/json
Content-Length: 360

{
    "library": "netmiko",
    "connection_args": {
        "device_type": "cisco_ios",
        "host": "10.x.x.x",
        "username": "florian_lacommare",
        "password": "xxxxx."
    },
    "command": "show ip int brief",
    "args": {
        "use_textfsm": true
    },
    "queue_strategy": "fifo"
}

Response:

{
    "status": "success",
    "data": {
        "task_id": "90ddad9d-f169-43a6-8a70-b6120ce8570a",
        "created_on": "2022-09-30 06:19:57.094087",
        "task_queue": "fifo",
        "task_meta": {
            "enqueued_at": "2022-09-30 06:19:57.094526",
            "started_at": null,
            "ended_at": null,
            "enqueued_elapsed_seconds": "0",
            "total_elapsed_seconds": "0",
            "assigned_worker": null
        },
        "task_status": "queued",
        "task_result": null,
        "task_errors": []
    }
}

Log on the controller:

[2022-09-30 06:19:57,090:netpalm.routers.route_utils:wrapper:DEBUG] cacheable_model: req_data {'library': <LibraryName.netmiko: 'netmiko'>, 'connection_args': {'device_type': 'cisco_ios', 'host': '10.x.x.x', 'username': 'florian_lacommare', 'password': '******'}, 'command': 'show ip int brief', 'args': {'use_textfsm': True}, 'webhook': {}, 'queue_strategy': <QueueStrategy.fifo: 'fifo'>, 'post_checks': [], 'cache': {}, 'ttl': None}
[2022-09-30 06:19:57,091:netpalm.routers.route_utils:cache_key_from_req_data:INFO] hashed key: add992c10c403a5fde3f95b9e9a45c32040445bbcfd572382dbb1c91271093d6
[2022-09-30 06:19:57,091:netpalm.routers.route_utils:cache_key_from_req_data:DEBUG] cache_key_from_req_data: cache key 10.x.x.x:None:show ip int brief:add992c10c403a5fde3f95b9e9a45c32040445bbcfd572382dbb1c91271093d6
[2022-09-30 06:19:57,093:netpalm.backend.core.redis.rediz:__sendtask:DEBUG] __sendtask: {'library': <LibraryName.netmiko: 'netmiko'>, 'connection_args': {'device_type': 'cisco_ios', 'host': '10.x.x.x', 'username': 'florian_lacommare', 'password': '******'}, 'command': 'show ip int brief', 'args': {'use_textfsm': True}, 'webhook': {}, 'queue_strategy': <QueueStrategy.fifo: 'fifo'>, 'post_checks': [], 'cache': {}}
  2. Get task info for 90ddad9d-f169-43a6-8a70-b6120ce8570a:
GET /task/90ddad9d-f169-43a6-8a70-b6120ce8570a HTTP/1.1
Host: xxxx.com:9000
x-api-key: xxxx
Content-Type: application/json

Response:

{
    "detail": "Not Found"
}

Controller log:

[2022-09-30 06:22:06,565:netpalm.backend.core.redis.rediz:fetchtask:INFO] fetching task: 90ddad9d-f169-43a6-8a70-b6120ce8570a
  3. That's it ... no info, no task ...

For info, my task failed, but I should still get task info telling me that it failed, right?

[2022-09-30 06:19:57,100:rq.worker:dequeue_job_and_maintain_ttl:INFO] fifo: fifo (90ddad9d-f169-43a6-8a70-b6120ce8570a)
[2022-09-30 06:19:57,130:netpalm.backend.core.utilities.rediz_meta:write_meta_error:ERROR] `write_meta_error` processing error
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/netmiko/base_connection.py", line 920, in establish_connection
    self.remote_conn_pre.connect(**ssh_connect_params)
  File "/usr/local/lib/python3.8/site-packages/paramiko/client.py", line 368, in connect
    raise NoValidConnectionsError(errors)
paramiko.ssh_exception.NoValidConnectionsError: [Errno None] Unable to connect to port 22 on 10.x.x.x

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/code/netpalm/backend/plugins/drivers/netmiko/netmiko_drvr.py", line 27, in connect
    netmikoses = ConnectHandler(**self.connection_args)
  File "/usr/local/lib/python3.8/site-packages/netmiko/ssh_dispatcher.py", line 312, in ConnectHandler
    return ConnectionClass(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/netmiko/cisco/cisco_ios.py", line 17, in __init__
    return super().__init__(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/netmiko/base_connection.py", line 346, in __init__
    self._open()
  File "/usr/local/lib/python3.8/site-packages/netmiko/base_connection.py", line 351, in _open
    self.establish_connection()
  File "/usr/local/lib/python3.8/site-packages/netmiko/base_connection.py", line 942, in establish_connection
    raise NetmikoTimeoutException(msg)
netmiko.ssh_exception.NetmikoTimeoutException: TCP connection to device failed.

Common causes of this problem are:
1. Incorrect hostname or IP address.
2. Wrong TCP port.
3. Intermediate firewall blocking access.

Device settings: cisco_ios 10.x.x.x:22


[2022-09-30 06:19:57,134:rq.worker:handle_exception:ERROR] Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/rq/worker.py", line 1075, in perform_job
    rv = job.perform()
  File "/usr/local/lib/python3.8/site-packages/rq/job.py", line 854, in perform
    self._result = self._execute()
  File "/usr/local/lib/python3.8/site-packages/rq/job.py", line 877, in _execute
    result = self.func(*self.args, **self.kwargs)
  File "/code/netpalm/backend/plugins/calls/getconfig/exec_command.py", line 111, in exec_command
    write_meta_error(e)
  File "/code/netpalm/backend/core/utilities/rediz_meta.py", line 32, in write_meta_error
    raise exception from None  # Don't process the same exception twice
  File "/code/netpalm/backend/plugins/calls/getconfig/exec_command.py", line 36, in exec_command
    sesh = netmik.connect()
  File "/code/netpalm/backend/plugins/drivers/netmiko/netmiko_drvr.py", line 30, in connect
    write_meta_error(e)
  File "/code/netpalm/backend/core/utilities/rediz_meta.py", line 49, in write_meta_error
    raise NetpalmMetaProcessedException from exception
netpalm.exceptions.NetpalmMetaProcessedException
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/rq/worker.py", line 1075, in perform_job
    rv = job.perform()
  File "/usr/local/lib/python3.8/site-packages/rq/job.py", line 854, in perform
    self._result = self._execute()
  File "/usr/local/lib/python3.8/site-packages/rq/job.py", line 877, in _execute
    result = self.func(*self.args, **self.kwargs)
  File "/code/netpalm/backend/plugins/calls/getconfig/exec_command.py", line 111, in exec_command
    write_meta_error(e)
  File "/code/netpalm/backend/core/utilities/rediz_meta.py", line 32, in write_meta_error
    raise exception from None  # Don't process the same exception twice
  File "/code/netpalm/backend/plugins/calls/getconfig/exec_command.py", line 36, in exec_command
    sesh = netmik.connect()
  File "/code/netpalm/backend/plugins/drivers/netmiko/netmiko_drvr.py", line 30, in connect
    write_meta_error(e)
  File "/code/netpalm/backend/core/utilities/rediz_meta.py", line 49, in write_meta_error
    raise NetpalmMetaProcessedException from exception
netpalm.exceptions.NetpalmMetaProcessedException

Custom scripts fail at failing

It seems like when a script runs into an exception, it gets caught by s_exec in netpalm/backend/plugins/calls/scriptrunner/script.py and returns the exception string to the script_exec function, which then has no way to know it failed. This makes scripts always be reported as successful. Would it be better to either remove the try/except in s_exec or maybe have it re-raise the exception or something? Am I just missing something here?

Modified "hello_world" script to fail and raise an exception:

def run(**kwargs):
        args = kwargs.get("kwargs")
        world = args.get("hello")
        return non_existent_world

Current result:

<...>
   "task_status": "finished",
    "task_result": {},
    "task_errors": []
  }

Modified scriptrunner/script.py function:

    def s_exec(self):
        module = importlib.import_module(self.script_name)
        runscrp = getattr(module, "run")
        res = runscrp(kwargs=self.arg)
        return res

Result with modified s_exec():

<...>
    "task_status": "failed",
    "task_result": null,
    "task_errors": [
      "name 'non_existent_world' is not defined"
    ]

Issue with "command" when using /netmiko/getconfig; however, legacy /getconfig works

{
  "status": "success",
  "data": {
    "task_id": "e58dda33-a0a6-4fee-ada4-734878ca159a",
    "created_on": "2020-10-06 06:43:34.170339",
    "task_queue": "fifo",
    "task_meta": {
      "enqueued_at": "2020-10-06 06:43:34.170598",
      "started_at": "2020-10-06 06:43:34.193613",
      "ended_at": "2020-10-06 06:43:38.531095",
      "enqueued_elapsed_seconds": null,
      "total_elapsed_seconds": "4"
    },
    "task_status": "finished",
    "task_result": null,
    "task_errors": [
      "send_command() got multiple values for argument 'command_string'"
    ]
  }
}

commit not implemented error

Issue with "Network device does not support 'commit()' method": netmiko supports the attribute but just raises.

Published Containers have expired certs

Just performed fresh install using the documented instructions below:

Containers fail to start due to the included certificate having expired in August 2022:
_netpalm-controller_1 | redis.exceptions.ConnectionError: Error 1 connecting to redis:6379. [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (ssl.c:1131).

Please update documentation on the repo to include the additional step below:

  1. Ensure you first have docker installed
    sudo apt-get install docker.io
    sudo apt-get install docker-compose

  2. Clone this repository
    git clone https://github.com/tbotnz/netpalm.git
    cd netpalm

  3. Regenerate certificates <--- New Step
    ./redis_gen_new_certs.sh

  4. After the container has been built and started, you're good to go! netpalm will be available on port 9000 under your docker hosts IP.

Not sure if this the best way but it worked for me.

Additional improvement suggestions:

  1. Would it be possible for the compose script to automatically recreate the certificate upon each new build?

  2. Could steps be documented to explain how an admin can provide their own certificates if auto-regeneration is not possible/preferred?

It looks as though the regen script puts the certs into the below location:
/netpalm/netpalm/backend/core/security/cert/tls

Thanks

Lee

No rollback option for a task

Please add a rollback option for any task. This could be solved in two ways:

  1. Auto-generating the delete config for a service template, which can be triggered when asked to roll back.
  2. Developing a rollback API which simply copies the rollback config (generated during task execution by napalm) to the device's current running configuration.

Cache interactions with Pre-Check and Post-Check are entirely unknown

It's possible they already act as desired, or could be made that way quite easily. It's also possible it'll take some real digging into.

Since Checks are sort of "sub-tasks":

  • Do (or should) they require their own cache config?
  • Do (or should) they inherit the cache config of the parent task?
  • If the Parent is a setconfig call, it should ALWAYS invalidate any cache after a PreCheck and before a PostCheck.
Task Start -> Pre Check -> SetConfig -> Poison Cache -> Post Check

Feature request - always included webhook

This is to request additional behavior for the webhook callbacks. I would love the concept of an "always webhook": a webhook defined in the config file that would always get run.

The idea being, a custom webhook could be built that would always run upon task completion to create an archive of executions/results.

It would be great if the behavior was "hierarchical" as well, so that specifying a webhook in the POST payload would also get run. For example:

  1. If a webhook is specified in the payload:
    a. The "always" webhook gets executed
    b. The webhook in the payload gets executed
  2. If a webhook is not specified in the payload:
    a. The "always" webhook gets executed

Thoughts?

Adding new driver SOAP

I have built a similar program to netpalm for our team. It is less complex and doesn't support task queuing and some of the other more advanced concepts built here. But it does support the same drivers with the addition of a SOAP driver I created to interface with Adtran and Calix systems.

Would you be willing to work together to add a new driver for the system?

Use multiple ttp templates with multiple commands

At the moment it's not possible to pass an array of commands to the getconfig endpoint and match multiple TTP templates to the output, meaning you can't effectively parse data correctly. When you try to do this, only the first command gets parsed correctly.

Would this be too hard to implement?

FR: method to scrub logs

Per a Slack conversation with @tbotnz, opening a ticket to create an implementation of "log scrubbing". The idea being, a method of removing certain patterns, such as username/password, from being presented in logs.

One idea on the implementation would be to allow for customized patterns to be matched:

{
    "scrub": [
         "password",
         "username",
         "some_special_key",
         "x-api-key"
    ]
}
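A minimal sketch of scrubbing driven by a key list like the one above. The `key=value` and `"key": "value"` formats handled here are assumptions about what the logs contain, and quoting of masked values is not preserved:

```python
import re

def scrub_log(line, scrub_keys, mask="********"):
    """Mask the value following any listed key in a log line."""
    for key in scrub_keys:
        # Matches  key=value,  key: value  and  "key": "value"
        pattern = rf'("?{re.escape(key)}"?\s*[=:]\s*)"?[^",\s]+"?'
        line = re.sub(pattern, rf"\g<1>{mask}", line, flags=re.IGNORECASE)
    return line
```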

Auto reload of template directory

Templates are currently loaded at container creation, and containers have to be rebuilt if the templates are edited. It would be ideal if the template directory were reloaded automatically after a certain time period or whenever a template changes.

Allow for customization of Response objects

It would be nice to be able to customize the Response objects for all the routes and their different HTTP response codes. Currently there is just a placeholder for the 200/201 and 422 codes.
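One shape this customization could take is a user-supplied mapping in the form FastAPI's `responses=` route parameter already expects. The descriptions and the extra 500 entry below are illustrative assumptions:

```python
# Hypothetical user-configurable responses mapping; FastAPI merges this
# into the generated OpenAPI schema when passed to a route decorator.
CUSTOM_RESPONSES = {
    200: {"description": "Task accepted and result returned"},
    422: {"description": "Payload failed schema validation"},
    500: {"description": "Unhandled worker error"},
}

# Applied per-route, e.g.:
#   @app.post("/getconfig", responses=CUSTOM_RESPONSES)
```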

Realtime Application

Hello, I'm developing an app to draw realtime graphs. What is the best way to call the API every two seconds?

I have tried raising the number of FIFO workers, because the requests were getting queued up a lot.

Thank you in advance!

FR: static work queues

Following up on a Slack conversation. The idea is to implement a "static queue" within Redis and allow tasks to target that queue. So if you had a situation where you wanted to, say, run a config backup job for 5000 devices, you could spin up a dedicated FIFO worker and "route" those 5000 jobs to it. This would leave the "main" worker queue free to take on other (possibly/likely higher-priority) tasks.
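In pure-Python terms the routing itself is simple; this sketch assumes a hypothetical `queue_name` field in the task payload, with a plain dict of lists standing in for the named Redis/rq queues:

```python
# Sketch: route a task payload to a named static queue, falling back to
# the main "fifo" queue when no queue_name is given.

def route_task(payload, queues, default="fifo"):
    """Append the payload to its target queue and return the queue name."""
    name = payload.get("queue_name", default)  # hypothetical payload field
    queues.setdefault(name, []).append(payload)
    return name
```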

example ncclient call

Hello,

This isn't actually an issue, but I'm not sure how to propose a wiki page or something -- hoping to just provide a simple example of using ncclient with Juniper equipment for others.

Example /getconfig POST body that calls an RPC and will render the result into JSON:

{
    "library": "ncclient",
    "connection_args": {
        "host": "192.168.1.4",
        "username": "foo",
        "password": "bar",
        "port": 830,
        "hostkey_verify": false,
        "device_params": {
            "name": "junos"
        }
    },
    "args": {
        "rpc": "<get-software-information></get-software-information>",
        "render_json": true
    },
    "queue_strategy": "fifo"
}

and the task result with rendered JSON:

{
    "status": "success",
    "data": {
        "task_id": "73af9d88-4424-44fb-9be5-3453514dedf8",
        "created_on": "2020-11-22 17:59:57.427318",
        "task_queue": "fifo",
        "task_meta": {
            "enqueued_at": "2020-11-22 17:59:57.428441",
            "started_at": "2020-11-22 17:59:57.512601",
            "ended_at": "2020-11-22 18:00:03.786630",
            "enqueued_elapsed_seconds": "0",
            "total_elapsed_seconds": "6"
        },
        "task_status": "finished",
        "task_result": {
            "get_config": {
                "rpc-reply": {
                    "@message-id": "urn:uuid:10a9d9d9-09ea-4e70-a035-d9c06aba1a88",
                    "multi-routing-engine-results": {
                        "multi-routing-engine-item": {
                            "re-name": "fpc0",
                            "software-information": {
                                "host-name": "OFFICE-EX2200",
                                "product-model": "ex2200-c-12p-2g",
                                "product-name": "ex2200-c-12p-2g",
                                "package-information": [
                                    {
                                        "name": "junos",
                                        "comment": "JUNOS Base OS boot [12.3R12.4]"
                                    },
                                    {
                                        "name": "jbase",
                                        "comment": "JUNOS Base OS Software Suite [12.3R12.4]"
                                    },
                                    {
                                        "name": "jkernel-ex-2200",
                                        "comment": "JUNOS Kernel Software Suite [12.3R12.4]"
                                    },
                                    {
                                        "name": "jcrypto-ex",
                                        "comment": "JUNOS Crypto Software Suite [12.3R12.4]"
                                    },
                                    {
                                        "name": "jdocs-ex",
                                        "comment": "JUNOS Online Documentation [12.3R12.4]"
                                    },
                                    {
                                        "name": "jswitch-ex",
                                        "comment": "JUNOS Enterprise Software Suite [12.3R12.4]"
                                    },
                                    {
                                        "name": "jpfe-ex22x",
                                        "comment": "JUNOS Packet Forwarding Engine Enterprise Software Suite [12.3R12.4]"
                                    },
                                    {
                                        "name": "jroute-ex",
                                        "comment": "JUNOS Routing Software Suite [12.3R12.4]"
                                    },
                                    {
                                        "name": "jweb-ex",
                                        "comment": "JUNOS Web Management [12.3R12.4]"
                                    },
                                    {
                                        "name": "fips-mode-arm",
                                        "comment": "JUNOS FIPS mode utilities [12.3R12.4]"
                                    }
                                ]
                            }
                        }
                    }
                }
            }
        },
        "task_errors": []
    }
}

Thanks,

Will

Cache TTL can currently exceed request TTL. This should not be permitted.

Since what's actually cached is the task metadata, and not the actual result, a cache entry could outlive the result it points to; this should not be permitted.

Either:

  • raise an error
  • set cache_ttl to min(cache_ttl, response_ttl)
  • set response_ttl to max(cache_ttl, response_ttl)

I don't have any firm opinions on which way to go, but we should do one of those.
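The second option above is a one-liner; the function name is just illustrative:

```python
def effective_cache_ttl(cache_ttl, response_ttl):
    """Clamp the cache TTL so cached task metadata never outlives the result."""
    return min(cache_ttl, response_ttl)
```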

support for Mikrotik vendor via napalm-ros

I'm a beginner at network automation and I tried to use the napalm-ros driver with a RouterBoard without success; nothing happens.

Is this something that needs to be added to the netpalm source code?

TextFSM template sections created by netpalm aren't updated correctly on future runs

Sure. So to clarify a bit more, this only happens in the second template I add for a specific driver (generic, linux, etc.).

{ "key": "<any nornir key>", "driver": "generic", "command": "test1" }

If you go to the endpoint to show the list of templates, it is there, and if you check the template itself, you get the template contents.

If, after you add the first one, you try the above payload but with the command being test2, you'll see it adds the template, but it does not appear on the list. Nevertheless, you can still see the content of the template file.

Even with different key values, the template does not appear on the list, but you can see its contents.

Originally posted by @hanunes in #72 (comment)

Swagger documentation not rendered anymore: "Please indicate a valid Swagger or OpenAPI version field"

When you use the latest version, and after fixing the issue with the pydantic version, you don't get the usual Swagger/OpenAPI screen because the file cannot be rendered. Instead you get this message:

Please indicate a valid Swagger or OpenAPI version field. Supported version fields are swagger: "2.0" and those that match openapi: 3.0.n (for example, openapi: 3.0.0).

I am not familiar with FastAPI, but it seems something changed between version 0.98.0 and 0.99.0 (the current version is 0.103.2) that breaks the rendering. It might have something to do with the custom CSS in netpalm.

In order to fix the issue, the FastAPI version should be constrained in requirements.txt. Pinning an exact version is usually bad practice, as is providing no version at all, but in this case we can expect no further releases between 0.98.0 and 0.99.0 with security patches. To stick with proper syntax, I changed this line in requirements.txt:

fastapi>=0.98.0,<0.99.0

Rebuild the containers and it works again.

Of course this is only a quick fix, as you miss out on all security and functional updates for FastAPI. The proper way would be to solve the root cause of why it is not rendered with FastAPI versions above 0.98.
