fbtftp is Facebook's implementation of a dynamic TFTP server framework.
License: MIT License
fbtftp's Introduction
What is fbtftp?
fbtftp is Facebook's implementation of a dynamic TFTP server framework. It
lets you create custom TFTP servers and wrap your own logic into it in a very
simple manner.
Facebook currently uses it in production, and it's deployed at global scale
across all of our data centers.
Why did you do that?
We love to use existing open source software and to contribute upstream, but
sometimes it's just not enough at our scale. We ended up writing our own TFTP
framework and decided to open source it.
fbtftp was born from the need for an easy-to-configure and easy-to-extend
TFTP server that would work at large scale. The standard in.tftpd is a
20+ year old piece of software written in C that is very difficult to extend.
fbtftp is written in Python 3 and lets you plug in your own logic to:
publish per session and server wide statistics to your infrastructure
define how response data is built:
can be a file from disk;
can be a file created dynamically;
you name it!
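As a rough illustration of the "file created dynamically" case, the sketch below builds the response in memory instead of reading it from disk. It assumes the ResponseData interface described under "How does it work?" (read(num_bytes), size() and close()). StringResponseData and build_pxe_config are hypothetical names, and the import path fbtftp.base_handler is an assumption; a stand-in base class is used when fbtftp is not installed.

```python
import io

try:
    # Assumed import path; adjust if your fbtftp version differs.
    from fbtftp.base_handler import ResponseData
except ImportError:
    # Stand-in so the sketch still runs without fbtftp installed.
    class ResponseData:
        pass


class StringResponseData(ResponseData):
    """Hypothetical response object serving content generated in memory."""

    def __init__(self, text):
        payload = text.encode("ascii")
        self._size = len(payload)
        self._reader = io.BytesIO(payload)

    def read(self, num_bytes):
        # Returns up to num_bytes; empty bytes means the data is exhausted.
        return self._reader.read(num_bytes)

    def size(self):
        # Total payload length in bytes.
        return self._size

    def close(self):
        self._reader.close()


def build_pxe_config(hostname):
    # Hypothetical generator: the content depends on who is asking.
    return (
        "DEFAULT linux\n"
        "LABEL linux\n"
        "  KERNEL vmlinuz\n"
        "  APPEND hostname={}\n".format(hostname)
    )
```

A handler's get_response_data() could then return StringResponseData(build_pxe_config(...)) in place of a file-backed object.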
How do you use fbtftp at Facebook?
We created our own Facebook-specific server based on the framework to:
stream static files (initrd and kernels) from our HTTP repositories (no need
to fill your TFTP root directory with files);
publish per-server and per-connection statistics to our internal monitoring
systems.
Deployment is easy and "container-ready": just copy the application somewhere,
start it, and you are done.
Is it better than the other TFTP servers?
It depends on your needs! fbtftp is written in Python 3 using a
multiprocessing model; its primary focus is not speed, but flexibility and
scalability. Yet it is fast enough at our datacenter scale :)
It is well-suited for large installations where scalability and custom features
are needed.
The framework supports RFC 2349 (Timeout Interval and Transfer Size Options).
Note that the server framework only supports RRQ (read-only) operations.
(Who uses WRQ TFTP requests in 2019? :P)
How does it work?
All you need to do is understand three classes and two callback functions,
and you are good to go:
BaseServer: This class implements the process that accepts new requests on
the given UDP port. Default TFTP parameters such as timeout, port number and
number of retries can be passed. Do not use this class directly; inherit from
it and override the get_handler() method to return an instance of BaseHandler.
The class also accepts a server_stats_callback, described in more detail
below. This callback is executed periodically, and you can use it to publish
server-level stats to your monitoring infrastructure. It is not re-entrant; if
you need that, you have to implement your own locking logic. A series of
predefined counters is provided. Refer to the class documentation to find out
more.
BaseHandler: This class deals with talking to a single client. It lives in
its own process, which is spawned by the BaseServer class; BaseServer also
makes sure to reap the child properly when the session is over. Do not use
this class as is. Instead, inherit from it and override the
get_response_data() method, which must return an instance of a subclass of
ResponseData.
ResponseData: This is a file-like class that implements read(num_bytes),
size() and close(). As with the previous two classes, you have to inherit
from it and implement those methods. This class lets you define how the
actual data is returned.
server_stats_callback: A function that is called periodically (every 60
seconds by default). You can use it to publish server-level stats to your
monitoring infrastructure. The callback is not re-entrant; if you need that,
you have to implement your own locking logic. A series of predefined counters
is provided. Refer to the class documentation to find out more.
session_stats_callback: function that is called when a client session is
over.
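The "implement your own locking logic" advice for the stats callback could look roughly like the sketch below. It assumes the concern is one invocation overlapping with another that is still running, and it simply skips the overlapping call. make_non_overlapping and publish_server_stats are hypothetical names that are not part of fbtftp, and the shape of the stats argument is left to the class documentation.

```python
import threading


def make_non_overlapping(callback):
    """Wrap callback so a call made while another is still running is skipped."""
    lock = threading.Lock()

    def guarded(stats):
        # Non-blocking acquire: if a previous run still holds the lock, skip.
        if not lock.acquire(blocking=False):
            return False
        try:
            callback(stats)
            return True
        finally:
            lock.release()

    return guarded


def publish_server_stats(stats):
    # Hypothetical publisher; replace with a call to your monitoring system.
    print("server stats:", stats)


# The wrapped function is what would be passed as server_stats_callback.
guarded_publish = make_non_overlapping(publish_server_stats)
```

Skipping is only one possible policy; replacing the non-blocking acquire with a plain `with lock:` would queue the overlapping call instead of dropping it.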
I'm having trouble figuring out the best way to respond to a client with ERR_FILE_NOT_FIND (error 1). Would this be done as part of the server's get_handler()? Or the handler's get_response_data()? The handler has a private method, _transmit_error(), which would seem to be useful, but it appears to be intended only for fbtftp internals?
I am deploying an Nvidia DGX system. Basically, with exact same configuration, fbtftp server fails while Ubuntu native tftpd-hpa works.
My current server is running Ubuntu 18.04, and I am deploying Ubuntu 20.04 on the DGX target.
While booting in PXE (in EFI), the DGX system gets its IP from DHCP, then tries to download a file from the fbtftp server, but it strangely fails.
Logs on server side with fbtftp server:
Aug 31 22:55:44 mngt01 python3[43123]: INFO:root:Server stats - every 60 seconds
Aug 31 22:55:44 mngt01 python3[43123]: DEBUG:root:Starting the metrics callback in 60s
Aug 31 22:56:20 mngt01 python3[43123]: INFO:root:New connection from peer `('::ffff:172.31.95.1', 1340, 0, 0)` asking for path `efi64/syslinux.efi`
Aug 31 22:56:20 mngt01 python3[43123]: INFO:root:Options requested from peer ('::ffff:172.31.95.1', 1340, 0, 0): OrderedDict([('mode', 'octet'), ('tsize', '0'), ('blksize', '1468')])
Aug 31 22:56:20 mngt01 python3[43123]: INFO:root:Options to ack for peer ('::ffff:172.31.95.1', 1340, 0, 0): OrderedDict([('tsize', '199952'), ('blksize', '1468')])
Aug 31 22:56:24 mngt01 python3[43123]: INFO:root:New connection from peer `('::ffff:172.31.95.1', 1340, 0, 0)` asking for path `efi64/syslinux.efi`
Aug 31 22:56:24 mngt01 python3[43123]: INFO:root:Options requested from peer ('::ffff:172.31.95.1', 1340, 0, 0): OrderedDict([('mode', 'octet'), ('tsize', '0'), ('blksize', '1468')])
Aug 31 22:56:24 mngt01 python3[43123]: INFO:root:Options to ack for peer ('::ffff:172.31.95.1', 1340, 0, 0): OrderedDict([('tsize', '199952'), ('blksize', '1468')])
Aug 31 22:56:28 mngt01 python3[43123]: INFO:root:New connection from peer `('::ffff:172.31.95.1', 1340, 0, 0)` asking for path `efi64/syslinux.efi`
Aug 31 22:56:28 mngt01 python3[43123]: INFO:root:Options requested from peer ('::ffff:172.31.95.1', 1340, 0, 0): OrderedDict([('mode', 'octet'), ('tsize', '0'), ('blksize', '1468')])
Aug 31 22:56:28 mngt01 python3[43123]: INFO:root:Options to ack for peer ('::ffff:172.31.95.1', 1340, 0, 0): OrderedDict([('tsize', '199952'), ('blksize', '1468')])
Aug 31 22:56:32 mngt01 python3[43123]: INFO:root:New connection from peer `('::ffff:172.31.95.1', 1340, 0, 0)` asking for path `efi64/syslinux.efi`
Aug 31 22:56:32 mngt01 python3[43123]: INFO:root:Options requested from peer ('::ffff:172.31.95.1', 1340, 0, 0): OrderedDict([('mode', 'octet'), ('tsize', '0'), ('blksize', '1468')])
Aug 31 22:56:32 mngt01 python3[43123]: INFO:root:Options to ack for peer ('::ffff:172.31.95.1', 1340, 0, 0): OrderedDict([('tsize', '199952'), ('blksize', '1468')])
Aug 31 22:56:34 mngt01 python3[43123]: ERROR:root:timeout after 6 retransmits.
Aug 31 22:56:34 mngt01 python3[43123]: INFO:root:Stats: for ('::ffff:172.31.95.1', 1340, 0, 0) requesting 'efi64/syslinux.efi'
Aug 31 22:56:34 mngt01 python3[43123]: INFO:root:Error: {'error_code': 0, 'error_message': 'timeout after 6 retransmits.'}
Aug 31 22:56:34 mngt01 python3[43123]: INFO:root:Time spent: 14021ms
Aug 31 22:56:34 mngt01 python3[43123]: INFO:root:Packets sent: 7
Aug 31 22:56:34 mngt01 python3[43123]: INFO:root:Packets ACKed: 0
Aug 31 22:56:34 mngt01 python3[43123]: INFO:root:Bytes sent: 0
Aug 31 22:56:34 mngt01 python3[43123]: INFO:root:Options: OrderedDict([('tsize', '199952'), ('blksize', '1468')])
Aug 31 22:56:34 mngt01 python3[43123]: INFO:root:Blksize: 1468
Aug 31 22:56:34 mngt01 python3[43123]: INFO:root:Retransmits: 6
Aug 31 22:56:34 mngt01 python3[43123]: INFO:root:Server port: 69
Aug 31 22:56:34 mngt01 python3[43123]: INFO:root:Client port: 1340
Aug 31 22:56:34 mngt01 python3[43123]: DEBUG:root:Closing response data object
Aug 31 22:56:34 mngt01 python3[43123]: DEBUG:root:Closing socket
Aug 31 22:56:34 mngt01 python3[43123]: DEBUG:root:Dying.
Aug 31 22:56:36 mngt01 python3[43123]: INFO:root:New connection from peer `('::ffff:172.31.95.1', 1340, 0, 0)` asking for path `efi64/syslinux.efi`
Aug 31 22:56:36 mngt01 python3[43123]: INFO:root:Options requested from peer ('::ffff:172.31.95.1', 1340, 0, 0): OrderedDict([('mode', 'octet'), ('tsize', '0'), ('blksize', '1468')])
Aug 31 22:56:36 mngt01 python3[43123]: INFO:root:Options to ack for peer ('::ffff:172.31.95.1', 1340, 0, 0): OrderedDict([('tsize', '199952'), ('blksize', '1468')])
Aug 31 22:56:38 mngt01 python3[43123]: ERROR:root:timeout after 6 retransmits.
Aug 31 22:56:38 mngt01 python3[43123]: INFO:root:Stats: for ('::ffff:172.31.95.1', 1340, 0, 0) requesting 'efi64/syslinux.efi'
Aug 31 22:56:38 mngt01 python3[43123]: INFO:root:Error: {'error_code': 0, 'error_message': 'timeout after 6 retransmits.'}
Aug 31 22:56:38 mngt01 python3[43123]: INFO:root:Time spent: 14019ms
Aug 31 22:56:38 mngt01 python3[43123]: INFO:root:Packets sent: 7
Aug 31 22:56:38 mngt01 python3[43123]: INFO:root:Packets ACKed: 0
Aug 31 22:56:38 mngt01 python3[43123]: INFO:root:Bytes sent: 0
Aug 31 22:56:38 mngt01 python3[43123]: INFO:root:Options: OrderedDict([('tsize', '199952'), ('blksize', '1468')])
Aug 31 22:56:38 mngt01 python3[43123]: INFO:root:Blksize: 1468
Aug 31 22:56:38 mngt01 python3[43123]: INFO:root:Retransmits: 6
Aug 31 22:56:38 mngt01 python3[43123]: INFO:root:Server port: 69
Aug 31 22:56:38 mngt01 python3[43123]: INFO:root:Client port: 1340
Aug 31 22:56:38 mngt01 python3[43123]: DEBUG:root:Closing response data object
Aug 31 22:56:38 mngt01 python3[43123]: DEBUG:root:Closing socket
Aug 31 22:56:38 mngt01 python3[43123]: DEBUG:root:Dying.
Aug 31 22:56:40 mngt01 python3[43123]: INFO:root:New connection from peer `('::ffff:172.31.95.1', 1340, 0, 0)` asking for path `efi64/syslinux.efi`
Aug 31 22:56:40 mngt01 python3[43123]: INFO:root:Options requested from peer ('::ffff:172.31.95.1', 1340, 0, 0): OrderedDict([('mode', 'octet'), ('tsize', '0'), ('blksize', '1468')])
Aug 31 22:56:40 mngt01 python3[43123]: INFO:root:Options to ack for peer ('::ffff:172.31.95.1', 1340, 0, 0): OrderedDict([('tsize', '199952'), ('blksize', '1468')])
Aug 31 22:56:42 mngt01 python3[43123]: ERROR:root:timeout after 6 retransmits.
Aug 31 22:56:42 mngt01 python3[43123]: INFO:root:Stats: for ('::ffff:172.31.95.1', 1340, 0, 0) requesting 'efi64/syslinux.efi'
Aug 31 22:56:42 mngt01 python3[43123]: INFO:root:Error: {'error_code': 0, 'error_message': 'timeout after 6 retransmits.'}
Aug 31 22:56:42 mngt01 python3[43123]: INFO:root:Time spent: 14020ms
Aug 31 22:56:42 mngt01 python3[43123]: INFO:root:Packets sent: 7
Aug 31 22:56:42 mngt01 python3[43123]: INFO:root:Packets ACKed: 0
Aug 31 22:56:42 mngt01 python3[43123]: INFO:root:Bytes sent: 0
Aug 31 22:56:42 mngt01 python3[43123]: INFO:root:Options: OrderedDict([('tsize', '199952'), ('blksize', '1468')])
Aug 31 22:56:42 mngt01 python3[43123]: INFO:root:Blksize: 1468
Aug 31 22:56:42 mngt01 python3[43123]: INFO:root:Retransmits: 6
Aug 31 22:56:42 mngt01 python3[43123]: INFO:root:Server port: 69
Aug 31 22:56:42 mngt01 python3[43123]: INFO:root:Client port: 1340
Aug 31 22:56:42 mngt01 python3[43123]: DEBUG:root:Closing response data object
Aug 31 22:56:42 mngt01 python3[43123]: DEBUG:root:Closing socket
Aug 31 22:56:42 mngt01 python3[43123]: DEBUG:root:Dying.
Aug 31 22:56:44 mngt01 python3[43123]: DEBUG:root:Running the metrics callback
Aug 31 22:56:44 mngt01 python3[43123]: INFO:root:Server stats - every 60 seconds
Aug 31 22:56:44 mngt01 python3[43123]: INFO:root:Number of spawned TFTP workers in stats time frame : 6
Aug 31 22:56:44 mngt01 python3[43123]: DEBUG:root:Starting the metrics callback in 60s
Aug 31 22:56:46 mngt01 python3[43123]: ERROR:root:timeout after 6 retransmits.
Aug 31 22:56:46 mngt01 python3[43123]: INFO:root:Stats: for ('::ffff:172.31.95.1', 1340, 0, 0) requesting 'efi64/syslinux.efi'
Aug 31 22:56:46 mngt01 python3[43123]: INFO:root:Error: {'error_code': 0, 'error_message': 'timeout after 6 retransmits.'}
Aug 31 22:56:46 mngt01 python3[43123]: INFO:root:Time spent: 14020ms
Aug 31 22:56:46 mngt01 python3[43123]: INFO:root:Packets sent: 7
Aug 31 22:56:46 mngt01 python3[43123]: INFO:root:Packets ACKed: 0
Aug 31 22:56:46 mngt01 python3[43123]: INFO:root:Bytes sent: 0
Aug 31 22:56:46 mngt01 python3[43123]: INFO:root:Options: OrderedDict([('tsize', '199952'), ('blksize', '1468')])
Aug 31 22:56:46 mngt01 python3[43123]: INFO:root:Blksize: 1468
Aug 31 22:56:46 mngt01 python3[43123]: INFO:root:Retransmits: 6
Aug 31 22:56:46 mngt01 python3[43123]: INFO:root:Server port: 69
Aug 31 22:56:46 mngt01 python3[43123]: INFO:root:Client port: 1340
Aug 31 22:56:46 mngt01 python3[43123]: DEBUG:root:Closing response data object
Aug 31 22:56:46 mngt01 python3[43123]: DEBUG:root:Closing socket
Aug 31 22:56:46 mngt01 python3[43123]: DEBUG:root:Dying.
Aug 31 22:56:50 mngt01 python3[43123]: ERROR:root:timeout after 6 retransmits.
Aug 31 22:56:50 mngt01 python3[43123]: INFO:root:Stats: for ('::ffff:172.31.95.1', 1340, 0, 0) requesting 'efi64/syslinux.efi'
Aug 31 22:56:50 mngt01 python3[43123]: INFO:root:Error: {'error_code': 0, 'error_message': 'timeout after 6 retransmits.'}
Aug 31 22:56:50 mngt01 python3[43123]: INFO:root:Time spent: 14020ms
Aug 31 22:56:50 mngt01 python3[43123]: INFO:root:Packets sent: 7
Aug 31 22:56:50 mngt01 python3[43123]: INFO:root:Packets ACKed: 0
Aug 31 22:56:50 mngt01 python3[43123]: INFO:root:Bytes sent: 0
Aug 31 22:56:50 mngt01 python3[43123]: INFO:root:Options: OrderedDict([('tsize', '199952'), ('blksize', '1468')])
Aug 31 22:56:50 mngt01 python3[43123]: INFO:root:Blksize: 1468
Aug 31 22:56:50 mngt01 python3[43123]: INFO:root:Retransmits: 6
Aug 31 22:56:50 mngt01 python3[43123]: INFO:root:Server port: 69
Aug 31 22:56:50 mngt01 python3[43123]: INFO:root:Client port: 1340
Aug 31 22:56:50 mngt01 python3[43123]: DEBUG:root:Closing response data object
Aug 31 22:56:50 mngt01 python3[43123]: DEBUG:root:Closing socket
Aug 31 22:56:50 mngt01 python3[43123]: DEBUG:root:Dying.
Aug 31 22:56:54 mngt01 python3[43123]: ERROR:root:timeout after 6 retransmits.
Aug 31 22:56:54 mngt01 python3[43123]: INFO:root:Stats: for ('::ffff:172.31.95.1', 1340, 0, 0) requesting 'efi64/syslinux.efi'
Aug 31 22:56:54 mngt01 python3[43123]: INFO:root:Error: {'error_code': 0, 'error_message': 'timeout after 6 retransmits.'}
Aug 31 22:56:54 mngt01 python3[43123]: INFO:root:Time spent: 14020ms
Aug 31 22:56:54 mngt01 python3[43123]: INFO:root:Packets sent: 7
Aug 31 22:56:54 mngt01 python3[43123]: INFO:root:Packets ACKed: 0
Aug 31 22:56:54 mngt01 python3[43123]: INFO:root:Bytes sent: 0
Aug 31 22:56:54 mngt01 python3[43123]: INFO:root:Options: OrderedDict([('tsize', '199952'), ('blksize', '1468')])
Aug 31 22:56:54 mngt01 python3[43123]: INFO:root:Blksize: 1468
Aug 31 22:56:54 mngt01 python3[43123]: INFO:root:Retransmits: 6
Aug 31 22:56:54 mngt01 python3[43123]: INFO:root:Server port: 69
Aug 31 22:56:54 mngt01 python3[43123]: INFO:root:Client port: 1340
Aug 31 22:56:54 mngt01 python3[43123]: DEBUG:root:Closing response data object
Aug 31 22:56:54 mngt01 python3[43123]: DEBUG:root:Closing socket
Aug 31 22:56:54 mngt01 python3[43123]: DEBUG:root:Dying.
Aug 31 22:57:44 mngt01 python3[43123]: DEBUG:root:Running the metrics callback
Aug 31 22:57:44 mngt01 python3[43123]: INFO:root:Server stats - every 60 seconds
Aug 31 22:57:44 mngt01 python3[43123]: DEBUG:root:Starting the metrics callback in 60s
And logs on client side (PXE boot):
>>Start PXE over IPv4 on MAC: XX-XX-XX-XX-XX-XX.
Station IP address is 172.31.95.1
Server IP address is 172.30.0.1
NBP filename is efi64/syslinux.efi
NBP filesize is 0 Bytes
PXE-E18: Server response timeout.
So, to debug, I tried to download the file from the server itself, using atftp client, and it worked perfectly. I was also able to deploy other more "standard" servers using this same fbtftp server.
To investigate, I disabled fbtftp server, and installed Ubuntu's tftpd-hpa server instead, and started it. This time, it worked and the client was able to download the file, but with a warning on server:
Aug 31 23:13:15 mngt01 in.tftpd[43986]: RRQ from 172.31.95.1 filename efi64/syslinux.efi
Aug 31 23:13:15 mngt01 in.tftpd[43986]: tftp: client does not accept options
Aug 31 23:13:15 mngt01 in.tftpd[43987]: RRQ from 172.31.95.1 filename efi64/syslinux.efi
So it seems (not sure, just a guess) that the client requests something that fails, and then the client or the server kind of "adapts" and it works the second time.
Do you have any guess about what is happening with the fbtftp server? I would prefer to stay with the fbtftp server as I am using multiple Linux distributions, and having a single tool for all of them is really nice :)
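For readers following the logs, the "Options requested" and "Options to ack" lines correspond to the option negotiation defined in RFC 2347: the client appends name/value pairs to its RRQ, the server answers with an OACK (opcode 6), and the client is expected to acknowledge it with an ACK for block 0. The sketch below shows that wire format using the values from the logs. It is an illustration only, not fbtftp's own code, and parse_rrq and build_oack are hypothetical helper names.

```python
import struct
from collections import OrderedDict

OPCODE_RRQ = 1
OPCODE_OACK = 6


def parse_rrq(packet):
    """Split an RRQ into (filename, options); 'mode' is kept as an option."""
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode != OPCODE_RRQ:
        raise ValueError("not an RRQ packet")
    # Fields are NUL-terminated; drop the empty tail after the last NUL.
    fields = packet[2:].split(b"\x00")[:-1]
    filename = fields[0].decode("ascii")
    options = OrderedDict()
    options["mode"] = fields[1].decode("ascii").lower()
    for name, value in zip(fields[2::2], fields[3::2]):
        options[name.decode("ascii").lower()] = value.decode("ascii")
    return filename, options


def build_oack(options):
    """Encode an OACK: opcode 6 followed by NUL-terminated name/value pairs."""
    body = b"".join(
        name.encode("ascii") + b"\x00" + str(value).encode("ascii") + b"\x00"
        for name, value in options.items()
    )
    return struct.pack("!H", OPCODE_OACK) + body


# Request and reply as they appear in the server logs above.
rrq = b"\x00\x01efi64/syslinux.efi\x00octet\x00tsize\x000\x00blksize\x001468\x00"
filename, requested = parse_rrq(rrq)
oack = build_oack(OrderedDict([("tsize", "199952"), ("blksize", "1468")]))
```

In the fbtftp logs every session ends with "Packets ACKed: 0", meaning no acknowledgement was ever recorded for any packet the server sent.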
Process StaticHandler-1:
Traceback (most recent call last):
  File "/usr/lib/python3.4/multiprocessing/process.py", line 254, in _bootstrap
    self.run()
  File "/tmp/autotftp/.local/lib/python3.4/site-packages/fbtftp/base_handler.py", line 259, in run
    self.run_once()
  File "/tmp/autotftp/.local/lib/python3.4/site-packages/fbtftp/base_handler.py", line 272, in run_once
    self._handle_timeout()
  File "/tmp/autotftp/.local/lib/python3.4/site-packages/fbtftp/base_handler.py", line 350, in _handle_timeout
    self._transmit_data()
  File "/tmp/autotftp/.local/lib/python3.4/site-packages/fbtftp/base_handler.py", line 396, in _transmit_data
    fmt = '!HH%ds' % len(self._current_block)
TypeError: object of type 'NoneType' has no len()
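The failing line builds the struct format for a TFTP DATA packet (opcode 3, a 16-bit block number, then the payload), so it needs a bytes payload to measure. The sketch below reproduces that packing with an explicit guard for a missing block. It is a standalone illustration, not a patch for fbtftp; build_data_packet is a hypothetical name, and raising ValueError is just one possible way to handle the case.

```python
import struct

OPCODE_DATA = 3


def build_data_packet(block_number, block):
    """Pack a TFTP DATA packet: opcode 3, 16-bit block number, payload."""
    if block is None:
        # Same situation as the traceback: there is no payload to measure.
        raise ValueError("no current block to transmit")
    fmt = "!HH%ds" % len(block)
    return struct.pack(fmt, OPCODE_DATA, block_number, block)
```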
Is there any Python 2 deployment out there (probably not since it will require the backport of ipaddress module) or can they be safely removed now? I can send a pull request if they are not needed now.
Would you please consider releasing a "0.2" version to pypi? There's been a number of fixes since the 0.1 release last year and it'd be a great help for us to be able to install from pypi directly.
Would it be of interest to you to have BSD compatibility added to the framework?
If so, I will work on making it compatible with both BSD and Linux and send a PR.
I already have it BSD-only in my fork, so I figured it wouldn't be too much work to support both.