BitTorrent Infrastructure Project In Rust
License: Apache License 2.0
Because of the implications of the `Timer` capacity, we want to make sure we prevent any panics surfaced to the user due to not enough timer slots being available per peer. We should have the user set the number of peers they want in the `PeerManager` at any one time, and configure the `Timer` capacity based on that.
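A minimal sketch of deriving the timer capacity from the user-supplied peer limit. `PeerManagerBuilder` and the two-slots-per-peer ratio are assumptions for illustration, not the crate's actual API:

```rust
// Hypothetical builder: derive the timer capacity from the user-supplied
// peer limit so we never run out of timer slots at runtime.
struct PeerManagerBuilder {
    peer_capacity: usize,
}

impl PeerManagerBuilder {
    fn with_peer_capacity(peer_capacity: usize) -> PeerManagerBuilder {
        PeerManagerBuilder { peer_capacity }
    }

    /// Assumed ratio: a couple of timer slots (e.g. keep-alive plus a
    /// request timeout) reserved per peer.
    fn timer_capacity(&self) -> usize {
        self.peer_capacity
            .checked_mul(2)
            .expect("timer capacity overflow")
    }
}

fn main() {
    let builder = PeerManagerBuilder::with_peer_capacity(100);
    println!("timer slots: {}", builder.timer_capacity());
}
```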
Both of these are nice-to-haves which should significantly improve performance for users on a fast network (where disk transfer speed is the bottleneck, due to our implementation).
Two components are required: `FileHandleCache` and `ReadWriteCache`. Each of these could implement `FileSystem` and take some inner object implementing `FileSystem`, so that calls are forwarded to the underlying `FileSystem` (like `NativeFileSystem`) when the cache doesn't have an entry. This allows us to easily layer different `FileSystem`s on top of one another.
Both of these will require us to support a new message, `IDiskMessage::SyncTorrent(InfoHash)`. This will also introduce a new method, `FileSystem::sync(Self::File)`, which will sync the contents of the file to disk. On `NativeFileSystem` this could be a no-op (though we should decide whether we should call `fsync` here...). For `ReadWriteCache`, this could flush any cached file contents to the underlying `FileSystem`, and for `FileHandleCache`, this could drop all of the file handles.
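The layering idea above can be sketched as follows. The trait shape (associated `File` type, a `sync` method, string paths) is an assumption based on this discussion, not the crate's actual `FileSystem` API:

```rust
use std::io;

// Each cache implements `FileSystem` and forwards to an inner
// `FileSystem` when it can't satisfy a call itself.
trait FileSystem {
    type File;

    fn open_file(&mut self, path: &str) -> io::Result<Self::File>;

    /// Sync the file to disk: no-op for a native fs, flush for a
    /// write cache, drop handles for a handle cache.
    fn sync(&mut self, file: &mut Self::File) -> io::Result<()>;
}

struct NativeFileSystem;

impl FileSystem for NativeFileSystem {
    type File = String; // stand-in for a real file handle

    fn open_file(&mut self, path: &str) -> io::Result<Self::File> {
        Ok(path.to_owned())
    }

    fn sync(&mut self, _file: &mut Self::File) -> io::Result<()> {
        Ok(()) // possibly fsync here...
    }
}

// A cache layer that wraps any inner `FileSystem`.
struct FileHandleCache<F> {
    inner: F,
}

impl<F: FileSystem> FileSystem for FileHandleCache<F> {
    type File = F::File;

    fn open_file(&mut self, path: &str) -> io::Result<Self::File> {
        // On a cache miss, forward to the underlying file system.
        self.inner.open_file(path)
    }

    fn sync(&mut self, file: &mut Self::File) -> io::Result<()> {
        // Drop any cached handles here, then forward.
        self.inner.sync(file)
    }
}

fn main() -> io::Result<()> {
    // Layers compose: FileHandleCache<ReadWriteCache<NativeFileSystem>>, etc.
    let mut fs = FileHandleCache { inner: NativeFileSystem };
    let mut file = fs.open_file("a.torrent")?;
    fs.sync(&mut file)
}
```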
A separate thing which may be a nice feature in the future would be to have different allocation methods for `AddTorrent`, implemented in terms of `FileSystem`:

* Allocating files up front (on `AddTorrent`)
* Writing through the `FileSystem`, then untangling the files later (possibly when `SyncTorrent` is sent to us)

Linux:
test benches::bench_native_fs_1_mb_pieces_128_kb_blocks ... bench: 2,122,092 ns/iter (+/- 137,163)
test benches::bench_native_fs_1_mb_pieces_16_kb_blocks ... bench: 5,743,897 ns/iter (+/- 1,613,292)
test benches::bench_native_fs_1_mb_pieces_2_kb_blocks ... bench: 26,742,021 ns/iter (+/- 11,962,446)
Windows (Antivirus Disabled):
test benches::bench_native_fs_1_mb_pieces_128_kb_blocks ... bench: 2,922,467 ns/iter (+/- 236,198)
test benches::bench_native_fs_1_mb_pieces_16_kb_blocks ... bench: 22,277,612 ns/iter (+/- 3,752,197)
test benches::bench_native_fs_1_mb_pieces_2_kb_blocks ... bench: 156,519,336 ns/iter (+/- 11,694,039)
Windows (Antivirus Enabled):
test benches::bench_native_fs_1_mb_pieces_128_kb_blocks ... bench: 39,069,596 ns/iter (+/- 8,133,512)
test benches::bench_native_fs_1_mb_pieces_16_kb_blocks ... bench: 270,543,506 ns/iter (+/- 23,768,410)
test benches::bench_native_fs_1_mb_pieces_2_kb_blocks ... bench: Too Damn Long!
Windows Localhost (via Deluge):
Setup:
3.9GB Torrent
2MB Piece Size
16KB Block Size
Results (Best Timings):
200 MB/s Download
? (Reported 60MB/s Disk Activity)
Rust strings must be valid UTF-8. Bencoding is a binary format where keys are primarily binary data which just usually happens to be ASCII or UTF-8.
In some cases they do not represent valid UTF-8 sequences, e.g. HTTP scrape responses contain binary infohashes as dictionary keys.
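One way around this is to key dictionaries by raw bytes rather than by `String`, so non-UTF-8 keys round-trip without loss. A small self-contained illustration:

```rust
use std::collections::BTreeMap;

// Bencode dictionary keys are raw bytes, so store them as `Vec<u8>`
// rather than `String`: binary infohash keys from scrape responses
// are then handled uniformly with the usual ASCII keys.
fn main() {
    let mut dict: BTreeMap<Vec<u8>, i64> = BTreeMap::new();

    // A key that is NOT valid UTF-8 (0xC0 is never a valid byte).
    let binary_key = vec![0xC0, 0xFF, 0xEE, 0x00];
    assert!(String::from_utf8(binary_key.clone()).is_err());

    // It still works fine as a map key.
    dict.insert(binary_key.clone(), 42);
    assert_eq!(dict.get(&binary_key), Some(&42));
}
```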
`MetainfoFile` currently provides methods for initializing itself from either some bytes or a file at a given `Path`. However, the `MetainfoBuilder` API doesn't provide corresponding methods for saving the file out to a given `Path`, only retrieving the output bytes. We should either add functionality for saving out to a `Path` to the `MetainfoBuilder`, or remove the ability to initialize a `MetainfoFile` from a `Path`.
`MetainfoBuilder` typically terminates on either the `build_from_file()` or `build_from_directory()` function calls. The user should not have to differentiate between the two methods, since they already know what their provided `Path` points at.
If clients are using the `MetainfoBuilder` in our library to build metainfo files from large files, we want to provide them a way of obtaining the percentage of the files that have been processed, which would allow them to show a loading bar of some sort to the users of their application.
This is easy enough to do: our master hasher is constantly sending out pieces for workers to process along with the associated piece index. All we have to do is divide that piece index by the total number of pieces and send that value, probably an `f64` between 0 and 1, to either a user-provided callback or a channel.
I like the callback idea because it would allow us to push large(ish) amounts of updates without taking up memory, as opposed to a user who either did not care about the current status or did not realize they were keeping their end of the channel open, with messages being queued and taking up memory. However, I am not sure I want to execute user-provided code in our master hasher loop due to slowdowns and/or panics.
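The channel variant can be sketched with `std::sync::mpsc`; the function name and shape here are illustrative, not the builder's real API:

```rust
use std::sync::mpsc;
use std::thread;

// The master hasher reports progress as an f64 in [0, 1] over a channel.
// Send errors are ignored: the user may have dropped their receiver.
fn hash_pieces(total_pieces: usize, progress: mpsc::Sender<f64>) {
    for piece_index in 0..total_pieces {
        // ... hash the piece here ...
        let fraction = (piece_index + 1) as f64 / total_pieces as f64;
        let _ = progress.send(fraction);
    }
}

fn main() {
    let (send, recv) = mpsc::channel();
    let worker = thread::spawn(move || hash_pieces(4, send));

    // The receiver iterator ends when the hasher drops its sender.
    for fraction in recv {
        println!("progress: {:.0}%", fraction * 100.0);
    }
    worker.join().unwrap();
}
```

A callback version would replace the `Sender` with an `Fn(f64)`, at the cost of running user code inside the hasher loop, which is exactly the concern raised above.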
Peer wire protocol headers include a 4-byte message id. For most purposes, this `u32` value needs to be used as a `usize` value. We should validate that the cast from `u32` to `usize` doesn't overflow, and if it does, we should terminate the connection and propagate an appropriate error; currently we just panic.
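The standard library's `TryFrom` expresses this check directly. A minimal sketch (the function name is illustrative):

```rust
use std::convert::TryFrom;

// Validate the u32 -> usize conversion instead of panicking. On 32/64-bit
// targets this always succeeds, but on a 16-bit-usize target the raw cast
// would silently truncate; here it becomes a recoverable error instead.
fn message_length(raw: u32) -> Result<usize, &'static str> {
    usize::try_from(raw)
        .map_err(|_| "message length overflows usize; terminating connection")
}

fn main() {
    assert_eq!(message_length(16), Ok(16));
}
```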
Many of our crates try to be agnostic over the execution mechanism for their futures. However, we currently hardcode against `Handle` to spin off asynchronous tasks.
It would be nice if our crates instead depended on some trait in `bip_util` that exposed a function to pass a future to, which would be executed when some event loop or other mechanism starts up.
Issue for tracking what is implemented and what is left to implement for the `bip_peer` module, which will include an API for programmatically queueing up torrent files for download given a `MetainfoFile` or `MagnetLink`.
The basic idea is that the `TorrentClient` communicates with the selection strategy thread over a two-way channel. From the client to the strategy thread, we can stop, start, pause, or remove torrents from the download queue. We can also provide configuration options to limit upload/download bandwidth, either client-wide or on a per-torrent basis. From the strategy thread to the client thread, we can provide notifications for when torrents are done or if any errors occurred.
The selection strategy thread is concerned with sending and receiving high level peer wire protocol messages, initiating peer chokes/unchokes, and deciding what piece to transmit or receive next and from what peer. Each peer is pinned to a channel which is connected to one of potentially many peer protocols, the strategy thread doesn't care what protocol. If a peer disconnects in the protocol layer, a message is sent to the strategy layer alerting it that the peer is no longer connected to us.
The peer protocol layer is concerned with reading messages off the wire and deserializing them into peer wire protocol message heads (variable length data is ignored at this point). Special regions of memory may be set aside for bitfield messages; I'm not sure if we should eat the cost of pre-allocating or allocate on demand (they are only sent once per peer, so on demand might not be bad).
The disk manager is what both layers use as an intermediary for sending and receiving pieces. If we determine in the selection strategy layer that we should send a piece to a peer, instead of loading that data in and sending it through the channel to the peer protocol layer, we will ask the disk manager to load in that data if it isn't already in memory. We will then receive a token for that request and send the token down to the peer protocol layer which will tell the disk manager to notify it when the piece has been loaded. It will then be able to access the memory for that piece. For receiving, the peer protocol layer will tell the disk manager to allocate memory for the incoming piece and get notified when it is ready. It will then be able to write the piece directly to that region of memory. I am not sure whether to do checksumming at this point or defer it to the selection strategy layer so that is TBD. After the write occurs, a message will be sent up to the selection strategy thread letting it know what piece it received from what peer.
This may change as I go about implementation. I want to make it easy to provide HTTP or SOCKS proxies in the future, so I may have to go one layer below the protocol layer for that. At the same time, I want to reduce the number of threads that a `TorrentClient` requires; currently, taking into account just TCP peers, it will take at least 8 threads (including 4 worker threads for the disk manager, but not including the thread running the user code that is calling into the `TorrentClient`).
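The token handoff between the strategy layer, disk manager, and protocol layer described above can be sketched roughly as follows. `LoadToken`, `DiskRequest`, and `DiskManager::submit` are illustrative names, not the crate's actual message types:

```rust
// The strategy layer asks the disk manager to load a block, receives a
// token, and hands only the token to the protocol layer, which waits on
// the disk manager's "loaded" notification instead of receiving the
// block data through a channel.

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct LoadToken(u64);

enum DiskRequest {
    /// Load piece data into memory, notifying the token holder when ready.
    LoadBlock { piece_index: u32, token: LoadToken },
    /// Reserve memory for an incoming block so the protocol layer can
    /// write it directly, avoiding a copy through the channel.
    ReserveBlock { piece_index: u32, token: LoadToken },
}

struct DiskManager {
    next_token: u64,
}

impl DiskManager {
    fn new() -> DiskManager {
        DiskManager { next_token: 0 }
    }

    /// Issue a unique token along with the request to enqueue.
    fn submit(&mut self, piece_index: u32) -> (LoadToken, DiskRequest) {
        let token = LoadToken(self.next_token);
        self.next_token += 1;
        (token, DiskRequest::LoadBlock { piece_index, token })
    }
}

fn main() {
    let mut disk = DiskManager::new();
    let (token, _request) = disk.submit(7);
    // The strategy layer would now send `token` down to the protocol
    // layer; the block bytes themselves never cross that channel.
    println!("issued {:?}", token);
}
```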
Disk Manager:
Handshaker:
Peer Protocol Layer:
Selection Strategy Layer:
Torrent Client:
Playing around with the examples, it looks like for the handshaking and peer wire protocols we can't really take advantage of `tokio-service` or `tokio-proto`, as both of those are oriented towards request/response communication (let alone long-lived connections, perhaps?).
Right now, I am looking at `tokio-core` and `futures`. However, I don't feel like each of the components we are building should spin up its own core and communicate with the others. Instead, it would be ideal if `bip_handshake` and `bip_peer` depended solely on `futures` and exported their own futures that could be connected in some way (peer handshake -> peer connect) to be run in one `Core` by the end application, ideally as frictionless as possible.
Will update with more information.
Currently any peer would be able to drain the client machine of memory by sending a message with a large payload (this only affects variable length message fields).
We should add a max message length to `PeerProtocolCodec` so that when the codec checks the number of bytes the message will use, it can kill the connection and propagate an appropriate error (one that clients can identify, so they can filter the peer in the `Handshaker`).
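The check itself is small. A std-only sketch of the guard, outside any codec trait; `MAX_MESSAGE_LEN` is an assumed value, not one taken from the crate:

```rust
// Read the 4-byte big-endian length prefix and reject anything over a
// configured maximum *before* buffering or allocating for the payload.
const MAX_MESSAGE_LEN: u32 = 2 * 1024 * 1024;

#[derive(Debug, PartialEq)]
enum DecodeError {
    /// Not enough bytes buffered yet; try again later.
    NeedMoreBytes,
    /// Clients can match on this variant to drop and filter the peer.
    MessageTooLong(u32),
}

fn peek_message_len(buf: &[u8]) -> Result<u32, DecodeError> {
    if buf.len() < 4 {
        return Err(DecodeError::NeedMoreBytes);
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]);
    if len > MAX_MESSAGE_LEN {
        // Kill the connection instead of allocating `len` bytes.
        Err(DecodeError::MessageTooLong(len))
    } else {
        Ok(len)
    }
}

fn main() {
    assert_eq!(peek_message_len(&[0, 0, 0, 13]), Ok(13));
    assert_eq!(
        peek_message_len(&[0xFF, 0xFF, 0xFF, 0xFF]),
        Err(DecodeError::MessageTooLong(0xFFFF_FFFF))
    );
}
```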
The MIT license requires reproducing countless copies of the same copyright
header with different names in the copyright field, for every MIT library in
use. The Apache license does not have this drawback, and has protections from
patent trolls and an explicit contribution licensing clause. However, the
Apache license is incompatible with GPLv2. This is why Rust is dual-licensed as
MIT/Apache (the "primary" license being Apache, MIT only for GPLv2 compat), and
doing so would be wise for this project. This also makes this crate suitable
for inclusion in the Rust standard distribution and other projects using dual
MIT/Apache.
To do this, get explicit approval from each contributor of copyrightable work
(as not all contributions qualify for copyright) and then add the following to
your README:
## License
Licensed under either of
* Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
* MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)
at your option.
### Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in the work by you shall be dual licensed as above, without any
additional terms or conditions.
and in your license headers, use the following boilerplate (based on that used in Rust):
// Copyright (c) 2015 t developers
// Licensed under the Apache License, Version 2.0
// <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT
// license <LICENSE-MIT or http://opensource.org/licenses/MIT>,
// at your option. All files in the project carrying such
// notice may not be copied, modified, or distributed except
// according to those terms.
And don't forget to update the `license` metadata in your `Cargo.toml`!
Using milestones and issues can help people understand the project's direction.
Issues can be used as tasks-to-complete and can be assigned to developers.
We can follow the CoreUtils way for example.
This issue was automatically generated. Feel free to close without ceremony if
you do not agree with re-licensing or if it is not possible for other reasons.
Respond to @cmr with any questions or concerns, or pop over to
#rust-offtopic
on IRC to discuss.
You're receiving this because someone (perhaps the project maintainer)
published a crates.io package with the license as "MIT" xor "Apache-2.0" and
the repository field pointing here.
TL;DR the Rust ecosystem is largely Apache-2.0. Being available under that
license is good for interoperation. The MIT license as an add-on can be nice
for GPLv2 projects to use your code.
The MIT license requires reproducing countless copies of the same copyright
header with different names in the copyright field, for every MIT library in
use. The Apache license does not have this drawback. However, this is not the
primary motivation for me creating these issues. The Apache license also has
protections from patent trolls and an explicit contribution licensing clause.
However, the Apache license is incompatible with GPLv2. This is why Rust is
dual-licensed as MIT/Apache (the "primary" license being Apache, MIT only for
GPLv2 compat), and doing so would be wise for this project. This also makes
this crate suitable for inclusion and unrestricted sharing in the Rust
standard distribution and other projects using dual MIT/Apache, such as my
personal ulterior motive, the Robigalia project.
Some ask, "Does this really apply to binary redistributions? Does MIT really
require reproducing the whole thing?" I'm not a lawyer, and I can't give legal
advice, but some Google Android apps include open source attributions using
this interpretation. Others also agree with
it.
But, again, the copyright notice redistribution is not the primary motivation
for the dual-licensing. It's stronger protections to licensees and better
interoperation with the wider Rust ecosystem.
To do this, get explicit approval from each contributor of copyrightable work
(as not all contributions qualify for copyright) and then add the following to
your README:
## License
Licensed under either of
* Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
* MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)
at your option.
### Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any
additional terms or conditions.
and in your license headers, use the following boilerplate (based on that used in Rust):
// Copyright (c) 2016 redox-rs developers
//
// Licensed under the Apache License, Version 2.0
// <LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0> or the MIT
// license <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. All files in the project carrying such notice may not be copied,
// modified, or distributed except according to those terms.
Be sure to add the relevant `LICENSE-{MIT,APACHE}` files. You can copy these from the Rust repo for a plain-text version.
And don't forget to update the `license` metadata in your `Cargo.toml` to:
license = "MIT/Apache-2.0"
I'll be going through projects which agree to be relicensed and have approval
by the necessary contributors and doing this changes, so feel free to leave
the heavy lifting to me!
To agree to relicensing, comment with :
I license past and future contributions under the dual MIT/Apache-2.0 license, allowing licensees to chose either at their option
Or, if you're a contributor, you can check the box in this repo next to your
name. My scripts will pick this exact phrase up and check your checkbox, but
I'll come through and manually review this issue later as well.
Currently failed handshakes will take up room in the buffer we allocate for connection tokens. We need to wait for `Slab` to support something like `insert_with_opt`, because we want our connection to keep track of the `Token` it is associated with so it can set its own timeouts; however, the creation of the connection may fail, and in that case the `Token` would go unused.
Doing this will let us support a continuous stream of handshakes without filling up with stale handshakes.
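The reserve-then-fill pattern can be sketched in std-only Rust; `TokenSlab` below is a hand-rolled stand-in for the real `Slab`, with `insert_with_opt` semantics as described above:

```rust
// Hand out the token *before* constructing the connection, and leave the
// slot empty if construction fails, so failed handshakes don't leak slots.
struct TokenSlab<T> {
    slots: Vec<Option<T>>,
}

impl<T> TokenSlab<T> {
    fn new(capacity: usize) -> TokenSlab<T> {
        let mut slots = Vec::with_capacity(capacity);
        for _ in 0..capacity {
            slots.push(None);
        }
        TokenSlab { slots }
    }

    /// Insert with a closure that already knows its own token. If the
    /// closure returns `None`, the slot stays free (`insert_with_opt`).
    fn insert_with_opt<F>(&mut self, make: F) -> Option<usize>
    where
        F: FnOnce(usize) -> Option<T>,
    {
        let token = self.slots.iter().position(|slot| slot.is_none())?;
        let value = make(token)?; // construction failed -> token unused
        self.slots[token] = Some(value);
        Some(token)
    }
}

fn main() {
    let mut slab: TokenSlab<&'static str> = TokenSlab::new(2);

    // Failed handshake: no slot is consumed.
    assert_eq!(slab.insert_with_opt(|_token| None), None);

    // Successful handshake: the connection learns its token at creation
    // time, so it can register its own timeouts against it.
    let token = slab.insert_with_opt(|_token| Some("connection")).unwrap();
    assert_eq!(token, 0);
}
```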
Document how we expose the `name` field in torrent files as just a single `File` with an `Option::None` directory. This may confuse people using the library who are already familiar with the internals of torrent files.
Our `HandshakeFilter` could also filter on whether we initiated the handshake or a remote peer initiated it.
This would be pretty trivial to implement (create some enum, `HandshakeInitiator`, with two variants, pass those in when we initiate/complete a handshake, and test our filters).
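The enum and a filter check are a few lines; the variant names and the filter function are illustrative, following the suggestion above:

```rust
// Tag each handshake with who initiated it so a filter can discriminate.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum HandshakeInitiator {
    /// We opened the connection.
    Local,
    /// A remote peer connected to us.
    Remote,
}

// Hypothetical filter: only accept handshakes we initiated ourselves.
fn passes_filter(initiator: HandshakeInitiator) -> bool {
    initiator == HandshakeInitiator::Local
}

fn main() {
    assert!(passes_filter(HandshakeInitiator::Local));
    assert!(!passes_filter(HandshakeInitiator::Remote));
}
```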
Currently we stash errors like payload checks, protocol errors, block size checks, etc. in `io::Error`. However, `tokio_io::codec::Decoder` and `tokio_io::codec::Encoder` would allow us to specify some wrapper error.
Currently we store informative strings in the custom error type, so it's not too difficult to track down why we severed a peer connection. But if we want clients to be able to easily check why we severed the connection, and ban peers based on that, we need to export some enum of all possible error types they may want to ban on (as well as a catch-all `io::Error` variant).
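A sketch of what that wrapper error could look like; the variant names are illustrative, not the crate's actual error type. The `From<io::Error>` impl matters because the codec traits require their error type to be convertible from `io::Error`:

```rust
use std::fmt;
use std::io;

// Bannable variants clients can match on, plus a catch-all io variant.
#[derive(Debug)]
enum PeerProtocolError {
    InvalidPayloadLength { expected: usize, found: usize },
    BlockSizeTooLarge(usize),
    ProtocolViolation(&'static str),
    Io(io::Error),
}

impl fmt::Display for PeerProtocolError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match *self {
            PeerProtocolError::InvalidPayloadLength { expected, found } => {
                write!(f, "invalid payload length: expected {}, found {}", expected, found)
            }
            PeerProtocolError::BlockSizeTooLarge(size) => {
                write!(f, "block size too large: {}", size)
            }
            PeerProtocolError::ProtocolViolation(msg) => {
                write!(f, "protocol violation: {}", msg)
            }
            PeerProtocolError::Io(ref err) => write!(f, "io error: {}", err),
        }
    }
}

impl From<io::Error> for PeerProtocolError {
    fn from(err: io::Error) -> PeerProtocolError {
        PeerProtocolError::Io(err)
    }
}

fn main() {
    // A client deciding whether to ban, without parsing error strings:
    let err = PeerProtocolError::BlockSizeTooLarge(1 << 30);
    let should_ban = match err {
        PeerProtocolError::Io(_) => false, // transient, don't ban
        _ => true,                         // protocol misbehavior, ban
    };
    assert!(should_ban);
    println!("{}", PeerProtocolError::ProtocolViolation("bad message id"));
}
```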
Currently, we forward both initiated and completed handshakes to a central future that executes a handshake (with timeout), which then gets forwarded on to our `Stream` to buffer completed handshakes until the user pulls them out.
Even though we have timeouts on handshakes, it would be nice to be able to do n handshakes in parallel. For that, I believe we would need some sort of futures-compatible MPMC implementation.
Some mechanism for persisting DHT nodes to disk is necessary to make our DHT truly decentralized, rather than relying on bootstrap nodes as is currently the case. This would also give us a faster start-up time, depending on how we implement it for the DHT.
A naive implementation would be fairly easy; however, we should decide whether this is something we provide to the client, or whether we just provide a `RoutingTable` dump to the client and have them persist our nodes.
I prefer the former, as it would let us be in charge of the encoding of the peer information, and it would simplify what clients have to do to load those peers back in when starting the DHT up again, since we know how the node information was stored.
Currently any crate using `mio` relies on the default values set for `EventLoop`, which could lead to inconsistent behavior for our APIs when the capacities set by our crates are larger than the default capacities set by the `EventLoop`. Therefore, we should be using `EventLoopConfig` for all such crates.
We are probably doing something wrong with the Sink interface here.
Currently we are boxing iterators for objects implementing `TorrentView`. This allows us to create different `Metainfo` parsers while exposing the fields from all parsers regardless of how those fields are stored at any moment. This does incur some overhead, especially since we are creating a new `Box` for every iteration of `FilePath`. Whether we want this flexibility with the incurred overhead is debatable.
Currently most of our iterators that have two levels of boxing are consumed when the second-level boxed iterator is requested. We could allow iterators to be cloned, letting users request the iterator willy-nilly, but I don't want to gloss over the fact that this is expensive, so I would like to reflect that fact within the API.
Most of this may not be a problem when/if generic return types are implemented, but I am leaving this open for discussion.
Hi, many thanks for starting this project!
I've started looking at how to use `BTHandshaker::new()`, read up on mio, and wondered why I don't need to pass in my `EventLoop`. Instead, `handshaker.stream(info_hash)` returns a channel reader that I'm supposed to block on in my per-torrent thread? Especially when handling multiple torrents, it'd be great to continue to use mio in the calling code.
Which of the two paradigms do you intend for htracker and the wire protocol, async or threaded?
Sorry, this may not be the appropriate place to ask this, but how do I even connect to a swarm?
Consider several scenarios:
It's been said that there is DHT support, but I've been studying the code for hours and could not locate the place to connect to the DHT.
Currently we decode from a `&[u8]`, but since our actual protocol codec uses `BytesMut`, we should probably decode from that, as some messages that contain arbitrary byte payloads, like the bitfield and piece messages, are currently allocating again.
If we wanted zero copy end to end, we should make it so that the corresponding messages instead store the `BytesMut` slices, and that we can then pass these through to the `DiskManager` to store the bytes (when applicable). When we get those bytes sent back to us via `ODiskManagerMessage`, we can then drop them immediately, and the network side of things should be able to reuse that region of memory again.
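The idea can be illustrated std-only with `Arc` as a stand-in for `Bytes` (with the real `bytes` crate, `split_to(..).freeze()` yields a shared-buffer slice without any hand-rolled type):

```rust
use std::ops::Range;
use std::sync::Arc;

// A message keeps a cheap reference into the shared receive buffer
// instead of copying the payload out.
#[derive(Clone)]
struct SharedSlice {
    buffer: Arc<Vec<u8>>,
    range: Range<usize>,
}

impl SharedSlice {
    fn as_slice(&self) -> &[u8] {
        &self.buffer[self.range.clone()]
    }
}

// A piece message stores the slice, so passing it to the disk manager
// moves a pointer and a range rather than the block data itself.
struct PieceMessage {
    piece_index: u32,
    block: SharedSlice,
}

fn main() {
    // <len=9><id=7><index=5><block bytes 1,2,3,4> (shape illustrative)
    let receive_buffer = Arc::new(vec![0u8, 0, 0, 9, 7, 0, 0, 0, 5, 1, 2, 3, 4]);
    let msg = PieceMessage {
        piece_index: 5,
        block: SharedSlice { buffer: receive_buffer.clone(), range: 9..13 },
    };

    assert_eq!(msg.piece_index, 5);
    assert_eq!(msg.block.as_slice(), &[1, 2, 3, 4]);
    // When the disk manager drops `msg`, the reference count falls and
    // the network side can reuse the buffer region.
    assert_eq!(Arc::strong_count(&receive_buffer), 2);
}
```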
Right now it is dependent on the order in which `walkdir` gives them to us. However, for some torrents you may want bigger file(s) in the front (streaming), files ordered to match piece boundaries as closely as possible (selective file downloading), alphanumeric file name ordering, or perhaps a custom user-defined ordering.
We should be able to support all of these use cases.
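A pluggable ordering could be a small enum applied to the walked entries before building the metainfo. `FileEntry` here is an illustrative stand-in for whatever `walkdir` plus the builder actually produce:

```rust
#[derive(Debug, PartialEq)]
struct FileEntry {
    path: String,
    len: u64,
}

enum FileOrdering {
    /// Biggest files first (streaming).
    LargestFirst,
    /// Alphanumeric by path.
    Alphanumeric,
}

// Sort the walked entries before the builder lays out pieces.
fn order_files(mut files: Vec<FileEntry>, ordering: FileOrdering) -> Vec<FileEntry> {
    match ordering {
        FileOrdering::LargestFirst => files.sort_by(|a, b| b.len.cmp(&a.len)),
        FileOrdering::Alphanumeric => files.sort_by(|a, b| a.path.cmp(&b.path)),
    }
    files
}

fn main() {
    let files = vec![
        FileEntry { path: "b.mkv".into(), len: 700 },
        FileEntry { path: "a.nfo".into(), len: 1 },
    ];
    let ordered = order_files(files, FileOrdering::LargestFirst);
    assert_eq!(ordered[0].path, "b.mkv");
}
```

A custom user-defined ordering would be a third variant holding a boxed comparator; piece-boundary matching needs the piece length as an extra input, so it likely belongs on the builder itself.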
Currently our lazy bencode parser uses recursion to decode and encode data. This makes it trivial for anyone to crash an application using `bip_bencode` where the data is coming off the network. With a maximum stack of, say, 80 stack frames, they could crash our services using a minimum of 160 bytes (e.g. just nest a bunch of lists, `l l l l l l ... e e e e e e`).
The obvious solution is to implement an iterative decoder/encoder. However, since we are already introducing complexity into the `bip_bencode` module by doing so, we could also reach for yet another performance boost in our implementation.
What we used to have implemented was a level 1 bencode parser, where all dictionary keys and byte arrays were allocated on the heap. We then moved to a level 2 bencode parser, where those two structures were just references, but we still allocated bencode lists and dictionaries on the heap. libtorrent has a nice blog post where they go over the implementation of a level 3 bencode parser that has both a borrowed list of bytes, as well as a (heap-allocated) list of tokens pointing into the list of bytes.
The benefit that a level 3 bencode parser brings is great token locality as well as amortized heap allocations, since we are using a single (fairly small) heap-allocated structure instead of many small ones. We can also see that encoding is as easy as returning the already-made list of bytes (essentially a no-op). The downside is we have to copy data used in our macros to that list of bytes and make corresponding tokens for them. However, we may be able to pre-compute the needed pre-allocation capacity for macro-related bencode construction (depending on what the macros allow). Additionally, dictionary searching goes down to (at best) log n retrieval speed instead of constant, depending on whether we want to store key offsets, and inserting ANYTHING into an already-made structure will push (potentially many) bytes back, which could get expensive depending on the usage patterns. For this reason, we may want to keep the current implementation and just add the new one alongside it, as the current implementation is great at cheaply inserting data into an already-made bencode structure.
Will benchmark against the current implementation to see what kind of performance boost we get, to decide if it is worth it or not.
Edit: The vulnerability only requires the list/dictionary to be started, which means `l l l l l ...` would suffice, meaning a maximum of 80 stack frames requires a minimum of 80 bytes.
The `BTHandshaker` implementation which is part of the `dht_support` branch is structured to provide a common object for performing handshakes with connections from sources such as trackers, DHTs, local peer discovery, or other discovery mechanisms.
At the moment, some initial benchmarking has been done, and under full load the handshaker blows up and causes system instability. Initially it looked like it was creating too many system handles, so a `ThreadPool` has been used to cap the number of thread handles generated. This improved stability greatly, but it looks like it is also possible to create too many `TcpStream`s on both connection initiation and completion via `TcpListener`.
At this point, the `ThreadPool` should act as a throttle for connection initiation, but the `TcpListener` thread may be causing too many handles to be created. We should investigate whether a `SyncSender` can solve this, and where the sweet spot is in terms of performance for capping the number of completion worker messages.
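`SyncSender` gives exactly the throttle effect in question: the channel is bounded, so once the buffer is full further sends block (or fail with `try_send`), and the listener thread cannot run ahead of the completion workers. A small demonstration; the capacity value is an assumption to be tuned:

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

// Assumed cap on in-flight completion messages; the "sweet spot" from
// the discussion above would be found by benchmarking.
const CAP: usize = 2;

fn main() {
    let (send, recv) = sync_channel::<&'static str>(CAP);

    send.send("completion 1").unwrap();
    send.send("completion 2").unwrap();

    // The buffer is full: a non-blocking send is rejected, so no new
    // TcpStream handle is accepted until a worker drains a message.
    match send.try_send("completion 3") {
        Err(TrySendError::Full(_)) => println!("throttled"),
        other => panic!("expected Full, got {:?}", other),
    }

    assert_eq!(recv.recv().unwrap(), "completion 1");
}
```

A blocking `send` on the listener thread would instead apply backpressure directly, pausing `accept` until capacity frees up.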
To support BEP 9 (http://www.bittorrent.org/beps/bep_0009.html) in `bip_peer`, users need to be able to parse, as well as build (serialize), `InfoDictionary`s directly.
We should see whether we want to make this a configuration option (or separate method) on `MetainfoBuilder`, or a separate builder. Subsequently, we should offer users a method similar to `MetainfoFile::from_bytes`, but for an `InfoDictionary`.
Since we now have macros that allow us to easily create `Bencode` objects in code, we could look into both the memory and performance benefits of switching `Bencode::Dict` `String` keys and `Bencode::Bytes` `Vec<u8>` objects over to `Cow<'a, T>` objects.
As long as this doesn't cause regressions in performance for our current use case (parsing large amounts of bencoded data from a file), it can be implemented on the current `Bencode` object. Otherwise, we can always move it to a new object implementing `BencodeView` and adjust the current macros accordingly.
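The payoff of `Cow` in this setting, sketched with std only: keys borrowed from the input buffer stay borrowed (zero-copy), while keys constructed in code via macros become owned. The function here is illustrative, not the crate's API:

```rust
use std::borrow::Cow;

// Parsed keys borrow from the input; macro-built keys own their data.
fn dict_key<'a>(from_input: Option<&'a str>) -> Cow<'a, str> {
    match from_input {
        Some(key) => Cow::Borrowed(key),          // parsed: no allocation
        None => Cow::Owned(String::from("name")), // constructed in code
    }
}

fn main() {
    let input = "announce";
    let borrowed = dict_key(Some(input));
    let owned = dict_key(None);

    assert!(matches!(borrowed, Cow::Borrowed(_)));
    assert!(matches!(owned, Cow::Owned(_)));
    // Both variants compare and display like plain strings.
    assert_eq!(&*borrowed, "announce");
    assert_eq!(&*owned, "name");
}
```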
There is a large performance drop between these two commits. The code in question is run on a sample of 20,000 random torrents. It only reads the torrent files and then executes `bip_bencode::Bencode::decode(&bytes).unwrap()`. An example torrent file is attached.
The first commit, 202e3a7, specified in `Cargo.toml` as
`bip_bencode = { git = "https://github.com/GGist/bip-rs", rev = "202e3a7" }`
has a runtime of 300ms.
Just changing the revision to 4b08461 results in a runtime of 530ms.
This issue was automatically generated. Feel free to close without ceremony if
you do not agree with re-licensing or if it is not possible for other reasons.
Respond to @cmr with any questions or concerns, or pop over to
#rust-offtopic
on IRC to discuss.
You're receiving this because someone (perhaps the project maintainer)
published a crates.io package with the license as "MIT" xor "Apache-2.0" and
the repository field pointing here.
TL;DR the Rust ecosystem is largely Apache-2.0. Being available under that
license is good for interoperation. The MIT license as an add-on can be nice
for GPLv2 projects to use your code.
The MIT license requires reproducing countless copies of the same copyright
header with different names in the copyright field, for every MIT library in
use. The Apache license does not have this drawback. However, this is not the
primary motivation for me creating these issues. The Apache license also has
protections from patent trolls and an explicit contribution licensing clause.
However, the Apache license is incompatible with GPLv2. This is why Rust is
dual-licensed as MIT/Apache (the "primary" license being Apache, MIT only for
GPLv2 compat), and doing so would be wise for this project. This also makes
this crate suitable for inclusion and unrestricted sharing in the Rust
standard distribution and other projects using dual MIT/Apache, such as my
personal ulterior motive, the Robigalia project.
Some ask, "Does this really apply to binary redistributions? Does MIT really
require reproducing the whole thing?" I'm not a lawyer, and I can't give legal
advice, but some Google Android apps include open source attributions using
this interpretation. Others also agree with
it.
But, again, the copyright notice redistribution is not the primary motivation
for the dual-licensing. It's stronger protections to licensees and better
interoperation with the wider Rust ecosystem.
To do this, get explicit approval from each contributor of copyrightable work
(as not all contributions qualify for copyright, due to not being a "creative
work", e.g. a typo fix) and then add the following to your README:
## License
Licensed under either of
* Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
* MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)
at your option.
### Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any
additional terms or conditions.
and in your license headers, if you have them, use the following boilerplate
(based on that used in Rust):
// Copyright 2016 bittorrent-rs developers
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.
It's commonly asked whether license headers are required. I'm not comfortable
making an official recommendation either way, but the Apache license
recommends it in their appendix on how to use the license.
Be sure to add the relevant LICENSE-{MIT,APACHE} files. You can copy these from the Rust repo for a plain-text version.

And don't forget to update the license metadata in your Cargo.toml to:
license = "MIT/Apache-2.0"
I'll be going through projects which agree to be relicensed and have approval
by the necessary contributors and doing these changes, so feel free to leave
the heavy lifting to me!
To agree to relicensing, comment with:
I license past and future contributions under the dual MIT/Apache-2.0 license, allowing licensees to chose either at their option.
Or, if you're a contributor, you can check the box in this repo next to your
name. My scripts will pick this exact phrase up and check your checkbox, but
I'll come through and manually review this issue later as well.
While testing the `TrackerClient` in `bip_utracker`, it looks like many requests sent to a single tracker in a short amount of time trigger something akin to request throttling. Judging from the packets our client was emitting, the throttling applied only to connection id requests.

We should ideally cache connection ids so that if multiple requests are sent to the same tracker (ip, port) in a short amount of time, we avoid being throttled (or at least, throttled so easily). We could store the ids in a map and, when creating a new request, populate the connection id for the `ConnectTimer` if there is one in the map.
We also need to account for two things: build-up of unused connection ids in long-running clients (a memory leak), and renewal of connection ids.
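A sketch of such a cache, assuming (per BEP 15) that a connection id may be reused for up to one minute after it is received; the type names and layout here are hypothetical, not the current `bip_utracker` API:

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::{Duration, Instant};

/// Hypothetical cache mapping a tracker's (ip, port) to its connection id.
/// Entries expire after 60 seconds, which both respects the BEP 15 reuse
/// window and bounds growth for long-running clients.
struct ConnectionIdCache {
    ids: HashMap<SocketAddr, (u64, Instant)>,
    ttl: Duration,
}

impl ConnectionIdCache {
    fn new() -> ConnectionIdCache {
        ConnectionIdCache {
            ids: HashMap::new(),
            ttl: Duration::from_secs(60),
        }
    }

    /// Store an id freshly received from a connect response.
    fn insert(&mut self, tracker: SocketAddr, id: u64) {
        self.ids.insert(tracker, (id, Instant::now()));
    }

    /// Fetch a still-valid id, evicting it if it has expired.
    fn get(&mut self, tracker: &SocketAddr) -> Option<u64> {
        let entry = self.ids.get(tracker).copied();
        match entry {
            Some((id, received)) if received.elapsed() < self.ttl => Some(id),
            Some(_) => {
                self.ids.remove(tracker);
                None
            }
            None => None,
        }
    }

    /// Drop every expired entry; running this periodically (or bounding the
    /// map with an LRU policy) prevents unused ids from accumulating.
    fn purge_expired(&mut self) {
        let ttl = self.ttl;
        self.ids.retain(|_, &mut (_, received)| received.elapsed() < ttl);
    }
}

fn main() {
    let tracker: SocketAddr = "127.0.0.1:6969".parse().unwrap();
    let mut cache = ConnectionIdCache::new();
    assert_eq!(cache.get(&tracker), None); // miss: must send a connect request
    cache.insert(tracker, 0x1234_5678);
    assert_eq!(cache.get(&tracker), Some(0x1234_5678)); // hit: skip the connect
}
```

On a cache miss the client would fall back to the normal connect request and insert the id it receives, so the cache stays transparent to the request path.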
During lookups, we use both alpha to specify how many requests we want to send in parallel on the initial lookup, and beta which is how many requests we want to send in parallel on responses with nodes whose ids are closer to our target id.
Currently, these parallel lookups could have overlap in the nodes that they request from. In practice, I have seen ~15 parallel lookups that all converge on one node. So it seems like all of the parallel lookups found the closest node they could and all ended up requesting from it. Doing stuff like this, coupled with clients potentially executing many searches in a short period of time, could get our client's node banned by the nodes that our lookups converge on.
Instead, we should filter out nodes we have already requested from when deciding whether a response is worth iterating on. This may reduce the amount of contact information we receive, although I wouldn't expect it to be by a large margin, since the missing contact information would come from nodes that are not as close as the ones we have already found.
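A minimal sketch of that filter, assuming 20-byte node ids; a set of already-requested nodes doubles as the dedup check, so parallel branches of a lookup never converge on and hammer the same node:

```rust
use std::collections::HashSet;

/// Stand-in for a DHT node id (20 bytes, as in mainline DHT).
type NodeId = [u8; 20];

/// Hypothetical per-lookup filter: remembers every node this lookup has
/// already requested from.
struct LookupFilter {
    requested: HashSet<NodeId>,
}

impl LookupFilter {
    fn new() -> LookupFilter {
        LookupFilter { requested: HashSet::new() }
    }

    /// Returns true the first time a node is seen; afterwards the node is
    /// filtered out and should not be contacted again by any branch.
    fn should_request(&mut self, node: NodeId) -> bool {
        // HashSet::insert returns false when the value was already present.
        self.requested.insert(node)
    }
}

fn main() {
    let mut filter = LookupFilter::new();
    let node = [0xAB; 20];
    assert!(filter.should_request(node));  // first branch contacts it
    assert!(!filter.should_request(node)); // later branches skip it
}
```

The filter would be consulted when a response's nodes are ranked against the target id, before scheduling the next round of parallel requests.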
Because the calling code needs to set up and invoke both anyway, it could also:
That way a utracker client wouldn't depend on the handshaker.
Do you intend to start a project that combines the various crates into a functioning application?
Just stumbled across this project and noticed that it uses mio. Any thoughts about switching to Tokio and Futures?
Will have to double check that we aren't breaking spec, but in general, having to deal with the current types has been painful in downstream libraries (like `bip_disk`).

- `File::length()` should return a `u64`
- `InfoDictionary::piece_length()` should return a `u64`

On a non-spec-breaking note:

- `File::paths()` should return a `&Path` (so we should allocate a `PathBuf` internally)

We should update the return value to `Option<&Path>` instead of `Option<&str>`.
Currently experiencing sporadic performance with the `MetainfoBuilder`. Initially I suspected it was due to the SHA-1 library that `bip_util` uses. However, after switching that out, while the performance is a lot better (the old library may have been allocating while hashing), there are still cases where the piece hasher workers experience major slowdowns.

Profiling has shown that the slowdown occurs when computing the SHA-1 value. However, I suspect it has something to do with the OS paging to disk for the memory-mapped bytes while computing the SHA-1 values. This is because when the slowdown occurs, the processors go from being maxed out to hovering around 20%, which is around what they would idle at outside of the benchmark.

The slowdown could be related to many factors, as I have not been able to pin down what makes it occur exactly. It could be related to piece length (lengths could be slightly larger than the page size), workers getting out of sync (unlucky OS scheduling causes different threads to be on pieces VERY far away from one another, causing more disk paging), or something I have not thought of.

In the meantime, I will be trying out other approaches: namely, switching out mmap for reading files directly into pre-allocated buffers, sending those buffers to the workers to hash directly, and then re-using the buffers to read in more data.
Currently `BTHandshaker` exports `mio::tcp::TcpStream` because that is the `TcpStream` implementation we use internally to make the handshake. We should instead convert completed-handshake `TcpStream`s into `std::net::TcpStream`, which would involve going through `net2` to re-bind the socket as a blocking socket.

When `mio` adds support for accessing the `SOCKET` handle on Windows, we can move the conversion code into `bip_util` and have implementations for both the Windows and unix conversions.
BEP 3 specifies that the infohash must be calculated from the raw representation of the dictionary as found in the source, not from round-tripping through the decoder and encoder. This requires the ability to find the offset and length of a particular value in the raw source buffer.
Similarly, BEP 44 requires extraction of the raw data for the `v` key.
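A hypothetical sketch of the offset tracking this requires: a scanner that walks the bencoded bytes and reports the raw byte range of a top-level dictionary value (e.g. `info`), which can then be hashed exactly as it appears in the source, with no decode/re-encode round trip:

```rust
/// Return the position just past the bencoded value starting at `pos`.
fn skip_value(buf: &[u8], pos: usize) -> Option<usize> {
    match *buf.get(pos)? {
        // Integer: i<digits>e
        b'i' => Some(buf[pos..].iter().position(|&b| b == b'e')? + pos + 1),
        // List or dict: l...e / d...e, skip contained values recursively.
        b'l' | b'd' => {
            let mut p = pos + 1;
            while *buf.get(p)? != b'e' {
                p = skip_value(buf, p)?;
            }
            Some(p + 1)
        }
        // String: <len>:<bytes>
        b'0'..=b'9' => {
            let colon = buf[pos..].iter().position(|&b| b == b':')? + pos;
            let len: usize = std::str::from_utf8(&buf[pos..colon]).ok()?.parse().ok()?;
            Some(colon + 1 + len)
        }
        _ => None,
    }
}

/// (start, end) byte range of the value for `key` in a top-level dictionary.
fn raw_value_range(buf: &[u8], key: &[u8]) -> Option<(usize, usize)> {
    if *buf.first()? != b'd' {
        return None;
    }
    let mut p = 1;
    while *buf.get(p)? != b'e' {
        let key_end = skip_value(buf, p)?; // keys are bencoded strings
        let colon = buf[p..key_end].iter().position(|&b| b == b':')? + p;
        let this_key = &buf[colon + 1..key_end];
        let val_end = skip_value(buf, key_end)?;
        if this_key == key {
            return Some((key_end, val_end));
        }
        p = val_end;
    }
    None
}

fn main() {
    let src = b"d8:announce3:url4:infod6:lengthi5e4:name1:aee";
    let (start, end) = raw_value_range(src, b"info").unwrap();
    // The raw info dictionary bytes, ready to be SHA-1 hashed for the infohash.
    assert_eq!(&src[start..end], b"d6:lengthi5e4:name1:ae");
}
```

A production decoder would more likely record these offsets while parsing rather than re-scan, but the range-over-the-source-buffer output is the piece BEP 3 and BEP 44 both need.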
Right now, we have `Extensions` in `bip_handshake` as well as built-in extension messages in `bip_peer`.

However, we should allow users to define their own extensions as part of the handshaking process, and subsequently send those messages using our `PeerManager` in `bip_peer` when the unioned `Extensions` struct indicates the peer supports such a message.
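On the handshake side, support checks reduce to bit tests on the 8 reserved bytes; for example, BEP 10's extension protocol is signalled by bit `0x10` of reserved byte 5, and a message may only be sent when both sides advertise the bit. A sketch with stand-in types (the real `Extensions` struct wraps this differently):

```rust
/// Reserved bytes exchanged during the BitTorrent handshake (8 bytes).
type Reserved = [u8; 8];

/// BEP 10 extension protocol support lives in bit 0x10 of reserved byte 5.
/// Combining both sides' bits (a bitwise AND of capabilities) tells us
/// whether an extension message may be sent to this peer.
fn both_support_extension_protocol(ours: Reserved, theirs: Reserved) -> bool {
    (ours[5] & theirs[5] & 0x10) != 0
}

fn main() {
    let mut ours = [0u8; 8];
    ours[5] |= 0x10;
    let mut theirs = [0u8; 8];
    theirs[5] |= 0x10;

    assert!(both_support_extension_protocol(ours, theirs));
    assert!(!both_support_extension_protocol(ours, [0u8; 8])); // peer lacks it
}
```

User-defined extensions would register which reserved bit they claim during handshaking, and `PeerManager` would gate outgoing custom messages on the same kind of combined-bits check.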
The `bip` ecosystem is meant to contain a modular set of crates that expose functionality and services to clients wishing to leverage BitTorrent infrastructure in their applications. We want to provide flexibility so that a client can painlessly integrate either a single crate or many crates from our ecosystem into their application. In the case of a single crate, that crate should provide a usable interface to clients; in the case of many crates, those crates should provide a unified interface to clients.

Because of this, we cannot afford to export a per-crate asynchronous or synchronous interface for clients to use, as that would force a specific architecture on our clients for the purposes of tying our API into their application.
Provide a generic interface that clients can use to accept callbacks from every peer discovery service that our ecosystem offers. This callback interface should accept at the bare minimum:
My proposal is to modify the current `Handshaker` to adhere to the following interface:
```rust
/// Handshaker for peer discovery services which may or may not contain request metadata.
trait Handshaker: Send {
    /// Type that the metadata will be passed back to the client as.
    type Envelope;

    /// PeerId exposed to peer discovery services.
    fn id(&self) -> PeerId;

    /// Port exposed to peer discovery services.
    fn port(&self) -> u16;

    /// Connect to the given address with the InfoHash, expecting the given PeerId.
    fn connect(&mut self, expected: Option<PeerId>, hash: InfoHash, addr: SocketAddr);

    /// Sets a new filter to filter requests based on an InfoHash and SocketAddr.
    fn filter(&mut self, filter: Box<Fn(InfoHash, SocketAddr) -> bool + Send>);

    /// Send the given metadata back to the client.
    fn metadata(&mut self, data: Self::Envelope);
}
```
`Handshaker` implementations would typically accept some channel that an `Envelope` can be sent over, and make sure that the result they yield in response to a `connect` is convertible to an `Envelope`. Similarly, when a client goes to use the `Handshaker` in a peer discovery service, the metadata returned from that service must also be convertible to an `Envelope`.
As an example, let's see how we would integrate a `BTHandshaker`, as well as one or more peer discovery services, with a `mio` event loop. A concrete `BTHandshaker` would accept a `mio` channel that can send `Envelope` types. The `BTHandshaker` impl would assert that the user has provided a `From` implementation for creating an `Envelope` from a `TcpStream`. The client then goes over to a `TrackerClient` and tries to create one using our `BTHandshaker` as the generic `Handshaker`. The `TrackerClient` impl would assert that the user has provided a `From` implementation for creating an `Envelope` from `SomeMetadata`. Similarly, for every peer discovery service the client uses, this would be enforced for the service-specific metadata.

For services which receive no metadata, a generic `Handshaker` would be accepted and no constraint would be put on the contained `Envelope`.

With this example, we can see how a `mio` event loop is now integrated with a number of peer discovery services and can accept both metadata and connections over a single channel.
- `From` impls for the initial `BTHandshaker` as well as the services it uses
- `Handshaker` makes _few_ assumptions about the underlying transport
- `Handshaker` is manipulated in a peer discovery service's own thread, so synchronous programming requires a bit more effort
- `BTHandshaker` requires users to wrap types such as `mio`'s `Sender`
We need message containers for common extension messages, as well as some standard extension protocol container, which itself can contain multiple extension protocols (along with their messages):

- `InfoDictionary` (http://www.bittorrent.org/beps/bep_0009.html)

Currently, we take a read lock on our `Filters` in the event loop, which is less than ideal. We would like some way to get a read lock on that structure which works well with our event loop.
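One assumed alternative (not necessarily the right fit here, and not the current design) is copy-on-write snapshots: the event loop grabs a cheap `Arc` clone instead of holding a read lock across its work, and writers swap in a whole new filter set:

```rust
use std::sync::{Arc, Mutex};

// Illustrative stand-in for the real Filters structure.
struct Filters {
    blocked_ports: Vec<u16>,
}

/// Readers never hold a lock across event-loop work; the mutex is held
/// only long enough to clone or replace the Arc.
struct SharedFilters {
    current: Mutex<Arc<Filters>>,
}

impl SharedFilters {
    fn snapshot(&self) -> Arc<Filters> {
        self.current.lock().unwrap().clone() // lock held only for the clone
    }

    fn replace(&self, next: Filters) {
        *self.current.lock().unwrap() = Arc::new(next);
    }
}

fn main() {
    let shared = SharedFilters {
        current: Mutex::new(Arc::new(Filters { blocked_ports: vec![] })),
    };

    let before = shared.snapshot();
    shared.replace(Filters { blocked_ports: vec![6881] });
    let after = shared.snapshot();

    // Old snapshots stay valid; new readers see the replacement.
    assert!(before.blocked_ports.is_empty());
    assert_eq!(after.blocked_ports, vec![6881]);
}
```

The trade-off is that updates rebuild the whole structure, which suits read-heavy filters; a lock-free pointer swap (e.g. an atomics-based Arc swap) would remove even the brief mutex hold.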
Any chance some examples of how to use the different crates together will be added?
When we serialize a `Bencode` object, we explicitly sort the keys before encoding them so that we follow the bencode specification.

Since then, we have derived `Debug` for `Bencode` and allowed `Bencode` building via macros. In the current design we do not sort or validate the data going through these two avenues, so we would have to sort, or verify the sorting of, dictionary keys every time we do a `Debug` print or build a `Bencode` object via macros. It will be significantly easier (and maybe more performant, depending on our usage) if we just switch from a `HashMap` to a `BTreeMap` in `Bencode::Dict`.
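A sketch of why the switch helps: `BTreeMap` iterates in sorted key order, so encoding becomes a plain in-order walk with no explicit sort step. This toy string-to-string encoder is not the real `Bencode` type; note the spec sorts keys as raw byte strings, which matches `str` ordering for ASCII keys like these:

```rust
use std::collections::BTreeMap;

/// Toy bencode encoder for a string-to-string dictionary.
fn bencode_dict(dict: &BTreeMap<&str, &str>) -> String {
    let mut out = String::from("d");
    // BTreeMap iteration is already in sorted key order, so the spec's
    // "keys must appear sorted" rule holds for free.
    for (key, value) in dict {
        out.push_str(&format!("{}:{}", key.len(), key));
        out.push_str(&format!("{}:{}", value.len(), value));
    }
    out.push('e');
    out
}

fn main() {
    let mut dict = BTreeMap::new();
    dict.insert("spam", "b"); // inserted out of order on purpose
    dict.insert("eggs", "a");
    // Keys come out sorted (eggs before spam) without any sort call.
    assert_eq!(bencode_dict(&dict), "d4:eggs1:a4:spam1:be");
}
```

The same property makes a derived `Debug` print deterministic and spec-ordered, which is exactly what the `HashMap` version cannot guarantee.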