redwood's Introduction

Redwood is an internet content-filtering program. It adds flexibility and granularity to the filtering by classifying sites into multiple categories instead of just “Allow” and “Block.”

Basic Architecture

Redwood runs as an HTTP proxy server. It examines each HTTP message to determine if it should be allowed to proceed. If so, it passes the message on to its final destination. If not, it replaces the message with a customizable page stating that the request is not allowed (optionally giving the reason and providing a link for filing an overblock request).

Redwood’s filtering is based on URLs and also, where applicable, on page content.

Configuration File

By default, the main configuration file is located at /etc/redwood/redwood.conf. This path can be changed by using the -c command line switch. Configuration options may be specified either in the configuration file or as command-line switches. In the configuration file, they may be specified either as key = value or as key value. Comments are delimited with #. Values may be enclosed in double quotes, with the usual backslash escapes. Additional configuration files may be included by using the include directive.

An example configuration file:

# Listen for connections on port 8000.
http-proxy :8000

# the template for the block page
blockpage "/etc/redwood/block.html"

# directory of static files to be served by the internal web server
static-files-dir /etc/redwood/static

# directory of CGI scripts to be run by the internal web server
cgi-bin /etc/redwood/cgi

# the directory containing the category information
categories /etc/redwood/categories

# the file containing the Access Control List configuration
acls /etc/redwood/acls.conf

# the minimum total score from a blocked category needed
# to block a page
threshold 275

# file configuring the content pruning
content-pruning /etc/redwood/pruning.conf

# file configuring URL query modification
query-changes /etc/redwood/safesearch.conf

# path to the access log
access-log /var/log/redwood/access.log

Categories

The configuration files for Redwood allow the user to establish any number of categories corresponding to the types of content that he wishes to block or to allow. As each HTTP message is processed, it is assigned a score in each category, based on the filter lists that are set up for that category. These scores are then used to determine whether the page should be blocked.

Each category is assigned an action: allow, block, or ignore. A page will be blocked if the score for any category listed as block is higher than the highest score for any category listed as allow. If a category is listed as ignore, its score does not affect whether a page is blocked or not. However, a page is not blocked unless the score for the highest block category is greater than a certain configurable threshold. This prevents overblocks of pages with almost no textual content.

A category's action may also be set to acl. Then the category is ignored in the process of finding the top-scoring category for the page, but it is available for ACLs to act on it, whenever the page's score in that category is greater than zero.
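The decision described above can be sketched in a few lines of Python. This is only an illustration of the documented rule, not Redwood's actual code: the function name and the dictionaries are hypothetical, and treating a score exactly equal to the threshold as "not blocked" is an assumption based on the wording "greater than."

```python
def should_block(scores, actions, threshold):
    """Sketch of the block/allow decision described in the text.

    scores:  dict mapping category name -> score for this page
    actions: dict mapping category name -> "allow", "block", "ignore", or "acl"
    Categories without a configured action default to "ignore".
    """
    block_scores = [s for c, s in scores.items() if actions.get(c, "ignore") == "block"]
    allow_scores = [s for c, s in scores.items() if actions.get(c, "ignore") == "allow"]
    top_block = max(block_scores, default=0)
    top_allow = max(allow_scores, default=0)
    # Assumption: the block score must strictly exceed both the threshold
    # and the best "allow" score.
    return top_block > threshold and top_block > top_allow
```

With a threshold of 275, a page scoring 300 in a block category and 100 in an allow category would be blocked, while the same page scoring 400 in the allow category would not.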

The categories are stored in a directory whose location is specified in the configuration file. Each subdirectory of that directory defines a category (with the same name as the directory).

Each category’s directory contains a file named category.conf and any number of rule-list files. A category named “mechanical” might have a category.conf file like the following:

description: Auto Repair
action: allow

This configuration would mean that the category’s user-visible description would be “Auto Repair” rather than “mechanical,” and that pages that fall into the category would be allowed. The description defaults to the category name, and the action defaults to ignore. Actions can be overridden for specific users by the use of ACLs. A category.conf file may also have the entry invisible: true; this indicates that when a page is blocked because it belongs to that category, the response will be an invisible image instead of the usual block page.

Rule Lists

The rule-list files define the rules used to calculate the category’s score. Each rule-list file must have an extension of .list. (This rule ensures that files ending in .bak, .orig, etc. are ignored.) It is a plain-text file encoded in UTF-8. Comments are delimited with #. Here is an example of a rule-list file that might be in the directory for the “mechanical” category mentioned earlier:

napaonline.com 200 # Give napaonline.com 200 points for this category.
www.napaonline.com/catalog/ 50 # bonus points for NAPA's catalog

default 150 # The following domains will each get 150 points.
carquest.com
autozone.com

/t[iy]re/ 75 # Any page with tire or tyre in the URL will get 75 points.
/parts/h 50 # A page with parts in the hostname will get 50 points

<grease gun> 25 # 25 points for each occurrence of "grease gun" in the content
<oil filter> 25 100 # 25 points for each occurrence, but no more than 100 total

%909841dcf4d4c000ff7f00fe30820000 100 # A hash of an image from napaonline.com

There are four kinds of filter rules:

  • URL matching

    A URL matching rule consists of a domain name, optionally followed by a path. After it, separated by a space, is the weight—the number of points that get added to this category’s score for sites that match the rule.

    A rule for a domain will also match subdomains: napaonline.com also matches www.napaonline.com. A rule with a path will also match longer paths: www.napaonline.com/catalog also matches www.napaonline.com/catalog/result.aspx.

    If a domain and a subdomain (or a path and a subdirectory) are both listed, the subdomain will effectively get the sum of the two weights. For example, if xerox.com were listed with 100 points, and support.xerox.com were listed with 50 points, support.xerox.com would actually get a score of 150 points.

    If the host in the URL is an IP address, it can be matched by an IP rule. An IP rule starts with ip: (with no space after the colon). Then it has an IP address or an IP address range in any of three forms: "10.1.10.0-10.1.10.255", "10.1.10.0-255", and "10.1.10.0/24".

  • URL regular expressions

    A regular expression to match the URL is listed between slashes. The points are added to the category score for each page whose URL matches the regular expression. The URL is converted to lower case before comparing it to the regular expressions. The regular expression syntax is that supported by the RE2 library.

    A regular expression can be restricted to matching a certain part of the URL by adding a one-character suffix immediately after the final slash. A suffix of h matches the hostname (e.g. www.google.com), d matches the base domain name (e.g. google), p matches the path, and q matches the query.

  • Content phrases

    Unlike the two kinds of rules above, these apply to the content of the page, not the URL. Phrases are enclosed between angle brackets. Before testing to see if a phrase matches, both the phrase and the page are simplified: capital letters are converted to lowercase, all characters that are not letters or digits are replaced by spaces, and multiple spaces are replaced by single spaces. Then the phrase weight is added to the page’s score for the category for each time the phrase is found on the page. But if the phrase has a second weight listed, no more than that amount will be added no matter how many times the phrase occurs. (In the example, if “oil filter” occurred more than four times, the additional occurrences wouldn’t count.)

    The content of the page is scanned for phrases only if phrase scanning is selected with the phrase-scan ACL action.

  • Image Hashes

    Redwood can hash images using the library at https://github.com/andybalholm/dhash. The rule consists of a percent sign (%) followed by the 32-character hash calculated by the dhash program. The hash may optionally be followed by a hyphen and a threshold, which is an integer specifying the number of bits that may be different for another hash to be considered to match this hash (this overrides the global dhash-threshold setting).

    Images are hashed only if hashing is selected with the hash-image ACL action.
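To illustrate how an image hash rule with an optional per-rule threshold could be compared against an image's hash, here is a minimal Python sketch. It is not the dhash library's code: the function names are hypothetical, and treating "number of bits that may be different" as an inclusive limit is an assumption.

```python
def parse_hash_rule(rule, default_threshold):
    """Split a rule like '%<32 hex digits>' or '%<32 hex digits>-<bits>'."""
    if not rule.startswith("%"):
        raise ValueError("image hash rules start with %")
    hex_part, separator, threshold = rule[1:].partition("-")
    if len(hex_part) != 32:
        raise ValueError("expected a 32-character hash")
    # A per-rule threshold, when present, overrides the global one.
    return int(hex_part, 16), int(threshold) if separator else default_threshold


def hamming_distance(a, b):
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


def hash_matches(rule, image_hash_hex, default_threshold):
    rule_hash, threshold = parse_hash_rule(rule, default_threshold)
    # Assumption: a difference of exactly `threshold` bits still matches.
    return hamming_distance(rule_hash, int(image_hash_hex, 16)) <= threshold
```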

There is also a default rule. It specifies what weight will be assigned to rules that don’t specify a weight. It applies to all rules without a specified weight between it and the next default rule or the end of the file. If there is no default rule, the default weight is zero.

Weights must be integers, but they may be negative. Negative weights can be used to offset short, general matches with long, more-specific ones, e.g.:

<grease> 10
<grease paint> -10
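The phrase simplification, per-phrase cap, and negative-weight offset described in this section can be sketched as follows. This is an illustration under stated assumptions rather than Redwood's implementation: the function names are hypothetical, occurrences are counted as plain non-overlapping substring matches, and the cap is only meaningful for positive weights.

```python
import re


def simplify(text):
    """Lowercase, turn non-alphanumeric characters into spaces, collapse spaces."""
    spaced = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return re.sub(r" +", " ", spaced).strip()


def phrase_score(page_text, rules):
    """rules: list of (phrase, weight, cap) tuples; cap is None when no maximum is given."""
    page = simplify(page_text)
    total = 0
    for phrase, weight, cap in rules:
        points = page.count(simplify(phrase)) * weight
        if cap is not None:
            points = min(points, cap)  # never add more than the cap for one phrase
        total += points
    return total
```

For a page whose only relevant text is "grease paint", the two rules above cancel out: the general phrase contributes 10 and the more specific phrase contributes -10.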

If a page is blocked based on its URL (i.e. by URL matching and/or URL regular expressions), its content will not be evaluated because the page will not be downloaded.
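As a recap of how URL matching rules accumulate weight (a domain rule also matches its subdomains, a path rule also matches longer paths, and overlapping rules are summed), here is a small Python sketch. The function name is hypothetical, the URL is assumed to be given without a scheme, and matching paths only at "/" boundaries is an assumption.

```python
def url_match_score(url, rules):
    """Sum the weights of every URL matching rule that applies to `url`.

    url:   host plus optional path, without a scheme, e.g. "support.xerox.com/drivers"
    rules: dict mapping "domain" or "domain/path" -> weight
    """
    rules = {rule.rstrip("/"): weight for rule, weight in rules.items()}
    host, _, path = url.partition("/")
    labels = host.lower().split(".")
    # The host itself and each parent domain: a.b.c -> a.b.c, b.c, c
    domains = [".".join(labels[i:]) for i in range(len(labels))]
    segments = [s for s in path.split("/") if s]
    # The empty path and each leading run of path segments: "", /x, /x/y
    paths = [""] + ["/" + "/".join(segments[:n]) for n in range(1, len(segments) + 1)]
    return sum(rules.get(d + p, 0) for d in domains for p in paths)
```

With xerox.com listed at 100 points and support.xerox.com at 50, this gives support.xerox.com a total of 150, as in the example earlier in this section.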

URL Lists

If you have a really large list of URL matching rules that should all have the same score, you can put them in a special URL list file to save memory. The list will be stored in a much more compact format than regular rules, but there will be some false positives—URLs that are detected as matching the list, even though they actually don’t.

URL list files have the extension .urllist. They can contain only URL matching rules. The individual rules can’t have scores assigned to them; instead the score is set at the start of the file with a line like

score 1000

Access Control Lists (ACLs)

Much of Redwood’s functionality is configured with Access Control Lists (ACLs). Each request is assigned a number of ACL tags, and then an action is chosen based on those tags. For example:

acl no-web user-ip 192.168.1.25
block no-web

The first line creates an ACL tag no-web, and assigns it to all requests coming from IP address 192.168.1.25. The second line causes all requests with that tag to be blocked.

ACLs are checked at several points during the processing of a request: before sending the request to the origin server, after receiving a response, and after scanning the content for phrases. Each time, the request may have different ACL tags, since more information is available. Each stage also has a different set of possible actions, although there is some overlap. (The allow, block, and block-invisible actions are always available.)

Any number of ACL files can be loaded with the acls directive in the configuration file. An ACL file can load other ACL files with a line that contains include and the filename.

Assigning ACL Tags

ACL tags are assigned by lines starting with acl. These lines have the format:

acl tag-name attribute values

The tag-name can be any name that does not include spaces. The attribute refers to some property of the request or response (listed below). The values are a space-separated list; if any of them matches the attribute’s value, the tag will be assigned. If there is more than one acl line with the same tag name, the tag will be applied if any of them matches (logical OR). An ACL may have a description associated with it with a line like describe tag-name A long description.

In addition to the tags assigned by acl lines, a request is assigned a tag for its highest-scoring category (if the score is above the threshold). There is also an ACL invalid-ssl, which is automatically assigned to CONNECT requests when the data being sent over the connection is not valid SSL or TLS. Another virtual ACL is transparent, which is assigned to TLS connections intercepted on the transparent-https port.

The following attributes are available:

  • connect-port

    (request only) The destination port of a CONNECT request. This attribute never matches if the request method is not CONNECT.

  • content-type

    (response only) The response’s media type, usually taken from the Content-Type header. This can also be a generic type, with an asterisk after the slash:

      acl images content-type image/*
    
  • http-status

    (response only) The response's HTTP status code. If this is a multiple of 100, all status codes in that block of 100 will match.

  • method

    The HTTP request method, such as GET or POST.

  • referer

    The request’s Referer header. (This matches the same way as regular URL matching rules.)

  • referer-category

    A category in which the request’s Referer header has a positive score. (Note that this is just based on a positive score, not on the top category.)

  • server-ip

    The server’s IP address, or a range of addresses (in CIDR format, or with a dash). This attribute only matches if the request URL contains a literal IP address; it does not do a DNS lookup.

      acl google server-ip 172.217.0.0/16
    
  • time

    The current time.

      acl work-hours time MTWHF 9:00-17:00
    

    This attribute lets you select certain days of the week and/or ranges of times of the day. If the days of the week are specified, they must come first; they are abbreviated SMTWHFA. Any number of time ranges may be specified; the rule will match if the current time falls within any of them. Times must be in 24-hour format.

  • url

    The URL requested. (This matches the same way as regular URL matching rules.)

  • user-agent

    The User-Agent header. Instead of interpreting the remainder of the line as a list of values, this attribute interprets it as a single regular expression to be matched against the User-Agent string. The matching is case-insensitive.

  • user-ip

    The user’s IP address, or a range of addresses (in CIDR format, or with a dash).

      acl managers user-ip 10.0.2.5 10.0.1.0/24 10.0.2.18-25
    
  • user-name

    The username from HTTP proxy authentication.
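The address forms accepted by user-ip and server-ip (a single address, CIDR notation, or a dash range, including the short form that gives only the last octet) can be parsed along these lines. This sketch uses Python's standard ipaddress module; the function names are hypothetical, and it handles IPv4 only.

```python
import ipaddress


def parse_range(spec):
    """Return (first, last) as integers for one address or range specification."""
    if "/" in spec:
        network = ipaddress.ip_network(spec, strict=False)
        return int(network.network_address), int(network.broadcast_address)
    if "-" in spec:
        start_text, end_text = spec.split("-", 1)
        if "." not in end_text:
            # Short form such as 10.0.2.18-25: reuse the first three octets.
            end_text = start_text.rsplit(".", 1)[0] + "." + end_text
        return int(ipaddress.ip_address(start_text)), int(ipaddress.ip_address(end_text))
    address = int(ipaddress.ip_address(spec))
    return address, address


def ip_matches(ip, specs):
    """True if `ip` falls inside any of the listed addresses or ranges."""
    value = int(ipaddress.ip_address(ip))
    return any(low <= value <= high for low, high in map(parse_range, specs))
```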

ACL Actions

After the ACL tags are assigned, Redwood goes through the ACL files looking for an action to perform. An action will be selected only if the request has all the tags specified in the action line. (And none of the negated tags; if a tag in an action line is preceded by an exclamation point, the request must not have that tag.) Since it goes through the files in order, earlier action lines take precedence over later ones. If it gets to the end of the file without finding a matching rule, it will use the default action of the highest-scoring category. If there is no category that scores over the threshold, the default action is allow.

An ACL action line may optionally have a description string at the end. This is a double-quoted string whose value will be available to the block page template as {{.RuleDescription}}.

  • allow

    Allow the request to proceed.

  • block

    Respond with an HTTP status code of 403, and send the standard block page.

  • block-invisible

    Respond with HTTP 403, and send an invisible 1-pixel image instead of a block page.

  • disable-proxy-headers

    Don't add headers that indicate that the request has passed through a proxy (X-Forwarded-For and Via).

  • hash-image

    (response only) Calculate a hash of the image, and compare it to the hash rules. If the difference between this hash and the one in the rule is less than the number of bits specified with --dhash-threshold (or the hash's individual threshold), it matches.

    This action should only be applied when the content is an image:

      acl image content-type image/jpeg image/gif image/png
      hash-image image
    
  • ignore-category

    Drop the highest-scoring category off the list of categories, and go through the ACL files again.

  • log-content

    Log the page's content. The content-log-dir configuration directive must be set. The page's content will be saved in that directory, with its MD5 hash as the filename. A line will be added to index.csv in that directory, linking the page's URL to its MD5 hash.

  • phrase-scan

    (response only) Run a phrase scan on the page content. Normally this will be configured to depend on the content type:

      acl text content-type text/* application/xhtml+xml
      acl css content-type text/css
      phrase-scan text !css
    
  • require-auth

    (request only) Send an HTTP 407 response if the request doesn’t have a Proxy-Authorization header.

  • ssl-bump

    (CONNECT requests only) Activate the SSLBump feature, to filter HTTPS connections. (Transparently intercepted HTTPS connections produce a virtual CONNECT request inside Redwood, so they can be filtered too.)
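The way an action line is selected (all listed tags required, negated tags forbidden, first matching line wins) can be sketched as below. This is a simplified illustration, not Redwood's code: the function name and data layout are hypothetical, and the fallback to the top category's action is reduced to a single default_action parameter.

```python
def choose_action(request_tags, action_lines, default_action="allow"):
    """Pick the first action line whose tag conditions the request satisfies.

    request_tags: set of ACL tags assigned to the request
    action_lines: list of (action, [tags]) in file order; "!tag" means "must not have tag"
    """
    for action, conditions in action_lines:
        satisfied = all(
            (tag[1:] not in request_tags) if tag.startswith("!") else (tag in request_tags)
            for tag in conditions
        )
        if satisfied:
            return action  # earlier lines take precedence over later ones
    return default_action
```

For example, with the lines "allow managers" followed by "block no-web", a request tagged with both managers and no-web is allowed, because the allow line comes first.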

URL Query Modification

When processing an HTTP request, Redwood can modify the query parameters in the URL. The configuration file for these changes is specified with the query-changes keyword. Each line contains a URL-matching or URL-regular-expression rule, followed by a query expression. If the query in the URL already contains parameters with the same names as those specified in the file, they will be replaced with the new values. Otherwise the new values will be added.

# Force safe search on several search engines.
/www\.google\.[^/]+/search/ safe=vss
search.lycos.com adv=1&adf=on
search.yahoo.com vm=r
/hotbot/h adf=on
www.metacrawler.com familyfilter=1
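The replace-or-add behavior for query parameters can be illustrated with Python's urllib.parse. This is a sketch of the documented behavior only; the function name is hypothetical, and placing the forced parameters at the end of the query string is an assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def apply_query_changes(url, changes):
    """Force the parameters in `changes` (e.g. "adv=1&adf=on") onto `url`."""
    parts = urlsplit(url)
    forced = parse_qsl(changes, keep_blank_values=True)
    forced_names = {name for name, _ in forced}
    # Drop any existing parameter that is about to be replaced, keep the rest.
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if name not in forced_names]
    return urlunsplit(parts._replace(query=urlencode(kept + forced)))
```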

Content Pruning

Between downloading a page and scanning its content for phrases, Redwood can perform “content pruning.” This is scanning the parsed HTML tree for elements matching certain criteria, and deleting those elements and their children.

Content pruning is controlled by a configuration file. Each line of the file contains a URL-matching or URL-regular-expression rule to specify what site or page the pruning applies to, and a CSS selector to specify what elements to delete. Between the two, there may be a threshold value. If a threshold is specified, the element and its children are deleted after the page is phrase-scanned if the score from the phrases found in a blocked category is at least the threshold.

# Craigslist personals and discussion forums
craigslist.org div#ppp, div#forums, option[value=ppp]

# Bing ad sidebar
bing.com div.sb_adsNv2

# Delete questionable forum topics.
talk.newagtalk.com/forums 50 td.messagecellbody > ul

Block Pages

When Redwood blocks access to a web page, it returns an HTTP response with a status of 403 Forbidden. Unless the category that caused the page to be blocked is configured as invisible, the body of the 403 response will be HTML rendered from a template file. The template file is specified with the blockpage configuration directive. The following placeholders may be used in the template file, to be replaced by the appropriate information when the block page is sent:

  • {{.URL}}

    the URL of the page that was blocked

  • {{.Categories}}

    the names of the categories that caused the page to be blocked

  • {{.Conditions}}

    the conditions of the ACL rule that caused the page to be blocked

  • {{.User}}

    the user’s IP address or username

  • {{.Tally}}

    a list of the rules that matched, and how many times each one matched

  • {{.Scores}}

    a list of categories, and how many points the page scored in each category

The block page is generated using the Go template package; see http://golang.org/pkg/text/template and http://golang.org/pkg/html/template for documentation.

There is one custom function defined for the templates to use, eq, which tests its parameters for equality.

Virtual Web Servers

Since the block page may need to refer to external resources (such as images, stylesheets, and scripts), Redwood includes an internal web server. This web server does not accept connections directly, but whenever Redwood processes a request with a server address of 203.0.113.1, it directs the request to the internal server instead of processing it normally. The content of the internal web server is configured with the static-files-dir and cgi-bin directives.

If a more advanced virtual server is needed, you can use the virtual-host directive to transparently redirect requests for a given hostname to a different address, such as an Apache web server running on your gateway. If the server is running on your gateway, listening on port 8888, and you want it to be available as myserver.local, use virtual-host myserver.local localhost:8888. (Note: if proxy settings are not set on the client and Redwood is intercepting requests transparently, this will work only if the DNS server resolves the name to an IP address outside your local network. Any IP address will do, though. OpenDNS’s website-unavailable address works fine. Also note that virtual-host only works with HTTP, not with HTTPS.)

Test Mode

If Redwood is run with the -test switch, it does not run as a proxy server. Instead, it evaluates the URL given as an argument after the switch. It prints detailed debugging information about how the URL and its content would be rated if that page were requested in normal operation: how many times each rule matches, what the score is in each category, which categories would block the page, etc.

Log Files

Redwood has several categories of messages that can be logged:

General diagnostic messages are sent to standard error by default, and may be redirected to a file using normal shell redirection.

The access log has a line for each request processed. It is in CSV format and goes to standard output by default. It can be sent to a file by including the access-log directive in Redwood’s configuration file. The access log has the following fields: time, username or IP address, action (allow or block), URL, HTTP method (GET, PUT, etc.), HTTP response status (if an HTTP response was being processed), content type, content-length, whether the content was modified by Redwood, which rules matched (and how many times), the score for each category, the list of categories that caused the page to be blocked (if it was), the page title (if log-title is enabled), a list of the categories that were ignored even though they had higher scores than the one that determined the action, the User-Agent header (if log-user-agent is enabled), the HTTP version, the Referer header, the client platform (such as Windows or iPad, found in the User-Agent header), the filename from the Content-Disposition header (for downloaded files), the virus-scan result, the rule’s description, and the client’s IP address. The content length is meaningful only if a phrase scan was performed. The page title is available only if a phrase scan was performed and log-title was enabled in the configuration (logging the page title requires parsing the HTML, so it is disabled by default).
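Because the access log is CSV, it can be summarized with a few lines of Python. The sketch below counts blocked requests per user. The field labels are hypothetical names chosen to follow the order listed above, and the assumption that the action column contains the literal word "block" comes from the description "action (allow or block)".

```python
import csv
import io
from collections import Counter

# Hypothetical labels, in the order the fields are listed in the text above.
FIELDS = [
    "time", "user", "action", "url", "method", "status", "content_type",
    "content_length", "modified", "rules", "scores", "categories", "title",
    "ignored_categories", "user_agent", "http_version", "referer", "platform",
    "filename", "virus_scan", "rule_description", "client_ip",
]


def blocked_per_user(log_text):
    """Count blocked requests per username or IP address in access-log text."""
    counts = Counter()
    for row in csv.reader(io.StringIO(log_text)):
        record = dict(zip(FIELDS, row))
        if record.get("action") == "block":
            counts[record["user"]] += 1
    return counts
```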

The TLS log has a line for each HTTPS connection that was intercepted. Like the access log, it goes to standard output by default, and it can be sent to a file with the tls-log directive. The TLS log has the following fields: time, username or client IP address, server name, server address, any error that was encountered, and whether the certificate used came from the certificate cache.

The Auth log has a line for each authentication event. Like the other logs, it goes to standard output by default, and it can be sent to a specific file with the auth-log directive. The Auth log has the following fields: time, auth status, auth type, remote IP address, client port, username, password, device platform, remote network, user agent, and a message explaining the auth event.

Authentication

Redwood can be configured (using the require-auth ACL action) to require HTTP basic proxy authentication, with a username and password. The usernames and passwords can come from a file that is specified by the --password-file configuration directive. Each line in the file consists of a username, a password, and some optional items, separated by spaces or tabs. Alternatively, a program can be specified to perform authentication with --authenticator. The program will be invoked with the username and password as command-line arguments. Its exit status determines whether the authentication is successful; if the exit status is zero, the user will be accepted.

The optional items in the password file are for setting up a custom proxy port for that individual user, to make authentication easier. The first optional item is a port number; if it is present, Redwood will listen for HTTP requests on that port. Only the specified user may use that port, but once a client has authenticated as that user, all further requests from that IP address to this port will be considered authenticated, whether they have the Proxy-Authorization header or not.

In addition, you can set up automatic authentication based on the device platform and the network the device is on. For example, to automatically authenticate an iPad on the Verizon network, you could have the following line in the password file:

ipad-user mySecurePassword 7500 iPad myvzw.com

Redwood uses the HTTP User-Agent string to determine a device's platform. The currently recognized platforms are Windows, Linux, Android, Macintosh, iPhone, iPad, and iPod.

The network can be specified as an IP address range in CIDR notation (70.192.0.0/11) or as a domain name to be compared to an IP address's reverse DNS entry. Multiple networks can be specified, separated by commas. If a device successfully authenticates (using the username and password) from a network that is not on the list, that network will be added to the list of expected networks.
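The structure of a password-file line can be made concrete with a small parser. This is a sketch only: the function name and dictionary keys are hypothetical, and the order of the optional items (port, then platform, then a comma-separated network list) is inferred from the example line above.

```python
def parse_password_line(line):
    """Parse one password-file line into its fields (items split on spaces or tabs)."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("a username and a password are required")
    entry = {
        "username": fields[0],
        "password": fields[1],
        "port": int(fields[2]) if len(fields) > 2 else None,
        "platform": fields[3] if len(fields) > 3 else None,
        # Multiple networks are separated by commas.
        "networks": fields[4].split(",") if len(fields) > 4 else [],
    }
    return entry
```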

You can configure certain IP addresses (normally LAN addresses) to be pre-authenticated as specific users by specifying a mapping file with the ip-to-user option. The file must be formatted like this:

192.168.1.66 joe_pc
192.168.1.87 fred_pc

In this example, requests coming from 192.168.1.66 would be automatically authenticated as joe_pc, and requests coming from 192.168.1.87 would be authenticated as fred_pc.

See the Log Files section above for logging of Authentication events.

SSLBump

Redwood can be configured (using the ssl-bump ACL action) to perform Man-in-the-Middle filtering of HTTPS traffic. This feature is called SSLBump after the corresponding feature in Squid.

For SSLBump to work, Redwood must be configured with a root certificate that is trusted by the users’ browsers. Paths to the certificate and its private key are specified with the tls-cert and tls-key options. The certificate and key should be in PEM format.

Redwood uses the system root certificates to verify the identity of the sites it bumps. Other trusted root certificates can be specified with the trusted-root option.

The SSLBump feature only works with SSL version 3 and newer (including all TLS versions). By default, earlier versions are passed through unfiltered. It can be configured to block them instead with the block-obsolete-ssl option.

Transparent Proxy

With the proper firewall setup, Redwood can transparently intercept connections to web servers and filter them without needing to configure proxy settings on the client computers. Intercepted HTTP connections can use the same proxy port as is used for manually-configured proxy connections. For HTTPS connections, Redwood must be configured to listen for intercepted connections on a separate port, with the transparent-https directive.

The following configuration lines will set Redwood to listen on ports 6502 and 6510:

http-proxy :6502
transparent-https :6510

If Redwood is running on a Linux gateway/router system, the following iptables rules will enable transparent filtering for the computers on the LAN (assuming that the LAN interface is eth1):

iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 -j REDIRECT --to-ports 6502
iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 443 -j REDIRECT --to-ports 6510

To do the same thing with pf on a FreeBSD gateway (assuming that the LAN interface is re0):

table <filtered> { re0:network }
rdr pass inet proto tcp from <filtered> to any port 80 -> re0 port 6502
rdr pass inet proto tcp from <filtered> to any port 443 -> re0 port 6510

Classification Service

In addition to running as a proxy, Redwood can also be used as a URL classification service. It receives an HTTP request specifying a URL, and returns a JSON object that tells what categories it was classified in, and the score for each category.

Categories can be excluded from the classification reports:

classifier-ignore sslbump
classifier-ignore masterwhitelist

The classification request must have the URL as an HTTP form parameter named "url." The JSON object in the response has the following keys:

  • url: the URL being classified
  • categories: an object with category names for keys, and their scores for values
  • error: any error that was encountered fetching or processing the page

For example, if Redwood is running on port 6502 on 10.1.10.1, http://10.1.10.1:6502/classify?url=https%3A%2F%2Fgolang.org might return {"url":"https://golang.org","categories":{"computer":266}}.

It can also classify text directly. http://10.1.10.1:6502/classify-text?text=programming+language might return {"text":"programming language","categories":{"computer":27}}.
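A client for the classification service might look like the following Python sketch, which uses only the standard library. The function names are hypothetical, the base address is the one from the example above, and picking the single highest-scoring category is just one way to use the response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def classify_request_url(base, url):
    """Build the /classify request URL, passing the target as the "url" parameter."""
    return base + "/classify?" + urlencode({"url": url})


def top_category(response_text):
    """Return (category, score) for the highest score, or None if no categories."""
    result = json.loads(response_text)
    if result.get("error"):
        raise RuntimeError(result["error"])
    categories = result.get("categories") or {}
    if not categories:
        return None
    return max(categories.items(), key=lambda item: item[1])


def classify(base, url):
    """Query a running Redwood instance (requires network access to it)."""
    with urlopen(classify_request_url(base, url)) as response:
        return top_category(response.read().decode("utf-8"))
```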

PAC Files

Redwood can provide PAC (Proxy Auto-Configuration) files to automatically configure client computers to use it as their proxy. Then, whenever Redwood receives a request for /proxy.pac, it sends a PAC file directing the client to proxy its requests.

PAC files also make another feature possible: listening on separate, pre-authenticated ports for individual users. This helps to address various authentication problems resulting from software that doesn't support proxy authentication properly. To enable this, specify a custom port number on the user's line in the password file. Then, put a base64-encoded username/password pair (just like in an HTTP basic authentication header) in the PAC request URL (e.g. /proxy.pac?a=dXNlcm5hbWU6cGFzc3dvcmQ=). (Generate it by typing a command like echo -n username:password | base64 at a UNIX command prompt.) All requests received on that port from the same IP address as the PAC file request will be automatically authenticated as that user.
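The base64 token for the PAC request URL can also be generated programmatically. The sketch below is equivalent to the echo/base64 command mentioned above; the function name is hypothetical.

```python
import base64


def pac_request_path(username, password):
    """Return the PAC request path carrying base64-encoded "username:password"."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "/proxy.pac?a=" + token
```

For the literal credentials username and password, this produces the path shown in the example above.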

For devices that require an HTTPS PAC URL, a reverse proxy can be placed in front of Redwood to handle the TLS termination. The reverse proxy configuration must set the X-Forwarded-For header so that Redwood can authenticate the correct IP address. Also, the X-Forwarded-Host header should be set so that the PAC file returns the correct hostname and default proxy port.

# Nginx example location block
location /proxy.pac {
    proxy_pass http://127.0.0.1:6502/proxy.pac;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Host $host:6502;
}

Scripting

To further customize its behavior, Redwood lets you define scripts that are run at various points as it processes a request. The original JavaScript scripting is described below. The newer Starlark scripting is described in the file starlark.md.

For more specialized rules than the ones that are supported by normal ACLs, you can write scripts (in JavaScript) that assign ACLs to requests. For example, suppose you want to block the sites that OpenDNS classifies as adult or phishing sites, but you want to have your own block page instead of the one OpenDNS provides. You could put a script like this in /etc/redwood/opendns.js:

var openDNSResult = lookupHost(request.URL.Host, "208.67.222.123");

if (openDNSResult == "146.112.61.106") {
	addACL("opendns-adult");
} else if (openDNSResult == "146.112.61.108") {
	addACL("opendns-phishing");
}

Put the following line in redwood.conf:

request-acl-script /etc/redwood/opendns.js

And in your ACL configuration file:

block opendns-adult
block opendns-phishing

redwood's Issues

Tagging versions

Currently there aren't any official version tags for the software.
If it were possible to give this revision a version number, that would be nice.
@andybalholm If you have a list of the things the software is meant to do, progress toward v1.0 could be expressed as a percentage of that list, from 0% to 100%.

What do you think?

Update from config files

Hey,

Can we reload updated config files without restarting the application (like reconfigure in Squid)?

HTTP/2 is being used for all bumped HTTP/1.1 servers, which breaks connections and protocol

In my tests I have a CentOS 7 server that runs Cockpit on port 9090.
I am using Redwood with ssl-bump for all connections as a plain HTTP proxy.
I have trouble accessing the Cockpit web interface, which is based on WebSockets.
The basic issue is that the connection is bumped blindly into HTTP/2 while the remote server speaks HTTP/1.1.
For many services this works fine, but for WebSocket (wss://) and a couple of other security features the connection breaks.
I do not remember in which project I saw the source code, but it is doable to check the remote server's TLS/HTTP support before forcing the client into HTTP/2.

What do you think @andybalholm?

Content phrases support multiple words

Hey,

For now you support word matching like this (a page needs a score of 100 to be blocked):

<game>  10
<sport>  10

If the word "game" appears 5 times in a page, all is OK.
If the word "sport" appears 2 times in another page, all is OK.
Now, if a page contains both "game" and "sport" (in different places), I want to block it.
How can I do that?
This is not supported:

<game>,<sport> 150

Thanks

Error in the README example of blockpage value

For quite a while I couldn't find what was causing my setup to generate a bad block page,
i.e. empty pages.
In the README the example is:
blockpage "/etc/redwood/block.html"

but it only works like this:
blockpage /etc/redwood/block.html

Without the double quotes.

How do I define ssl-bump exception rules for a big set of domains?

In Squid I have the option to use a regex for a single domain or for multiple domains, i.e. .microsoft.com will match both microsoft.com and all its subdomains.
I am also missing a destination IP ACL; is it possible to add an option for that in some way?
Is there such an option in Redwood?
(I am willing to put some effort into it.)

HTTPS

Hi,
I am having trouble configuring redwood for https.
For every https request the logs show:
2017-09-28 19:11:27,192.168.1.59,,,"error reading client hello: expected content type of 22, got 67",
What is content type 67? Any ideas?

Thanks
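
One possible reading of this log line, offered as a sketch rather than a confirmed diagnosis: the first byte of a TLS record is its content type, and 22 denotes a handshake record. Byte 67 is the ASCII letter "C", which is what a plaintext request such as CONNECT would start with, so the client may be speaking plain HTTP proxy protocol to a port that expects TLS directly. The helper name describeFirstByte is hypothetical and not part of Redwood.

```javascript
// TLS record content type for handshake messages.
var TLS_HANDSHAKE = 22;

// describeFirstByte is a hypothetical helper that explains what the first
// byte received on a TLS port most likely means.
function describeFirstByte(b) {
	if (b === TLS_HANDSHAKE) {
		return "TLS handshake record";
	}
	return "not a TLS handshake; byte is ASCII " +
		JSON.stringify(String.fromCharCode(b));
}

console.log(describeFirstByte(22)); // prints: TLS handshake record
console.log(describeFirstByte(67)); // prints: not a TLS handshake; byte is ASCII "C"
```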

Not an ICAP server

The description says "Web content filter that runs as an ICAP server" but 
looking at the code this actually runs as a proxy, not an ICAP server.  

Original issue reported on code.google.com by [email protected] on 16 Aug 2014 at 4:16

santanderbank login issue

I can't log in to santanderbank.com through the Redwood proxy. I don't get an error; after I enter my password it just takes me back to the home page. I confirmed that I can log in without the proxy.

I tried with a minimal Redwood setup.

redwood.conf:

http-proxy :6502
acls /etc/redwood/acls.conf
tls-cert /etc/redwood/root.pem
tls-key /etc/redwood/root_key.pem

acls.conf:

acl connect method CONNECT
ssl-bump connect

Just reporting the issue.

Thanks

Content injection

Hi,
I'm thinking about adding content injection to this project. Is there a technical reason why the author doesn't include content injection, or is it just not within the author's use case?

Thanks

Build and install instructions?

Greetings,
I am new to golang; could you provide a set of build instructions?

I tried the following:

export GOPATH=/path/to/git-checkout-of/redwood/
go get ./

Then I got:
go install: no install location for directory /home/Code-Work/redwood outside GOPATH
For more details see: go help gopath

Not sure how to proceed. Please advise.

Replace Content inside scanned Page

Hey,

Is there a way to replace HTML code inside a scanned page?
For now we have "pruning.conf", which can only remove content from a page.
For example, can it work like DansGuardian (contentregexplist):

-> <script language='javascript'>....some code.... Or: .*car -> new Text

Enhancement: Source ip acl action

We would like to specify the source ip address used for outbound requests in the ACL, so we are able to distinguish internal systems. Would that be an enhancement worth considering?

example:

acl users 10.0.0.0/24
acl managers 10.0.1.0/24
source 172.217.22.110 users
source 172.217.22.111 managers

Server buffer size definition: does it exist?

Hey @andybalholm,
I have been using Squid-Cache and Redwood for quite some time, and I would like to know if something like Squid-Cache's read_ahead_gap exists in Redwood.
I use it because I have a server in an unrestricted, unlimited-traffic zone in a server farm, and a DSL line connected to this server farm.
When I contact many sites directly, I get stuck because of a per-client connection limit and also because of a QoS system for the DSL clients.
But the connection to the server farm, which terminates the DSL PPPoE connection, is not restricted by any means; I can utilize 100% of this segment.
With Squid-Cache I am using "read_ahead_gap 16MB" and the server does the heavy lifting for me.
Does something like this exist, or can it be added to Redwood?

dns server

Is there a simple way to set the DNS server that Redwood should use?
Thanks

Integration of drbl-peer into redwood

Hey,
I wrote https://github.com/elico/drbl-peer and I want to add Redwood support for it.
I can try to contribute the code for it via a fork and then a pull request, but I will need a couple of pointers.
The basic setup is based on a single text file that contains the relevant details.
Caching can be done with a caching DNS server and/or an HTTP caching proxy, so there is no need to implement caching of results inside Redwood.

I think that a simple "if the file exists, then use the function with the file's settings" would be the most appropriate approach.
The drbl peers list can contain a custom DB or a public one like OpenDNS or Symantec.

Captive Portal Redirect

I have noticed that when I connect to a network with a captive portal I do not get a splash page allowing me to authenticate and am thus unable to browse the web. This happens on my iPhone as well as my laptop. I am aware that this is not necessarily an issue with Redwood but with proxies in general and was wondering if you knew of a solution.

android app ssl certificate

I am getting this type of error when some android apps try to connect with the proxy.
2017-10-11 00:28:53,192.168.1.90,slack.com,slack.com:443,error in handshake with client: remote error: tls: unknown certificate,cached certificate
Does this mean the app is not trusting my CA? i.e. it has a pinned CA?
Thanks

Craigslist.org very slow to load

I have noticed that whenever I access Craigslist through Redwood, it takes a really long time: over 2 minutes to fully load a page. The issue appears to be related to loading the JavaScript and CSS assets.

I am attaching a screenshot from developer tools to show what I mean.

[Screenshot: craigslist]

What can I do to get around this?

iptables rules

What are the iptables rules for a Linux router to use a Redwood proxy running on a separate machine on the local network?
Thanks

Per user content pruning?

Is it possible to do per user content pruning? Let's say I have a user that wants to have all images removed from a number of sites. Is there a way that I can do this and apply it to a specific user or group of users or is there only a global policy?

Exclude some Https sites from ssl_bump

Hi,

Something in my ACL config file is not working.
When I try this (in acl.conf):

acl connect method CONNECT
acl nobump url dk.com
ssl-bump connect !nobump

everything works well (I get the real dk.com certificate).

But this does not work:

acl connect method CONNECT
acl nobump url /dk/h
ssl-bump connect !nobump

Why can't I use URL regular expressions?
How can I put the sites that need to be excluded from ssl-bump in a list file in the categories dir (I tried it without success)?

Redirect all outgoing traffic through redwood

I want to setup redwood so that it transparently filters all the outgoing traffic on my mac.

For my attempt, I copied the default configuration and (I hope correctly) created the root certificate and key. After this, redwood was run as user nobody and the firewall (PF) was configured to redirect all outgoing traffic not arising from user nobody to the redwood ports.

This configuration works for http, but for each https request the redwood logs show something like:

2018-03-14 02:34:03.817397,192.168.1.6,,127.0.0.1:6510,infinite redirect loop

Here is the PF configuration file I am using.

rdr pass inet proto tcp from any to any port = 80 -> 127.0.0.1 port 6502
rdr pass inet proto tcp from any to any port = 443 -> 127.0.0.1 port 6510
pass out route-to (lo0 127.0.0.1) inet proto tcp from any to any port = 80 user != nobody 
pass out route-to (lo0 127.0.0.1) inet proto tcp from any to any port = 443 user != nobody

These config rules use a workaround mentioned here because the redirect command only applies to incoming traffic, but that may or may not be very relevant here.

Even without going into the specific details, would such an approach work?

EDIT: By the way, awesome project :)

pruning elements

Quick question, my /etc/redwood/pruning.conf file has this line but the "More information..." link still shows up when I browse to http://example.com/

example.com a

Thoughts?

iptables typo?

In your README iptables command, I believe it should be

iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 -j REDIRECT --to-ports 6502
iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 443 -j REDIRECT --to-ports 6510

URL based filtering

We are looking for a proxy to filter outbound connections from our servers based on a URL whitelist. We thought about running a combination of iptables and Redsocks on our servers, which redirects outbound connections to a proxy for filtering (as done by Spotify).

This still poses problems as adding something like github.com to the whitelist exposes quite a lot of content to those servers. We would like to filter this based upon the requested URL such as https://github.com/andybalholm/redwood.*. Especially for bumped TLS traffic, as most of the destinations are encrypted already.

Is this something which is possible with redwood or worth considering as a future feature?

Regex support for negative lookbehind/lookahead

Hi,

It seems that the regex implementation doesn't support negative lookbehind/lookahead syntax.

See here for more info:
Regex Tutorial - Lookahead and Lookbehind Zero-Length Assertions

Below an example of how I would like to use it (the ?! part is the negative lookahead):

  • ^(.*\.)*((?!font[s]*).)*\.googleapis\.com$
  • Meaning whitelist googleapis.com domain, except if it is fonts.googleapis.com

This would make it easier to block just one subdomain instead of whitelisting many, if not all, of the possible subdomains.

It is not a deal-breaker but makes it easier to whitelist stuff from a regex perspective.

See some regex examples here I use:
https://github.com/cbuijs/accomplist/blob/master/chris/regex.white
https://github.com/cbuijs/accomplist/blob/master/chris/tld-black.regex

The second one is interesting as it blocks anything except valid TLDs registered by IANA (it is a blacklist as it negates).

Cheers,
-Chris
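
Where the regex engine has no lookaround (Go's regexp package follows RE2 syntax and does not support it; that Redwood relies on that package is an assumption here), the same effect can be had by pairing a broad allow pattern with a narrow deny pattern. The sketch below only illustrates that logic. The function name isWhitelisted and both patterns are illustrative, and how the two rules would be expressed in Redwood's category files is not shown.

```javascript
// Broad rule: googleapis.com and any of its subdomains.
var allow = /^(.*\.)?googleapis\.com$/;
// Narrow exception: font.googleapis.com or fonts.googleapis.com,
// including anything below them.
var deny = /^(.*\.)?fonts?\.googleapis\.com$/;

// isWhitelisted is a hypothetical helper: a host passes only if it matches
// the allow pattern and does not match the deny pattern.
function isWhitelisted(host) {
	return allow.test(host) && !deny.test(host);
}

console.log(isWhitelisted("maps.googleapis.com"));  // prints: true
console.log(isWhitelisted("fonts.googleapis.com")); // prints: false
```

Neither pattern needs a lookahead, so both stay within what an RE2-style engine accepts.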

YouTube Restrict

Hey,

To enforce Google SafeSearch I added this line to the safesearch.conf file:

/google/d safe=active

and it is working very well.

How can I do safe search for YouTube, which says to add this:

YouTube-Restrict: Strict

in the header (not at the end of the URL)?

YouTube urls and links classification

Part of the logic of my classification DB, SquidBlocker, is YouTube-related.
To allow blocking URLs based on external classification pages, I will write a daemon that receives YouTube URLs and returns the weight of a category inside a JSON response.

Before getting to YouTube, I will write a tiny classification service that uses the drbl-peer (https://github.com/elico/drbl-peer/) library and only checks for malware and offensive or abusive content (i.e. porn, nudity and violence).

128 is the test for both abusive and malware content(phishing is considered abusive) while not testing for other categories.

Let me know if it sounds right.

Blocking categories for specific users

I'm a little confused as to how you can block certain categories of sites based on the username. For example, I would like to create a policy that blocks all image searches by certain users while allowing them for others. I tried changing the action to acl under categories/image-search, but how can I apply this to a specific username?

iCloud block on iOS

I setup Redwood on an ubuntu server and it works on my devices, including iOS, but I'm receiving a certificate error, which is blocking iCloud. Is there any extra configuration that needs to be done to fix this?

Integration into NethServer (based on CentOS)

I introduced RedWood to the NethServer community at:
https://community.nethserver.org/t/redwood-filtering-proxy-server/6714/14

It has good potential and can be used in many deployments.
The CentOS 7 package can be used on NethServer, but someone there needs to put some time into integrating it with the current system web UI.

The first step would be the ability to enable/disable and stop/start/reconfigure the service.
The next step would be to add a DNAT/REDIRECT option that bypasses interception for specific IP addresses or domains.

Then there is more but I'm not there yet.

[Enhancement] Project Structure

I have inspected your project and found the current structure somewhat confusing and hard to scale.
Do you have any plans to refactor the current code toward a package-oriented Go structure?
Moreover, do you plan to provide a way to CRUD the config and store it in a DB?

content pruning example

Hi,
I am having trouble getting content pruning to work.
I don't think the examples in the README are still relevant.
If you could give me a current example of content pruning, it would be greatly appreciated.
Thanks

Domain Rewrite

Ref: #21

Would it be possible to implement a domain-changes capability similar to the query-changes option? This would enable the following types of scenarios:

  • youtube.com --> restrict.youtube.com
  • bing.com --> strict.bing.com
  • www.google.com --> forcesafesearch.google.com (alternate to vss querystring option)
  • duckduckgo.com --> safe.duckduckgo.com

Further development in redwood-config categories

Thank you for this project; I have used much of your work in production.
I am wondering how you came up with the results in the redwood-config categories.
I would like to develop URL and web content classification based on this project and make it more accurate by doing it automatically. Do you have any suggestions for doing so?
I am currently doing it manually, but I don't know what keywords or phrases would be suitable to add or how many points to give, and it is time-consuming doing it this way.

Thank you

Logfile does not show real request?

I am pretty new to Redwood, so I am still trying to understand the mechanism. While doing that I stumbled over the effect that one client request is printed to the log in a (to me) strange way:

This is the request my client is running:
curl --proxy http://proxy:18081 https://www.youtube.com

In the logfile I can see entries like this:
2021-02-26 15:18:59.682487,192.168.178.68,allow,http://www.youtube.com:443,CONNECT,0,,0,,youtube.com 1,"localbump 500, youtube 7",youtube,,,,HTTP/1.1,,

So if the client requests some HTTPS URL, why does redwood understand HTTP?
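
A possible explanation, stated as an assumption rather than confirmed Redwood behavior: when curl fetches an https:// URL through an HTTP proxy, it first sends a CONNECT request that names only the host and port (www.youtube.com:443), so at that point there is no scheme for the proxy to log, and the http:// prefix may simply be how the CONNECT target is rendered. The sketch below splits the log line quoted above to show that its method field is CONNECT. The field positions are read off this one sample line only, and splitCSVLine is a hypothetical helper.

```javascript
// splitCSVLine is a hypothetical helper: a minimal CSV splitter that
// handles double-quoted fields and doubled quotes inside them.
function splitCSVLine(line) {
	var fields = [];
	var cur = "";
	var inQuotes = false;
	for (var i = 0; i < line.length; i++) {
		var c = line.charAt(i);
		if (inQuotes) {
			if (c === '"' && line.charAt(i + 1) === '"') {
				cur += '"';
				i++;
			} else if (c === '"') {
				inQuotes = false;
			} else {
				cur += c;
			}
		} else if (c === '"') {
			inQuotes = true;
		} else if (c === ",") {
			fields.push(cur);
			cur = "";
		} else {
			cur += c;
		}
	}
	fields.push(cur);
	return fields;
}

var line = '2021-02-26 15:18:59.682487,192.168.178.68,allow,' +
	'http://www.youtube.com:443,CONNECT,0,,0,,youtube.com 1,' +
	'"localbump 500, youtube 7",youtube,,,,HTTP/1.1,,';
var fields = splitCSVLine(line);

console.log(fields[3]); // prints: http://www.youtube.com:443
console.log(fields[4]); // prints: CONNECT
```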

Redwood service stop after some time

Hey ,

I tried running the Redwood filter on port 80 only, in transparent mode, under heavy traffic (CentOS 7).
After some time (20 min) I get this in my messages log:
systemd: redwood.service: main process exited, code=exited, status=2/INVALIDARGUMENT
systemd: Unit redwood.service entered failed state.
systemd: redwood.service failed.

And if I try to restart it, it fails. Why?

Url rewrite

Hey ,

In Squid, for example, I used "URL_Rewrite" to change the user's URL, so that
when he tries to go to "www.google.com" I redirect him to "bing.com".
How can I redirect users here?

Check Client Certificate in SSLBump

Hey,

How can I get client certificate details like "Issued By" (to see if it is my self-signed certificate)
in TLS.GO (before tlsConn.Handshake())?

Thanks,

Content phrases problem

Hi,
I am trying to block pages based on:
< 18+ > 1000

but it blocks all pages that contain just 18 (it seems not to handle the plus sign).
How can I block pages that contain '18+' (I also tried 18+, with no effect)?

Thanks

Serve PAC file over HTTPS instead of HTTP

What would it take to have Redwood serve PAC files over HTTPS instead of HTTP? I believe there are issues with some apps on iOS that won't respect all the rules in PAC file if it isn't served securely.
