eldy / awstats

AWStats Log Analyzer project (official sources)

Home Page: https://www.awstats.org

Shell 0.19% Perl 76.99% Java 0.67% CSS 0.26% JavaScript 0.39% PHP 0.05% XSLT 0.48% Raku 20.49% Promela 0.48%
awstats log analyzer web web-statistics

awstats's People

Contributors

aaronvangeffen, avian2, bostjan, brentil, csware, dariodsa, edytuk, elbeardmorez, eldy, fbonzon, graiondilach, ibragimov, kearva, lambacck, leoshivas, manuelm, mayrstefan, mikelolasagasti, neilgierman, peterdavehello, return1, s1738berger, shlomif, sjwebb, smortex, soutade, sveinbjornt, sydb, the-exterminator, visualperception

awstats's Issues

Wrong name and missing icon for country codes AP & A1

Actual

Using version 7.5 with GeoIP plugin, the country code AP returns:

African Regional Property Organization

The text is wrong and the icon is missing.

Expected

AP refers to a special country code assigned by MaxMind:

A1,"Anonymous Proxy"
A2,"Satellite Provider"
O1,"Other Country"
AP,"Asia/Pacific Region"
EU,"Europe"
(Source)

Example IP address from AP as of filing this bug:

107.167.117.25
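
One way to handle the special codes listed above is a small override table that is consulted before the regular ISO country lookup. The following is a minimal Python sketch, not AWStats code: `country_label` and the `iso_names` excerpt are hypothetical, and only the five MaxMind codes are taken from the list in this report.

```python
# Special (non-ISO) codes assigned by MaxMind, as listed in the report above.
MAXMIND_SPECIAL = {
    "A1": "Anonymous Proxy",
    "A2": "Satellite Provider",
    "O1": "Other Country",
    "AP": "Asia/Pacific Region",
    "EU": "Europe",
}


def country_label(code, iso_names):
    """Return a display name, checking the MaxMind overrides first."""
    code = code.upper()
    if code in MAXMIND_SPECIAL:
        return MAXMIND_SPECIAL[code]
    return iso_names.get(code, "Unknown")


# Hypothetical excerpt of a regular country table.
iso_names = {"FR": "France", "JP": "Japan"}
print(country_label("ap", iso_names))
print(country_label("fr", iso_names))
```

Checking the override table first means a clashing entry in the regular table cannot shadow the MaxMind meaning.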

android-app:// What is this, and is AWStats able to handle it?

I have found a few of these in my log files:

"android-app://com.google.android.googlequicksearchbox" "Mozilla/5.0 (Linux; Android 4.4.2; TegraNote-P1640 Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.98 Safari/537.36"

so I created an entry in my search_engines.pm file to catch them. However, it does not find them, and I see nothing for them under the Google catch-all or under other referrers. So I don't know if or where they are being counted. They do not fetch robots.txt, so they will not be identified as robots either.

Is AWStats able to handle a referrer starting with "android-app://"?

Thanks
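
For illustration, a referrer like the one quoted above can be told apart from ordinary web referrers by its URL scheme. This is a minimal Python sketch and not how AWStats processes referrers; `classify_referrer` is a hypothetical helper, and treating the host part as the app package name is an assumption based on the single example in this report.

```python
from urllib.parse import urlsplit


def classify_referrer(referrer):
    """Split a referrer into (kind, identifier) based on its URL scheme."""
    parts = urlsplit(referrer)
    if parts.scheme == "android-app":
        # Assumption: the "host" part of android-app:// is the app package name.
        return ("android-app", parts.netloc)
    if parts.scheme in ("http", "https"):
        return ("web", parts.hostname or "")
    return ("other", referrer)


print(classify_referrer("android-app://com.google.android.googlequicksearchbox"))
print(classify_referrer("https://www.google.com/search?q=awstats"))
```

A search-engine rule that only expects http or https hosts would never see the package name, which may be why the custom entry did not match.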

A suggestion for increased performance

It occurs to me that when a regex is used to check for search engines, writing the full domain name beginning with ^ should improve performance: if the first character does not match, the regex can exit without checking the whole string.
However, since the full domain may well begin with www., I would also suggest stripping www. from each pattern once it has been read into memory, leaving for example ^google.com$ instead of ^www.google.com$, and then stripping www. from the beginning of each referrer domain before passing it to the regex.
This should, I think, improve performance, if only marginally. The docs could also note that, wherever possible, new entries in search_engines.pm should use the full domain beginning with ^.

I may try to amend search_engines.pm when/if I have time, but I can only do it for search engines I have log records for and whose full referrer domain I know. And of course there are a lot of generic entries in the list already which don't have the full ^domain$ format. Having said that, since I added the Google domains, which cover the vast majority of searches and do have the full ^domain$ format, they should bring some performance benefit even with www. included. That could well be the reason why performance has not been significantly degraded by adding so many extra search engine checks. I'm not 100% sure, but I think so.

[edit]
Have thought some more about this and maybe it won't make much difference since so many searches are from google that the real identifier for google searches is the country TLD. Now if the regex could start at end of domain and search back to beginning of domain it would be faster but I don't think that's possible with regex without doing a string reverse on the domain and reversing all the regex. Maybe not such a good idea after all.
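
The idea discussed above can be sketched as follows: strip a leading www. once, try an exact-domain lookup first, and fall back to anchored patterns only when needed. This is a Python illustration, not AWStats code; the `EXACT` and `PATTERNS` tables and the `match_engine` function are hypothetical.

```python
import re


def strip_www(host):
    """Drop one leading 'www.' so entries can be written without it."""
    return host[4:] if host.startswith("www.") else host


# Exact domains need no regex at all: a dictionary lookup is enough.
EXACT = {"google.com": "google", "google.co.uk": "google_uk"}

# Anchored fallback patterns for engines that span many domains.
PATTERNS = [(re.compile(r"^search\.yahoo\."), "yahoo")]


def match_engine(host):
    """Return an engine id for a referrer host, or None if unknown."""
    host = strip_www(host.lower())
    if host in EXACT:
        return EXACT[host]
    for pattern, name in PATTERNS:
        if pattern.search(host):
            return name
    return None


print(match_engine("www.google.co.uk"))
```

A dictionary lookup on the whole domain also sidesteps the concern raised in the edit about matching from the end of the string, since no scanning direction is involved.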

Country Map No Longer Visible

With the sunset of Adobe Flash in 2020 and browsers no longer supporting NPAPI, the Google Visualization Geomap already no longer displays in current browser versions. An alternate solution to this world map is needed.

FTP log analyzer skips non 'p' MIME types

$REVISION = '20161204';
$VERSION = "7.6 (build $REVISION)";

18799	# Define page and extension
18800	#--------------------------
18801	my $PageBool = 1;
18802
18803	# Extension
18804	my $extension = Get_Extension($regext, $urlwithnoquery);
18805	if ( $NotPageList{$extension} || 
18806	($MimeHashLib{$extension}[1]) && $MimeHashLib{$extension}[1] ne 'p') { $PageBool = 0;}

Line 18806 excludes all MIME types listed in mime.pm with a non-'p' value. This is fine for HTTP logs, but it is a problem for FTP logs (xferlog, ...) because the analyzer skips valid records (images, audio, video, ...).

The "Analyze: Status code" section is also missing code for ($LogType eq 'F') download processing, so we have a problem :-)

19010	elsif ( $LogType eq 'F' ) {    # FTP record
19011	}

I made a small change to the code on my server to temporarily fix this issue:

       ˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇ
18806 ($LogType ne 'F' &&  $MimeHashLib{$extension}[1]) && $MimeHashLib{$extension}[1] ne 'p') { $PageBool = 0;}
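
The effect of the modified condition can be restated outside Perl. The sketch below is a Python paraphrase of the patched test, written only to make the logic easy to check; `is_page` and its parameters are hypothetical stand-ins for `$PageBool`, `%NotPageList` and `%MimeHashLib`.

```python
def is_page(extension, log_type, not_page_list, mime_kinds):
    """Return True if a request should count as a page.

    mime_kinds maps an extension to its kind flag; 'p' marks page types.
    The MIME-kind exclusion is skipped for FTP logs (log_type == 'F').
    """
    if extension in not_page_list:
        return False
    kind = mime_kinds.get(extension)
    if log_type != "F" and kind and kind != "p":
        return False
    return True


mime_kinds = {"html": "p", "jpg": "i"}
print(is_page("jpg", "W", set(), mime_kinds))  # web log: image is not a page
print(is_page("jpg", "F", set(), mime_kinds))  # FTP log: record is kept
```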

SteamOS Debian distribution not detected

See:
https://sourceforge.net/p/awstats/feature-requests/887/

Please add SteamOS to the available detection list. I tried to add it in this post:
https://sourceforge.net/p/awstats/discussion/43428/thread/6871742e/?limit=25#882a

The unknown OS list is:

Unknown OS (useragent field)     
User agent (5)  Last visit
Debian_APT-HTTP/1.3_(1.0.10.2ubuntu1)   07 Jan 2016 - 05:37
Debian_APT-HTTP/1.3_(0.9.7.9)   07 Jan 2016 - 05:23
Debian_APT-HTTP/1.3_(1.0.9.8.1) 07 Jan 2016 - 05:22
Debian_APT-HTTP/1.3_(1.0.1ubuntu2)  06 Jan 2016 - 16:24
libwww-perl/6.08

You can view my stats page at:
http://steamos-tools-stats.libregeek.org

Most of my traffic does come from APT, but there should be a way to detect SteamOS as an OS.

AWStats does not parse HTTP status 408 correctly from the log

It appears that the version of AWStats shipped with Debian 7 is ancient, but it does not correctly recognize 408 HTTP status codes. I'm not sure whether this has been fixed in more recent releases:

Error while processing /etc/awstats/awstats.conf
Create/Update database for config "/etc/awstats/awstats.conf" by AWStats version 7.0 (build 1.971)
From data in log file "/var/log/apache2/access.log"...
Phase 1 : First bypass old records, searching new record...
Direct access after last parsed record (after line 508)
AWStats did not find any valid log lines that match your LogFormat parameter, in the 50th first non commented lines read of your log.
Your log file /var/log/apache2/access.log must have a bad format or LogFormat parameter setup does not match this format.
Your AWStats LogFormat parameter is:
4
This means each line in your web server log file need to have "common log format" like this:
111.22.33.44 - - [10/Jan/2001:02:14:14 +0200] "GET / HTTP/1.1" 200 1234
And this is an example of records AWStats found in your log file (the record number 50 in your log):
192.243.55.129 - - [18/Jul/2016:02:28:21 -0400] "-" 408 0 "-" "-"
Setup ('/etc/awstats/awstats.conf' file, web server or permissions) may be wrong.
Check config file, permissions and AWStats documentation (in 'docs' directory).

I also tried with LogFormat=1 and I got a similar error.
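
Two things stand out in the sample record: it ends with two extra quoted fields, so it looks like combined rather than common log format, and its request field is just "-" instead of a method and URL. Whether either detail is what AWStats rejects is not confirmed here. The Python sketch below only shows a parser that tolerates both; the `COMBINED` pattern and `parse_line` are hypothetical and unrelated to the AWStats LogFormat machinery.

```python
import re

# Common log format with optional trailing referer and user-agent fields.
COMBINED = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<code>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)


def parse_line(line):
    """Parse one log line; return a dict of fields or None if malformed."""
    m = COMBINED.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    parts = rec["request"].split()
    # A request logged as "-" (as in the 408 record) has no method or URL.
    if len(parts) >= 2:
        rec["method"], rec["url"] = parts[0], parts[1]
    else:
        rec["method"], rec["url"] = None, None
    return rec


rec = parse_line('192.243.55.129 - - [18/Jul/2016:02:28:21 -0400] "-" 408 0 "-" "-"')
print(rec["code"], rec["method"])
```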

Illegal use

can you exclude 66.249.* manipulation of used browsers

Problem after upgrading from 7.3 to 7.5

I just upgraded from 7.3 to 7.5 and my customers can't view any of their statistics.

When I browse to the awstats.pl file (http://www.witnesstoday.org/AWStats/cgi-bin/awstats.pl?config=hostmaster.witnesstoday.org), the only thing displayed to my users is Content-type:text/html; charset=utf-8 Cache-Control: public Last-Modified: Sat Jul 23 19:14:47 2016 Expires: Sat Jul 23 19:14:47 2016

It appears that header info for caching is being added incorrectly and too early, as shown below.


Content-type: text/html; charset=utf-8
Cache-Control: public
Last-Modified: Sat Jul 23 19:14:47 2016
Expires: Sat Jul 23 19:14:47 2016

<title>Statistics for hostmaster.witnesstoday.org (2016-07) - main</title> <noframes>Your browser does not support frames.
You must set AWStats UseFramesWhenCGI parameter to 0 to see your reports.
</noframes>

Popup stays at the top of the page

I have created static reports. When I view these reports and leave the mouse cursor on an element, a popup appears at the top right of the page. But if I scroll down, the popup stays stuck at the top right of the page, so I can't see it anymore. As a result, no popups are visible once the top right of the page has scrolled out of view.

logresolvemerge.pl does not work correctly

I have a problem with sorting logs in Apache2 combined log format:

LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined

 # zgrep -E -h "1.2.3.4" /var/log/apache2/example.com_access.log* > ~/1.2.3.4.log
 # zgrep -E -h "4.3.2.1" /var/log/apache2/example.com_access.log* > ~/4.3.2.1.log
 # logresolvemerge.pl ~/1.2.3.4.log ~/4.3.2.1.log > ~/test.log
# head ~/test.log
1.2.3.4 - - [26/Sep/2016:15:47:25 +0200] "GET / HTTP/1.1" 200 4549 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"
1.2.3.4 - - [26/Sep/2016:15:48:39 +0200] "GET /wp-login.php HTTP/1.1" 200 1504 "http://example.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"
1.2.3.4 - - [26/Sep/2016:15:48:39 +0200] "GET /wp-admin/load-scripts.php?c=0&load%5B%5D=jquery-core,jquery-migrate&ver=4.6.1 HTTP/1.1" 200 37547 "http://example.com/wp-login.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"
1.2.3.4 - - [26/Sep/2016:15:48:48 +0200] "POST /wp-login.php HTTP/1.1" 302 1259 "http://example.com/wp-login.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"
1.2.3.4 - - [26/Sep/2016:15:48:48 +0200] "GET /wp-admin/ HTTP/1.1" 200 15282 "http://example.com/wp-login.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"
1.2.3.4 - - [26/Sep/2016:15:48:50 +0200] "GET /wp-content/plugins/mailchimp-subscribe-sm/js/lpp_color_picker.js?ver=4.6.1 HTTP/1.1" 200 427 "http://example.com/wp-admin/" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"
1.2.3.4 - - [26/Sep/2016:15:48:50 +0200] "GET /wp-includes/css/editor.min.css?ver=4.6.1 HTTP/1.1" 200 6094 "http://example.com/wp-admin/" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"
1.2.3.4 - - [26/Sep/2016:15:48:56 +0200] "GET /wp-admin/plugins.php HTTP/1.1" 200 11572 "http://example.com/wp-admin/" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"
1.2.3.4 - - [26/Sep/2016:15:49:06 +0200] "GET /wp-admin/users.php HTTP/1.1" 200 8849 "http://example.com/wp-admin/plugins.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"
1.2.3.4 - - [26/Sep/2016:15:49:25 +0200] "GET /wp-admin/users.php&cmd=pwd HTTP/1.1" 404 501 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"
 # tail -50 ~/test.log
4.3.2.1 - - [24/Sep/2016:00:14:40 +0200] "POST /wp-login.php HTTP/1.1" 200 3567 "http://example.com/" "WPScan v2.9 (http://wpscan.org)"
[...]
4.3.2.1 - - [24/Sep/2016:00:14:41 +0200] "POST /wp-login.php HTTP/1.1" 200 3567 "http://example.com/" "WPScan v2.9 (http://wpscan.org)"
[...]
4.3.2.1 - - [24/Sep/2016:00:14:42 +0200] "POST /wp-login.php HTTP/1.1" 200 3567 "http://example.com/" "WPScan v2.9 (http://wpscan.org)"
[...]
4.3.2.1 - - [24/Sep/2016:00:14:43 +0200] "POST /wp-login.php HTTP/1.1" 200 3567 "http://example.com/" "WPScan v2.9 (http://wpscan.org)"
[...]
4.3.2.1 - - [24/Sep/2016:00:14:44 +0200] "GET /wp-content/plugins HTTP/1.1" 301 543 "http://example.com/" "WPScan v2.9 (http://wpscan.org)"

So it looks like it doesn't work correctly there 👎: the top of the file has 26/Sep/2016:15:47:25 and the bottom has 24/Sep/2016:00:14:44, and these are not the first and last times in this file.

https://stackoverflow.com/a/6137712

# sort -t ' ' -k 4.9,4.12n -k 4.5,4.7M -k 4.2,4.3n -k 4.14,4.15n -k 4.17,4.18n -k 4.20,4.21n ~/test.log | head -1
4.3.2.1 - - [23/Sep/2016:23:50:57 +0200] "GET /%23wp-config.php%23 HTTP/1.1" 404 430 "http://example.com/" "WPScan v2.9 (http://wpscan.org)"
# sort -t ' ' -k 4.9,4.12n -k 4.5,4.7M -k 4.2,4.3n -k 4.14,4.15n -k 4.17,4.18n -k 4.20,4.21n ~/test.log | tail -1
1.2.3.4 - - [26/Sep/2016:17:00:49 +0200] "POST /wp-admin/admin-ajax.php HTTP/1.1" 200 420 "http://example.com/wp-admin/edit.php?post_type=subscribe_me_forms&cmd=ncat+4.3.2.1+4445" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"
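
The chronological order that the sort command above produces can also be obtained by parsing the bracketed timestamp and sorting on it. This is a minimal Python sketch, not a replacement for logresolvemerge.pl; `log_time` and `sort_by_time` are hypothetical, and the sketch assumes every line contains a timestamp with English month abbreviations (the default C locale).

```python
import re
from datetime import datetime

TIME_FIELD = re.compile(r"\[([^\]]+)\]")


def log_time(line):
    """Parse the [dd/Mon/yyyy:HH:MM:SS +zzzz] field of an Apache log line."""
    m = TIME_FIELD.search(line)
    return datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")


def sort_by_time(lines):
    """Return lines in chronological order; ties keep their input order."""
    return sorted(lines, key=log_time)


lines = [
    '1.2.3.4 - - [26/Sep/2016:15:47:25 +0200] "GET / HTTP/1.1" 200 4549',
    '4.3.2.1 - - [24/Sep/2016:00:14:40 +0200] "POST /wp-login.php HTTP/1.1" 200 3567',
]
for line in sort_by_time(lines):
    print(line)
```

Because the time zone offset is parsed too, records from servers in different zones compare correctly, which a purely positional sort on the text does not guarantee.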

Suggestion

Is it possible to make a combined report?

If you have several subdomains such as
nl.domain.be
fr.domain.be
en.domain.be
es.domain.be
you get separate stats for each subdomain, but not for domain.be as a whole (or I can't find it).

Also, is it possible to make a public mirror that does not show everything, only a small selection?
If it were XML, we could embed it in the site (with a 1h cache).

Could add "EncodeToPageCode" to the "geoip_asn_maxmind.pm" module?

Some ASNs returned by Geo-IPLite contain "iso-8859-1" encoded characters (e.g. "ç", c-cedilla), but with the AWStats PageCode set to "utf-8" these characters are not converted to utf-8. The city module "geoip_city_maxmind.pm" uses the function EncodeToPageCode() to convert city names to utf-8.

Could the EncodeToPageCode() function also be used for ASNs? I think by:

In the module: "wwwroot/cgi-bin/plugins/geoip_asn_maxmind.pm"
in subroutine: "ShowInfoHost_geoip_asn_maxmind"

after the line: "if (length($asn)>0) {"

add:
$asn = EncodeToPageCode($asn);
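
The conversion being requested is a plain re-encoding from the database charset to the page charset. The Python sketch below shows the byte-level effect for a c-cedilla; `to_page_code` is a hypothetical stand-in, not the AWStats EncodeToPageCode() function, and the sample name is made up.

```python
def to_page_code(raw, source="iso-8859-1", page_code="utf-8"):
    """Re-encode bytes from the database charset to the page charset."""
    return raw.decode(source).encode(page_code)


# Made-up name; 0xE7 is "ç" in iso-8859-1.
latin1_name = b"Exemplo Servi\xe7os"
print(to_page_code(latin1_name))
```

Without the conversion, the lone 0xE7 byte is not valid UTF-8, which is why it shows up as a broken character on a utf-8 page.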

Awstats: Icons are not displayed on the statistics page.

Hi

When I open the mail server statistics page in the browser, icons are not displayed in the Visitors domains/countries (Top 10) and Hours sections. The page source shows an incorrect path to the icon:
<img src="/awstatsicons/clock/hr1.png" width="12" alt="0:00 - 1:00 am">
while the correct path is
/awstats/icon/clock/hr1.png
The same incorrect path is used for all icons.

Awstats version: Advanced Web Statistics 7.5 (build 20160301)
OS: FreeBSD 10.3

Deprecated perl regex

Actual results after running /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -update -config=model:

Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/"%{ <-- HERE Referer}i"/ at /usr/share/awstats/wwwroot/cgi-bin/awstats.pl line 9043.
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/"%{ <-- HERE User-Agent}i"/ at /usr/share/awstats/wwwroot/cgi-bin/awstats.pl line 9044.
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/%{ <-- HERE mod_gzip_input_size}n/ at /usr/share/awstats/wwwroot/cgi-bin/awstats.pl line 9045.
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/%{ <-- HERE mod_gzip_output_size}n/ at /usr/share/awstats/wwwroot/cgi-bin/awstats.pl line 9046.
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/%{ <-- HERE mod_gzip_compression_ratio}n/ at /usr/share/awstats/wwwroot/cgi-bin/awstats.pl line 9047.
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/(%{ <-- HERE ratio}n)/ at /usr/share/awstats/wwwroot/cgi-bin/awstats.pl line 9048.

I was able to correct locally by escaping the left brace on lines 9043-9048.
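
The underlying rule is that a literal `{` inside a pattern should be escaped. In Perl that means writing `\{` (or wrapping the literal text in `\Q...\E`). The same idea is shown here in Python, where `re.escape` does the escaping; this is only an illustration, `literal_pattern` is a hypothetical helper, and the format string is the Apache LogFormat quoted elsewhere on this page with `%b` in place of `%O`.

```python
import re

LOG_TOKENS = ['"%{Referer}i"', '"%{User-Agent}i"', "%{mod_gzip_input_size}n"]


def literal_pattern(token):
    """Compile a pattern that matches the token text literally."""
    return re.compile(re.escape(token))


fmt = '%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"'
for token in LOG_TOKENS:
    print(token, bool(literal_pattern(token).search(fmt)))
```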

Operating Systems: Undetected Android

Advanced Web Statistics 7.5 (build 20160301)

Following User Agents not parsed for Operating System - defaulted to Unknown:

Mozilla/5.0_(Android_6.0.1;_Mobile;_rv:47.0)_Gecko/47.0_Firefox/47.0
Mozilla/5.0_(Android_6.0.1;_Mobile;_rv:46.0)_Gecko/46.0_Firefox/46.0
Mozilla/5.0_(Android_6.0.1;_Mobile;_rv:46.0.1)_Gecko/46.0.1_Firefox/46.0.1
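
The user agents above carry the Android token directly after the opening parenthesis, without a preceding "Linux;". A pattern that looks for "Android" anywhere in the string, followed by an optional version, would catch them. This is a Python sketch only; `detect_android` is hypothetical and says nothing about how the AWStats OS tables are written.

```python
import re

# The separator may be a space, or an underscore as in the report output.
ANDROID = re.compile(r"Android[ _/]?(\d+(?:\.\d+)*)?", re.IGNORECASE)


def detect_android(user_agent):
    """Return 'Android <version>', 'Android', or None if not Android."""
    m = ANDROID.search(user_agent)
    if m is None:
        return None
    return "Android " + m.group(1) if m.group(1) else "Android"


print(detect_android("Mozilla/5.0_(Android_6.0.1;_Mobile;_rv:47.0)_Gecko/47.0_Firefox/47.0"))
```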

http status code 103

Hello
We are using AWStats for streaming logs (progressive download/pseudostreaming), with both lighty and Apache.

We are having some trouble with status code 103 as returned by Apache. If code 103 is included in the AWStats ValidHTTPCodes, it is counted as a valid hit, but every 103 response logs the entire video file size as %b.

So it looks to me as if, when someone clicks a few times on the video timeline to skip ahead, each click is logged as code 103 with the entire video file size. Five clicks on the timeline count as if the entire video had been viewed five times, in terms of bandwidth. The logs end up showing a multiplied bandwidth consumption that didn't really happen.

I believe this happens only with httpd with h264_mod; the problem is not present with lighttpd with h264_mod.

If code 103 is taken out of ValidHTTPCodes, the 103 traffic is reported as not viewed traffic, which leaves a massive amount of not viewed bandwidth in the reports, and that looks very bad.

I've googled around but I couldn't find any reference to this problem anywhere.

Is anybody aware of this issue? I really need to solve it, any help is really appreciated.

Many thanks

Robot detection based on hits?

I have noticed an increasing number of bad robots that don't identify themselves as robots.
Typically they fetch a site's root/home page HTML file and nothing else. This can be seen in the Hosts (IP) report, where you see 1 page and 1 hit, or 2 pages and 2 hits, etc. against a host/IP. Checking the raw log files confirms this, and also that the user agent contains no bot identification.
Unfortunately these bad bots are added to the unique visitors count when they should in fact be added to the unidentified robots count.
I appreciate this is tricky to catch in AWStats, especially since it could be a visitor coming back with most of the files already in the browser cache. However, it's pretty obvious these visits are not real visitors, and the volume and regularity of them is very large: something like 30% of visitors on one site I look after, and similar on a couple of others.
This completely distorts the stats, giving the impression of far more real visitors than there actually are.

Is there any way to modify AWStats to incorporate a configurable conf file option saying that a page must have x hits on it to be considered a real visit, and otherwise it's an unidentified robot? Most pages these days generate at least half a dozen file hits, so the data is already available to AWStats. How easy that is to implement may be another matter; I don't know.
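
The proposed rule can be sketched as a threshold on hits per page for each host. This is a Python illustration of the heuristic only, not an AWStats feature; `classify_hosts` and the `min_hits_per_page` parameter are hypothetical, and as noted above the heuristic will misclassify returning visitors whose browser cache already holds most files.

```python
def classify_hosts(host_stats, min_hits_per_page=2.0):
    """Split hosts into likely visitors and suspected robots.

    host_stats maps host -> (pages, hits). A host whose hits/pages ratio is
    below the threshold fetched little beyond the HTML itself.
    """
    visitors, suspects = [], []
    for host, (pages, hits) in sorted(host_stats.items()):
        ratio = hits / pages if pages else 0.0
        (visitors if ratio >= min_hits_per_page else suspects).append(host)
    return visitors, suspects


stats = {"10.0.0.1": (1, 1), "10.0.0.2": (2, 2), "10.0.0.3": (3, 24)}
print(classify_hosts(stats))
```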

awstats_buildstaticpages.pl automation (remove -config parameter)

I have some problems with automation of awstats with static html pages. I run awstats database update as prerotate of nginx logs as indicated here:
/usr/share/doc/awstats/examples/awstats_updateall.pl now -awstatsprog=/usr/lib/cgi-bin/awstats.pl

This works like a charm: it picks up every config in /etc/awstats/. But when I want to generate HTML with awstats_buildstaticpages.pl, I have to specify every path and every config name:

/usr/share/awstats/tools/awstats_buildstaticpages.pl -update -config=domainname.com -dir=/home/stats/ -awstatsprog=/usr/lib/cgi-bin/awstats.pl

this creates a file: awstats.domainname.com.html in /home/stats/.

Is it possible to take those parameters from /etc/awstats? I guess it would be similar to awstats_updateall.pl. Maybe it could also produce a whole HTML file per site (for example, index.html in /home/stats/domainname.com/).
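
Until such an option exists, a small wrapper could derive the config names from the file names in /etc/awstats and call the script once per site. The Python sketch below only builds the command lines; the paths and flags are the ones from this report, while `config_names`, `build_command` and the per-site output directory layout are hypothetical. It assumes config files are named awstats.<name>.conf.

```python
import re

CONFIG_FILE = re.compile(r"^awstats\.(.+)\.conf$")


def config_names(filenames):
    """Extract config names from file names like awstats.example.com.conf."""
    names = []
    for name in sorted(filenames):
        m = CONFIG_FILE.match(name)
        if m:
            names.append(m.group(1))
    return names


def build_command(config, out_root="/home/stats"):
    """Command line for one config, writing into a per-site directory."""
    return [
        "/usr/share/awstats/tools/awstats_buildstaticpages.pl",
        "-update",
        f"-config={config}",
        f"-dir={out_root}/{config}/",
        "-awstatsprog=/usr/lib/cgi-bin/awstats.pl",
    ]


for name in config_names(["awstats.conf", "awstats.domainname.com.conf"]):
    print(" ".join(build_command(name)))
```

In real use the file list would come from `os.listdir("/etc/awstats")` and each command would be passed to `subprocess.run`, after creating the output directory for that site.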

Bug in latest 7.6 and NCSA log format: URL strings get truncated after the first blank!

There is a problem with NCSA LogFormat 4 in combination with URLs that contain blanks.
The URL string gets truncated after the first blank even though it is enclosed in quotes!

Example:

LogFormat=4
(#LogFormat = "%host %other %logname %time1 %methodurl %code %bytesd")

172.30.22.5 - tom.smith [03/Jan/2016:10:39:06 +0100] "GET /Download/__Omnia__Behandlung elektronischer Geschäftsstücke__Ergänzung 2016.pdf HTTP/1.1" 200 96063

It is tracked truncated as "Page-URL":
/Download/__Omnia__Behandlung

instead of correctly as a file under "Downloads":
/Download/__Omnia__Behandlung elektronischer Geschäftsstücke__Ergänzung 2016.pdf

So all statistics for Page-URL and Downloads are counted wrong!
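
One way to keep such URLs intact is to treat the first token of the quoted request as the method, the last token as the protocol, and everything in between as the URL. This is a Python sketch of that splitting rule, not the AWStats parser; `split_request` is a hypothetical helper.

```python
def split_request(request):
    """Split 'METHOD URL PROTOCOL' where the URL itself may contain blanks."""
    method, _, rest = request.partition(" ")
    url, _, protocol = rest.rpartition(" ")
    if not protocol.startswith("HTTP/"):
        # No trailing protocol token: everything after the method is the URL.
        url, protocol = rest, ""
    return method, url, protocol


req = "GET /Download/__Omnia__Behandlung elektronischer Geschäftsstücke__Ergänzung 2016.pdf HTTP/1.1"
print(split_request(req))
```

Splitting from both ends avoids any assumption about how many blanks the URL contains.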

404 detail page doesn't update after first parse

I'm currently running AWStats 7.6 (7.6-3.1.el7) on a CentOS 7.3.1611 machine that parses nginx (1:1.10.2-1.el7) logs.

The problem I'm having is that after the initial parse of the logs, the hit counter on the 404 hits detail page stops updating.

The AWStats summary front page continues to count correctly. It's just the details page that stops updating.

A quick workaround is to delete the contents of the /var/lib/awstats directory and let AWStats re-parse everything. However, that only works once.

I suspected that it may be related to SELinux, but setting it to permissive doesn't help.
No errors are shown when updating manually.

Any ideas what it could be or how can I debug a little more?

GeoIp2 replace GeoIP

GeoIP .dat file will not be used anymore, the application shall handle .csv or .mmdb files

We will be discontinuing updates to the GeoLite Legacy databases as of April 1, 2018. You will still be able to download the April 2018 release until January 2, 2019. GeoLite Legacy users will need to update their integrations in order to switch to the free GeoLite2 or commercial GeoIP databases by April 2018.

For more information, please visit our Support Center.

In addition, in 2019, latitude and longitude coordinates in the GeoLite2 databases will be removed.* Latitude and longitude coordinates will continue to be provided in GeoIP2 databases. Please check back for updates.

XML Parsing Error allhosts.xml

My guess is that the host name from geoIP is not encoded prior to output...
The second quote should end just after "F in "TELE"

"XML Parsing Error: not well-formed".

201.xxx.xxx.xxxBrazilTELEF�NICA BRASIL S....Unknown49117.94 KB06 Feb 2017 - 15:52

----------------------------^

Alternative robots.pm search_engines.pm

I have created alternative robots.pm and search_engines.pm.

These have been corrected and re-tested today (I made mistakes in previous versions).

Where (and how) is the best place to put them for anyone wanting to trial them? I'm not familiar with GitHub, but I have put the files in the fork visualperception/awstats for now if you want to try them out.
https://github.com/visualperception/awstats/tree/develop/wwwroot/cgi-bin/lib

The biggest change is that search_engines.pm now contains all Google countries, split by country code and by normal or image search.
The fact is that the vast majority of search engine visits come from Google, and having a split by country is useful.
I offer these files for possible inclusion in the official release. Some people may find them useful.

I tested against a 1.4GB stats file (4 years of logs from one site, just over 6 million rows) and compared timings with the standard search_engines.pm and robots.pm, unedited from AWStats 7.5. The times from the two runs were almost the same, so there is no performance penalty in using these files. In fact the new files ran just a tad faster, but bear in mind that they are tuned to my site. Tune them to your own site and they should work just as well, i.e. put your site's most commonly seen bots and engines at the top of the lists, especially the Google country code for your site.

"DirTemp" option

All temp-files are written to the "DirData" directory.
We would like to reduce I/O - so it would be great if we could write the temp-files ('.tmp','.tmp.bis') to a "NFS" mounted harddisk on a different machine.
So maybe you could add a "DirTemp" option?

LogFormat syntax for AWS ELB

Greetings,

I'm trying to figure out the correct LogFormat syntax to use for webserver logs from an Amazon elastic load balancer. They have this format.

2016-10-11T00:07:37 Name-of-ELB source-IP:port dest-ip:port 0.000048 0.002299 0.000022 200 200 0 1271 "GET https://www.mysite.com:443/content HTTP/1.1" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0" iphers TLSv1.2

I am attempting to use:
LogFormat="%time5 %other %host %other %host_r %other %other %other %other %code %other %other %bytesd %methodurl %other %other"
I have also tried removing the "T" in the timestamp and using %time2. When I run:

perl awstats.pl -update -config=mysite

AWStats did not find any valid log lines that match your LogFormat parameter, in the 50th first non commented lines read of your log.
Your log file /home/opswat/logs/www.opswat.com/access_log must have a bad format or LogFormat parameter setup does not match this format.

If I shorten the log file to just the first 49 lines, I don't get this error, just that 49 corrupted records were found and nothing else.

Thanks in advance for any tips.

.txt files are listed as "download"

Text files (.txt) are listed as download. Is there a reason for that?

Normally .txt files are rendered by the browser like any other page and displayed without the need for downloading.

geoip plugin error on HTML output

I'm running log processing and HTML CGI output on two different servers.
The first server has the Geo/IP.pm module installed; the second does not (it does not need it to publish AWStats results through CGI).
Up to AWStats 7.0, everything ran fine. Now, with AWStats 7.6, the HTML CGI also tries to load those plugins.
I've looked into the source, and apparently it should not load those plugins.
Am I missing something, or is it a bug?

<html><body>
<br /><span style="color: #880000">
Error: Plugin load for plugin 'geoip' failed with return code: Error:
Can't locate Geo/IP.pm in @INC (@INC contains: ./lib ./plugins /etc/perl /usr/local/lib/perl/5.10.1 /usr/local/share/perl/5.10.1 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl . ./lib ./plugins) at (eval 7) line 1.
Can't locate Geo/IP/PurePerl.pm in @INC (@INC contains: ./lib ./plugins /etc/perl /usr/local/lib/perl/5.10.1 /usr/local/share/perl/5.10.1 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl . ./lib ./plugins) at (eval 8) line 1.
Error: Need Perl module Geo::IP or Geo::IP::PurePerl
</span><br />
<br /><b>Setup ('./awstats.conf' file, web server or permissions) may be wrong.</b><br />
Check config file, permissions and AWStats documentation (in 'docs' directory).
</body></html>

Regards,

Tonino

SRWare Iron Browser

https://www.srware.net/en/software_srware_iron_download.php

Could this browser be added to the browsers list, please? I tried to add it myself, but I'm not a Perl programmer and I get errors when I try to run it.
The UA for Iron on windows is:

older one:
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2500.0 Iron/47.0.2500.0 Safari/537.36"

or latest is:

"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2800.0 Iron Safari/537.36"

The difficulty is that Iron is a version of Chromium (not Chrome, which is Google's version of Chromium) and it has Chrome and Safari in its UA, so it would need to be in awstats.pl and checked before Chrome and Safari.

Also, they seem to have removed the Iron version number (which I think is always the same as the Chromium version number it is based on). I'm not sure whether in future it will always be identified as in the second UA above, or whether it may use the format with a version number as in the first UA above.

I think that currently it would be logged as a version of Chrome, which it isn't (although it is very similar, but without the Google tracking that Chrome has built in).

Feature request: Regex replace filter

Hi,

If I understand correctly, URLWithQueryWithoutFollowingParameters works with ? parameters, but it is not helpful for so-called "clean" URLs.

For example, if there is url something like:

http://my.domain.com/some/api/get_info_since/3234324

This means that every request with that version tag/counter will be treated as a unique URL with its own statistics, right?

It would be nice to have, for example, this way to filter/clean URLs:

RegexReplaceFilter=URL,(get_info_since)/\d+,$1/_counter_

I am not sure how (or whether) captures would work, but you probably get the point.

Multiple filters could be supported, and maybe, for better performance, some conditions could be applied:

RegexReplaceFilter1=URL,(get_info_since)/\d+,$1/_counter_
RegexReplaceCodeFilter1=200
RegexReplaceCondition1=VHOST,my-webapp-with-clean-urls.com

I wish I could help implement it, if the maintainer approves, but I am not experienced in Perl...
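
What the proposed filter would do to a URL can be sketched with an ordinary regex substitution. This is a Python illustration of the idea only; RegexReplaceFilter is the option proposed in this request, not an existing AWStats parameter, and `RULES` and `normalize_url` are hypothetical. Python writes the capture reference as `\1` where the proposal uses `$1`.

```python
import re

# Hypothetical rule in the spirit of the proposed RegexReplaceFilter option.
RULES = [(re.compile(r"(get_info_since)/\d+"), r"\1/_counter_")]


def normalize_url(url, rules=RULES):
    """Apply each (pattern, replacement) rule in order and return the result."""
    for pattern, replacement in rules:
        url = pattern.sub(replacement, url)
    return url


print(normalize_url("/some/api/get_info_since/3234324"))
```

With such a rule, all requests to the endpoint collapse into one URL entry instead of one entry per counter value.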

upgrade v7.6 to v7.7 - Windows paths to plugins failed

Was running AWStats 7.6 on Windows Server 2012 R2 and had the various Maxmind plugins installed.
Copied the v7.7 files over the top and site started throwing error:

Error: Plugin init for plugin 'geoip' failed with return code: Error opening C:Perl64LibGeoGeoIP.dat at C:/Perl64/site/lib/Geo/IP/PurePerl.pm line 243. (A module required by plugin might be missing). 

Setup ('C:\inetpub\wwwroot\AWStats\cgi-bin/awstats.DefaultWebSite.conf' file, web server or permissions) may be wrong.
Check config file, permissions and AWStats documentation (in 'docs' directory).

Checked perms on the various GeoIP .dat files - all good. Even forced the IUSR account on all of them again. No change.
Updated ActiveState Perl from 5.16 to 5.24 and updated all packages in PPM - no change.
Updated PurePerl.pm from v1.25 to v1.26 - no change.

What fixed it was, in my awstats.conf file I had to change all the backslashes to forward-slashes in the paths to the Maxmind plugins, e.g.

LoadPlugin="geoip GEOIP_STANDARD C:\Perl64\Lib\Geo\GeoIP.dat"
change to
LoadPlugin="geoip GEOIP_STANDARD C:/Perl64/Lib/Geo/GeoIP.dat"

This being a Windows machine, I should not have to use forward slashes in paths.

Any idea what changed in Awstats from 7.6 to 7.7 that would've caused this problem?

Nested includes fail

In a configfile, I use Include to include another configfile, which in turn also uses Include.

This will give me this warning:

Perl versions before 5.6 cannot handle nested includes

And the nested include is actually not included.
But my perl reports:
This is perl 5, version 18, subversion 2 (v5.18.2) built for x86_64-linux-thread-multi

Awstats 7.4 installed from an Opensuse Leap 42.1 repository.
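
For reference, the behaviour expected here is a recursive expansion: an included file may itself contain Include lines, and each level is expanded in place. The Python sketch below shows that expansion over in-memory files. It is not the AWStats config reader; `expand_includes` is hypothetical, the `Include "file"` line syntax is an assumption, and the cycle check is an addition of this sketch.

```python
def expand_includes(name, files, _stack=()):
    """Return config lines with Include directives expanded recursively.

    files maps a file name to its list of lines (stands in for the disk).
    """
    if name in _stack:
        raise ValueError("circular Include: " + " -> ".join(_stack + (name,)))
    out = []
    for line in files[name]:
        stripped = line.strip()
        if stripped.lower().startswith("include "):
            target = stripped.split(None, 1)[1].strip('"')
            out.extend(expand_includes(target, files, _stack + (name,)))
        else:
            out.append(line)
    return out


files = {
    "a.conf": ['SiteDomain="x"', 'Include "b.conf"'],
    "b.conf": ['Include "c.conf"', "LogFormat=1"],
    "c.conf": ["DNSLookup=0"],
}
print(expand_includes("a.conf", files))
```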

Unable to read config file from non-standard location

I am using awstats 7.3 in a non-standard directory as I have no root access. It has been running fine for more than a year. Just today I found that I wasn't able to view my site analysis from the URL. I got the following error:
Error: Couldn't open config file "awstats.xyz.conf" nor "awstats.conf", after searching in path "., /etc/awstats, /usr/local/etc/awstats, /etc, /etc/opt/awstats": No such file or directory

My config file is located in the same directory as awstats.pl. The cron update from command line works fine. To get it to work from the browser, I had to replace line #16860 in awstats.pl from:
$DIR ||= '.';
to
$DIR ||= "Absolute Path to Config File";

Something has probably changed on my server that causes browser access to fail.
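
One possible explanation, not confirmed in this report, is that the first entry of the search path, ".", is resolved against the working directory of the process, which under a web server need not be the directory that holds awstats.pl. The Python sketch below shows that effect in isolation; find_config and the temporary directories are hypothetical and stand in for the real lookup.

```python
import os
import tempfile


def find_config(name, search_dirs):
    """Return the absolute path of the first `name` found in `search_dirs`, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return os.path.abspath(candidate)
    return None


with tempfile.TemporaryDirectory() as script_dir, tempfile.TemporaryDirectory() as other_dir:
    # The config file sits next to the (imaginary) script.
    with open(os.path.join(script_dir, "awstats.xyz.conf"), "w") as handle:
        handle.write("# placeholder\n")

    original_cwd = os.getcwd()
    try:
        # Simulate a server that starts the script from a different directory.
        os.chdir(other_dir)
        relative_hit = find_config("awstats.xyz.conf", ["."])
        absolute_hit = find_config("awstats.xyz.conf", [script_dir])
    finally:
        os.chdir(original_cwd)

print(relative_hit)               # None
print(absolute_hit is not None)   # True
```

Searching "." fails once the working directory changes, while an absolute directory keeps working, which is consistent with the workaround of hard-coding the path.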

Alias /awstatsicons "/usr/local/awstats/wwwroot/icon/"

Hi,

It seems that the URL location used in the image paths is wrong:

http://www.myawstas.local/icon/other/awstats_logo6.png
http://www.myawstas.local/icon/other/vu.png
http://www.myawstas.local/icon/other/vk.png

In your recommended httpd.conf modifications you say:

Alias /awstatsicons "/usr/local/awstats/wwwroot/icon/"

But the images do not appear. I worked around this with:

Alias /icon "/usr/local/awstats/wwwroot/icon/"

Thanks for your support

country name not correct

Hi, the country name for Azerbaijan is shown in AWStats as "Azerbaidjan".
It should be changed to "Azerbaijan".

Illegal division by zero at /usr/share/awstats/wwwroot/cgi-bin/awstats.pl line 14250.

Greetings, I'm getting truncated output with the error message:
Illegal division by zero at /usr/share/awstats/wwwroot/cgi-bin/awstats.pl line 14250.

Steps to reproduce:
download tar.gz to /tmp/ and extract
cp /tmp/awstats/awstats.ljkpbTxWy1DjUzu63TUA3D.conf /etc/awstats
/usr/share/awstats/wwwroot/cgi-bin/awstats.pl -month=10 -year=2015 -output=main -config=ljkpbTxWy1DjUzu63TUA3D
(your path to awstats could vary)

Tested in 7.1, 7.4, 7.5 with same result.

Any help appreciated .-)

http://cipisek.stable.cz/awstats.tar.gz

Feature request: support Apache time to serve request / execution time

Hi,

Apache has the %D, %T and %{ms|us|s}T log fields to output the time it took to serve a request. It would be very useful to have support for these fields to catch "expensive" queries.

For example, to enable this feature, AWStats user could:

  • append an %extra1 field to LogFormat and configure Apache accordingly,
  • set ExecutionTimeField=extra1 (not used by default),
  • set ExecutionTimeUnit=us for use with Apache's %D field, which logs in microseconds.

Alternatively, there could be predefined %exec_us, %exec_ms and %exec_s fields, without reusing the %extraN ones, if that's considered a better way.

If that's set up, reports could have additional fields such as Mean execution time, Median execution time and Total execution time, shown together with Viewed, Average size, etc., where appropriate.

Maybe ShowExecutionTimeStats=0|1 would be useful, but simply not setting ExecutionTimeField should be enough to keep this feature disabled, as Apache does not provide this data by default.

Also, ExecutionTimeDisplayUnits=us|ms|s|human_readable (default: same as ExecutionTimeUnit) would be useful to control how these values are printed.
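
To make the proposal concrete, here is a minimal Python sketch of the statistics and the human_readable display mode described above. It is only an illustration of the idea, not a patch for AWStats: summarize, human_readable and the sample durations are hypothetical, and the unit names follow the ExecutionTimeUnit values suggested in this request.

```python
import statistics

# Conversion factors to microseconds for the proposed ExecutionTimeUnit values.
TO_MICROSECONDS = {"us": 1, "ms": 1_000, "s": 1_000_000}


def summarize(durations, unit="us"):
    """Return mean, median and total execution time, all in microseconds."""
    factor = TO_MICROSECONDS[unit]
    values = [d * factor for d in durations]
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "total": sum(values),
    }


def human_readable(microseconds):
    """Format a duration with the largest unit that keeps the value at 1 or above."""
    if microseconds >= 1_000_000:
        return f"{microseconds / 1_000_000:.2f} s"
    if microseconds >= 1_000:
        return f"{microseconds / 1_000:.2f} ms"
    return f"{microseconds:.0f} us"


# Made-up per-request durations, as Apache's %D would log them (microseconds).
samples_us = [1200, 800, 450000, 2300, 950]
stats = summarize(samples_us, unit="us")
for name, value in stats.items():
    print(name, human_readable(value))
```

The single slow request dominates the mean while the median stays low, which is why showing both would help to spot "expensive" queries.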

AWStats handling of old logs

I process old logs with AWStats by running logresolving.pl on the old log. After the TXT file has been generated, I use awstats_buildstaticpage.pl to generate the HTML, but the report still shows the current period rather than the statistics for the old period. What should I do?
