gigablast / open-source-search-engine

Nov 20 2017 -- A distributed open source search engine and spider/crawler written in C/C++ for Linux on Intel/AMD. From gigablast dot com, which has binaries for download. See the README.md file at the very bottom of this page for instructions.

License: Apache License 2.0

C++ 96.70% C 1.20% Shell 0.01% PHP 0.03% Makefile 0.10% HTML 1.42% Perl 0.03% Python 0.52%

open-source-search-engine's Introduction

open-source-search-engine

An open source web and enterprise search engine and spider/crawler, as can be seen at http://www.gigablast.com/.

RUNNING GIGABLAST

See html/faq.html for all administrative documentation including the quick start instructions.

Alternatively, visit http://www.gigablast.com/faq.html

CODE ARCHITECTURE

See html/developer.html for all code documentation.

Alternatively, visit http://www.gigablast.com/developer.html

CONTACT

Contact me for feature requests or help in general. I will work for free for good use cases. [email protected].

BUILD DEPENDENCIES

On Debian (and derivatives)

sudo apt-get install make g++ libssl-dev zlib1g-dev

On RedHat/CentOS

sudo yum install gcc-c++ openssl-devel

open-source-search-engine's People

Contributors

alc-privacore, appchecker, bxmvva4v, coconutpilot, compunixaustralia, emmanuelcharon, gigablast, isj-privacore, miketung, onlyjob, revbooyah, shijuraj, vonbetz, yeeler

open-source-search-engine's Issues

Ignored common words are still used in query

When I use words like "of" and "the", the results page reports that these words were ignored in the query. But they are not ignored. Here is a JSON dump of the search for "president of the united states" (without the quotes):

{
.......
"hits":2006856,
"moreResultsFollow":1,
"queryInfo":{
"fullQuery":"president of the united states",
"queryLanguageAbbr":"en",
"queryLanguage":"English",
"ignoredWords":"of the",
"queryNumTermsTotal":14,
"queryNumTermsUsed":14,
"queryWasTruncated":0,
"terms":[
{
"termNum":0,
"termStr":"president of",
"termFreq":100534420,
"termHash48":44956827856246,
"termHash64":14011306347530890614,
"prefixHash64":0
},
{
"termNum":1,
"termStr":"of the",
"termFreq":9737859040,
"termHash48":123155680403364,
"termHash64":7072744895489056676,
"prefixHash64":0
},
{
"termNum":2,
"termStr":"the united",
"termFreq":294197960,
"termHash48":91399911757557,
"termHash64":1422947407184123637,
"prefixHash64":0
},
{
"termNum":3,
"termStr":"united states",
"termFreq":710222520,
"termHash48":191581014902229,
"termHash64":6822863541504493013,
"prefixHash64":0
},
{
"termNum":4,
"termStr":"president",
"termFreq":430031140,
"termHash48":91995424033508,
"termHash64":4221372221153741540,
"prefixHash64":0
},
{
"termNum":5,
"termStr":"of",
"termFreq":45728419800,
"termHash48":39771817336989,
"termHash64":16181754707936239773,
"prefixHash64":0
},
{
"termNum":6,
"termStr":"the",
"termFreq":76506859100,
"termHash48":190173198946691,
"termHash64":297427748605399427,
"prefixHash64":0
},
{
"termNum":7,
"termStr":"united",
"termFreq":1192870560,
"termHash48":221756994843023,
"termHash64":2744884254900449679,
"prefixHash64":0
},
{
"termNum":8,
"termStr":"states",
"termFreq":987782600,
"termHash48":278184483105896,
"termHash64":567450262555077736,
"prefixHash64":0
},
{
"termNum":9,
"termStr":"pres",
"termLang":"en",
"synonymOf":"president",
"termFreq":40200020,
"termHash48":35130463599487,
"termHash64":1918850046700141439,
"prefixHash64":0
},
{
"termNum":10,
"termStr":"theus",
"termLang":"en",
"synonymOf":"united states",
"termFreq":14713580,
"termHash48":75409745269453,
"termHash64":5073661864954843853,
"prefixHash64":0
},
{
"termNum":11,
"termStr":"state",
"termLang":"en",
"synonymOf":"states",
"termFreq":1789026260,
"termHash48":88693902074019,
"termHash64":10442247379913990307,
"prefixHash64":0
},
{
"termNum":12,
"termStr":"stating",
"termLang":"en",
"synonymOf":"states",
"termFreq":45197320,
"termHash48":215859612922028,
"termHash64":9107057256109486252,
"prefixHash64":0
},
{
"termNum":13,
"termStr":"stated",
"termLang":"en",
"synonymOf":"states",
"termFreq":147076020,
"termHash48":9772481073821,
"termHash64":9482620262886363805,
"prefixHash64":0
}
]
},
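The mismatch reported above can be checked mechanically. The following is only a minimal sketch: it assumes the `ignoredWords` string and the `termStr` values have already been pulled out of the JSON response, and the function name `ignoredButUsed` is hypothetical, not part of Gigablast.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Returns every word from the space-separated "ignoredWords" field that
// nevertheless shows up as a standalone entry among the "termStr" values.
std::vector<std::string> ignoredButUsed(const std::string& ignoredWords,
                                        const std::vector<std::string>& termStrs) {
    std::vector<std::string> found;
    std::istringstream in(ignoredWords);
    std::string word;
    while (in >> word) {
        // Exact match only: "of the" (a bigram term) does not count as "of".
        if (std::find(termStrs.begin(), termStrs.end(), word) != termStrs.end()) {
            found.push_back(word);
        }
    }
    return found;
}
```

Fed with the values from the dump above, this would report both "of" (termNum 5) and "the" (termNum 6), which matches `queryNumTermsUsed` being equal to `queryNumTermsTotal`.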

about the spider crawls URL

I tested on one website and the crawl speed is good. But many websites regard frequent visits as a kind of attack and ban the IP address for a couple of hours or even days. Does Gigablast have rules to prevent this from happening?
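For illustration only, here is how a crawler in general can avoid hammering a single site: enforce a minimum delay between two fetches from the same host. This is a generic sketch, not a description of Gigablast's actual spider settings; the class name `HostThrottle` and its parameters are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Per-host politeness gate: a fetch is allowed only if at least
// minDelayMs milliseconds have passed since the last allowed fetch
// from the same host. Timestamps are supplied by the caller.
class HostThrottle {
public:
    explicit HostThrottle(std::int64_t minDelayMs) : minDelayMs_(minDelayMs) {}

    // Returns true and records the fetch time if the host may be fetched now.
    bool tryAcquire(const std::string& host, std::int64_t nowMs) {
        auto it = lastFetchMs_.find(host);
        if (it != lastFetchMs_.end() && nowMs - it->second < minDelayMs_) {
            return false;  // too soon, the caller should retry later
        }
        lastFetchMs_[host] = nowMs;
        return true;
    }

private:
    std::int64_t minDelayMs_;
    std::unordered_map<std::string, std::int64_t> lastFetchMs_;
};
```

With a delay of 1000 ms, a second request to the same host after 500 ms is refused, while a request to a different host at the same moment goes through.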

Cache and user administration

Hi again,

Is there any way to disable caching of parsed webpages? I think this could save me a lot of disk space.

My second question: is there any user administration? As far as I can see, everybody can query my instance of Gigablast. Since it is accessible through the internet, attackers could perform a DDoS attack or something like that. Requiring a login and password to search via the API would be helpful...

Thanks

streambuf.h:59:21: error: ‘_G_wchar_t’ does not name a type

types.h:954:21: warning: narrowing conversion of ‘4294967295u’ from ‘unsigned int’ to ‘int’ inside { } is ill-formed in C++11 [-Wnarrowing]
In file included from ./iostream.h:31:0,
from ./plotter.h:61,
from Stats.cpp:6:
./streambuf.h:59:21: error: ‘_G_wchar_t’ does not name a type
#define _IO_wchar_t _G_wchar_t
^
./streambuf.h:91:5: note: in expansion of macro ‘_IO_wchar_t’
_IO_wchar_t _fill;
^
./streambuf.h:59:21: error: ‘_G_wchar_t’ does not name a type
#define _IO_wchar_t _G_wchar_t
^
./streambuf.h:183:5: note: in expansion of macro ‘_IO_wchar_t’
_IO_wchar_t fill() const { return _fill; }
^
./streambuf.h:59:21: error: ‘_G_wchar_t’ does not name a type
#define _IO_wchar_t _G_wchar_t
^
./streambuf.h:184:5: note: in expansion of macro ‘_IO_wchar_t’
_IO_wchar_t fill(_IO_wchar_t newf)
^
./streambuf.h: In member function ‘void ios::init(streambuf*, ostream*)’:
./streambuf.h:471:40: error: ‘_fill’ was not declared in this scope
_strbuf=sb; _tie = tie_to; _width=0; _fill=' ';
^
make: *** [Stats.o] Error 1

Please help.

Add url do not follow links

I downloaded Gigablast and followed the instructions, but when I add a URL it doesn't follow links and only indexes the URL entered. I've tried several combinations of parameters but none worked.

Is this the expected behavior, or am I missing something here?

Thanks in advance for any help !

GB shutting down very frequently after throwing segment fault

Hi,

Need urgent help. GB is shutting down very frequently after a segmentation fault. Could anyone help me figure out what's going on? The following is the error dump:

1438312553066 000 loop: sigbadhandler. disabling handler from recall.
1438312553066 000 gb: seg fault. printing stack trace. use 'addr2line -e gb' to decode the hex below.
1438312553085 000 addr2line -e gb 0x5537e6
1438312553085 000 addr2line -e gb 0x55452f
1438312553085 000 addr2line -e gb 0x6ef240
1438312553085 000 addr2line -e gb 0x551cd6
1438312553085 000 addr2line -e gb 0x496d04
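As the log itself suggests, the hex addresses need to go through `addr2line` before the trace is readable. A small sketch of collecting them from the log lines into a single command (`addr2line` accepts several addresses in one call); the helper name `buildAddr2lineCommand` is made up for this example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Scans log lines, keeps the last whitespace-separated token of each line
// when it looks like a hex address (0x...), and builds one addr2line call.
std::string buildAddr2lineCommand(const std::vector<std::string>& logLines,
                                  const std::string& binary) {
    std::string cmd = "addr2line -e " + binary;
    for (const std::string& line : logLines) {
        std::istringstream in(line);
        std::string token, last;
        while (in >> token) last = token;
        const bool isHex =
            last.size() > 2 && last.compare(0, 2, "0x") == 0 &&
            last.find_first_not_of("0123456789abcdefABCDEF", 2) == std::string::npos;
        if (isHex) cmd += " " + last;
    }
    return cmd;
}
```

Running the resulting command in the directory of the `gb` binary that produced the log should print the source file and line for each frame, provided the binary was built with debug information.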

Feature Request: Approximate max. RAM Allocation Limit

It would be awesome to have people install their own private search engines on bookshelves in the form of a Raspberry Pi. People might have different Pis for different topic domains and different site collections, and then just choose the link to the proper Pi from their company intranet HTML page. By serving the search engine as a Tor Hidden Service, people can make their Pis available online even from home, from behind various firewalls, and link their home Pi to their Facebook page or Twitter tweets, at least for those who use the Tor Browser for browsing the web. That really opens up possibilities, not just from a privacy and security point of view, but also from a plain feature point of view. It can take content creation to a totally new level, and not just for the politically minded, but also for the "innocent and manipulatable".

Small companies might add the sites of their clients and potential clients to a custom Gigablast/Raspberry Pi instance, and if that single Pi becomes too small, there are always Raspberry Pi clusters, which can be grown as needed. The Pi is superior to an ordinary PC because of its lower energy requirements and smaller electricity bills, especially for applications that are idle most of the time and just wait for queries, not to mention the ability to invest in computing equipment gradually.

Question : "Faceted" Search... regarding categorization of information

Is there a way to categorize the information, either by specifying categories in the crawler/indexer or by having the search output displayed with categories or subcategories?

E.g. if I searched for "Stalingrad", it would be categorized under "Geography", etc.
E.g. would it also be possible to have some comparison with the dmoz.org directory?

Otherwise, how can I confine the information to its own search repository by specifying the links to crawl?

I would like to have a vertical niche search engine so that, when I specify the category "Business", it only returns business-related websites (or "Finance", "Sports", etc.), not by means of the search keywords but by faceting on the categories.

How would you advise me to go about doing this?

gigablast display search result 0

Hello,
I have set up a cluster of 3 servers and 3 mirror servers.
Now when I search for some words, such as "Mendipathar College", 0 results are displayed.
Which settings do I need to change?
Please help.

Thanks
Dipen Patel

pdftohtml causing Segfault

System: CentOS 7 Linux. This appears to be a common problem with pdftohtml in general. It may be triggered when multiple pages of a PDF document are being translated to HTML. I found this mentioned in numerous forums on several unrelated sites.

Inconsistent document count

A search for "to" says:
Results 1 to 10 of exactly 489,130,275 from an index of about 928,603,725 pages

A search for "to to to" says:
Results 1 to 10 of exactly 489,142,590 from an index of about 928,603,725 pages

Notice the difference in the "exactly" number. Why is there a difference? The two queries should be the same because duplicated words are ignored. If duplicated words are in effect not ignored (I seem to recall that there is a bug about that), then why is the difference so small? I would have expected either no difference or a factor of 3, not a difference of 12,315.
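To put the two counts side by side, here is the arithmetic as a tiny sketch (the struct and function names are only for this example). The gap works out to 12,315 hits, roughly 0.0025% of the first count, so it is neither zero nor anywhere near a factor of 3.

```cpp
#include <cstdint>

// Absolute and relative difference between two reported hit counts.
struct CountComparison {
    std::int64_t difference;  // second - first
    double relative;          // difference as a fraction of first
};

CountComparison compareCounts(std::int64_t first, std::int64_t second) {
    const std::int64_t diff = second - first;
    return { diff, static_cast<double>(diff) / static_cast<double>(first) };
}
```

Calling `compareCounts(489130275, 489142590)` gives a difference of 12315.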

Mem.h:213:6: error: from previous declaration ‘void operator delete(void*) throw ()

types.h:954:30: warning: narrowing conversion of ‘255’ from ‘int’ to ‘char’ inside { } is ill-formed in C++11 [-Wnarrowing]
Mem.cpp: In function ‘void operator delete(void*)’:
Mem.cpp:166:34: error: declaration of ‘void operator delete(void*)’ has a different exception specifier
void operator delete ( void *ptr ) {
^
In file included from Mem.cpp:3:0:
Mem.h:213:6: error: from previous declaration ‘void operator delete(void*) throw ()’
void operator delete ( void *p ) ;
^
make: *** [Mem.o] Error 1

Please help.
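The compiler is complaining that the definition of `operator delete` does not carry the same exception specifier as the earlier declaration. A standalone sketch of a declaration and definition that agree, using `noexcept` (the C++11 replacement for `throw ()`); this is not the actual Mem.h/Mem.cpp code, and the counter `g_liveAllocations` is invented for the example.

```cpp
#include <cstdlib>
#include <new>

// Hypothetical bookkeeping, only here so the example does something visible.
static long g_liveAllocations = 0;

// Replacement global operator new: allocates with malloc and counts.
void* operator new(std::size_t size) {
    void* p = std::malloc(size ? size : 1);
    if (!p) throw std::bad_alloc();
    ++g_liveAllocations;
    return p;
}

// Declaration, as a header would contain it.
void operator delete(void* ptr) noexcept;

// Definition: the exception specifier matches the declaration above,
// which is exactly what the error message asks for.
void operator delete(void* ptr) noexcept {
    if (!ptr) return;
    --g_liveAllocations;
    std::free(ptr);
}
```

Whether the right fix in the Gigablast tree is to add the specifier to the definition or to build with an older language standard is not something this sketch can settle; it only shows the two signatures in agreement.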

Failure to Compile on Raspberry Pi Raspbian

I removed the string "-m32" from the Makefile, but I guess that -mtune=native probably compensates for it.
Raspbian is essentially Debian Linux for the Raspberry Pi.

g++ -D_REENTRANT_ -D_CHECK_FORMAT_STRING_ -I. -mtune=native -ftree-vectorize -g -Wall -pipe -fno-stack-protector -Wno-write-strings -Wstrict-aliasing=0 -Wno-uninitialized -DPTHREADS -Wno-unused-but-set-variable  -O2 -c Loop.cpp 
g++ -D_REENTRANT_ -D_CHECK_FORMAT_STRING_ -I. -mtune=native -ftree-vectorize -g -Wall -pipe -fno-stack-protector -Wno-write-strings -Wstrict-aliasing=0 -Wno-uninitialized -DPTHREADS -Wno-unused-but-set-variable  -c Log.cpp 
Log.cpp: In member function 'bool Log::logR(int64_t, int32_t, char*, bool, bool)':
Log.cpp:246:45: error: no matching function for call to 'Log::logLater(int64_t&, int32_t&, char*&, NULL)'
   return logLater ( now , type , msg , NULL );
                                             ^
Log.cpp:246:45: note: candidate is:
In file included from gb-include.h:69:0,
                 from Log.cpp:1:
Log.h:139:7: note: bool Log::logLater(int64_t, int32_t, char*, va_list)
  bool logLater ( int64_t now , int32_t type , char *formatString , 
       ^
Log.h:139:7: note:   no known conversion for argument 4 from 'int' to 'va_list {aka __va_list}'
Log.cpp: In member function 'bool Log::logLater(int64_t, int32_t, char*, va_list)':
Log.cpp:478:22: error: invalid cast from type 'va_list {aka __va_list}' to type 'char*'
  char *pap = (char *)ap;
                      ^
Log.cpp:527:33: error: invalid cast from type 'va_list {aka __va_list}' to type 'char*'
  int32_t apsize = pap - (char *)ap;
                                 ^
In file included from Log.cpp:1:0:
Log.cpp:544:31: error: invalid cast from type 'va_list {aka __va_list}' to type 'char*'
  memcpy_ass ( s_ptr , (char *)ap , apsize );
                               ^
gb-include.h:21:37: note: in definition of macro 'memcpy_ass'
 #define memcpy_ass(xx,yy,zz) {bcopy(yy,xx,zz); }
                                     ^
Makefile:603: recipe for target 'Log.o' failed
make[1]: *** [Log.o] Error 1
make[1]: Leaving directory '/opt/mmmv/storage_1/mmmv_gb'
Makefile:224: recipe for target 'gb32' failed
make: *** [gb32] Error 2

real    23m49.116s
user    22m34.020s
sys 0m36.250s
pi@raspberrypi /opt/mmmv/storage_1/mmmv_gb $ sync
pi@raspberrypi /opt/mmmv/storage_1/mmmv_gb $ uname -a
Linux raspberrypi 4.1.7+ #817 PREEMPT Sat Sep 19 15:25:36 BST 2015 armv6l GNU/Linux
pi@raspberrypi /opt/mmmv/storage_1/mmmv_gb $ date
Tue Nov 24 14:30:00 UTC 2015
pi@raspberrypi /opt/mmmv/storage_1/mmmv_gb $ echo $CFLAGS 
-mtune=native -ftree-vectorize
pi@raspberrypi /opt/mmmv/storage_1/mmmv_gb $ echo $CXXFLAGS 
-mtune=native -ftree-vectorize
pi@raspberrypi /opt/mmmv/storage_1/mmmv_gb $
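The errors above come from treating `va_list` as a `char*`, which happens to work on some x86 ABIs but not on ARM, where `va_list` is a struct. A portable alternative, shown here as a minimal sketch and not as a patch to Log.cpp, is to format the message immediately with `va_copy` and `vsnprintf` instead of copying the raw argument memory. The function names `formatWithVaList` and `formatMessage` are hypothetical.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Formats fmt with the arguments in ap without ever casting the va_list.
std::string formatWithVaList(const char* fmt, va_list ap) {
    // First pass on a copy, to learn the required length.
    va_list apCopy;
    va_copy(apCopy, ap);
    const int needed = std::vsnprintf(nullptr, 0, fmt, apCopy);
    va_end(apCopy);
    if (needed < 0) return std::string();

    // Second pass writes into a buffer of exactly the right size.
    std::vector<char> buf(static_cast<std::size_t>(needed) + 1);
    std::vsnprintf(buf.data(), buf.size(), fmt, ap);
    return std::string(buf.data(), static_cast<std::size_t>(needed));
}

// Variadic front end, comparable in shape to a printf-style log call.
std::string formatMessage(const char* fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    std::string out = formatWithVaList(fmt, ap);
    va_end(ap);
    return out;
}
```

A deferred logger could then queue the already formatted string. The `NULL` passed as the fourth argument in `logR` would also need a different approach, since a null pointer is not a valid `va_list` on this platform.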

Initialization of Compulsory Files Fails

Copy/Paste of the console session:

ts2@s8lm1:~/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master$ ls
Abbreviations.cpp           Images.h                Msg5.o                  RequestTable.h
Abbreviations.h             Images.o                Msg6b.cpp               RequestTable.o
Abbreviations.o             Indexdb.cpp             Msg6b.h                 rescue.cpp
Accessdb.cpp                Indexdb.h               Msg8b.cpp               Revdb.cpp
Accessdb.h                  Indexdb.o               Msg8b.h                 Revdb.h
Address.cpp                 IndexList.cpp           Msg8b.o                 Revdb.o
Address.h                   IndexList.h             Msg9b.cpp               rmbots.cpp
Address.o                   IndexList.o             Msg9b.h                 S99gb
addtest.cpp                 IndexReadInfo.cpp       Msg9b.o                 SafeBuf.cpp
Ads.cpp                     IndexReadInfo.h         Msgaa.cpp               SafeBuf.h
Ads.h                       IndexReadInfo.o         Msgaa.h                 SafeBuf.o
AdultBit.cpp                IndexTable2.cpp         MsgC.cpp                SafeList.h
AdultBit.h                  IndexTable2.h           MsgC.h                  Sanity.h
animate.cpp                 IndexTable.cpp          MsgC.o                  Scores.cpp
antiword                    IndexTable.h            Msge0.cpp               Scores.h
antiword-dir                init.gb.conf            Msge0.h                 Scraper.cpp
AutoBan.cpp                 injectme3               Msge0.o                 Scraper.h
AutoBan.h                   injectmedemo            Msge1.cpp               Scraper.o
AutoBan.o                   injector.cpp            Msge1.h                 SearchInput.cpp
badcattable.dat             iostream.h              Msge1.o                 SearchInput.h
BigFile.cpp                 ipconfig.cpp            Multicast.cpp           SearchInput.o
BigFile.h                   ip.cpp                  Multicast.h             Sections.cpp
BigFile.o                   ip.h                    Multicast.o             Sections.h
Bits.cpp                    ip.o                    mysynonyms.txt          Sections.o
Bits.h                      Iso8859.cpp             numwords.cpp            seektest.cpp
Bits.o                      Iso8859.h               openssl                 seo.h
blaster2.cpp                Iso8859.o               PageAddColl.cpp         SiteGetter.cpp
Blaster.cpp                 jointest.cpp            PageAddColl.o           SiteGetter.h
Blaster.h                   jpegtopnm               PageAddUrl.cpp          SiteGetter.o
Blaster.o                   Json.cpp                PageAddUrl.o            sitelinks.txt
bmptopnm                    Json.h                  PageBasic.cpp           sleepandlog.cpp
Cachedb.cpp                 Json.o                  PageBasic.o             sort.cpp
Cachedb.h                   keepalive.cpp           PageCatdb.cpp           sort.h
Cachedb.o                   Lang.cpp                PageCatdb.o             sort.o
camsort.cpp                 Lang.h                  PageCrawlBot.cpp        Speller.cpp
catcountry.dat              LangList.cpp            PageCrawlBot.h          Speller.h
Catdb.cpp                   LangList.h              PageCrawlBot.o          Speller.o
Catdb.h                     LangList.o              PageDirectory.cpp       Spider.cpp
Catdb.o                     Lang.o                  PageDirectory.o         Spider.h
Categories.cpp              Language.cpp            PageEvents.cpp          Spider.o
Categories.h                Language.h              PageGet.cpp             SpiderProxy.cpp
Categories.o                LanguageIdentifier.cpp  PageGet.o               SpiderProxy.h
CatRec.cpp                  LanguageIdentifier.h    PageHosts.cpp           SpiderProxy.o
CatRec.h                    LanguageIdentifier.o    PageHosts.o             Stats.cpp
CatRec.o                    Language.o              PageIndexdb.cpp         Statsdb.cpp
character-sets              LanguagePages.cpp       PageInject.cpp          Statsdb.h
check_unicode.cpp           LanguagePages.h         PageInject.h            Statsdb.o
Clusterdb.cpp               LanguagePages.o         PageInject.o            Stats.h
Clusterdb.h                 libc.a                  PageLogView.cpp         Stats.o
Clusterdb.o                 libcrypto.a             PageLogView.o           StopWords.cpp
Collectiondb.cpp            libgcc.a                PageNetTest.cpp         StopWords.h
Collectiondb.h              libiconv64.a            PageNetTest.h           StopWords.o
Collectiondb.o              libiconv.a              PageOverview.cpp        streambuf.h
Conf.cpp                    libiconv.la             PageParser.cpp          Strings.cpp
Conf.h                      libjpeg.so.62           PageParser.h            Strings.h
Conf.o                      libm.a                  PageParser.o            Summary.cpp
control.deb                 libnetpbm.so.10         PagePerf.cpp            Summary.h
convert.cpp                 libpng12.so.0           PagePerf.o              Summary.o
copyright.head              libpthread.a            PageReindex.cpp         superMergeTest.cpp
copyright.tail              libssl.a                PageReindex.h           supported_charsets.cpp
CountryCode.cpp             libstdc++.a             PageReindex.o           supported_charsets.txt
CountryCode.h               libtiff.so.4            PageResults.cpp         Syncdb.cpp
CountryCode.o               libz64.a                PageResults.h           Syncdb.h
create_ucd_tables.cpp       libz.a                  PageResults.o           Syncdb.o
DailyMerge.cpp              libz.so.1               PageRoot.cpp            Synonyms.cpp
DailyMerge.h                LICENSE                 PageRoot.o              Synonyms.h
DailyMerge.o                Linkdb.cpp              Pages.cpp               Synonyms.o
DataFeed.cpp                Linkdb.h                Pages.h                 Tagdb.cpp
DataFeed.h                  Linkdb.o                Pages.o                 Tagdb.h
Datedb.cpp                  LinkedList.h            PageSockets.cpp         Tagdb.o
Datedb.h                    linkspam.cpp            PageSockets.o           TcpServer.cpp
Datedb.o                    linkspam.h              PageSpam.cpp            TcpServer.h
Dates.cpp                   linkspam.o              PageSpam.o              TcpServer.o
Dates.h                     Log.cpp                 PageStats.cpp           TcpSocket.h
Dates.o                     Log.h                   PageStatsdb.cpp         test2.cpp
diffbot-widget              Log.o                   PageStatsdb.o           test_convert.cpp
Diff.cpp                    Loop.cpp                PageStats.o             Test.cpp
Diff.h                      Loop.h                  PageSubmit.cpp          testfloats.cpp
Dir.cpp                     Loop.o                  PageThesaurus.cpp       Test.h
Dir.h                       looptest.cpp            PageThreads.cpp         test_hash.cpp
Dir.o                       main.cpp                PageThreads.o           test_norm.cpp
DiskPageCache.cpp           main.o                  PageTitledb.cpp         Test.o
DiskPageCache.h             Make.depend             PageTitledb.o           test_parser2.cpp
DiskPageCache.o             Makefile                PageTurk.cpp            test_parser.cpp
dlstubs.c                   malloc.c                PageTurk.h              test_unicode.cpp
dlstubs.o                   matches2.cpp            Parms.cpp               Tfndb.cpp
dmozparse.cpp               matches2.h              Parms.h                 Tfndb.h
Dns.cpp                     matches2.o              Parms.o                 Thesaurus.cpp
Dns.h                       Matches.cpp             parse_iana_charsets.pl  Thesaurus.h
Dns.o                       Matches.h               pdftohtml               Threads.cpp
DnsProtocol.h               Matches.o               Phrases.cpp             Threads.h
dnstest.cpp                 membustest.cpp          Phrases.h               Threads.o
Domains.cpp                 Mem.cpp                 Phrases.o               threadtest.cpp
Domains.h                   Mem.h                   PingServer.cpp          thunder.cpp
Domains.o                   Mem.o                   PingServer.h            tifftopnm
dumpcore.cpp                MemPool.cpp             PingServer.o            Timedb.cpp
Entities.cpp                MemPool.h               Placedb.cpp             Timedb.h
Entities.h                  MemPoolTree.cpp         Placedb.h               Timer.h
Entities.o                  MemPoolTree.h           Placedb.o               Title.cpp
Errno.cpp                   memtest.cpp             pngtopnm                Titledb.cpp
Errno.h                     mergetest.cpp           pnmscale                Titledb.h
Errno.o                     MetaContainer.cpp       Pops.cpp                Titledb.o
errnotest.cpp               MetaContainer.h         Pops.h                  Title.h
Events.h                    Mime.cpp                Pops.o                  Title.o
Facebook.cpp                Mime.h                  porter.cpp              TopTree.cpp
Facebook.h                  Mime.o                  Pos.cpp                 TopTree.h
fastIndexTable.cpp          mixfile.cpp             Posdb.cpp               TopTree.o
fctypes.cpp                 mmseg.h                 Posdb.h                 treetest.cpp
fctypes.h                   monitor.cpp             Posdb.o                 TuringTest.cpp
fctypes.o                   Monitordb.cpp           Pos.h                   TuringTest.h
File.cpp                    Monitordb.h             Pos.o                   TuringTest.o
File.h                      Monitordb.o             postalCodes.txt         Turkdb.cpp
File.o                      Msg0.cpp                PostQueryRerank.cpp     types.h
filterquerylogs.cpp         Msg0.h                  PostQueryRerank.h       ucdata
Flags.cpp                   Msg0.o                  PostQueryRerank.o       UCNormalizer.cpp
Flags.h                     Msg13.cpp               ppmtojpeg               UCNormalizer.h
gb                          Msg13.h                 Process.cpp             UCNormalizer.o
gb-1.0.spec                 Msg13.o                 Process.h               UCPropTable.cpp
gb.conf                     Msg17.cpp               Process.o               UCPropTable.h
gb.conf.saving              Msg17.h                 Profiler.cpp            UCPropTable.o
gb.deb.rules                Msg17.o                 Profiler.h              UCWordIterator.cpp
gbfilter.cpp                Msg1.cpp                Profiler.o              UCWordIterator.h
gb-include.h                Msg1f.cpp               Proxy.cpp               UdpProtocol.h
gb.pem                      Msg1f.h                 Proxy.h                 UdpServer.cpp
gbtitletest.cpp             Msg1f.o                 Proxy.o                 UdpServer.h
geneaology.cpp              Msg1.h                  pstotext                UdpServer.o
generateSuperMergeCode.cpp  Msg1.o                  QAClient.cpp            UdpSlot.cpp
GeoIP.c                     Msg20.cpp               QAClient.h              UdpSlot.h
GeoIPCity.c                 Msg20.h                 qa.cpp                  UdpSlot.o
GeoIPCity.h                 Msg20.o                 qa.o                    udptest.cpp
GeoIPCity.o                 Msg22.cpp               quarantine.cpp          Unicode.cpp
GeoIP.h                     Msg22.h                 Query.cpp               Unicode.h
GeoIP_internal.h            Msg22.o                 Query.h                 Unicode.o
GeoIP.o                     Msg24.cpp               Query.o                 UnicodeProperties.cpp
geo_ip_table.cpp            Msg28.cpp               RdbBase.cpp             UnicodeProperties.h
geo_ip_table.h              Msg28.h                 RdbBase.h               UnicodeProperties.o
geo_ip_table.o              Msg2a.cpp               RdbBase.o               unifiedDict.txt
getsample.cpp               Msg2a.h                 RdbBuckets.cpp          uniq2.cpp
giftopnm                    Msg2a.o                 RdbBuckets.h            Url.cpp
gigablast.cbp               Msg2b.cpp               RdbBuckets.o            Url.h
gigablast.layout            Msg2b.h                 RdbCache.cpp            urlinfo.cpp
hash.cpp                    Msg2.cpp                RdbCache.h              Url.o
hash.h                      Msg2.h                  RdbCache.o              Users.cpp
hash.o                      Msg2.o                  Rdb.cpp                 Users.h
HashTable.cpp               Msg30.cpp               RdbDump.cpp             Users.o
HashTable.h                 Msg30.h                 RdbDump.h               ValidPointer.cpp
HashTable.o                 Msg35.cpp               RdbDump.o               ValidPointer.h
HashTableT.cpp              Msg35.h                 Rdb.h                   Vector.cpp
HashTableT.h                Msg35.o                 RdbList.cpp             Vector.h
HashTableT.o                Msg36.cpp               RdbList.h               Version.cpp
HashTableX.cpp              Msg36.h                 RdbList.o               Version.h
HashTableX.h                Msg37.cpp               RdbMap.cpp              Version.o
HashTableX.o                Msg37.h                 RdbMap.h                Weights.cpp
hashtest2.cpp               Msg39.cpp               RdbMap.o                Weights.h
hashtest3.cpp               Msg39.h                 RdbMem.cpp              Wiki.cpp
hashtest.cpp                Msg39.o                 RdbMem.h                Wiki.h
Highlight.cpp               Msg3a.cpp               RdbMem.o                Wiki.o
Highlight.h                 Msg3a.h                 RdbMerge.cpp            wikititles.txt.part1
Highlight.o                 Msg3a.o                 RdbMerge.h              wikititles.txt.part2
Hostdb.cpp                  Msg3.cpp                RdbMerge.o              wiktionary-buf.txt
Hostdb.h                    Msg3e.cpp               Rdb.o                   Wiktionary.cpp
Hostdb.o                    Msg3e.h                 RdbScan.cpp             Wiktionary.h
hosts.conf                  Msg3.h                  RdbScan.h               wiktionary-lang.txt
hosts.cpp                   Msg3.o                  RdbScan.o               Wiktionary.o
html                        Msg40Cache.cpp          rdbtest2.cpp            wiktionary-syns.dat
HttpMime.cpp                Msg40Cache.h            rdbtest.cpp             Words.cpp
HttpMime.h                  Msg40.cpp               RdbTree.cpp             Words.h
HttpMime.o                  Msg40.h                 RdbTree.h               Words.o
HttpRequest.cpp             Msg40.o                 RdbTree.o               Xml.cpp
HttpRequest.h               Msg42.cpp               README.md               XmlDoc.cpp
HttpRequest.o               Msg42.h                 readRec.cpp             XmlDoc.h
HttpServer.cpp              Msg4.cpp                Rebalance.cpp           XmlDoc.o
HttpServer.h                Msg4.h                  Rebalance.h             Xml.h
HttpServer.o                Msg4.o                  Rebalance.o             XmlNode.cpp
iana_charset.cpp            Msg51.cpp               reindex2.cpp            XmlNode.h
iana_charset.h              Msg51.h                 Repair.cpp              XmlNode.o
iana_charset.o              Msg51.o                 Repair.h                Xml.o
iconv.h                     Msg5.cpp                Repair.o                zconf.h
Images.cpp                  Msg5.h                  RequestTable.cpp        zlib.h
ts2@s8lm1:~/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master$ date
Tue Sep 29 19:35:41 EEST 2015
ts2@s8lm1:~/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master$ uname -a
Linux s8lm1 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u3 (2015-08-04) x86_64 GNU/Linux
ts2@s8lm1:~/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master$ gb
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/catcountry.dat length of 128 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/catcountry.dat file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/badcattable.dat length of 129 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/badcattable.dat file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/ucdata/cd_data.dat length of 132 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/ucdata/cd_data.dat file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/ucdata/cdmap.dat length of 130 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/ucdata/cdmap.dat file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/ucdata/combiningclass.dat length of 139 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/ucdata/combiningclass.dat file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/ucdata/kd_data.dat length of 132 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/ucdata/kd_data.dat file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/ucdata/kdmap.dat length of 130 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/ucdata/kdmap.dat file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/ucdata/lowermap.dat length of 133 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/ucdata/lowermap.dat file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/ucdata/properties.dat length of 135 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/ucdata/properties.dat file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/ucdata/scripts.dat length of 132 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/ucdata/scripts.dat file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/ucdata/uppermap.dat length of 133 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/ucdata/uppermap.dat file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-1.txt length of 137 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-1.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-10.txt length of 138 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-10.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-13.txt length of 138 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-13.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-14.txt length of 138 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-14.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-15.txt length of 138 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-15.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-16.txt length of 138 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-16.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-2.txt length of 137 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-2.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-3.txt length of 137 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-3.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-4.txt length of 137 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-4.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-5.txt length of 137 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-5.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-6.txt length of 137 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-6.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-7.txt length of 137 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-7.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-8.txt length of 137 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-8.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-9.txt length of 137 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/8859-9.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/Default length of 134 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/Default file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/Example length of 134 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/Example file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/MacRoman.txt length of 139 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/MacRoman.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/UTF-8.txt length of 136 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/UTF-8.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/Unicode length of 134 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/Unicode file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/cp1250.txt length of 137 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/cp1250.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/cp1251.txt length of 137 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/cp1251.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/cp1252.txt length of 137 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/cp1252.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/cp437.txt length of 136 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/cp437.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/cp850.txt length of 136 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/cp850.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/cp852.txt length of 136 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/cp852.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/fontnames length of 136 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/fontnames file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/fontnames.russian length of 144 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/fontnames.russian file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/koi8-r.txt length of 137 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/koi8-r.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/koi8-u.txt length of 137 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/koi8-u.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/roman.txt length of 136 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/antiword-dir/roman.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/libjpeg.so.62 length of 127 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/libjpeg.so.62 file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/libnetpbm.so.10 length of 129 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/libnetpbm.so.10 file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/libpng12.so.0 length of 127 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/libpng12.so.0 file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/mysynonyms.txt length of 128 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/mysynonyms.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/wikititles.txt.part1 length of 134 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/wikititles.txt.part1 file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/wikititles.txt.part2 length of 134 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/wikititles.txt.part2 file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/wiktionary-buf.txt length of 132 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/wiktionary-buf.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/wiktionary-lang.txt length of 133 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/wiktionary-lang.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/wiktionary-syns.dat length of 133 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/wiktionary-syns.dat file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/sitelinks.txt length of 127 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/sitelinks.txt file missing.
disk: Provdied filename /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/unifiedDict.txt length of 129 is bigger than 127.
db: /opt/2dot7TiB_k8vaketas/ts2/mittevarundatav/_home/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master/unifiedDict.txt file missing.
db: Missing files. See above. Exiting.
Failed to start gb. Exiting.
ts2@s8lm1:~/m_local/bin_p/Gigablast/kompil/open-source-search-engine-master$

Merging search results from different collections

Again, not an issue, just another question.

Does anyone know how to merge search results from different collections? I have created multiple collections to have better control over the data, but I need to merge the search results from them.

Thanks

License?

Hi Matt:
This piece of code is brilliant... but without a license, it cannot be used.
What is the open source license for your code?

GitHub has set up this site to help: http://choosealicense.com/

My 2 cents: I like it simple and permissive.

Cheers

Philippe

error2 getting real firstip of 0

I'm getting the following error when trying to spider a web server on my intranet. I have successfully spidered a similar machine at a different physical location. I'm wondering if the intranet at this location has something in the network configuration that is not playing nice with GB.

Short version: GB is working at one site but not another.

Here is the relevant section of log000:
1416528426221 000 got parm update request. size=17.
1416528426221 000 updating parm "spidering enabled" (bcse[-1]) (collnum=43) from "0" -> "1"
1416528426221 000 sending parm update reply
1416528426312 000 got a reviving url for coll main (43) to crawl http://myhost.mydomain.com/
1416528426388 000 Nameserver (XXX.XX.XX.X) got internal IP. Assuming domain name does not exist.
1416528426389 000 error2 getting real firstip of 0 for http://myhost.mydomain.com/. Not adding new spider req
1416528426389 000 coll=main collnum=43 ip=0.0.0.0 firstip=0.0.0.0 fakesreqfirstip=158.213.162.65 spidered=Nov-21-2014(00:07:06)(1416528426) scheduledtime=Nov-21-2014(00:06:59)(1416528419) firsttime=1 uh48=68325440934686 parentlang=00(xx) dh32=0151897686 sh32=0000000001 addlistsize=00000 thumbnail=none contentinjected=0 urlinjected=0 isaddurl=1 oldurlfilternum=5 oldpriority=85 errcnt=0 url=http://myhost.mydomain.com/ : Fake firstIp
1416528426389 000 added reply k.n1=0x41a2d59e3e2441a2 k.n0=0xd31e000000000001 uh48=68325440934686 parentDocId=0 firstip=158.213.162.65 percentChangedPerDay=0.00% spideredTime=Nov 21 00:07:06 2014 UTC(1416528426) siteNumInlinks=0 pubDate=(0) ch32=0 crawldelayms=-1ms httpStatus=0 langId=Unknown(0) errCount=1 errCode=Fake firstIp(32911)
1416528426673 000 hit spider queue rebuild timeout for main (43)
1416528427179 000 rebuild complete for main. Added 1 recs to waiting tree, scanned 191 bytes of spiderdb.

Doesn't recognize new gTLDs

Domains.cpp has a hardcoded list of TLDs. The list is incomplete.
The new gTLDs include .wiki, .football, .bar, etc., so maintaining a hardcoded list seems a dead end (IMHO).
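
One possible alternative to a hardcoded table is to load the list from a text file at startup. The sketch below assumes a plain-text file with one TLD per line and `#` comment lines (the layout of the list IANA publishes); the names `loadTlds` and `hasKnownTld` are hypothetical and are not taken from Domains.cpp.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_set>

// Lowercase an ASCII string (TLD labels are compared case-insensitively).
static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Read one TLD per line, skipping blank lines and '#' comment lines.
std::unordered_set<std::string> loadTlds(std::istream& in) {
    std::unordered_set<std::string> tlds;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty() || line[0] == '#') continue;
        tlds.insert(toLower(line));
    }
    return tlds;
}

// True if the label after the last '.' of host is in the loaded set.
bool hasKnownTld(const std::string& host,
                 const std::unordered_set<std::string>& tlds) {
    const std::size_t dot = host.rfind('.');
    if (dot == std::string::npos || dot + 1 >= host.size()) return false;
    return tlds.count(toLower(host.substr(dot + 1))) > 0;
}
```

This only checks the last label, so multi-label suffixes such as .co.uk would still need separate handling.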

Duplicate search terms not removed

Shouldn't the query have duplicate search terms removed?

See this example:

1445594081385 000 query: set called = to to to to to to to to to
1445594081385 000 query: got query to to to to to to to to to (len=26)
1445594081385 000 query: set called = to to to to to to to to to
1445594081385 000 query: Getting search results for q=to to to to to to to to to
1445594081385 000 db: cache is disabled until we cache scoring tables
1445594081385 000 query: [125e55cc] Not found in cache. Lookup took 0 ms.
1445594081385 000 query: msg40: [125e55cc] Getting up to 30 (docToGet=36) docids
1445594081385 000 query: term #0 "to to" (12937974145121)
1445594081385 000 query: term #1 "to" (166537280794640)
1445594081385 000 query: term #2 "to" (166537280794640)
1445594081385 000 query: term #3 "to" (166537280794640)
1445594081385 000 query: term #4 "to" (166537280794640)
1445594081385 000 query: term #5 "to" (166537280794640)
1445594081385 000 query: term #6 "to" (166537280794640)
1445594081385 000 query: term #7 "to" (166537280794640)
1445594081385 000 query: term #8 "to" (166537280794640)
1445594081385 000 query: term #9 "to" (166537280794640)
1445594081385 000 query: msg3a: [125e5a68] getting termFreqs.
1445594081387 000 cache: warning. cache for tfcache can have 294117 ptrs but buf mem can only hold 36766 objects
1445594081388 000 query: term #0 "to to" termid=12937974145121 termFreq=40539100 termFreqWeight=0.536
1445594081388 000 query: term #1 "to" termid=166537280794640 termFreq=50615601540 termFreqWeight=1.000
1445594081388 000 query: term #2 "to" termid=166537280794640 termFreq=50615601540 termFreqWeight=1.000
1445594081388 000 query: term #3 "to" termid=166537280794640 termFreq=50615601540 termFreqWeight=1.000
1445594081388 000 query: term #4 "to" termid=166537280794640 termFreq=50615601540 termFreqWeight=1.000
1445594081388 000 query: term #5 "to" termid=166537280794640 termFreq=50615601540 termFreqWeight=1.000
1445594081388 000 query: term #6 "to" termid=166537280794640 termFreq=50615601540 termFreqWeight=1.000
1445594081388 000 query: term #7 "to" termid=166537280794640 termFreq=50615601540 termFreqWeight=1.000
1445594081388 000 query: term #8 "to" termid=166537280794640 termFreq=50615601540 termFreqWeight=1.000
1445594081388 000 query: term #9 "to" termid=166537280794640 termFreq=50615601540 termFreqWeight=1.000
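
If removing duplicates is the desired behaviour, it could be done on the parsed term list before the term frequencies are looked up. The following is a minimal sketch, assuming terms can be compared by the termid shown in the log; `QueryTerm` and `dedupeTerms` are hypothetical names and not Gigablast's own types.

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a parsed query term.
struct QueryTerm {
    std::string text;
    std::uint64_t termId;
};

// Keep the first occurrence of each termId and drop later repeats,
// preserving the order of the terms that survive.
std::vector<QueryTerm> dedupeTerms(const std::vector<QueryTerm>& terms) {
    std::unordered_set<std::uint64_t> seen;
    std::vector<QueryTerm> unique;
    unique.reserve(terms.size());
    for (const QueryTerm& t : terms) {
        if (seen.insert(t.termId).second) {  // true only on first sighting
            unique.push_back(t);
        }
    }
    return unique;
}
```

Applied to the query above, this would leave the bigram "to to" and a single "to". Whether repeated terms are meant to influence scoring is not clear from the log, so dropping them may change ranking.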

Sometimes returns invalid JSON

When requesting search results in JSON format, we sometimes see page summaries containing characters that are not allowed in JSON. The document is then invalid and JSON parsers fail.

Example:
{
"title":"2014 Best Zam Pow! on Spotify",
"contentType":"html",
"sum":"ÿ³Qtñw� \fpUÈ(Éͱãââ² 2l òS*í¸��¸ 1� $Ìe£�6ÓF�b� `+Ð ...",
"url":"open.spotify.com/user/zacjohnson/playlist/1Cu97dhjfZUWn5cazcH3wb",
"hopCount":1,
"size":" 94k",
"sizeInBytes":95841,
"docId":169188079495,
"docScore":127725.328125,
"site":"open.spotify.com/user/zacjohnson/",
"spidered":1425141913,
"firstIndexedDateUTC":1425141913,
"contentHash32":1433108553,
"language":"English",
"langAbbr":"en"
}
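
JSON strings must escape quotes, backslashes and control characters below U+0020, and JSON exchanged between systems is expected to be valid UTF-8. A minimal sketch of an escaping step that could be applied to a summary before it is written out is shown below. It assumes the summary is supposed to be UTF-8 and replaces every byte that is not part of a valid sequence with the escape `\ufffd`; the function names are hypothetical and this is not Gigablast's own serializer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Length (1-4) of the valid UTF-8 sequence starting at s[i], or 0 if invalid.
static std::size_t utf8SeqLen(const std::string& s, std::size_t i) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    std::uint32_t cp = 0, min = 0;
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; min = 0x80; }
    else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min = 0x800; }
    else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min = 0x10000; }
    else return 0;                               // stray continuation or bad lead byte
    if (i + len > s.size()) return 0;            // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char cc = static_cast<unsigned char>(s[i + k]);
        if ((cc & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (cc & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF) return 0;     // overlong or out of range
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  // surrogate code points
    return len;
}

// Escape a byte string so it can be placed between double quotes in JSON.
std::string jsonEscape(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        const std::size_t len = utf8SeqLen(in, i);
        if (len == 0) { out += "\\ufffd"; ++i; continue; }           // invalid byte
        if (len > 1)  { out.append(in, i, len); i += len; continue; } // valid multibyte
        const unsigned char c = static_cast<unsigned char>(in[i]);
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {                  // remaining control characters
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
        ++i;
    }
    return out;
}
```

This keeps the response parseable, but a summary like the one above would still be noise; detecting binary content before summarizing would be a separate fix.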

syllable division in PDF files

Hello,

Indexing PDF files does not undo syllable division (hyphenation) at the end of a line.

For example:
"... by doing so, the author tells the reader a re-
markable story...."

This will be indexed as it is seen by the spider. Queries looking for "the reader a remarkable story" (i.e. without the syllable division) will fail, because that phrase is simply not indexed.

Is there any chance that the spiders could ignore or undo syllable division?

Thanks a lot,

Zara
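
For the end-of-line hyphenation described above, one simple heuristic is to join a word when a hyphen is directly followed by a line break and a lowercase letter. The sketch below assumes the extracted PDF text is available as a plain string with `\n` line breaks; `joinHyphenatedLines` is a hypothetical helper, not part of Gigablast.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Join words split by an end-of-line hyphen: "re-\nmarkable" -> "remarkable".
// Heuristic only: a genuine compound broken at its hyphen ("well-\nknown")
// is joined too, because the text alone cannot tell the two cases apart.
std::string joinHyphenatedLines(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        const bool hyphenAtLineEnd =
            text[i] == '-' &&
            i > 0 && std::isalpha(static_cast<unsigned char>(text[i - 1])) &&
            i + 2 < text.size() && text[i + 1] == '\n' &&
            std::islower(static_cast<unsigned char>(text[i + 2]));
        if (hyphenAtLineEnd) {
            ++i;       // skip the hyphen here; the loop increment skips the newline
            continue;
        }
        out += text[i];
    }
    return out;
}
```

An indexer could also store both the joined and the unjoined form to avoid losing real hyphenated compounds.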

Indexes css files

This is an example dump from one of my spiderdbs:

offset=2995650 k=0x06bbcb5bd359a2a01fcffb999d515001 REQ uh48=232381933952975 recsize=142 parentDocId=265429100712 firstip=91.203.187.6 hostHash32=0xb6cc6794 domHash32=0x306d52b9 siteHash32=0xb6cc6794 siteNumInlinks=0 addedTime=Mar 3 11:33:58 2015 UTC(1425382438) parentFirstIp=91.203.187.14 parentHostHash32=0xa6f718a4 parentDomHash32=0xdf624a2 parentSiteHash32=0xa6f718a4 hopCount=1 ufn=-1 priority=-1 ISNEWOUTLINK PARENTHASADDRESS shardnum=88 url=http://static.v5.skyrock.net/css/common.css?en94Zsw requestage=4753434s hadReply=0 errcount=1 errcode=32825(Doc is a dup)

offset=18886075 k=0x2b1c5e408fea089ae039fbc9321c6401 REQ uh48=158235329486905 recsize=132 parentDocId=265828240946 firstip=64.94.28.43 hostHash32=0x8a8058ba domHash32=0xd9e1fd38 siteHash32=0x8a8058ba siteNumInlinks=0 addedTime=Apr 27 10:50:49 2015 UTC(1430131849) parentFirstIp=64.94.28.43 parentHostHash32=0x8a8058ba parentDomHash32=0xd9e1fd38 parentSiteHash32=0x8a8058ba hopCount=1 ufn=-1 priority=-1 SAMEDOM SAMEHOST SAMESITE WASPARENTINDEXED HASCONTACTINFO shardnum=88 url=http://sircloud.sirweb.org/css/styles.css requestage=5021s hadReply=0 errcount=0 errcode=0

As you can see, GB indexes CSS files, whether or not they have URL parameters.

I don't think it should?
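
If stylesheets should be skipped, one option is a URL check that ignores the query string, so that both URLs in the dump above are caught. This is only a sketch under that assumption; `looksLikeStylesheet` is a hypothetical name, and checking the Content-Type of the response would be more reliable than looking at the extension.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// True if the URL, with query string and fragment removed, ends in ".css".
bool looksLikeStylesheet(const std::string& url) {
    // Cut at the first '?' or '#'; substr keeps the whole URL if neither exists.
    std::string path = url.substr(0, url.find_first_of("?#"));
    std::transform(path.begin(), path.end(), path.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    const std::string ext = ".css";
    return path.size() >= ext.size() &&
           path.compare(path.size() - ext.size(), ext.size(), ext) == 0;
}
```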

Cannot allocate memory

I am getting the following errors when I try to start the engine. The server is a Google Compute VM with 15 GB of RAM, so memory shouldn't be an issue.

1419012010533 000 Gigablast Version: Sep 25 2014 22:41:10
1419012010533 000 Working directory is /var/gigablast/data0/
1419012010533 000 Using /var/gigablast/data0/hosts.conf
1419012010533 000 Process ID is 18117
1419012010534 000 Detected local ip 127.0.0.1
1419012010534 000 Detected local ip xxxxx
1419012010534 000 Running as host id #0
1419012010766 000 Loading /var/gigablast/data0/wiktionary-syns.dat
1419012010766 000 system malloc(92274696,wkt-synt) availShouldBe=3998707552: Cannot allocate memory (wkt-synt) (ooms suppressed since last log msg = 0)

Read/Edit data file

Again, not an issue.

How can we read and edit the data files? I am trying to find a way to manipulate the data in order to get rid of unwanted URLs.

Thanks.

Master Crashing when using Spider

The new Master, with the merges from ia and diffbot-testing, crashes when the spider is used to load pages into the index at relatively high speed. The configuration was a 4-node non-mirrored cluster on a CentOS 7 machine with 24 cores and 64 GB of memory, with the engine directory on a Samsung 850 Pro 2 TB SSD. This does not happen with diffbot-testing on the same machine with the same configuration. Diffbot-testing is solid.

Another language

When I use the engine on www.gigablast.com, I cannot get results for a query in another language (Ukrainian); it returns an HTTP 500 error. Is it possible for me to set up the search engine on my own servers to search in another language (East European, Ukrainian)?

No logging when log size reaches 2 gigabytes

Minor issue: When a log file reaches 2 gigabytes, nothing further is written to it. The system keeps running, but nothing is logged.

Suggestion: Check the log size periodically and rotate the log when needed.
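
The suggested periodic check could look roughly like the sketch below, which uses POSIX `stat` and the standard `std::rename`. The function names and the `.1` suffix are assumptions for illustration; the 2 GB figure matches the limit of a signed 32-bit file offset, which is only a guess at the cause and is not confirmed by the report.

```cpp
#include <cstdio>
#include <string>
#include <sys/stat.h>

// Pure decision: rotate once the file has reached the limit.
bool needsRotation(long long currentBytes, long long maxBytes) {
    return currentBytes >= maxBytes;
}

// Rename `path` to `path + ".1"` if it has reached maxBytes.
// Returns true only if a rotation actually happened.
bool rotateIfTooLarge(const std::string& path, long long maxBytes) {
    struct stat st;
    if (stat(path.c_str(), &st) != 0) return false;  // missing file: nothing to do
    if (!needsRotation(static_cast<long long>(st.st_size), maxBytes)) return false;
    const std::string rotated = path + ".1";
    return std::rename(path.c_str(), rotated.c_str()) == 0;
}
```

After a successful rotation the process would have to reopen its log file, since an already open descriptor keeps pointing at the renamed file. Choosing a limit well below 2 GB leaves headroom between checks.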

Error no gb.conf file

I followed all the compile steps from http://www.gigablast.com/admin.html#quickstart

After running ./gb 0, I got this error:

1383184149750 000 WARN disk error open(/home/webdev/gb/gb.conf) : No such file or directory
1383184149750 000 WARN disk open: No such file or directory
1383184149750 000 WARN conf Could not open /home/webdev/gb/gb.conf: No such file or directory.

But the file does exist there.

Compile in Cygwin

I would like to know whether it can be compiled in Cygwin. I noticed that you left some comments about it in the Makefile, but I am still not sure how to do it. Could you please give me some advice? Thank you.

Proxy settings

Hi,

I have some questions concerning setup:

  1. How do I tell the spider, in the site list, to crawl only national websites such as *.de or *.co.uk?
  2. What is the best practice for importing approximately 2 million deep links?
  3. Proxy issue: I have set up 4 nodes and am trying to use a proxy to collect the results, but it doesn't work as described in hosts.conf.
    Could somebody send me a working example?

My errors after starting ./gb proxy load 0:

admin: need working-dir for host in hosts.conf line 121
db: hostdb init failed.
Failed to start gb. Exiting.

If I use:

proxy 6010 7010 8080 9010 46.4.442.222 /home-bildero/gigablast/data0/

then I get the following error:

admin: secondary host /home-bildero/gigablast/data0/ in hosts.conf not in /etc/hosts. Using secondary ethernet (eth1) ip of 46.4.442.222
gb: failed to ps auxww ntpd 2
db: ntpd not running on proxy

Thank you for your help

Best

Roman

If spider supports javascript

Hi Matt,
I am just wondering whether the embedded spider supports JavaScript. Many people choose Python to build spiders; do you think mixing C++ with Python is a good choice? Thank you.

Dependence on GCC Extension that is not Supported by Clang

An excerpt from console:

XmlDoc.cpp:46419:17: error: variable length array of non-POD element type 'SafeBuf'
        SafeBuf hostBin[g_hostdb.m_numHosts];
                       ^
XmlDoc.cpp:50874:12: warning: 'this' pointer cannot be null in well-defined C++ code; pointer may be assumed to
      always convert to true [-Wundefined-bool-conversion]
                                              this, // xd holder (Msg25::m_xd)
                                              ^~~~
7 warnings and 1 error generated.
Makefile:603: recipe for target 'XmlDoc.o' failed
make: *** [XmlDoc.o] Error 

A quote from the LLVM website:

Variable-length arrays

GCC and C99 allow an array's size to be determined at run time. This extension is not permitted in standard C++. However, Clang supports such variable length arrays in very limited circumstances for compatibility with GNU C and C99 programs:

  • The element type of a variable length array must be a POD ("plain old data") type, which means that it cannot have any user-declared constructors or destructors, any base classes, or any members of non-POD type. All C types are POD types.
  • Variable length arrays cannot be used as the type of a non-type template parameter.

If your code uses variable length arrays in a manner that Clang doesn't support, there are several ways to fix your code:

  • replace the variable length array with a fixed-size array if you can determine a reasonable upper bound at compile time; sometimes this is as simple as changing int size = ...; to const int size = ...; (if the initializer is a compile-time constant);
  • use std::vector or some other suitable container type; or
  • allocate the array on the heap instead using new Type[] - just remember to delete[] it.
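
Of the fixes listed above, `std::vector` is probably the least invasive for the `hostBin` case. The sketch below shows the pattern with a hypothetical `Buf` class standing in for `SafeBuf` (any class with a user-declared constructor or destructor is non-POD); `makeHostBins` is likewise an illustrative name and not code from XmlDoc.cpp.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a non-POD class such as SafeBuf.
class Buf {
public:
    Buf() : data_() {}
    ~Buf() {}
    void append(const std::string& s) { data_ += s; }
    std::size_t length() const { return data_.size(); }
private:
    std::string data_;
};

// Instead of `Buf bins[numHosts];` (a variable length array of a non-POD
// type, which Clang rejects), build a vector whose size is known at run time.
// Each element is default-constructed, as the array elements would have been.
std::vector<Buf> makeHostBins(std::size_t numHosts) {
    return std::vector<Buf>(numHosts);
}
```

Existing code that indexes the array (`hostBin[i]`) keeps working unchanged with a vector. The storage moves from the stack to the heap and is released automatically when the vector goes out of scope.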
