benibela / internettools

XPath/XQuery 3.1 interpreter for Pascal with compatibility modes for XPath 2.0/XQuery 1.0/3.0, custom and JSONiq extensions, pattern matching, XML/HTML/JSON parsers and classes for HTTP/S requests

Home Page: http://www.benibela.de/sources_en.html#internettools

Pascal 57.79% XSLT 0.03% Shell 0.01% VBScript 0.01% POV-Ray SDL 5.55% NASL 36.62%
xpath xquery pascal html xml json web interpreter library parser

internettools's People

Contributors

benibela

internettools's Issues

How to replace Node innerHTML and save it to new string?

Is it possible to replace a node's innerHTML, like this:

for node in process(everydayHtmlString, '//table[@id="zhuye" or @id="fuye"]') do
begin
  if node.toNode.getAttribute('id') = 'zhuye' then
  begin
    node.toNode.innerHTML := aStringContainHTML;
  end;
end;

And what is the proper way to update the everydayHtmlString variable so that it contains the modified HTML?

Will not install using Package Manager in Lazarus 3.99

I was unable to add this as a package. While installing the package, I got this error:
xquery__regex.pas(696,70) Error: Incompatible type for arg no. 2: Got "<procedure variable type of function(const PChar;const TFLRECaptures):AnsiString(0) of object;Register>", expected "<procedure variable type of function(const PChar;const Int64;const TFLRECaptures):AnsiString(0) of object;Register>"

I am using: Lazarus 3.99 (rev main_3_99-23-gf4e5dd4903) FPC 3.3.1 x86_64-linux-gtk2

declaration of TFLRESizeInt

Hello,

according to Google, you're the only one to use the type TFLRESizeInt, but I cannot see where it is declared. It's being used in xquery__regex.pas -- does this compile for you?

Invalid symbol in hex entity: xxx

Parsing fails for this page:

**** Retrieving (GET): http://www.nirsoft.net/utils/htmlastext.html ****
**** Processing: http://www.nirsoft.net/utils/htmlastext.html ****
An unhandled exception occurred at $004222B4:
Exception: Invalid symbol in hex entity: xxx;)

  $004222B4
  $00423297
  $004A2E21
  $004A2BAB
  $004A2A50
  $004A99DE
  $00513404
  $00435575
  $004321E8
  $00431408
  $00430ECE
  $0043B2A2

Remove or enhance EInternetException

Hello

I was working on a project using your library and found out that there is no way to retrieve the response body of responses with status code 400 and above.

I only get an EInternetException with the error code and the URL, and that's it.

The body of the response is important in any situation, because it sometimes contains detailed error messages.

At this line:
https://github.com/benibela/internettools/blob/master/internet/internetaccess.pas#L1097

      else begin
        message := IntToStr(transfer.HTTPResultCode) + ' ' + transfer.HTTPErrorDetails;
        if transfer.HTTPResultCode <= 0 then message := 'Internet Error: ' + message
        else message := 'Internet/HTTP Error: ' + message;
        raise EInternetException.Create(message + LineEnding + 'when talking to: '+url.combined, transfer.HTTPResultCode);
      end;  

So I think the right thing to do there is either:
1- Add the data of the body to the EInternetException
2- Remove the exception and let the user decide based on lastHTTPResultCode (this is what I do in my project)

For the library as a whole, I think the better option is to add a body field to EInternetException and assign the response body to it.
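
Option 1 could look roughly like the following standalone Free Pascal sketch. It does not touch the library: EHttpErrorWithBody, its ResponseBody field, and SimulateFailedRequest are hypothetical names made up for illustration, and the message, status code, and JSON body are placeholder values.

```pascal
program ExceptionBodySketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Hypothetical stand-in for EInternetException, NOT the library's class.
  EHttpErrorWithBody = class(Exception)
  public
    ErrorCode: Integer;
    ResponseBody: string; // assumed extra field holding the server's reply
    constructor Create(const AMessage: string; ACode: Integer; const ABody: string);
  end;

constructor EHttpErrorWithBody.Create(const AMessage: string; ACode: Integer; const ABody: string);
begin
  inherited Create(AMessage);
  ErrorCode := ACode;
  ResponseBody := ABody;
end;

// Simulates a failed transfer. A real implementation would pass along the
// data that was actually received from the server.
procedure SimulateFailedRequest;
begin
  raise EHttpErrorWithBody.Create('Internet/HTTP Error: 400 Bad Request', 400,
    '{"error": "missing parameter"}');
end;

begin
  try
    SimulateFailedRequest;
  except
    on E: EHttpErrorWithBody do
    begin
      // The caller can now inspect both the status code and the body.
      WriteLn('Code: ', E.ErrorCode);
      WriteLn('Body: ', E.ResponseBody);
    end;
  end;
end.
```

With this shape, existing handlers that only read the message and error code keep working, while callers that need the detailed error text can read the extra field.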

How to get query result to html instead of plain text?

I use the following code to extract the td elements from the HTML content:

for td in process(FileContent, '//td') do
  WriteLn(td.toString);

It returns the plain text inside the td. I want it to return the HTML code of the td instead of the inner text.

Like that:

td.toString should be:

<td><p>hello <b>this</b> is td1</p></td>

instead of:

hello this is td1
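
One possible approach, sketched under assumptions: the value returned by process can be converted to a tree node with toNode (as another issue on this page does), and the sketch assumes that this node offers an outerXML method that serializes the element including its tags. The program name and the Html constant are made up for illustration, and the method name should be checked against the simplehtmltreeparser unit before relying on it.

```pascal
program TdMarkupSketch;
{$mode objfpc}{$H+}

uses
  simpleinternet, xquery;

const
  // Illustrative input; a real program would use FileContent instead.
  Html = '<table><tr><td><p>hello <b>this</b> is td1</p></td></tr></table>';

var
  td: IXQValue;
begin
  for td in process(Html, '//td') do
    // Assumption: outerXML serializes the node with its markup,
    // whereas toString only returns the text content.
    WriteLn(td.toNode.outerXML());
end.
```

The exact serialization (whitespace, any elements inserted by the HTML parser) may differ from the input string.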

Compiling on FreeBSD

Hello,
Compiling as part of xidel. I'm getting this on FreeBSD 11.3:

-- Building...
Free Pascal Compiler version 3.2.0 [2020/06/14] for x86_64
Copyright (c) 1993-2020 by Florian Klaempfl and others
Target OS: FreeBSD for x86-64
Compiling xidel.pas
Compiling ./internettools/internet/multipagetemplate.pas
Compiling ./internettools/data/extendedhtmlparser.pas
Compiling ./internettools/data/simplehtmltreeparser.pas
Compiling ./internettools/data/simplehtmlparser.pas
Compiling ./internettools/data/htmlinformation.pas
Compiling ./internettools/data/simplehtmltreeparser.pas
Compiling ./internettools/data/xquery.pas
Compiling ./internettools/data/xquery__functions.pas
Compiling ./internettools/data/xquery.pas
Compiling ./internettools/data/xquery__parse.pas
Compiling ./internettools/data/bbutilsbeta.pas
bbutilsbeta.pas(140,5) Error: Record type expected
bbutilsbeta.pas(151,1) Fatal: There were 1 errors compiling module, stopping
Fatal: Compilation aborted

I know nothing about Pascal, otherwise I'd investigate further.
Cheers,
Roger

EXQParsingException after updating to master

After updating from: last changeset: 4097:2e5994f9cf6d date: Tue Jul 07 19:51:37 2015 +0200
to the up-to-date version of internettools, I encountered this error:

Exception class "EXQParsingException" with message "err:XPST0008: Unknown variable: $json"

This code worked well in the previous version:

ztazeno, scraperVstup, parsujNazev: String;
scraperVstup := 'https://api.themoviedb.org/3/search/movie?api_key=' +
                unConstants.theMovidedbAPI + '&query=' +
                pomNazev + '&language=' + aktualniJazyk;
parsujNazev := '$json("results")() ! [.("title"), .("release_date")]';
ztazeno := retrieve(scraperVstup);
for v in process(ztazeno, parsujNazev) do          // exception thrown
  ...

ztazeno:

{
    "page": 1,
    "results": [
        {
            "poster_path": "/jVaVTnwNrv6Iiapg9B9Qy7J1Yf2.jpg",
            "adult": false,
            "overview": "Po dlouhá léta se Caesarova vojska snaží dobýt Galskou vesnici, ale marně. Caesarovi vojáci začínají mít obavy, že bojují s Bohy a ty nejde porazit. Aby Caesar dokázal svou moc, rozhodne se s Galy uzavřít dohodu. Pokud zvládnou vykonat dvanáct úkolů, které jim uloží, stanou se svrchovanými vládci Říma. Pokud se jim jediný nepodaří, podrobí se celá vesnice Caesarovi. Naši kamarádi Asterix a Obelix se tak vydávají rozhodnout o osudu svých přátel.",
            "release_date": "1976-09-08",
            "genre_ids": [
                10751,
                16,
                35,
                12
            ],
            "id": 9385,
            "original_title": "Les 12 travaux d'Astérix",
            "original_language": "fr",
            "title": "12 úkolů pro Asterixe",
            "backdrop_path": "/cQ5PU1ZJpOoTBa4nyQxR9lyIUzv.jpg",
            "popularity": 2.063972,
            "vote_count": 88,
            "video": false,
            "vote_average": 7.05
        }
    ],
    "total_results": 1,
    "total_pages": 1
}

Thank you very much for any clarification.

Possible issue with enumerator ?

Hi,
thanks for this great library. 👏
I ran into a small issue: the enumerator finishes after processing the first item:

    i := 0;
    for v in process(str, 'for $pr in jn:parse-json(.)("results")()' +
                          'return [$pr("title"), $pr("release_date") ]') do
    begin
      pomString := (v as TXQValueJSONArray).seq.get(0).toString;
      pomString3 := (v as TXQValueJSONArray).seq.get(1).toString;
      Memo1.Append(IntToStr(i) + ' ' + pomString + ' :-) ' + pomString3);
      i := i + 1;
    end;

After splitting the code, the enumerator passes through all items:

    p, p3: IXQValue;
    i := 0;
    for v in process(str, 'for $pr in jn:parse-json(.)("results")()' +
                          'return [$pr("title"), $pr("release_date") ]') do
    begin
      p := (v as TXQValueJSONArray).seq.get(0);
      p3 := (v as TXQValueJSONArray).seq.get(1);
      pomString3 := p3.toString;
      pomString := p.toString;
      Memo1.Append(IntToStr(i) + ' ' + pomString + ' :-) ' + pomString3);
      i := i + 1;
    end;

Configuring a proxy with username and password?

Hi,

I'm trying to scrape webpages, which works quite well with the internettools package. However, I haven't found out how to use the application behind a proxy that requires a username and password.

Currently I just create a TInternetConfig record like so:
internetConfig.useProxy := True;
internetConfig.proxyHTTPName := '192.168.56.10';
internetConfig.proxyHTTPPort := '80';

And next I tell the internetaccess package to use the above settings, like so:
internetaccess.defaultInternetConfiguration := internetConfig;

TInternetConfig, however, doesn't provide options to specify a username or a password.

Is that possible at all or not, and if so can you tell me how to accomplish this?

Kind regards,
Rob

Unable to compile on linux

Compilation fails with:

xquery.namespaces.pas(139,13) Error: Incompatible type for arg no. 2: Got "Class Of INamespace", expected "TClass"

Linux Mint 19.1 x86_64
fpc 3.0.4
Lazarus 2.0.0

Thanks

Compiling internettools in Lazarus

Hi,

I'm trying to compile internettools in Lazarus version 1.7 FPC 3.1.1 SVN: 52808 but got the following error:

xquery__regex.pas(47,23) Fatal: Cannot find FLRE used by xquery__regex of package internettools.

I already downloaded FLRE from https://github.com/benibela/flre/ but I don't know how to install it or use it in Lazarus on Windows.

I'm new to Lazarus, so please help me. Thanks.

Infinite memory consumption

I use the latest Lazarus 1.8.2 with FPC 3.0.4 on Windows and created a simple test application with one button:

unit Unit1;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, FileUtil, Forms, Controls, Graphics, Dialogs, StdCtrls;

type
  TForm1 = class(TForm)
    Button1: TButton;
    procedure Button1Click(Sender: TObject);
  end;

var
  Form1: TForm1;

implementation

{$R *.lfm}

uses
  simpleinternet;

procedure TForm1.Button1Click(Sender: TObject);
var
  S: string;
begin
  S := process('https://github.com/benibela/internettools', '//div[1]').toString;
end;

end.

After each button click, the app's memory consumption increases by about 1 MB; for example, 50 clicks lead to +45 MB.

MimeType

Hello,
In Synapse I define MimeType like this :

hsend: THTTPSend;
hsend.MimeType := 'application/x-www-form-urlencoded';

I'm trying to figure out how to do that here with httpRequest using synapseinternetaccess. I tried a few things but had no luck. Any tips?

Android build problem

I am trying to use internettools in a Lazarus project with LAMW for Android (ARMv6). When I try to build, the compiler fails with this error:
FLRE.pas(23075,0) Error: Error while assembling exitcode 1
I just want to use an HTML parser (preferably with templates) to extract data from an HTML document. Is the FLRE library always needed?

Memory leak?

Could you please eliminate this heaptrc warning?
It makes it harder for me to locate my own actual memory leaks.
project1.trc.txt

Thanks.

Cannot compile due to lack of files to bootstrap bbutils.pas and bbutilsh.inc

Hi there,

First I tried with the current HEAD version and found out there is no bbutilsh.inc, so I dug around and found the *_generate programs.

Even after downloading VideLibri 1.71, which contains a bbutils_template.pas, I could not bootstrap: __safemove__() is nowhere to be found.

Is there something I'm missing?

Many thanks in advance for any help!

Cheers,
Gus

Multi Header

Hello,
I think there is an error when multiple headers are used:

unit w32internetaccess;

function TW32InternetAccess.doTransferRec
....
HttpAddRequestHeadersA(hfile,
pchar(additionalHeaders), length(additionalHeaders[i]),
HTTP_ADDREQ_FLAG_REPLACE or HTTP_ADDREQ_FLAG_ADD)

I use this code instead:
HttpAddRequestHeadersA(hfile,
PChar(additionalHeaders[i]+#13#10#0), dword(-1),
HTTP_ADDREQ_FLAG_REPLACE OR HTTP_ADDREQ_FLAG_ADD)

[Suggestion] Why do you still use travis and not GitHub actions?

Hi there,

Full disclosure: this is me tooting my own horn, since I developed the GitHub Action setup-lazarus.

Are you doing anything so special that can only be done by Travis?

If not, take a look at these (maybe they have enough to get you unstuck from travis.com):

  • setup-lazarus - GitHub action to install Lazarus+FPC and some, still incomplete, support for OPM packages
  • lazarus-with-github-actions - The simplest of examples on how to use the setup-lazarus GitHub action.

Cheers,
Gus

Xpath 3.1 update

Hello,
I use this tool to learn XPath and Python web scraping, and I was wondering how much effort would be required to update it to XPath 3.1.

Memory leaks with fpc trunk?

When using FPC trunk, this simple code produces memory leaks:
query('doc("https://www.google.com")//a/@href').toString

Call trace for block $0649FEE0 size 32
  $00560DAF  TXQVALUENODE__CREATE,  line 1709 of ./data/xquery_types.inc
  $0056B438  TXQVLIST__ADD,  line 5742 of ./data/xquery.pas
  $00570FAA  TXQUERYENGINE__EXPANDSEQUENCE,  line 7557 of ./data/xquery.pas
  $005517A6  TXQTERMPATH__EVALUATE,  line 2930 of ./data/xquery_terms.inc
  $00543D3E  TXQUERY__EVALUATE,  line 4952 of ./data/xquery.pas
  $00543C72  TXQUERY__EVALUATE,  line 4941 of ./data/xquery.pas
  $0056DB71  TXQUERYENGINE__EVALUATE,  line 6675 of ./data/xquery.pas
  $0056DD19  TXQUERYENGINE__EVALUATEXQUERY3,  line 6725 of ./data/xquery.pas

project1.trc.txt

This does not happen with FPC 3.0.2. I'm not sure where to look. It may be an FPC bug, but I'm not sure what to report.

Failed to compile, cause Error: Incompatible type for arg no. 2

Hello, I got this error when compilation failed:

xquery__regex.pas(696,70) Error: Incompatible type for arg no. 2: Got "<procedure variable type of function(const PChar;const TFLRECaptures):AnsiString(0) of object;Register>", expected "<procedure variable type of function(const PChar;const LongInt;const TFLRECaptures):AnsiString(0) of object;Register>"
