Mastino

A Go library for scraping web pages. It can be used from any language (PHP, Java, C#, ...) because Mastino uses JSON as its input/output interchange format.

Usage

Run these commands:

git clone https://github.com/angeloLed/mastino.git mastino
cd mastino
go get -u

Then run Mastino, passing the JSON manifest as a command-line argument:

# compiled
./mastino "{\"url\":\"https://blog.golang.org/error-handling-and-go\", \"tags\": [{\"type\": \"div\",\"attributes\": [{\"key\": \"class\",\"value\": \"code\"}]}]}"

# or not compiled
go run main.go "{\"url\":\"https://blog.golang.org/error-handling-and-go\", \"tags\": [{\"type\": \"div\",\"attributes\": [{\"key\": \"class\",\"value\": \"code\"}]}]}"

Manifest configuration

Example of a manifest configuration:

{
	"url": "https://google.com",
	"tags": [{
		"type": "a",
		"attributes": [{
			"key": "class",
			"value": "bg1"
		}, {
			"type": "span"
		}]
	}, {
		"type": "div",
		"attributes": [{
			"key": "css"
		}]
	}]
}

| key | type | description |
|---|---|---|
| url | string | Target URL |
| tags | array | Tags that Mastino must try to bite. Each entry holds the matching rules for one tag and contains type and attributes |
| type | string | DOM element |
| attributes | array | Attributes of the tag to verify. Each entry contains key and value |
| key | string | The key of the attribute |
| value | string | The value of the attribute |
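
The manifest keys described above can also be built from Go structs and serialized with encoding/json, which avoids escaping quotes by hand. This is a minimal sketch: the type names Manifest, Tag and Attribute and the helper buildManifest are assumptions made for illustration, not Mastino's own types. Only the JSON key names come from this README.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Attribute, Tag and Manifest mirror the manifest keys.
// The Go type names are hypothetical; the JSON keys are from the README.
type Attribute struct {
	Key   string `json:"key"`
	Value string `json:"value,omitempty"` // omitted when only the key should be checked
}

type Tag struct {
	Type       string      `json:"type"`
	Attributes []Attribute `json:"attributes,omitempty"`
}

type Manifest struct {
	URL  string `json:"url"`
	Tags []Tag  `json:"tags"`
}

// buildManifest serializes a manifest into the JSON string passed to Mastino.
func buildManifest(m Manifest) (string, error) {
	b, err := json.Marshal(m)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	m := Manifest{
		URL: "https://blog.golang.org/error-handling-and-go",
		Tags: []Tag{{
			Type:       "div",
			Attributes: []Attribute{{Key: "class", Value: "code"}},
		}},
	}
	s, err := buildManifest(m)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(s)
}
```

The resulting string can be passed as the single argument to the mastino binary.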

Possible matching results

  • the tag is specified without attributes
  • an attribute key is specified without a value
  • an attribute key is specified and its value matches the value in the DOM
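
The three cases above can be sketched as a small Go predicate. This is not Mastino's implementation. It assumes the tag type has already been matched, that every attribute rule must hold, and that a value is compared by exact string equality. The names Rule and matches are hypothetical.

```go
package main

import "fmt"

// Rule is one attribute rule from the manifest: Key is required,
// Value is optional (empty means "only check that the key exists").
type Rule struct {
	Key   string
	Value string
}

// matches reports whether an element's attributes satisfy every rule.
// Assumption: exact string comparison; the real library may differ.
func matches(attrs map[string]string, rules []Rule) bool {
	for _, r := range rules {
		got, ok := attrs[r.Key]
		if !ok {
			return false // the element does not have this attribute
		}
		if r.Value != "" && got != r.Value {
			return false // key present, but the value differs
		}
	}
	// With no rules the tag was specified without attributes, so it matches.
	return true
}

func main() {
	div := map[string]string{"class": "code"}
	fmt.Println(matches(div, nil))                                   // tag only
	fmt.Println(matches(div, []Rule{{Key: "class"}}))                // key only
	fmt.Println(matches(div, []Rule{{Key: "class", Value: "code"}})) // key and value
	fmt.Println(matches(div, []Rule{{Key: "class", Value: "bg1"}}))  // value differs
}
```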

JSON Output

Mastino returns JSON with the same structure as the provided input JSON. Each match found is returned as a matches entry inside the corresponding tags element. matches is an array of strings (the matched DOM elements).

{
	"url": "https://blog.golang.org/error-handling-and-go",
	"tags": [{
		"type": "div",
		"attributes": [{
			"key": "class",
			"value": "code"
		}],
		"matches": ["<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">"]
	}],
	"error": false,
	"message": ""
}
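
A caller can decode this output with encoding/json. The sketch below is illustrative only: the Result type and the parseResult helper are hypothetical names, and treating error: true as a failure described by message is an assumption based on the field names shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Result mirrors the output keys shown above (hypothetical type name).
type Result struct {
	URL  string `json:"url"`
	Tags []struct {
		Type    string   `json:"type"`
		Matches []string `json:"matches"`
	} `json:"tags"`
	Error   bool   `json:"error"`
	Message string `json:"message"`
}

// parseResult decodes Mastino's output.
// Assumption: error == true signals a failure described by message.
func parseResult(raw []byte) (Result, error) {
	var r Result
	if err := json.Unmarshal(raw, &r); err != nil {
		return Result{}, err
	}
	if r.Error {
		return r, fmt.Errorf("mastino: %s", r.Message)
	}
	return r, nil
}

func main() {
	// Shortened sample in the same shape as the output above.
	raw := []byte(`{"url":"https://example.com","tags":[{"type":"div","matches":["<div class=\"code\">","<div class=\"code\">"]}],"error":false,"message":""}`)
	r, err := parseResult(raw)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, t := range r.Tags {
		fmt.Printf("%s: %d matches\n", t.Type, len(t.Matches))
	}
}
```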

Using Mastino with other languages

PHP:

$jsonManifest = "{\"url\":\"https://blog.golang.org/error-handling-and-go\", \"tags\": [{\"type\": \"div\",\"attributes\": [{\"key\": \"class\",\"value\": \"code\"}]}]}";
// escapeshellarg() quotes the manifest so the shell passes it as one argument.
$output = shell_exec("/path/to/mastino " . escapeshellarg($jsonManifest));
var_dump(json_decode($output, true));

C#:

using System;
using System.Diagnostics;

class Runshell
{
  static void Main()
  {
    ProcessStartInfo psi = new ProcessStartInfo();
    psi.FileName = "/path/to/mastino";
    psi.UseShellExecute = false;
    psi.RedirectStandardOutput = true;

    // ArgumentList (.NET Core 2.1+) escapes the JSON so it arrives as a single argument.
    psi.ArgumentList.Add("{\"url\":\"https://blog.golang.org/error-handling-and-go\", \"tags\": [{\"type\": \"div\",\"attributes\": [{\"key\": \"class\",\"value\": \"code\"}]}]}");
    Process p = Process.Start(psi);
    string strOutput = p.StandardOutput.ReadToEnd();
    p.WaitForExit();
    Console.WriteLine(strOutput);
  }
}

Node.js:

// execFile passes the manifest as a single argument without going through a shell.
const { execFile } = require('child_process');
const manifest = "{\"url\":\"https://blog.golang.org/error-handling-and-go\", \"tags\": [{\"type\": \"div\",\"attributes\": [{\"key\": \"class\",\"value\": \"code\"}]}]}";
execFile("/path/to/mastino", [manifest], function (error, stdout, stderr) {
  if (error) { console.error(error.message); return; }
  console.log(stdout);
});

Golang:

package main

import "os/exec"

func main() {
    app := "/path/to/mastino"

    manifest := "{\"url\":\"https://blog.golang.org/error-handling-and-go\", \"tags\": [{\"type\": \"div\",\"attributes\": [{\"key\": \"class\",\"value\": \"code\"}]}]}"

    // exec.Command takes the program path followed by its string arguments.
    cmd := exec.Command(app, manifest)
    stdout, err := cmd.Output()

    if err != nil {
        println(err.Error())
        return
    }

    print(string(stdout))
}

License

MIT
