Giter Site home page Giter Site logo

importsql's Introduction

importSQL

A configurable and re-usable python script to import data from an import.io extractor into an SQL database

Dependencies

You will need to install the following:

  1. python
  2. python-mysqldb (used as an sql client only)
  3. A running SQL server (SQLlite, MySQL etc...)

Running it

To run it you have a few options, but firstly you need to have your table and database ready

Run the following commands to set up the demo table:

mysql>

CREATE DATABASE ILOVEOWLS 
    DEFAULT CHARACTER SET utf8 
    DEFAULT COLLATE utf8_general_ci;
    
USE ILOVEOWLS;

CREATE TABLE OMFGowls
(
field_image varchar(255),
field_name varchar(255),
field_price varchar(255),
field_size varchar(255),
field_colour varchar(255)
) DEFAULT CHARACTER SET utf8
  DEFAULT COLLATE utf8_general_ci;

There is now support for a date field created at Script execution. The user needs to add a date field to the Table.

The above example would look like this:

mysql>

CREATE DATABASE ILOVEOWLS;
USE ILOVEOWLS;

CREATE TABLE OMFGowls
(
field_image varchar(255),
field_name varchar(255),
field_price varchar(255),
field_size varchar(255),
field_colour varchar(255),
date DATE
) DEFAULT CHARACTER SET utf8
  DEFAULT COLLATE utf8_general_ci;

To add the date field to an already existing Table use the following command


ALTER TABLE OMFGowls ADD COLUMN date DATE;

Using a config Script

  • Using this you can set up a configuration that you can set up once and re-use to get the latest data into yours database table

  • Make sure you know your import.io user credentials

  • Create a config.json file that follows this pattern:

{
	// Config for import.io
	"sourceUUID": "94cdc938-c24e-42db-b94f-3fb852c450a9",
	"inputUrls": ["http://owlkingdom.com","http://owlkingdom.com"],
	"ioAPIKey": "[your API key]",

	// Config for mysql
	"table": "OMFGowls",
	"database": "ILOVEOWLS",
	"host": "localhost",
	"port": 3306,
	"username": "root",
	"password": "root",
 
        "addDate": true,
	"mapping": {
		"image": "field_image",
		"name": "field_name",
		"price": "field_price",
		"size": "field_size",
		"colour": "field_colour"
	}
}
  • Put it in the same directory as your importSQL script.
  • RUN IT! importSQL [optional:-U [sql username] -P [sql password] -p [io API key]]

This json file above will grab the owls from Owl Kingdom and put them into your SQL table

mapping

This mapping field defines the mapping between the column names in import.io and the column names in your MySQL database

"mapping": {
   	// import.io	// MySQL
	"image": 	"field_image",
	"name": 	"field_name",
	"price": 	"field_price",
	"size": 	"field_size",
	"colour": 	"field_colour"
}

Getting Data from Crawlers

To get all the data from your crawl from import.io into your SQL database, you can use the -c option to turn on crawler mode. This will get the data from the last crawl snapshot and not directly query the crawler using an input url.

(if you have settings in a config file, they will be loaded, but overwritten by anything you supply on the commandline)

importSQL -c -s "your crawler guid" [optional:-U [sql username] -P [sql password] -p [io API key]]

Using commandline options

This version of importSQL supports multiple inputUrls. Support is already added to config.json but not to the cli options. also addDate is currently missing

  • When using just commandline options, be aware the the script will assume that the column names from import.io match the columns names in mysql

Here are the list of commandline options you can use:

  • -U mysql username (default: root)

  • -P mysql password (default: root)

  • -H mysql host name (default: localhost)

  • -E mysql port number (default: 3306)

  • -t mysql table name

  • -d mysql database name

  • -p import.io APIKey

  • -s source UUID

  • -i input url for data source

  • -c flag to tell if the source you want data from is an uploaded crawl

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.