
exploreSA-Gawler

Background: This GitHub project is part of the exploreSA competition on unearthed.solution. My original post on the exploreSA forum explains what our team has done for the data cleaning part. For more information, please visit the Unearthed website, which has a detailed description page.


Description

Drilling, surface sample and observations data: A dataset with features and labels

The SARIG Data Package is an extract from the Geological Survey of South Australia’s (GSSA) geoscientific database SA Geodata. The data package includes a series of CSV files covering 4 parts:

  • Drill holes data, including the metadata of drill holes, and the geological, petrophysical, lithological data

  • Field observation data, including general observation, lithological and geological structure data

  • Mineral deposits data, including location, lithology, mineralogy data

  • Rock sample data, including biostratigraphy, geochronology and petrography analysis data

This dataset plays an important role in the whole data pool used for predicting the cores in the Gawler area, because it includes the commodity information and the elements from the geochemistry analysis, which suggest the occurrence of minerals. In other words, the labels for the training dataset are defined. The attributes of these datasets can be cleaned and used as features in the subsequent machine learning modelling.
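As a rough illustration of how element occurrence could become a label, here is a minimal sketch using pandas. It is not the notebooks' actual code: the `CHEM_CODE` and `LITHOLOGY` column names, the `label_samples` helper and the toy rows are all assumptions made for the example. Only `SITE_NO` and `SAMPLE_NO` are key columns named in this README.

```python
import pandas as pd


def label_samples(chem: pd.DataFrame, samples: pd.DataFrame, element: str) -> pd.DataFrame:
    """Attach a 0/1 label saying whether `element` was reported for each sample.

    Hypothetical helper: assumes the chemistry table has a CHEM_CODE column.
    """
    keys = ["SITE_NO", "SAMPLE_NO"]
    # Site/sample pairs for which the chosen element appears in the chemistry table.
    hits = (
        chem.loc[chem["CHEM_CODE"] == element, keys]
        .drop_duplicates()
        .assign(label=1)
    )
    # A left join keeps every sample; samples without a hit get label 0.
    labelled = samples.merge(hits, on=keys, how="left")
    labelled["label"] = labelled["label"].fillna(0).astype(int)
    return labelled


# Toy data, invented for illustration only.
chem = pd.DataFrame({
    "SITE_NO": [1, 1, 2],
    "SAMPLE_NO": [10, 10, 20],
    "CHEM_CODE": ["Fe", "Cu", "Cu"],
})
samples = pd.DataFrame({
    "SITE_NO": [1, 2, 3],
    "SAMPLE_NO": [10, 20, 30],
    "LITHOLOGY": ["granite", "basalt", "shale"],
})

labelled = label_samples(chem, samples, "Fe")
print(labelled)
```

The remaining columns of `samples` (here only the made-up `LITHOLOGY`) stand in for the attributes that would be cleaned into features.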

Installation and Packages

Step 1. Install Python 3.6 and use pip install packagenames to install the following packages:

  1. jupyter, to set up the Jupyter Notebook environment

  2. pandas

  3. zipfile (part of the Python standard library, so no separate installation is needed)

    or optionally, install Anaconda and use conda install packagenames

Step 2. Run the ipynb files in filename order:

  a) Execute "1. RS_Data_Processing.ipynb" from the beginning until the dropdown box. The code will download the data files from https://unearthed-exploresa.s3-ap-southeast-2.amazonaws.com/Unearthed_5_SARIG_Data_Package.zip if the zip file does not exist in the directory './data'.

  b) Select the element you are interested in from the dropdown box, then continue with 'Run All Below'. The code will generate two csv files, 'rs_chem_site_sample_num.csv' and 'extracted_rs_data.csv', and save them to a subdirectory of './data' named after the element you selected, e.g. 'Fe'. The former includes the 'SITE_NO' and 'SAMPLE_NO' corresponding to the element you selected above, while the latter is the rs data related to this element.

  c) Execute "2. Drillhole_Data_Processing.ipynb", "3. Field_Observation_Data_Processing.ipynb" and "4. MD_Data_Processing.ipynb" to generate the corresponding drillhole data, field observation data and mineral deposit data and save them to the directory corresponding to the element.

  d) Further merge the data files to get the working datasets.
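The download-if-missing behaviour in step a) and the per-element output directory in step b) can be sketched with the standard library alone. This is a minimal sketch, not the notebook's actual code: the function names `ensure_data_package` and `element_dir` and the `fetch` parameter are assumptions introduced here for illustration.

```python
import zipfile
from pathlib import Path
from urllib.request import urlretrieve

DATA_URL = (
    "https://unearthed-exploresa.s3-ap-southeast-2.amazonaws.com/"
    "Unearthed_5_SARIG_Data_Package.zip"
)


def ensure_data_package(data_dir="./data", url=DATA_URL, fetch=urlretrieve):
    """Download the zip into data_dir only if it is missing, then extract it.

    Hypothetical helper; `fetch` can be swapped out, e.g. for offline testing.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last path component of the URL.
    zip_path = data_dir / url.rsplit("/", 1)[-1]
    if not zip_path.exists():
        fetch(url, str(zip_path))
    # Unpack the CSV files next to the archive.
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(data_dir)
    return zip_path


def element_dir(element, data_dir="./data"):
    """Return (and create) the output subdirectory for an element, e.g. ./data/Fe."""
    out = Path(data_dir) / element
    out.mkdir(parents=True, exist_ok=True)
    return out
```

Calling `ensure_data_package()` with its defaults performs the real download, so it is left out of the block above. `element_dir("Fe")` gives the directory where files such as 'extracted_rs_data.csv' would be written.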

The idea of this repo is to show the data cleaning process we have carried out so far. There is still a lot of work to do. We will keep updating the data preparation and our data engineering results.

To be continued
