klareda / klar-eda
A Python library for automated exploratory data analysis
Home Page: https://klareda.github.io/klar-EDA/
License: MIT License
pip3 install -i https://test.pypi.org/simple klar-eda
fails with an error because of missing dependencies.
The installation fails on sklearn, opencv-python, tensorflow, pandas, sphinx, matplotlib and seaborn.
This severely impacts ease of use and the user experience.
The implementation can consist of one or more methods. Once the method(s) are implemented, the following should be
achievable:
- Mean Normalisation of features
- Standardization of features
For standardization of features, the data is assumed to follow a Gaussian distribution.
- DataFrame
Data processed according to the chosen method, that is, after standardization or normalisation.
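The two transformations named above can be sketched with pandas as follows. This is a minimal illustration, not the library's implementation: the function names mean_normalise and standardise, the optional columns parameter, and the use of the population standard deviation (ddof=0) are all assumptions made for the example.

```python
import pandas as pd


def _numeric_columns(df, columns=None):
    """Return the requested columns, or every numeric column by default."""
    if columns is not None:
        return list(columns)
    return list(df.select_dtypes(include="number").columns)


def mean_normalise(df, columns=None):
    """Mean normalisation: (x - mean) / (max - min), per column."""
    out = df.copy()
    for col in _numeric_columns(out, columns):
        spread = out[col].max() - out[col].min()
        # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
        out[col] = (out[col] - out[col].mean()) / spread if spread else 0.0
    return out


def standardise(df, columns=None):
    """Standardization (z-score): (x - mean) / std, per column."""
    out = df.copy()
    for col in _numeric_columns(out, columns):
        std = out[col].std(ddof=0)  # population standard deviation (assumption)
        out[col] = (out[col] - out[col].mean()) / std if std else 0.0
    return out


frame = pd.DataFrame({"height": [150.0, 160.0, 170.0], "name": ["a", "b", "c"]})
normalised = mean_normalise(frame)
standardised = standardise(frame)
```

Both helpers return a new DataFrame and leave the input untouched; non-numeric columns such as name pass through unchanged.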
I have made some minor grammatical edits to the README.md file. Kindly refer to PR #35.
Add a help description to the package that helps the user understand the purpose of the project, its modules, submodules, etc.
Currently no description is provided; the help output is as follows:
Help on package klar_eda:
NAME
klar_eda
PACKAGE CONTENTS
preprocessing
visualization
SUBMODULES
preprocess
visualize
FILE
(built-in)
(END)
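One way to fill in the missing description is a package-level docstring, since help() reads its text from __doc__. The sketch below is only illustrative: the docstring wording is made up for the example, and a stand-in module object named klar_eda_demo is used so that the snippet runs without the package installed. In the real package the text would presumably sit at the top of the package's __init__.py.

```python
import pydoc
import types

# Illustrative wording only; the maintainers would write the real description.
PACKAGE_DOC = """Automated exploratory data analysis.

Modules:
    preprocessing -- entry points for preprocessing data
    visualization -- entry points for visualizing data
"""

# Stand-in for the real package so the example is self-contained.
demo = types.ModuleType("klar_eda_demo", PACKAGE_DOC)

# pydoc.render_doc builds the same text that help() displays;
# pydoc.plain strips the terminal bold-face markup.
help_text = pydoc.plain(pydoc.render_doc(demo))
print(help_text)
```

With a docstring in place, the NAME section gains a one-line summary and a DESCRIPTION section appears in the help output.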
Please Note -
This is a generic issue, and multiple students can work on it. Notify the mentors once you identify a method (as mentioned above). The mentor will create a separate issue and assign it to you.
Contribution guidelines will be updated soon. Please refer to them for guidance before committing any development work.
Hi! I would like to create a file named contributing.md that
covers the following:
1. The difference between Git and GitHub
2. How to clone and fork a repository
3. How to create a branch and then use git push to push to the repo
4. How to create a PR
5. How to squash the commits for a single issue into one
6. How to update the forked and local repos as changes are made upstream
I would like to work on this as a part of GSSoC'21.
Indentation error on import
Please find the logs below.
>>> from klar_eda.preprocessing import preprocess_csv
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/harsha/opt/anaconda3/envs/klar/lib/python3.7/site-packages/klar_eda/preprocessing.py", line 1, in <module>
from .preprocess.csv_preprocess import CSVPreProcess
File "/Users/harsha/opt/anaconda3/envs/klar/lib/python3.7/site-packages/klar_eda/preprocess/csv_preprocess.py", line 71
if ret == True:
^
IndentationError: unindent does not match any outer indentation level
>>> from klar_eda import visualization
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/harsha/opt/anaconda3/envs/klar/lib/python3.7/site-packages/klar_eda/visualization.py", line 1, in <module>
from .visualize.csv_visualize import CSVVisualize
File "/Users/harsha/opt/anaconda3/envs/klar/lib/python3.7/site-packages/klar_eda/visualize/csv_visualize.py", line 13, in <module>
from ..preprocess.csv_preprocess import CSVPreProcess
File "/Users/harshams/opt/anaconda3/envs/klar/lib/python3.7/site-packages/klar_eda/preprocess/csv_preprocess.py", line 71
if ret == True:
^
IndentationError: unindent does not match any outer indentation level
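An error of this kind can be caught before a release by compiling each source file without importing it. The sketch below is one possible pre-release check and is not part of klar-eda; the helper name check_source and the BROKEN sample are invented here to reproduce the same class of error.

```python
def check_source(source, filename="<string>"):
    """Return a one-line error report, or None if the source compiles."""
    try:
        # compile() parses the code without executing it.
        compile(source, filename, "exec")
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return f"{filename}:{exc.lineno}: {type(exc).__name__}: {exc.msg}"
    return None


# The "if" line is indented less than the function body but more than "def",
# so it matches no enclosing indentation level.
BROKEN = (
    "def process(ret):\n"
    "        value = 1\n"
    "    if ret == True:\n"
    "        return value\n"
)

report = check_source(BROKEN, "csv_preprocess.py")
print(report)
```

For files on disk, the standard library's py_compile and compileall modules perform the same check from the command line.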
Hi, I would love to contribute to this project during GSSOC 21. Could you tell me a bit more about the progress on the project and what sort of features you expect us to implement during the contribution period?
Is your feature request related to a problem? Please describe.
Image preprocessing is a broad umbrella that encompasses various kinds of techniques, which can be grouped into submodules. Segregating related functionality into a submodule is good practice and also gives a better user experience.
Describe the solution you'd like
The module can be divided into sub-modules such as transformation (pixel brightness, geometric), filtering (spatial, frequency), segmentation (edge-based, region-based), morphology (binary, grayscale) and so on.
Input
Raw image
Output
Preprocessed image
Note
The use of standard python libraries is highly recommended.
Please Note -
This is a generic issue, and multiple students can work on it. Notify the mentors once you identify a method (as mentioned above). The mentor will create a separate issue and assign it to you.
Contribution guidelines will be updated soon. Please refer to them for guidance before committing any development work.
Description
Document the methods present in the csv_visualize.py file, following the Sphinx documentation examples.
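For reference, a docstring in the reStructuredText field-list style that Sphinx autodoc renders might look like the sketch below. The function column_range is a hypothetical stand-in and does not come from csv_visualize.py; only the docstring layout is the point.

```python
def column_range(values):
    """Return the smallest and largest value of a column.

    :param values: The column values to inspect.
    :type values: list[float]
    :raises ValueError: If ``values`` is empty.
    :return: The minimum and maximum as a ``(low, high)`` pair.
    :rtype: tuple[float, float]
    """
    if not values:
        raise ValueError("values must not be empty")
    return min(values), max(values)
```

The same layout (summary line, blank line, then the param, type, raises, return and rtype fields) applies to each method being documented.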
Is your feature request related to a problem? Please describe.
This feature is a calculator for exploratory data analysis.
A dataset can be imported in CSV or Excel format.
The user can then specify rows and columns and get their mathematical properties,
for example mean, median, mode, max, min and variance.
It also helps in making plots such as
line, histogram, scatter, regression, etc.
Input
CSV or excel file
Output
Numeric or graphical
Note
This is just an additional feature for the library.
Additional context
I will be working on this feature under GSSoC'22.
Users would not have to go back to the code again and again just to find a value.
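The numeric part of such a calculator can be sketched with Python's standard statistics module. The function name describe_column and the layout of the returned dictionary are assumptions for illustration, and the variance shown is the sample variance.

```python
import statistics


def describe_column(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "max": max(values),
        "min": min(values),
        # Sample variance (divides by n - 1); needs at least two values.
        "variance": statistics.variance(values),
    }


summary = describe_column([2, 4, 4, 4, 5, 5, 7, 9])
print(summary)
```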
Description
Document the methods present in the image_visualize.py file, following the Sphinx documentation examples.
Description
a. Write a method to identify the columns of type date (this may include iterating over the list of columns and using an appropriate strategy to identify whether a column has values of type date)
b. Implement another method that should be able to convert the date column into a specific static format (for example - YYYY-MM-DD) and split the date column into separate columns with the following attribute values:
- Date of the month (for example - 28 for '2021-12-28')
- Month (Numerical)
- Year
- Day of the week
c. Appropriate test methods should be implemented in the date_format_tests file
Assumptions
The following assumptions can be made during the implementation
- No time is present in the given input date.
- The data frame must contain column names
- A list of input patterns can be assumed. (For example, you can assume the input will be in one of the known formats listed below.)
input_date_format = [ 'DD/MM/YYYY', 'YYYY/DD/MM', 'MM/DD/YYYY', 'YYYY/MM/DD', 'DD-MM-YYYY', 'YYYY-DD-MM', 'MM-DD-YYYY', 'YYYY-MM-DD' ]
Input (Method-1)
None
Output (Method-1)
list of column names with values of type date
Method details
Use the data frame from the self.df variable.
Input (Method-2)
An expected format the input date should be converted to
Output (Method-2)
None
Method details
Use the data frame from the self.df variable.
Implement a method for this, with an appropriate name and parameters, in the csv_preprocess.py file.
In the implementation, use the convert_date_format method to convert the date into a specific format, and Method-1 described above to get the list of columns of type date.
Note
The use of standard python libraries is highly recommended.
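The two methods described above can be sketched roughly as follows. This is a hedged illustration rather than the intended implementation: it works on a DataFrame passed as an argument instead of self.df, returns a new frame instead of None, and parses with pandas directly rather than calling convert_date_format. The names DATE_FORMATS, detect_directive, get_date_columns and split_date_columns, and the suffixes of the new columns, are all assumptions. Ambiguous values such as 05/06/2021 are resolved by taking the first pattern in the list that parses every value in the column.

```python
from datetime import datetime

import pandas as pd

# The issue's patterns mapped to strptime directives, in the listed order.
DATE_FORMATS = {
    "DD/MM/YYYY": "%d/%m/%Y",
    "YYYY/DD/MM": "%Y/%d/%m",
    "MM/DD/YYYY": "%m/%d/%Y",
    "YYYY/MM/DD": "%Y/%m/%d",
    "DD-MM-YYYY": "%d-%m-%Y",
    "YYYY-DD-MM": "%Y-%d-%m",
    "MM-DD-YYYY": "%m-%d-%Y",
    "YYYY-MM-DD": "%Y-%m-%d",
}


def detect_directive(values):
    """Return the first strptime directive that parses every value, else None."""
    for directive in DATE_FORMATS.values():
        try:
            for value in values:
                datetime.strptime(value, directive)
        except (TypeError, ValueError):
            # Non-string values raise TypeError, non-matching strings ValueError.
            continue
        return directive
    return None


def get_date_columns(df):
    """Method-1 sketch: names of columns whose non-null values are all dates."""
    names = []
    for name in df.columns:
        values = df[name].dropna().tolist()
        if values and detect_directive(values) is not None:
            names.append(name)
    return names


def split_date_columns(df, output_format="YYYY-MM-DD"):
    """Method-2 sketch: reformat each date column and add its parts."""
    out = df.copy()
    for name in get_date_columns(out):
        directive = detect_directive(out[name].dropna().tolist())
        parsed = pd.to_datetime(out[name], format=directive)
        out[name] = parsed.dt.strftime(DATE_FORMATS[output_format])
        out[f"{name}_day"] = parsed.dt.day
        out[f"{name}_month"] = parsed.dt.month
        out[f"{name}_year"] = parsed.dt.year
        out[f"{name}_weekday"] = parsed.dt.day_name()
    return out


frame = pd.DataFrame({"joined": ["28/12/2021", "13/11/2021"], "city": ["Pune", "Delhi"]})
result = split_date_columns(frame)
```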
Join the Slack channel here if you wish to contribute to this issue.
Description
The method should be able to identify and convert the date into a specific static format.
The functionality of the method can be described as below -
- Take in any type of date as input (for example - 2021-11-13)
- Identify the format (for example - YYYY-MM-DD)
- Convert the date into any desired format (for example - DD/MM/YYYY)
Assumptions
The following assumptions can be made during the implementation
- No time is present in the given input date.
- The input will be only a string.
- A list of input patterns can be assumed. (For example, you can assume the input will be in one of the known formats listed below.)
input_date_format = [ 'DD/MM/YYYY', 'YYYY/DD/MM', 'MM/DD/YYYY', 'YYYY/MM/DD', 'DD-MM-YYYY', 'YYYY-DD-MM', 'MM-DD-YYYY', 'YYYY-MM-DD' ]
Input
A string in any of the formats mentioned above (The contributor is free to add any other formats)
An expected output format the input date should be converted to
Output
The date, as a string, converted into the desired format.
Note
The use of standard python libraries is highly recommended.
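A minimal sketch of this method using only the standard datetime module is shown below. The name convert_date_format appears elsewhere in these issues, but the signature, the DATE_FORMATS mapping and the optional input_format parameter are assumptions made for the example. When the input format is not given, the first pattern in the list that parses the string wins, so an ambiguous input such as 05/06/2021 is read as DD/MM/YYYY.

```python
from datetime import datetime

# The issue's patterns mapped to strptime directives, in the listed order.
DATE_FORMATS = {
    "DD/MM/YYYY": "%d/%m/%Y",
    "YYYY/DD/MM": "%Y/%d/%m",
    "MM/DD/YYYY": "%m/%d/%Y",
    "YYYY/MM/DD": "%Y/%m/%d",
    "DD-MM-YYYY": "%d-%m-%Y",
    "YYYY-DD-MM": "%Y-%d-%m",
    "MM-DD-YYYY": "%m-%d-%Y",
    "YYYY-MM-DD": "%Y-%m-%d",
}


def convert_date_format(date_string, output_format, input_format=None):
    """Convert a date string to output_format.

    If input_format is None, the first pattern in DATE_FORMATS that
    parses the string is used.
    """
    candidates = [input_format] if input_format else list(DATE_FORMATS)
    for pattern in candidates:
        try:
            parsed = datetime.strptime(date_string, DATE_FORMATS[pattern])
        except ValueError:
            continue  # this pattern does not fit; try the next one
        return parsed.strftime(DATE_FORMATS[output_format])
    raise ValueError(f"unrecognised date: {date_string!r}")


print(convert_date_format("2021-11-13", "DD/MM/YYYY"))  # prints 13/11/2021
```

Passing input_format explicitly, for example convert_date_format("05/06/2021", "YYYY-MM-DD", input_format="MM/DD/YYYY"), avoids the day/month ambiguity.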
I would like to add logos for the tech stack used in our project; this will make the project more attractive! I will start working on this issue as soon as I am assigned.
Is your feature request related to a problem? Please describe.
Add the contributors' names to the README.md file.
@ashish-hacker @harshasridhar @ishaanballal21 @rishabh-me
The README is the first file one should read when starting a new project. It is a set of useful information about a project and a kind of manual. README text files appear in many places, not only in programming. So I want to make your README file more meaningful and make the whole project easier to understand.
Description
Document the methods present in the image_preprocess.py file, following the Sphinx documentation examples.
Description
Document the methods present in the csv_preprocess.py file, following the Sphinx documentation examples.
Please Note -
This is a generic issue, and multiple students can work on it. Notify the mentors once you identify a method (as mentioned above). The mentor will create a separate issue and assign it to you.
Contribution guidelines will be updated soon. Please refer to them for guidance before committing any development work.
remove grammatical mistakes
I would like to add a feature that updates the contributor list in the README automatically. I will start working on this issue as soon as I am assigned.