- Husky
- Lint-Staged
- Travis CI
- index.js as the main entry point
- created great test coverage
- Extend the Travis CI script: when a new module is released, it should automatically work with our projects3.0
- check if we can use checkFilePath from the generator
- find out which methods should be removed from utils.js, because they are duplicates from
Note: I didn't test them here (in a separate place).
I also think the projects should evolve so that csv_parser can be used correctly as a separate entity.
FoodComposition is the first dataset we actually parsed, back when this module was part of the sd module repository codebase.
That code was working before, so it can serve as an example of how we call methods from the src folder.
Once the data was parsed, it called methods from another of our modules, the generator module.
You can find how we execute this script in package.json:
`"csv:fc"` - FoodComposition
USFA is the second, separate dataset that we should parse.
Below is the list of scripts that execute the parser for the different CSV files we have:
`"csv:usfa1"` - USFA/Derivation_Code_Description
`"csv:usfa2"` - USFA/Nutrition
`"csv:usfa3"` - USFA/Product
`"csv:usfa4"` - USFA/ServingSize
FAO is the third dataset. I don't think we have started a parser file for it yet.
Several quick start options are available:
- Clone the repo: `git clone https://github.com/GroceriStar/food-datasets-csv-parser.git`
- Install with npm: `npm install @groceristar/food-datasets-csv-parser`
- Install with yarn: `yarn add @groceristar/food-datasets-csv-parser`

`npm run parseCsv` or `yarn parseCsv`: parses the Food Composition data from CSV to JSON.
To split a JSON file you will require sd/generator/writeFile.js.
Call the function splitObject() with the parameters path (a string), filename (a string) and a flag (0 or 1).
flag=0 means the split elements are named after the name attribute; if flag=1, the elements are named by a number, with whitespace removed and in lowercase to maintain uniformity.
The split elements will be stored at the given path/filename_elements.
splitObject('path_of_directory','fileName',0) - split files by their name attribute.
splitObject('path_of_directory','fileName',1) - split files by indexing them from 0.
Check the folder fileName_elements in the path_of_directory to see the files, or you can use the function getFileInfo().
To call the function getFileInfo(path, flag, fileName) you will require sd/src/utils.js. It can be invoked with 3 parameters, 2 of which are optional depending on the task. The first parameter, path, is required. The second and third parameters are flag and fileName.
If flag=1 it will return the content of all files present at the path; otherwise, if fileName is given, it will return the content of the specified file.
If only path is passed (or flag=0), it will return a list of all files present in the directory.
You can combine objects by calling the function combineObjects() from writeFile.js. It takes 2 parameters: path and a list of keys_to_be_removed.
combineObjects(path, keys_to_be_removed) - reads all files at the given path, removes the keys listed in keys_to_be_removed, and saves the result into a new file at the given path, named <dirName>_combined.json.
Example: combineObjects('/abc/pqr/', ['id', 'img'])
If you want to modify the JSON structure of the split files and combine them again into a single file, you can call splitObject with a callback function.
Create a folder where you want the generated JSON file(s) to be.
Also create a parser.js file in that folder.
In parser.js, call ParseDirectoryFiles() from csvParser.js with the parameters directoryPath (the folder to read your CSV file(s) from) as a string, and headers (the headers of the CSV files) as an array of strings.
In csvParser.js:

ParseDirectoryFiles(directoryPath, headers)
=> csvToJson(directory, file, headers)
=> splitJsonFile(fileName)
=> filewriter(i, fileName, start, stop)

ParseDirectoryFiles gets a directory path from the call and reads all files in the directory, but only passes CSV files on to csvToJson(directory, file, headers).
Each CSV file is passed into `csvParser()`.

csvToJson() gets the file directory path, the filename (file) and the headers, and generates a JSON file for the CSV file, using the headers as keys. The JSON generated is stored in the variable result. The file name is passed to `splitJsonFile(file)` to keep track of the file being split.

The variable numberOfFile stores the number of JSON files to produce from the JSON stored in result, so that each JSON file has a maximum of 10000 entries (stored in the variable maxEntries). The filewriter function is called inside the splitJsonFile function.
It takes the child number of the JSON file (i), the file name (fileName), and the interval at which the JSON stored in result should start and stop slicing. The sliced data is written into the folder by calling the parserFile function, along with the file name being parsed and the child number of the file.
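The slicing arithmetic described above amounts to the following. `splitIntoChunks` is a name invented for this sketch; the variable names numberOfFile, maxEntries, start and stop come from the description, and maxEntries is 10000 in the real script:

```javascript
// Sketch of the splitting arithmetic: cut `result` into child arrays
// of at most maxEntries rows; filewriter(i, fileName, start, stop)
// would write result.slice(start, stop) for each chunk i.
function splitIntoChunks(result, maxEntries) {
  const numberOfFile = Math.ceil(result.length / maxEntries);
  const chunks = [];
  for (let i = 0; i < numberOfFile; i++) {
    const start = i * maxEntries;
    const stop = start + maxEntries;
    chunks.push(result.slice(start, stop));
  }
  return chunks;
}
```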
```
.
├── CCCSVParser.js
├── FoodComposition
│   ├── FoodComposition\ -\ Finland.json
│   ├── FoodComposition\ -\ France.json
│   ├── FoodComposition\ -\ Germany.json
│   ├── FoodComposition\ -\ Italy.json
│   ├── FoodComposition\ -\ Netherlands.json
│   ├── FoodComposition\ -\ Sweden.json
│   ├── FoodComposition\ -\ United\ Kingdom.json
│   ├── FoodComposition.json
│   ├── csv_parser.js
│   └── files.js
├── USFA
│   ├── Derivation_Code_Description
│   │   ├── Derivation_Code_Description1.json
│   │   └── parser.js
│   ├── Nutrition
│   │   ├── Nutrient01.json
│   │   ├── files.js
│   │   └── parser.js
│   ├── Product
│   │   ├── Products01.json
│   │   └── parser.js
│   ├── Readme.md
│   ├── Serving_Size
│   │   ├── Serving_Size1.json
│   │   └── parser.js
│   └── files.js
├── fileSystem.js
├── index.js
├── utils.js
└── writeFile.js
```
It should be pretty similar to the work we did with the FoodComposition and USFA data; we just have a different dataset, with different headers and files, stored here: https://github.com/ChickenKyiv/awesome-food-db-strucutures/tree/master/FAO
The logic is simple: it should have a structure similar to USFA's, with similar parser files.
The 1st generation of parser scripts is related to the Food Composition data and located at the folder
An example of a 2nd gen parser script is here
Where should I write the parser for FAO?
For now, use the same logic as we have in this repository, i.e. in the src folder you can now see 3 folders that we use for storing data and parsers for the different datasets.
It's our old logic for locating files. Later we'll move all projects out of the src folder.
I created projects3.0 - we'll move our code there later, once it works at least partially.
What should we do in order to create a parser for the FAO dataset from scratch?
Keep in mind that part of these steps was actually completed.
- create a folder named FAO
- copy any parser.js from the USFA project
- upload to that folder the CSV files from https://github.com/ChickenKyiv/awesome-food-db-strucutures/tree/master/FAO
- each file with exported data is actually a table from the database; for each "table" you should create a separate folder, like we did in the USFA case
- for each "table" you should have a separate parser.js file, the script we call for parsing that table
- you can add a line to package.json so you'll be able to call your script (right now it wouldn't work, but soon we'll finish other changes that will help make these parsers work)
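For example, the package.json entry could look like this. Some_Table and the script path are placeholders here, not real FAO table names; use the folders you actually create:

```json
{
  "scripts": {
    "csv:fao1": "node src/FAO/Some_Table/parser.js"
  }
}
```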
It looks like these .csv files have many headers. Whereas in the USFA version you could easily hardcode the headers and pass them as the second argument to parseDirectoryFiles(), here I will need to obtain the headers dynamically from each file.
For this kind of problem we created a new method that should be tested and used.
It's called getHeaders and is located here.
We haven't battle-tested it, so if getHeaders requires changes, that's ok.