Create a requirements.txt file listing all dependencies and upload it to the S3 bucket.
Create a dependencies.sh file:
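A minimal sketch of this step. It assumes `bucket` is the placeholder bucket name used throughout this note and that the AWS CLI is configured with write access to it. The two packages are illustrative only, and `upload_requirements` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: write requirements.txt and upload it to S3.
# Assumptions: "bucket" is a placeholder bucket name and the package
# list is illustrative; upload_requirements is a hypothetical helper.
set -euo pipefail

# pip requirements-file format: one requirement specifier per line
cat > requirements.txt <<'EOF'
numpy
pandas
EOF

# Copy the local file to s3://<bucket>/requirements.txt
upload_requirements() {
  local bucket="${1:-bucket}"
  aws s3 cp requirements.txt "s3://${bucket}/requirements.txt"
}
```

Call `upload_requirements bucket` once the file lists your real dependencies.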
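#!/bin/bash
# pip reads the requirements file directly from the URL; this unauthenticated HTTPS request only works if the object is publicly readable
sudo pip-3.4 install -r https://s3.amazonaws.com/bucket/requirements.txt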
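Upload dependencies.sh to the same bucket and register it as an EMR bootstrap action, so that it runs on every node before EMR installs the applications.
Create a configurations.json file that sets PYSPARK_PYTHON, the interpreter PySpark uses, to Python 3.4: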
[
{
"Classification": "spark-env",
"Properties": {},
"Configurations": [
{
"Classification": "export",
"Properties": {
"PYSPARK_PYTHON": "python34"
},
"Configurations": []
}
]
}
]
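A typo in this file only surfaces when the cluster is created, so a quick local check can save a round trip. This is a sketch that assumes python3 is installed on the machine running the AWS CLI. `check_config` is a hypothetical helper: it parses the file with Python's `json` module and prints the configured interpreter.

```shell
#!/bin/bash
# Sketch: sanity-check configurations.json before create-cluster.
# Assumes python3 is installed locally; check_config is a hypothetical
# helper name.
set -euo pipefail

# Prints the PYSPARK_PYTHON value found under spark-env -> export.
# Returns non-zero if the file is not valid JSON or the key is missing.
check_config() {
  python3 - "$1" <<'EOF'
import json
import sys

with open(sys.argv[1]) as f:
    configs = json.load(f)

for conf in configs:
    if conf.get("Classification") != "spark-env":
        continue
    for sub in conf.get("Configurations", []):
        if sub.get("Classification") == "export":
            value = sub.get("Properties", {}).get("PYSPARK_PYTHON")
            if value:
                print(value)
                sys.exit(0)

sys.exit("PYSPARK_PYTHON is not set under spark-env/export")
EOF
}
```

For the configuration shown above, `check_config configurations.json` should print `python34`.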
Then create the cluster with both files, for example:
aws emr create-cluster [..config..] --region eu-central-1 --configurations file://configurations.json --bootstrap-actions Path="s3://bucket/dependencies.sh"
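The `[..config..]` part is left open in this note. As an illustrative sketch only, the hypothetical helper `create_cluster` shows one shape the full call can take. You supply the release label, instance type, EC2 key pair name and bucket as arguments. The release must provide the `python34` interpreter and `pip-3.4` used by the other files. The instance count of 3 is an arbitrary example. `--instance-type`/`--instance-count`, `--use-default-roles` and `--ec2-attributes KeyName=...` are standard `aws emr create-cluster` options, and the key pair is what the `ssh -i` step relies on.

```shell
#!/bin/bash
# Sketch: one way to fill in the elided create-cluster options.
# Assumptions: all arguments are placeholders you supply; the instance
# count is illustrative; create_cluster is a hypothetical helper.
# With --instance-type/--instance-count, EMR creates 1 master node and
# (count - 1) core nodes.
set -euo pipefail

# Usage: create_cluster <release-label> <instance-type> <ec2-key-name> <bucket>
create_cluster() {
  local release_label="$1" instance_type="$2" key_name="$3" bucket="$4"
  aws emr create-cluster \
    --release-label "$release_label" \
    --applications Name=Spark \
    --instance-type "$instance_type" \
    --instance-count 3 \
    --use-default-roles \
    --ec2-attributes KeyName="$key_name" \
    --region eu-central-1 \
    --configurations file://configurations.json \
    --bootstrap-actions Path="s3://${bucket}/dependencies.sh"
}
```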
SSH into the master node:
ssh -i xxx.pem hadoop@<master-public-dns>
Display an application's logs:
yarn logs -applicationId <applicationID>
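If you don't have the application ID at hand, YARN can list it. This sketch is meant to run on the master node. `latest_app_id` and `save_app_logs` are hypothetical helper names. The function relies on IDs having the form `application_<clusterTimestamp>_<sequence>`.

```shell
#!/bin/bash
# Sketch: find an application ID and save its logs, run on the master node.
# latest_app_id and save_app_logs are hypothetical helper names.
set -euo pipefail

# Print the newest application ID that YARN knows about (any state).
# IDs look like application_<clusterTimestamp>_<sequence>, so a version
# sort puts the most recent one last.
latest_app_id() {
  yarn application -list -appStates ALL 2>/dev/null \
    | grep -o 'application_[0-9]*_[0-9]*' \
    | sort -V \
    | tail -n 1
}

# Save the aggregated logs of one application to <applicationID>.log
save_app_logs() {
  local app_id="$1"
  yarn logs -applicationId "$app_id" > "${app_id}.log"
}
```

For example, `save_app_logs "$(latest_app_id)"` writes the logs of the most recent application to a file in the current directory.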