Experiment to extract Spanish verb tenses from the "spanish_billion_words" corpus (https://www.corpusdelespanol.org/) using PySpark and AWS. For the TOP100 most frequent Spanish verbs (and a selection of basic tenses), it produced a dataset of ~3,000,000 sentences. The aim of this experiment was to extract short sentences that use various verb tenses, for practicing Spanish conjugation (as no such database appears to be available).
Feel free to change the list of Spanish verbs (TOP100) and create your own dataset to suit your interests. On AWS the whole process took ~20 minutes on a cluster of 1+7 nodes (cost < 1 USD).
Here is a database of sentences for practicing Spanish conjugation. It covers the TOP100 most frequently used Spanish verbs, with approximately 3 million sentences showing various verb forms. Everything can be downloaded (as a database), or you can use PySpark to run an experiment of your own.
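
As a rough illustration of the idea (not the code used in this experiment), the extraction can be thought of as a lookup of conjugated forms inside short sentences. The sketch below is plain Python; the names VERB_FORMS, MAX_WORDS and tag_sentence, the two sample verbs, and the 12-word cutoff are all assumptions made for the example.

```python
import re

# Hypothetical sample of conjugated forms per (verb, tense).
# A real run would hold the forms for all TOP100 verbs and chosen tenses.
VERB_FORMS = {
    ("hablar", "presente"): ["hablo", "hablas", "habla", "hablamos", "habláis", "hablan"],
    ("ser", "pretérito"): ["fui", "fuiste", "fue", "fuimos", "fuisteis", "fueron"],
}

# Reverse index: conjugated form -> list of (verb, tense) it may belong to.
FORM_INDEX = {}
for key, forms in VERB_FORMS.items():
    for form in forms:
        FORM_INDEX.setdefault(form, []).append(key)

# Runs of letters only (Unicode-aware, so accented characters are kept).
WORD_RE = re.compile(r"[^\W\d_]+")

# Assumed cutoff for what counts as a "short" sentence.
MAX_WORDS = 12


def tag_sentence(sentence):
    """Return a list of (verb, tense, form, sentence) for a short sentence.

    Sentences that are empty or longer than MAX_WORDS yield an empty list.
    """
    words = WORD_RE.findall(sentence.lower())
    if not words or len(words) > MAX_WORDS:
        return []
    hits = []
    for word in words:
        for verb, tense in FORM_INDEX.get(word, []):
            hits.append((verb, tense, word, sentence))
    return hits


if __name__ == "__main__":
    for s in ["Ellos hablan español.", "Ayer fuimos al cine."]:
        print(tag_sentence(s))
```

With PySpark, a function like this could be passed to `RDD.flatMap` on an RDD of lines created with `SparkContext.textFile`, so that each sentence is tagged in parallel across the cluster. Note that a plain form lookup cannot tell homographs apart: "fue", for example, is a preterite form of both "ser" and "ir".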