rllc / llc-archives
Archived sermons, backed by the cloud
Home Page: https://llc-archives.herokuapp.com
License: MIT License
To extract the metadata stored in an MP3 file on Amazon S3, the file must be (partially) downloaded. This is somewhat expensive and racks up our S3 usage.
Maintain a `lastSyncDate` variable or similar to track when the last sync was run. Use the Amazon API to filter files by `create time` or `last modified time`.
A Google Drive spreadsheet isn't much of a data store. Migrate to real data storage, perhaps Heroku Postgres or Google Cloud Storage.
The majority of traffic is coming from mobile devices. The header takes up a sizable portion of a small resolution screen and provides zero value - remove the header.
Phoenix is storing their MP3 sermons in ID3v1 format, which puts metadata at the very end of the file. Processing currently relies on metadata being at the very start of a file.
If the first part of an MP3 file does not contain a valid MPEG frame, download the last 128 bytes to check for metadata.
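A minimal sketch of the fallback described above, assuming the last 128 bytes of the file have already been fetched into a byte array. The class name `Id3v1Trailer` and its `parse` method are hypothetical and not part of the existing code base; the field offsets follow the standard ID3v1 layout (`TAG` marker, then title, artist, album, year and comment).

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: decodes an ID3v1 tag from the final 128 bytes of an MP3 file. */
final class Id3v1Trailer {
    static final int TAG_SIZE = 128;

    /** Returns the tag fields, or an empty map when the block does not start with "TAG". */
    static Map<String, String> parse(byte[] last128) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (last128 == null || last128.length != TAG_SIZE
                || last128[0] != 'T' || last128[1] != 'A' || last128[2] != 'G') {
            return fields;
        }
        fields.put("title", text(last128, 3, 30));
        fields.put("artist", text(last128, 33, 30));
        fields.put("album", text(last128, 63, 30));
        fields.put("year", text(last128, 93, 4));
        fields.put("comment", text(last128, 97, 30));
        return fields;
    }

    // ID3v1 fields are fixed width and padded with NUL bytes or spaces.
    private static String text(byte[] data, int offset, int length) {
        String raw = new String(data, offset, length, StandardCharsets.ISO_8859_1);
        int nul = raw.indexOf('\0');
        return (nul >= 0 ? raw.substring(0, nul) : raw).trim();
    }
}
```

The 128 bytes could probably be requested with an HTTP `Range: bytes=-128` header so the rest of the file is never downloaded; whether that fits the current download code is an open question.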
The free usage tier of postgres limits us to a set number of database rows. As the sync execution audit log is not that useful, only maintain a single record in the table.
MP3 file URLs currently use the `s3-us-west-2` endpoint. This should be changed to `s3.amazonaws.com`.
<Error>
<Code>PermanentRedirect</Code>
<Message>
The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.
</Message>
<Bucket>phoenix-archives</Bucket>
<Endpoint>s3.amazonaws.com</Endpoint>
<RequestId>93C8E90247E629B7</RequestId>
<HostId>
ivU1FTm2aEDtMGIrum8Vie2VR9O3BV+Chb9d7iu41z6LDPejxSzexa30Gu1rIhAF1HShXWNSyfQ=
</HostId>
</Error>
This will require running a query to update all existing MP3 file URLs in the database.
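As a rough illustration of the rewrite each stored URL would need, assuming the URLs are path-style (host first, bucket name in the path). `S3UrlRewriter` is a hypothetical helper; the actual table and column names are not given here, so the database update itself is left out.

```java
/** Hypothetical helper: swaps the regional S3 host for the endpoint named in the error response. */
final class S3UrlRewriter {
    private static final String OLD_HOST = "://s3-us-west-2.amazonaws.com/";
    private static final String NEW_HOST = "://s3.amazonaws.com/";

    /** Returns the URL with the host replaced; URLs on any other host are returned unchanged. */
    static String toGlobalEndpoint(String url) {
        return url.replace(OLD_HOST, NEW_HOST);
    }
}
```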
We need to be able to support multiple congregations. Perhaps endpoints such as:
GET /v1/{congregation_id}/sermons
GET /v1/{congregation_id}/sermons/{sermon_id}
GET /v1/congregations
GET /v1/congregations/{congregation_id}
Build a welcome page.
Rockford uses the convention
mp3 tag field | parsed as |
---|---|
title | date + time |
album | bible text |
artist | minister |
Other congregations may not use the same MP3 tag conventions.
Sermons have an event field displayed in the UI. The domain model backing this calls the field "comment". Update the domain model so the naming is consistent.
The logs are getting very noisy. Reduce logging for the MP3 parsing library.
Currently we are using the Heroku free usage tier, which does not allow an app to run for more than 18 hours in a 24 hour period.
Create a custom HTML page for errors and maintenance that Heroku will serve in the event that our quota is hit and the app is offline.
It would be useful to include the Google Analytics hooks on this page as well, so we can track how often people are unable to view sermons due to downtime.
https://devcenter.heroku.com/articles/error-pages#customize-pages
Create a RemoteMp3DiscoveryService strategy implementation that uses the Amazon S3 Java API to iterate over all uploaded files. Download only the ID3 tag of each file and extract the MP3 data from it.
MP3 metadata synchronization as a service. Congregations would not need to run the spreadsheet-updater locally on their webcast computer. Synchronization of MP3 data would happen via a quartz/cron job.
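import com.mpatric.mp3agic.Mp3File  // assumption: Mp3File comes from the mp3agic library

// sermonUrl is the URL of the remote MP3; only the first 1024 bytes are downloaded
def size = 1024
byte[] buf = new byte[size]
int read = 0
new URL(sermonUrl).withInputStream { is ->
    int n
    // a single read() may return fewer bytes than requested, so loop until the buffer is full
    while (read < size && (n = is.read(buf, read, size - read)) != -1) {
        read += n
    }
}
// write the snippet to a temp file so the parser can open it by path
File targetFile = File.createTempFile("temp", ".mp3")
targetFile.withOutputStream { it.write(buf, 0, read) }
def mp3file = new Mp3File(targetFile.absolutePath)
def tag = mp3file.hasId3v1Tag() ? mp3file.id3v1Tag : mp3file.id3v2Tag
def sermon = mp3DiscoveryService.extractId3v1TagData(targetFile, tag)
// expected values for the sample file
assert sermon.minister == 'John Doe'
assert sermon.bibletext == '1 Kings 19:9-18'
assert sermon.date == '04/12/2015'
assert sermon.time == '18:30'
assert sermon.notes == ''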
Set up a schedule to kick off synchronization of AWS S3 buckets with the database.
Allow query parameters to specify various filters to select which sermons to refresh. This would allow one to sync specific files.
e.g.
fromDate=02/03/2015&toDate=02/10/2015
congregation=rllc
As a webcast admin, it would be useful to get status emails with archive changes.
Several people mentioned it would be nice to have a podcast feed of the sermons. I assume it would feed from each congregation's page on Heroku.
It chews up a fair amount of RAM and CPU to download and process all files in one shot. Refactor processing to process files one at a time.
Nice To Have : ability to throttle processing at a threshold.
Dates are not being parsed properly in all cases.
Date | File Name |
---|---|
0014-03-16 00:00:00 | 2014/20140316_SRoiko.mp3 |
0014-03-23 00:00:00 | 2014/20140323_JHaapsaari.mp3 |
0014-04-27 00:00:00 | /2014/20140427_NMuhonen.mp3 |
0014-04-27 00:00:00 | /2014/20141027_RNevala.mp3 |
0014-05-04 00:00:00 | /2014/20140504_JHaapsaari.mp3 |
0014-06-20 00:00:00 | /2014/20140620_JHaapsaari.mp3 |
0014-08-10 00:00:00 | /2014/20140810_RNevala.mp3 |
0014-09-14 00:00:00 | /2014/20140914_CKumpula.mp3 |
0014-11-23 00:00:00 | /2014/20141123_JHaapsaari.mp3 |
0015-05-03 00:00:00 | /2015/20150503_NMuhonen.mp3 |
0015-07-12 00:00:00 | /2015/20150712_CKumpula.mp3 |
0015-08-02 00:00:00 | /2015/20150802_JLehtola.mp3 |
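The root cause of the `0014`/`0015` years is not stated above. As one possible approach, here is a sketch that reads the date from the `YYYYMMDD_` prefix of the file name with an explicit four-digit-year format. `FileNameDates` is a hypothetical helper, and the assumption that every file name starts with eight digits followed by an underscore comes from the examples in the table.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts the recording date from names like "2014/20140316_SRoiko.mp3". */
final class FileNameDates {
    // Eight digits at the start of the last path segment, followed by an underscore.
    private static final Pattern DATE_PREFIX = Pattern.compile("(?:^|/)(\\d{8})_[^/]*$");

    /** Returns the parsed date, or empty when the name has no valid YYYYMMDD prefix. */
    static Optional<LocalDate> parse(String fileName) {
        Matcher m = DATE_PREFIX.matcher(fileName);
        if (!m.find()) {
            return Optional.empty();
        }
        try {
            // BASIC_ISO_DATE expects yyyyMMdd, so the year is always read as four digits.
            return Optional.of(LocalDate.parse(m.group(1), DateTimeFormatter.BASIC_ISO_DATE));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }
}
```

Returning an empty `Optional` rather than a guessed date keeps malformed names visible instead of storing a wrong year.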
Stop polling every hour to find changes. Use the built in amazon notification features of S3.
https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
`Notes` from ID3v1 tag Comments field
The Google spreadsheet has a `Notes` column. Map the ID3v1 comment field to the `Notes` column.
The use case for the `Notes` column is displaying church calendar events to make specific sermons easier to locate. Examples of `Notes` data:
If the church calendar could be parsed, the `Notes` field could be auto-populated with the church calendar event by matching the date of the recording with event dates.
A few issues have surfaced regarding minister name parsing:
`Art Simon` -> `Martin Simonson` (instead of `Arthur Simonson`), as `Martin` is closer to `Art` than `Arthur` is.
If an MP3 file does not have a minister in the `artist` tag, an attempt is made to parse the minister out of the filename. Rockford assumes a file name in the format `YYYYMMDD_JHaapsaari.mp3`. When this happens, the most similar minister is picked even when the choice is ambiguous. For example, a filename of `2014/0629_RSorvala.mp3` could be from either `Rodrick Sorvala` or `Rory Sorvala`. In a case like this, the minister should be left blank.
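A sketch of the "leave it blank when ambiguous" rule, assuming the part of the file name after the underscore is a first initial followed by the last name. `FileNameMinisterResolver` is a hypothetical name, and the matching rule (same initial, same last name) is an assumption rather than the current implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: resolves tokens such as "RSorvala" against a list of full names. */
final class FileNameMinisterResolver {
    /** Returns the single matching full name, or "" when no minister or several ministers match. */
    static String resolve(String token, List<String> ministers) {
        if (token == null || token.length() < 2) {
            return "";
        }
        char initial = Character.toUpperCase(token.charAt(0));
        String lastName = token.substring(1).toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String fullName : ministers) {
            String[] parts = fullName.trim().split("\\s+");
            if (parts.length < 2 || parts[0].isEmpty()) {
                continue; // skip entries without both a first and a last name
            }
            boolean sameInitial = Character.toUpperCase(parts[0].charAt(0)) == initial;
            boolean sameLast = parts[parts.length - 1].toLowerCase(Locale.ROOT).equals(lastName);
            if (sameInitial && sameLast) {
                matches.add(fullName);
            }
        }
        // More than one candidate means the choice is ambiguous, so nothing is picked.
        return matches.size() == 1 ? matches.get(0) : "";
    }
}
```

With the names from the example, `RSorvala` matches both `Rodrick Sorvala` and `Rory Sorvala` and therefore resolves to an empty string.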
Adrian has provided a revised list of ministers. Drop the existing table and import the new list.
According to the README.md versions badge, our dependencies are out of date.
Update to the latest version of the outdated libraries.
Currently, when a congregation is selected, the name shown is the abbreviation, e.g. PLLC. It should be the congregation name, e.g. Phoenix, since the title already indicates LLC Archived Sermons.
Mp3DiscoverServiceImpl only supports local directory scanning.
Refactor to an abstract base class such that a Local and Remote Mp3 discovery service can be created easily.
The Spring Boot favicon is not ideal. Replace it with something church related.
Autocorrecting names is adding a lot of complexity to the application and is choosing incorrect names in some cases.
Remove autocorrection for the time being.
As an admin user, I would like the ability to edit the metadata that is displayed for sermons.
Each congregation will be responsible for uploading their archived sermons to amazon s3 buckets.
Provide documentation on
Also provide instruction on scheduling this command to run periodically. Consider both Windows + Linux environments.
Ideally all columns on the sermons table would be sortable (multi column sort would be fantastic). Search/Filter utilities for bonus points.
Perhaps the angular-datatables library would do what we need?
When webcasters are exporting audio, they hand type the minister's name. This is error prone, leading to misspelled names being displayed on the public facing archives. The spreadsheet-updater is downstream from the actual MP3 creation, as such it has no control over source data.
When parsing minister name from the MP3 tag, compare it to a master list of ministers; pick the minister's name that is the most similar. There are various algorithms for finding string similarity. See Grails' implementation of CosineSimilarity for an example.
Extra nice if the master list is maintainable by LLC (perhaps another google spreadsheet?)
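One way the similarity lookup could work, sketched with cosine similarity over character bigrams. This is not the Grails implementation referred to above; `NameMatcher`, the bigram tokenization and the `threshold` parameter are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: picks the most similar name from a master list. */
final class NameMatcher {
    // Counts overlapping two-character sequences, ignoring case.
    static Map<String, Integer> bigrams(String s) {
        String t = s.toLowerCase(Locale.ROOT).trim();
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 2 <= t.length(); i++) {
            counts.merge(t.substring(i, i + 2), 1, Integer::sum);
        }
        return counts;
    }

    /** Cosine of the angle between the two bigram count vectors, from 0.0 to 1.0. */
    static double cosine(String a, String b) {
        Map<String, Integer> va = bigrams(a);
        Map<String, Integer> vb = bigrams(b);
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : va.entrySet()) {
            normA += e.getValue() * e.getValue();
            dot += e.getValue() * vb.getOrDefault(e.getKey(), 0);
        }
        for (int v : vb.values()) {
            normB += v * v;
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /** Returns the best match scoring above the threshold, or the typed value unchanged. */
    static String closest(String typed, List<String> masterList, double threshold) {
        String best = typed;
        double bestScore = threshold;
        for (String candidate : masterList) {
            double score = cosine(typed, candidate);
            if (score > bestScore) {
                best = candidate;
                bestScore = score;
            }
        }
        return best;
    }
}
```

The threshold keeps names that are not on the master list from being replaced by an unrelated entry; a suitable value would need to be tuned against real tag data.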
It would be nice to track user events to know what content is being interacted with the most.
for example:
In preparation for #9 , configuration for multiple congregations will need to be supported.
Perhaps this could be as simple as a properties file like:
llc.rockford.shortName=RLLC
llc.rockford.longName=Rockford Laestadian Lutheran Church
llc.rockford.aws.username=rllc-read
llc.rockford.aws.bucket=https://s3-us-west-2.amazonaws.com/
llc.rockford.aws.key.id=<aws-key>
llc.rockford.aws.key.token=<aws-token>
llc.rockford.google.username=<google-username>
llc.rockford.google.password=<google-password>
llc.rockford.google.spreadsheet=RLLC Archived Sermons
llc.rockford.google.worksheet=Sheet1
llc.minneapolis.shortName=MLLC
llc.minneapolis.longName=Minneapolis Laestadian Lutheran Church
llc.minneapolis.aws.username=mllc-read
...
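A sketch of how such a file might be read, assuming every key follows the `llc.<congregation>.<setting>` pattern shown above. `CongregationConfig` and its `group` method are hypothetical names.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical helper: groups "llc.<congregation>.<setting>" keys by congregation. */
final class CongregationConfig {
    static Map<String, Map<String, String>> group(Properties props) {
        Map<String, Map<String, String>> byCongregation = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            // Split into at most three parts: "llc", the congregation, and the remaining setting path.
            String[] parts = key.split("\\.", 3);
            if (parts.length < 3 || !parts[0].equals("llc")) {
                continue; // ignore keys that do not follow the convention
            }
            byCongregation
                    .computeIfAbsent(parts[1], k -> new TreeMap<>())
                    .put(parts[2], props.getProperty(key));
        }
        return byCongregation;
    }
}
```

With the sample file, `group(props).get("rockford").get("aws.username")` would yield `rllc-read`, and adding a congregation only requires new keys rather than new code.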
LLC is requesting rendering of ministers as Lastname, Firstname
When a file has a `lastModified` timestamp that is older than the `lastExecution` time of the scheduled scan, the file is not picked up.
The remote (AWS S3) directory should be compared to the locally synced database content and examined for differences.
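The comparison could be as simple as a set difference on file keys, which does not depend on timestamps at all. A minimal sketch, assuming both sides can be listed as sets of key strings; `SyncDiff` is a hypothetical name.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: compares remote bucket keys with keys already stored in the database. */
final class SyncDiff {
    final Set<String> toAdd;    // present remotely, missing locally
    final Set<String> toRemove; // present locally, no longer in the bucket

    private SyncDiff(Set<String> toAdd, Set<String> toRemove) {
        this.toAdd = toAdd;
        this.toRemove = toRemove;
    }

    static SyncDiff between(Set<String> remoteKeys, Set<String> localKeys) {
        Set<String> add = new TreeSet<>(remoteKeys);
        add.removeAll(localKeys);
        Set<String> remove = new TreeSet<>(localKeys);
        remove.removeAll(remoteKeys);
        return new SyncDiff(add, remove);
    }
}
```

A file uploaded with an old `lastModified` value would still land in `toAdd`, because only its key is compared.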
Create an electronic form to capture congregation requests for S3 credentials.
When no MPEG frames are found in the first snippet of a file, revert to parsing the filename for the date and minister name.
If the minister is not found in the database it is left blank. This happens for ministers from Finland who are not in the database.
existingSermon.minister = minister ? minister.naturalName : ''
If this is to be an LLC-wide application, we should rename the application from `com.rllc.spreadsheet` to something more appropriate.
Using the REST API for traversing sermons and congregations is a bit treacherous without exposing IDs. When sermons are sorted a certain way and an index is used to pick the sermon, the rendered sermon may be incorrect.
Expose id values for `Sermon` and `Congregation` objects.
Currently the Amazon S3 command line tool is being used to sync the local directory with the Amazon S3 bucket. While this is very nice as it uses a prebuilt command line tool, it requires 2 applications to run. Consolidate syncing into some kind of `awsSyncService` inside of spreadsheet-updater. Perhaps it could even be a wrapper around the s3 process?
If #9 is realized, the s3 command line tool will be the only tooling needed on congregational computers.
Add logo to web app.
Currently the favicon is the spring boot icon. Update to whatever logo is created.
When webcasters are exporting audio, they hand type the bible text. This is error prone, leading to misspelled bible text and inconsistent abbreviations being displayed on the public facing archives. The spreadsheet-updater is downstream from the actual MP3 creation, as such it has no control over source data.
When parsing bible text from the MP3 tag, compare it to a master list of bible text; pick the bible text that is the most similar. There are various algorithms for finding string similarity. See Grails' implementation of CosineSimilarity for an example.
Extra nice if the master list is maintainable by LLC
This is more complicated than minister names, as abbreviations are in play. The ideal solution would create a mapping of all expected variants of books to their preferred convention then use the mapping to resolve the true value.
[
'Matthew' : 'Matt.',
'St Matthew' : 'Matt.',
'Matt' : 'Matt.',
'Ezekiel' : 'Ezek.',
'Ezek' : 'Ezek.',
'Ez' : 'Ezek.'
]
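A sketch of how the mapping above might be applied, assuming the bible text is a book name followed by a chapter and verse reference. `BibleBookNormalizer` is a hypothetical name, the variant table only repeats the entries listed above, and unknown books are returned unchanged.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: rewrites the book part of a bible text to the preferred abbreviation. */
final class BibleBookNormalizer {
    private static final Map<String, String> VARIANTS = new HashMap<>();
    static {
        VARIANTS.put("matthew", "Matt.");
        VARIANTS.put("st matthew", "Matt.");
        VARIANTS.put("matt", "Matt.");
        VARIANTS.put("ezekiel", "Ezek.");
        VARIANTS.put("ezek", "Ezek.");
        VARIANTS.put("ez", "Ezek.");
    }

    // Book name, whitespace, then the reference starting at the first chapter digit.
    private static final Pattern REFERENCE = Pattern.compile("^(.*?)\\s+(\\d.*)$");

    static String normalize(String bibleText) {
        String text = bibleText.trim();
        Matcher m = REFERENCE.matcher(text);
        boolean hasReference = m.matches();
        String book = hasReference ? m.group(1) : text;
        String rest = hasReference ? " " + m.group(2) : "";
        // Lookup ignores case and periods, so "St. Matthew" and "st matthew" share a key.
        String key = book.toLowerCase(Locale.ROOT).replace(".", "").trim();
        String preferred = VARIANTS.get(key);
        return preferred == null ? text : preferred + rest;
    }
}
```

An exact lookup like this avoids the wrong guesses that similarity matching can produce, at the cost of having to maintain the variant list.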