llc-archives's People

Contributors

mrmuggles, smcadams86, timhillu, wforstie

Forkers

mrmuggles

llc-archives's Issues

Limit metadata scanning to new files

Limit metadata scanning

To extract the metadata stored in an MP3 file on Amazon S3, the file must be (partially) downloaded. This is somewhat expensive and racks up our S3 usage.

Suggested Solution

Maintain a lastSyncDate variable or similar to track when the last sync was run. Use the amazon API to filter files by create time or last modified time.
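
A minimal sketch of the lastSyncDate idea, assuming the bucket listing gives a last-modified time for each object. The `NewFileFilter` class and the `Map` of key to timestamp are hypothetical stand-ins for whatever the S3 client returns; the filtering here happens on our side after listing.

```java
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch: pick only the remote files changed since the last sync. */
final class NewFileFilter {

    /**
     * @param listing      hypothetical bucket listing: object key -> last modified time
     * @param lastSyncDate when the previous sync ran
     * @return keys modified strictly after lastSyncDate, sorted for stable output
     */
    static List<String> keysModifiedSince(Map<String, Instant> listing, Instant lastSyncDate) {
        return listing.entrySet().stream()
                .filter(e -> e.getValue().isAfter(lastSyncDate))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

Only the returned keys would then need their metadata downloaded; lastSyncDate would be updated once the run succeeds.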

Migrate to a legitimate database

A Google Drive spreadsheet isn't much of a data store. Migrate to real data storage, perhaps Heroku Postgres or Google Cloud Storage.

Remove bootstrap header

The majority of traffic is coming from mobile devices. The header takes up a sizable portion of a small resolution screen and provides zero value - remove the header.

Support ID3v1 MP3 tags

Problem

Phoenix is storing their MP3 sermons in ID3v1 format, which puts metadata at the very end of the file. Processing currently relies on the metadata being at the very start of a file.

Possible Solution

If the first part of an MP3 file does not contain a valid MPEG frame, download the last 128 bytes to check for metadata.

See http://id3.org/FAQ
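
A rough sketch of what checking the last 128 bytes could look like. An ID3v1 tag is a fixed 128-byte block that starts with the marker "TAG", followed by title (30 bytes), artist (30), album (30), year (4), comment (30) and a genre byte. The `Id3v1Tag` class name and the returned map are assumptions for illustration, not the project's code.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Sketch: decode an ID3v1 tag from the final 128 bytes of an MP3 file. */
final class Id3v1Tag {
    static final int TAG_SIZE = 128;

    /** Returns the tag fields, or empty if the bytes do not hold an ID3v1 tag. */
    static Optional<Map<String, String>> parse(byte[] tail) {
        if (tail == null || tail.length != TAG_SIZE
                || tail[0] != 'T' || tail[1] != 'A' || tail[2] != 'G') {
            return Optional.empty();
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", field(tail, 3, 30));
        fields.put("artist", field(tail, 33, 30));
        fields.put("album", field(tail, 63, 30));
        fields.put("year", field(tail, 93, 4));
        fields.put("comment", field(tail, 97, 30));
        return Optional.of(fields);
    }

    /** Reads a fixed-width field, cutting at the first NUL and trimming padding. */
    private static String field(byte[] tag, int offset, int length) {
        String raw = new String(tag, offset, length, StandardCharsets.ISO_8859_1);
        int nul = raw.indexOf('\0');
        return (nul >= 0 ? raw.substring(0, nul) : raw).trim();
    }
}
```

One way to fetch only the tail is an HTTP request with the header `Range: bytes=-128`, assuming the server honors range requests.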

Delete old sync execution records

The free usage tier of postgres limits us to a set number of database rows. As the sync execution audit log is not that useful, only maintain a single record in the table.

AWS links need to be region agnostic

fileUrl: "https://s3-us-west-2.amazonaws.com/${bucket}/${sermon.file}"
currently hard-codes an Amazon S3 URL pointing to s3-us-west-2. This should be changed to s3.amazonaws.com. Requests for the phoenix-archives bucket currently return:

<Error>
<Code>PermanentRedirect</Code>
<Message>
The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.
</Message>
<Bucket>phoenix-archives</Bucket>
<Endpoint>s3.amazonaws.com</Endpoint>
<RequestId>93C8E90247E629B7</RequestId>
<HostId>
ivU1FTm2aEDtMGIrum8Vie2VR9O3BV+Chb9d7iu41z6LDPejxSzexa30Gu1rIhAF1HShXWNSyfQ=
</HostId>
</Error>

This will require running a query to update all existing MP3 file URLs in the database.

Create a REST API to expose sermons

We need to be able to support multiple congregations. Perhaps endpoints such as:

GET    /v1/{congregation_id}/sermons
GET    /v1/{congregation_id}/sermons/{sermon_id}
GET    /v1/congregations
GET    /v1/congregations/{congregation_id}



Allow custom mp3 tag mappings

Rockford uses the convention

mp3 tag field    parsed as
title            date + time
album            bible text
artist           minister

Other congregations may not use the same mp3 tag conventions.
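
One possible shape for a per-congregation mapping, sketched under assumptions: the `TagMapping` class and the sermon field names (`dateTime`, `bibleText`, `minister`) are hypothetical, and only Rockford's convention above comes from this issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: translate raw mp3 tag fields into sermon fields via a configurable mapping. */
final class TagMapping {

    /** Rockford's convention: mp3 tag field -> (hypothetical) sermon field name. */
    static final Map<String, String> ROCKFORD = Map.of(
            "title", "dateTime",
            "album", "bibleText",
            "artist", "minister");

    /** Copies every tag field that has a mapping; unmapped tag fields are ignored. */
    static Map<String, String> apply(Map<String, String> mapping, Map<String, String> tagFields) {
        Map<String, String> sermon = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : tagFields.entrySet()) {
            String target = mapping.get(e.getKey());
            if (target != null) {
                sermon.put(target, e.getValue());
            }
        }
        return sermon;
    }
}
```

Another congregation would supply its own map, for example one where `artist` holds the bible text, without any change to the parsing code.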

Update sermon "comment" domain field

Sermons have an event field displayed in the UI. The domain model backing this calls the field "comment". Update the domain model so the naming is consistent.

Add heroku recharging page

Currently we are using the heroku free usage tier, which does not allow an app to run for more than 18 hours in a 24 hour period.

Create a custom HTML page for errors and maintenance that heroku will serve in the event that our quota is hit and the app is offline.

It would be useful to include the Google Analytics hooks on this page as well, so we can track how often people are unable to view sermons due to downtime.

https://devcenter.heroku.com/articles/error-pages#customize-pages

Create a RemoteMp3DiscoveryService

RemoteMp3DiscoveryService

Create a RemoteMp3DiscoveryService strategy implementation that uses the amazon s3 java API to iterate all existing files uploaded. Download only the ID3 tag of each file and extract Mp3 data from each file.

architecture

[image: architecture diagram]

benefits

MP3 metadata synchronization as a service. Congregations would not need to run the spreadsheet-updater locally on their webcast computer. Synchronization of MP3 data would happen via a quartz/cron job.

ugly example

        // Download only the first 1 KB of the remote file.
        // Note: a single read() may return fewer than `size` bytes.
        def size = 1024
        byte[] buf = new byte[size]
        new URL(sermonUrl).withInputStream { is ->
            is.read(buf, 0, size)
        }

        // Mp3File reads from disk, so spool the bytes to a temp file.
        File targetFile = File.createTempFile("temp", ".mp3")
        targetFile.bytes = buf

        def mp3file = new Mp3File(targetFile.absolutePath)
        // Prefer the ID3v1 tag, falling back to ID3v2.
        def id3Tag = mp3file.hasId3v1Tag() ? mp3file.id3v1Tag : mp3file.id3v2Tag
        def sermon = mp3DiscoveryService.extractId3v1TagData(targetFile, id3Tag)

        assert sermon.minister == 'John Doe'
        assert sermon.bibletext == '1 Kings 19:9-18'
        assert sermon.date == '04/12/2015'
        assert sermon.time == '18:30'
        assert sermon.notes == ''

Dates are not displaying correctly

Dates seem to be displayed as one day behind the actual date.

For example the file: 2015/20150830_RNikula.mp3

has a date of 08/29/2015 19:00

MP3 metadata API enhancements

Allow query parameters to specify various filters to select which sermons to refresh. This would allow one to sync specific files.

e.g.

Time Range

  • fromDate=02/03/2015&toDate=02/10/2015

Congregation

  • congregation=rllc

email status reports

As a webcast admin, it would be useful to get status emails with archive changes.

Add podcast feed for sermons.

Several people mentioned it would be nice to have a podcast feed of the sermons. I assume it would feed from each congregation's page on heroku.

MP3 metadata performance enhancements

It chews up a fair amount of RAM and CPU to download and process all files in one shot. Refactor processing to process files one at a time.

Nice To Have : ability to throttle processing at a threshold.

Dates parsing incorrectly

Dates are not being parsed properly in all cases.

Date                 File Name
0014-03-16 00:00:00  2014/20140316_SRoiko.mp3
0014-03-23 00:00:00  2014/20140323_JHaapsaari.mp3
0014-04-27 00:00:00  /2014/20140427_NMuhonen.mp3
0014-04-27 00:00:00  /2014/20141027_RNevala.mp3
0014-05-04 00:00:00  /2014/20140504_JHaapsaari.mp3
0014-06-20 00:00:00  /2014/20140620_JHaapsaari.mp3
0014-08-10 00:00:00  /2014/20140810_RNevala.mp3
0014-09-14 00:00:00  /2014/20140914_CKumpula.mp3
0014-11-23 00:00:00  /2014/20141123_JHaapsaari.mp3
0015-05-03 00:00:00  /2015/20150503_NMuhonen.mp3
0015-07-12 00:00:00  /2015/20150712_CKumpula.mp3
0015-08-02 00:00:00  /2015/20150802_JLehtola.mp3
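
The issue does not state the cause of the 0014/0015 years. As a cross-check, the eight-digit prefix of the file names listed above could be parsed as an unambiguous four-digit-year date. This is only a sketch; the `FileNameDate` class is hypothetical and it assumes the YYYYMMDD_Name.mp3 naming seen in these rows.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: read the recording date from a file name such as 2014/20140316_SRoiko.mp3. */
final class FileNameDate {
    // Eight digits at the start of the last path segment, followed by an underscore.
    private static final Pattern DATE_PREFIX = Pattern.compile("(?:^|/)(\\d{8})_[^/]*\\.mp3$");

    /** Returns the date, or empty if the name has no valid yyyyMMdd prefix. */
    static Optional<LocalDate> parse(String path) {
        Matcher m = DATE_PREFIX.matcher(path);
        if (!m.find()) {
            return Optional.empty();
        }
        try {
            // BASIC_ISO_DATE expects exactly yyyyMMdd, so the year keeps all four digits.
            return Optional.of(LocalDate.parse(m.group(1), DateTimeFormatter.BASIC_ISO_DATE));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }
}
```

A mismatch between this value and the date parsed from the tag could be logged so that bad tag dates are noticed early.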

Setup UI build structure

  • determine which framework to use
      • angular
      • ember
      • ???
  • scaffold out initial project structure
  • setup grunt/gulp

Support Comment tag

Populate Notes from ID3v1 tag Comments field

The Google spreadsheet has a Notes column. Map the ID3v1 comment field to the Notes column.

Bonus Points

The use case for the Notes column is displaying church calendar events to make specific sermons easier to locate. Examples of Notes data:

  • Mary's Day Services
  • Good Friday
  • New Years Day Service

If the church calendar could be parsed, the Notes field could be auto populated with the church calendar event by matching the date of the recording with event dates.

Revamp minister parsing

A few issues have surfaced regarding minister name parsing.

Parsing from MP3 Metadata

  • incorrect name conversion: "Art Simon" is converted to Martin Simonson instead of Arthur Simonson, because the similarity measure scores "Martin" closer to "Art" than "Arthur".

Falling back to filename

If an MP3 file does not have a minister in the artist tag, an attempt is made to parse the minister out of the filename. Rockford assumes a file name in the format YYYYMMDD_JHaapsaari.mp3. When this happens, the most similar minister is picked even when the match is ambiguous. For example, a filename of 2014/0629_RSorvala.mp3 could be from either Rodrick Sorvala or Rory Sorvala. In a case like this, the minister should be left blank.
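
A sketch of the "leave it blank when ambiguous" rule, assuming the file name ends in an underscore, a first initial and a surname, and that the minister list holds "First Last" names. The `FileNameMinister` class is hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Sketch: resolve a minister from a file name, refusing to guess on ambiguity. */
final class FileNameMinister {
    // "_" + first initial + surname + ".mp3", e.g. "_RSorvala.mp3"
    private static final Pattern INITIAL_AND_SURNAME =
            Pattern.compile("_([A-Z])([A-Za-z]+)\\.mp3$");

    /** Returns the single matching minister, or "" when none or several match. */
    static String resolve(String fileName, List<String> ministers) {
        Matcher m = INITIAL_AND_SURNAME.matcher(fileName);
        if (!m.find()) {
            return "";
        }
        String initial = m.group(1);
        String surname = m.group(2);
        List<String> candidates = ministers.stream()
                .filter(name -> name.startsWith(initial) && name.endsWith(" " + surname))
                .collect(Collectors.toList());
        return candidates.size() == 1 ? candidates.get(0) : "";
    }
}
```

With a list containing both Rodrick Sorvala and Rory Sorvala, `resolve("2014/0629_RSorvala.mp3", ministers)` returns an empty string rather than picking one of them.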

Delete minister database

Adrian has provided a revised list of ministers. Drop the existing table and import the new list.

Full Congregation Name On Tab

Currently, when a congregation is selected, the tab shows the abbreviation, e.g. PLLC. It should show the full congregation name, e.g. Phoenix, since the title already indicates LLC Archived Sermons.

Add favicon

The Spring Boot favicon is not ideal. Replace it with something church related.

Remove minister name autocorrect

Autocorrecting names adds a lot of complexity to the application and chooses incorrect names in some cases.

Remove auto correction for the time being.

Documentation for setting up amazon s3 sync

Amazon S3 Documentation

Each congregation will be responsible for uploading their archived sermons to amazon s3 buckets.
Provide documentation on downloading, installing, and configuring the amazon s3 command line tool.

Also provide instruction on scheduling this command to run periodically. Consider both Windows + Linux environments.

  • windows environment detailed
  • linux environment detailed

Auto-Correct minister name

Situation

When webcasters are exporting audio, they hand type the minister's name. This is error prone, leading to misspelled names being displayed on the public facing archives. The spreadsheet-updater is downstream from the actual MP3 creation, as such it has no control over source data.

Proposed Solution

When parsing minister name from the MP3 tag, compare it to a master list of ministers; pick the minister's name that is the most similar. There are various algorithms for finding string similarity. See Grails' implementation of CosineSimilarity for an example.
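
To make the idea concrete, here is a small sketch of one similarity measure: cosine similarity over character bigram counts. It is not Grails' implementation, only an illustration of the technique; the `NameMatcher` class is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: pick the master-list name most similar to a hand-typed name. */
final class NameMatcher {

    /** Counts overlapping two-character substrings of the lower-cased input. */
    static Map<String, Integer> bigrams(String s) {
        String t = s.toLowerCase().trim();
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 2 <= t.length(); i++) {
            counts.merge(t.substring(i, i + 2), 1, Integer::sum);
        }
        return counts;
    }

    /** Cosine of the angle between the two bigram count vectors, in [0, 1]. */
    static double cosine(String a, String b) {
        Map<String, Integer> va = bigrams(a);
        Map<String, Integer> vb = bigrams(b);
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : va.entrySet()) {
            dot += e.getValue() * vb.getOrDefault(e.getKey(), 0);
            normA += e.getValue() * e.getValue();
        }
        for (int c : vb.values()) {
            normB += c * c;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /** Returns the highest-scoring candidate, or "" for an empty master list. */
    static String closest(String typed, List<String> masterList) {
        String best = "";
        double bestScore = -1;
        for (String candidate : masterList) {
            double score = cosine(typed, candidate);
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }
}
```

A caller would likely want a minimum score threshold as well, so that a name with no close match is left as typed rather than replaced.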

Bonus Points

Extra nice if the master list is maintainable by LLC (perhaps another google spreadsheet?)

Google Analytics Tracking

It would be nice to track user events to know what content is being interacted with the most.

for example:

  • what do people sort sermons by most frequently
  • what search terms are people using
  • download count of sermons

Allow multiple congregations to be configured

In preparation for #9, configuration for multiple congregations will need to be supported.
Perhaps this could be as simple as a properties file like:

llc.rockford.shortName=RLLC
llc.rockford.longName=Rockford Laestadian Lutheran Church
llc.rockford.aws.username=rllc-read
llc.rockford.aws.bucket=https://s3-us-west-2.amazonaws.com/
llc.rockford.aws.key.id=<aws-key>
llc.rockford.aws.key.token=<aws-token>
llc.rockford.google.username=<google-username>
llc.rockford.google.password=<google-password>
llc.rockford.google.spreadsheet=RLLC Archived Sermons
llc.rockford.google.worksheet=Sheet1

llc.minneapolis.shortName=MLLC
llc.minneapolis.longName=Minneapolis Laestadian Lutheran Church
llc.minneapolis.aws.username=mllc-read
...
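
A sketch of how such a file could be read back, assuming every key follows the `llc.<congregation>.<setting>` pattern shown above. The `CongregationConfig` class is hypothetical; `java.util.Properties` does the actual file parsing.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Sketch: group "llc.<congregation>.<setting>" properties by congregation. */
final class CongregationConfig {

    /** Returns congregation name -> (setting name -> value); other keys are skipped. */
    static Map<String, Map<String, String>> byCongregation(Properties props) {
        Map<String, Map<String, String>> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            // Split into at most three parts so "aws.key.id" stays one setting name.
            String[] parts = key.split("\\.", 3);
            if (parts.length == 3 && parts[0].equals("llc")) {
                result.computeIfAbsent(parts[1], k -> new TreeMap<>())
                      .put(parts[2], props.getProperty(key));
            }
        }
        return result;
    }
}
```

After loading the file with `Properties.load`, `byCongregation(props).get("rockford").get("aws.username")` would give `rllc-read`, and adding a congregation only means adding lines to the file.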

scheduled aws scan not picking up older files, not deleting removed files

Description

  • If a file is uploaded to AWS that has a lastModified timestamp that is older than the lastExecution time of the scheduled scan, the file is not picked up.
  • If a file is removed from AWS, it is not removed from the database

The remote (aws s3) directory should be compared to the locally synced database content and examined for differences.

  • If the file exists on S3 but not in our database, process it
  • if the file does not exist in S3 but is in our database, delete it
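
The comparison described above amounts to two set differences. A minimal sketch, assuming both sides can be reduced to a set of file keys; the `SyncDiff` class is hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

/** Sketch: compare the S3 listing with the database to find work to do. */
final class SyncDiff {

    /** Keys present on S3 but missing from the database: process these. */
    static Set<String> toProcess(Set<String> s3Keys, Set<String> dbKeys) {
        Set<String> missing = new TreeSet<>(s3Keys);
        missing.removeAll(dbKeys);
        return missing;
    }

    /** Keys present in the database but gone from S3: delete these. */
    static Set<String> toDelete(Set<String> s3Keys, Set<String> dbKeys) {
        Set<String> stale = new TreeSet<>(dbKeys);
        stale.removeAll(s3Keys);
        return stale;
    }
}
```

Because this compares full listings rather than timestamps, a file uploaded with an old lastModified value would still be picked up.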

Use tag for minister if not found in database

If the minister is not found in the database it is left blank. This happens for ministers from Finland who are not in the database.

existingSermon.minister = minister ? minister.naturalName : ''

rename application package

Description

If this is to be an LLC-wide application, we should rename the application package from com.rllc.spreadsheet to something more appropriate.

Suggestions

  • org.llc.spreadsheet
  • org.llc.archive
  • org.llc.webcast

Expose ids for Sermons and Congregations

The Problem

Using the REST API for traversing sermons and congregations is a bit treacherous without exposing IDs. When sermons are sorted a certain way and an index is used to pick the sermon, the rendered sermon may be incorrect.

The solution

Expose id values for Sermon and Congregation objects

sync amazon s3 bucket

Purpose

Currently the amazon s3 command line tool is used to sync the local directory with the amazon s3 bucket. While this is very nice, as it uses a prebuilt command line tool, it requires two applications to run. Consolidate syncing into some kind of awsSyncService inside of spreadsheet-updater. Perhaps it could even be a wrapper around the s3 process?

May be OBE

If #9 is realized, the s3 command line tool will be the only tooling needed on congregational computers.

Create logo

Add logo to web app.

Currently the favicon is the spring boot icon. Update to whatever logo is created.

Auto-Correct bible text

Situation

When webcasters are exporting audio, they hand type the bible text. This is error prone, leading to misspelled bible text and inconsistent abbreviations being displayed on the public facing archives. The spreadsheet-updater is downstream from the actual MP3 creation, as such it has no control over source data.

Proposed Solution

When parsing bible text from the MP3 tag, compare it to a master list of bible text; pick the bible text that is the most similar. There are various algorithms for finding string similarity. See Grails' implementation of CosineSimilarity for an example.

Bonus Points

Extra nice if the master list is maintainable by LLC

Concerns

This is more complicated than minister names, as abbreviations are in play. The ideal solution would create a mapping of all expected variants of books to their preferred convention then use the mapping to resolve the true value.

Example Mapping

[
    'Matthew' : 'Matt.',
    'St Matthew' : 'Matt.',
    'Matt' : 'Matt.',
    'Ezekiel' : 'Ezek.',
    'Ezek' : 'Ezek.',
    'Ez' : 'Ezek.'   
]
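
A sketch of how such a mapping might be applied, under assumptions: the `BibleTextNormalizer` class is hypothetical, lookups are case-insensitive with dots stripped, and only book names that start with a letter and are followed by a chapter number are handled. Anything else is returned unchanged.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: rewrite the book part of a bible text to its preferred abbreviation. */
final class BibleTextNormalizer {

    /** Lower-cased variant (dots removed) -> preferred convention, from the example mapping. */
    static final Map<String, String> BOOKS = Map.of(
            "matthew", "Matt.",
            "st matthew", "Matt.",
            "matt", "Matt.",
            "ezekiel", "Ezek.",
            "ezek", "Ezek.",
            "ez", "Ezek.");

    // Book name (letters, dots, spaces), then whitespace, then the chapter/verse part.
    private static final Pattern REFERENCE =
            Pattern.compile("^\\s*([A-Za-z. ]+?)\\.?\\s+(\\d.*)$");

    /** Returns the normalized text, or the input unchanged if the book is unknown. */
    static String normalize(String bibleText) {
        Matcher m = REFERENCE.matcher(bibleText);
        if (!m.matches()) {
            return bibleText;
        }
        String book = m.group(1).replace(".", "").trim().toLowerCase();
        String preferred = BOOKS.get(book);
        return preferred == null ? bibleText : preferred + " " + m.group(2);
    }
}
```

Books with a leading number, such as 1 Kings, fall through untouched here and would need their own entries and a wider pattern.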
