Comments (14)
Actually, it might be helpful to clarify here (or via quick video/chat meeting this week?) what we mean by V1. I think we should shoot first for something that merely replaces the current Google Sheets workflow but nothing beyond that:
- Organizers should need to do nothing to set things up for analysts (e.g. run the scraper, upload CSVs to Google Docs)
- Analysts can go to some site to see the current diffs they need to look at for the current N-day period.
- Analysts can click a link to view diffs in Versionista
- Analysts can fill in a form to annotate a change
- ? (I’m sure I’m missing exactly how we’d want to handle the dictionary stuff, seeing what analysts had marked as significant/insignificant, etc.)
That’s simple and concrete. I think we should be able to achieve it by May 1st… or it’s a sign we probably aren’t coordinating well.
Once we can do that, then we can move on to all the more complex things we want (e.g. PageFreezer, Internet Archiver, fancier/smarter diffs, automated filtering services, etc etc etc). Is the above v0 or v1 or v-uh-sorry-we're-not-interested-in-that?
from web-monitoring.
Good luck with the boxes, @lightandluck.
OK! Per conversation from Saturday, we’re going to try and break this up into two items:
- v.0: A simple implementation that can functionally accomplish what’s done in Google Docs now. This is primarily for evaluation; if it works well enough, we may want to shift analysts directly to it, but that is not its main goal. This is pretty close to the set of requirements listed above:
  - Data (including raw HTML content) is automatically and continually scraped out of Versionista
  - Data can be queried by site (to maintain current process for how work is split up by analyst)
  - Analysts can view all versions/diffs over an N-day period (I think, at this point, it’s still OK to link them back to Versionista for diff viewing. That should be done away with by 1.0)
  - Analysts can fill in a form to annotate a change
  - Annotations can be queried by significance
  - This does NOT include:
    - Custom diffing separate from Versionista
    - Tagging
    - Dictionary (we’ll leave that part of the workflow in Google Docs, though it could now be a link to this system instead of a full line from a spreadsheet)
- v.1: A fully deployed implementation across all projects that can absolutely replace the current Google Docs workflow.
  - Showing our own diffs
  - Flagging changes for the dictionary
  - Probably a much nicer UI
  - Maybe also includes…
    - Tagging (of pages and maybe of changes [we might be able to do change tagging through annotations…?]; only some people have permissions to make new tags so we can keep a controlled vocabulary). See also #30.
    - Sources other than Versionista
Shooting for v0 by the end of April and v1 by the end of May.
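For discussion, here is a rough sketch of what the three v.0 query requirements (by site, over an N-day period, by significance) might look like. Everything in it is an assumption made for illustration: the `Version` and `Annotation` records, their field names, and both helper functions are hypothetical and are not taken from web-monitoring-db. It filters an in-memory list only to show the intended behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Annotation:
    # Hypothetical fields; the thread does not define an annotation schema.
    author: str
    significant: bool
    notes: str = ""


@dataclass
class Version:
    # Hypothetical record for one scraped version of a page.
    site: str
    page_url: str
    captured_at: datetime
    versionista_diff_url: str  # v.0 still links back to Versionista for diffs
    annotation: Optional[Annotation] = None


def versions_for_site(
    versions: List[Version], site: str, days: int, now: datetime
) -> List[Version]:
    """Return versions of `site` captured in the last `days` days, oldest first."""
    cutoff = now - timedelta(days=days)
    matches = [
        v for v in versions
        if v.site == site and cutoff <= v.captured_at <= now
    ]
    return sorted(matches, key=lambda v: v.captured_at)


def versions_by_significance(
    versions: List[Version], significant: bool
) -> List[Version]:
    """Return annotated versions whose annotation matches `significant`."""
    return [
        v for v in versions
        if v.annotation is not None and v.annotation.significant == significant
    ]
```

Unannotated versions are left out of `versions_by_significance` on purpose, so "not yet reviewed" is never confused with "insignificant".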
I’ve made a v0 and a v1 milestone for web-monitoring-db (but not properly sorted and tagged all issues yet). We should probably do the same for all the web-monitoring* projects.
More thoughts/feedback/amendments welcome if this doesn’t seem complete or doesn’t quite jibe with everyone.
/cc @lightandluck
Thanks for the write-up and keeping me in the loop! I'm moved in but now comes the unpacking and organizing phase. I'll be able to continue contributing by next week. I'll try to catch myself up in the meantime.
Version 0 updated task list, with effort estimates:
- Finish set of essential diffing tools (side by side HTML with changes highlighted, side by side source diff, side by side text with changes highlighted) -- 1 week at current pace
- Finish react front-end of diffing tools -- 2 weeks of work (edgi-govdata-archiving/web-monitoring-ui#82)
- Finish user login -- 3 days of work (edgi-govdata-archiving/web-monitoring-ui#84)
- Finish very basic tasking -- 2 days of work (edgi-govdata-archiving/web-monitoring-ui#79)
- Separate tasked and browse/search views (edgi-govdata-archiving/web-monitoring-ui#96)
- Integrate significant annotations with existing Google spreadsheets -- 1 week of work (edgi-govdata-archiving/web-monitoring-ui#94)
- Make annotations far more comprehensible -- 1 week of work (edgi-govdata-archiving/web-monitoring-ui#92)
These times are not hours of on-the-job effort but actual calendar time, given our likely availability for volunteer development work.
👍 I have added a corresponding milestone in the DB project and started to tag PRs and issues.
The purpose of the next web-monitoring dev call should be to nail down the scope and ensure that we all understand it consistently. We can either co-opt the main #dev call for this purpose or hold a continuing call after.
@trinberg @ambergman @dcwalk Would love any quick (or not) input you have here that we can think about before Saturday’s call.
Likely don't have time to add thoughts before the townhall today (sorry!), but I'll try to be available for ~30mins at 6pm for your call and chime in.
Thanks for writing this up, @Mr0grog. This matches my current understanding.
Yeah, good luck with the boxes :))
All the plans we variously enumerated here have gone out the window multiple times over. Should we close this issue or do we need to do a better job keeping it updated?
We have finer granularity than "v0" now. I think we should close this issue and set up more fine-grained milestones.
Closing in favor of #75.