onetsp / recipeparser
A PHP library for parsing structured recipe data from HTML files.
Home Page: https://onetsp.com/
License: MIT License
Haven't needed this yet. But it feels incomplete. Should look for an existing implementation before writing this.
User reported issue:
Recently, I noticed that for recipes that I clip from foodnetwork.com, I end up with a picture of the chef rather than a picture of the recipe. Also, in the past I have noticed that even though the bookmarklet would extract the picture for most recipes, it wouldn't extract it for recipes that have an embedded video. (Inspecting the element gives me a link to the recipe photo, and if I just replace the _med.jpg with _lg.jpg, I can get a better quality image.) For example, for the following recipe: http://www.foodnetwork.com/recipes/giada-de-laurentiis/pork-chops-stuffed-with-sun-dried-tomatoes-and-spinach-recipe/index.html, the larger image is at
http://img.foodnetwork.com/FOOD/2008/01/07/EI1103_Pork_Chops_lg.jpg
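The suffix swap described in this report could be sketched roughly as follows. The function name preferLargeImage() is hypothetical and not part of RecipeParser, and the "_med" input URL is only inferred from the report, which gives the "_lg" URL alone. It also assumes the larger file actually exists on the image server, which a real implementation would want to check.

```php
<?php
// Hypothetical helper, not part of RecipeParser: swap the "_med" size suffix
// for "_lg" on image URLs that follow the naming pattern described in the report.
function preferLargeImage(string $url): string
{
    // Only URLs ending in "_med.jpg" are rewritten; anything else is returned unchanged.
    return preg_replace('/_med\.jpg$/i', '_lg.jpg', $url);
}

// Assumed input: the "_med" variant of the image named in the report.
echo preferLargeImage('http://img.foodnetwork.com/FOOD/2008/01/07/EI1103_Pork_Chops_med.jpg'), "\n";
```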
Hello, big thanks for this library!
I've tried to add your functionality to a WordPress plugin, but got an error when the hosting has an old PHP version installed.
Our plugin works fine on PHP 5.3.3 but throws an error on PHP 5.2.17.
The issue can be found in class_recipe_parser.php on line 110.
Thanks again!
Hey Mike!
First off - thanks so much for open sourcing this awesome library! I'm a dev at relayfoods.com and we are attempting to use it for parsing recipes via a bookmarklet (or pasting the url) to convert a recipe into a shopping list from our product catalog. You can see the service in action here: http://recipe-parser.herokuapp.com/?url=http://allrecipes.com/Recipe/Juicy-Roasted-Chicken/Detail.aspx?evt19=1&referringHubId=662
To get the library running as a service, we took your library and forked it in an attempt to add composer support and PSR-4 formatting for easier consumption by Laravel. The one big downside is that keeping everything in sync is going to be prohibitively difficult.
I wanted to reach out and see if a) you have any thoughts on a better way to leverage your library as-is to be used as a microservice that converts recipe urls into JSON output - or - b) have any desire to add composer / PSR-4 support to your library?
Thanks again!
One other issue we saw today while debugging an issue with Food52.com was that scheme-less srcs on images, e.g. //images.food52.com/iCTCn3NaUPeL90-Pn1y2lth7ETk=/753x502/ee206bb7-f686-40cc-a3d2-2f2049662d58--DSC_1615.jpg
were getting converted to http://food52.com//images.food52.com/iCTCn3NaUPeL90-Pn1y2lth7ETk=/753x502/ee206bb7-f686-40cc-a3d2-2f2049662d58--DSC_1615.jpg
I made a quick patch to our parsing service that seems to do the trick, but I'm not 100% confident in it. Thought I'd share in case you want to include the patch:
public static function relativeToAbsolute($rel, $base) {
    // return if already absolute URL
    if (parse_url($rel, PHP_URL_HOST) != '') {
        // add a default scheme if not present
        // eg. //images.food52.com/iCTCn3NaUPeL90-Pn1y2lth7ETk=/753x502/ee206bb7-f686-40cc-a3d2-2f2049662d58--DSC_1615.jpg
        if (parse_url($rel, PHP_URL_SCHEME) == '') {
            $rel = "http:$rel";
        }
        return $rel;
    }
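A possible variation on the patch above, as a standalone sketch rather than a drop-in replacement: instead of always prepending "http:", reuse the scheme of the page the image was found on, so images on an https page stay https. The function name addMissingScheme() is hypothetical, and the base URL in the usage line is made up for illustration.

```php
<?php
// Hypothetical standalone helper (not the library's relativeToAbsolute()):
// give a protocol-relative URL such as "//host/path" the scheme of the page
// it was found on, and leave every other input untouched.
function addMissingScheme(string $url, string $base): string
{
    if (strncmp($url, '//', 2) !== 0) {
        return $url; // already has a scheme, or is a path-relative URL
    }
    $scheme = parse_url($base, PHP_URL_SCHEME);
    if (!is_string($scheme) || $scheme === '') {
        $scheme = 'http'; // assumption: fall back to http, as the patch above does
    }
    return $scheme . ':' . $url;
}

// Assumed base URL, for illustration only.
echo addMissingScheme('//images.food52.com/example.jpg', 'https://food52.com/recipes/example'), "\n";
```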
Something like:
RecipeParser_Recipe::getArray()
RecipeParser_Recipe::getJson()
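A minimal sketch of what such export methods might look like. RecipeSketch is a hypothetical stand-in for RecipeParser_Recipe, and its three properties are assumptions made for illustration, not the real class's fields.

```php
<?php
// Hypothetical stand-in for RecipeParser_Recipe; the property names below
// are assumptions for illustration only.
class RecipeSketch
{
    public $title = '';
    public $ingredients = array();
    public $instructions = array();

    // Return the object's properties as a plain associative array.
    public function getArray()
    {
        return get_object_vars($this);
    }

    // Serialize the same array as a JSON string.
    public function getJson()
    {
        return json_encode($this->getArray());
    }
}

$recipe = new RecipeSketch();
$recipe->title = 'Juicy Roasted Chicken';
echo $recipe->getJson(), "\n";
```

Building getJson() on top of getArray() keeps the two outputs in sync, so a field added to one shows up in the other.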
Hello,
First of all, I have to congratulate you for the library, it is really awesome!
I found an issue today with some recipes.
I try to parse this webpage: http://www.foodista.com/recipe/H5M86RVB/breakfast-casseroles
The preparation instructions are:
Grease a 9 x 13 inch casserole dish. Line dish with unbaked crescent rolls. Spread cooked meat evenly over rolls. Pour beaten eggs over meat. Place cheese over layer of eggs. Bake at 350 degrees for 45 minutes or until firm. Cool 10 minutes before cutting into squares and serve.
But the library returns in the 'Instructions' response:
[0] => dish.
[1] => dish
[2] => rolls.
[3] => Bake
[4] => Cool
[5] => cutting
[6] => serve
which corresponds to words from content that have links attached to them.
Thank you for your time checking on this.
Regards,
Szabi.
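The output above would be consistent with only the linked words being collected, although that is a guess and not a confirmed diagnosis. As a rough sketch of the alternative, reading textContent on the container element returns the surrounding text together with the link text. The markup, the id, and the link targets below are made up for illustration and are not taken from foodista.com.

```php
<?php
// Made-up markup that mimics instructions with links on individual words.
$html = '<div id="steps">Grease a 9 x 13 inch casserole <a href="/tool/x">dish</a>. '
      . '<a href="/technique/y">Bake</a> at 350 degrees for 45 minutes.</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
$node = $xpath->query('//div[@id="steps"]')->item(0);

// textContent concatenates every descendant text node in document order,
// so words inside <a> tags stay in their sentence.
$text = trim(preg_replace('/\s+/', ' ', $node->textContent));

echo $text, "\n";
```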
RecipeParser_Text::iso8601ToMinutes() is homegrown and works for our use, but there's probably a better implementation to use out there that will support additional time formats.
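One candidate for an existing implementation is PHP's built-in DateInterval class, which parses ISO 8601 durations. A minimal sketch, assuming recipe times never use years or months; durationToMinutes() is a hypothetical name and this is not the library's current code.

```php
<?php
// Hypothetical alternative to a hand-written parser, built on DateInterval.
function durationToMinutes(string $duration): ?int
{
    try {
        $interval = new DateInterval($duration);
    } catch (Exception $e) {
        return null; // not a duration string DateInterval understands
    }
    // Assumption: years and months are ignored, since their length in
    // minutes is ambiguous and recipe times should not need them.
    return $interval->d * 1440 + $interval->h * 60 + $interval->i
        + (int) round($interval->s / 60);
}

var_dump(durationToMinutes('PT1H30M'));
var_dump(durationToMinutes('not a duration'));
```

Returning null for unparseable input lets the caller tell "no time given" apart from a zero-minute duration.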
Hey Mike,
We're seeing a parse failure when a recipe url contains a space or %20 in the query params - eg. https://recipe-parser.relayfoods.com/?url=http://m.allrecipes.com/recipe/44868/spicy-garlic-lime-chicken/?internalSource=staff%20picks
You can see the same url working fine with the space removed: https://recipe-parser.relayfoods.com/?url=http://m.allrecipes.com/recipe/44868/spicy-garlic-lime-chicken/?internalSource=staff
Any thoughts on where in the parse stack that might be failing, or how to fix it?
Thanks!
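One possibility, offered as an assumption and not a diagnosis: PHP decodes %20 in a query parameter back into a literal space, and a URL containing a literal space may then fail when it is fetched. A sketch of two mitigations follows. encodeSpaces() is a hypothetical helper for the service side, and the second part shows a caller encoding the whole target URL before placing it in the query string.

```php
<?php
// Hypothetical service-side helper: percent-encode literal spaces before
// the recipe URL is fetched.
function encodeSpaces(string $url): string
{
    return str_replace(' ', '%20', trim($url));
}

// The URL as it would look after PHP has decoded the "url" query parameter.
$target = 'http://m.allrecipes.com/recipe/44868/spicy-garlic-lime-chicken/?internalSource=staff picks';
echo encodeSpaces($target), "\n";

// Caller side: encode the entire target URL so it survives as one parameter.
$serviceUrl = 'https://recipe-parser.relayfoods.com/?url=' . rawurlencode($target);
echo $serviceUrl, "\n";
```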
Hello,
The library crashes if the mbstring extension is not available on the server.
I recommend adding this check around each use of the mb_convert_encoding function:
if (function_exists('mb_convert_encoding'))
{...}
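Rather than repeating the guard at every call site, the check could live in one wrapper. A minimal sketch, assuming a fallback to iconv is acceptable when mbstring is missing; convertEncoding() is a hypothetical name and not part of the library.

```php
<?php
// Hypothetical wrapper: prefer mbstring, fall back to iconv, and return the
// input unchanged if neither extension is available.
function convertEncoding(string $text, string $to, string $from): string
{
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($text, $to, $from);
    }
    if (function_exists('iconv')) {
        $converted = iconv($from, $to . '//TRANSLIT', $text);
        if ($converted !== false) {
            return $converted;
        }
    }
    return $text; // no converter available; better than a fatal error
}

// "caf\xE9" is "café" encoded as ISO-8859-1.
echo convertEncoding("caf\xE9", 'UTF-8', 'ISO-8859-1'), "\n";
```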
Sorry for asking: is the project dead?
Hi Mike!
We've been noticing quite a few problems when trying to parse pages that contain HTML5 tags. Seems that libxml2, which PHP's DomDocument uses under the hood, only has support for up to HTML4. For example parsing fails on http://bsugarmama.com/3-layer-french-vanilla-pudding-cake-chocolate-fudge-frosting/
See the following bug reports:
https://www.drupal.org/node/1333730
https://bugs.php.net/bug.php?id=60021
We've been successfully experimenting with the Masterminds HTML5-PHP parser:
https://github.com/Masterminds/html5-php
http://engineeredweb.com/blog/2013/introducing-html5-parser-serializer-php/
Just wondering if you've run into this issue before and if you think it'd be worth switching the parser for OneTSP?
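For cases where the only symptom is libxml warnings about unknown tags, a lighter stopgap than swapping parsers is to collect libxml errors internally while loading. This is a sketch of that workaround only; it does not make libxml2 an HTML5 parser and would not help with pages that libxml2 actually mis-parses. The sample markup is made up.

```php
<?php
// Made-up page using HTML5 sectioning elements.
$html = '<!DOCTYPE html><html><body><article><section>'
      . '<h1>Pudding Cake</h1></section></article></body></html>';

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Errors remain available for logging; whether unknown-tag errors appear
// depends on the libxml2 version in use.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous); // restore the earlier setting

$heading = $doc->getElementsByTagName('h1')->item(0);
echo $heading->textContent, "\n";
```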
Doesn't return ingredient list, we'll work on a parser for this soon.