
recipeparser's People

Contributors

adamdonahue, chiplay, mikebrittain, oldgarebear, onetsp


recipeparser's Issues

FoodNetwork parsing issues with photos

User reported issue:

Recently, I noticed that recipes I clip from foodnetwork.com end up with a picture of the chef rather than a picture of the recipe. I have also noticed in the past that although the bookmarklet extracts the picture for most recipes, it does not extract it for recipes that have an embedded video. (Inspecting the element gives me a link to the recipe photo, and if I replace _med.jpg with _lg.jpg, I get a better-quality image.) For example, for the following recipe: http://www.foodnetwork.com/recipes/giada-de-laurentiis/pork-chops-stuffed-with-sun-dried-tomatoes-and-spinach-recipe/index.html, the larger image is at
http://img.foodnetwork.com/FOOD/2008/01/07/EI1103_Pork_Chops_lg.jpg
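
The `_med.jpg` to `_lg.jpg` substitution described above could be sketched as follows. This is only an illustration: `preferLargeImage()` is a hypothetical helper that is not part of recipeparser, the `_med.jpg` URL in the usage line is inferred from the `_lg.jpg` URL in the report, and nothing here checks that a large variant actually exists for a given recipe.

```php
<?php
// Hypothetical helper (not part of recipeparser): swap the "_med.jpg"
// suffix for "_lg.jpg". URLs without that suffix are returned unchanged.
function preferLargeImage(string $imageUrl): string
{
    return preg_replace('/_med\.jpg$/i', '_lg.jpg', $imageUrl);
}

// The "_med" URL below is inferred from the report, not quoted from it.
$large = preferLargeImage('http://img.foodnetwork.com/FOOD/2008/01/07/EI1103_Pork_Chops_med.jpg');
echo $large, PHP_EOL;
```

A caller would probably want to confirm the rewritten URL resolves before storing it, and fall back to the medium image otherwise.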

old php version issue

Hello, big thanks for this library!
I've tried to add your functionality to a WordPress plugin, but I get an error when the hosting has an old PHP version installed.
Our plugin works fine on PHP 5.3.3 but fails with an error on PHP 5.2.17.
Issue can be found in class_recipe_parser.php on line #110.

Thanks again!

Use with Composer / PSR-4

Hey Mike!

First off - thanks so much for open sourcing this awesome library! I'm a dev at relayfoods.com and we are attempting to use it for parsing recipes via a bookmarklet (or pasting the url) to convert a recipe into a shopping list from our product catalog. You can see the service in action here: http://recipe-parser.herokuapp.com/?url=http://allrecipes.com/Recipe/Juicy-Roasted-Chicken/Detail.aspx?evt19=1&referringHubId=662

To get the library running as a service, we took your library and forked it in an attempt to add composer support and PSR-4 formatting for easier consumption by Laravel. The one big downside is that keeping everything in sync is going to be prohibitively difficult.

I wanted to reach out and see if a) you have any thoughts on a better way to leverage your library as-is to be used as a microservice that converts recipe urls into JSON output - or - b) have any desire to add composer / PSR-4 support to your library?

Thanks again!

"Scheme"-less images failing relativeToAbsolute() conversion

One other issue we saw today while debugging a problem with Food52.com was that "scheme"-less srcs on images, e.g. //images.food52.com/iCTCn3NaUPeL90-Pn1y2lth7ETk=/753x502/ee206bb7-f686-40cc-a3d2-2f2049662d58--DSC_1615.jpg, were getting converted to http://food52.com//images.food52.com/iCTCn3NaUPeL90-Pn1y2lth7ETk=/753x502/ee206bb7-f686-40cc-a3d2-2f2049662d58--DSC_1615.jpg

I made a quick patch to our parsing service that seems to do the trick, but I'm not 100% confident in it. Thought I'd share it in case you want to include the patch:

    public static function relativeToAbsolute($rel, $base) {
        // return if already absolute URL
        if (parse_url($rel, PHP_URL_HOST) != '') {

            // add a default scheme if not present
            // eg. //images.food52.com/iCTCn3NaUPeL90-Pn1y2lth7ETk=/753x502/ee206bb7-f686-40cc-a3d2-2f2049662d58--DSC_1615.jpg
            if (parse_url($rel, PHP_URL_SCHEME) == '') {
                $rel = "http:$rel";
            }
            return $rel;
        }
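
As an alternative to hard-coding `http:`, a scheme-less URL could inherit the scheme of the page it was found on. The sketch below is a standalone illustration, not the library's code: `addMissingScheme()` is a hypothetical name, and falling back to `http` when the base has no scheme is an assumption.

```php
<?php
// Hypothetical helper (not part of recipeparser): give a protocol-relative
// URL ("//host/path") the same scheme as the page it appeared on.
function addMissingScheme(string $url, string $base): string
{
    // Anything that does not start with "//" is left for other code to handle.
    if (strncmp($url, '//', 2) !== 0) {
        return $url;
    }
    $scheme = parse_url($base, PHP_URL_SCHEME);
    if (!is_string($scheme) || $scheme === '') {
        $scheme = 'http'; // assumed default when the base has no scheme
    }
    return $scheme . ':' . $url;
}

echo addMissingScheme('//images.food52.com/photo.jpg', 'https://food52.com/recipes/1'), PHP_EOL;
```

Inheriting the scheme avoids requesting an image over plain HTTP from a site that is otherwise served over HTTPS.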

Resulting recipe ingredients incomplete - importing only words from <a> links

Hello,

First of all, I have to congratulate you for the library, it is really awesome!
I found an issue today with some recipes.

I try to parse this webpage: http://www.foodista.com/recipe/H5M86RVB/breakfast-casseroles

The preparation instructions are:

Grease a 9 x 13 inch casserole dish. Line dish with unbaked crescent rolls. Spread cooked meat evenly over rolls. Pour beaten eggs over meat. Place cheese over layer of eggs. Bake at 350 degrees for 45 minutes or until firm. Cool 10 minutes before cutting into squares and serve.

But the library returns in the 'Instructions' response:

                        [0] => dish.
                        [1] => dish
                        [2] => rolls.
                        [3] => Bake
                        [4] => Cool
                        [5] => cutting
                        [6] => serve

which corresponds to words from content that have links attached to them.

Thank you for your time checking on this.

Regards,
Szabi.
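
One possible explanation for this symptom, which has not been checked against the library's code, is that the extraction walks child elements (here the `<a>` tags) instead of reading the full text of the container. The sketch below shows how `textContent` on the container node returns the whole sentence, link words included. `extractStepText()`, the sample markup, and the XPath query are all made up for illustration.

```php
<?php
// Hypothetical helper: return the full text of each node matched by an
// XPath query, including text that sits inside nested <a> elements.
function extractStepText(string $html, string $query): array
{
    $previous = libxml_use_internal_errors(true); // tolerate messy markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $steps = [];
    foreach ($xpath->query($query) as $node) {
        // textContent concatenates the text of all descendants.
        $text = trim(preg_replace('/\s+/', ' ', $node->textContent));
        if ($text !== '') {
            $steps[] = $text;
        }
    }
    return $steps;
}

// Made-up markup that mimics linked words inside an instruction step.
$html = '<div class="step">Grease a 9 x 13 inch casserole <a href="#">dish.</a> '
      . 'Line <a href="#">dish</a> with unbaked crescent <a href="#">rolls.</a></div>';
print_r(extractStepText($html, '//div[@class="step"]'));
```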

Parse fails when URL contains a space or %20

Hey Mike,

We're seeing a parse failure when a recipe url contains a space or %20 in the query params - eg. https://recipe-parser.relayfoods.com/?url=http://m.allrecipes.com/recipe/44868/spicy-garlic-lime-chicken/?internalSource=staff%20picks

You can see the same url working fine with the space removed: https://recipe-parser.relayfoods.com/?url=http://m.allrecipes.com/recipe/44868/spicy-garlic-lime-chicken/?internalSource=staff

Any thoughts on where in the parse stack that might be failing, or on how to fix it?

Thanks!
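
One plausible cause, offered as an assumption rather than a diagnosis, is that the service decodes `%20` back into a literal space when it reads the `url` query parameter, and the raw space then breaks the outgoing request. A minimal sketch of re-encoding such characters before fetching might look like this. `encodeUnsafeUrlChars()` is a hypothetical helper, not part of recipeparser.

```php
<?php
// Hypothetical helper: percent-encode spaces, control characters, and
// non-ASCII bytes in a URL. Printable ASCII, including existing "%XX"
// escapes and delimiters such as "?", "&" and "/", is left untouched.
function encodeUnsafeUrlChars(string $url): string
{
    return preg_replace_callback(
        '/[^\x21-\x7E]/',
        function (array $match): string {
            return rawurlencode($match[0]);
        },
        $url
    );
}

$url = 'http://m.allrecipes.com/recipe/44868/spicy-garlic-lime-chicken/?internalSource=staff picks';
echo encodeUnsafeUrlChars($url), PHP_EOL;
```

Because `%` itself is printable ASCII, a URL that is already encoded passes through unchanged, so the helper can be applied unconditionally.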

Crash if mbstring not installed on server

Hello,

The library crashes if the mbstring extension is not available on the server.

I recommend wrapping each call to mb_convert_encoding in a check like this:

if (function_exists('mb_convert_encoding'))
{...}
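
The guard could also live in a single wrapper so that call sites stay unchanged. The sketch below is one way to do that, under stated assumptions: `toUtf8()` is a hypothetical name, the `iconv` fallback is an addition not mentioned in the report, and returning the input unconverted when neither extension is present is a design choice that may or may not suit the library.

```php
<?php
// Hypothetical wrapper (not part of recipeparser): convert text to UTF-8
// using whichever extension is available, without a fatal error if
// mbstring is missing.
function toUtf8(string $text, string $fromEncoding): string
{
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($text, 'UTF-8', $fromEncoding);
    }
    if (function_exists('iconv')) {
        $converted = iconv($fromEncoding, 'UTF-8//IGNORE', $text);
        if ($converted !== false) {
            return $converted;
        }
    }
    // Assumed last resort: hand back the original bytes unconverted.
    return $text;
}

echo toUtf8('Pork Chops', 'ISO-8859-1'), PHP_EOL;
```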

DomDocument / libxml2 failing to parse HTML5

Hi Mike!

We've been noticing quite a few problems when trying to parse pages that contain HTML5 tags. Seems that libxml2, which PHP's DomDocument uses under the hood, only has support for up to HTML4. For example parsing fails on http://bsugarmama.com/3-layer-french-vanilla-pudding-cake-chocolate-fudge-frosting/

See the following bug reports:
https://www.drupal.org/node/1333730
https://bugs.php.net/bug.php?id=60021

We've been successfully experimenting with the Masterminds HTML5-PHP parser:
https://github.com/Masterminds/html5-php
http://engineeredweb.com/blog/2013/introducing-html5-parser-serializer-php/

Just wondering if you've run into this issue before and if you think it would be worth switching the parser for OneTSP?

Emil
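
A parser switch like the one proposed could be introduced behind a small loader that uses html5-php when it is installed and falls back to `DOMDocument` otherwise. This is a sketch under assumptions: `loadHtmlDocument()` is a hypothetical name, the sample markup is made up, the `Masterminds\HTML5` class is only available if the package has been installed and autoloaded (for example through Composer), and the `disable_html_ns` option is used so that existing un-namespaced XPath queries keep working.

```php
<?php
// Hypothetical loader (not part of recipeparser): prefer the html5-php
// parser when present, otherwise use DOMDocument with libxml warnings
// about unknown HTML5 tags collected instead of emitted.
function loadHtmlDocument(string $html): DOMDocument
{
    if (class_exists('Masterminds\\HTML5')) {
        $parser = new Masterminds\HTML5(['disable_html_ns' => true]);
        return $parser->loadHTML($html);
    }

    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc;
}

// Made-up markup containing HTML5 elements.
$doc = loadHtmlDocument('<html><body><article><section><h1>Pudding Cake</h1></section></article></body></html>');
$title = $doc->getElementsByTagName('h1')->item(0)->textContent;
echo $title, PHP_EOL;
```

Both branches return a `DOMDocument`, so downstream `DOMXPath` code would not need to know which parser produced it.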
