
application/ld+json about metascraper (CLOSED)

microlinkhq commented on August 31, 2024
application/ld+json

from metascraper.

Comments (8)

tbell511 commented on August 31, 2024

That example has all the relevant info. All I was saying is that I think application/ld+json would be a great fallback to add to the modules. That being said, metascraper already does a really good job!

Kikobeats commented on August 31, 2024

Hey,

Using the microlink.io API endpoint:

$ curl -sL https://api.microlink.io/?url=https://www.theverge.com/2017/11/16/16667366/tesla-semi-truck-announced-price-release-date-electric-self-driving

This is the information extracted:

{
  "status": "success",
  "data": {
    "lang": "en",
    "author": "Zac Estrada",
    "title": "This is the Tesla Semi truck",
    "publisher": "The Verge",
    "image": {
      "width": 1200,
      "height": 628,
      "type": "jpg",
      "url": "https://cdn.vox-cdn.com/thumbor/iDt70XEkiWUnA-NhWSnRio8HoHg=/0x75:3840x2085/fit-in/1200x630/cdn.vox-cdn.com/uploads/chorus_asset/file/9699573/Semi_Front_Profile.jpg"
    },
    "description": "500 miles of range and more aerodynamic than a supercar",
    "date": "2017-11-17T04:47:07.000Z",
    "logo": {
      "width": 192,
      "height": 192,
      "type": "png",
      "url": "https://cdn.vox-cdn.com/uploads/chorus_asset/file/7395351/android-chrome-192x192.0.png"
    },
    "url": "https://www.theverge.com/2017/11/16/16667366/tesla-semi-truck-announced-price-release-date-electric-self-driving"
  }
}

Can you specify what information here is missing that you expect to be present?

plaa commented on August 31, 2024

Any reason this is closed? It seems to me that application/ld+json is exactly the information that metascraper should return, falling back to other scraping only if it is missing. The majority of news sites include it, and it contains exactly the information metascraper provides.

Kikobeats commented on August 31, 2024

@plaa Can you point out which ld+json fields could be mapped to metascraper rules that are currently missing?

plaa commented on August 31, 2024

Based on the messages I understood that metascraper does not read the <script type="application/ld+json"> contents. Is it used?

Kikobeats commented on August 31, 2024

But that doesn't answer my question. What information present in an ld+json script do you feel is missing right now with the currently implemented rules?

plaa commented on August 31, 2024

Metadata can be present in several places. Based on what I've googled (admittedly very little), ld+json is one of the best standards for this information, and so far every news site I've checked implements it. If metadata is present in ld+json but not elsewhere, will metascraper find it?
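
The fallback being asked about can be pictured as an ordered list of rules for one field, where the first rule that returns a value wins. The following is only a minimal sketch of that idea, not metascraper's actual implementation: `fromMetaTag`, `fromJsonLd`, and `firstMatch` are hypothetical names, the `article:published_time` meta tag is just one example of an "elsewhere" source, and the regular expressions assume simple, well-formed markup.

```javascript
// Hypothetical rule: read <meta property="article:published_time" content="...">.
// Assumes the property attribute appears before the content attribute.
const fromMetaTag = html => {
  const match = html.match(
    /<meta[^>]+property=["']article:published_time["'][^>]*content=["']([^"']+)["']/i
  )
  return match ? match[1] : undefined
}

// Hypothetical rule: read datePublished from the first ld+json block.
const fromJsonLd = html => {
  const match = html.match(
    /<script[^>]+type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/i
  )
  if (!match) return undefined
  try {
    return JSON.parse(match[1]).datePublished
  } catch (err) {
    return undefined // malformed JSON: let the next rule try
  }
}

// Try each rule in order and return the first non-empty value.
const firstMatch = (rules, html) => {
  for (const rule of rules) {
    const value = rule(html)
    if (value) return value
  }
  return undefined
}

// A page that carries the date only inside ld+json.
const html =
  '<html><head><script type="application/ld+json">' +
  '{"@type":"Article","datePublished":"2015-11-28 05:36:00.000000"}' +
  '</script></head></html>'

const date = firstMatch([fromMetaTag, fromJsonLd], html)
console.log(date)
```

With this ordering, the meta tag rule finds nothing on the sample page and the ld+json rule supplies the value. Whether ld+json should come first or last in such a list is a design choice that depends on which source a given site keeps most accurate.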

plaa commented on August 31, 2024

An example of a mainstream news site that contains ld+json information but for which metascraper is unable to find the correct publish date:
https://www.engadget.com/2015/11/28/raspberry-pi-eric-schmidt/

  <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Article",
      "url": "https://www.engadget.com/2015/11/28/raspberry-pi-eric-schmidt/",
      "author": [
        {
          "@type": "Person",
          "url": "https://www.engadget.com/about/editors/dana-wollman",
          "name": "Dana Wollman"
        }
      ],
      "headline": "You partly have Eric Schmidt to thank for the new $5 Raspberry Pi",
      "datePublished": "2015-11-28 05:36:00.000000",
      "mainEntityOfPage": "True",
      "thumbnailUrl": "https://s.aolcdn.com/hss/storage/midas/5c7b7548394b4467c3c0b2e8fca69e5a/203049438/raspberrypi.jpg",
      "image": [
        {
          "@type": "ImageObject",
          "url": "https://o.aolcdn.com/images/dims?thumbnail=640%2C480&amp;quality=80&amp;image_uri=https%3A%2F%2Fs.aolcdn.com%2Fhss%2Fstorage%2Fmidas%2F5c7b7548394b4467c3c0b2e8fca69e5a%2F203049438%2Fraspberrypi.jpg&amp;client=amp-blogside-v2&amp;signature=a5ce244ef09c9bad8e1bf1cdb6d129480b27b26e",
          "width": "640px",
          "height": "480px"
        }
      ],
      "articleBody": "While many of you were supposed to be eating turkey on Thursday, you were instead geeking out over Raspberry PI's newest computer, the Zero: a pint-sized module that costs just $5. But according to a new interview, that $5 computer was originally supposed to cost around $60 -- and you have partly have Google's Eric Schmidt to thank for that reduced price. In an interview with the Wall Street Journal, Raspberry Pi Foundation founder Eben Upton admitted that the follow-up to the original $35 Pi was originally going to be a more powerful model, whose higher-performing internals would have put the price somewhere between $50 and $60. But then in 2013 Upton had the chance to meet Google chairman Eric Schmidt, whose company had recently awarded a $1 million grant to Raspberry Pi. Schmidt wanted to know what the foundation was up to next. Upton told him. Schmidt was apparently not impressed. \"He said it was very hard to compete with cheap,\" Upton told the Journal. \"He made a very compelling case. It was a life-changing conversation.\"Indeed. Following that heart-to-heart with Schmidt, Upton says he abandoned his plans for the more expensive Pi, which led him instead on the path to the $5 system-on-a-chip we have today. To be sure, it won't be as powerful as the one Upton originally dreamed up, but for many users it will still be enough: Even with a low-end Broadcom BCM2835 processor and just 512MB of RAM, it still promises to be 40 percent faster than the original Pi.[Image credit: Raspberry Pi]",
      "articleSection": "Uncategorized",
      "keywords": [],
      "publisher": [
        {
          "@type": "Organization",
          "name": "Engadget",
          "url": "https://www.engadget.com",
          "logo": [
            {
              "@type": "ImageObject",
              "url": "https://www.engadget.com/assets/images/eng-e-128.png",
              "width": "128px",
              "height": "128px"
            }
          ]
        }
      ],
      "dateModified": "2016-07-14 22:44:05.000000"
    }
  </script>
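
To illustrate how a block like the one above could be mapped onto the fields the microlink.io response uses (`title`, `author`, `publisher`, `date`), here is a rough sketch in plain Node.js with no dependencies. It is not metascraper code: `extractJsonLd`, `toIsoDate`, and `mapArticle` are hypothetical helpers, the regular expression is a stand-in for a real HTML parser, and treating a timestamp without a time zone as UTC is an assumption that may not hold for every site.

```javascript
// Collect every JSON-LD object found in <script type="application/ld+json"> tags.
const extractJsonLd = html => {
  const pattern = /<script[^>]+type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi
  const items = []
  for (const [, body] of html.matchAll(pattern)) {
    try {
      const parsed = JSON.parse(body)
      items.push(...(Array.isArray(parsed) ? parsed : [parsed]))
    } catch (err) {
      // Skip blocks that are not valid JSON.
    }
  }
  return items
}

// Values may be a single object or an array; take the first entry.
const first = value => (Array.isArray(value) ? value[0] : value)

// Assumption: zone-less timestamps such as "2015-11-28 05:36:00.000000" are UTC.
const toIsoDate = value => {
  if (typeof value !== 'string') return undefined
  const match = value.match(/^(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})(?:\.(\d{1,3})\d*)?$/)
  const normalized = match
    ? `${match[1]}T${match[2]}.${(match[3] || '0').padEnd(3, '0')}Z`
    : value
  const date = new Date(normalized)
  return Number.isNaN(date.getTime()) ? undefined : date.toISOString()
}

// Map the first Article object onto the field names used in the API response.
const mapArticle = html => {
  const article = extractJsonLd(html).find(item => item['@type'] === 'Article')
  if (!article) return {}
  return {
    title: article.headline,
    author: (first(article.author) || {}).name,
    publisher: (first(article.publisher) || {}).name,
    date: toIsoDate(article.datePublished)
  }
}

// Trimmed-down version of the Engadget block quoted above.
const sample = `<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "author": [{ "@type": "Person", "name": "Dana Wollman" }],
  "headline": "You partly have Eric Schmidt to thank for the new $5 Raspberry Pi",
  "datePublished": "2015-11-28 05:36:00.000000",
  "publisher": [{ "@type": "Organization", "name": "Engadget" }]
}
</script>`

const result = mapArticle(sample)
console.log(result)
```

The date needs the extra normalization step because the `datePublished` value on that page uses a space separator and microseconds rather than a strict ISO 8601 timestamp, which may be one reason a generic date rule fails to pick it up.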
