bnet_scraper's Introduction

BnetScraper

BnetScraper is a Nokogiri-based scraper of Battle.net profile information. Currently it covers only Starcraft 2 profiles.

Installation

Run gem install bnet_scraper or add gem 'bnet_scraper' to your Gemfile.

Usage

Say you have the URL of a Battle.net account you would like to scrape. To begin, create an instance of BnetScraper::Starcraft2::ProfileScraper, passing it the URL. Calling the #scrape method returns a new BnetScraper::Starcraft2::Profile object with the basic information.

scraper = BnetScraper::Starcraft2::ProfileScraper.new(url: 'http://us.battle.net/sc2/en/profile/2377239/1/Demon/')
profile = scraper.scrape

profile.class.name # => 'BnetScraper::Starcraft2::Profile'
profile.achievement_points # => 3760
profile.account # => 'Demon'

Once you have a BnetScraper::Starcraft2::Profile object, you can reach further data through convenience methods that scrape the relevant pages on demand. This includes leagues, achievements, and match history.

scraper = BnetScraper::Starcraft2::ProfileScraper.new(url: 'http://us.battle.net/sc2/en/profile/2377239/1/Demon/')
profile = scraper.scrape
profile.recent_achievements # Scrapes achievement information, returns array of achievements
profile.match_history # Scrapes recent match history, returns array of matches
profile.match_history[0].class.name # => 'BnetScraper::Starcraft2::Match'

profile.leagues[0].class.name # => 'BnetScraper::Starcraft2::League'
profile.leagues[0].division # Scrapes the 1st league's information page for rank, points, etc

Full Scrape

Interested in grabbing everything about a profile eagerly? You're in luck, because there's a method just for you. Call BnetScraper::Starcraft2.full_profile_scrape with the same options hash that ProfileScraper takes, and it will eager-load the achievements, matches, and leagues.

profile = BnetScraper::Starcraft2.full_profile_scrape(url: 'http://us.battle.net/sc2/en/profile/2377239/1/Demon/')
profile.class.name # => 'BnetScraper::Starcraft2::Profile'
profile.leagues.first.name # => 'Changeling Bravo'

Alternatively, these scrapers can be accessed in isolation.

Available Scrapers

There are several scrapers that pull various information. They are:

  • BnetScraper::Starcraft2::ProfileScraper - collects basic profile information and an array of league URLs
  • BnetScraper::Starcraft2::LeagueScraper - collects data on a particular league for a particular Battle.net account
  • BnetScraper::Starcraft2::AchievementScraper - collects achievement data for the account.
  • BnetScraper::Starcraft2::MatchHistoryScraper - collects the 25 most recent matches played on the account

All of the scrapers take an options hash. The hash can carry either the profile URL (under the url key) or the individual account details (bnet_id, account, and region). Thus, either of these two approaches works:

scraper1 = BnetScraper::Starcraft2::ProfileScraper.new(url: 'http://us.battle.net/sc2/en/profile/2377239/1/Demon/')
scraper2 = BnetScraper::Starcraft2::ProfileScraper.new(bnet_id: '2377239', account: 'Demon', region: 'na')

All scrapers have a #scrape method that triggers the scraping and storage. The #scrape method will return an object containing the scraped data result.
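
The two construction styles carry the same information, since the account details are embedded in the profile URL. As a rough illustration, the sketch below pulls those pieces out of a URL with a regular expression. parse_profile_url and PROFILE_URL are hypothetical names, not part of bnet_scraper, and the gem's own parsing may differ. Note that the README pairs the us subdomain with region 'na'; that mapping is not attempted here.

```ruby
# Hypothetical helper, not part of bnet_scraper: splits a profile URL into
# the pieces that the options-hash form of the constructors expects.
PROFILE_URL = %r{\Ahttps?://(?<subdomain>[a-z]+)\.battle\.net/sc2/(?<locale>[a-z]+)/profile/(?<bnet_id>\d+)/(?<bnet_index>\d+)/(?<account>[^/]+)/?\z}

def parse_profile_url(url)
  match = PROFILE_URL.match(url)
  raise ArgumentError, "unrecognized profile URL: #{url}" unless match

  {
    subdomain: match[:subdomain],
    bnet_id: match[:bnet_id],
    account: match[:account]
  }
end

parts = parse_profile_url('http://us.battle.net/sc2/en/profile/2377239/1/Demon/')
parts[:bnet_id] # => "2377239"
parts[:account] # => "Demon"
```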

BnetScraper::Starcraft2::ProfileScraper

This pulls basic profile information for an account, as well as an array of league URLs. This is a good starting point for league scraping as it provides the league URLs necessary to do supplemental scraping.

scraper = BnetScraper::Starcraft2::ProfileScraper.new(url: 'http://us.battle.net/sc2/en/profile/2377239/1/Demon/')
profile = scraper.scrape
profile.class.name # => 'BnetScraper::Starcraft2::Profile'

Additionally, the resulting BnetScraper::Starcraft2::Profile object has methods that scrape additional information without creating another scraper. For example, if you need league information for a player, call BnetScraper::Starcraft2::Profile#leagues and it will scrape the data and store it for memoized access.

scraper = BnetScraper::Starcraft2::ProfileScraper.new(url: 'http://us.battle.net/sc2/en/profile/2377239/1/Demon/')
profile = scraper.scrape
profile.leagues.map(&:division) #=> ['Bronze']

BnetScraper::Starcraft2::LeagueScraper

This pulls information on a specific league for a specific account. It is best used either in conjunction with a profile scrape that provides a league URL, or when you happen to know the specific league_id and can pass it as an option.

scraper = BnetScraper::Starcraft2::LeagueScraper.new(league_id: '12345', account: 'Demon', bnet_id: '2377239')
scraper.scrape

# => #<BnetScraper::Starcraft2::League:0x007f89eab7a680
@account="Demon",
@bnet_id="2377239",
@division="Bronze",
@name="Changeling Bravo",
@random=false,
@season="2013 Season 4",
@size="3v3">

BnetScraper::Starcraft2::AchievementScraper

This pulls achievement information for an account. Note that it currently returns only the overall achievements, not the in-depth, by-category achievement information.

scraper = BnetScraper::Starcraft2::AchievementScraper.new(
  url: 'http://us.battle.net/sc2/en/profile/2377239/1/Demon/'
)
achievement_information = scraper.scrape
achievement_information[:recent].size # => 6
achievement_information[:recent].first
# => #<BnetScraper::Starcraft2::Achievement:0x007fef52b0b488
@description="Win 50 Team Unranked or Ranked games as Zerg.",
@earned=#<Date: 2013-04-04 ((2456387j,0s,0n),+0s,2299161j)>,
@title="50 Wins: Team Zerg">

achievement_information[:progress]
# => {:liberty_campaign=>1580,
:swarm_campaign=>1120,
:matchmaking=>1410,
:custom_game=>120,
:arcade=>220,
:exploration=>530}

achievement_information[:showcase].size # => 5
achievement_information[:showcase].first
# => #<BnetScraper::Starcraft2::Achievement:0x007fef52abcb08
@description="Finish a Qualification Round with an undefeated record.",
@title="Hot Shot">

BnetScraper::Starcraft2::MatchHistoryScraper

This pulls the 25 most recent matches played for an account. Note that the data is only as current as Battle.net itself, so it will likely lag behind what the game client shows.

scraper = BnetScraper::Starcraft2::MatchHistoryScraper.new(
  url: 'http://us.battle.net/sc2/en/profile/2377239/1/Demon/'
)
matches = scraper.scrape
matches.size # => 25
wins = matches.count { |m| m.outcome == :win } # => 15
losses = matches.count { |m| m.outcome == :loss } # => 10

matches.first
# =>  #<BnetScraper::Starcraft2::Match:0x007fef55113428
@date="5/24/2013",
@map_name="Queen's Nest",
@outcome=:win,
@type="3v3">

BnetScraper::Starcraft2::Status

Scraping only works while the site is up. Use this to check whether a failed scrape was caused by the site being down:

BnetScraper::Starcraft2::Status.na # => 'Online'
BnetScraper::Starcraft2::Status.fea # => 'Offline'
BnetScraper::Starcraft2::Status.cn #  => nil (China is unsupported)
BnetScraper::Starcraft2::Status.fetch # => [
  {:region=>"North America", :status=>"Online"},
  {:region=>"Europe", :status=>"Online"},
  {:region=>"Korea", :status=>"Online"},
  {:region=>"South-East Asia", :status=>"Online"}
]
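
The array returned by fetch can be awkward to query directly. As a small sketch, the snippet below indexes it by region name and lists any region that is not online. The data is copied from the sample output above rather than fetched live, and the variable names are illustrative only.

```ruby
# Sample data copied from the Status.fetch output shown above; a real call
# would need network access to Battle.net.
statuses = [
  { region: 'North America', status: 'Online' },
  { region: 'Europe', status: 'Online' },
  { region: 'Korea', status: 'Online' },
  { region: 'South-East Asia', status: 'Online' }
]

# Build a region => status lookup table.
status_by_region = statuses.map { |entry| [entry[:region], entry[:status]] }.to_h

# Collect the names of regions that are not reported as online.
offline_regions = statuses.reject { |entry| entry[:status] == 'Online' }
                          .map { |entry| entry[:region] }

status_by_region['Europe'] # => "Online"
offline_regions            # => []
```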

BnetScraper::Starcraft2::GrandmasterScraper

This pulls the list of 200 Grandmasters for a given region. Each player is returned as a hash.

scraper = BnetScraper::Starcraft2::GrandmasterScraper.new(region: :na)
players = scraper.scrape
players.size # => 200
players[0].keys # => [:rank, :name, :race, :points, :wins, :losses]

Contribute!

I would love to see contributions! Please send a pull request with a feature branch containing specs (Chances are excellent I will break it if you do not) and I will take a look. Please do not change the version as I tend to bundle multiple fixes together before releasing a new version anyway.

Author

Written by Andrew Nordman, see LICENSE for details.

bnet_scraper's People

Contributors

cadwallion, czarneckid, fx, keikun17, logankoester, michaelklishin

bnet_scraper's Issues

Date formats are not handled.

Hello there,
this looks like a nice tool, and I am trying to use it.

It seems that date formats are not handled well. I get errors (see details below) for achievement dates.

The 'us' region pages use the format mm/dd/yyyy
(works well here: http://us.battle.net/sc2/en/profile/2377239/1/Demon/achievements/)

Several other regions/locales use dd/mm/yyyy
(fails here, even with the locale set to en:
http://eu.battle.net/sc2/en/profile/1229243/1/Stephano/achievements/)

It's more of a Battle.net website issue (the 'en' locale is set but not honored), but as it stands achievements cannot be scraped properly.

Thanks a lot!
Cheers

C:/Ruby200/lib/ruby/gems/2.0.0/gems/bnet_scraper-0.6.0/lib/bnet_scraper/starcraft2/achievement.rb:20:in `new': invalid date (ArgumentError)
from C:/Ruby200/lib/ruby/gems/2.0.0/gems/bnet_scraper-0.6.0/lib/bnet_scraper/starcraft2/achievement.rb:20:in `convert_date'
from C:/Ruby200/lib/ruby/gems/2.0.0/gems/bnet_scraper-0.6.0/lib/bnet_scraper/starcraft2/achievement.rb:15:in `earned='
from C:/Ruby200/lib/ruby/gems/2.0.0/gems/bnet_scraper-0.6.0/lib/bnet_scraper/starcraft2/achievement.rb:10:in `block in initialize'
from C:/Ruby200/lib/ruby/gems/2.0.0/gems/bnet_scraper-0.6.0/lib/bnet_scraper/starcraft2/achievement.rb:9:in `each_key'
from C:/Ruby200/lib/ruby/gems/2.0.0/gems/bnet_scraper-0.6.0/lib/bnet_scraper/starcraft2/achievement.rb:9:in `initialize'
from C:/Ruby200/lib/ruby/gems/2.0.0/gems/bnet_scraper-0.6.0/lib/bnet_scraper/starcraft2/achievement_scraper.rb:78:in `new'
from C:/Ruby200/lib/ruby/gems/2.0.0/gems/bnet_scraper-0.6.0/lib/bnet_scraper/starcraft2/achievement_scraper.rb:78:in `extract_recent_achievement'
from C:/Ruby200/lib/ruby/gems/2.0.0/gems/bnet_scraper-0.6.0/lib/bnet_scraper/starcraft2/achievement_scraper.rb:65:in `block in scrape_recent'
from C:/Ruby200/lib/ruby/gems/2.0.0/gems/bnet_scraper-0.6.0/lib/bnet_scraper/starcraft2/achievement_scraper.rb:64:in `times'
from C:/Ruby200/lib/ruby/gems/2.0.0/gems/bnet_scraper-0.6.0/lib/bnet_scraper/starcraft2/achievement_scraper.rb:64:in `scrape_recent'
from C:/Ruby200/lib/ruby/gems/2.0.0/gems/bnet_scraper-0.6.0/lib/bnet_scraper/starcraft2/achievement_scraper.rb:41:in `scrape'
from C:/Ruby200/lib/ruby/gems/2.0.0/gems/bnet_scraper-0.6.0/lib/bnet_scraper/starcraft2/profile.rb:18:in `achievements'
from C:/Ruby200/lib/ruby/gems/2.0.0/gems/bnet_scraper-0.6.0/lib/bnet_scraper/starcraft2/profile.rb:22:in `recent_achievements'
from C:/Users/xxx/Desktop/a.rb:17:in `<main>'
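
One possible way to handle the two formats reported above is to pick the date format from the site subdomain instead of assuming mm/dd/yyyy everywhere. The sketch below uses Date.strptime from Ruby's standard library. DATE_FORMATS and parse_earned_date are hypothetical names, the mapping covers only the two cases mentioned in this issue, and none of it is the gem's actual convert_date code.

```ruby
require 'date'

# Assumption: illustrative mapping only. The issue reports mm/dd/yyyy on
# us.battle.net and dd/mm/yyyy on eu.battle.net; other regions are unknown.
DATE_FORMATS = {
  'us' => '%m/%d/%Y',
  'eu' => '%d/%m/%Y'
}.freeze

# Parses an achievement date string using the format for the given subdomain.
def parse_earned_date(text, subdomain)
  format = DATE_FORMATS.fetch(subdomain) do
    raise ArgumentError, "no date format known for #{subdomain}"
  end
  Date.strptime(text, format)
end

parse_earned_date('4/4/2013', 'us').to_s   # => "2013-04-04"
parse_earned_date('24/05/2013', 'eu').to_s # => "2013-05-24"

# Parsing the European string with the 'us' format raises ArgumentError,
# which matches the "invalid date" failure in the trace above.
```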

Refactor Inconsistent API

The API for Profile is very erratic. The scraper holds the data, but makes no attempt to structure it beyond a single basic hash that lumps everything together. The API needs to be refactored around objects such as Profile, League, etc.

Thoughts on improved API:

ProfileScraper returns a Profile object. Profile has:

  • swarm_levels
  • achievements
  • leagues
  • match_history

Calls to these, unless already retrieved, will scrape the relevant page(s). In addition, give ProfileScraper an option to eager-scrape these subsections. This removes the need to have BnetScraper::Starcraft2.full_scrape and provides a better use-case for the library.
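
The lazy-with-optional-eager behaviour proposed here can be sketched roughly as follows. LazyProfile and FakeScraper are stand-in classes written for this illustration, not the gem's code. FakeScraper counts its calls so that the memoization can be observed without network access.

```ruby
# Stand-in scraper: returns a canned result and counts how often it is used.
class FakeScraper
  attr_reader :calls

  def initialize(result)
    @result = result
    @calls = 0
  end

  def scrape
    @calls += 1
    @result
  end
end

# Stand-in profile: each subsection is scraped on first access and cached.
class LazyProfile
  def initialize(league_scraper:, match_scraper:, eager: false)
    @league_scraper = league_scraper
    @match_scraper = match_scraper
    load_all if eager
  end

  def leagues
    @leagues ||= @league_scraper.scrape
  end

  def match_history
    @match_history ||= @match_scraper.scrape
  end

  private

  # Touch every subsection once so later reads hit the cache.
  def load_all
    leagues
    match_history
  end
end

league_scraper = FakeScraper.new(['Changeling Bravo'])
match_scraper = FakeScraper.new([:win, :loss])
profile = LazyProfile.new(league_scraper: league_scraper, match_scraper: match_scraper)

league_scraper.calls # => 0
profile.leagues
profile.leagues
league_scraper.calls # => 1
```

One caveat with ||= is that a nil or false result is never cached, so a subsection that legitimately scrapes to nil would be fetched again on every call.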

Thread Safety Issues?

I'm getting two sets of values returned for this profile URL (http://us.battle.net/sc2/en/profile/273698/1/Omni/) for career games and games this season.

Set 1 (correct): {career_games: '205', games_this_season: '41'}
Set 2 (incorrect): {career_games: '226', games_this_season: '47'}

This is interesting to me because it's always from these two sets; there's no further variation. I haven't seen this happen in a single-threaded model, and it suddenly appeared once I went multi-thread, so I think that might be the likely culprit. Maybe I'm wrong, though.

Code: http://pastebin.com/Zjx2iVFi
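
The cause is not established in this thread. If shared scraper state is suspected, one way to rule it out is to give every thread its own scraper instance and collect the results through Thread#value. The sketch below shows that pattern with a stand-in class; StubProfileScraper is hypothetical and performs no network requests.

```ruby
# Stand-in scraper that keeps all state in instance variables.
class StubProfileScraper
  def initialize(url:)
    @url = url
  end

  # Returns a small hash derived from the URL instead of fetching a page.
  def scrape
    { url: @url, account: @url.split('/').last }
  end
end

urls = [
  'http://us.battle.net/sc2/en/profile/273698/1/Omni/',
  'http://us.battle.net/sc2/en/profile/2377239/1/Demon/'
]

# Each thread builds and uses its own scraper, so no scraper object is shared.
threads = urls.map do |url|
  Thread.new { StubProfileScraper.new(url: url).scrape }
end

# Thread#value waits for the thread to finish and returns the block's result.
results = threads.map(&:value)
results.map { |result| result[:account] } # => ["Omni", "Demon"]
```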

Add Grandmaster scraping

The grandmaster page for each realm is accessible and chock full of information for scraping. The API I'm imagining looks something like an array of Grandmasters that contain the points, wins, rank, and a Profile object with the basic information filled in, but not fully scraped.
