
Comments (14)

kjvarga commented on August 15, 2024

Ok so your user links would be like:

karl.xyz.com
karl.xyz.com/profile
karl.xyz.com/posts

or whatever. When generating your sitemap you can actually set the host on a per-link basis, so you could do something like:

SitemapGenerator.create do
  User.find_each do |user|
    add '/', :host => "#{user.username}.xyz.com"
    add user_profile_path(user), :host => "#{user.username}.xyz.com"
    add user_posts_path(user), :host => "#{user.username}.xyz.com"
  end
end

It would be nice to have a :url option so you could just do:
add :url => user_posts_url(user)
Or I could just detect that the URL already has a host and not use the default_host in that case. Then it would just be:
add user_posts_url(user)

Is that what you were looking for?
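
The host-detection idea mentioned above (skip default_host when the link already has a host) could look roughly like this. It is a minimal sketch in plain Ruby, not the gem's code; resolve_link is a hypothetical helper name, and the hosts are taken from the example links in this comment.

```ruby
require 'uri'

# Hypothetical helper (not part of sitemap_generator): return the link
# unchanged when it already carries a host, otherwise resolve it against
# the default host.
def resolve_link(link, default_host)
  uri = URI.parse(link)
  return link if uri.host            # already absolute, e.g. built by a *_url helper
  URI.join(default_host, link).to_s  # relative path: fall back to the default host
end

puts resolve_link('/posts', 'http://karl.xyz.com')
puts resolve_link('http://karl.xyz.com/posts', 'http://www.xyz.com')
```

With a check like this, a path helper and a URL helper could both be passed to add without an extra :url option.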

from sitemap_generator.

hurl commented on August 15, 2024

No, the links would be for subdomains. For example:

subdomain.example.com
company-1.example.com
company-2.example.com

and so on, where "example.com" is the root domain.

So, the host would remain the same. What would be needed is the ability to add the subdomain. Such as:

Subdomain.find_each do |sd|
  SitemapGenerator.create "#{sd}.{host}"
end

I think you would have to create a sitemap for each subdomain, as per Google's guidelines.


kjvarga commented on August 15, 2024

Can you take a look at this thread and see if it helps? #24

From what I understand you just need to create a sitemap for each domain, so you can set your default_host to each full domain e.g. http://subdomain.example.com and add links to it like you would any other sitemap.


kjvarga commented on August 15, 2024

See if something like this works...

%w[en fr ru].each do |domain|
  SitemapGenerator::Sitemap.sitemaps_path = "#{domain}/"
  SitemapGenerator::Sitemap.default_host = "http://www.#{domain}.example.com"
  SitemapGenerator::Sitemap.create do
    add '/whatever'
  end
end


hurl commented on August 15, 2024

This works, partially. The individual site maps are generated correctly, but the index only lists one of the site maps. BTW, this seems to be the problem that user chamnap had, and it does not seem to have been resolved.

Here is my code:

SitemapGenerator::Sitemap.sitemaps_host = "https://s3.amazonaws.com/mysite/"
SitemapGenerator::Sitemap.public_path = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new
SitemapGenerator::Sitemap.include_index = false

Listing.find_each do |listing|
  SitemapGenerator::Sitemap.default_host = "https://#{listing.subdomain}.mysite.com"
  SitemapGenerator::Sitemap.create do
    add ''
  end
end


kjvarga commented on August 15, 2024

Ok, try this with v2.1.1. There were a couple of issues. First, the index file was being overwritten, so we have to generate the sitemaps into separate folders or use different names. I also fixed some issues with multiple calls to create() in a single sitemap config.

SitemapGenerator::Sitemap.sitemaps_host = "https://s3.amazonaws.com/mysite/"
SitemapGenerator::Sitemap.public_path = 'tmp/'
# SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new # just generate into tmp/
# SitemapGenerator::Sitemap.include_index = false # turned off for you in v2.1.1

%w(google yahoo apple).each do |subdomain|
  SitemapGenerator::Sitemap.default_host = "https://#{subdomain}.mysite.com"
  SitemapGenerator::Sitemap.sitemaps_path = "sitemaps/#{subdomain}"
  SitemapGenerator::Sitemap.create do
    add '/home'
  end
end

Now works as expected and produces:

+ sitemaps/google/sitemap1.xml.gz             2 links /  822 Bytes /  328 Bytes gzipped
+ sitemaps/google/sitemap_index.xml.gz          1 sitemaps /  389 Bytes /  217 Bytes gzipped
Sitemap stats: 2 links / 1 sitemaps / 0m00s
+ sitemaps/yahoo/sitemap1.xml.gz             2 links /  820 Bytes /  330 Bytes gzipped
+ sitemaps/yahoo/sitemap_index.xml.gz          1 sitemaps /  388 Bytes /  217 Bytes gzipped
Sitemap stats: 2 links / 1 sitemaps / 0m00s
+ sitemaps/apple/sitemap1.xml.gz             2 links /  820 Bytes /  330 Bytes gzipped
+ sitemaps/apple/sitemap_index.xml.gz          1 sitemaps /  388 Bytes /  214 Bytes gzipped
Sitemap stats: 2 links / 1 sitemaps / 0m00s

Check out the namer options if you would rather generate all files in the root of the directory.
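
To illustrate why separate folders stop the index from being overwritten, here is a small plain-Ruby sketch. It does not call the gem; index_path is a hypothetical helper that only mirrors the file names shown in the output above.

```ruby
# Hypothetical helper: where the index file lands for a given sitemaps_path.
# The file name sitemap_index.xml.gz is taken from the output above.
def index_path(sitemaps_path)
  File.join(sitemaps_path, 'sitemap_index.xml.gz')
end

subdomains = %w[google yahoo apple]

# Same sitemaps_path on every create() call: each run targets the same file.
shared = subdomains.map { |_name| index_path('sitemaps/') }

# One folder per subdomain: each run gets its own index file.
separate = subdomains.map { |name| index_path("sitemaps/#{name}") }

puts "shared paths:   #{shared.uniq.size} distinct"
puts "separate paths: #{separate.uniq.size} distinct"
```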


hurl commented on August 15, 2024

Hi Karl, sorry for the delay getting back to you on this.

This solution works, and is better. I haven't tried "namer," but it sounds like that will allow me to have all of the sitemaps in a single directory.

The main thing I would like to see is a single index file that points to all of the sitemap files.

Ideally, it would look like this:

aws bucket
      |
      stuff (currently I have many image files here)
      sitemaps directory
           |
           index (single file containing all of the sitemap addresses)
           site maps (any number. I need thousands now, with the ability to scale much larger)
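
For reference, a single index file of the kind described above is just a list of sitemap locations. The following is a rough plain-Ruby sketch of building one by hand, independent of the gem; sitemap_index_xml is a hypothetical helper and the sitemap file names are placeholders.

```ruby
# Hypothetical helper: build a sitemap index document from a list of
# sitemap URLs. Plain Ruby only; sitemap_generator is not used here.
def sitemap_index_xml(sitemap_urls)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  sitemap_urls.each do |url|
    # encode(xml: :text) escapes &, < and > so the URL is valid XML text
    lines << "  <sitemap><loc>#{url.encode(xml: :text)}</loc></sitemap>"
  end
  lines << '</sitemapindex>'
  lines.join("\n")
end

# Placeholder file names under the bucket path used in this thread.
base = 'https://s3.amazonaws.com/mysite/sitemaps/'
urls = %w[google yahoo apple].map { |name| "#{base}#{name}.xml.gz" }
puts sitemap_index_xml(urls)
```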


kjvarga commented on August 15, 2024

Ok yeah I wasn't sure about how you intended to structure your sitemaps (everyone seems to need to do it differently :)

The only issue with having all the sitemaps using a single index file is that according to the sitemap specs, all links in the sitemap(s) should have the same domain.

There is a way to do it using the group feature, which would have been perfect, but there's a problem with the evaluation scope within create() that gets in the way in this case.

If you don't care about separating each domain into its own file, then you can just add all the links to the sitemap as per usual. I'll see if I can fix this scoping issue.


kjvarga commented on August 15, 2024

Good news, there was no problem using groups :D

SitemapGenerator::Sitemap.verbose = true
SitemapGenerator::Sitemap.sitemaps_host = "https://s3.amazonaws.com/mysite/"
SitemapGenerator::Sitemap.public_path = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_path = "sitemaps/"

SitemapGenerator::Sitemap.create do
  %w(google yahoo apple).each do |subdomain|
    group(:filename => subdomain, :default_host => "https://#{subdomain}.mysite.com") do
      add '/home'
    end
  end
end

+ sitemaps/google1.xml.gz             1 links /  676 Bytes /  308 Bytes gzipped
+ sitemaps/yahoo1.xml.gz             1 links /  675 Bytes /  311 Bytes gzipped
+ sitemaps/apple1.xml.gz             1 links /  675 Bytes /  310 Bytes gzipped
+ sitemaps/sitemap_index.xml.gz          3 sitemaps /  549 Bytes /  232 Bytes gzipped
Sitemap stats: 3 links / 3 sitemaps / 0m00s


hurl commented on August 15, 2024

Thanks again for your quick response.

Are you sure that you may not include subdomains from the same root (host) domain in the same index? According to the spec,

**Note**: A Sitemap index file can only specify Sitemaps that are found on the same site as the Sitemap index file. For example, http://www.yoursite.com/sitemap_index.xml can include Sitemaps on http://www.yoursite.com but not on http://www.example.com or http://yourhost.yoursite.com.

All of my subdomains are hosted from the same root domain (a single Heroku app).

In the end, it does not really matter; I can have an index for each subdomain, if that is necessary. My robots.txt file will have to grow to accommodate.
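
The strictest reading of the quoted note can be sketched as an exact host comparison. This is only one interpretation of the wording, written in plain Ruby; same_site? is a hypothetical helper, and the URLs come from the quoted note.

```ruby
require 'uri'

# Hypothetical helper: under the strict reading, a sitemap may be listed in
# an index only when both URLs have exactly the same host.
def same_site?(index_url, sitemap_url)
  index_host   = URI.parse(index_url).host
  sitemap_host = URI.parse(sitemap_url).host
  return false if index_host.nil? || sitemap_host.nil?
  index_host.downcase == sitemap_host.downcase
end

index = 'http://www.yoursite.com/sitemap_index.xml'
puts same_site?(index, 'http://www.yoursite.com/sitemap1.xml.gz')
puts same_site?(index, 'http://yourhost.yoursite.com/sitemap1.xml.gz')
```

Under this reading a subdomain counts as a different site, which is what makes the note's last example confusing.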


kjvarga commented on August 15, 2024

Yeah reading it again it's a bit confusing because they compare www.yoursite.com to yourhost.yoursite.com.

This post seems to say it's possible: http://www.google.com/support/forum/p/Webmasters/thread?fid=5ba122cf102db3c500046c02075d9f80&tid=5ba122cf102db3c5&hl=en. You just have to prove ownership of each subdomain by adding the Sitemap line to the robots.txt file for each subdomain. So I guess that would point to your main sitemap index. Seems pretty simple since all the robots.txt files would then be the same? You just have to make sure it's accessible on each subdomain.

Keep me posted on how it works out.
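
As a rough illustration of the robots.txt idea above: if every subdomain points at the same index, the file body is identical everywhere. This sketch is plain Ruby; robots_txt is a hypothetical helper, the index URL is an assumption built from the sitemaps_host and sitemaps_path used in this thread, and the User-agent and Disallow lines are placeholder policy.

```ruby
# Assumed location of the shared index (sitemaps_host + sitemaps_path from
# this thread plus the default index file name).
INDEX_URL = 'https://s3.amazonaws.com/mysite/sitemaps/sitemap_index.xml.gz'

# Hypothetical helper: robots.txt body that advertises the shared index.
def robots_txt(index_url)
  ['User-agent: *', 'Disallow:', "Sitemap: #{index_url}"].join("\n") + "\n"
end

# The same body would be served from each subdomain's /robots.txt.
files = %w[google yahoo apple].to_h do |subdomain|
  ["https://#{subdomain}.mysite.com/robots.txt", robots_txt(INDEX_URL)]
end

files.each { |url, body| puts "#{url}\n#{body}\n" }
```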


hurl commented on August 15, 2024

Karl,

That last construct did not work at all. It produced a single index inside which all the urls were mangled. The approach that is working best for me is:

SitemapGenerator::Sitemap.sitemaps_host = "https://s3.amazonaws.com/mysite/"
SitemapGenerator::Sitemap.public_path = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_path = "sitemaps/"
SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new
SitemapGenerator::Sitemap.include_index = false

index = 1
Listing.active_set.find_each do |listing|
  SitemapGenerator::Sitemap.default_host = "https://#{listing.subdomain}.mysite.com"
  SitemapGenerator::Sitemap.filename = ('sitemap_' + index.to_s).to_sym
  SitemapGenerator::Sitemap.create do
  end
  index += 1
end

This is very close, and could work. The sitemaps and index file contents are correct, and they are all together in a single sitemaps directory, like so:

sitemaps
     |
     sitemap_11.xml.gz
     sitemap_1_index.xml.gz
     sitemap_21.xml.gz
     sitemap_2_index.xml.gz
     sitemap_31.xml.gz
     sitemap_3_index.xml.gz
        "
        "
     and so on...

The problem now is the naming convention for the sitemaps themselves, with the 1 appended. What I'd like to be able to do is override the name of the sitemap. I've tried the namer method, but cannot get it to work.

Bottom line: this will work for me as-is. Getting the namer method to work would be icing on the cake.


kjvarga commented on August 15, 2024

I can give you an example of using a namer, but it would help if you let me know how you want to name them.

Also, how were the URLs "mangled" in the index from the group example I posted above?

If I run exactly this:

SitemapGenerator::Sitemap.verbose = true
SitemapGenerator::Sitemap.sitemaps_host = "https://s3.amazonaws.com/mysite/"
SitemapGenerator::Sitemap.public_path = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_path = "sitemaps/"

SitemapGenerator::Sitemap.create do
  %w(google yahoo apple).each do |subdomain|
    group(:filename => subdomain, :default_host => "https://#{subdomain}.mysite.com") do
      add '/home'
    end
  end
end

My index looks like this:

<?xml version="1.0" encoding="UTF-8"?><sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"><sitemap><loc>https://s3.amazonaws.com/mysite/sitemaps/google1.xml.gz</loc></sitemap><sitemap><loc>https://s3.amazonaws.com/mysite/sitemaps/yahoo1.xml.gz</loc></sitemap><sitemap><loc>https://s3.amazonaws.com/mysite/sitemaps/apple1.xml.gz</loc></sitemap></sitemapindex>


kjvarga commented on August 15, 2024

So there was a small bug in the code when both the filename and sitemaps_namer options are used. That's probably why you had issues. It's fixed in v2.1.3.

Here's an example using the namer. Working under 2.1.3. You could use the listing.id in place of i when you generate your sitemaps.

SitemapGenerator::Sitemap.verbose = true
SitemapGenerator::Sitemap.sitemaps_host = "https://s3.amazonaws.com/mysite/"
SitemapGenerator::Sitemap.public_path = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_path = "sitemaps/"

i = 0
%w(google yahoo apple).each do |subdomain|
  basename = "sitemap#{i+=1}"
  SitemapGenerator::Sitemap.create(
      :default_host   => "https://#{subdomain}.mysite.com",
      :filename       => basename,
      :sitemaps_namer => SitemapGenerator::SitemapNamer.new("#{basename}_")) do
  end
end

+ sitemaps/sitemap1_1.xml.gz             1 links /  671 Bytes /  305 Bytes gzipped
+ sitemaps/sitemap1_index.xml.gz          1 sitemaps /  384 Bytes /  212 Bytes gzipped
Sitemap stats: 1 links / 1 sitemaps / 0m00s
+ sitemaps/sitemap2_1.xml.gz             1 links /  670 Bytes /  308 Bytes gzipped
+ sitemaps/sitemap2_index.xml.gz          1 sitemaps /  384 Bytes /  213 Bytes gzipped
Sitemap stats: 1 links / 1 sitemaps / 0m00s
+ sitemaps/sitemap3_1.xml.gz             1 links /  670 Bytes /  307 Bytes gzipped
+ sitemaps/sitemap3_index.xml.gz          1 sitemaps /  384 Bytes /  212 Bytes gzipped
Sitemap stats: 1 links / 1 sitemaps / 0m00s
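
The naming pattern in the output above can be summarized with a small plain-Ruby sketch. expected_files is a hypothetical helper that only reproduces the file names shown; it is not the gem's SitemapNamer, and the numbering for more than one sitemap per subdomain is an assumption.

```ruby
# Hypothetical helper: list the file names one create() call would be
# expected to produce for a given basename, following the output above.
def expected_files(basename, sitemap_count)
  # Assumption: sitemap files count up from 1 after the "#{basename}_" prefix.
  sitemaps = (1..sitemap_count).map { |n| "#{basename}_#{n}.xml.gz" }
  sitemaps + ["#{basename}_index.xml.gz"]
end

(1..3).each do |i|
  puts expected_files("sitemap#{i}", 1).join(', ')
end
```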

