Comments (14)
Ok so your user links would be like:
karl.xyz.com
karl.xyz.com/profile
karl.xyz.com/posts
or whatever. When generating your sitemap you can actually set the host on a per-link basis, so you could do something like:
SitemapGenerator::Sitemap.create do
  User.find_each do |user|
    add '/', :host => "#{user.username}.xyz.com"
    add user_profile_path(user), :host => "#{user.username}.xyz.com"
    add user_posts_path(user), :host => "#{user.username}.xyz.com"
  end
end
It would be nice to have a :url option so you could just do:
add :url => user_posts_url(user)
Or I could just detect that the URL already has a host and not use the default_host in that case. Then it would just be:
add user_posts_url(user)
Is that what you were looking for?
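To make the host-detection idea a bit more concrete, here is a rough sketch in plain Ruby. It is not how the gem does it; resolve_link is a made-up helper name, and the sketch only assumes that a link is either a bare path or a full URL.

```ruby
require 'uri'

# Hypothetical helper (not part of sitemap_generator): return an absolute
# URL for a link, falling back to default_host only when the link has no host.
def resolve_link(link, default_host)
  uri = URI.parse(link)
  return link if uri.host             # already a full URL, leave it alone
  URI.join(default_host, link).to_s   # bare path, so prepend the default host
end

puts resolve_link('/posts', 'http://xyz.com')
puts resolve_link('http://karl.xyz.com/posts', 'http://xyz.com')
```

With something like this, a path would still pick up the default host, while a full URL from a *_url route helper would pass through untouched.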
from sitemap_generator.
No, the links would be for subdomains. For example:
subdomain.example.com
company-1.example.com
company-2.example.com
and so on, where "example.com" is the root domain.
So, the host would remain the same. What would be needed is the ability to add the subdomain. Such as:
Subdomain.find_each do |sd|
  SitemapGenerator.create "#{sd}.#{host}"
end
I think you would have to create a sitemap for each subdomain, as per Google's guidelines.
from sitemap_generator.
Can you take a look at this thread and see if it helps? #24
From what I understand you just need to create a sitemap for each domain, so you can set your default_host to each full domain e.g. http://subdomain.example.com and add links to it like you would any other sitemap.
from sitemap_generator.
See if something like this works...
%w[en fr ru].each do |domain|
  SitemapGenerator::Sitemap.sitemaps_path = "#{domain}/"
  SitemapGenerator::Sitemap.default_host = "http://www.#{domain}.example.com"
  SitemapGenerator::Sitemap.create do
    add '/whatever'
  end
end
from sitemap_generator.
This works, partially. The individual site maps are generated correctly, but the index only lists one of the site maps. BTW, this seems to be the problem that user chamnap had, and it does not seem to have been resolved.
Here is my code:
SitemapGenerator::Sitemap.sitemaps_host = "https://s3.amazonaws.com/mysite/"
SitemapGenerator::Sitemap.public_path = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new
SitemapGenerator::Sitemap.include_index = false
Listing.find_each do |listing|
  SitemapGenerator::Sitemap.default_host = "https://#{listing.subdomain}.mysite.com"
  SitemapGenerator::Sitemap.create do
    add ''
  end
end
from sitemap_generator.
Ok, try this with v2.1.1. There were a couple of issues. First, the index file was being overwritten, so we have to generate the sitemaps into separate folders or use different names. I also fixed some issues with multiple calls to create() in a single sitemap config.
SitemapGenerator::Sitemap.sitemaps_host = "https://s3.amazonaws.com/mysite/"
SitemapGenerator::Sitemap.public_path = 'tmp/'
# SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new # just generate into tmp/
# SitemapGenerator::Sitemap.include_index = false # turned off for you in v2.1.1
%w(google yahoo apple).each do |subdomain|
  SitemapGenerator::Sitemap.default_host = "https://#{subdomain}.mysite.com"
  SitemapGenerator::Sitemap.sitemaps_path = "sitemaps/#{subdomain}"
  SitemapGenerator::Sitemap.create do
    add '/home'
  end
end
Now works as expected and produces:
+ sitemaps/google/sitemap1.xml.gz 2 links / 822 Bytes / 328 Bytes gzipped
+ sitemaps/google/sitemap_index.xml.gz 1 sitemaps / 389 Bytes / 217 Bytes gzipped
Sitemap stats: 2 links / 1 sitemaps / 0m00s
+ sitemaps/yahoo/sitemap1.xml.gz 2 links / 820 Bytes / 330 Bytes gzipped
+ sitemaps/yahoo/sitemap_index.xml.gz 1 sitemaps / 388 Bytes / 217 Bytes gzipped
Sitemap stats: 2 links / 1 sitemaps / 0m00s
+ sitemaps/apple/sitemap1.xml.gz 2 links / 820 Bytes / 330 Bytes gzipped
+ sitemaps/apple/sitemap_index.xml.gz 1 sitemaps / 388 Bytes / 214 Bytes gzipped
Sitemap stats: 2 links / 1 sitemaps / 0m00s
Check out the namer options if you would rather generate all files in the root of the directory.
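For what it's worth, the overwrite problem boils down to a path collision, which a few lines of plain Ruby can show. This is only a sketch: index_path is a hypothetical helper, and the sitemap_index.xml.gz file name is taken from the output above.

```ruby
# Hypothetical helper: where the index file for one run ends up.
# Leave subdomain as nil to model every run sharing one sitemaps_path.
def index_path(sitemaps_path, subdomain = nil)
  File.join(*[sitemaps_path, subdomain, 'sitemap_index.xml.gz'].compact)
end

subdomains = %w[google yahoo apple]
shared   = subdomains.map { |_| index_path('sitemaps') }
separate = subdomains.map { |s| index_path('sitemaps', s) }

puts shared.uniq.inspect   # a single path, so each run replaces the last index
puts separate.inspect      # one index path per subdomain, nothing is replaced
```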
from sitemap_generator.
Hi Karl, sorry for the delay getting back to you on this.
This solution works, and is better. I haven't tried "namer," but it sounds like that will allow me to have all of the sitemaps in a single directory.
The main thing I would like to see is a single index file that points to all of the sitemap files.
Ideally, it would look like this:
aws bucket
|
  stuff (currently I have many image files here)
  sitemaps directory
  |
    index (single file containing all of the sitemap addresses)
    site maps (any number. I need thousands now, with the ability to scale much larger)
from sitemap_generator.
Ok yeah I wasn't sure about how you intended to structure your sitemaps (everyone seems to need to do it differently :)
The only issue with having all the sitemaps using a single index file is that according to the sitemap specs, all links in the sitemap(s) should have the same domain.
There is a way to do it using the group feature, which would have been perfect, but there's a problem with the evaluation scope within create() that gets in the way here.
If you don't care about separating each domain into its own file, then you can just add all the links to the sitemap as usual. I'll see if I can fix this scoping issue.
from sitemap_generator.
Good news, there was no problem using groups :D
SitemapGenerator::Sitemap.verbose = true
SitemapGenerator::Sitemap.sitemaps_host = "https://s3.amazonaws.com/mysite/"
SitemapGenerator::Sitemap.public_path = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_path = "sitemaps/"
SitemapGenerator::Sitemap.create do
  %w(google yahoo apple).each do |subdomain|
    group(:filename => subdomain, :default_host => "https://#{subdomain}.mysite.com") do
      add '/home'
    end
  end
end
+ sitemaps/google1.xml.gz 1 links / 676 Bytes / 308 Bytes gzipped
+ sitemaps/yahoo1.xml.gz 1 links / 675 Bytes / 311 Bytes gzipped
+ sitemaps/apple1.xml.gz 1 links / 675 Bytes / 310 Bytes gzipped
+ sitemaps/sitemap_index.xml.gz 3 sitemaps / 549 Bytes / 232 Bytes gzipped
Sitemap stats: 3 links / 3 sitemaps / 0m00s
from sitemap_generator.
Thanks again for your quick response.
Are you sure that you may not include subdomains from the same root (host) domain in the same index? According to the spec,
**Note**: A Sitemap index file can only specify Sitemaps that are found on the same site as the Sitemap index file. For example, http://www.yoursite.com/sitemap_index.xml can include Sitemaps on http://www.yoursite.com but not on http://www.example.com or http://yourhost.yoursite.com.
All of my subdomains are hosted from the same root domain (a single Heroku app).
In the end, it does not really matter; I can have an index for each subdomain, if that is necessary. My robots.txt file will have to grow to accommodate.
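Here is the strictest reading of that note as a quick plain-Ruby check, using the spec's own example hosts. same_site? is a hypothetical helper, and this sketch says nothing about what search engines actually accept; it only compares scheme and host.

```ruby
require 'uri'

# Hypothetical check: under a strict reading of the note, a sitemap belongs
# to the same site as the index only if scheme and host both match.
def same_site?(index_url, sitemap_url)
  index   = URI.parse(index_url)
  sitemap = URI.parse(sitemap_url)
  index.scheme == sitemap.scheme && index.host == sitemap.host
end

index = 'http://www.yoursite.com/sitemap_index.xml'
puts same_site?(index, 'http://www.yoursite.com/sitemap1.xml.gz')
puts same_site?(index, 'http://yourhost.yoursite.com/sitemap1.xml.gz')
```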
from sitemap_generator.
Yeah reading it again it's a bit confusing because they compare www.yoursite.com to yourhost.yoursite.com.
This post seems to say it's possible: http://www.google.com/support/forum/p/Webmasters/thread?fid=5ba122cf102db3c500046c02075d9f80&tid=5ba122cf102db3c5&hl=en. You just have to prove ownership of each subdomain by adding the Sitemap line to the robots.txt file for each subdomain. So I guess that would point to your main sitemap index. Seems pretty simple since all the robots.txt files would then be the same? You just have to make sure it's accessible on each subdomain.
Keep me posted on how it works out.
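If it helps, the shared robots.txt could be as small as the sketch below. robots_txt is a hypothetical helper, and the index URL is an assumption pieced together from the sitemaps_host and sitemaps_path used earlier in this thread; swap in wherever your index really lives.

```ruby
# Hypothetical helper: a robots.txt body that allows all crawling and
# advertises a single sitemap index. The same text would be served from
# every subdomain.
def robots_txt(sitemap_index_url)
  <<~ROBOTS
    User-agent: *
    Disallow:
    Sitemap: #{sitemap_index_url}
  ROBOTS
end

# Assumed location of the index, based on the settings used in this thread.
puts robots_txt('https://s3.amazonaws.com/mysite/sitemaps/sitemap_index.xml.gz')
```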
from sitemap_generator.
Karl,
That last construct did not work at all. It produced a single index inside which all the urls were mangled. The approach that is working best for me is:
SitemapGenerator::Sitemap.sitemaps_host = "https://s3.amazonaws.com/mysite/"
SitemapGenerator::Sitemap.public_path = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_path = "sitemaps/"
SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new
SitemapGenerator::Sitemap.include_index = false
index = 1
Listing.active_set.find_each do |listing|
  SitemapGenerator::Sitemap.default_host = "https://#{listing.subdomain}.mysite.com"
  SitemapGenerator::Sitemap.filename = ('sitemap_' + index.to_s).to_sym
  SitemapGenerator::Sitemap.create do
  end
  index += 1
end
This is very close, and it could work. The sitemaps and index file contents are correct, and they are all together in a single sitemaps directory, like so:
sitemaps
|
sitemap_11.xml.gz
sitemap_1_index.xml.gz
sitemap_21.xml.gz
sitemap_2_index.xml.gz
sitemap_31.xml.gz
sitemap_3_index.xml.gz
"
"
and so on...
The problem now is the naming convention for the sitemaps themselves, with the 1 appended. What I'd like to be able to do is override the name of the sitemap. I've tried the namer method, but cannot get it to work.
Bottom line: this will work for me as-is. Getting the namer method to work would be icing on the cake.
from sitemap_generator.
I can give you an example of using a namer, but it would help if you let me know how you want to name them.
Also, how were the URLs "mangled" in the index from the group example I posted above?
If I run exactly this:
SitemapGenerator::Sitemap.verbose = true
SitemapGenerator::Sitemap.sitemaps_host = "https://s3.amazonaws.com/mysite/"
SitemapGenerator::Sitemap.public_path = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_path = "sitemaps/"
SitemapGenerator::Sitemap.create do
  %w(google yahoo apple).each do |subdomain|
    group(:filename => subdomain, :default_host => "https://#{subdomain}.mysite.com") do
      add '/home'
    end
  end
end
My index looks like this:
<?xml version="1.0" encoding="UTF-8"?><sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"><sitemap><loc>https://s3.amazonaws.com/mysite/sitemaps/google1.xml.gz</loc></sitemap><sitemap><loc>https://s3.amazonaws.com/mysite/sitemaps/yahoo1.xml.gz</loc></sitemap><sitemap><loc>https://s3.amazonaws.com/mysite/sitemaps/apple1.xml.gz</loc></sitemap></sitemapindex>
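For comparison, an index of that shape can be put together by hand in a few lines of plain Ruby. This is only a sketch for checking what a correct index should contain, not the gem's code; sitemap_index_xml is a made-up helper and it leaves out the schema attributes for brevity.

```ruby
# Hypothetical helper: build a minimal sitemap index from a list of sitemap URLs.
def sitemap_index_xml(sitemap_urls)
  entries = sitemap_urls.map do |url|
    # encode(xml: :text) escapes &, < and > so the URL is safe inside <loc>
    "<sitemap><loc>#{url.encode(xml: :text)}</loc></sitemap>"
  end
  %(<?xml version="1.0" encoding="UTF-8"?>) +
    %(<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">) +
    entries.join +
    %(</sitemapindex>)
end

urls = %w[google yahoo apple].map do |name|
  "https://s3.amazonaws.com/mysite/sitemaps/#{name}1.xml.gz"
end
puts sitemap_index_xml(urls)
```

If the locs in your generated index differ from what a helper like this would give for the same file names, that difference would be useful to post here.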
from sitemap_generator.
So there was a small bug in the code when both the filename and sitemaps_namer options are used. That's probably why you had issues. It's fixed in v2.1.3.
Here's an example using the namer, working under 2.1.3. You could use the listing.id in place of i when you generate your sitemaps.
SitemapGenerator::Sitemap.verbose = true
SitemapGenerator::Sitemap.sitemaps_host = "https://s3.amazonaws.com/mysite/"
SitemapGenerator::Sitemap.public_path = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_path = "sitemaps/"
i = 0
%w(google yahoo apple).each do |subdomain|
  basename = "sitemap#{i+=1}"
  SitemapGenerator::Sitemap.create(
    :default_host => "https://#{subdomain}.mysite.com",
    :filename => basename,
    :sitemaps_namer => SitemapGenerator::SitemapNamer.new("#{basename}_")) do
  end
end
+ sitemaps/sitemap1_1.xml.gz 1 links / 671 Bytes / 305 Bytes gzipped
+ sitemaps/sitemap1_index.xml.gz 1 sitemaps / 384 Bytes / 212 Bytes gzipped
Sitemap stats: 1 links / 1 sitemaps / 0m00s
+ sitemaps/sitemap2_1.xml.gz 1 links / 670 Bytes / 308 Bytes gzipped
+ sitemaps/sitemap2_index.xml.gz 1 sitemaps / 384 Bytes / 213 Bytes gzipped
Sitemap stats: 1 links / 1 sitemaps / 0m00s
+ sitemaps/sitemap3_1.xml.gz 1 links / 670 Bytes / 307 Bytes gzipped
+ sitemaps/sitemap3_index.xml.gz 1 sitemaps / 384 Bytes / 212 Bytes gzipped
Sitemap stats: 1 links / 1 sitemaps / 0m00s
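If you want to sanity-check the names before uploading, the pattern in that output can be written down as a tiny plain-Ruby helper. expected_files is hypothetical, and the numbering for more than one sitemap per basename is an assumption; only the single-sitemap case is confirmed by the output above.

```ruby
# Hypothetical helper mirroring the file names in the output above:
# "<basename>_<n>.xml.gz" for each sitemap, plus "<basename>_index.xml.gz".
def expected_files(basename, sitemap_count = 1)
  sitemaps = (1..sitemap_count).map { |n| "#{basename}_#{n}.xml.gz" }
  sitemaps + ["#{basename}_index.xml.gz"]
end

puts expected_files('sitemap1').inspect
```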
from sitemap_generator.