harvesting a site before it goes live?

holmesg
harvesting a site before it goes live?

Looking at the OAI-PMH Server - Configuration page, I am wondering how to configure it so that a development instance of a site can safely be harvested before it goes live.

In other words, I want the harvested data to have URLs that point to the future hostname of the CWIS site, not the current hostname.

Does that make sense? What can I do to achieve that?

ealmasy
Re: harvesting a site before it goes live?

Are you talking about the base URL, URL fields mapped to OAI-PMH fields, full record page URLs mapped to OAI-PMH fields, or something else?  Just not clear about which URLs are at issue.

holmesg
Re: harvesting a site before it goes live?

I'm concerned about any way that someone using the harvested data might be led back to the development CWIS site, rather than the future production CWIS site.

The data in the resources themselves should be fine. But if someone follows a URL to view the resource listing back on the CWIS site that I am configuring, then I want them to arrive at the production hostname, not the development hostname.

So, I guess I am most concerned about the URL in the "fullRecordLink".

Also, the "identifier" seems to be constructed with the development site hostname as part of it. So if we were to start allowing harvesting before launch, would records harvested after the launch appear to the harvester to be duplicates rather than replacements? Is that a concern?

So I'm just wondering if there is any way to configure this site so that the data that it makes available to harvesters now will look the same as if it had already been launched and was residing at the production hostname.
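
One way to check this from the outside is to harvest the development instance and scan the response for the development hostname. The sketch below is only an illustration: the hostnames, the sample XML, and the record URL are made up, and the function name `find_hostname_leaks` is hypothetical rather than anything provided by CWIS.

```python
import xml.etree.ElementTree as ET


def find_hostname_leaks(xml_text: str, hostname: str) -> list[tuple[str, str]]:
    """Return (tag, text) pairs for every element whose text mentions hostname."""
    root = ET.fromstring(xml_text)
    leaks = []
    for element in root.iter():
        text = (element.text or "").strip()
        if hostname in text:
            # Drop the "{namespace}" prefix ElementTree puts on tag names.
            tag = element.tag.split("}")[-1]
            leaks.append((tag, text))
    return leaks


# Hypothetical, heavily trimmed ListRecords response (not real CWIS output).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:dev.example.org:42</identifier>
      </header>
      <metadata>
        <fullRecordLink>http://dev.example.org/record/42</fullRecordLink>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

leaks = find_hostname_leaks(SAMPLE, "dev.example.org")
for tag, text in leaks:
    print(f"{tag}: {text}")
```

An empty result for the development hostname would suggest that the harvested data already looks as it will after launch.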

ealmasy
Re: harvesting a site before it goes live?

The fullRecordLink value is part of the OAI-SQ extension, so it won't be used by anyone just harvesting and storing regular OAI-PMH data.

The identifier is constructed using the values configured on the OAI-PMH Server Configuration page.  Initially those are automatically populated with the site name and host name, but you can change them to whatever you want.

I think this addresses your concerns, but if not, please let me know.  It's certainly a valid consideration, and if necessary we'll make changes or enhancements to the OAI-PMH Server plugin and release a new version to address it.
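
To illustrate why the configured values matter, here is a minimal sketch. It assumes identifiers follow the common `oai:<repository>:<local id>` scheme and that a harvester keys its stored records on the identifier. The exact format CWIS generates is not shown in this thread, and the hostnames and the `oai_identifier` and `harvest` helpers are hypothetical.

```python
def oai_identifier(repository_id: str, local_id: str) -> str:
    """Build an identifier in the assumed oai:<repository>:<local id> scheme."""
    return f"oai:{repository_id}:{local_id}"


def harvest(store: dict, identifier: str, metadata: dict) -> None:
    """Store a record the way a simple harvester might: keyed on its identifier."""
    store[identifier] = metadata


# Case 1: the identifier changes from the development to the production hostname.
store = {}
harvest(store, oai_identifier("dev.example.org", "42"), {"title": "Sample"})
harvest(store, oai_identifier("www.example.org", "42"), {"title": "Sample"})

# Case 2: the production value is configured before the first harvest.
store_fixed = {}
harvest(store_fixed, oai_identifier("www.example.org", "42"), {"title": "Sample"})
harvest(store_fixed, oai_identifier("www.example.org", "42"), {"title": "Sample"})

print(len(store), len(store_fixed))
```

In the first case the harvester ends up with two entries for the same resource. In the second, the later harvest simply replaces the earlier record, which is the behavior you get by setting the production values on the configuration page before allowing any harvesting.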