Again, isn’t that the site’s prerogative?
I think there should at least be a recognized way to opt out that archive.org actually follows. For years they told people to put
User-agent: ia_archiver
Disallow: /
in robots.txt, but they still archived content from those sites. They refuse to publish the IP addresses they pull content from, even though that would be trivial to do. They also refuse to use a User-Agent that you can filter on.
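For anyone who wants to check what that rule asks for, here is a rough sketch using Python's standard urllib.robotparser. The helper name allowed() and the example.com URL are mine, purely for illustration. Note that the opt-out form is "Disallow: /"; a bare "Disallow:" with nothing after it permits everything. This only shows what a compliant crawler would conclude from the file; it says nothing about what any real crawler actually does.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if robots_txt permits `agent` to fetch `url`.

    Hypothetical helper for illustration; it only evaluates the rules
    the way a well-behaved crawler would, it enforces nothing.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The opt-out form: block the ia_archiver agent from the whole site.
BLOCK_ARCHIVER = "User-agent: ia_archiver\nDisallow: /\n"

# An empty Disallow value means "nothing is disallowed".
EMPTY_DISALLOW = "User-agent: ia_archiver\nDisallow:\n"

URL = "https://example.com/page.html"  # placeholder URL

print(allowed(BLOCK_ARCHIVER, "ia_archiver", URL))   # False
print(allowed(EMPTY_DISALLOW, "ia_archiver", URL))   # True
print(allowed(BLOCK_ARCHIVER, "SomeOtherBot", URL))  # True
```

The last line is the catch: the rule only matches a crawler that identifies itself as ia_archiver, so it does nothing against requests sent under a different User-Agent.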
If you want to be a library, be open and honest about it. There’s no need to sneak around.
Technically, each time it is viewed, it is a republication from a copyright perspective. It’s a digital copy that is redistributed; the original copy that was made doesn’t go away when someone views it. There’s not just one copy that people pass around like a library book.