Well, they seem to be willing to try it, given the filecoin foundation is subsidising it:
http://blog.archive.org/2021/04/01/filecoin-foundation-grants-50000-fil-to-the-internet-archive/
Interesting. They’ve offered this sort of thing at AWS etc. for years. The problem is, it is centralising. I mean, how many places can you send the drives to? How can the locations of these places remain anonymous? And how could the hosts plausibly deny knowledge of the data they are storing or hosting?
Imo, this actually highlights the inherent weaknesses of filecoin. It is another distributed market for storage, but it isn’t really moving the game forward in how the data is replicated, distributed or accessed.
I’m sure there is a market for a decentralised version of what AWS does, but I wonder how they will compete with the economies of scale, robustness, flexibility and resilience that centralised solutions already deliver. Is it worth it?
Oh yeah, I knew there was some collaboration going on. What I mean is, it’s potentially not quite the complete fit yet. Stepping stone, maybe, though.
I recall the convo that David had with Brewster though, when he told him he was hoping to put IA out of business.
That should be the ultimate aim though right?
Yes, a stepping stone perhaps. I suppose it helps them towards their goal. Free storage is free storage, after all. Ofc, it would be an ongoing cost, rather than a one-off cost.
Love the Brewster comment! So true!
To be fair, it’s always been David’s stated aim to put himself out of business too, so it’s evens.
I can’t remember that in the bnktothefuture pitch!
Seriously though, an automated network that retains history would be awesome. I’m sure there will always be some fettling and improvements to be made somewhere around the system though!
If we set the right limits to node size (or better, some algorithm that sets the limits according to network age and other parameters), “archiving The Archive” could be perfect for stabilising network growth. Do it in small parts or with very variable speed: stop it when there is enough upload of other data, and speed it up when there is a large number of nodes waiting to join.
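Something like this rough sketch is what I mean; every metric name and threshold here is made up, since the real network would expose its own:

```python
# Rough sketch of an archive-upload throttle. All names and thresholds
# are hypothetical - the real network would expose its own metrics.

def archive_upload_rate(network_age_days: int,
                        other_upload_gb_per_hour: float,
                        nodes_waiting_to_join: int,
                        max_rate_gb_per_hour: float = 100.0) -> float:
    """Return how fast (GB/h) the 'archive The Archive' job should upload."""
    # Young network: hold back so archive traffic doesn't dominate growth.
    age_factor = min(network_age_days / 365.0, 1.0)

    # Plenty of organic uploads already? Back off proportionally.
    demand_factor = 1.0 / (1.0 + other_upload_gb_per_hour / 50.0)

    # A queue of nodes waiting to join means spare capacity: speed up.
    supply_factor = min(1.0 + nodes_waiting_to_join / 100.0, 3.0)

    return min(max_rate_gb_per_hour,
               max_rate_gb_per_hour * age_factor * demand_factor * supply_factor / 3.0)

# Example: mature network, quiet uploads, big join queue -> near full speed.
print(archive_upload_rate(730, 5.0, 500))
```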
If the wallet for putting clearnet data on safenet is public, how would it be possible to ensure that the tokens are only used for the intended purpose?
I’m not 100% sure if the safe:// address for the same content is always the same even when uploaded by different owners (I know it was supposed to be at some point in development). If not the following ideas would not work.
Maybe you would need to pay for the upload yourself, and then provide the safe:// address of the data along with the http:// address of the same data. The wallet owner first checks whether the data has already been uploaded; if not, it generates the safe:// address from the content at the http:// address and compares it to the one you provided. If they match, you know it’s the right data, and the wallet owner reimburses the uploader.
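A minimal sketch of that reimbursement check, assuming safe:// addresses really are deterministic for identical content; `derive_safe_url()` and `pay()` are hypothetical stand-ins for whatever the real client API provides:

```python
# Sketch of the pay-first-then-get-reimbursed scheme described above.
# Assumes safe:// addresses are deterministic for identical content.

import hashlib
import urllib.request

def derive_safe_url(content: bytes) -> str:
    """Stand-in: pretend the safe:// address is just a content hash."""
    return "safe://" + hashlib.sha3_256(content).hexdigest()

def verify_and_reimburse(claimed_safe_url: str, http_url: str,
                         already_paid: set, pay) -> bool:
    if claimed_safe_url in already_paid:
        return False                      # don't pay twice for the same data
    content = urllib.request.urlopen(http_url).read()
    if derive_safe_url(content) != claimed_safe_url:
        return False                      # content doesn't match the claim
    already_paid.add(claimed_safe_url)
    pay(upload_cost=len(content))         # reimburse the uploader
    return True
```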
Hmm, another idea, don’t know if safenet supports it: `safe dog` the address first to make sure it doesn’t exist yet. If for some reason the data is actually not uploaded, anyone can request an “upload certificate” from the wallet owner and then upload that data.
These ideas allow people to donate to a wallet they trust to use the tokens for the intended purpose. The owners of these centralised upload servers would keep a list of all things they have uploaded in a way that people who use the system could automatically verify that the tokens are indeed all used for their intended purpose.
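The audit could then be fully mechanical. A sketch, with made-up field names, of how anyone could re-check the published upload log against the wallet’s spending:

```python
# Sketch of the audit idea: the wallet owner publishes an append-only log of
# (http source, safe address, cost) entries, and anyone can re-check that
# every token spent maps to a verifiable upload. Field names are made up.

def audit(upload_log: list, total_tokens_spent: int,
          fetch, derive_safe_url) -> bool:
    """Return True if every spent token is accounted for by a valid upload."""
    accounted = 0
    for entry in upload_log:
        content = fetch(entry["http_url"])            # pull the original
        if derive_safe_url(content) != entry["safe_url"]:
            return False                              # log entry is bogus
        accounted += entry["cost"]
    return accounted == total_tokens_spent            # no unexplained spending
```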
Okay, so I was thinking on a site-by-site basis, like each site has its own wallet. An overarching archive wallet is a neat idea too. Even when funds are empty and archiving incurs a cost, the Internet Archive could archive data more cheaply, or at least offset the cost by farming with their equipment.
Yeah, just ideas here too, but fun ones, and they could be impactful if sorted out.
Well, we could actually nest wallets. One big public archive wallet that could be used for any site, and then within that, ring-fenced wallets that could only be spent archiving specific sites. So could be both!
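Something like this toy model; just a data-structure sketch, not a real Safe wallet API:

```python
# Toy model of the nested-wallet idea: one general archive pot, plus
# ring-fenced sub-wallets that may only fund uploads for a named site.
# Not a real Safe wallet API - purely illustrative.

class ArchiveWallet:
    def __init__(self):
        self.general = 0                       # spendable on any site
        self.ring_fenced = {}                  # site -> earmarked balance

    def donate(self, amount, site=None):
        if site is None:
            self.general += amount
        else:
            self.ring_fenced[site] = self.ring_fenced.get(site, 0) + amount

    def spend(self, amount, site):
        """Drain the site's earmarked funds first, then the general pot."""
        fenced = self.ring_fenced.get(site, 0)
        from_fenced = min(fenced, amount)
        remainder = amount - from_fenced
        if remainder > self.general:
            return False                       # not enough funds for this site
        self.ring_fenced[site] = fenced - from_fenced
        self.general -= remainder
        return True
```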
That would make donation to the cause so much easier and manageable! I dig it.
I would have thought that promoting a tightly controlled state-sponsored propaganda outfit with a history of repression, outright lying and paedophilia was somewhat at odds with the ethos of the project - said he in a gentle aside
Apart from Attenborough, what else would you want to preserve from that bastion of the establishment? The one that tries to pass off utter corruption as “cronyism”.
How is that promoting? Surely you’d want information that could be deemed lies, preserved as much as information you hold to be the truth? Perhaps only historical preservation can reveal the difference?
A good point Jim, it was an aside rather than a dig
We’ll keep the Attenborough stuff anyway - and every last bit of output from John Peel too.
Just came to suggest the IA, having stumbled over the fact that it’s more than just websites… there seems to be a lot of content there: free books, movies, software, music.
Edit: and just reflecting that for a lot of these, the idea of switching costs from perpetual storage to engaging with what is new, and uploading more volume as a result, might be very appealing. I notice Gutenberg is slow, and I wonder whether any host of data worries about their ability to keep up with the costs of storage. Safe Network offers a simple sell: do it once and do it well… and worry not.
For me, the Internet Archive is where I can listen to just about every Grateful Dead gig they ever played, and a vast array of other bands, especially Little Feat - for free, secure in the knowledge that these artists put this material up there for their fans.
https://www.nature.com/sdata/policies/repositories
Scientific Data mandates the release of datasets accompanying our Data Descriptors, but we do not ourselves host data. Instead, we ask authors to submit datasets to an appropriate public data repository. Data should be submitted to discipline-specific, community-recognized repositories where possible.
Repositories for primary data deposition listed on this page meet our requirements for data access, preservation, resource stability, and suitability for use by all researchers with the appropriate types of data.
We provide an archive of our recommended repository list, which is available for use under the CC-BY licence. Recommended repositories and standards that are indexed by FAIRsharing can also be viewed and filtered via the Scientific Data FAIRsharing collection.
10,727,990,597 source files from 162,481,510 projects. Holy moly
Great podcast interview with Max Roser from Our World In Data.
A lot of data sets are mentioned:
https://www.gapminder.org/
https://www.worldbank.org/en/publication/wdr/wdr-archive
https://www.carbonbrief.org/
https://www.nature.com/sdata/
https://data.worldbank.org/
https://stats.oecd.org/
Some interesting dialog about data licensing:
MR: It used to be the case that this [World Bank] data was licensed under very restrictive permissions, only available if you order a DVD and so on. And Hans diagnosed them with, what did he call it? ‘Database Hugging Disorder,’ DHD. And he cured them of that.
RW: Interesting. Who was licensing it through a DVD? You mean Gapminder, or the World Bank?
MR: Oh, the World Bank and other UN organizations. Back in the day, they weren’t making their data available in this way. And it’s still the case for one very important data source, the International Energy Agency. That’s a partner organization of the OECD.
They produce some of the most important data in the world. They produce the global statistics on energy and climate change. And the world needs to have access to these data sources. But if you want to have access to the full data of the IEA, you pay licenses that are costing several thousands of euros. And that also means that institutions like us, but also journalists, can’t straightforwardly rely on their data and communicate that. And so we are in a situation where the best statisticians on energy produce these figures, and then they’re locked away behind a paywall. And instead of using these figures, the world relies on the data from BP, from the gas and oil multinational. They’re producing the energy stats. And so we have largely publicly funded data at the IEA that isn’t available for the public. And we have a private oil company that is producing the data that everyone relies on.
Everyone tries to work their way around this issue. So you have researchers that can’t share each other’s work. We had several of these issues where we would have access to some data, we would analyze the data, and then we can’t make it publicly available. If you make it publicly available, even in a chart or so, you get several emails from several people that ask whether you can possibly share that information with them, and you can’t, because the licenses don’t actually allow this, so that every other researcher is doubling down on this effort, and everyone is trying to do the same analysis, and is trying to avoid these restrictions with the IEA.
An amazingly informative resource here:
https://www.archivematica.org/en/
Archivematica uses METS, PREMIS, Dublin Core, the Library of Congress BagIt specification and other recognized standards to generate trustworthy, authentic, reliable and system-independent Archival Information Packages (AIPs) for storage in your preferred repository.
Compatible with hundreds of formats
In the Format Policy Registry (FPR), Archivematica implements its default format policies based on an analysis of the significant characteristics of file formats. The FPR also offers an editable, flexible framework for format identification, package extraction, transcription and normalization for preservation and access.
Memory institutions have dedicated voluminous resources over the past couple of decades to implement various software platforms to manage digital objects. For this reason, we believe in leveraging the strength of other tools and integrating with them wherever possible.
- DSpace
- CONTENTdm
- Islandora
- LOCKSS
- AtoM
- DuraCloud
- OpenStack
- Archivists’ Toolkit
- Arkivum
- ArchivesSpace
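Since the blurb above mentions the Library of Congress BagIt specification, here’s a minimal pure-stdlib sketch of the bag layout it describes (a `data/` payload plus a checksum manifest), just to show why bags are easy to verify end to end. Real workflows would use a proper BagIt library, not this:

```python
# Not Archivematica itself - just an illustration of the BagIt layout it
# builds on: a data/ payload directory plus a sha256 manifest, so the
# whole bag can be re-verified file by file.

import hashlib
import shutil
from pathlib import Path

def make_bag(src: Path, bag: Path) -> None:
    data = bag / "data"
    data.mkdir(parents=True)
    manifest_lines = []
    for f in sorted(src.rglob("*")):
        if f.is_file():
            dest = data / f.relative_to(src)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, dest)                       # copy into the payload
            digest = hashlib.sha256(dest.read_bytes()).hexdigest()
            manifest_lines.append(f"{digest}  data/{f.relative_to(src)}")
    # Checksum manifest: one line per payload file.
    (bag / "manifest-sha256.txt").write_text("\n".join(manifest_lines) + "\n")
    # Bag declaration, as required by the spec.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")
```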