Public Datasets on Safe Network

Southside · October 7, 2023, 5:24pm

The above message was sponsored by the Cats Protection League

doggirljuno · December 18, 2023, 7:58am

It's not sexy

Data is sexy <3

19eddyjohn75 · May 5, 2024, 6:25pm

happybeing · July 6, 2024, 11:35am

Lots of LinkedData, here’s a new one:

MaxSan · July 7, 2024, 11:36pm

Let's put Anna's Archive on the network.

MaxSan · July 8, 2024, 7:08pm

This just turned into a priority. Obviously we can’t yet due to nobody having tokens… But what a way to seed the network
…

Knosis · July 9, 2024, 5:51am

NEWS
04 March 2024
Clarification 05 March 2024

Millions of research papers at risk of disappearing from the Internet

An analysis of DOIs suggests that digital preservation is not keeping up with burgeoning scholarly knowledge.

By Sarah Wild

A study identified more than two million articles that did not appear in a major digital archive, despite having an active DOI. Credit: Anna Berkut/Alamy

More than one-quarter of scholarly articles are not being properly archived and preserved, a study of more than seven million digital publications suggests. The findings, published in the Journal of Librarianship and Scholarly Communication on 24 January, indicate that systems to preserve papers online have failed to keep pace with the growth of research output.

https://archive.is/0lT15#selection-975.0-1185.110

zettawatt · July 9, 2024, 9:55pm

I had never heard of this site before. Thanks for sharing. 873TB, that’s a lot of nanos.

MaxSan · July 9, 2024, 9:56pm

Well, I wouldn't think any single user would pay for that… Depending on the cost of data I guess lol… it's yet to be known.

happybeing · September 3, 2025, 1:18pm

Wikipedia is only 20GB and they provide torrents:

cc @aatonnomicc @neo

aatonnomicc · September 3, 2025, 1:42pm

OMG I’m thinking to combine that with tarchive upload

happybeing · September 3, 2025, 2:28pm

Once the data is up it would be good to put a web front end on it for viewing, then search etc. I’m not sure if any of the current projects such as Colony (@zettawatt) would be good for this? Be nice for it to be viewable as a straight website, which I assume the data is already set for.

In which case uploading using tarchive might be premature because uploading the files to an Archive should allow browsing using dweb off-the-bat. Unless @Traktion has something ready for this?

Just uploading the data is good, but having it available as a demo app or website would be powerful marketing, and not just for early adopters.

Once dweb is an app, everyday folk will be able to browse websites that are published using Archive. And if you upload with dweb they will have versioning too, which for Wikipedia would be a very-cool-indeed-showcase.

storage_guy · September 3, 2025, 3:03pm

Even better, just get Wikipedia to use Autonomi for their backend storage for the normal web version but have it accessible via dweb as well?

zettawatt · September 3, 2025, 3:48pm

That’s what Colony was built to do. If someone uploads it, I can index it for searching. Whether it’s the dweb app, Colony app, anttp proxy, etc., the user still has to install an app somewhere.

FYI, I’m still rsyncing the Project Gutenberg full archive. I’m at book number 63000 of 76000. Once done, I’m going to upload all of the books to Autonomi and index them in Colony for searching. Pretty much every public domain book will then be there and will be the stress test to see how the local search engine works against large datasets.

happybeing · September 3, 2025, 3:59pm

That’s very cool. I can’t wait, but I’m used to it

One more thing about using Archive for Wikipedia is that it may be that not many files will change in each update, in which case after the first 20GB updates will be cheap. Depends on how Wikipedia generate their files for this though. Be good to test it out.
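A rough way to test that before paying for anything: hash every file in two consecutive dumps and see how many bytes are actually new. This is only a sketch. It assumes whole-file deduplication by SHA-256, which is a stand-in and not how Autonomi or dweb actually decides what needs storing, and `snapshot` and `bytes_to_upload` are made-up helper names.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def bytes_to_upload(old_root: Path, new_root: Path) -> tuple[int, int]:
    """Return (bytes not seen before, total bytes) for the new dump.

    Assumption: a file is "already stored" if identical content existed
    anywhere in the old dump, regardless of its path.
    """
    known = set(snapshot(old_root).values())
    new_bytes = 0
    total = 0
    for rel, digest in snapshot(new_root).items():
        size = (new_root / rel).stat().st_size
        total += size
        if digest not in known:
            new_bytes += size
    return new_bytes, total
```

Calling `bytes_to_upload(Path("dump-old"), Path("dump-new"))` on two unpacked dumps (directory names are placeholders) gives a first guess at how much of an update is really new data.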

Good luck with Gutenberg!

happybeing · September 3, 2025, 4:01pm

Also, [puts down his PIMMs], Colony will soon have dweb built in so you’d be able to search and view Wikipedia with that alone.

zettawatt · September 3, 2025, 4:09pm

I’m working on adding anttp as well

Any updates on where you are with the dweb library? I’ve been holding off on bolting it on until you give the go-ahead and have the crate released.

happybeing · September 3, 2025, 4:16pm

You should be fine with the branch I posted.

The changes I want to do before publishing won’t affect you: mainly eliminating and tidying code.

I was hoping to spend time on it this week but I might not be able to so I suggest you try the branch (see dweb topic). I’m really looking forward to seeing this too.

I think it’s pretty simple for you to do. A single call to start the server, which you can do from a Tauri command, and then dweb-open via REST or Rust API.

Toivo · September 3, 2025, 4:48pm

oh man, that’s a treasure for a sentence.

Traktion · September 3, 2025, 7:08pm

They’re just tar files with the last being a tarindex, so anyone is free to parse that format.
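To illustrate the general idea of indexing into a tar (not the actual tarindex layout, which isn't spelled out in this thread), here is a minimal sketch using Python's standard `tarfile` module. It records where each file's data starts and how long it is, so a single file can be read by byte range without scanning the whole archive. `build_index` and `read_range` are hypothetical helper names, and the in-memory dictionary is an assumption standing in for whatever the real index file contains.

```python
import io
import tarfile


def build_index(tar_bytes: bytes) -> dict[str, tuple[int, int]]:
    """Map each regular file in an uncompressed tar to (data offset, size)."""
    index = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        for member in tar:
            if member.isfile():
                # offset_data is where the file's bytes begin inside the tar
                index[member.name] = (member.offset_data, member.size)
    return index


def read_range(tar_bytes: bytes, offset: int, size: int) -> bytes:
    """Fetch one file's bytes directly from its recorded offset and size."""
    return tar_bytes[offset:offset + size]
```

With an index like this, a client only needs to fetch the byte range for the file it wants, which is what makes a plain tar workable for browsing.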

I could break out the main code into a library, if there is interest. However, I don’t think anyone has used chunk_streamer lib yet and I’m aware maidsafe are working on some sort of archive format too (and streaming). So, don’t want to duplicate effort, etc.

Tarchive support will remain in AntTP regardless though!
