Public Datasets on Safe Network

The above message was sponsored by the Cats Protection League


It's not sexy

Data is sexy <3


Lots of LinkedData; here's a new one:


Let's put Anna's Archive on the network.


This just turned into a priority. Obviously we can't do it yet, because nobody has tokens… but what a way to seed the network!


Millions of research papers at risk of disappearing from the Internet

An analysis of DOIs suggests that digital preservation is not keeping up with burgeoning scholarly knowledge.

By Sarah Wild


A study identified more than two million articles that did not appear in a major digital archive, despite having an active DOI. Credit: Anna Berkut/Alamy

More than one-quarter of scholarly articles are not being properly archived and preserved, a study of more than seven million digital publications suggests. The findings, published in the Journal of Librarianship and Scholarly Communication on 24 January[1], indicate that systems to preserve papers online have failed to keep pace with the growth of research output.

https://archive.is/0lT15#selection-975.0-1185.110


I had never heard of this site before. Thanks for sharing. 873 TB, that's a lot of nanos.

Well, I wouldn't think any single user would pay for that… It depends on the cost of data, I guess lol… That's yet to be known.

Wikipedia is only 20GB and they provide torrents:

cc @aatonnomicc @neo


OMG, I'm thinking of combining that with a tarchive upload :slight_smile:


Once the data is up, it would be good to put a web front end on it for viewing, then search, etc. I'm not sure whether any of the current projects, such as Colony (@zettawatt), would be good for this. It would be nice for it to be viewable as a straight website, which I assume the data is already set up for.

In which case, uploading with tarchive might be premature, because uploading the files to an Archive should allow browsing with dweb right off the bat. Unless @Traktion has something ready for this?

Just uploading the data is good, but having it available as a demo app or website would be powerful marketing, and not just for early adopters.

Once dweb is an app, everyday folk will be able to browse websites that are published using Archive. And if you upload with dweb they will have versioning too, which for Wikipedia would be a very-cool-indeed-showcase.


Even better, just get Wikipedia to use Autonomi for their backend storage for the normal web version but have it accessible via dweb as well?


That's what Colony was built to do :smiley: If someone uploads it, I can index it for searching. Whether it's the dweb app, the Colony app, the anttp proxy, etc., the user still has to install an app somewhere.

FYI, I'm still rsyncing the full Project Gutenberg archive. I'm at book number 63000 of 76000. Once done, I'm going to upload all of the books to Autonomi and index them in Colony for searching. Pretty much every public domain book will then be there, and it will be the stress test of how the local search engine copes with large datasets.


That’s very cool. I can’t wait, but I’m used to it :laughing:

One more thing about using Archive for Wikipedia: it may be that not many files change in each update, in which case, after the first 20GB, updates will be cheap. That depends on how Wikipedia generates its files, though. It would be good to test it out.

Good luck with Gutenberg!
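To make the "updates will be cheap" reasoning concrete, here is a minimal sketch of how the cost of an update could be estimated. It assumes the network deduplicates identical content, so only files whose content hash has not been seen before need uploading. Real Autonomi uploads are split into chunks, so deduplication happens at chunk level; this sketch simplifies to whole files, and `plan_update` is a hypothetical helper, not part of any Autonomi or dweb API.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Address content by its SHA-256 digest (a stand-in for real chunk addressing)."""
    return hashlib.sha256(data).hexdigest()


def plan_update(old_files: dict[str, bytes], new_files: dict[str, bytes]) -> tuple[list[str], int]:
    """Hypothetical: return the paths whose content is new, plus the bytes that would need uploading.

    Assumes identical content is stored once, so unchanged files (or files whose
    content already exists under another path) cost nothing to re-upload.
    """
    known = {content_hash(data) for data in old_files.values()}
    to_upload: list[str] = []
    total_bytes = 0
    for path, data in new_files.items():
        digest = content_hash(data)
        if digest in known:
            continue  # already stored: deduplicated
        known.add(digest)  # also dedupe repeats within the new snapshot
        to_upload.append(path)
        total_bytes += len(data)
    return to_upload, total_bytes
```

For example, if only one article changed between two dumps, `plan_update` returns just that path, and the update cost is the size of that one file rather than the full 20GB.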


Also, [puts down his Pimm's], Colony will soon have dweb built in, so you'd be able to search and view Wikipedia with that alone. :exploding_head:


I’m working on adding anttp as well :smiley:

Any updates on where you are with the dweb library? I've been holding off on bolting it on until you give the go-ahead and the crate is released.


You should be fine with the branch I posted.

The changes I want to make before publishing won't affect you: mainly eliminating and tidying code.

I was hoping to spend time on it this week, but I might not be able to, so I suggest you try the branch (see the dweb topic). I'm really looking forward to seeing this too.

I think it’s pretty simple for you to do. A single call to start the server, which you can do from a Tauri command, and then dweb-open via REST or Rust API.


:rofl: oh man, that’s a treasure for a sentence.


They're just tar files with the last entry being a tarindex, so anyone is free to parse that format.

I could break out the main code into a library, if there is interest. However, I don’t think anyone has used chunk_streamer lib yet and I’m aware maidsafe are working on some sort of archive format too (and streaming). So, don’t want to duplicate effort, etc.

Tarchive support will remain in AntTP regardless, though!
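Since a tarchive is described as a plain tar file whose last entry is a tarindex, a minimal sketch of parsing one with Python's standard `tarfile` module might look like the following. The thread does not describe what the tarindex contains, so the index written here (one file name per line) is a placeholder, not the real format, and `build_demo_tarchive`, `read_tarchive` and `read_member` are hypothetical helpers. The sketch records each member's data offset (CPython's `TarInfo.offset_data`) to show why an index is useful: a reader can fetch a single file by byte range instead of downloading the whole archive.

```python
import io
import tarfile


def build_demo_tarchive(files: dict[str, bytes]) -> bytes:
    """Hypothetical: pack files into a tar and append a placeholder index as the LAST member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
        # Placeholder index: one file name per line (NOT the real tarindex layout).
        index = "\n".join(files).encode()
        info = tarfile.TarInfo("tarindex")
        info.size = len(index)
        tar.addfile(info, io.BytesIO(index))
    return buf.getvalue()


def read_tarchive(blob: bytes) -> tuple[dict[str, tuple[int, int]], bytes]:
    """Return {name: (data_offset, size)} for the content members, plus the raw index bytes."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:") as tar:
        members = tar.getmembers()
        if not members:
            raise ValueError("empty archive")
        *content, index_member = members  # the last entry is treated as the index
        index_file = tar.extractfile(index_member)
        index_bytes = index_file.read() if index_file else b""
        offsets = {m.name: (m.offset_data, m.size) for m in content if m.isfile()}
    return offsets, index_bytes


def read_member(blob: bytes, offsets: dict[str, tuple[int, int]], name: str) -> bytes:
    """Read one file by byte range, as a client could do with a ranged fetch."""
    start, size = offsets[name]
    return blob[start:start + size]
```

Usage: `offsets, index = read_tarchive(build_demo_tarchive({"a.txt": b"hello"}))`, then `read_member(blob, offsets, "a.txt")` slices out just that file's bytes.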
