We need someone to download an entire Russian-language website consisting of millions of individual pages, preferably in JSON format:
[login to view URL]
Example: [login to view URL]
Russian speaker preferable (the site is in Russian, and we require the JSONs to contain the original Russian text, not an English translation).
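Whether the Russian text appears readable in the output depends largely on how the JSON is serialized. Here is a minimal Python sketch, assuming the standard `json` module is used. By default `json.dumps` escapes every non-ASCII character as a `\uXXXX` sequence. With `ensure_ascii=False`, the Cyrillic is written as-is. The record and its field names are made up for illustration and do not reflect the site's actual schema.

```python
import json


def serialize(record: dict) -> str:
    """Serialize one record, keeping Cyrillic characters unescaped."""
    return json.dumps(record, ensure_ascii=False)


# Hypothetical record; field names are illustrative only.
record = {"id": 1, "title": "Пример документа"}

print(json.dumps(record))  # default: Cyrillic escaped as \uXXXX sequences
print(serialize(record))   # {"id": 1, "title": "Пример документа"}
```

When these strings are written to disk, the file should be opened with `encoding="utf-8"` so the Cyrillic is stored correctly.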
1. JSONs for all files from [login to view URL]
a. We need the JSONs for all document IDs on the website. We expect upwards of 100 million individual JSON entries.
b. Ideally, there should be some form of validation to confirm that we are getting the entire site rather than an incomplete portion.
2. The code used to make requests to the API
a. This can be delivered at the end of the project.
b. We may need some explanation of the code so that we can use it later to fill in missing entries or re-download corrupted files.
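For the validation in 1b and the gap-filling in 2b, one possible check is to compare the IDs actually saved against the expected ID space and to flag lines that do not parse. This is a minimal Python sketch under two hypothetical assumptions: document IDs are consecutive integers, and batches are stored as JSON Lines (one object per line) with an `id` field. The posting does not state the site's real ID scheme. The returned ranges are what a re-download run would need to request again.

```python
import json
from typing import Iterable, List, Optional, Set, Tuple


def scan_batch(lines: Iterable[str]) -> Tuple[Set[int], int]:
    """Return the document IDs found in a JSON Lines batch and the number of unparsable lines.

    Hypothetical layout: one JSON object per line, each with an integer "id" field.
    """
    ids: Set[int] = set()
    corrupted = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            ids.add(int(json.loads(line)["id"]))
        except (ValueError, KeyError, TypeError):
            # Truncated/invalid JSON (JSONDecodeError is a ValueError) or no usable "id".
            corrupted += 1
    return ids, corrupted


def missing_ranges(expected: range, found: Set[int]) -> List[Tuple[int, int]]:
    """Collapse IDs from `expected` (assumed step 1) that are absent from `found` into inclusive ranges."""
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    prev: Optional[int] = None
    for doc_id in expected:
        if doc_id in found:
            continue
        if start is not None and prev is not None and doc_id == prev + 1:
            prev = doc_id  # extend the current gap
        else:
            if start is not None and prev is not None:
                ranges.append((start, prev))  # close the previous gap
            start = prev = doc_id  # open a new gap
    if start is not None and prev is not None:
        ranges.append((start, prev))
    return ranges


batch = [
    '{"id": 1, "title": "Документ"}',
    '{"id": 2, "title": "Документ"}',
    '{"id": 5, "tit',  # truncated line, e.g. from an interrupted write
    '{"id": 6, "title": "Документ"}',
]
found, bad = scan_batch(batch)
print(sorted(found), bad)                  # [1, 2, 6] 1
print(missing_ranges(range(1, 9), found))  # [(3, 5), (7, 8)]
```

ID 5 appears only in a corrupted line, so it is reported as missing and would be fetched again. This is the intended behaviour for re-downloading corrupted entries.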
Method of Delivery:
- Shared cloud storage to be agreed upon. JSON files should be split into batches of between 500MB and 1GB for easier processing.
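Splitting the output into 500MB to 1GB files can be done while writing: keep appending records to the current file, and start a new one whenever the next record would push it past the cap. This is a minimal Python sketch that assumes JSON Lines output, a 1 GB cap, and a hypothetical `batch_00001.jsonl` naming scheme. None of these are specified in the posting. Sizes are measured in encoded UTF-8 bytes, because each Cyrillic letter takes two bytes.

```python
import json
import tempfile
from pathlib import Path
from typing import BinaryIO, Iterable, List, Optional

MAX_BATCH_BYTES = 1_000_000_000  # assumed cap: ~1 GB, the upper end of the requested range


def write_batches(records: Iterable[dict], out_dir: Path,
                  max_bytes: int = MAX_BATCH_BYTES) -> List[Path]:
    """Write records as UTF-8 JSON Lines, starting a new file before one would exceed max_bytes.

    A single record larger than max_bytes still ends up alone in its own file.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    handle: Optional[BinaryIO] = None
    written = 0
    try:
        for record in records:
            # Measure encoded bytes, not characters: Cyrillic is 2 bytes per letter in UTF-8.
            line = (json.dumps(record, ensure_ascii=False) + "\n").encode("utf-8")
            if handle is None or written + len(line) > max_bytes:
                if handle is not None:
                    handle.close()
                # Hypothetical naming scheme: batch_00001.jsonl, batch_00002.jsonl, ...
                path = out_dir / f"batch_{len(paths) + 1:05d}.jsonl"
                handle = path.open("wb")
                paths.append(path)
                written = 0
            handle.write(line)
            written += len(line)
    finally:
        if handle is not None:
            handle.close()
    return paths


# Small demo with a tiny limit so the rollover is visible.
demo_records = [{"id": i, "title": "Документ"} for i in range(10)]
with tempfile.TemporaryDirectory() as tmp:
    demo_paths = write_batches(demo_records, Path(tmp), max_bytes=100)
    demo_sizes = [p.stat().st_size for p in demo_paths]
    demo_roundtrip = [json.loads(row) for p in demo_paths
                      for row in p.read_text(encoding="utf-8").splitlines()]
print(len(demo_paths), demo_sizes)  # every size is at most 100 bytes
```

A new file is only started when the next record would overflow the current one. With a 1 GB cap, every file except the last therefore ends within one record of the cap. The 500MB lower bound holds as long as individual records are far smaller than 500MB.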
OK, as we agreed. I will write the script and upload the data to a file-sharing service such as Dropbox, uploading each chunk as soon as it is ready. Thanks, let's do great work!