Hello friends! I need an expert opinion on the following problem.
For a university project I need to find a solution for the following tasks:
- I need to scrape Google search results: 100,000 keywords, with the first 50 results (5 pages) per keyword exported (only the URLs and the domain's ranking for that keyword).
- Then I have to check the PageRank (PR) of all 5,000,000 resulting websites (duplicates included, first page only).
- After that I need the HTML code of all 5,000,000 websites (duplicates included, first page only).
- After the job finishes, the results should be exported (txt, csv, ...) so I can analyze them elsewhere.
- The job should run automatically and periodically without user interaction (24/7 would be ideal).
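In case it helps clarify what I have in mind for the Java route, here is a minimal sketch of the fetch-HTML and CSV-export steps (class and method names are my own, and the proxy host/port are placeholders, not a recommendation):

```java
import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

public class SerpFetcher {

    // Fetch one page's HTML through an HTTP proxy.
    // proxyHost/proxyPort are placeholders for whatever proxies end up being used.
    static String fetchHtml(String url, String proxyHost, int proxyPort) throws IOException {
        Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort));
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection(proxy);
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        }
    }

    // Build one CSV line: keyword, rank, URL (text fields quoted, embedded quotes doubled).
    static String csvLine(String keyword, int rank, String url) {
        return "\"" + keyword.replace("\"", "\"\"") + "\"," + rank
                + ",\"" + url.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // Example of the export format only; the actual fetch needs a live proxy.
        System.out.println(csvLine("my keyword", 3, "http://example.com/"));
    }
}
```

Running the crawl itself would then just be a scheduled loop (e.g. a cron job or a `ScheduledExecutorService`) that calls `fetchHtml` per URL and appends `csvLine` output to a file.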
QUESTIONS
* Is this possible with ScrapeBox, or can you recommend other tools? I could also write my own Java program for this purpose.
* Should I use private or public proxies for this task? How many proxies would I need?
Thanks in advance for your expert opinion.
tinkerbellO