wget usage

Started by walkingcarcass January 22, 2007 09:05 AM

0 comments, last by gumpy 17 years, 11 months ago

Author

116

January 22, 2007 09:05 AM

Hi. I'm trying to download the following from a website:

http://site.com/features/X/a.html
http://site.com/features/X/b.html
http://site.com/features/X/c.html
http://site.com/features/Y/d.html
http://site.com/features/Z/e.html
etc.

I also want to download the embedded images and everything these pages link to, but without following links outside site.com/features/*. The man pages for wget led me to try this:

Quote: wget -r -l2 http://site.com/features

But this simply gets site.com/features/index.html and site.com/robots.txt. Their robots.txt disallows several areas of the site, but not features/.

I also have a text file listing

http://site.com/features/X/
http://site.com/features/Y/
http://site.com/features/Z/
etc.

but using -i list.txt still just gets files called index.html.

Finally, I have an HTML file which links to one file in each of /features/*/, but if I use recursive downloading like this:

Quote: wget -x -r -l2 --convert-links -p -Dsite.com pagewithlinks.html

Then it tries to download masses of stuff from elsewhere on the site, and using -Dsite.com/features causes only pagewithlinks.html to be downloaded. I'm betting the last approach is the most promising. Does anyone know how to restrict recursive downloading to links within site.com/somedirectory/* ?

spraff.net: don't laugh, I'm still just starting...

gumpy

795

January 22, 2007 10:20 AM

i haven't done this in ages, but try something like:

wget -r -p -np -l 2 -k -x http://whatever.com/whatever.htm

the -np is "no parent" and the -k is short for --convert-links.
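applied to the site.com/features case, a rough sketch might look like the script below. two things worth knowing: -D takes domain names, not paths, so -Dsite.com/features won't match anything, and -np only works as expected when the start URL ends in a slash (without it wget treats "features" as a file name). -I (--include-directories) is another way to limit recursion to a directory. the helper name features_cmd and the RUN_WGET switch are made up for this example, and i haven't tested it against the actual site:

```shell
#!/bin/sh
# Hypothetical helper: builds a wget call limited to one directory.
# site.com and /features/ are the placeholder names used in this thread.

features_cmd() {
    url=$1    # start URL; should end in "/" so -np treats it as a directory
    dir=$2    # directory for -I, e.g. /features
    # -r -l 2 : recurse two levels     -np : never ascend above the start directory
    # -p      : fetch page requisites  -k  : convert links for local viewing
    # -x      : force a directory tree -I  : only follow links into the listed directories
    echo "wget -r -l 2 -np -p -k -x -I $dir $url"
}

cmd=$(features_cmd "http://site.com/features/" "/features")
echo "$cmd"

# Only touch the network when explicitly asked to (RUN_WGET is an assumption of this sketch).
if [ "${RUN_WGET:-0}" = "1" ]; then
    $cmd    # unquoted on purpose so the string splits into separate arguments
fi
```

run it as-is to see the command it would use, or with RUN_WGET=1 to actually download. for the list.txt approach the same flags should apply, with -i list.txt in place of the single URL.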

This space for rent.
