the other day i wanted to speed up the annoying task of repetitive downloads from a website. i thought, since machines are built for completing repetitive tasks fast, my computer should be able to download some files on its own.
after launching wget with the corresponding parameters
wget -r -l 1 -H -nc http://www.thewebsite.com/index.html
i noticed that wget would fail because of the design of the website: when you arrive at the index page, there is a pull-down menu in a (hidden) form, which lets you choose an entry. every entry redirects you to the file it has as its value:
<form name="jump">
<select name="menu" onChange="location=document.jump.menu.options[document.jump.menu.selectedIndex].value;" value="GO">
<option value="first.html">first</option>
<option value="second.html">second</option>
the files you are redirected to have the interesting contents i wanted to download automatically, but wget does not recognize the value attribute of an option element as a link, and thus does not follow it.
so i had to prepare an html file that wget can understand as well. i copied the options part of the source code into my favourite text editor and replaced each tag like
<option value="first.html">first</option>
with a link following this schema:
<a href="http://www.thewebsite.com/first.html">first</a>
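for a long menu, the same replacement could probably be scripted instead of done by hand. the following is only a rough sketch with sed, not what i actually did: the function name options_to_links and the BASE variable are made up for illustration, and it assumes that every option tag sits on its own line and looks exactly like the ones shown above.

```shell
# hypothetical helper, not part of wget: rewrite option tags as plain links
# assumption: BASE is the address the option values are relative to
BASE="http://www.thewebsite.com"

options_to_links() {
  # reads html on stdin, writes one <a> link per matching line to stdout
  # -n together with the trailing p prints only lines where the substitution matched,
  # so the <form> and <select> lines are dropped
  # -E enables extended regular expressions (GNU and BSD sed)
  sed -n -E "s|.*<option value=\"([^\"]*)\">([^<]*)</option>.*|<a href=\"$BASE/\1\">\2</a>|p"
}

# usage (options.txt is a hypothetical file holding the copied source code):
# options_to_links < options.txt > ~/Sites/mylinkslist.html
```

anything that deviates from this pattern (extra attributes, single quotes, several options on one line) would be skipped silently, so it is worth checking the result before feeding it to wget.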
i saved the modified code to ~/Sites/mylinkslist.html and invoked wget with the following arguments:
wget -r -l 1 -H -nc http://localhost/mylinkslist.html
this worked perfectly fine and wget downloaded all the files i wanted automatically.
the wget arguments briefly explained:
-nc, --no-clobber      do not download a file that already exists locally
-r,  --recursive       follow the links on a webpage
-l,  --level=NUMBER    how many link levels to follow
-H,  --span-hosts      go to foreign hosts when recursive
some more useful wget arguments:
-A,  --accept=LIST     accepted file extensions, comma-separated
-np, --no-parent       do not ascend to the parent directory in recursive mode
-c,  --continue        resume a partially downloaded file
w¡nd0w$ users can find a win-ified version of wget here: http://gnuwin32.sourceforge.net/packages/wget.htm.
those who own a real computer (i.e. a mac) can use fink or apt-get to install wget on their system.