
Other Need to download every file from a website (offline browser?)

Discussion in 'Software' started by ShakeyJake, 21 Apr 2021.

  1. ShakeyJake

    ShakeyJake My name is actually 'Jack'.

    Joined:
    5 May 2009
    Posts:
    840
    Likes Received:
    49
    Hello all.

    The school where I work uses a website whose owners have always stressed that all materials and resources on it are completely free to use. However, they are closing up shop soon. What I'd like to do is take our own copy of every file that they host. Doing this manually would take weeks: the site is huge and not neatly laid out (multiple clicks to get to one page that has one file, multiple clicks to the next, etc.). The resources are mostly PPT, PDF and videos in whatever format.

    Is there an automated way to do this? I'd thought about wget, or maybe an offline browser?

    I don't actually need the site itself (and wouldn't want it due to copyright/branding even on free materials), but I'd accept the site as collateral if it meant I could get to the resources, and I'd just delete the HTML files afterwards. I have at my disposal a Windows laptop, several Linux desktops and a Linux server (that can be accessed from any machine or even directly) with plenty of free space.

    Thanks,
    Jack
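
For anyone who would rather script this than use an offline browser, here is a rough Python sketch of the "keep the resources, skip the HTML" idea from the post above. It only sorts the links found on one page into resource files and further pages to visit; it does not download anything. The extension list (including `.mp4` for "videos in whatever format") and the names `LinkCollector` and `split_links` are assumptions made for illustration, not anything from the site in question.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath

# Assumed set of resource types; adjust to whatever the site actually hosts.
RESOURCE_EXTENSIONS = {".pdf", ".ppt", ".pptx", ".mp4"}


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they were found on.
                self.links.append(urljoin(self.base_url, value))


def split_links(html, base_url):
    """Return (resources, pages): files worth keeping vs. pages to crawl next."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    resources, pages = [], []
    for link in collector.links:
        # Look only at the path, so query strings do not hide the extension.
        ext = posixpath.splitext(urlparse(link).path)[1].lower()
        if ext in RESOURCE_EXTENSIONS:
            resources.append(link)
        else:
            pages.append(link)
    return resources, pages
```

A crawler built on this would fetch each URL in `pages`, call `split_links` again on the result, and save only what lands in `resources`, so no HTML needs deleting afterwards.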
     
  2. faugusztin

    faugusztin I *am* the guy with two left hands

    Joined:
    11 Aug 2008
    Posts:
    6,941
    Likes Received:
    266
  3. GaryP

    GaryP RIP Tel

    Joined:
    31 Aug 2009
    Posts:
    4,957
    Likes Received:
    509
    Seconded. Have used HTTrack a few times for manuals and plans for a local building firm.
     
  4. wyx087

    wyx087 Homeworld 3 is happening!!

    Joined:
    15 Aug 2007
    Posts:
    11,144
    Likes Received:
    371
    +1 for HTTrack Website Copier

    Used it at uni many years ago to download intranet materials at the time of my graduation. It was just a main notes page that linked to simple HTML pages by lecturers, which in turn linked to PDFs/PPTs. I think I just pointed it at the main notes index and let it run.

    The finished product was an index.html for the main notes page (the starting page), and it functioned identically to the intranet. All sub-directory links pointed to my own copy, which then pointed to the PDFs it had downloaded.

    Anything that's hidden behind log-ins won't be downloaded. Anything above the starting page in the URL hierarchy wasn't saved.
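
To make the "nothing above the starting URL" behaviour concrete, here is a small Python sketch of that kind of scope rule: a link is followed only if it is on the same host and sits at or below the directory of the starting page. This illustrates the rule as described above and is not HTTrack's actual implementation; the function name `in_scope` and the example URLs are made up.

```python
from urllib.parse import urlparse


def in_scope(start_url, candidate_url):
    """True if candidate_url is on the same host as start_url and lives
    at or below the directory that contains the starting page."""
    start = urlparse(start_url)
    candidate = urlparse(candidate_url)

    # A different scheme or host is always out of scope.
    if (start.scheme, start.netloc) != (candidate.scheme, candidate.netloc):
        return False

    # Directory of the starting page, e.g. "/notes/index.html" -> "/notes/".
    base_dir = start.path[: start.path.rfind("/") + 1] or "/"

    # An empty path (bare host) is treated as the site root.
    return (candidate.path or "/").startswith(base_dir)
```

For example, starting from `https://uni.example/notes/index.html`, a link to `https://uni.example/notes/week1/slides.pdf` is in scope, while `https://uni.example/admin/` is not. If the files you want live outside the starting directory, you would need to start higher up or loosen the rule.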
     
  5. ShakeyJake

    ShakeyJake My name is actually 'Jack'.

    Joined:
    5 May 2009
    Posts:
    840
    Likes Received:
    49
    Awesome, thanks all. Got HTTrack running on a spare PC now. It's working great, but unfortunately nothing seems to make it run any faster. Too many small files, I suspect.

    Shame we don't do rep any more but consider this a thank you!
     
