Other Scanning Old Magazines - a Worklog

Discussion in 'Photography, Art & Design' started by Gareth Halfacree, 3 Mar 2025.

  1. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    18,625
    Likes Received:
    9,228
    Wanted to keep some notes about my ongoing process of digitising old magazines for upload to the Internet Archive, and figured I might as well do it somewhere public.

    Current equipment:
    • Desktop (Ryzen 2700X, 32GB, 1TB SSD/6TB HDD, Ubuntu 20.04)
    • Fujitsu Fi-7260 (A4 document scanner with dual-element automatic document feeder)
    • Fujitsu SP-1425 (A4 flatbed scanner with dual-element automatic document feeder)
    • Other SP-1425 (my old one, has a bad element in the ADF which stripes the scan)
    • CZUR ET24 Pro (webcam-onna-stick billed as a "book scanner," kinda crap)
    • Ruler, chopping board, sharp knife
    • The GIMP (bitmap image editor)
    • Simple Scan (multi-page document scanner)
    • Scan Tailor Experimental (Linux port of a Windows tool for tweaking book scans)
    • ImageMagick (open-source image handling toolkit)
    • moz-jpeg (Mozilla's JPEG library)
    • Tesseract OCR (optical character recognition)
    • pdftk (CLI PDF toolkit)
    • bash (shell)
    The current workflow:
    1. Slice the spine off a magazine with a stack-cutter guillotine (previously: the ruler and knife).
    2. Separate the pages.
    3. If it's a PC Pro, it's got fold-outs and card inserts; they'll need separate handling.
    4. Load some pages into the ADF.
    5. Scan pages in Simple Scan, 300dpi full colour range.
    6. Realise the scans are skewed, curse, delete the pages, put the stack back in the ADF and try to get them to feed straight this time. Repeat as necessary, particularly with gloss onion-skin paper like PC Pro advertisers use for their catalogues.
    7. If you've now reached a fold-out or card insert, either slice into individual pages or scan on the flatbed.
    8. Continue 4-7 until you've run out of magazine.
    9. Scan the sliced-off spine in The GIMP.
    10. Process the images through ImageMagick to correct the poor output of the new scanner.
    11. Load images into Scan Tailor and run through the workflow there:
      1. Correct orientation if necessary.
      2. Set page split (in this case, it's whole-page).
      3. Correct for geometric distortion (normally just rotating, but sometimes more).
      4. Select content. This takes by far the longest time, 'cos the automatic content selection algorithm doesn't work on busy magazine pages.
      5. Remove all margins, set most pages to affine scaling except inserts/foldouts which are smaller than the rest of the pages, which get fixed-ratio scaling so they're not visibly distorted. Set left-hand pages to be right-aligned, and vice versa.
      6. Export modified images.
    12. You've now got a folder of 5-15GB of scan PNGs, 6-20GB of tweaked TIFFs. Process with a script I wrote that:
      1. Converts the TIFFs to JPEGs at 65% quality (looks great at 300dpi, keeps the filesize reasonable.) (ImageMagick)
      2. Optimises the JPEGs (shaves 5-10% off the filesize losslessly.) (moz-jpeg)
      3. OCRs the JPEGs and turns them into PDFs with a hidden but searchable and selectable text layer. (Tesseract OCR)
      4. Combines all the individual PDFs into one big PDF and deletes the temporary files. (pdftk)
    13. Repeat until out of magazines.
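    A rough sketch of what step 12 does per page, in Python rather than the bash the actual script is written in. The script itself isn't posted in this thread, so the helper names (`page_commands`, `merge_command`), the file naming, the `-l eng` language choice and the exact flags are assumptions for illustration, as is MozJPEG's `jpegtran` being on the PATH under that name:

```python
from pathlib import Path


def page_commands(tiff: Path, workdir: Path, quality: int = 65) -> list[list[str]]:
    """Return the argv lists for one page: TIFF -> JPEG -> optimised JPEG -> searchable PDF."""
    stem = tiff.stem
    raw_jpg = workdir / f"{stem}.raw.jpg"
    opt_jpg = workdir / f"{stem}.jpg"
    pdf_base = workdir / stem  # tesseract appends ".pdf" to the output base itself
    return [
        # ImageMagick: convert the Scan Tailor TIFF to a JPEG at the chosen quality.
        ["convert", str(tiff), "-quality", str(quality), str(raw_jpg)],
        # jpegtran (assumed to be the MozJPEG build): lossless re-optimisation.
        ["jpegtran", "-optimize", "-copy", "none", "-outfile", str(opt_jpg), str(raw_jpg)],
        # Tesseract: OCR the JPEG and write a PDF with a hidden text layer.
        ["tesseract", str(opt_jpg), str(pdf_base), "-l", "eng", "pdf"],
    ]


def merge_command(pdfs: list[Path], output: Path) -> list[str]:
    """Return the pdftk argv that concatenates the per-page PDFs in sorted name order."""
    return ["pdftk", *map(str, sorted(pdfs)), "cat", "output", str(output)]
```

    Sorting by name only gives the right page order if the files are zero-padded (page_001, page_002, ...), which is another assumption here.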
    The script is parallelised through GNU Parallel at each step, meaning it runs a job per page across as many CPU cores as I have - effectively linear scaling, until you hit memory and disk bandwidth limits. My 2700X has eight cores and 16 threads - so I run 16 workers, which I benchmarked as faster than eight but not twice as fast 'cos they're not real cores. I'd love a CPU with more cores, but even then you're only looking at about five minutes per magazine - it's the manual stuff that takes the time.
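    The job-per-page idea can be sketched in Python as well. This is not GNU Parallel: the standard library's thread pool stands in for it, which is fine here because the heavy lifting happens in the external programs. `run_page` and `process_pages` are made-up names, and defaulting to `os.cpu_count()` (logical CPUs, so 16 on a 2700X) just mirrors the worker count above:

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, Optional, Sequence


def run_page(commands: Sequence[Sequence[str]]) -> int:
    """Run one page's commands in order; return the first non-zero exit code, else 0."""
    for argv in commands:
        result = subprocess.run(argv)
        if result.returncode != 0:
            return result.returncode
    return 0


def process_pages(
    jobs: Iterable[Any],
    worker: Callable[[Any], Any] = run_page,
    max_workers: Optional[int] = None,
) -> list:
    """Run one job per page across a pool of workers, returning results in input order."""
    workers = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(worker, jobs))
```

    Each job is one page's list of commands, so a slow OCR on one page never holds up the others; only the final merge has to wait for everything.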
     
    Last edited: 1 Apr 2025
    Byron C likes this.
  2. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    18,625
    Likes Received:
    9,228
    Worklog:

    PC Pro Issue 16
    First one scanned from a pile I collected over the weekend. It's a tough one: 368 pages including the covers, several fold-outs, and the catalogue pages are tissue-thin.

    Found a problem with my cutting process: it ends up slicing the spine at an angle, so the back pages are smaller than the front pages. Only a few mill, but still. Some middle pages suffered during the process, too, and the advertisers don't respect the gutter so some are missing a letter or two from the beginning or end of lines. Annoying.

    pcp16.jpg

    Scaled everything with a fixed ratio. Final file is 440MB at 65% quality - 85% was 781MB(!), 70% was 480MB and looked identical to 65%.
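    To put numbers on that quality trade-off, a quick back-of-envelope in Python using only the figures above. The per-page number is approximate, since it simply divides by the 368-page count, and the helper name is made up:

```python
PAGES = 368
SIZES_MB = {85: 781, 70: 480, 65: 440}  # JPEG quality -> final PDF size in MB


def saving_percent(from_quality: int, to_quality: int) -> float:
    """Percentage of file size saved by dropping from one quality setting to another."""
    before, after = SIZES_MB[from_quality], SIZES_MB[to_quality]
    return (before - after) / before * 100


for quality, size in SIZES_MB.items():
    print(f"{quality}%: {size} MB total, {size / PAGES:.2f} MB per page")
print(f"85% -> 65% saves {saving_percent(85, 65):.1f}%")
print(f"70% -> 65% saves {saving_percent(70, 65):.1f}%")
```

    That works out at roughly 44% smaller than the 85% file and about 8% smaller than the 70% one, at around 1.2MB a page.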


    PC Pro Issue 17
    Second scan, bit larger than Issue 16. Even more foldouts, plus a Compaq bookmark with a ribbon on it.

    Had more difficulties cutting the spine on this one, and it's even more slanted. Switched from a sharpened folding knife to a Stanley with replaceable blades to see if that helps. Had real trouble getting the pages to feed straight, for some reason.

    pcp17.jpg

    Scaled using affine for most pages, to remove white borders. 426MB at 65%.


    PC Pro Issue 37 (WIP)
    Bloody hell: this one's the size of Issue 16 and Issue 17 put together. Bigger, even: 797 pages.

    Should not have tried to cut this whole: it went badly. Pages near the middle and end have lost the most material yet, though it again only really affects adverts and catalogue inserts. The cuts have gone wobbly towards the top edge of the middle pages, too. I blame myself entirely: I was rushing at this point, as I'd been messing around all day - mostly to prove to the missus that the magazines weren't going to live in the outbuilding forever. Too late now, but it's a lesson learned - sorry, Issue 37.

    Plan for the next one: do what I should have been doing all along, and slice a few pages out then remove them instead of leaving them in place until the entire spine is removed. Obviously, a guillotine would be better - but you find me a hand-powered one that'll do 300+ sheets neatly. A bandsaw would work, too, but I'm probably not to be trusted with a bandsaw.

    As more evidence I was getting tired and not thinking straight, I had to delete and re-scan about 140 pages, after re-sorting the discarded sheets, 'cos I had started putting the stacks in upside-down - so instead of scanning page 101-102-103 I was scanning 103-102-101. Quicker to sort by rescanning than trying to sort in Simple Scan!
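    For what it's worth, if a reversed batch had been saved as its own run of files, putting it back in order would only be a rename. A sketch in Python, assuming hypothetical zero-padded file names and that the upside-down stack came out as an exact reversal, as in the 103-102-101 example:

```python
def reversal_plan(names: list[str], start: int, end: int) -> dict[str, str]:
    """Map each file in names[start:end] to the name it should carry once the batch is un-reversed."""
    batch = names[start:end]
    # The first file holds the last page of the batch, the second the second-to-last, and so on.
    return {old: new for old, new in zip(batch, reversed(batch)) if old != new}


scans = ["scan_0101.png", "scan_0102.png", "scan_0103.png", "scan_0104.png"]
plan = reversal_plan(scans, 0, 3)
for old, new in plan.items():
    print(f"{old} -> {new}")
```

    Actually applying the plan would need to go via a temporary directory or temporary names, otherwise the first rename overwrites a file that still has to be moved.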

    upload_2025-3-3_10-57-58.png (!)

    Finished scanning this one late last night; still needs to be processed through Scan Tailor and turned into a PDF.


    PC Pro Issue 91
    Decided to try my new slowly-slowly cutting approach - and maybe do what I suggested below and focus on scanning the magazines before doing the processing, rather than taking a single magazine through to finished PDF before starting on the next.

    proproslice.jpg

    Wish me luck!

    EDIT: Oh, yeah, this is way better. I was an idiot for trying to do it all in one go before. The edges are clean as you like, and I'm minimising page loss. At least I only got through three before figuring that out!

    This went so well. I'm utterly thrilled. Sure, a few pages are a bit skewed - but the bulk went through the scanner practically ready to dump to a PDF.

    scanning.jpg

    327 pages, 6GB as PNGs. Onto the next!


    PC Pro Issue 126
    Later than my personal taste, but it's rude not to scan it if I've got it. Again, the new slicing process worked way better, and it fed through wonderfully.

    I got really confused, though, 'cos I seemed to be missing a bunch of pages in the middle. Went back to the paper... nope, somebody messed with the flatplan and didn't update the numbering: there's Page 228, then 17 pages of unnumbered catalogues, then it claims to be at Page 286. Bizarre. I wondered why I had fewer PNGs than numbered pages!
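    The sums behind that, as a quick Python check. It assumes the first page after the 17 unnumbered ones is the one printed as 286; the function name is made up:

```python
def numbering_offset(last_numbered: int, unnumbered: int, next_printed: int) -> int:
    """How far the printed page number runs ahead of the physical page position after the gap."""
    physical_position = last_numbered + unnumbered + 1
    return next_printed - physical_position


offset = numbering_offset(last_numbered=228, unnumbered=17, next_printed=286)
print(offset)  # 40
```

    So from that point on the printed numbers run 40 ahead of the physical page count, which accounts for having fewer PNGs than numbered pages.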

    EDIT: Scanned, went well apart from the confusing page numbering. Oh, and an Acer tri-spread fold-out, which was SUPER ANNOYING 'cos my scanner isn't that big. Sliced it up and scanned it in three parts (so six, 'cos double-sided) and my scanner isn't accurate enough for the parts to actually line up again afterwards. Good enough, but I'm not happy with it. That'll do for tonight, I think. 294 pages plus two oversized pages for the Acer spread, 4.2GB.


    PC Pro Issue 34
    The new cutting process continues to deliver. However, Software Warehouse remains the bane of my life: they pay for, like, a hundred-page catalogue, but print it on the cheapest possible onion-skin paper. Today the inevitable happened: the paper jammed in the ADF and crumpled. Twice. Had to do it ten sheets at a time to get it to feed. Flattened the crumpled pages out and put them on the flatbed - they're wrinkled but readable.

    684 PNGs, 10.8GB.



    PC Pro Issue 32
    Had to replace the scanner's page separator - it's not designed for this kind of volume. Even then, this was a nightmare - could not get pages to scan straight. No crumpling or eating this time, thankfully. It's taken me, like, two soddin' hours just to cut and scan, though, I haven't even processed it. Gah.



    Misc. Notes
    What I should be doing, of course, is scanning all the magazines to PNG to get that job over and done with and then processing the PNGs into TIFFs and the final PDFs at my leisure. Trouble is, the scanning's the dull part; getting the PDF is the fun bit!


    Progress:
    • Issue 16: Done, small content loss at the edges of later catalogue pages.
    • Issue 17: Done, small content loss at the edges of later catalogue pages.
    • Issue 37: Scanned, larger content loss at the edges of later catalogue pages.
    • Issue 91: Scanned, new cutting technique, no content loss.
    • Issue 126: Scanned, no content loss.
    • Issue 34: Scanned, no content loss, two sheets of the Software Warehouse catalogue are crumpled due to ADF misfeeds - they cheaped out on the paper, it's waffer-theen.
    • Issue 88: Cleaned, scanned.
    • Issue 66: Cleaned, scanned, one Software Warehouse page got eated, cut's a little wobbly but again it only affects the catalogues who have never heard the term "gutter."
    • Issue 14: Cleaned.
    • Issue 15: Cleaned.
    • Issue 58: Cleaned.
    • Issue 32: Cleaned, scanned.
    • Issue 48: Cleaned.
    • Issue 28: Cleaned.
    • Issue 56: Cleaned.
    • Issue 31: Cleaned.
    • Issue 53: Cleaned.
    • Issue 27: Cleaned.
    • Issue 38: Cleaned.
    • Issue 49: Cleaned.
    • Issue 12: Cleaned.
    • Issue 65: Cleaned, scanned, trimmed a little too much off but it's fine.
    • Issue 47: Cleaned, scanned.
    • Issue 72: Cleaned.
    • Issue 73: Cleaned.
    • Issue 36: Cleaned, scanned, some water damage to the rear corner of later pages.
    • Issue 55: Cleaned.
    • Issue 24: Cleaned.
    • Issue 89: Cleaned, scanned.
    • Issue 90: Cleaned, scanned on the new scanner, tiny stripe on every other page owing to muck on one of the platens.
    • Issue 23: Cleaned.
    • Issue 75: Cleaned.
    • Issue 76: Cleaned.
    • Issue 29: Cleaned.
    • Issue 33: Cleaned.
    • Issue 74: Cleaned.
    • Issue 42: Cleaned.
    • Issue 116: Cleaned, scanned, recovered after the desktop crashed(!)
    • Issue 122: Cleaned, scanned.
    • Issue 60: Cleaned, scanned.
    • Issue 30: Cleaned.
    • Issue 51: Cleaned.
    • Issue 61: Cleaned.
    • Issue 21: Cleaned.
    • Issue 35: Cleaned.
    • Issue 21: Cleaned.
    • Issue 54: Cleaned.
    • Issue 44: Cleaned.
    • Issue 40: Cleaned.
     
    Last edited: 1 Apr 2025
  3. bawjaws

    bawjaws Multimodder

    Joined:
    5 Dec 2010
    Posts:
    4,633
    Likes Received:
    1,211
    What do you do with the sliced up magazines once scanned? Recycling bin?
     
  4. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    18,625
    Likes Received:
    9,228
    Yup. Breaks my heart - well, it's the taking the knife to 'em that breaks my heart - but needs must: better there's a usable scan on the Internet Archive for everyone to enjoy (or there will be, that's a job for Future Gareth) than an intact physical copy sat in the corner of my office for eternity.

    Archivists doing this will store the pages, but they're working with a lot more storage space! These particular mags aren't in the best condition, either: the good ones are mildewed, while the worst ones have visible mould growth that I'm going to have to sort out at some point.
     
    The_Crapman and bawjaws like this.
  5. bawjaws

    bawjaws Multimodder

    Joined:
    5 Dec 2010
    Posts:
    4,633
    Likes Received:
    1,211
    Yeah that is absolutely fair enough. I remember binning my near-complete collection of Custom PC (remember that?) and what sticks in my mind is a) how much shelf space the magazines took up and b) how bloody heavy a hundred or so magazines is!
     
    Gareth Halfacree likes this.
  6. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    18,625
    Likes Received:
    9,228
    Oh, my god. Why was I trying to cut through the entire magazine at once? What a tool. Cutting it gradually is easier, not really any slower 'cos I can combine it with the separating-pages step, and the edges are clean. Even the right-to-the-edge onion-skin catalogue pages have come out nice!

    I feel very stupid, and wish I could go back in time so I don't rip edges in the last three issues. Gah.
     
    IanW likes this.
  7. Yaka

    Yaka Multimodder

    Joined:
    26 Jun 2005
    Posts:
    2,490
    Likes Received:
    519
    how long does one magazine take to scan?
     
  8. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    18,625
    Likes Received:
    9,228
    The first one I scanned took about two and a half hours, start to finish - from paper to PDF.

    It's quicker now I'm doing a better job of it, though I haven't taken either of the latest scans through to PDF stage yet - just raw PNGs.

    I *could* do it a lot more quickly if I just went straight from scanner to PDF - but it looks so much nicer tidied up through Scan Tailor!
     
  9. Krikkit

    Krikkit All glory to the hypnotoad! Super Moderator

    Joined:
    21 Jan 2003
    Posts:
    24,091
    Likes Received:
    786
    What an endeavour, beautiful result though!
     
    Yaka and Gareth Halfacree like this.
  10. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    18,625
    Likes Received:
    9,228
    Cleaning! I've been avoiding this, 'cos grim, and concentrating on the better copies.

    Some of the magazines have visible mould growing on the page edges - like, fluffy white stuff, not just a few black spots. It's what happens when you keep papers in tightly-packed closed-backed bookcases for a decade or three: lack of air circulation combined with condensation equals mould.

    mould.jpg

    Now, I want to scan these. But I don't want to fill my office with mould spores. So, I had a crack at cleaning one:
    1. Take the magazine outside. I can't stress this step enough.
    2. Ensure magazine is dry. If it isn't, let it dry.
    3. Vacuum the magazine with a soft brush to remove all loose mould.
    4. Wipe the covers, spine, inside covers, and all three page edges with a cloth liberally splashed with isopropyl alcohol. Resist taking a swig; it's not that kind of alcohol.
    5. Allow the isoprop to evaporate.
    6. Check inside for any other growth.
    I now have an Issue 38 ("Biggest ever issue - over 700 pages!") which still smells of mildew but is *not* fuzzy. Happy with that!

    So, I may end up putting the scanning on hold in order to focus on cleaning the mags. I'm going to need a dry, preferably sunny, definitely not windy day for it, tho'...

    EDIT:
    Did seven and a supplement. Had to stop, though, 'cos it's gone dark and cold and my hands hurt.

    EDIT EDIT:
    This is one of the bad 'uns, post-treatment - Issue 88:

    pcpro88.jpg

    Happy with that! Still got a lot of work ahead of me, though - and that's before I get tempted to go visit @slugs again for the half I left behind...
     
    Last edited: 4 Mar 2025
    IanW likes this.
  11. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    18,625
    Likes Received:
    9,228
    Oh, and I was amused by this little bit of synchronicity:

    Imagepipe_0.jpg
    Not that I'm using any of the recommended software, mind!
     
    IanW, Arboreal and David like this.
  12. David

    David μoʍ ɼouმ qᴉq λon ƨbԍuq ϝʁλᴉuმ ϝo ʁԍɑq ϝμᴉƨ

    Joined:
    7 Apr 2009
    Posts:
    19,059
    Likes Received:
    7,989
    Really liking that you're doing this, G. I have fond memories of the OG PC Pro contributors' content - Jon Honeyball, Steve Cassidy, Tim Danton etc.

    I may take a trip down memory lane.

    [edit] I'll also have a hunt in the loft to see if any of my old mags survived the last couple of house moves. I used to have boxes of PC Pro and a lesser quantity of Computer Shopper, but don't hold your breath - this is a hail mary and I suspect they're long gone.
     
    IanW and Gareth Halfacree like this.
  13. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    18,625
    Likes Received:
    9,228
    Big cleaning session at lunch - though I ended up just brushing the mould off 'cos some idiot forgot to charge the vac.

    (It me.)

    I'm now up to 18 cleaned issues awaiting slicing and scanning, plus a thin supplement which I'm hoping will be mentioned in the contents page of one of 'em because it was loose and I have no idea which one it goes with. That's the whole of one of the bags, which now lets me properly estimate how many magazines are left: there's another bag of the same size, so that's another 18-ish, then a slightly larger bag of not-so-mouldy ones, call that 20, then two-thirds of another bag, so say 15. That's, what, 70 issues to scan? Fewer than I thought - it's hard to estimate 'cos the later ones are thin and the earlier ones are brick-thick!

    If I can keep up the rate of two a day, I'd be done in a little over a month.
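    The estimate, spelled out in Python. The bag labels are just descriptions of the four piles mentioned above, and `days_to_finish` is a made-up helper:

```python
import math

# Rough issue counts per bag, as estimated above.
bags = {
    "cleaned bag": 18,
    "second bag, same size": 18,
    "larger, not-so-mouldy bag": 20,
    "two-thirds of another bag": 15,
}


def days_to_finish(issues: int, per_day: int) -> int:
    """Whole days needed to get through the pile at a steady daily rate."""
    return math.ceil(issues / per_day)


total = sum(bags.values())
print(total, days_to_finish(total, per_day=2))
```

    That comes to 71 issues, so call it 70, and 36 days at two a day: a little over a month, as long as the rate holds.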
     
  14. yuusou

    yuusou Multimodder

    Joined:
    5 Nov 2006
    Posts:
    3,315
    Likes Received:
    1,399
    Remember to wear a mask. Don't want any of them nasties in your lungs.
     
  15. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    18,625
    Likes Received:
    9,228
    Yeah, I'm wearing an FFP3... now. Kinda didn't bother at the start, when I was concentrating on the mags without visible mould, and now I've got a bit of a cough. Funny that. I'm also running an air purifier in the office while I'm doing the slicing and scanning (literally sat on the same desk) - probably not that effective, but I had it already so it's no fuss to switch it on.
     
  16. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    18,625
    Likes Received:
    9,228
    BUGGERING SOFTWARE BUGGERING WARE BUGGERING HOUSE.

    softwarewarehouse.jpg

    Well, guess that page ain't getting scanned.
     
  17. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    18,625
    Likes Received:
    9,228
    Was having a 'mare trying to get this issue scanned: turns out the page separator in the scanner's going. It's a consumable part, and this poor Fujitsu was never built for this kind of volume.

    However, I have a lot of magazines left... and I've only got one more page separator. And I don't know where. And, of course, you can't buy 'em anywhere any more. Agh!

    (There's also one in the broken scanner, I guess - part-used, but it's there in a pinch.)
     
  18. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    18,625
    Likes Received:
    9,228
    Two chuffin' hours to slice and scan one issue. Remind me why I'm doing this to myself?
     
  19. David

    David μoʍ ɼouმ qᴉq λon ƨbԍuq ϝʁλᴉuმ ϝo ʁԍɑq ϝμᴉƨ

    Joined:
    7 Apr 2009
    Posts:
    19,059
    Likes Received:
    7,989
    You're a hopeless romantic?
     
    Gareth Halfacree likes this.
  20. Nealieboyee

    Nealieboyee Packaging Master!

    Joined:
    14 Aug 2009
    Posts:
    3,869
    Likes Received:
    511
    Gareth, why don't you use your phone to scan them with an app like Genius Scan? It automatically finds the edges or corners. I've scanned whole books like this and it's literally a case of paging through the magazine. Each page takes as long as the app needs to find the edges, which is about two seconds. Even if the page isn't flat, the app will flatten it for you.
     