
Other Scanning Old Magazines - a Worklog

Discussion in 'Photography, Art & Design' started by Gareth Halfacree, 3 Mar 2025.

  1. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    18,017
    Likes Received:
    8,045
    Issue 47 scanned, went smooth. Separated every page by hand, though of course none were stuck together this time. It's so tempting to just grab a nice neat stack instead of page-by-page then try to tidy up again... Gah.

    I am, however, concerned about the bigger issues. I've got some in the same bag that are twice the size of this one, and I was already using all 32GB of my actual RAM *and* 22GB of swap... The poor thing was working so hard that MOC died and my music stopped. (I thought the damn thing had crashed again!)

    I'm sorely tempted to upgrade to 64GB, but that's more money thrown at the problem - plus if I'm running out of RAM now on these issues, I'll still run out of RAM on the ones that are twice as thick. And 128GB seems... excessive, both in price and capacity.

    I wonder if Simple Scan supports batch-scanning, continuing an existing increment? Experiment time!

    EDIT:
    It does not. My options are "overwrite" or "cancel." Bah.

    I *could* do it myself: if I scanned PC Pro 47 in two batches, for instance, I could have called the first batch "pcpro047-0" and the second "pcpro047-1". Then Simple Scan would create the files "pcpro047-0-1.png", "pcpro047-0-2.png", and so on for the first batch, and "pcpro047-1-1.png", "pcpro047-1-2.png", and so on for the second. Which should load into Scan Tailor in the right order. Probably.
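One thing to watch with the two-batch naming: a plain alphabetical listing puts "-10" ahead of "-2", so whether the pages come out in the right order depends on how the loading program sorts. Here's a rough sketch for checking the intended order. It assumes GNU sort (for the -V version sort) and the file names described above; the function name is made up for illustration.

```shell
# list_pages_in_order DIR PREFIX
# Print PREFIX-*.png file names from DIR in numeric-aware order.
list_pages_in_order() {
    local f
    for f in "$1"/"$2"-*.png; do
        [ -e "$f" ] || continue      # skip the literal pattern if nothing matched
        printf '%s\n' "${f##*/}"     # strip the directory part
    done | sort -V                   # version sort: page 2 before page 10, batch 0 before batch 1
}
```

Calling `list_pages_in_order . pcpro047` in the scan directory prints one file name per line, which can be eyeballed or compared against what the next tool in the chain actually loaded.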
     
    Last edited: 1 Apr 2025
  2. Gareth Halfacree

    Okay, I'm fiddling with the processing again. Yes, with yet another ImageMagick one-liner. This time it's "convert -define trim:edges=east,west -fuzz 15% -trim -sigmoidal-contrast 5x40% +repage".

    What does that do? It does nice things... I think?
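For anyone wondering what each switch is for, here's the same set of options wrapped in a function and annotated. The option descriptions are my reading of the ImageMagick documentation. The function name and where the input and output file names go are assumptions, since the post only quotes the options.

```shell
# process_page IN OUT
# Trim the side borders, then apply an S-curve contrast boost.
#   -define trim:edges=east,west  restrict -trim to the left and right edges
#   -fuzz 15%                     treat colours within 15% of the border colour as a match
#   -trim                         cut away the matching border
#   -sigmoidal-contrast 5x40%     contrast strength 5, curve centred on 40%
#   +repage                       reset the canvas offset that -trim leaves behind
process_page() {
    convert "$1" -define trim:edges=east,west -fuzz 15% -trim \
        -sigmoidal-contrast 5x40% +repage "$2"
}
```

Usage would be something like `process_page pcpro047-001.png out/pcpro047-001.png`, with both file names being placeholders.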

    zz1.jpg
    zz2.jpg
    zz3.jpg
    zz4.jpg

    And that Matrox advert, the one where I'd lost the gridlines?

    zz5.jpg

    It's not as contrasty as before, but the gridlines are visible. That's not nothing!

    It's also more... honest about the pages, I think? Like...

    zz2-crop.jpg

    It's clean and bright, but there's still texture there. Mind you...

    zz2-crop-2.jpg

    ...poor Jon's still looking a little florid.
     
    IanW likes this.
  3. wyx087

    wyx087 Multimodder

    Joined:
    15 Aug 2007
    Posts:
    12,251
    Likes Received:
    804
    Hey there, what does it say about Homeworld? Can I have a read?
     
  4. Gareth Halfacree

    You're in luck: I've just shut up the office, but before I did I threw those pages into a PDF to test them on the Remarkable.

    It's just a short preview, but it's in 'ere if you're curious.
     
    wyx087 likes this.
  5. wyx087

    Thanks very much.

    Very impressive quality. Surprisingly, text select and search work, somewhat.
     
  6. Gareth Halfacree

    Yeah, I run them through Tesseract OCR as part of the scan-to-PDF workflow. It's not perfect, but good enough for keyword search. (Tends to ignore text on dark-coloured backgrounds, annoyingly - there might be a flag I can set to fix that, I need to dig into the docs.)
     
    wyx087 likes this.
  7. Gareth Halfacree

    Quick break from work for some more processing experiments. Nothing too smart this time, just dropping the centre point of the contrast pass from 40% to 30%:

    time.jpg

    Yes, it's a little paler - but I'm aiming for a more true-to-life Jon, as my benchmark:

    jon.jpg

    Getting there!
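Why does a lower centre point come out paler? A rough way to see it is to push one mid-grey value through the curve at both settings. The formula below is the sigmoidal contrast curve as I understand it from ImageMagick's usage notes (a logistic curve rescaled so black stays black and white stays white); treat it as an approximation of what convert does internally, not a copy of it. The function name is made up.

```shell
# sigmoidal U CONTRAST MIDPOINT
# Map an input level U (0..1) through a sigmoidal contrast curve.
sigmoidal() {
    awk -v u="$1" -v b="$2" -v a="$3" '
        function s(x) { return 1 / (1 + exp(b * (a - x))) }   # logistic curve centred on a
        BEGIN {
            # Rescale so that an input of 0 maps to 0 and an input of 1 maps to 1.
            printf "%.3f\n", (s(u) - s(0)) / (s(1) - s(0))
        }'
}

# The same 50% grey under the 40% and 30% centre points:
sigmoidal 0.5 5 0.40
sigmoidal 0.5 5 0.30
```

The 30% run maps the same mid-grey to a brighter value than the 40% run, which fits the paler result: everything above the centre point gets pushed towards white, and lowering the centre point puts more of the image above it.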

    EDIT:
    25%?

    jon2.jpg

    EDIT EDIT:
    4.5x25% makes the inside pages nice, but the cover does come out a little pale:


    cover.jpg

    Not *terribly* pale, but the red's certainly a shade or two lighter than it should be. But if I drop back down to 5x25% (or 5x30%, or my original 5x40%) then poor Jon's still very tomatoesque. Hmm.

    I could process the covers independently, but I'd rather not.
     
    Last edited: 3 Apr 2025
    wyx087 likes this.
  8. Gareth Halfacree

    Also experimenting with deskew at the same time. I don't like using it, as a general rule, 'cos it has a tendency to decide to rotate random pages diagonally for no reason, but if I could get it working reliably it'd save me so much time. The recommended setting is 40%; I'm doing a full-mag run at 25%, which I confirmed as correctly deskewing a slightly squiffy advert page. Fingers crossed!

    It's 36% of the way through now, reckons there's about 280 more seconds to go. Pretty computationally intensive - my CPU fan's letting me know it's unhappy with what I'm doing.

    7m4s, 660 pages. Let's have a look.

    Here's the squiffy page, straight from the scanner (I mean, scaled, obviously, the actual file's 12MB):

    quantexraw.jpg

    And here's the post-processed version:

    quantexprocessed.jpg

    Flat! Bright! Happy!

    Now, to page through 660 PNG files to find ones at weird angles...

    EDIT:
    Got to admit, this was one I figured it'd screw up:

    angled.jpg

    But no, that's right!

    EDIT EDIT:
    No errors! Nice straight pages! Huzzah!
     
    Last edited: 3 Apr 2025
    IanW likes this.
  9. Gareth Halfacree

    Disaster! (Well, not really, but it's annoying.)

    There's a flaw in my scan-to-PDF script. It works absolutely fine, right up until the very end when it uses pdftk to merge all the individual PDFs Tesseract created into one big PDF. This, naively, is done thusly:

    pdftk *pdf cat output output.pdf

    If you know shell scripting, you'll know that "*pdf" is a shell expansion which means "every file in the current directory ending in the letters pdf". This has always worked absolutely fine for me, no notes, 10/10. But... there's a limit to how many files you can squeeze into a single command line. I hadn't hit it before, as I'd only scanned relatively small mags. 660 pages, though? Yeah, that's too many files. The script just halts at that point.

    So, I now need to find a way to merge all the PDFs, in the correct order, without hitting the file limit. Boo-urns.


    EDIT:
    Turns out pdftk is *fine* with 660 files at once. So long as one of the files isn't zero bytes, 'cos then it'll choke on it.

    No idea why Tesseract threw a wobbler on that one, the input's no different to any other.
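A cheap guard against this is to look for empty PDFs before handing the lot to pdftk. This is only a sketch: the function name is made up, and it assumes the per-page PDFs all sit in one directory, as the `*pdf` glob above implies.

```shell
# check_pdfs DIR
# Print every zero-byte PDF in DIR; return non-zero if any were found.
check_pdfs() {
    local f bad=0
    for f in "${1:-.}"/*.pdf; do
        [ -e "$f" ] || continue        # glob matched nothing
        if [ ! -s "$f" ]; then         # -s is true only for files larger than zero bytes
            printf 'empty: %s\n' "$f"
            bad=1
        fi
    done
    return "$bad"
}

# Example: only merge when every page made it through OCR.
# check_pdfs . && pdftk *pdf cat output output.pdf
```

That way the script stops with a list of the pages that need re-running instead of stalling at the merge step.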
     
    Last edited: 3 Apr 2025
  10. yuusou

    yuusou Multimodder

    Joined:
    5 Nov 2006
    Posts:
    3,182
    Likes Received:
    1,259
    Something like this?
    Code:
    find . -maxdepth 1 -name "*.pdf" ! -name output.pdf | sort | xargs -n 99 bash -c '[ -f output.pdf ] && set -- output.pdf "$@"; pdftk "$@" cat output merged.tmp && mv merged.tmp output.pdf' _
    EDIT: just saw your edit. I thought you'd hit an argument limit.
     
  11. Gareth Halfacree

    Yeah, that's what I thought, too, but nope!

    There were actually *two* issues with that run, and I have no idea what caused either of 'em. The most obvious was when the script hung, which turns out to have been because Tesseract created one zero-byte output file which pdftk then rejected as not being a PDF. Fair.

    The other issue: I was flicking through the output PDF (once I'd got it made) and suddenly spotted that the even and odd pages were swapped. Went to the beginning, fine. Flipped the pages until I found where the problem is: no page 140. There's a Page 139, four pages of Compaq insert, then Page 141.

    Oh, no! Have I thrown away an unscanned page?

    No, wait, I can't skip one page. I mean, I physically can't. The scanner scans both sides of a sheet at the same time. I could skip two pages, or any multiple thereof, but I can't skip one.

    Went to the original files: there's P140, just before the Compaq insert. Went to the PDFs, no P140. Went to the processed PNGs... also no P140. Turns out ImageMagick just... didn't do that one. Didn't error out, just... didn't do it. Reprocessed it fine.
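Since ImageMagick skipped the page without reporting an error, one way to catch this sort of thing is to compare the two directories after the processing pass. A minimal sketch, assuming the processed PNGs keep the same file names as the originals and live in a separate directory (both of which are guesses about the workflow); the function name is made up.

```shell
# missing_outputs SRC_DIR OUT_DIR
# Print the name of every PNG in SRC_DIR with no non-empty counterpart in OUT_DIR.
missing_outputs() {
    local f name missing=0
    for f in "$1"/*.png; do
        [ -e "$f" ] || continue          # glob matched nothing
        name=${f##*/}                    # file name without the directory
        if [ ! -s "$2/$name" ]; then     # absent, or present but zero bytes
            printf '%s\n' "$name"
            missing=1
        fi
    done
    return "$missing"
}
```

Run as `missing_outputs scans processed`, it prints nothing and returns zero when every page has a counterpart, so it can gate the next stage of a script.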

    So, yeah, two very puzzling errors, there. I do hope it's not a sign that my new memory's unstable...
     
  12. yuusou

    I think your kernel would panic if the memory were bad.
     
  13. Gareth Halfacree

    Not "bad," but running out-of-spec - it's supposed to be 3,600MHz stuff but the DOCP profile set 3,200MHz. I tried 3,600MHz, and it wouldn't POST; tried a step down, still wouldn't boot. I'm running at 3,400MHz now, and it's possible it's just not happy above 3,200MHz. Might need to leave it running a memtest, see what that says.

    In the meantime, I'm going to run through the PNG-to-processed-PNG-to-PDF workflow *again*, same mag. See what happens. (I need to anyway, 'cos I've only just realised I did it at the old 5x40% setting. D'oh!)

    EDIT:
    Okay, this time I have 660 processed files, which is the correct number. That's a start!

    EDIT EDIT:
    But it's choking on one of them in the convert-to-JPEG stage. "convert-im6.q16: no images defined `/tmp/tmp.tVMwyzYPUv/pcpro047-565.jpg' @ error/convert.c/ConvertImageCommand/3261."

    Two runs, same file each time. pcpro047-565.png definitely exists, so it shouldn't have any difficulty converting it to a JPEG file. But it does not like that one particular file:

    blacklaw@shodan:/media/RAM Disk/test$ convert pcpro047-564.png zzztest.jpg
    blacklaw@shodan:/media/RAM Disk/test$ convert pcpro047-565.png zzztest.jpg
    convert-im6.q16: no images defined `zzztest.jpg' @ error/convert.c/ConvertImageCommand/3261.

    D'you wanna know something neat? ImageMagick created the damn file in the first place.

    file says it's a PNG image, but I can't open it: Image Viewer says it has a "bad adaptive filter value."

    Definitely giving the new RAM the side-eye, here. I've never had this before.
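A file that `file` calls a PNG can still fail to decode, so a header check isn't enough here. The sketch below tries a full decode of each page and lists the ones that fail. It leans on the behaviour shown above (convert exits with an error on the damaged file) and on ImageMagick's `null:` output, which discards the result. The function names are made up.

```shell
# decodes_ok FILE
# True if ImageMagick can read the whole image; the output is thrown away.
decodes_ok() {
    convert "$1" null: >/dev/null 2>&1
}

# find_unreadable_pngs DIR
# Print every PNG in DIR that fails to decode; return non-zero if any do.
find_unreadable_pngs() {
    local f bad=0
    for f in "${1:-.}"/*.png; do
        [ -e "$f" ] || continue      # glob matched nothing
        if ! decodes_ok "$f"; then
            printf '%s\n' "$f"
            bad=1
        fi
    done
    return "$bad"
}
```

Decoding 660 full-size scans a second time isn't free, so this is probably best kept as an optional check between the processing pass and the convert-to-JPEG stage.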
     
    Last edited: 3 Apr 2025
  14. Gareth Halfacree

    Tried opening that dodgy image in The GIMP: the entire desktop froze. Totally unresponsive.

    Hit the reset switch, and went into the BIOS to reset everything back to the DOCP defaults:

    description: DIMM DDR4 Synchronous Unbuffered (Unregistered) 3200 MHz (0.3 ns)

    Not what you'd hope for from RAM sold as 3,600MHz, but whatever. So long as it's stable!

    Now, let's try all that again, shall we?

    EDIT:
    Speaking of real-world differences between 3,200MHz, 3,400MHz, and 3,600MHz: it just finished processing the files in 7m12s. At 3,400MHz, it took 7m4s. And I was doing heavier stuff in the background this time, so the actual difference is even less. Not going to lose too much sleep over that!
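To put a number on that gap, here's the arithmetic as a quick sketch. The helper names are made up, and the two timings are the ones quoted above.

```shell
# to_seconds 7m12s  ->  432
to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ print $1 * 60 + $2 }'
}

# percent_slower SLOW FAST
# How much longer SLOW took than FAST, as a percentage of FAST.
percent_slower() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", (a - b) / b * 100 }'
}

percent_slower 7m12s 7m4s
```

That's 432 seconds against 424, a gap of eight seconds or a shade under 2%, and that's before allowing for the extra background load on the slower run.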

    I now have 660 PNGs. Or at least 660 files that end in PNG. Fingers crossed...
     
    Last edited: 3 Apr 2025
  15. noizdaemon666

    noizdaemon666 I'm Od, Therefore I Pwn

    Joined:
    15 Jun 2010
    Posts:
    6,191
    Likes Received:
    892
    https://www.corsair.com/uk/en/explorer/diy-builder/memory/amd-expo-vs-docp/

    I know it isn't strictly on-topic, but buried in there is the reason your 3600MHz RAM isn't 3600MHz RAM. DOCP is an ASUS technology which essentially interprets the XMP data and sets the correct speeds and timings (I think, I've only scan-read about it) but on AMD it only works up to 3200MHz because, you know, reasons. Likely to be AMD memory controllers requiring different timings for the higher speeds, hence the no-boot you encountered at 3600MHz, but as you've found out, the extra speed doesn't make a massive difference to general workloads.
     
    Arboreal and Gareth Halfacree like this.
  16. Gareth Halfacree

    Well, that'd do it! Cheers!
     
  17. Gareth Halfacree

    Finished a full scanned-PNGs-to-processed-PNGs-to-PDF run: no errors. Seven minutes to process the images, a little under nine to create the PDF. But, here's the important bit, it's entirely automated. I did nothing. I didn't load it into Scan Tailor and spend four hours tweaking each page, I just YOLO'd it through ImageMagick and Tesseract.

    Full issue downloadable here (for the next two weeks, at least) for the curious. Note that I'm still dialling in the compression settings: that's at 50% JPEG quality, which is heavier compression than I'd like to use, but even then it's 500MB(!).

    Without doing the manual tweaking, the results aren't as good. The two-page spreads don't line up perfectly, as I'm not correcting page deformation during scanning. There's a grey border around some pages from the scanner, and a white border around that from the deskewing; now, I could automagically fix that by cropping a fixed number of pixels off each image - but I'd risk losing page content on pages where it does reach to the edge of the image.

    Thoughts welcome!

    EDIT:
    While I'm at it, here's a version compressed with jpegli instead of ImageMagick, same 50% quality target. It comes out around 30MB smaller, which is nice, but takes around 50s longer to process. Artefacts are less pronounced, but only 'cos the image is also blurrier in general. Both look fine at normal fill-my-screen-with-the-page zoom levels, though.
     
    Last edited: 3 Apr 2025
  18. Byron C

    Byron C I was told there would be cheesecake…?

    Joined:
    12 Apr 2002
    Posts:
    11,184
    Likes Received:
    5,882
    Not that I have any desire to add to your already considerable workload, but are you interested in donations to the cause, Mr H?

    They might not press your nostalgia buttons, but I just rescued 7 or 8 copies of mid-2000s-era Edge magazine from my mother’s storage lockup.

    A bit sniffy and up its own backside at times, but for my money Edge magazine had the best artwork and visuals going. At one point I read it religiously, I have no idea what happened to my collection… Mind you, at one point I had an almost complete collection of ST Format magazines… Sadly those were lost long before I started reading Edge…
     
    Gareth Halfacree likes this.
  19. Gareth Halfacree

    I wouldn't say no - Edge is a lot easier to scan, 'cos it doesn't have the onionskin catalogues and card inserts, and is about a sixth the size!

    They may have already been archived, tho'. I've got a folder here, hang on...

    Code:
    edge [uk] 001.pdf
    edge [uk] 002.pdf
    edge [uk] 003.pdf
    edge [uk] 004.pdf
    edge [uk] 005.pdf
    edge [uk] 006.pdf
    edge [uk] 007.pdf
    edge [uk] 008.pdf
    edge [uk] 009.pdf
    edge [uk] 010.pdf
    edge [uk] 011.pdf
    edge [uk] 012.pdf
    Edge_UK_012.pdf
    edge [uk] 013.pdf
    edge [uk] 014.pdf
    edge [uk] 015.pdf
    edge [uk] 016.pdf
    edge [uk] 017.pdf
    edge [uk] 018.pdf
    Edge_UK_018.pdf
    edge [uk] 019.pdf
    Edge_UK_019.pdf
    edge [uk] 020.pdf
    edge [uk] 021.pdf
    Edge_UK_021.pdf
    edge [uk] 022.pdf
    Edge_UK_022.pdf
    edge [uk] 023.pdf
    Edge_UK_023.pdf
    edge [uk] 024.pdf
    Edge_UK_024.pdf
    edge [uk] 025.pdf
    Edge_UK_025.pdf
    edge [uk] 026.pdf
    Edge_UK_026.pdf
    edge [uk] 027.pdf
    Edge_UK_027.pdf
    edge [uk] 028.pdf
    Edge_UK_028.pdf
    edge [uk] 029.pdf
    Edge_UK_029.pdf
    edge [uk] 030.pdf
    Edge_UK_030.pdf
    edge [uk] 031.pdf
    Edge_UK_031.pdf
    edge [uk] 032.pdf
    Edge_UK_032.pdf
    edge [uk] 033.pdf
    edge [uk] 034.pdf
    edge [uk] 035.pdf
    edge [uk] 036.pdf
    edge [uk] 037.pdf
    edge [uk] 038.pdf
    edge [uk] 039.pdf
    edge [uk] 040.pdf
    edge [uk] 041.pdf
    edge [uk] 042.pdf
    edge [uk] 043.pdf
    edge [uk] 044.pdf
    edge [uk] 045.pdf
    edge [uk] 046.pdf
    edge [uk] 047.pdf
    edge [uk] 048.pdf
    edge [uk] 049.pdf
    edge [uk] 050.pdf
    edge [uk] 051.pdf
    edge [uk] 052.pdf
    edge [uk] 053.pdf
    edge [uk] 054.pdf
    edge [uk] 055.pdf
    edge [uk] 056.pdf
    edge [uk] 057.pdf
    edge [uk] 058.pdf
    edge [uk] 059.pdf
    edge [uk] 060.pdf
    edge [uk] 061.pdf
    edge [uk] 062.pdf
    edge [uk] 063.pdf
    edge [uk] 064.pdf
    edge [uk] 065.pdf
    edge [uk] 066.pdf
    edge [uk] 067.pdf
    edge [uk] 068.pdf
    edge [uk] 069.pdf
    edge [uk] 070.pdf
    edge [uk] 071.pdf
    edge [uk] 072.pdf
    edge [uk] 073.pdf
    edge [uk] 074.pdf
    edge [uk] 075.pdf
    edge [uk] 076.pdf
    edge [uk] 077.pdf
    edge [uk] 078.pdf
    edge [uk] 079.pdf
    edge [uk] 080.pdf
    edge [uk] 081.pdf
    edge [uk] 082.pdf
    edge [uk] 083.pdf
    edge [uk] 084.pdf
    edge [uk] 085.pdf
    edge [uk] 086.pdf
    edge [uk] 087.pdf
    edge [uk] 088.pdf
    edge [uk] 089.pdf
    edge [uk] 090.pdf
    edge [uk] 091.pdf
    edge [uk] 092.pdf
    edge [uk] 093.pdf
    edge [uk] 094.pdf
    edge [uk] 095.pdf
    edge [uk] 096.pdf
    edge [uk] 097.pdf
    edge [uk] 098.pdf
    edge [uk] 099.pdf
    edge [uk] 100.pdf
    edge [uk] 101.pdf
    edge [uk] 102.pdf
    edge [uk] 103.pdf
    edge [uk] 104.pdf
    edge [uk] 105.pdf
    edge [uk] 106.pdf
    edge [uk] 107.pdf
    edge [uk] 108.pdf
    edge [uk] 109.pdf
    edge [uk] 110.pdf
    edge [uk] 111.pdf
    edge [uk] 112.pdf
    edge [uk] 113.pdf
    edge [uk] 114.pdf
    edge [uk] 115.pdf
    edge [uk] 116.pdf
    edge [uk] 117.pdf
    edge [uk] 118.pdf
    edge [uk] 119.pdf
    edge [uk] 120.pdf
    edge [uk] 121.pdf
    edge [uk] 122.pdf
    edge [uk] 123.pdf
    edge [uk] 124.pdf
    edge [uk] 125.pdf
    edge [uk] 126.pdf
    edge [uk] 127.pdf
    edge [uk] 128.pdf
    edge [uk] 129.pdf
    edge [uk] 130.pdf
    edge [uk] 131.pdf
    edge [uk] 132.pdf
    edge [uk] 133.pdf
    edge [uk] 134.pdf
    edge [uk] 135.pdf
    edge [uk] 136.pdf
    edge [uk] 137.pdf
    edge [uk] 138.pdf
    edge [uk] 139.pdf
    edge [uk] 140.pdf
    edge [uk] 141.pdf
    edge [uk] 142.pdf
    edge [uk] 143.pdf
    edge [uk] 144.pdf
    edge [uk] 145.pdf
    edge [uk] 146.pdf
    edge [uk] 147.pdf
    edge [uk] 148.pdf
    edge [uk] 149.pdf
    edge [uk] 150.pdf
    edge [uk] 151.pdf
    edge [uk] 152.pdf
    edge [uk] 153.pdf
    edge [uk] 154.pdf
    edge [uk] 155.pdf
    edge [uk] 156.pdf
    edge [uk] 157.pdf
    edge [uk] 158.pdf
    edge [uk] 159.pdf
    edge [uk] 160.pdf
    edge [uk] 161.pdf
    edge [uk] 162.pdf
    edge [uk] 163.pdf
    edge [uk] 164.pdf
    edge [uk] 165.pdf
    edge [uk] 166.pdf
    edge [uk] 167.pdf
    edge [uk] 168.pdf
    edge [uk] 169.pdf
    edge [uk] 170.pdf
    edge [uk] 171.pdf
    edge [uk] 172.pdf
    edge [uk] 173.pdf
    edge [uk] 174.pdf
    edge [uk] 175.pdf
    edge [uk] 176.pdf
    edge [uk] 177.pdf
    edge [uk] 178.pdf
    edge [uk] 179.pdf
    edge [uk] 180.pdf
    edge [uk] 181.pdf
    edge [uk] 182.pdf
    edge [uk] 183.pdf
    edge [uk] 210.pdf
    edge [uk] 211.pdf
    edge [uk] 212.pdf
    edge [uk] 213.pdf
    edge [uk] 214 .pdf
    edge [uk] 215.pdf
    edge [uk] 216.pdf
    edge [uk] 217.pdf
    edge [uk] 218.pdf
    edge [uk] 219.pdf
    edge [uk] 220.pdf
    edge [uk] 221.pdf
    edge [uk] 222.pdf
    edge [uk] 223.pdf
    edge [uk] 224.pdf
    edge [uk] 225.pdf
    edge [uk] 226.pdf
    edge [uk] 227.pdf
    edge [uk] 228.pdf
    edge [uk] 229.pdf
    edge [uk] 230.pdf
    edge [uk] 231.pdf
    edge [uk] 232.pdf
    edge [uk] 233.pdf
    edge [uk] 234.pdf
    edge [uk] 235.pdf
    edge [uk] 236.pdf
    edge [uk] 238.pdf
    edge [uk] 239.pdf
    edge [uk] 240.pdf
    edge [uk] 242.pdf
    edge [uk] 245.pdf
    edge [uk] 246.pdf
    edge [uk] 247.pdf
    edge [uk] 249.pdf
    edge [uk] 250.pdf
    edge [uk] 251.pdf
    edge [uk] 252.pdf
    If they're in that list already, my work is done!
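Checking a list that long by eye is a chore, so here's a rough sketch that reads a listing like the one above on standard input and prints the issue numbers it doesn't contain. It assumes each file name ends in the issue number followed by ".pdf" (with an optional stray space, as in "214 .pdf"); the function name is made up.

```shell
# missing_issues FIRST LAST < listing
# Print every issue number from FIRST to LAST that has no file in the listing.
missing_issues() {
    awk -v first="$1" -v last="$2" '
        # Pull the trailing number out of names like
        # "edge [uk] 214 .pdf" or "Edge_UK_012.pdf".
        match($0, /[0-9]+ *\.pdf$/) { have[substr($0, RSTART, RLENGTH) + 0] = 1 }
        END { for (i = first; i <= last; i++) if (!(i in have)) print i }
    '
}
```

Something like `ls | missing_issues 1 252` in the archive folder would do it. Going by the list above, it should flag the jump from 183 to 210 plus a handful of later gaps.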
     
    Byron C likes this.
  20. Byron C

    That’s a big list! I’ll check when I get back home :happy:
     
