
CPU Why I Love Many Core Computing

Discussion in 'Hardware' started by Gareth Halfacree, 27 Sep 2020.

  1. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,132
    Likes Received:
    6,726
    Yeah, yeah, I know all the arguments against just-throw-more-cores-at-the-problem - Amdahl's Law and all that. But...

    I have a shiny new Fujitsu SP-1425 scanner, with automatic document feeder. Which means I can, if I so choose, digitise old magazines and what-not pretty quickly. Which is neat.

    But if I want the benefits of digitisation, I have to OCR the resulting images. Which isn't a problem, Tesseract is free-as-in-speech-and-beer and does a fantastic job. Eventually. Running across a 300dpi magazine scan on my Ryzen 2700X, Tesseract takes about 132 seconds per page. Which adds up when you're scanning chunky magazines.

    The solution: parallelising the problem with GNU Parallel. I've got eight cores, 16 threads - so I can run 16 Tesseract workers at once, with an invocation along the lines of the sketch below.
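    (A sketch rather than my exact command line: assume a directory of 300dpi JPEG scans, with {} standing for the input file, {.} for the same name minus its extension, and --progress producing the per-job stats quoted below.)

    Code:
    parallel --progress tesseract {} {.} ::: *.jpg
    The output of GNU Parallel hammers home the difference that makes: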

    Code:
    local:16/1/100%/132.0s
    That's after the first page has finished: 16 jobs still running, one complete, and an average job time of exactly 132 seconds.

    Code:
    local:0/36/100%/11.4s
    That's after the last page has finished: nothing left running, all 36 jobs complete, and an average job time of 11.4 seconds.

    That's an 11.6x speed-up over running a single Tesseract worker. Okay, it's not a linear sixteenfold increase, but then I haven't got 16 cores - I've got eight cores each of which runs two threads, so every bit of extra performance above an eightfold boost is effectively a bonus.

    And there we go, nearly twelve times faster than if I were using Tesseract without GNU Parallel to speed things up:

    upload_2020-9-27_14-23-32.png

    A full-text-searchable PDF. Noice.
     
    Bloody_Pete and Anfield like this.
  2. Bloody_Pete

    Bloody_Pete Technophile

    Joined:
    11 Aug 2008
    Posts:
    8,438
    Likes Received:
    1,109
    We do the same thing at work with some of our image analytics on our massive datasets - just throw as many workers at it as possible! Had a 128-thread server crunching through it all; it was a noisy beast, though!
     
  3. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,132
    Likes Received:
    6,726
    I can imagine!

    The really nice thing about GNU Parallel, compared to something like xargs, is that you can set up remote workers - and all you need is SSH and a copy of GNU Parallel (and, y'know, the tool you're actually wanting to use) on each. Then when you execute a job, it runs it locally on all logical cores *and* across all logical cores on all accessible remote systems - copying the file it's working on to the remote system, working on it, and copying the result back to your local system. All pretty much invisibly.
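    In practice it's pretty much a one-liner. Something like this, assuming a hypothetical remote box reachable over SSH as user@server - -S takes the list of workers (with : meaning the local machine), and --trc transfers the input file over, brings the named result back, and cleans up after itself:

    Code:
    parallel -S :,user@server --trc {.}.txt tesseract {} {.} ::: *.jpg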

    I don't use it much 'cos my desktop's the one with all the cores - my server's a dual-core, and most of the rest of the hardware that gets left running all the time is some flavour of Raspberry Pi or other - but it's nice to have the option. Also means you can shove a noisy 128-core beast somewhere else and not be bothered by the noise!
     
    Bloody_Pete likes this.
  4. Bloody_Pete

    Bloody_Pete Technophile

    Joined:
    11 Aug 2008
    Posts:
    8,438
    Likes Received:
    1,109
    Oh really? I did not know that! I'll have to keep that in mind for the future! Would be useful to be able to do it distributed across all of our workstations if needed!
     
  5. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,132
    Likes Received:
    6,726
    Yup - pretty easy to use, too. You can either specify remote machines as you're constructing the command line or create a config file with 'em already in there.
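    A sketch of the config-file route, with hypothetical hosts - one sshlogin per line, optionally prefixed with the number of job slots to use on that machine (: being the local machine again):

    Code:
    # e.g. in ~/.parallel/sshloginfile
    :
    8/user@bigbox
    4/pi@raspberrypi.local
    Then point the job at it with --sshloginfile (or -slf) instead of -S:

    Code:
    parallel --sshloginfile ~/.parallel/sshloginfile --trc {.}.txt tesseract {} {.} ::: *.jpg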
     
    Bloody_Pete likes this.
  6. RedFlames

    RedFlames ...is not a Belgian football team

    Joined:
    23 Apr 2009
    Posts:
    15,421
    Likes Received:
    3,010
    interesting... something to have a play about with in WSL
     
  7. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,132
    Likes Received:
    6,726
    Well, turns out that "132 seconds per page" may not have been quiiiiite accurate.

    I've never used Tesseract before, so I had no idea what to expect from its performance - and, as I usually do for batch jobs, I threw it at GNU Parallel right away.

    By default, GNU Parallel creates as many workers as "logical CPUs" (i.e. threads). So, parallel tesseract spawns 16 jobs. In total, it took about 6m50s to complete the recognition job.

    That sounded like a long time, so I ran a plain sequential loop - for i in *.jpg; do tesseract "$i" "${i%.jpg}"; done - instead. That finished in 2m10s: a third of the time of the parallel version.

    Yeah. Not really proving my point so much, there.

    Figured maybe there's something about Tesseract that doesn't like running on "logical" CPUs, so I tried parallel -j8 tesseract to limit it to eight workers. The result: 0m34s. We're still looking at just shy of a fourfold increase, but nowhere near the twelvefold I thought I was enjoying.

    Oh, well, ne'er mind. It'll be interesting to see if things are any different on a longer PDF...
     
    Bloody_Pete likes this.
  8. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,132
    Likes Received:
    6,726
    Okay, this was irritating me so I did some investigation - because while I could see SMT having little to no impact on performance, it shouldn't hurt it that badly.

    To recap:
    upload_2020-9-28_12-4-53.png

    That's not what you'd expect to see.

    upload_2020-9-28_12-5-6.png

    That's *definitely* not what you'd expect to see.

    Turns out Tesseract isn't, as I falsely assumed, single-threaded. It has its own multithreading.

    upload_2020-9-28_12-5-35.png

    Which is crap. Seriously, a four percent performance boost on an 8C16T CPU? Pfft.

    So, what happens if you disable it?
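    (Tesseract's multithreading is OpenMP-based, so - on builds compiled with OpenMP, at least - it can be switched off with an environment variable rather than a flag. Same sketch as before:)

    Code:
    OMP_THREAD_LIMIT=1 parallel tesseract {} {.} ::: *.jpg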

    upload_2020-9-28_12-6-3.png

    That. That happens. 16 workers with threading disabled is demonstrably the fastest mode, as I had expected would be the case. It's 653 percent faster than the default mode with Tesseract's internal multithreading active, and 28 percent faster than running a single-threaded worker per physical CPU core.

    Itch scratched!

    EDIT:
    Decided to run one more test, to get a true view of the speed-up: the Apricot GW-BASIC Manual, 297 pages including covers.

    It took 319s for Tesseract alone, using its in-built multithreading.

    It took 37.5s for 16 Tesseract workers via GNU Parallel, with multithreading disabled.

    That's an 8.5x speedup. Can't complain at that!
     
    Last edited: 28 Sep 2020
    wyx087, Bloody_Pete, Spraduke and 3 others like this.
