Linux Awkward Awk

Discussion in 'Software' started by Gareth Halfacree, 5 Aug 2024.

  1. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,359
    Likes Received:
    7,180
    So, I'm trying to... don't laugh, but I'm trying to create a simple static site generator... as a bash script, using only standard tools you'd find on any Linux (or BSD, for that matter) install.

    For Reasons.

    I've got a page generator (separate to the site generator, so it can be parallelised) and it works (amazingly):

    Code:
    tail -n +2 "$1" | cat header.html - footer.html | awk 'NR==FNR {a[n++]=$0; next}/--TITLE--/{print a[0]; next}1' "$1" - | tee generated-pages/"$1"
    
    (I've sliced out the boilerplate stuff and only put the meat in there. Yes, it's one line.)

    To explain: header.html and footer.html are headers and footers, obviously. header.html contains a line reading "--TITLE--". $1 is an HTML file with the page contents, the first line of which is "<title>Title</title>".

    The script first reads in the page contents file minus the first line, then concatenates the header and footer on it; this is then piped to awk, which reads the first line of the page contents file and substitutes that for "--TITLE--" so the title tag goes in the <head> section where it belongs. Then everything is piped out to tee, so I can see it in the terminal, and to a generated page.

    Now, like I say, this works. However, the awk part is very, very stupid.

    Why is it stupid? Because I'm reading in the entire page contents and creating an array, only to ignore everything except the first line.

    I don't normally use awk, and my attempts to figure out how to do this Not Stupidly have hit a brick wall - and I've got seven articles to write, so I need to crack on with those. There is a getline which does what I want, and works great... except I can't get it to accept a bash variable as the filename from which it reads. The way I'm doing it now, I can use a bash variable... but I'm stuck reading the entire file.

    I can read the entire file, it's HTML, we're talking kilobytes at worst, and the whole thing runs in well under a second (0.005s real-time, apparently), but it annoys me.

    Any awksperts got any ideas? Any seddites want to show how much better it is than awk?
     
  2. yuusou

    yuusou Multimodder

    Joined:
    5 Nov 2006
    Posts:
    2,941
    Likes Received:
    1,026
    How about:
    • Put the first line in a variable
    • Print out the documents as you were doing
    • Replace the first occurrence of --TITLE-- using sed
    • tee
    Code:
    fl=$(head -n 1 "$1"); tail -n +2 "$1" | cat header.html - footer.html | sed "0,/--TITLE--/s/--TITLE--/$title/" | tee generated-pages/"$1"
    EDIT:
    It could probably be even simpler. You don't really need the 0,/--TITLE--/ address range, as --TITLE-- only appears on the one line in the header and sed will replace it there regardless.
    Code:
    fl=$(head -n 1 "$1"); tail -n +2 "$1" | cat header.html - footer.html | sed "s/--TITLE--/$title/" | tee generated-pages/"$1"
     
  3. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,359
    Likes Received:
    7,180
    I'll give it a go once I've got the morning's work squared away - cheers!
     
  4. sandys

    sandys Multimodder

    Joined:
    26 Mar 2006
    Posts:
    5,022
    Likes Received:
    779
    Struggling to understand what you are doing, but can't you read the first line, print what you want from it, then exit, perhaps using a BEGIN block in the awk? Not sure why you need an array; just change the order in which you're doing things so the output lands where you want it after pulling out the title.

    I use awk a fair bit but I'm no expert, I just fumble around until it works :D Passing variables etc. between awk and bash scripts is something I do a lot.
     
  5. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,359
    Likes Received:
    7,180
    I have tried this, using awk's getline. It works perfectly... if I write the name of the file to read myself. If I tell it to open $1, though, it tries to open a file literally called "$1".
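
For what it's worth, the literal "$1" is most likely down to the awk program sitting inside single quotes, so the shell never gets to expand it. A minimal sketch of one way around that, assuming a reasonably POSIX awk: pass the filename in with -v and let getline read just the first line in a BEGIN block. The names build_page and src are made up for illustration and are not from the original script.

```shell
#!/bin/sh
# Sketch only: build_page and src are hypothetical names.
# Expects header.html and footer.html in the current directory.
build_page() {
    tail -n +2 "$1" | cat header.html - footer.html |
        awk -v src="$1" '
            # Read only the first line of the source file, then close it.
            BEGIN { if ((getline title < src) <= 0) title = ""; close(src) }
            # Swap the placeholder line for the title line.
            /--TITLE--/ { print title; next }
            # Print every other line unchanged.
            1
        '
}
```

Something like `build_page page.html | tee generated-pages/page.html` would then slot into the existing pipeline. One caveat: awk interprets backslash escapes in -v values, so a filename containing backslashes would need different handling.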
     
  6. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,359
    Likes Received:
    7,180
    sed ain't happy with that: after swapping "fl" to "title" (to match the later use of $title) I get:

    sed: -e expression #1, char 26: unknown option to `s'

    Looks like it's 'cos what sed's swapping in there has angle brackets ("<title>Title</title>"). If I edit the HTML file so it's just plain text ("Title") it works fine. I could escape the brackets in the source file, but that's an ugly solution. EDIT: Hmm, or I could pass the variable through sed to escape the brackets for me, which is even uglier but hidden from view...

    I really could just leave it reading the whole file: I tested it on the weedy old dual-core laptop last night, with a source file 4,563 lines long (the entirety of The Bee Movie script, my go-to for large-text-file testing): it finished generating the page in 0.045s wall time, no errors.

    EDIT:
    I'm an idiot, it's not the brackets - it's the slash!

    Code:
    title=$(head -n 1 "$1"); tail -n +2 "$1" | sed ':a;N;$!ba;s/\n/<br \/>\n/g' | cat header.html - footer.html | sed "s@--TITLE--@$title@" | tee generated-pages/"$1"
    That works fine, now I'm not terminating the substitution early by including the delimiter in my text. Well, it'll work fine as long as I don't put an @ in a page title, anyway...
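
If an @ in a title ever does become a worry, one option is to escape the title before handing it to sed. This is only a sketch, assuming the title is a single line (which head -n 1 guarantees); escape_repl and fill_title are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: escape_repl and fill_title are hypothetical helper names.

# Backslash-escape the characters that are special in the replacement
# half of s@...@...@: the backslash, & (the whole match) and the @ delimiter.
escape_repl() {
    printf '%s\n' "$1" | sed 's/[\\&@]/\\&/g'
}

# Replace --TITLE-- on stdin with the given text, taken literally.
fill_title() {
    sed "s@--TITLE--@$(escape_repl "$1")@"
}
```

In the one-liner above, `sed "s@--TITLE--@$title@"` would become `fill_title "$title"`, and the rest of the pipeline stays as it is.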
     
    Last edited: 6 Aug 2024
  7. yuusou

    yuusou Multimodder

    Joined:
    5 Nov 2006
    Posts:
    2,941
    Likes Received:
    1,026
    If this is what you want to do, then you'll wanna try process substitution, which hands awk a file descriptor to read from, something like <(head -n 1 "$1")
     
  8. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,359
    Likes Received:
    7,180
    Nah, I'll stick with sed. The performance of both options (and the Secret Third Option, of using envsubst) is microsecond-identical, as far as I can tell, and the sed version is more readable. (envsubst is even more readable, though it does require me to use $TITLE instead of --TITLE-- as the target to be replaced - and it will do every instance in the file, rather than just the first.)
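
For anyone curious, the envsubst route might look roughly like the sketch below. It assumes envsubst (from GNU gettext) is installed and that header.html contains $TITLE where --TITLE-- used to be; render_header is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: render_header is a hypothetical name; header.html is assumed
# to contain the text $TITLE in place of --TITLE--.
render_header() {
    # Limit substitution to $TITLE so any other $ in the HTML is left alone.
    TITLE=$1 envsubst '$TITLE' < header.html
}
```

Passing '$TITLE' as the argument keeps envsubst from touching other dollar-prefixed text in the page, though it still replaces every $TITLE it finds, not just the first.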
     
