Linux Awkward Awk

Gareth Halfacree · 5 Aug 2024

So, I'm trying to... don't laugh, but I'm trying to create a simple static site generator... as a bash script, using only standard tools you'd find on any Linux (or BSD, for that matter) install.

For Reasons.

I've got a page generator (separate to the site generator, so it can be parallelised) and it works (amazingly):
Code:
tail -n +2 "$1" | cat header.html - footer.html | awk 'NR==FNR {a[n++]=$0; next}/--TITLE--/{print a[0]; next}1' "$1" - | tee generated-pages/"$1"
(I've sliced out the boilerplate stuff and only put the meat in there. Yes, it's one line.)

To explain: header.html and footer.html are headers and footers, obviously. header.html contains a line reading "--TITLE--". $1 is an HTML file with the page contents, the first line of which is "<title>Title</title>".

The script first reads in the page contents file minus the first line, then concatenates the header and footer on it; this is then piped to awk, which reads the first line of the page contents file and substitutes that for "--TITLE--" so the title tag goes in the <head> section where it belongs. Then everything is piped out to tee, so I can see it in the terminal, and to a generated page.
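
Spelled out stage by stage, the one-liner looks roughly like this. It is only a sketch: build_page is a hypothetical wrapper name, and the mkdir -p stands in for the boilerplate that was sliced out of the post.

```shell
# Hypothetical wrapper around the pipeline from the post.
# $1: page file whose first line is "<title>...</title>".
build_page() {
    mkdir -p generated-pages                # assumption: output directory may not exist yet
    tail -n +2 "$1" |                       # page body, minus its first (title) line
        cat header.html - footer.html |     # header, then the body from stdin, then footer
        awk 'NR==FNR {a[n++]=$0; next}      # first input ($1): store every line in array a
             /--TITLE--/ {print a[0]; next} # second input (stdin): swap placeholder for line one
             1                              # print all other lines unchanged
            ' "$1" - |
        tee generated-pages/"$1"            # show on the terminal and write the generated page
}
```

Here NR==FNR is only true while awk is still on its first input, which is how the script tells the page file apart from the piped-in header, body and footer.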

Now, like I say, this works. However, the awk part is very, very stupid.

Why is it stupid? Because I'm reading in the entire page contents and creating an array, only to ignore everything except the first line.

I don't normally use awk, and my attempts to figure out how to do this Not Stupidly have hit a brick wall - and I've got seven articles to write, so I need to crack on with those. There is a getline which does what I want, and works great... except I can't get it to accept a bash variable as the filename from which it reads. The way I'm doing it now, I can use a bash variable... but I'm stuck reading the entire file.

I can read the entire file, it's HTML, we're talking kilobytes at worst, and the whole thing runs in well under a second (0.005s real-time, apparently), but it annoys me.

Any awksperts got any ideas? Any seddites want to show how much better it is than awk?

yuusou · 6 Aug 2024

How about:

Put the first line in a variable

Print out the documents as you were doing

Replace the first occurrence of --TITLE-- using sed

tee
Code:
fl=$(head -n 1 "$1"); tail -n +2 "$1" | cat header.html - footer.html | sed "0,/--TITLE--/s/--TITLE--/$title/" | tee generated-pages/"$1"
EDIT:
it could probably be simpler even. Don't really need to filter the file as sed will read the first line regardless.
Code:
fl=$(head -n 1 "$1"); tail -n +2 "$1" | cat header.html - footer.html | sed "s/--TITLE--/$title/" | tee generated-pages/"$1"

Gareth Halfacree · 6 Aug 2024

I'll give it a go once I've got the morning's work squared away - cheers!

sandys · 6 Aug 2024

Struggling to understand what you are doing but don't you get the first line and print what you want from it then exit, perhaps using a BEGIN in the awk? Not sure why you need an array, just change the order of how you're doing things to get the order where you want after pulling out the Title.

I use awk a fair bit but I'm no expert; I just fumble around until it works. Passing variables between awk and bash scripts is something I do a lot.

Gareth Halfacree · 6 Aug 2024

sandys said: ↑

but don't you get the first line and print what you want from it then exit, perhaps using a BEGIN in the awk?

I have tried this, using awk's getline. It works perfectly... if I write the name of the file to read myself. If I tell it to open $1, though, it tries to open a file literally called "$1".
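
The literal "$1" most likely comes from the awk program sitting inside single quotes, where bash never expands it (and inside awk, $1 means the first field of the current line anyway). A minimal sketch of one way round that, assuming a POSIX-ish awk: hand the filename over with -v and let getline read a single line from it. The names fill_title, src and title are made up for illustration.

```shell
# Hypothetical helper: replaces the --TITLE-- line on stdin with line one of "$1".
fill_title() {
    awk -v src="$1" '
        BEGIN {
            # Read only the first line of the page file, then close it again.
            if ((getline title < src) <= 0) {
                print "cannot read title from " src > "/dev/stderr"
                exit 1
            }
            close(src)
        }
        /--TITLE--/ { print title; next }   # swap the placeholder line for the title
        1                                   # pass every other line through untouched
    '
}

# Usage, mirroring the original pipeline:
# tail -n +2 "$1" | cat header.html - footer.html | fill_title "$1" | tee generated-pages/"$1"
```

One caveat: awk processes backslash escapes in values passed with -v, so a filename containing a backslash would need different handling.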

Gareth Halfacree · 6 Aug 2024

yuusou said: ↑
it could probably be simpler even. Don't really need to filter the file as sed will read the first line regardless.
Code:
fl=$(head -n 1 "$1"); tail -n +2 "$1" | cat header.html - footer.html | sed "s/--TITLE--/$title/" | tee generated-pages/"$1"
sed ain't happy with that: after swapping "fl" to "title" (to match the later use of $title) I get:

sed: -e expression #1, char 26: unknown option to `s'

Looks like it's 'cos what sed's swapping in there has angle brackets ("<title>Title</title>"). If I edit the HTML file so it's just plain text ("Title") it works fine. I could escape the brackets in the source file, but that's an ugly solution. EDIT: Hmm, or I could pass the variable through sed to escape the brackets for me, which is even uglier but hidden from view...

I really could just leave it reading the whole file: I tested it on the weedy old dual-core laptop last night, with a source file 4,563 lines long (the entirety of The Bee Movie script, my go-to for large-text-file testing): it finished generating the page in 0.045s wall time, no errors.

EDIT:
I'm an idiot, it's not the brackets - it's the slash!
Code:
title=$(head -n 1 "$1"); tail -n +2 "$1" | sed ':a;N;$!ba;s/\n/<br \/>\n/g' | cat header.html - footer.html | sed "s@--TITLE--@$title@" | tee generated-pages/"$1"
That works fine, now I'm not terminating the substitution early by including the delimiter in my text. Well, it'll work fine as long as I don't put an @ in a page title, anyway...
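
If an @ ever does turn up in a title (or an &, which sed expands to the matched text, or a backslash), one option is to escape those characters before handing the string to sed. A rough sketch, assuming the title is a single line; escape_sed_replacement is a hypothetical helper name.

```shell
# Hypothetical helper: backslash-escape \, & and the @ delimiter so that sed
# treats them literally on the right-hand side of s@...@...@.
escape_sed_replacement() {
    printf '%s\n' "$1" | sed 's/[\\&@]/\\&/g'
}

# Example with a title that would otherwise break the substitution.
title='<title>Q&A @ home</title>'
safe_title=$(escape_sed_replacement "$title")
printf '%s\n' '--TITLE--' | sed "s@--TITLE--@$safe_title@"
```

The slashes in the closing tag need no escaping here, because @ rather than / is the delimiter.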

yuusou · 6 Aug 2024

Gareth Halfacree said: ↑

I have tried this, using awk's getline. It works perfectly... if I write the name of the file to read myself. If I tell it to open $1, though, it tries to open a file literally called "$1".

If this is what you want to do, then you'll wanna try using a file descriptor, something like <(echo $1)

Gareth Halfacree · 6 Aug 2024

yuusou said: ↑

If this is what you want to do, then you'll wanna try using a file descriptor, something like <(echo $1)

Nah, I'll stick with sed. The performance of both options (and the Secret Third Option, of using envsubst) is microsecond-identical, as far as I can tell, and the sed version is more readable. (envsubst is even more readable, though it does require me to use $TITLE instead of --TITLE-- as the target to be replaced - and it will do every instance in the file, rather than just the first.)
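
For anyone wanting to try the envsubst route, a minimal sketch follows. It assumes envsubst (from GNU gettext) is installed and that header.html has been changed to contain $TITLE in place of --TITLE--; render_with_envsubst is a name invented here. Passing '$TITLE' as the argument limits substitution to that one variable, so any other $-prefixed text in the page is left alone.

```shell
# Hypothetical helper: builds the page on stdout using envsubst.
# $1: page file whose first line is the <title> line.
render_with_envsubst() {
    tail -n +2 "$1" |
        cat header.html - footer.html |
        # TITLE is set only in envsubst's environment; every $TITLE in the
        # stream is replaced, other variable references are not touched.
        TITLE=$(head -n 1 "$1") envsubst '$TITLE'
}

# Usage:
# render_with_envsubst "$1" | tee generated-pages/"$1"
```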
