[Bash Challenge 8] Can You Solve This Bash Script Puzzle?

Welcome to the Bash Challenge #8 by Yes I Know IT & It’s FOSS. In this weekly challenge, we will show you a terminal screen, and we will count on you to help us obtain the result we wanted. There can be many solutions, and being creative is the most amusing part of the challenge.

If you haven’t done it already, do take a look at previous challenges:

You can also buy these challenges (with unpublished challenges) in book form and support us:

Bash It Out! Bash Script Puzzle Book by It's FOSS is Available Now!

Ready to play? So here is this week’s challenge.

How to add a header?

This week I work with several data files and one header file. I just want to insert the contents of the header file on top of each data file:

[Screenshot: Bash Challenge 8 terminal session]

For the sake of the demonstration, I only displayed one file. But you may imagine I have many of them, too many to consider editing them by hand.

Anyway, for some reason my solution didn’t work: not only have I lost the data, but my header also appears twice.

cat HEADER DATA01 | tee DATA01
# Month, Year, Est.Value
# Month, Year, Est.Value

As you can see, I really need your help here, both to explain to me what was going on and to help me solve that issue. I’m really looking forward to reading your solutions in the comment section below!

A few details

To create this challenge, I used:

  • GNU Bash, version 4.4.5 (x86_64-pc-linux-gnu)
  • Debian 4.8.7-1 (amd64)
  • All commands are those shipped with a standard Debian distribution
  • No commands were aliased

Solution

How to reproduce

Here is the raw code we used to produce this challenge. If you run that in a terminal, you will be able to reproduce exactly the same result as displayed in the challenge illustration (assuming you are using the same software version as me):

rm -rf ItsFOSS
mkdir -p ItsFOSS
cd ItsFOSS
cat > HEADER << EOT
# Month, Year, Est.Value
EOT
cat > DATA01 << EOT
Dec, 2015, 15000
Jan, 2016, 12540
Feb, 2016, 11970
EOT
clear
head HEADER DATA01
cat HEADER DATA01 | tee DATA01

What was the problem?

In a pipeline, all commands are launched in parallel. That means the cat command reading the DATA01 file and the tee command overwriting that same file are launched simultaneously.

This is really a race condition. On my system, tee had time to overwrite the destination file before cat had the opportunity to read it. To illustrate that, we can delay the commands and see the output is clearly dependent on the timing:

cat HEADER DATA01 | ( sleep 1; tee DATA01 )
# Month, Year, Est.Value
Dec, 2015, 15000
Jan, 2016, 12540
Feb, 2016, 11970
(sleep 1 ; cat HEADER DATA01 ) | tee DATA01
# Month, Year, Est.Value
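
The fact that both sides of a pipeline start together can also be observed without touching any file. The following is only a small sketch: the two-second delays are arbitrary values picked for the illustration, and it assumes a date command that supports the %s format.

```shell
# Both commands of the pipeline are started at the same time, so the
# total run time is close to the slowest command, not the sum of both.
start=$(date +%s)
sleep 2 | sleep 2
end=$(date +%s)

elapsed=$((end - start))
echo "elapsed: ${elapsed}s"   # roughly 2 seconds, not 4
```

If the commands were launched one after the other, the pipeline would need about four seconds to complete.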

I would have a similar issue (albeit deterministic this time) using the simpler:

cat HEADER DATA01 > DATA01

In that case, the shell always truncates the destination file before launching the cat command. So the content of the file is lost before cat even has the opportunity to read it.
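
To convince yourself that the shell, and not the command, empties the file, you can redirect the output of a command that never reads or writes anything. This is a minimal sketch run in a throwaway directory; the workdir variable is only there for the illustration.

```shell
# Work in a scratch directory so no real data is harmed.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'Dec, 2015, 15000\n' > DATA01
wc -c < DATA01      # non-zero: the file has some content

# "true" does nothing at all, yet the redirection alone empties the file,
# because the shell opens (and truncates) DATA01 before running the command.
true > DATA01
wc -c < DATA01      # the file is now empty
```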

How to fix that?

Obviously, no one would ever use the sleep hack in a real situation. But this is not an issue: as part of the standard POSIX tools, we have several commands at our disposal to insert the header on top of a file. Before that, let’s take a look at the most basic solution.

The KISS solution

cat HEADER DATA01 > DATA01.NEW
mv -f DATA01.NEW DATA01

Do I really need to comment on that? Well, while rudimentary, this solution has a nice feature: mv uses the rename system call, which is atomic as long as both names are on the same filesystem. Other processes referencing the DATA01 file will therefore see either the old content or the new content, but never half-written content.

A somewhat similar solution, which avoids creating a temporary file visible on the filesystem, first obtains a file descriptor to read from the original file before overwriting it:
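
Since the challenge mentions many data files, the same idea can be wrapped in a loop. This is only a sketch under a few assumptions: the DATA?? pattern and the DATA02 file are made up for the illustration, and mktemp is used to pick a temporary name next to each data file.

```shell
# Build a small playground in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '# Month, Year, Est.Value\n' > HEADER
printf 'Dec, 2015, 15000\nJan, 2016, 12540\n' > DATA01
printf 'Feb, 2016, 11970\n' > DATA02

for f in DATA??; do
    # Create the temporary file in the same directory as the data file,
    # so the final mv is a plain rename on the same filesystem.
    tmp=$(mktemp "./$f.XXXXXX") || exit 1
    if cat HEADER "$f" > "$tmp"; then
        mv -f "$tmp" "$f"      # replace the original in one step
    else
        rm -f "$tmp"           # leave the original untouched on failure
    fi
done

head DATA01 DATA02
```

One thing to keep in mind: mktemp creates its files with restrictive permissions, so you may want to restore the original mode with chmod after the mv.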

exec 3<DATA01              # (1)
rm -f DATA01               # (2)
cat HEADER - <&3 >DATA01   # (3)
exec 3<&-                  # (4)

  1. Open the file DATA01 for reading using file descriptor 3;
  2. Unlink the original file (i.e., remove its directory entry, but not the data, as the file is still open);
  3. Use cat to read the header first, followed by stdin read from file descriptor 3, and write to a new DATA01 file;
  4. Close file descriptor 3. This will effectively delete the old DATA01 content.

Please note this solution is no longer atomic in the sense used above. Anyway, kudos to Adithya Kiran Gangu for proposing that solution!
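
The same trick can be written without an explicit file descriptor number by redirecting the standard input of a command group. This is a variation of my own, not the form proposed in the comments, and it is not atomic either.

```shell
# Build a small playground in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '# Month, Year, Est.Value\n' > HEADER
printf 'Dec, 2015, 15000\nJan, 2016, 12540\n' > DATA01

# The shell opens DATA01 for reading before running the group, so the
# group's stdin still points to the old data after the file is unlinked.
{
    rm -f DATA01              # remove the directory entry only
    cat HEADER - > DATA01     # header first, then the old data from stdin
} < DATA01

cat DATA01
```

The old content is released as soon as the group ends, since nothing keeps it open anymore.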

Using sed

When I first encountered a similar problem, my idea was to use sed. It is quite easy to insert a “header” after the first line using sed. But it’s more difficult to insert something before the first line. In fact, to achieve that, we will need a little bit of magic:

sed -i '1{
  r HEADER
  N
}' DATA01

To fully understand this, you need to know that the (r)ead command inserts the content of a file into the output stream, but only once processing of the current line has ended. That’s why I used the (N)ext command: it ends the processing of line 1 early (i.e., before the normal line output). So, when encountering that command, sed ends the processing of line 1, which triggers the output of the content of the HEADER file. But line 1 itself is not sent to the output. It is kept in the sed buffer.

Then sed reads the next line of input, appends it to the buffer, and, as we do not have any rule for line 2, processes it as usual by sending its buffer to the output (remember that at that stage, the buffer contains both line 1 and line 2).

This solution has a major drawback: it assumes there is a line 2. If the data file contains only one line, this will fail miserably.
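
If you are using GNU sed, one possible way around that limitation is its e command, which runs a shell command and sends its output to the output stream before the current line is printed. Take this as a sketch rather than a portable answer: the e command is a GNU extension, and I assume here that it is not disabled (for example by the --sandbox option).

```shell
# Build a small playground in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '# Month, Year, Est.Value\n' > HEADER
printf 'Dec, 2015, 15000\n' > DATA01   # a data file with only one line

# GNU sed only: on line 1, run "cat HEADER" and emit its output
# before line 1 itself is printed. No line 2 is required.
sed -i '1e cat HEADER' DATA01

cat DATA01
```

This still does nothing for an empty data file, since there is no line 1 to trigger the command.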

Using ed or ex

We have very few occasions to use ed or its cousin ex. Both are line-oriented editors. Their behavior is very similar to vi in the sense that you load the file into memory and send commands to the editor to modify that file. The only difference here is that we will script the commands instead of sending them interactively.

ed DATA01 << .
0r HEADER
wq
.
ex -s DATA01 << .
0r HEADER
wq
.

This works great, but we have to load the whole file into memory, which could be an issue for very large files.

As always, those are probably only a subset of all possible solutions. So don’t hesitate to use the comment section to share your own ideas.

And stay tuned for more fun!

Comments

  1. The ed editor??? Yup, the old ed editor is still installed by default on most Unixes and can still be useful:

    ed -s DATA01 << HEADER DATA01 <==
    Dec, 2015, 15000
    Jan, 2016, 12540
    Feb, 2016, 11970

  2. The problem with tee is that it empties the file immediately, before cat gets to read it. Using the sponge utility from moreutils solves this issue.
    cat header data01 |sponge data01

    • We can also use a temporary file descriptor to keep the content and read it back, in the fashion below. It is a pure Bash solution. It’s not recommended to read from and write to the same file when you use a pipe.
      $ exec 3<DATA01 && rm -f DATA01 && cat HEADER - <&3 >DATA01 && exec 3<&-
      1. create a temp file descriptor 3 for DATA01
      2. remove the original file
      3. use cat to read the header first followed by a stdin read from file descriptor 3 and write to DATA01
      4. close the file descriptor 3