Extract lines from a file

Sometimes it is useful to extract a set of lines from an input file, especially when each line represents a record of “structured” information.
In such a case, it is worth distinguishing between extracting just a few lines and extracting many lines, which would be impractical to enumerate by hand.
Suppose you have a file named file with 1,000,000 lines, and you’d like to extract from it the following records: 131, 2,096, 5,487, 37,149, 575,082.
Well, since you’re only interested in five out of a million lines, you can simply list them explicitly, using either sed or awk:

> sed -n '131p;2096p;5487p;37149p;575082p' file

or

> awk 'NR == 131 || NR == 2096 || NR == 5487 || NR == 37149 || NR == 575082' file
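If you want to try the commands above without a real data file, the sketch below first builds a hypothetical one-million-line sample in which each line reads "record N" (an assumption made purely for testing). It also shows a small tweak: since input is read in order, both sed and awk can stop as soon as the last wanted line (575,082) has been printed, instead of scanning the rest of the file.

```shell
# Hypothetical sample input: 1,000,000 lines of the form "record N".
awk 'BEGIN { for (i = 1; i <= 1000000; i++) print "record " i }' > file

# sed: print the five wanted lines, then quit right after line 575082
# (with -n, the q command does not print the pattern space again).
sed -n '131p;2096p;5487p;37149p;575082p;575082q' file

# awk: same selection, exiting as soon as the highest wanted line is printed.
awk 'NR == 131 || NR == 2096 || NR == 5487 || NR == 37149 { print }
     NR == 575082 { print; exit }' file
```

Both commands print the same five records; the early exit only matters when the last wanted line is far from the end of a large file.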

However, what if you want to extract 100 records or more?
Clearly, the solutions above don’t scale very well: they are practical only as long as the number of lines to extract is “reasonable” (say, fewer than 20).
As the number of records to extract grows, a better solution is to store the wanted line numbers in a temporary file (let’s call it line_to_extract), with exactly one line number per line, and use it to select the matching lines from the original file.
For instance, assume that we want to extract all the lines whose numbers are among the first 30 Fibonacci numbers, namely fibonacci(1), fibonacci(2), up to fibonacci(30). Since fibonacci(1) = fibonacci(2) = 1, we keep only one of them.
With this scheme in mind, line_to_extract looks like the following:

line_to_extract:
1
2
3
5
...
832040
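Rather than typing these numbers by hand, line_to_extract can be generated. The following is a minimal sketch using an awk BEGIN block (any other tool would do just as well); it starts from fibonacci(2) = 1 so that the duplicate value 1 appears only once, and writes fibonacci(2) through fibonacci(30), i.e., 29 line numbers, one per line.

```shell
# Write fibonacci(2) .. fibonacci(30) to line_to_extract, one per line.
# fibonacci(1) is skipped because it equals fibonacci(2) = 1.
awk 'BEGIN {
  prev = 1; cur = 2          # fibonacci(2) and fibonacci(3)
  print prev                 # fibonacci(2) = 1
  for (n = 3; n <= 30; n++) {
    print cur                # fibonacci(n)
    tmp = prev + cur
    prev = cur
    cur = tmp
  }
}' > line_to_extract
```

The resulting file starts with 1, 2, 3, 5 and ends with 832040, matching the listing above.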

Then, to extract all the lines above from file, you can run:

> awk 'FNR==NR{a[$1];next}(FNR in a){print}' line_to_extract file
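The one-liner relies on awk reading the two files in sequence. NR counts lines across all input, while FNR restarts at 1 for each file, so FNR==NR holds only while awk is reading the first file (line_to_extract). During that pass each line number is stored as a key of the array a, and next skips the remaining rule; during the second pass, a line of file is printed when its number FNR is one of the stored keys. Below is the same program spread out with comments (the array is renamed wanted for readability), run on hypothetical sample files so that it can be tried in isolation:

```shell
# Hypothetical inputs: a 1,000,000-line file and a few wanted line numbers.
awk 'BEGIN { for (i = 1; i <= 1000000; i++) print "record " i }' > file
printf '%s\n' 1 2 3 5 832040 > line_to_extract

awk '
  # First file: FNR (line number within the current file) equals NR
  # (line number across all input), so store each wanted line number
  # as an array key and skip to the next input line.
  FNR == NR { wanted[$1]; next }
  # Second file: print the line if its number is one of the stored keys.
  FNR in wanted
' line_to_extract file
```

Note that array keys are compared as strings, so an entry such as 05 in line_to_extract would not match line 5; the line numbers should be written without leading zeros.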

