Thursday, March 8, 2007

Extracting part of a log using Bourne shell

Someone recently asked me how to select a range of text from a log file. Because it was a log file, each line started with the date and time of that entry.

She wanted to extract all the log entries from a start time to an end time, for example all entries from 08:07 to 08:16 on March 8th, 2007. Each line had the form:
2007-03-08 08:07:ss.sss [log message]

where ss.sss was the seconds and [log message] was the actual text message written to the log.

My solution, using Bourne shell, was to find the first occurrence of "2007-03-08 08:07" using grep. With GNU grep, the command is:
START=`grep -n -m1 "2007-03-08 08:07" logfile.log | cut -d: -f1`

The -n option prefixes each matching line with its line number, and -m1 tells grep to stop after the first match. The output will look something like:
237:2007-03-08 08:07:ss.sss [log message]

where 237 is the line number. So cut -d: splits the line at each colon and -f1 keeps the first field, i.e. 237. The colons inside the timestamp do not matter, because only the first field is kept.
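To make that output concrete, here is a minimal sketch that runs the same pipeline against a small, made-up log. The file name sample.log and its seven entries are hypothetical, and -m1 assumes GNU grep (or another grep that supports -m).

```sh
#!/bin/sh
# Build a tiny hypothetical log (sample.log and its contents are illustrative only).
cat > sample.log <<'EOF'
2007-03-08 08:05:12.001 service starting
2007-03-08 08:06:59.500 listening on port 8080
2007-03-08 08:07:00.250 first request
2007-03-08 08:07:31.900 second request
2007-03-08 08:16:02.000 cache flushed
2007-03-08 08:16:45.123 third request
2007-03-08 08:17:10.000 shutting down
EOF

# -n prefixes the match with "LINENO:", -m1 stops after the first match,
# and cut keeps only the text before the first colon (the line number).
START=`grep -n -m1 "2007-03-08 08:07" sample.log | cut -d: -f1`
echo "START=$START"
```

Line 3 is the first 08:07 entry in this sample, so the script prints START=3.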

Next you want to find the last occurrence of 08:16. Rather than searching for 08:16 directly, I would suggest looking for the first occurrence of 08:17 with the same grep command:
END=`grep -n -m1 "2007-03-08 08:17" logfile.log | cut -d: -f1`


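The reason to look for the minute after the real end time is that the log might have many entries for 08:16, and grep -m1 stops at the first one. The first 08:17 entry comes right after the last 08:16 entry, so stopping there captures all the 08:16 entries rather than just the first. (This relies on the log having at least one entry at 08:17; if it jumps from 08:16 straight to a later minute, grep finds nothing and END is left empty.)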

This gives us the line number just AFTER the last line we want, so we subtract one:
END=`expr $END - 1`
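Continuing the sketch on the same hypothetical seven-line log (recreated here so the snippet runs on its own), the search for 08:17 followed by the decrement looks like this:

```sh
#!/bin/sh
# Same hypothetical sample log as before; the contents are illustrative only.
cat > sample.log <<'EOF'
2007-03-08 08:05:12.001 service starting
2007-03-08 08:06:59.500 listening on port 8080
2007-03-08 08:07:00.250 first request
2007-03-08 08:07:31.900 second request
2007-03-08 08:16:02.000 cache flushed
2007-03-08 08:16:45.123 third request
2007-03-08 08:17:10.000 shutting down
EOF

# First 08:17 entry: this is one line PAST the range we want (line 7 here).
END=`grep -n -m1 "2007-03-08 08:17" sample.log | cut -d: -f1`
echo "first 08:17 line: $END"

# Step back one line to land on the last 08:16 entry (line 6 here).
END=`expr $END - 1`
echo "END=$END"
```

In this sample the first 08:17 entry is on line 7, so END ends up as 6, the second of the two 08:16 entries.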

Now we want to extract everything from START to END in the log. We start by taking lines 1 through END with the head command:
head -n $END logfile.log

Next we need to drop the lines that come before START. For that we can use the tail command, but tail wants to know how many lines to keep (counted from the end of its input), not how many to throw away. The lines to throw away are 1 through START - 1, so the number to keep is END - (START - 1), i.e. $END - $START + 1. So:
LINES=`expr $END - $START + 1`
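As a quick check of the arithmetic, here it is worked out with illustrative values START=3 and END=6 (made-up numbers, not from a real log), then confirmed on ten numbered dummy lines:

```sh
#!/bin/sh
# Illustrative values: the wanted range is lines 3 through 6.
START=3
END=6

# Lines 1 .. START-1 come before the range and are discarded.
DISCARD=`expr $START - 1`
# Of the END lines that head passes through, tail keeps the last LINES.
LINES=`expr $END - $START + 1`
echo "head keeps $END lines, tail keeps the last $LINES, $DISCARD are dropped"

# Confirm on ten dummy lines; this prints line3 through line6.
printf 'line%s\n' 1 2 3 4 5 6 7 8 9 10 | head -n $END | tail -n $LINES
```

Dropped plus kept always adds up to END, which is why END - START + 1 is the right count for tail.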

Finally we would have:
head -n $END logfile.log | tail -n $LINES

and this will display only the lines from 08:07 to 08:16 on March 8, 2007.
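For reuse, the steps can be collected into one script. This is a minimal sketch, not the exact script from the post: the variable names LOG, FROM and AFTER, the output file extract.log and the sample log are hypothetical, and the empty-result check is an addition that stops the script when either timestamp never appears in the log (the grep would otherwise print nothing and expr would fail).

```sh
#!/bin/sh
# Hypothetical sample log so the sketch runs on its own.
cat > sample.log <<'EOF'
2007-03-08 08:05:12.001 service starting
2007-03-08 08:06:59.500 listening on port 8080
2007-03-08 08:07:00.250 first request
2007-03-08 08:07:31.900 second request
2007-03-08 08:16:02.000 cache flushed
2007-03-08 08:16:45.123 third request
2007-03-08 08:17:10.000 shutting down
EOF

LOG=sample.log
FROM="2007-03-08 08:07"    # first minute wanted
AFTER="2007-03-08 08:17"   # the minute AFTER the last one wanted

# Line numbers of the first match for each minute (-m1 assumes GNU grep).
START=`grep -n -m1 "$FROM" "$LOG" | cut -d: -f1`
END=`grep -n -m1 "$AFTER" "$LOG" | cut -d: -f1`

# Stop with a message if either minute is missing from the log.
if [ -z "$START" ] || [ -z "$END" ]; then
    echo "start or end timestamp not found in $LOG" >&2
    exit 1
fi

END=`expr $END - 1`              # last line of the wanted range
LINES=`expr $END - $START + 1`   # number of lines in the range
head -n $END "$LOG" | tail -n $LINES > extract.log
cat extract.log
```

On the sample this writes the four entries from 08:07:00.250 through 08:16:45.123. A swapped-in alternative, not the method described above, is `sed -n "${START},${END}p" "$LOG"`, which prints lines START through END in a single command once END has been decremented.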
