<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Austin Matzko&#039;s Blog &#187; grep</title>
	<atom:link href="http://austinmatzko.com/tag/grep/feed/" rel="self" type="application/rss+xml" />
	<link>http://austinmatzko.com</link>
	<description>A blog about philosophy, Christianity, web development and whatever else I feel like writing about.</description>
	<lastBuildDate>Wed, 16 Mar 2011 17:14:59 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2-RC4-18391</generator>
		<item>
		<title>sed and Multi-Line Search and Replace</title>
		<link>http://austinmatzko.com/2008/04/26/sed-multi-line-search-and-replace/</link>
		<comments>http://austinmatzko.com/2008/04/26/sed-multi-line-search-and-replace/#comments</comments>
		<pubDate>Sat, 26 Apr 2008 17:21:15 +0000</pubDate>
		<dc:creator>filosofo</dc:creator>
				<category><![CDATA[Computers]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[grep]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[regex]]></category>
		<category><![CDATA[sed]]></category>
		<category><![CDATA[shell]]></category>

		<guid isPermaLink="false">http://www.ilfilosofo.com/?p=458</guid>
		<description><![CDATA[I&#8217;ve been experimenting with getting regular expression patterns to match over multiple lines using sed. For example, one might want to change &#60;p&#62;previous text&#60;/p&#62; &#60;h2&#62; &#60;a href=&#34;http://some-link.com&#34;&#62;A title here&#60;/a&#62; &#60;/h2&#62; &#60;p&#62;following text&#60;/p&#62; to &#60;p&#62;previous text&#60;/p&#62; No title here &#60;p&#62;following text&#60;/p&#62; sed cycles through each line of input one line at a time, so the most [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been experimenting with getting regular expression patterns to match over multiple lines using <a href="http://en.wikipedia.org/wiki/Sed"><code>sed</code></a>.  For example, one might want to change</p>
<div class="filosofo-highlight-light html4strict" style="font-family: monospace;"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;p&gt;</span></span>previous text<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/p&gt;</span></span><br />
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;h2&gt;</span></span><br />
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;a</span> <span style="color: #000066;">href</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;http://some-link.com&quot;</span><span style="color: #000000; font-weight: bold;">&gt;</span></span>A title here<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/a&gt;</span></span><br />
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/h2&gt;</span></span><br />
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;p&gt;</span></span>following text<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/p&gt;</span></span></div>
<p>to </p>
<div class="filosofo-highlight-light html4strict" style="font-family: monospace;"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;p&gt;</span></span>previous text<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/p&gt;</span></span><br />
No title here<br />
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;p&gt;</span></span>following text<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/p&gt;</span></span></div>
<p><code>sed</code> processes its input one line at a time, so the most obvious way to match a pattern that extends over several lines is to concatenate all the lines into what is called <code>sed</code>&#8217;s &#8220;hold space,&#8221; then look for the pattern in that (long) string.  That&#8217;s what I do in the following script:</p>
<div class="filosofo-highlight-light bash" style="font-family: monospace;"><span style="color: #808080; font-style: italic;">#!/bin/sh</span><br />
<span style="color: #c20cb9; font-weight: bold;">sed</span> -n <span style="color: #ff0000;">'<br />
# on the first line, copy the pattern space to the hold space<br />
1h<br />
# on every other line, append the pattern space to the hold space<br />
1!H<br />
# on the last line ...<br />
$ {<br />
&nbsp; &nbsp; &nbsp; &nbsp; # copy from the hold to the pattern buffer<br />
&nbsp; &nbsp; &nbsp; &nbsp; g<br />
&nbsp; &nbsp; &nbsp; &nbsp; # do the search and replace<br />
&nbsp; &nbsp; &nbsp; &nbsp; s/&lt;h2.*&lt;\/h2&gt;/No title here/g<br />
&nbsp; &nbsp; &nbsp; &nbsp; # print<br />
&nbsp; &nbsp; &nbsp; &nbsp; p<br />
}<br />
'</span> sample.php <span style="color: #000000; font-weight: bold;">&gt;</span> sample-edited.php;</div>
<p>A more compact version: </p>
<div class="filosofo-highlight-light bash" style="font-family: monospace;"><br />
<span style="color: #c20cb9; font-weight: bold;">sed</span> -n <span style="color: #ff0000;">'1h;1!H;${;g;s/&lt;h2.*&lt;\/h2&gt;/No title here/g;p;}'</span> sample.php <span style="color: #000000; font-weight: bold;">&gt;</span> sample-edited.php;<br />
&nbsp;</div>
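<p>To try this without an HTML file at hand, here is a minimal sketch of the same hold-space technique run on inline sample text.  The <code>BEGIN</code> and <code>END</code> markers are assumed stand-ins for the opening and closing <code>h2</code> tags, and the variable names are made up for illustration; none of them come from the example above.</p>
<pre>
```shell
#!/bin/sh
# Assumed sample text: BEGIN and END stand in for the opening and closing h2 tags.
sample=$(printf '%s\n' 'previous text' 'BEGIN' 'A title here' 'END' 'following text')

# Collect every line in the hold space, then substitute once on the last line.
result=$(printf '%s\n' "$sample" | sed -n '1h;1!H;${g;s/BEGIN.*END/No title here/g;p;}')
printf '%s\n' "$result"

# With two blocks, the greedy .* spans from the first BEGIN to the last END.
two_blocks=$(printf '%s\n' 'BEGIN' 'one' 'END' 'keep me' 'BEGIN' 'two' 'END')
greedy=$(printf '%s\n' "$two_blocks" | sed -n '1h;1!H;${g;s/BEGIN.*END/No title here/g;p;}')
printf '%s\n' "$greedy"
```
</pre>
<p>The first run prints the three expected lines.  The second prints only <code>No title here</code>: because <code>.*</code> is greedy, the <code>keep me</code> line between the two blocks is swallowed as well, so this pattern is best suited to input with a single block.</p>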
<p>As far as I can tell, that&#8217;s the most efficient way to match general multi-line patterns.  I initially thought it might be more efficient not to keep the complete input in the hold buffer, so I modified the algorithm to print out the string whenever a match is found:</p>
<div class="filosofo-highlight-light bash" style="font-family: monospace;"><br />
<span style="color: #808080; font-style: italic;">#!/bin/sh</span><br />
<span style="color: #c20cb9; font-weight: bold;">sed</span> -n <span style="color: #ff0000;">'1h <br />
1!{<br />
&nbsp; &nbsp; &nbsp; &nbsp; # if the sought-after regex is not found, append the pattern space to hold space<br />
&nbsp; &nbsp; &nbsp; &nbsp; /&lt;h2.*&lt;\/h2&gt;/ !H<br />
&nbsp; &nbsp; &nbsp; &nbsp; # copy hold space into pattern space<br />
&nbsp; &nbsp; &nbsp; &nbsp; g<br />
&nbsp; &nbsp; &nbsp; &nbsp; # if the regex is found, then...<br />
&nbsp; &nbsp; &nbsp; &nbsp; /&lt;h2.*&lt;\/h2&gt;/ {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # do the search and replace<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; s/&lt;h2.*&lt;\/h2&gt;/No title here/g<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # print <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; p<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # read the next line into the pattern space<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; n<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # copy the pattern space into the hold space<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; h<br />
&nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; # copy pattern buffer into hold buffer<br />
&nbsp; &nbsp; &nbsp; &nbsp; h<br />
}<br />
# if the last line then print<br />
$p<br />
'</span> sample.php <span style="color: #000000; font-weight: bold;">&gt;</span> sample-edited.php;<br />
&nbsp;</div>
<p>In the last example, <code>sed</code> concatenates lines only until it finds a match, and then it prints the line (after substituting the text).  Then, it starts again to concatenate the following lines.  </p>
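<p>However, that approach is usually massively inefficient: the regex is re-run against the whole accumulated buffer for every new line, so the work grows roughly quadratically with the number of lines between matches.  Unless a <code>sed</code> guru can point out a better way, I&#8217;m going to continue using the first approach.  </p>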
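<p>For comparison, a commonly cited idiom does the same whole-input slurp in the pattern space instead of the hold space.  The sketch below is not one of the scripts from this post; it again assumes <code>BEGIN</code> and <code>END</code> markers in place of the <code>h2</code> tags.  It still reads the entire input before substituting, so it is an equivalent of the first approach rather than an improvement on it.</p>
<pre>
```shell
#!/bin/sh
# Assumed sample text: BEGIN and END stand in for the opening and closing h2 tags.
sample=$(printf '%s\n' 'previous text' 'BEGIN' 'A title here' 'END' 'following text')

# :a     define a label named "a"
# N      append the next input line to the pattern space
# $!ba   unless the last line has been read, branch back to "a"
# s///   then runs once, on the fully joined input
result=$(printf '%s\n' "$sample" | sed -e ':a' -e 'N' -e '$!ba' -e 's/BEGIN.*END/No title here/g')
printf '%s\n' "$result"
```
</pre>
<p>One thing to watch: on a one-line input, <code>N</code> has no next line to read and <code>sed</code> exits before reaching the substitution, so the hold-space version is the safer choice if very short files are possible.</p>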
<p>I&#8217;ve put the following script, which I call &#8220;<code>sedml</code>,&#8221; for <code>sed</code> multi-line, on my <code>PATH</code>.</p>
<div class="filosofo-highlight-light bash" style="font-family: monospace;"><span style="color: #808080; font-style: italic;">#!/bin/sh</span><br />
<span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #ff0000;">&quot;<span style="color: #000099; font-weight: bold;">$</span>#&quot;</span> -lt <span style="color: #000000;">2</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span> <br />
<span style="color: #000000; font-weight: bold;">then</span><br />
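&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #ff0000;">&quot;usage: sedml file sed-command [outputfile]&quot;</span> <span style="color: #000000; font-weight: bold;">&gt;&amp;</span><span style="color: #000000;">2</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #7a0874; font-weight: bold;">exit</span> <span style="color: #000000;">1</span><br />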
<span style="color: #000000; font-weight: bold;">fi</span><br />
<br />
<span style="color: #808080; font-style: italic;"># overwrite the input file if no 3rd argument is given</span><br />
<span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> -z <span style="color: #ff0000;">&quot;<span style="color: #000099; font-weight: bold;">$</span>3&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span><br />
<span style="color: #000000; font-weight: bold;">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #007800;">outputfile=</span><span style="color: #ff0000;">&quot;<span style="color: #000099; font-weight: bold;">$</span>1&quot;</span><br />
<span style="color: #000000; font-weight: bold;">else</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #007800;">outputfile=</span><span style="color: #ff0000;">&quot;<span style="color: #000099; font-weight: bold;">$</span>3&quot;</span><br />
<span style="color: #000000; font-weight: bold;">fi</span><br />
<span style="color: #c20cb9; font-weight: bold;">sed</span> -n <span style="color: #ff0000;">'<br />
# on the first line, copy the pattern space to the hold space<br />
1h<br />
# on every other line, append the pattern space to the hold space<br />
1!H<br />
# on the last line ...<br />
$ {<br />
&nbsp; &nbsp; &nbsp; &nbsp; # copy from the hold to the pattern buffer<br />
&nbsp; &nbsp; &nbsp; &nbsp; g<br />
&nbsp; &nbsp; &nbsp; &nbsp; # do the search and replace<br />
&nbsp; &nbsp; &nbsp; &nbsp; '</span><span style="color: #ff0000;">&quot;<span style="color: #000099; font-weight: bold;">$</span>2&quot;</span><span style="color: #ff0000;">'<br />
&nbsp; &nbsp; &nbsp; &nbsp; # print<br />
&nbsp; &nbsp; &nbsp; &nbsp; p<br />
}<br />
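'</span> <span style="color: #ff0000;">&quot;<span style="color: #000099; font-weight: bold;">$</span>1&quot;</span> <span style="color: #000000; font-weight: bold;">&gt;</span> <span style="color: #ff0000;">&quot;<span style="color: #000099; font-weight: bold;">$</span>1.tmp&quot;</span>;<br />
<span style="color: #c20cb9; font-weight: bold;">mv</span> -f <span style="color: #ff0000;">&quot;<span style="color: #000099; font-weight: bold;">$</span>1.tmp&quot;</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$outputfile</span>&quot;</span>;<br />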
&nbsp;</div>
<p>So I can replace multi-line patterns in multiple files like so:</p>
<div class="filosofo-highlight-light bash" style="font-family: monospace;">&nbsp;<span style="color: #c20cb9; font-weight: bold;">grep</span> -rl <span style="color: #ff0000;">'&lt;h2'</span> <span style="color: #000000; font-weight: bold;">*</span> <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #000000; font-weight: bold;">while</span> <span style="color: #c20cb9; font-weight: bold;">read</span> -r i; <span style="color: #000000; font-weight: bold;">do</span> sedml <span style="color: #ff0000;">&quot;<span style="color: #007800;">$i</span>&quot;</span> <span style="color: #ff0000;">&quot;s/&lt;h2.*&lt;\/h2&gt;/No title here/g&quot;</span>; <span style="color: #000000; font-weight: bold;">done</span>;</div>
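<p>Here is a self-contained sketch of that loop which can be run in a scratch directory before pointing it at real files.  The function <code>sedml_sketch</code> is a hypothetical, simplified stand-in for the <code>sedml</code> script (in-place editing only), the file names are made up, and <code>BEGIN</code> and <code>END</code> again stand in for the <code>h2</code> tags.</p>
<pre>
```shell
#!/bin/sh
# Hypothetical stand-in for sedml: apply sed expression "$2" to the whole of file "$1", in place.
sedml_sketch() {
    sed -n '1h;1!H;${g;'"$2"';p;}' "$1" > "$1.tmp" || return 1
    mv -f "$1.tmp" "$1"
}

# Scratch directory with one file that contains a block and one that does not.
workdir=$(mktemp -d)
printf '%s\n' 'previous text' 'BEGIN' 'A title here' 'END' 'following text' > "$workdir/with title.txt"
printf '%s\n' 'nothing to replace' > "$workdir/plain.txt"

# IFS= and -r keep spaces and backslashes in file names intact.
grep -rl 'BEGIN' "$workdir" | while IFS= read -r file; do
    sedml_sketch "$file" 's/BEGIN.*END/No title here/g'
done

edited=$(cat "$workdir/with title.txt")
untouched=$(cat "$workdir/plain.txt")
printf '%s\n' "$edited"
rm -rf "$workdir"
```
</pre>
<p>Quoting <code>"$file"</code> is what lets the loop cope with the space in <code>with title.txt</code>.  File names that contain newlines would still break it, since <code>grep -l</code> prints one name per line.</p>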
]]></content:encoded>
			<wfw:commentRss>http://austinmatzko.com/2008/04/26/sed-multi-line-search-and-replace/feed/</wfw:commentRss>
		<slash:comments>30</slash:comments>
		</item>
	</channel>
</rss>

