<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Regular Expression &#8211; Xojo Programming Blog</title>
	<atom:link href="https://blog.xojo.com/tag/regular-expression/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.xojo.com</link>
	<description>Blog about the Xojo programming language and IDE</description>
	<lastBuildDate>Fri, 26 Oct 2018 13:31:05 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>Adventures in Regular Expressions</title>
		<link>https://blog.xojo.com/2018/10/22/adventures-in-regular-expressions/</link>
		
		<dc:creator><![CDATA[Paul Lefebvre]]></dc:creator>
		<pubDate>Mon, 22 Oct 2018 16:19:27 +0000</pubDate>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[RegEx]]></category>
		<category><![CDATA[Regular Expression]]></category>
		<guid isPermaLink="false">https://blog.xojo.com/?p=5041</guid>

					<description><![CDATA[Normally I'd crack my knuckles and start on a Xojo project and use string find/replacing to massage the text. But this is messy and tedious.

So I thought, why not try Regular Expressions (RegEx)? I don't really know much about RegEx and frankly they scare me a bit.]]></description>
										<content:encoded><![CDATA[<p>As you probably know, every version of Xojo ships with an extensive list of release notes, found in the Documentation folder as an HTML file called ReleaseNotes.htm.</p>
<p>To make these even easier to access, I needed a way to get these into the wiki. It would be easiest if I could just copy and paste the HTML contents onto a wiki page, but MediaWiki can&#8217;t quite process all the HTML in that file so I needed a way to clean it up a bit.</p>
<p><span id="more-5041"></span></p>
<p>Take a look at a single line that contains a release note:</p>
<pre>&lt;tr class="spacer"&gt;&lt;td class="reportid"&gt;&lt;a href="feedback://showreport?report_id=52522"&gt;52522&lt;/a&gt;&lt;/td&gt;&lt;td class="category"&gt;Framework » All&lt;/td&gt;&lt;td class="desc"&gt;Added AntiAliasMode property on Graphics class. This property controls the level of interpolation/quality when drawing scaled Pictures. Valid modes are from the Graphics.AntiAliasModes enumeration: LowQuality, DefaultQuality, and HighQuality. The default is DefaultQuality.&lt;/td&gt;&lt;/tr&gt;</pre>
<p>This is a row in an HTML table. The problem is the &lt;a href&gt; part, which MediaWiki does not use for displaying links, so I needed a way to clean that up. There are also some &lt;tbody&gt; tags that I needed to remove.</p>
<p>Normally I&#8217;d crack my knuckles and start on a Xojo project and use string find/replacing to massage the text. But this is messy and tedious.</p>
<p>So I thought, why not try Regular Expressions (RegEx)? I don&#8217;t really know much about RegEx and frankly they scare me a bit. After all, who can make heads or tails of something that looks like this:</p>
<pre>\b(\d{1,2})([/-])(\d{1,2})\2(\d{4}|\d{2})\b</pre>
<p>That gibberish above is called a RegEx pattern and apparently that one is for date validation.</p>
<p>Anyway, I figured I would give RegEx a try. My first challenge was to find a pattern that I could start with and maybe modify. Heading to the Xojo doc pages for the <a href="https://docsupgrade.xojo.com/RegEx">RegEx class</a>, I found this pattern to remove HTML tags from text on the <a href="https://docsupgrade.xojo.com/RegEx.Replace">RegEx.Replace</a> page&#8217;s Sample Code:</p>
<pre>&lt;[^&lt;&gt;]+&gt;</pre>
<p>Since I did need to remove HTML tags, I figured it was a good place to start.</p>
<p>To explain this pattern:</p>
<ol>
<li>The &#8220;&lt;&#8221; is the starting character</li>
<li>The &#8220;[&#8221; bracket starts a character class</li>
<li>The &#8220;^&#8221; means not, so &#8220;^&lt;&gt;&#8221; means all characters except &#8220;&lt;&#8221; and &#8220;&gt;&#8221;</li>
<li>The &#8220;]&#8221; ends the character class and the &#8220;+&#8221; means match 1 or more characters from that class</li>
<li>End with a &#8220;&gt;&#8221; character</li>
</ol>
<p>So that means: match all text that starts with a &#8220;&lt;&#8221;, is followed by one or more characters other than &#8220;&lt;&#8221; and &#8220;&gt;&#8221;, and ends with a &#8220;&gt;&#8221;.</p>
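<p>As a rough sketch of how that pattern behaves in Xojo code, the snippet below strips every tag from a short string. The sample text and variable names are made up for illustration; the RegEx properties are the same ones used in the function later in this post.</p>
<pre>Dim re As New RegEx
re.SearchPattern = "&lt;[^&lt;&gt;]+&gt;" // "&lt;", then 1 or more characters that are not "&lt;" or "&gt;", then "&gt;"
re.ReplacementPattern = "" // Replace each match with nothing
re.Options.ReplaceAllMatches = True // Replace every match, not just the first

// Hypothetical sample text, not taken from the release notes
Dim sample As String = "&lt;td class=""desc""&gt;Added &lt;b&gt;AntiAliasMode&lt;/b&gt; property&lt;/td&gt;"
Dim stripped As String = re.Replace(sample)
// stripped is now: Added AntiAliasMode property</pre>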
<p><em>FYI: For all my RegEx testing I used <a href="https://itunes.apple.com/us/app/regexrx/id498370702?mt=12">RegExRX</a> &#8212; an excellent tool for dealing with RegEx written in Xojo by long-time Xojo developer Kem Tekinay.</em></p>
<p>After starting RegExRX I put the above pattern in the Search Pattern field and pasted a small snippet of the HTML into the Source Text field. I could immediately see that all the HTML tags were matched.</p>
<p><img fetchpriority="high" decoding="async" class="alignnone size-full wp-image-5043" src="https://blog.xojo.com/wp-content/uploads/2018/10/2018-10-12_09-54-33.png" alt="" width="735" height="692" /></p>
<p>But I did not need all the HTML tags to be matched; I only needed &lt;a&gt; and &lt;tbody&gt;. A slight modification to the above pattern can tell it to only find text that starts with &#8220;&lt;a&#8221; or &#8220;&lt;/a&#8221;. Looking at the RegEx reference on the <a href="https://docsupgrade.xojo.com/RegEx">RegEx page</a> I saw that the &#8220;?&#8221; character means 0 or 1 matches, so &#8220;/?&#8221; makes the slash optional. Note that the &#8220;+&#8221; has also become &#8220;*&#8221; (0 or more), which lets a tag with nothing after its name, such as &lt;/a&gt;, still match. So I can change the pattern to this:</p>
<pre>&lt;/?a[^&lt;&gt;]*&gt;</pre>
<p>This means: find HTML tags that start with &#8220;&lt;a&#8221; or &#8220;&lt;/a&#8221;. With this pattern I can see that only the &lt;a&gt; and &lt;/a&gt; tags are highlighted:</p>
<p><img decoding="async" class="alignnone size-full wp-image-5055" src="https://blog.xojo.com/wp-content/uploads/2018/10/2018-10-12_13-24-33.png" alt="" width="735" height="851" /></p>
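<p>One thing to watch for: this pattern also matches any other tag whose name merely begins with &#8220;a&#8221;, such as &lt;abbr&gt;. If that ever matters, putting a word boundary after the tag name is one possible way to tighten it. This is only a sketch and not the pattern used in the rest of this post; the sample string is hypothetical:</p>
<pre>Dim re As New RegEx
re.SearchPattern = "&lt;/?a\b[^&lt;&gt;]*&gt;" // "\b" requires the tag name to end right after the "a"
re.ReplacementPattern = ""
re.Options.ReplaceAllMatches = True

// Hypothetical sample: the &lt;abbr&gt; tags should survive, the &lt;a&gt; tags should not
Dim sample As String = "&lt;abbr&gt;IDE&lt;/abbr&gt; &lt;a href=""#""&gt;52522&lt;/a&gt;"
Dim result As String = re.Replace(sample)
// result is now: &lt;abbr&gt;IDE&lt;/abbr&gt; 52522</pre>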
<p>To actually remove the tags I use a Replacement that substitutes an empty string for each matched tag. Switching to the Replace tab in RegExRX, this is what that looks like:</p>
<p><img decoding="async" class="alignnone size-full wp-image-5056" src="https://blog.xojo.com/wp-content/uploads/2018/10/2018-10-12_13-28-45.png" alt="" width="735" height="851" /></p>
<p>To do the same thing for &lt;tbody&gt; I changed the pattern to this:</p>
<pre>&lt;/?tbody[^&lt;&gt;]*&gt;</pre>
<p>Now it was time to use this in a Xojo project. You use the <a href="https://docsupgrade.xojo.com/RegEx">RegEx class</a> to work with regular expressions. This function takes text (loaded from a release notes file), applies each search pattern with an empty replacement pattern to remove the tags, and puts the result on the Clipboard so I can paste it into the wiki:</p>
<pre>Private Function CleanHTML(html As String) as String
  Dim re As New RegEx
  re.SearchPattern = "&lt;/?a[^&lt;&gt;]*&gt;" // Find &lt;a href&gt;&lt;/a&gt; tags for removal
  re.ReplacementPattern = ""
  re.Options.ReplaceAllMatches = True
  Dim plainHTML As String = re.Replace(html)

  // Remove any &lt;tbody&gt; tags as MediaWiki doesn't process them
  re.SearchPattern = "&lt;/?tbody[^&lt;&gt;]*&gt;"
  re.ReplacementPattern = ""
  re.Options.ReplaceAllMatches = True
  plainHTML = re.Replace(plainHTML)

  Dim c As New Clipboard
  c.Text = plainHTML

  Return plainHTML
End Function</pre>
<p>This worked wonderfully, but after trying this out I decided I wanted to keep the links to the Feedback case around so that you can click on a case number to open the case in the Feedback app and read all its history. This means that instead of removing the &lt;a&gt; tags I needed to change them from this:</p>
<pre>&lt;a href="feedback://showreport?report_id=52522"&gt;52522&lt;/a&gt;</pre>
<p>to this:</p>
<pre>[http://feedback.xojo.com/case/52522 52522]</pre>
<p>To do this I now needed to use a subgroup to save the case ID so I could use it as part of the replacement string. To create subgroups you group parts of the RegEx pattern using parentheses. Essentially I wanted to have clear parts of the pattern for the &lt;a&gt; start tag, the value (case ID) and the &lt;/a&gt; closing tag.</p>
<p>I started by simplifying the &lt;a&gt; tag search so that it matches only the opening tag:</p>
<pre>&lt;a[^&lt;&gt;]*&gt;</pre>
<p>I then added a part to match the case ID, which is just a series of 1 or more digits. RegEx has a shorthand that matches only digits, which is &#8220;\d&#8221;, to which we can add the &#8220;+&#8221; to repeat it one or more times as needed. I then wrapped that in parentheses to get a group, resulting in this:</p>
<pre>&lt;a[^&lt;&gt;]*&gt;(\d+)</pre>
<p>Lastly I added the part to match the closing tag &lt;/a&gt; to get the final pattern:</p>
<pre>&lt;a[^&lt;&gt;]*&gt;(\d+)&lt;/a&gt;</pre>
<p>In the end you can see the &lt;a&gt;, &lt;/a&gt; tags and the case ID value are all matched:</p>
<p><img loading="lazy" decoding="async" class="alignnone size-full wp-image-5057" src="https://blog.xojo.com/wp-content/uploads/2018/10/2018-10-12_13-32-42.png" alt="" width="735" height="851" /></p>
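<p>To see what the group captures from code rather than in RegExRX, something like the following should work. It is a sketch that relies on the RegEx.Search method and the RegExMatch class from the Xojo docs, neither of which is used elsewhere in this post; the sample row is shortened from the one shown earlier.</p>
<pre>Dim re As New RegEx
re.SearchPattern = "&lt;a[^&lt;&gt;]*&gt;(\d+)&lt;/a&gt;"

// Shortened version of the release note row shown earlier
Dim row As String = "&lt;td class=""reportid""&gt;&lt;a href=""feedback://showreport?report_id=52522""&gt;52522&lt;/a&gt;&lt;/td&gt;"
Dim match As RegExMatch = re.Search(row)
If match &lt;&gt; Nil Then
  Dim wholeMatch As String = match.SubExpressionString(0) // The entire matched text, from &lt;a to &lt;/a&gt;
  Dim caseID As String = match.SubExpressionString(1) // Just the first group: 52522
End If</pre>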
<p>This changed pattern means I now have a group I can use for the replacement. Switching to the Replace tab in RegExRX you&#8217;ll notice that the entire matched text is removed because the Replace Pattern is blank. Typing &#8220;$1&#8221; (this contains the group with the Case ID value) in the Replace Pattern showed the Case ID in the replaced text:</p>
<p><img loading="lazy" decoding="async" class="alignnone size-full wp-image-5058" src="https://blog.xojo.com/wp-content/uploads/2018/10/2018-10-12_13-33-30.png" alt="" width="735" height="851" /></p>
<p>Then I added the rest of the text I wanted to the Replace Pattern:</p>
<pre>[feedback://showreport?report_id=$1 $1]</pre>
<p>The replaced text now looks like what I wanted:</p>
<p><img loading="lazy" decoding="async" class="alignnone size-full wp-image-5059" src="https://blog.xojo.com/wp-content/uploads/2018/10/2018-10-12_13-34-06.png" alt="" width="735" height="851" /></p>
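<p>Here is the updated Xojo function. Note that its replacement pattern uses the http://feedback.xojo.com/case/ URL from the target format shown earlier rather than the feedback:// scheme:</p>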
<pre>Private Function CleanHTML(html As String) as String
  Dim re As New RegEx
  re.SearchPattern = "&lt;a[^&lt;&gt;]*&gt;(\d+)&lt;/a&gt;" // Find &lt;a href&gt; tags and save case # as a group
  re.ReplacementPattern = "[http://feedback.xojo.com/case/$1 $1]" // Swap in wiki link format with correct URL
  re.Options.ReplaceAllMatches = True
  Dim plainHTML As String = re.Replace(html)

  // Remove any &lt;tbody&gt; tags as MediaWiki doesn't process them
  re.SearchPattern = "&lt;/?tbody[^&lt;&gt;]*&gt;"
  re.ReplacementPattern = ""
  re.Options.ReplaceAllMatches = True
  plainHTML = re.Replace(plainHTML)

  Dim c As New Clipboard
  c.Text = plainHTML

  Return plainHTML
End Function</pre>
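<p>For completeness, here is roughly how the function might be called. This is a sketch only: it assumes you pick ReleaseNotes.htm with a file dialog, that the file is UTF-8 encoded, and that the calling code lives in the same class or module as the private CleanHTML function. None of those details are part of the original project.</p>
<pre>// Ask for the release notes file (hypothetical workflow)
Dim f As FolderItem = GetOpenFolderItem("")
If f &lt;&gt; Nil Then
  Dim stream As TextInputStream = TextInputStream.Open(f)
  stream.Encoding = Encodings.UTF8 // Assumption: the file is UTF-8
  Dim html As String = stream.ReadAll
  stream.Close

  // CleanHTML returns the cleaned text and also copies it to the Clipboard
  Dim wikiText As String = CleanHTML(html)
End If</pre>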
<p>I hope this little adventure in RegEx has helped you appreciate how wonderful regular expressions are for string searching and replacement. I&#8217;m still no expert, but I found this to be much, much better than messy string searching and parsing using InStr and friends.</p>
<p>To learn more about Regular Expressions, check out the <a href="https://regexone.com">RegExOne site</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
