Regular expressions are one of the most important components of Tcl, especially for web-based programs such as CGI scripts and database-processing scripts. Tcl's regular expression commands fall into two categories: find (regexp) and find-and-replace (regsub).

regexp
The basic syntax for regexp is
regexp ?switches? exp string ?matchVar? ?subMatchVar subMatchVar ...?

exp is the regular expression, and string is the string you are searching for a match. regexp returns 1 if it finds a match and 0 if it does not. matchVar will contain the first substring that matches the regular expression. The subMatchVars will contain the text matched by each parenthesized subexpression, in order. See the Tcl regexp manpage for more complete information.
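Here is a minimal sketch of that syntax in use. The sample string and the variable names (line, whole, key, value) are made up for illustration:

```tcl
# Hypothetical input; any "name=value" style string would do.
set line "color=blue"

# regexp returns 1 on a match and 0 otherwise.
# whole receives the entire match; key and value receive the text
# matched by the first and second parenthesized groups.
if {[regexp {([a-z]+)=([a-z]+)} $line whole key value]} {
    puts "whole=$whole key=$key value=$value"
}
```

Putting the expression in braces keeps Tcl from performing its own substitutions on brackets, dollar signs and backslashes before regexp ever sees them.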

regsub
regsub is similar to regexp. Instead of just finding the sub-string, it allows you to replace it with something else. This is useful for commenting things out, or getting rid of HTML tags, or changing text in a document, etc. The syntax for regsub is
regsub ?switches? exp string subSpec varName

exp is the regular expression, and string is the string you are searching for a match. subSpec is what you want to replace the matched text with, and varName is the name of the variable you want the result to go into. For more information, and descriptions of the possible switches, please see the Tcl regsub manpage.
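A small sketch of regsub with and without the -all switch may help. The input text and the variable names (text, first, marked, count) are assumptions chosen for the example:

```tcl
# Hypothetical input text.
set text "The cat sat on the cat mat"

# Without -all, only the first match is replaced. regsub stores the
# new string in the named variable and returns the number of
# substitutions it made.
set count [regsub {cat} $text {dog} first]
puts "count=$count result=$first"

# With -all, every match is replaced. An & in subSpec stands for the
# text that was matched.
set count [regsub -all {cat} $text {[&]} marked]
puts "count=$count result=$marked"
```

Note that the original string in text is left alone unless you name the same variable as varName.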

Notes

Something to note is that Tcl regexp and regsub are greedy by default, meaning they will find the longest substring that matches the regular expression. This can be a bad thing! For example, say you are looking for HTML elements (things which begin with "<" and end with ">"). You might try

	regexp {<.*>} $string

but this will match from the beginning of the first HTML tag to the end of the last one! If you are like me, and start every document with <html>, and end with </html> then it will match the whole document! If you were trying to replace the HTML tags with something else, you are out of luck!
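The sketch below shows the problem and two common ways around it: a negated character class, and the non-greedy quantifier available in Tcl 8.1 and later. The sample string and the variable names (html, greedy, tag, lazy, stripped) are made up for illustration:

```tcl
# Hypothetical sample document.
set html "<html><b>bold</b></html>"

# Greedy: .* runs all the way to the last ">" in the string.
regexp {<.*>} $html greedy
puts $greedy

# Negated character class: the tag body may not contain ">".
regexp {<[^>]*>} $html tag
puts $tag

# Non-greedy quantifier: *? matches as little as possible.
regexp {<.*?>} $html lazy
puts $lazy

# With the negated class, stripping the tags works as intended.
regsub -all {<[^>]*>} $html {} stripped
puts $stripped
```

The negated class is the safer habit of the two, since it works the same way in older regular expression engines as well.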

Sample Regular Expressions
Here are some regular expressions I use pretty often and descriptions of how to use them. Let me know if you find some cases for which these expressions do not work.

Used for: Removing HTML tags
Expression: regsub -all {<([^<])*>} $s {} s
Description: Find all strings which begin with a <, end with a >, and do NOT have a < in the middle, and replace them with the empty string.

Used for: Replacing ' with ''
Expression: regsub -all ' $s '' s
Description: Replace all occurrences of ' with ''. This is referred to as "quoting out" the ', which otherwise interferes with proper insertion into a SQL database.

Used for: Finding IP addresses
Expression: regexp {[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+} $s
Description: Finds strings in the form of number.number.number.number. The expression must be in braces, or Tcl will try to run [0-9] as a command.

Used for: Parse href tags
Expression: regsub -all {<a href=([^<]*)>} $s {\1} s
Description: Find all strings that begin with "<a href=", end with ">", and don't have a "<" in the middle. Replace these with the stuff between "<a href=" and ">". The \1 must be in braces, or Tcl will process the backslash itself before regsub sees it.

Used for: Parse out non-numeric chars
Expression: regsub -all {([^0-9])+} $string {} string
Description: Replace all characters which are not in [0-9] with nothing (effectively removing everything but numbers).
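To see a few of these in action, here is a short sketch. The inputs and the variable names (log, name, phone, ip, quoted, digits) are hypothetical, and the expressions are written in braces so that Tcl does not touch the brackets and backslashes:

```tcl
# Hypothetical inputs for three of the expressions above.
set log   "Connection from 192.168.0.1 refused"
set name  "O'Brien"
set phone "(555) 123-4567"

# Find the first thing that looks like number.number.number.number.
if {[regexp {[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+} $log ip]} {
    puts "ip=$ip"
}

# Double every single quote before building a SQL string.
regsub -all ' $name '' quoted
puts "quoted=$quoted"

# Drop everything that is not a digit.
regsub -all {[^0-9]+} $phone {} digits
puts "digits=$digits"
```

Keep in mind that the IP expression only checks the shape of the string. It will happily accept 999.999.999.999, so check the range of each number separately if that matters to you.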


This document last modified: Friday, November 05, 2004 me@rustybrooks.com