Saturday, July 21, 2012

String Matching and Pattern Matching in Perl

Perl is a powerful tool for both pattern matching and string matching, and regular expressions make both easy. One reason Perl string matching is taught to bioinformatics students is that it is well suited to sequence and pattern matching.

How is pattern matching done?

Regular expressions are used to find any of a set of possible patterns in a string. Approximate string matching is similar, except that a match no longer has to be exact: an approximate string matching algorithm finds the parts of a string that come close to a pattern, allowing for a limited number of differences.
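As a minimal sketch of the exact, regular-expression side of this, the snippet below reports where a small motif occurs in a DNA string. The helper name match_positions, the motif, and the test sequence are all made up for illustration.

```perl
use strict;
use warnings;

# Return the 0-based start offsets of every match of $regex in $text.
# (match_positions is a hypothetical helper name, not a built-in.)
sub match_positions {
    my ($text, $regex) = @_;
    my @positions;
    while ($text =~ /$regex/g) {
        push @positions, $-[0];    # $-[0] holds the start offset of the last match
    }
    return @positions;
}

my $sequence = 'ATGGTACGTAATGGTAAGTG';
# The character class [AC] accepts either base at that position,
# so one regex covers the set of patterns GGTAAG and GGTACG.
my @hits = match_positions($sequence, qr/GGTA[AC]G/);
print "motif found at offsets: @hits\n";
```

A character class is the simplest way to describe a set of patterns; anything that needs a budget of mismatches rather than a fixed list of alternatives calls for the approximate methods.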

There are three approaches to approximate matching in Perl.

A) Edit Distance: The first method uses edit distance for string or pattern matching. The edit distance between two strings is the minimum number of operations required to transform one into the other.

B) Levenshtein distance: This is the second distance method used for Perl string matching, and it is the most widely used form of edit distance. The Levenshtein distance is a metric for measuring the difference between two sequences. It is the minimum number of operations needed to transform one string into the other, where an operation is an insertion, a deletion, or a substitution of a single character.

C) Fuzzy Matching: This is the third method for pattern matching in Perl. It finds strings that approximately match a given pattern string. It is also known as inexact matching and is among the popular methods for matching strings.
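The definition above can be sketched with the standard dynamic-programming approach, keeping only two rows of the distance table. The function name levenshtein is chosen here for illustration; CPAN also offers ready-made modules for this, which are not used in this sketch.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Levenshtein distance between two strings, computed row by row.
sub levenshtein {
    my ($source, $target) = @_;
    my @s = split //, $source;
    my @t = split //, $target;

    # @prev[$j] is the distance from the empty prefix of source to the first $j chars of target.
    my @prev = (0 .. scalar(@t));
    for my $i (1 .. scalar(@s)) {
        my @curr = ($i);    # turning $i source chars into an empty target takes $i deletions
        for my $j (1 .. scalar(@t)) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,            # deletion
                $curr[$j - 1] + 1,        # insertion
                $prev[$j - 1] + $cost,    # substitution, or a free match
            );
        }
        @prev = @curr;
    }
    return $prev[-1];
}

print levenshtein('ATGGTACGTA', 'ATGGTACGAA'), "\n";
```

The two example strings differ by a single substitution, so the call prints 1.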
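One simple way to sketch fuzzy matching is to slide the pattern along the text and keep every window that differs from it in at most a given number of positions. This version only counts substitutions (a Hamming-style comparison), which is a simplifying assumption; handling insertions and deletions would need an edit-distance comparison instead. The names mismatches and fuzzy_find are hypothetical.

```perl
use strict;
use warnings;

# Count the positions at which two equal-length strings differ.
sub mismatches {
    my ($left, $right) = @_;
    die "strings must have equal length\n" unless length($left) == length($right);
    my $xor = $left ^ $right;     # string XOR: byte is \0 wherever the characters agree
    return ($xor =~ tr/\0//c);    # count the bytes that are not \0
}

# Return [offset, window, mismatch count] for each window of $text
# that is within $max_mismatches substitutions of $pattern.
sub fuzzy_find {
    my ($text, $pattern, $max_mismatches) = @_;
    my $width = length $pattern;
    my @hits;
    for my $offset (0 .. length($text) - $width) {
        my $window = substr($text, $offset, $width);
        my $count  = mismatches($window, $pattern);
        push @hits, [$offset, $window, $count] if $count <= $max_mismatches;
    }
    return @hits;
}

for my $hit (fuzzy_find('ATGGTACGTAATGGTAAGTG', 'GTACGT', 1)) {
    printf "offset %d: %s (%d mismatch(es))\n", @$hit;
}
```

Raising the mismatch limit widens the net: with a limit of 0 this reduces to exact substring search.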

Applications of Approximate Perl String Matching in Bioinformatics

a) It can be used to find the genetic variability of an organism or a group of organisms. The following example makes this clearer:

Let's assume that a gene occurs in four different sequence variants:

1) ATGGTACGTA

2) ATGGTACGAA

3) ATGGTACGAT

4) ATGGTAAGTG

In approximate Perl string matching, the sequences are compared in pairs to see which one matches the first sequence most closely. The closest match is selected.

Thus, it plays a major role in finding the genetic variability within an organism or between groups of organisms.
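As a rough illustration using the four sequences listed above, the sketch below treats sequence 1 as the reference, lists the positions at which each other sequence differs from it, and picks the one with the fewest differences. It assumes the sequences are already aligned and of equal length; the function names are made up for this example.

```perl
use strict;
use warnings;

# 1-based positions at which two equal-length sequences differ.
sub variant_positions {
    my ($reference, $variant) = @_;
    die "sequences must have equal length\n" unless length($reference) == length($variant);
    return grep { substr($reference, $_ - 1, 1) ne substr($variant, $_ - 1, 1) }
           1 .. length($reference);
}

# Id of the variant with the fewest differences; ties go to the lowest id.
sub closest_variant {
    my ($reference, %variants) = @_;
    my %count;
    for my $id (keys %variants) {
        my @positions = variant_positions($reference, $variants{$id});
        $count{$id} = scalar @positions;
    }
    my ($closest) = sort { $count{$a} <=> $count{$b} || $a <=> $b } keys %count;
    return $closest;
}

my $reference = 'ATGGTACGTA';    # sequence 1
my %variants  = (2 => 'ATGGTACGAA', 3 => 'ATGGTACGAT', 4 => 'ATGGTAAGTG');

for my $id (sort keys %variants) {
    my @positions = variant_positions($reference, $variants{$id});
    printf "sequence %d differs at position(s): %s\n", $id, join(', ', @positions);
}
print "closest to sequence 1: sequence ", closest_variant($reference, %variants), "\n";
```

Here sequence 2 differs from sequence 1 only at position 9, while sequences 3 and 4 each differ at two positions, so sequence 2 is reported as the closest.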

b) Pattern matching also plays an important role in finding the consensus sequence of a group of organisms.
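A consensus sequence can be sketched as a majority vote at each position of a set of aligned sequences. The version below assumes equal-length, pre-aligned input and breaks ties alphabetically; both are assumptions of this example, not a standard, and the function name consensus is hypothetical.

```perl
use strict;
use warnings;

# Majority-vote consensus of equal-length, pre-aligned sequences.
# Ties are broken alphabetically (an arbitrary choice for this sketch).
sub consensus {
    my @sequences = @_;
    return '' unless @sequences;
    my $length = length $sequences[0];
    for my $sequence (@sequences) {
        die "sequences must have equal length\n" unless length($sequence) == $length;
    }

    my $result = '';
    for my $column (0 .. $length - 1) {
        my %count;
        $count{ substr($_, $column, 1) }++ for @sequences;
        # Most frequent base first; alphabetical order settles ties.
        my ($winner) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
        $result .= $winner;
    }
    return $result;
}

print consensus(qw(ATGGTACGTA ATGGTACGAA ATGGTACGAT ATGGTAAGTG)), "\n";
```

For the four example sequences this prints ATGGTACGAA. Position 9 is a two-against-two tie between A and T, so the result there depends entirely on the tie-breaking rule.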

In these ways, pattern matching helps answer questions about biological sequence data that would otherwise be difficult to work out.

Reet is a bioinformatician who writes about bioinformatics on Gene Byte's Bioinformatics Blog. Visit Gene Byte's Bioinformatics portal for bioinformatics notes, presentations, and video tutorials.

