Any Regex Experts out there?


kwb
12-02-2005, 10:05 AM
Hi,

I need to separate tags that are all run together with no spacing in between, e.g.:

2005-11-28 15:51:22: DEBUG: importFixPumpdata: 2005/11/18 00:06:46:187: FIXPump: Received data on connection {CLIENT} [8=FIX.4.29=024335=D50=DCN3230197=N57=RISKGATEWAY34=93849=CLIENTFUT_EXLINK56=EXLINK_CLIENT43=N52=2005 1118-05:06:46200=200512207=Japan40=255=TSE 01 0F2005Z11=fud630_20051118167=FUT54=159=044=138.0821=238=1560=20051118-05:06:461=ATOP1110=032]

Between the [] there are a number of FIX tags, e.g. '8=FIX.4.2', with no spaces in between, so each FIX tag immediately follows the previous one: 8=FIX.4.29=02433 and so on (which should read 8=FIX.4.2 9=02433). Any ideas on how I can separate them? I have tried a number of regexes and also split.

PS They are all on the one line as well.

Many thanks

KWB

felgall
12-02-2005, 04:37 PM
8=FIX.4.29=02433 could just as well mean 8=F and IX.4.29=02433.

Without some sort of separator between the fields there is no way to tell where one finishes and the next starts. You will need to look at whatever is generating the tags and get it to write a separator value, because by the stage you are looking at it that information is no longer available.

NogDog
12-02-2005, 04:57 PM
As Felgall points out, the best solution would be to get whatever is generating that text to include a field separator of some sort. If that's not practical, then the only way I can see to extract the fields programmatically is if there is always a single digit before the equals sign. If it can be a varying number of digits, then I doubt there's a regexp solution that could be depended upon to be correct.
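
To make that ambiguity concrete, here is a minimal Perl sketch, not a recommended parser. It assumes every tag number is a single digit and splits wherever one digit is directly followed by '='. The sub name split_single_digit_tags is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split a run-together string at every point where
# a single digit is immediately followed by '='. The lookahead is
# zero-width, so no characters are consumed by the split.
sub split_single_digit_tags {
    my ($text) = @_;
    return split /(?=\d=)/, $text;
}

# First few tags from the sample line in the original post.
my @fields = split_single_digit_tags('8=FIX.4.29=024335=D');
print "$_\n" for @fields;
# prints:
# 8=FIX.4.2
# 9=02433
# 5=D
```

The split looks plausible, but it cannot tell 9=02433 followed by 5=D apart from 9=0243 followed by 35=D, which is exactly the multi-digit problem described above.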

Nedals
12-02-2005, 08:27 PM
I'm a little suspicious that you may be misleading yourself. This data looks as though it may be coming from a serial port. How did you print out that line to put in your post? Is it possible that the data is, in fact, coming at you line by line?

If you run this snippet of code...

while (<DATA>) {
    chomp;      # strip the trailing newline from each record
    print $_;   # print the records back to back, with no separator
}

__DATA__
8=FIX.4.2
9=02433
5=D
50=DCN32301
97=N5
7=RISKGATEWAY
34=938
49=CLIENTFUT_EXLINK

you get...
8=FIX.4.29=024335=D50=DCN3230197=N57=RISKGATEWAY34=93849=CLIENTFUT_EXLINK
But it's actually on separate lines and easily split up.
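
One way to check for hidden separators like this is to print the raw line with every non-printable byte made visible. The following is only a sketch: the sub name show_hidden is invented here, and the \x01 bytes in the sample string are an assumption for illustration, not something taken from the real log file.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every character outside printable ASCII
# (0x20 to 0x7E) with its hex code in angle brackets, so that newlines,
# carriage returns and other control characters show up in the output.
sub show_hidden {
    my ($text) = @_;
    (my $visible = $text) =~ s/([^\x20-\x7e])/sprintf('<%02X>', ord $1)/ge;
    return $visible;
}

# Assumed sample: three fields separated by a control character.
print show_hidden("8=FIX.4.2\x019=0243\x0135=D\x01"), "\n";
# prints: 8=FIX.4.2<01>9=0243<01>35=D<01>
```

Running each line of the real file through something like this before printing would show whether a separator is present but invisible on the terminal.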

kwb
12-07-2005, 08:04 AM
I've cracked it! The fields are separated by a non-printing delimiter, the character \001 (SOH, hex 01), and I substituted it with a space:

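while (<FH>) {
    chomp;
    s/\001/ /g;      # replace each \001 (SOH) delimiter in $_ with a space
    print "$_ \n";   # print, not printf: the data must not be used as a format string
}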

Which now gives me the following output:

2005/11/18 00:06:46:187: FIXPump: Received data on connection {CLIENT} [8=FIX.4.2 9=0243 35=D 50=DCN32301 97=N 57=RISKGATEWAY 34=938 49=CLIENTFUT_EXLINK 56=EXLINK_CLIENTFUT 43=N 52=20051118-05:06:46 200=200512 207=Japan 40=2 55=TSE 01 0F2005Z 11=fud630_20051118 167=FUT 54=1 59=0 44=138.08 21=2 38=15 60=20051118- 5:06:46 1=ATOP11 10=032 ]
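
Since some values contain spaces themselves (55=TSE 01 0F2005Z above), an alternative to substituting a space is to split on the delimiter directly and keep each tag and value as a pair. A minimal sketch, assuming the delimiter is \x01 as found above. The sub name parse_fix_fields and the shortened sample message are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split a FIX message body on the \x01 (SOH)
# delimiter and return a list of [tag, value] array refs in message order.
sub parse_fix_fields {
    my ($message) = @_;
    my @pairs;
    for my $field (split /\x01/, $message) {
        next unless $field =~ /=/;                  # skip anything that is not tag=value
        my ($tag, $value) = split /=/, $field, 2;   # limit 2 keeps any '=' inside the value
        push @pairs, [ $tag, $value ];
    }
    return @pairs;
}

# Shortened sample built from fields in the output above; the trailing ''
# puts a delimiter after the last field.
my $sample = join "\x01", '8=FIX.4.2', '9=0243', '35=D', '55=TSE 01 0F2005Z', '10=032', '';

print "$_->[0] => $_->[1]\n" for parse_fix_fields($sample);
```

Returning pairs rather than a hash keeps the fields in their original order, and the value for tag 55 survives with its embedded spaces intact.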