String Manipulation


Dragonkai
10-13-2007, 08:36 PM
I've thought about this for a long time and I can't seem to think up a function that can do this.

Basically

I have a lot of text, and inside the text there are a few lines that start with "#", like this:

#Blajh blah blah balh.
#blah blah blah.
#blah blah blah.

Now I want to get all the sentences that start with "#" out of the big piece of text. I was thinking of exploding the whole piece of text at the "#" and then somehow stripping all the text under it, maybe by looking for the last occurrence of "#", then looking for the next full stop and stripping everything after it. However, I just can't think of a way to do that.

Any help appreciated.

scragar
10-13-2007, 09:55 PM
How are the lines separated? If they're separated in a consistent way, it's easy with a RegEx:

var a = "some\n# text \n that # \nmay\n# have\n #'s";
alert(a);
a = ("\n"+a+"\n"); // make sure it has your tag at both ends.
alert(a.replace(/\n#[^\n]*\n/g, "\n"));
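A note on what the snippet above does: the pattern looks for a newline, a "#", any run of non-newline characters, and a closing newline, and replace() swaps each match for a single "\n". So it removes the "#" lines rather than collecting them. Because each match consumes the newline on both sides, the second of two back-to-back "#" lines is skipped. A minimal sketch that avoids the padding and that caveat, assuming a made-up helper name stripHashLines and the "m" flag so that ^ matches at the start of every line:

```javascript
// Hypothetical helper (not from the thread): remove every line that begins with "#".
// With the "m" flag, ^ matches at the start of each line, so the string does not
// need to be padded with "\n", and consecutive "#" lines are each matched.
function stripHashLines(text) {
  // "#" at line start, the rest of the line, then the line break (or end of text).
  return text.replace(/^#.*(?:\r?\n|$)/gm, "");
}

var sample = "some\n# text \n that # \nmay\n# have\n #'s";
console.log(stripHashLines(sample));
```

Lines that have spaces before the "#" are left alone, the same as in the original pattern.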

Dragonkai
10-14-2007, 04:41 AM
OK, here is an extract from the text:

"

ia}}

===Proper noun===
'''Kiribati'''

# Country in Oceania. Official name: Republic of Kiribati.
# The [[Micronesian]] language spoken in Kiribati, also known as [[Gilbertese]].

====Translations====
{{top}}
*Bosnian: Kiribati {{m}}
*[[Breton]]: Kiribati
*Bulgarian: [[Кирибати]] (Kiribati)
*Chinese: [[基里巴斯]] (Jīlǐbāsī)

"

What I want is the sentences that start with a hash.

Like I want this:

"

# Country in Oceania. Official name: Republic of Kiribati.
# The [[Micronesian]] language spoken in Kiribati, also known as [[Gilbertese]].

"

So I was thinking: maybe explode the string at the first occurrence of the hash, then use the second string, the part that was cut off starting at the #.

Which in that example would be this:
"

# Country in Oceania. Official name: Republic of Kiribati.
# The [[Micronesian]] language spoken in Kiribati, also known as [[Gilbertese]].

====Translations====
{{top}}
*Bosnian: Kiribati {{m}}
*[[Breton]]: Kiribati
*Bulgarian: [[Кирибати]] (Kiribati)
*Chinese: [[基里巴斯]] (Jīlǐbāsī)

"

Then use this to find the last occurrence of # (what function does that?)
and explode it again, making two different strings.

First one:

# Country in Oceania. Official name: Republic of Kiribati.

Second one:
"
# The [[Micronesian]] language spoken in Kiribati, also known as [[Gilbertese]].

====Translations====
{{top}}
*Bosnian: Kiribati {{m}}
*[[Breton]]: Kiribati
*Bulgarian: [[Кирибати]] (Kiribati)
*Chinese: [[基里巴斯]] (Jīlǐbāsī)
"

Then use the second one, find the first occurrence of the full stop, and explode again, making another two strings.

One being:

# The [[Micronesian]] language spoken in Kiribati, also known as [[Gilbertese]].

and the second being:

====Translations====
{{top}}
*Bosnian: Kiribati {{m}}
*[[Breton]]: Kiribati
*Bulgarian: [[Кирибати]] (Kiribati)
*Chinese: [[基里巴斯]] (Jīlǐbāsī)

Then use implode to join up

# The [[Micronesian]] language spoken in Kiribati, also known as [[Gilbertese]].

with

# Country in Oceania. Official name: Republic of Kiribati.

And done.

Anyhow, that was what I thought up. Can anyone think of a more efficient way?

Also I do not understand what your code does...
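
The repeated explode/implode plan above can be collapsed into one pass: split the text into lines, keep the lines whose first character is "#", and join them back together. A rough sketch in JavaScript (the language of the earlier snippet), assuming a made-up helper name extractHashLines. The same idea should carry over to PHP with explode() and implode().

```javascript
// Hypothetical helper (not from the thread): keep only the lines that start with "#".
function extractHashLines(text) {
  return text
    .split(/\r?\n/)                      // one array entry per line
    .filter(function (line) {
      return line.charAt(0) === "#";     // keep lines whose first character is "#"
    })
    .join("\n");                         // glue the kept lines back together
}

var extract = "'''Kiribati'''\n\n" +
  "# Country in Oceania. Official name: Republic of Kiribati.\n" +
  "# The [[Micronesian]] language spoken in Kiribati, also known as [[Gilbertese]].\n\n" +
  "====Translations====\n{{top}}";
console.log(extractHashLines(extract));
```

This does not depend on where the full stops are, so a definition with several sentences on one line stays intact.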

hyperlisk
10-14-2007, 04:47 AM
How are the lines separated? If they're separated in a consistent way, it's easy with a RegEx:

var a = "some\n# text \n that # \nmay\n# have\n #'s";
alert(a);
a = ("\n"+a+"\n"); // make sure it has your tag at both ends.
alert(a.replace(/\n#[^\n]*\n/g, "\n"));

I think he wants PHP, not JavaScript :P


<?php

// $text is all of that text that you want to get the lines from.
preg_match_all('/^#[^\n\r]*/im',$text,$matches);
$matches = $matches[0];
// Now $matches will be an array of those lines.

?>

Dragonkai
10-14-2007, 04:48 AM
Oh, that looks much more understandable lol.

Anyhow... just a newbie question: what does /im mean?

And thanks, I'll try that out.

hyperlisk
10-14-2007, 04:53 AM
'i' tells the regular expression to make the match case-insensitive (not really needed here, but I'm just so used to putting it in :P)

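'm' turns on multi-line mode: ^ and $ then match at the start and end of every line, instead of only at the start and end of the whole text. Without it, ^# would only match a "#" that is the very first character of $text.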
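
To see the effect of 'm' in isolation, here is a small sketch in JavaScript, whose i and m flags behave the same way for this pattern. The sample string and variable names are made up for illustration.

```javascript
var flagSample = "intro\n# first\n# second\nend";

// Without "m", ^ only matches at the very start of the whole string.
// The string starts with "intro", so nothing is found and match() returns null.
var withoutM = flagSample.match(/^#.*/g);

// With "m", ^ matches at the start of every line, so both "#" lines are found.
var withM = flagSample.match(/^#.*/gm);

console.log(withoutM); // null
console.log(withM);    // [ '# first', '# second' ]
```

The 'i' flag makes no difference to this pattern because it contains no letters.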