String Manipulation
Dragonkai
10-13-2007, 08:36 PM
I've thought about this for a long time and I can't seem to think up a function that can do this.
Basically, I have a lot of text, and inside the text there are a few lines that start with "#", like this:
#Blah blah blah blah.
#blah blah blah.
#blah blah blah.
Now I want to get all the sentences that start with "#" out of the big piece of text. I was thinking of exploding the whole piece of text at the "#" and then somehow stripping all the text after it, maybe by looking for the last occurrence of "#", then looking for the next full stop, and stripping everything from there. However, I just can't think of a way to do that.
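One way to do this without regular expressions, sketched under the assumption that every "#" sentence sits on its own line and lines are separated by "\n": explode the text into lines and keep the ones whose first character is "#". The function name `hash_lines` and the sample text are made up for illustration.

```php
<?php
// Hypothetical helper: return the lines of $text whose first character is "#".
// Assumes lines are separated by "\n"; a trailing "\r" left over from
// Windows line endings is trimmed off each line before checking it.
function hash_lines($text)
{
    $result = array();
    foreach (explode("\n", $text) as $line) {
        $line = rtrim($line, "\r");
        // Skip empty lines so $line[0] is always a valid offset.
        if ($line !== '' && $line[0] === '#') {
            $result[] = $line;
        }
    }
    return $result;
}

// Made-up sample text in the shape described above.
$text = "Some intro text.\n#Blah blah blah blah.\nMore text.\n#blah blah blah.\n#blah blah blah.\nThe end.";
print_r(hash_lines($text));
```

`implode("\n", hash_lines($text))` would turn the result back into a single string if that is what's needed.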
Any help appreciated.
scragar
10-13-2007, 09:55 PM
How are the lines separated? If they're separated consistently, it's easy with a regex:
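var a = "some\n# text \n that # \nmay\n# have\n #'s";
alert(a);
// The m flag lets ^ and $ match at every line break, and g returns every match.
// This collects the lines that start with "#" (match() returns null if none do).
var lines = a.match(/^#.*$/gm) || [];
alert(lines.join("\n"));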
Dragonkai
10-14-2007, 04:41 AM
OK, here is an extract from all the text:
"
ia}}
===Proper noun===
'''Kiribati'''
# Country in Oceania. Official name: Republic of Kiribati.
# The [[Micronesian]] language spoken in Kiribati, also known as [[Gilbertese]].
====Translations====
{{top}}
*Bosnian: Kiribati {{m}}
*[[Breton]]: Kiribati
*Bulgarian: [[Кирибати]] (Kiribati)
*Chinese: [[基里巴斯]] (Jīlǐbāsī)
"
What I want is the sentences that start with a hash.
That is, I want this:
"
# Country in Oceania. Official name: Republic of Kiribati.
# The [[Micronesian]] language spoken in Kiribati, also known as [[Gilbertese]].
"
So I was thinking: maybe explode the string at the first occurrence of the hash, then take the second piece, the part cut from the "#", which in that example would be this:
"
# Country in Oceania. Official name: Republic of Kiribati.
# The [[Micronesian]] language spoken in Kiribati, also known as [[Gilbertese]].
====Translations====
{{top}}
*Bosnian: Kiribati {{m}}
*[[Breton]]: Kiribati
*Bulgarian: [[Кирибати]] (Kiribati)
*Chinese: [[基里巴斯]] (Jīlǐbāsī)
"
Then I'd use this to find the last occurrence of "#" (what function does that?) and explode it again, making two different strings.
First one:
# Country in Oceania. Official name: Republic of Kiribati.
Second one:
"
# The [[Micronesian]] language spoken in Kiribati, also known as [[Gilbertese]].
====Translations====
{{top}}
*Bosnian: Kiribati {{m}}
*[[Breton]]: Kiribati
*Bulgarian: [[Кирибати]] (Kiribati)
*Chinese: [[基里巴斯]] (Jīlǐbāsī)
"
Then I'd take the second one, find the first occurrence of a full stop, and explode again, making another two strings.
The first being:
# The [[Micronesian]] language spoken in Kiribati, also known as [[Gilbertese]].
and the second being:
====Translations====
{{top}}
*Bosnian: Kiribati {{m}}
*[[Breton]]: Kiribati
*Bulgarian: [[Кирибати]] (Kiribati)
*Chinese: [[基里巴斯]] (Jīlǐbāsī)
Then I'd use implode to join
# The [[Micronesian]] language spoken in Kiribati, also known as [[Gilbertese]].
with
# Country in Oceania. Official name: Republic of Kiribati.
And done.
Anyhow, that's what I came up with. Can anyone think of a more efficient way?
Also, I don't understand what your code does...
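On the "what function does that?" question: in PHP, `strrpos()` returns the position of the last occurrence of a substring, `strpos()` the first, and `substr()` cuts a piece out of a string. Below is a rough sketch of the four-step plan above using those functions on a shortened copy of the quoted extract; the variable names are made up. It only works on this sample: step 3 relies on the last "#" line containing a single full stop (the "# Country in Oceania. Official name: ..." line shows a line can contain more than one), and step 2 assumes the "#" lines sit next to each other.

```php
<?php
//   strpos($haystack, $needle)        -> offset of the FIRST occurrence (or false)
//   strrpos($haystack, $needle)       -> offset of the LAST occurrence (or false)
//   substr($string, $start, $length)  -> a piece of $string
// $text is a shortened copy of the extract quoted in the post.
$text = "'''Kiribati'''\n"
      . "# Country in Oceania. Official name: Republic of Kiribati.\n"
      . "# The [[Micronesian]] language spoken in Kiribati, also known as [[Gilbertese]].\n"
      . "====Translations====\n"
      . "{{top}}\n";

// Step 1: drop everything before the first "#".
$rest = substr($text, strpos($text, '#'));

// Step 2: split at the last "#" into two strings.
$last   = strrpos($rest, '#');
$before = rtrim(substr($rest, 0, $last), "\n"); // "# Country in Oceania. ..."
$after  = substr($rest, $last);                 // "# The [[Micronesian]] ... {{top}}\n"

// Step 3: cut the second string just after its first full stop.
$dot      = strpos($after, '.');
$lastLine = substr($after, 0, $dot + 1);        // "# The [[Micronesian]] ... [[Gilbertese]]."

// Step 4: join the pieces back together (the implode step).
$hashLines = implode("\n", array($before, $lastLine));
echo $hashLines, "\n";
```

The regular-expression answers in this thread do the same job in a single call, without relying on full stops or on where the "#" lines sit.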
hyperlisk
10-14-2007, 04:47 AM
I think he wants PHP, not JavaScript :P
<?php
// $text is all of that text that you want to get the lines from.
preg_match_all('/^#[^\n\r]*/im',$text,$matches);
$matches = $matches[0];
// Now $matches will be an array of those lines.
?>
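For illustration, here is the same `preg_match_all()` call run on part of the extract quoted earlier; the `$joined` variable is an added, hypothetical step that glues the matched lines back into one string.

```php
<?php
// Part of the extract from the earlier post.
$text = "===Proper noun===\n"
      . "'''Kiribati'''\n"
      . "# Country in Oceania. Official name: Republic of Kiribati.\n"
      . "# The [[Micronesian]] language spoken in Kiribati, also known as [[Gilbertese]].\n"
      . "====Translations====\n"
      . "*Bosnian: Kiribati {{m}}\n";

preg_match_all('/^#[^\n\r]*/im', $text, $matches);
$matches = $matches[0]; // index 0 holds the matches of the whole pattern

print_r($matches);

// Optional: one string with the "#" lines separated by newlines.
$joined = implode("\n", $matches);
echo $joined, "\n";
```

Because the character class excludes "\r", a trailing carriage return from Windows line endings is not included in the matched lines.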
Dragonkai
10-14-2007, 04:48 AM
Oh, that looks much more understandable lol.
Anyhow... just a newbie question: what does /im mean?
And thanks I'll try that out.
hyperlisk
10-14-2007, 04:53 AM
'i' tells the regular expression to make the match case-insensitive (not really needed here, but I'm just so used to putting it :P)
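'm' turns on multiline mode: it makes ^ (and $) match at the start (and end) of every line in the string, not just at the start (and end) of the whole string. Without it, ^# would only match a "#" at the very beginning of $text.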
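A quick way to see what 'm' changes: run the pattern with and without it on text whose first line does not start with "#". `preg_match_all()` returns the number of matches it found; the sample text here is made up.

```php
<?php
// The first line does not start with "#", but the next two do.
$text = "'''Kiribati'''\n# Country in Oceania.\n# The Micronesian language.\n";

$withM    = preg_match_all('/^#[^\n\r]*/m', $text, $m1); // ^ = start of any line
$withoutM = preg_match_all('/^#[^\n\r]*/', $text, $m2);  // ^ = start of the string only

echo "with m: $withM, without m: $withoutM\n";
```

Without 'm', the only place ^ can match is offset 0, where the text starts with a quote character, so nothing is found.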