Thread: Text processing help needed !

    Jul 2017


    The end result shall be a CSV file or a Google spreadsheet with the following column structure:

    “Address, City, Zipcode, Name”

    “City” shall always be the string “Malmö”, for each row.

    Loop through the HTML file, repeating the following actions for each match:

    Find the next occurrence of the string “Malmö<br />”.

    Parse backwards (leftwards) from this string until you find the first “<br>” tag.

    Extract the string located between the found “<br>” tag and “Malmö<br />”, in the example below “211 77 ”.

    Put this string as Zipcode.

    Parse leftwards until you find the first string “<p>”.

    Check if the string between the “<p>” tag and the “<br>” tag contains the string “lgh [4 digit number]”. If so, check if the 4 digit number starts with “10”.

    If the number starts with “10”, remove the “lgh [4 digit number]” from the full string between the “<p>” and the “<br>” tag and place the rest as Address.

    If no “lgh [4 digit number]” is present in the string checked above, put the full string between the “<p>” and the “<br>” tag as Address.

    Parse leftwards from the “<p>” tag, until you find the first string “ år”.

    Copy the number immediately to the left of this “ år” string, prepend it to the string located between the “</i>” and the “</a>” tag to the left of the number, and place the combined result as Name.

    Repeat from the first action and keep looping until the end of the HTML file.

    </i> Juväng</a><span>, 40 år</span></h3><div style="width: 420px"><div class="col adress"><p>Einar Hansens Esplanad 14 lgh 1003<br>211 77 Malmö<br />

    would render this row:

    Einar Hansens Esplanad 14 , Malmö, 211 77 , 40 Juväng
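
    The steps above could be sketched in Python roughly as follows. This is only a minimal sketch under a few assumptions that are not in the post: the function names `row_before`, `parse_rows` and `to_csv` are made up for illustration, surrounding whitespace is trimmed from every field, City is always “Malmö” as in the example row, and an address whose “lgh” number does not start with “10” is kept unchanged, since the post does not say what to do in that case.

```python
import csv
import io
import re

MARKER = "Malmö<br />"
LGH = re.compile(r"lgh (\d{4})")  # "lgh" followed by a 4 digit number


def clean(text):
    """Collapse runs of whitespace and trim both ends (an assumption)."""
    return " ".join(text.split())


def row_before(html, pos):
    """Build one row by searching leftwards from the marker at index pos."""
    br = html.rfind("<br>", 0, pos)          # first "<br>" to the left
    if br == -1:
        return None
    p = html.rfind("<p>", 0, br)             # first "<p>" to the left of it
    if p == -1:
        return None
    ar = html.rfind(" år", 0, p)             # first " år" to the left of "<p>"
    if ar == -1:
        return None
    a_end = html.rfind("</a>", 0, ar)
    if a_end == -1:
        return None
    i_end = html.rfind("</i>", 0, a_end)
    if i_end == -1:
        return None

    zipcode = html[br + len("<br>"):pos]
    address = html[p + len("<p>"):br]
    m = LGH.search(address)
    if m and m.group(1).startswith("10"):
        # Drop the "lgh 10xx" part and keep the rest of the address.
        address = address[:m.start()] + address[m.end():]

    # The digits immediately to the left of " år" (empty if there are none).
    age = re.search(r"(\d+)$", html[max(0, ar - 10):ar])
    name = (age.group(1) if age else "") + html[i_end + len("</i>"):a_end]

    return [clean(address), "Malmö", clean(zipcode), clean(name)]


def parse_rows(html):
    """Collect one row per occurrence of the marker in the document."""
    rows = []
    pos = html.find(MARKER)
    while pos != -1:
        row = row_before(html, pos)
        if row is not None:
            rows.append(row)
        pos = html.find(MARKER, pos + len(MARKER))
    return rows


def to_csv(rows):
    """Render the rows as CSV text with the requested column order."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["Address", "City", "Zipcode", "Name"])
    writer.writerows(rows)
    return out.getvalue()
```

    To use it, read the HTML file into a string (for instance with `open(path, encoding="utf-8").read()`, where `path` is wherever the file lives) and pass it to `parse_rows`, then write the result of `to_csv` to a file. The sketch does not check that the tags it finds to the left belong to the same record, so a malformed entry could pick up data from the previous one.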

    Sep 2017
    Hi, I have no idea how to help you on that, but send a message to the guys at *Links removed by Site Staff so it doesn't look like you're spamming us. Please don't post them again.**
    Jan 2017
    Coimbatore, India
    Hello freemanackerman,
    Which programming language are you using? Where is the file located? Is it an HTML or a txt file?

