REGEX doubt

Discussion in 'Scripting' started by cacats, Oct 18, 2018.

  1. cacats

    cacats Regular Member

    Joined:
    Mar 9, 2016
    Messages:
    285
    Likes Received:
    69
    Gender:
    Male
    Occupation:
    Law student
    Location:
    Spain
    Hey, guys, I would like to extract the coloured information. How would you do that with regex?

    Each coloured sentence must be extracted with a different regex.

    Thanks in advance!!!

    <a id="ember12737" data-control-id="xQ0cyDo0SuewLUR/2owMoA==" data-control-name="search_srp_result" href="/in/servicio-de-empleo-colegio-de-polit%C3%B3logos-y-soci%C3%B3logos-7542b825/" class="search-result__wrapper search-result__result-link ember-view"> <figure class="search-result__image mt4 mb3 mh2"> <div id="ember12738" class="presence-entity presence-entity--size-3 ember-view"><div id="ember12739" aria-label="Servicio de Empleo Colegio de Politólogos y Sociólogos" class=" presence-entity__image EntityPhoto-circle-3 ember-view" style="background-image: url("https://media.licdn.com/dms/image/C...t=IK6KaSo7S2BVJZGyPcQ48BvKq32LD6h8g4sZwqIi9fM");"> <span class="visually-hidden">Servicio de Empleo Colegio de Politólogos y Sociólogos</span> </div> <div id="ember12740" class="presence-indicator presence-entity__indicator presence-entity__indicator--size-3 hidden presence-indicator--size-3 ember-view"> <span class="visually-hidden"> Status is offline </span> </div> </div> </figure> <div class="search-result__info pt3 pb4 pr4"> <h3 id="ember12741" class="actor-name-with-distance single-line-truncate ember-view"><span class="name-and-icon"><span class="name-and-distance"> <span class="name actor-name">Servicio de Empleo Colegio de Politólogos y Sociólogos</span> <span id="ember12742" class="distance-badge separator ember-view"> <span class="visually-hidden">2nd degree connection</span> <span class="dist-value">2nd</span> </span> </span><!----></span> </h3> <p class="search-result__subline-level-1 subline-level-1 t-14 t-black--light t-normal"> Servicio de Empleo y Carrera Profesional en Colegio de Politólogos y Sociólogos </p> <p class="subline-level-2 t-12 t-black--light t-normal search-result__truncate"> <span class="search-result__location-pin svg-icon-wrap"><li-icon aria-hidden="true" type="map-marker-icon" size="small"><svg viewBox="0 0 24 24" width="24px" height="24px" x="0" y="0" preserveAspectRatio="xMinYMin meet" class="artdeco-icon" focusable="false"><path 
d="M8,4a2,2,0,1,0,2,2A2,2,0,0,0,8,4ZM8,7.13A1.13,1.13,0,1,1,9.13,6,1.13,1.13,0,0,1,8,7.13ZM8,1A5,5,0,0,0,3,6a5.37,5.37,0,0,0,.41,2S5.91,13,7.22,15.52A0.86,0.86,0,0,0,8,16H8a0.86,0.86,0,0,0,.78-0.48C10.09,13,12.59,8,12.59,8A5.37,5.37,0,0,0,13,6,5,5,0,0,0,8,1Zm2.88,6.24L8,12.92,5.12,7.24A3.49,3.49,0,0,1,4.88,6a3.13,3.13,0,0,1,6.25,0A3.49,3.49,0,0,1,10.88,7.24Z" class="small-icon" style="fill-opacity: 1"></path></svg></li-icon></span> Madrid Area, Spain </p> <!----><!----> </div> <!----></a>
     
  2. cacats

    cacats Regular Member

    Okay, I made a lot of changes and I no longer need the blue, green and white parts.

    Any ideas on how to extract the red part?

    Thanks in advance ;)
     
  3. HoNeYBiRD

    HoNeYBiRD Jr. VIP

    Joined:
    May 1, 2009
    Messages:
    8,259
    Likes Received:
    9,130
    Gender:
    Male
    Occupation:
    Geographer, Tourism Manager
    Location:
    Ghosted
    Does it need to be regex?
    If not, try MS Excel. Set the quotation mark as the delimiter and split the text. If the structure of the data is the same in each line (I suppose you have more of these), the bit you need should end up in the same column on every line, so you can simply copy it. If you don't need the surplus dashes at the beginning and the end, you can clean up the data in Notepad++ using what? Yes, of course, regex. lol
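
    The split-on-quotes idea also works outside Excel. A rough sketch in JavaScript, assuming a shortened, partly made-up copy of the opening tag; `pickAfter` is a hypothetical helper, not part of any library:

```javascript
// Shortened copy of the opening <a> tag; the href value is made up for illustration.
const line =
  '<a id="ember12737" data-control-name="search_srp_result" ' +
  'href="/in/example-profile/" class="search-result__wrapper">';

// Splitting on the quotation mark gives the "columns" Excel would produce.
const columns = line.split('"');

// Hypothetical helper: return the column that follows the one ending in `label`.
function pickAfter(cols, label) {
  const i = cols.findIndex((c) => c.trimEnd().endsWith(label));
  return i === -1 || i + 1 >= cols.length ? null : cols[i + 1];
}

console.log(pickAfter(columns, 'href=')); // /in/example-profile/
```

    Looking the column up by its label rather than by a fixed position is a design choice for this sketch: it keeps working if an attribute is added or dropped before the href.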
     
  4. TasDePixels

    TasDePixels Junior Member

    Joined:
    Mar 8, 2018
    Messages:
    118
    Likes Received:
    121
    Gender:
    Male
    Occupation:
    Software engineer
    Location:
    Morocco
    You don't actually need regex for that,
    unless you like kinky stuff, but hey, I'm not here to judge : )

    What programming language are you using? I suggest the cheerio library if you're comfortable with JavaScript.

    You can do something like this:
    // Load the page source, then read the href of the element with the given class.
    const cheerio = require('cheerio');
    const $ = cheerio.load(yourHTMLpage);
    const target = $('.elementClass').attr('href');
     
  5. cacats

    cacats Regular Member

    I am using Browser Automation Studio, that's why I needed it done in regex hahaha. But thanks for your input!!
     
  6. Dred Shep

    Dred Shep Newbie

    Joined:
    Oct 22, 2018
    Messages:
    4
    Likes Received:
    0
    Easy with jQuery.

    Code:
    $("#ember12737").attr('href')
    $("#ember12739").attr('aria-label')
    $('#ember12739').css('background-image').replace('url(','').replace(')','').replace(/\"/gi, "")
    $('.search-result__subline-level-1').text().trim()
    
    
    If the ids differ from one result to the next, locate the element some other way (by class, as in the last line above), but jQuery does it easily.

    Edit: Just saw that you're using Browser Automation Studio. Why?
     
  7. Witicagnall4435

    Witicagnall4435 Jr. VIP

    Joined:
    Mar 11, 2015
    Messages:
    132
    Likes Received:
    27
    For the red part use:
    Code:
    (?<=href=").*(?=/"\ class)
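
    One thing to watch with this pattern: `.*` is greedy, so if several results sit on the same line it can run from the first href through to the last one. A minimal sketch in JavaScript, assuming an engine with lookbehind support (recent Node.js or Chrome); the two links are made up for illustration:

```javascript
// Two hypothetical results on one line; both hrefs are invented for this sketch.
const twoResults =
  '<a href="/in/first-profile/" class="result"></a>' +
  '<a href="/in/second-profile/" class="result"></a>';

// Same pattern as above (greedy), and a lazy variant with `.*?`.
const greedy = /(?<=href=").*(?=\/" class)/;
const lazy = /(?<=href=").*?(?=\/" class)/;

const greedyMatch = twoResults.match(greedy)[0];
const lazyMatch = twoResults.match(lazy)[0];

console.log(greedyMatch); // runs on into the second link
console.log(lazyMatch);   // /in/first-profile
```

    On a single result the two behave the same; the lazy form only matters once more than one `/" class` can follow the first href.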
     
  8. nextime

    nextime Newbie

    Joined:
    Apr 27, 2018
    Messages:
    38
    Likes Received:
    16
    Gender:
    Male
    Red part:
    Code:
    (?<=href=\").*?(?=\"\ class)
    Blue part:
    Code:
    (?<=aria-label\=").*?(?=" )
    Green part:
    Code:
    (?<=t-normal">).*?(?=</p>)
    White part:
    Code:
    (?<=icon></span>).*?(?=</p>)
    Works well; tested in Notepad++. Your software may need some changes, because regex engines differ, but I'm not sure.
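
    For anyone trying these outside Notepad++, here is a rough sketch of the four patterns in JavaScript, assuming an engine with lookbehind support (recent Node.js or Chrome). The HTML is a shortened copy of the sample with a made-up href, `extractAll` is a hypothetical helper, and the `trim()` is an addition to drop the padding spaces around the text:

```javascript
// Shortened copy of the posted search result; the href is made up for illustration.
const html =
  '<a id="ember12737" href="/in/example-profile-7542b825/" class="search-result__wrapper ember-view">' +
  '<div id="ember12739" aria-label="Servicio de Empleo Colegio de Politólogos y Sociólogos" class=" presence-entity__image ember-view"></div>' +
  '<p class="search-result__subline-level-1 subline-level-1 t-14 t-black--light t-normal"> Servicio de Empleo y Carrera Profesional en Colegio de Politólogos y Sociólogos </p>' +
  '<p class="subline-level-2 t-12 t-black--light t-normal search-result__truncate">' +
  '<span class="search-result__location-pin svg-icon-wrap"><li-icon></li-icon></span> Madrid Area, Spain </p></a>';

// The four patterns from the post, minus escapes that JavaScript does not need.
const patterns = {
  red: /(?<=href=").*?(?=" class)/,
  blue: /(?<=aria-label=").*?(?=" )/,
  green: /(?<=t-normal">).*?(?=<\/p>)/,
  white: /(?<=icon><\/span>).*?(?=<\/p>)/,
};

// Hypothetical helper: run every pattern once and trim the surrounding spaces.
function extractAll(source) {
  const result = {};
  for (const [name, re] of Object.entries(patterns)) {
    const m = source.match(re);
    result[name] = m ? m[0].trim() : null;
  }
  return result;
}

const extracted = extractAll(html);
console.log(extracted);
```

    Each pattern returns only the first match here; for a page with many results you would need a global flag and a loop, and the details may differ in Browser Automation Studio.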
     
  9. javabro

    javabro Jr. VIP

    Joined:
    Dec 2, 2015
    Messages:
    985
    Likes Received:
    1,205
    Gender:
    Male
    Occupation:
    They pay me to write code.
    Location:
    Pearl of the Indian ocean
    This is a 20-day-old thread. OP must have figured it out already.
     
  10. cacats

    cacats Regular Member

    Yep, I learnt regex from scratch hahaha. But thanks anyway guys!
     
  11. nextime

    nextime Newbie

    Oh, I hadn't noticed that.

    It would have been better to let us know earlier :)