1. This site uses cookies. By continuing to use this site, you are agreeing to our use of cookies. Learn More.

Excel: How to extract cell contents based on keywords (HELP)

Discussion in 'BlackHat Lounge' started by CoyoteAssassin, Feb 9, 2012.

  1. CoyoteAssassin

    CoyoteAssassin Elite Member

    Joined:
    Jan 3, 2010
    Messages:
    1,862
    Likes Received:
    3,906
    Occupation:
    Full Time IMer
    Location:
    USA
    Folks.

    I have an Excel file with over 1M records. There is only one column of data.

    I have a list of keywords in notepad.

    I somehow need to take the keyword in notepad (ie 'seo') and extract every cell in Excel that contains the word 'seo'.

    I realize that I can do conditional formatting but due to the size, this will take a VERY long time and lock up the computer when sorting by the format.

    Any tips or suggestions?
     
  2. bhstudent

    bhstudent Junior Member

    Joined:
    Nov 13, 2011
    Messages:
    130
    Likes Received:
    19
    If you know some sql, load it up into phpmyadmin and do a query for regex seo. Another way will be to filter data in excel easy for ascending/descending, but that might not be so effective depending on how it is. Easiest way I think is the mysql with regex.
     
    • Thanks Thanks x 1
  3. CoyoteAssassin

    CoyoteAssassin Elite Member

    Joined:
    Jan 3, 2010
    Messages:
    1,862
    Likes Received:
    3,906
    Occupation:
    Full Time IMer
    Location:
    USA
    I'm glad that you understand what I am asking for. I've only used SQL and phpMyAdmin when installing scripts but that is it.

    The ideal approach would be to dump all of my keywords into a file, the the script will then build a list of text (from Excel) based on the matches.

    That way, 'seo' will have its on results page, 'tree' will have its own, and 'garden' will have its own.
     
  4. Topher

    Topher Registered Member

    Joined:
    Aug 25, 2010
    Messages:
    74
    Likes Received:
    54
    Location:
    Atlanta
    What bhstudent suggested is correct. Just thought i'd suggest an alternative... You can also create a ms access database and import the spreadsheet. Then create query to filter the data. i.e SELECT youfieldname FROM yourtablename WHERE yourfieldname LIKE '%seo%'
     
    • Thanks Thanks x 1
  5. flc735

    flc735 Regular Member

    Joined:
    Apr 30, 2011
    Messages:
    284
    Likes Received:
    82
    Occupation:
    Writer
    Location:
    Los Angeles, CA
    this would be my solution

    -paste into word
    -use "replace all" to replace every "seo" into "8seo8"
    -replace all spaces with "7"
    -paste back into excel
    -(im using excel 07) highlight column and go to data>text to columns>delimited, and set delimiter as "8"
    (this will use the number 8 to use as column separators)
    -insert a row at the top
    -highlight all 3 columns and click filter
    -filter each column alphabetically to easily find and delete the non seo words and delete them by row

    now you have all your seo keywords with the 7's still in there
    -in column 4, row 2 type "=a2&b2&c2"
    -copy and paste that formula down the column.
    -copy and pastespecial>values to get the actual phrase back to 1 column again
    -copy and paste into word again and use "replace all" to change the 7's back to spaces

    fin
     
  6. bertbaby

    bertbaby Elite Member

    Joined:
    Apr 15, 2009
    Messages:
    2,019
    Likes Received:
    1,496
    Occupation:
    Product marketing
    Location:
    USA
    Home Page:
    You could use the data query command in Excel. Or use a datafilter and use teh Text Contain command with the keyword as the text. Extract and repeat for the next keyword. How are you with Crystal Reports?
     
    • Thanks Thanks x 1
  7. CoyoteAssassin

    CoyoteAssassin Elite Member

    Joined:
    Jan 3, 2010
    Messages:
    1,862
    Likes Received:
    3,906
    Occupation:
    Full Time IMer
    Location:
    USA
    Thanks guys. These are good suggestions and I'll give each one consideration.

    :)