Detect Language Of Cell In Excel or txt file line - 50k to process

Discussion in 'Black Hat SEO' started by Thesiege84, Nov 17, 2013.

  1. Thesiege84

    Thesiege84 Regular Member

    Joined:
    May 5, 2009
    Messages:
    360
    Likes Received:
    97
    Occupation:
    Money Churner
    Location:
    Churning Money
    Hi Guys,

    Sorry if this is in the wrong section, but basically I have a big Excel file of 50k website title tags and I need to somehow pull out the English ones.

    Ideally I need some sort of script/code that can detect the language of a cell (or of a line in a txt file) and output that language, or simply detect whether it's English or not.

    I can easily sort the rows A-Z and delete all the Russian/Chinese ones because they use different characters, but I have no idea how to distinguish French from English on a mass scale!

    I'm quickly running out of ideas after spending hours searching for this.

    Does anyone have any ideas?

    Thanks

    Dave
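
A minimal sketch of the character-based step described above (dropping titles written in a different script, such as Russian or Chinese), assuming Python and only its standard library. The function name `is_mostly_latin` and the 0.9 threshold are illustrative assumptions, not something from the thread. It does not separate French from English, since both use Latin letters.

```python
import unicodedata

def is_mostly_latin(text, threshold=0.9):
    """Return True if at least `threshold` of the letters in `text` are Latin-script."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        # No letters at all (e.g. only digits or punctuation): treat as not Latin.
        return False
    # Unicode names of Latin letters start with "LATIN", including accented ones.
    latin = sum(1 for ch in letters if unicodedata.name(ch, "").startswith("LATIN"))
    return latin / len(letters) >= threshold
```

Titles that fail this check can be discarded before any real language detection, which reduces the number of lines that need a detection library or a paid API call.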
     
  2. skgxhfwq

    skgxhfwq Newbie

    Joined:
    Oct 8, 2013
    Messages:
    3
    Likes Received:
    0
    Export the file as CSV, parse it using PHP, Java, etc. with a language detection library, then import the file back into Excel.
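
The export, detect, re-import workflow suggested here could be sketched roughly as follows in Python. To stay self-contained, this uses a crude stopword-counting heuristic in place of a real language detection library; the stopword lists and the names `guess_language` and `tag_rows` are assumptions made up for illustration. A proper library would be far more accurate on short title tags.

```python
import csv
import io
import re

# Tiny illustrative stopword lists (an assumption for this sketch);
# a real detection library uses much richer statistical models.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "in", "for", "is", "your", "with"},
    "fr": {"le", "la", "les", "et", "de", "des", "du", "pour", "votre", "avec"},
}

def guess_language(text):
    """Return the language code whose stopwords occur most often, or 'unknown'."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {lang: sum(w in stops for w in words) for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    top = scores[best]
    # No stopword hits, or a tie between languages: refuse to guess.
    if top == 0 or list(scores.values()).count(top) > 1:
        return "unknown"
    return best

def tag_rows(reader, writer, detect=guess_language):
    """Copy each CSV row, appending the detected language of the first column."""
    for row in reader:
        if row:
            writer.writerow(row + [detect(row[0])])

# Usage with in-memory data; real files opened with open(..., newline="") work the same way.
src = io.StringIO("The best hotels in London\nLes meilleurs hôtels de Paris\n")
out = io.StringIO()
tag_rows(csv.reader(src), csv.writer(out))
```

Passing the detector in as the `detect` parameter means the heuristic can be swapped for a library or API call without touching the CSV handling.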
     
  3. stugz

    stugz Junior Member

    Joined:
    Apr 14, 2013
    Messages:
    154
    Likes Received:
    33
    You can do this with Perl.

    Spreadsheet::ParseExcel::SaveParser - to read and modify the spreadsheet
    Lingua::Identify - to give you a probability percentage for the language. It supports 33 languages including Russian.
     
  4. Thesiege84

    Thesiege84 Regular Member

    Thanks for the replies, guys. I'm currently trying this in PHP with some free online services.

    The thing is, the free ones only support around 5k queries per day, so I might try some paid ones once I get the code right!

    I'll update this thread with what I did in case anyone has the same issue in the future!
     
  5. skgxhfwq

    skgxhfwq Newbie

    If you want, you can send me the file and I'll parse it for you.
     
  6. skgxhfwq

    skgxhfwq Newbie

    Just put the text to analyze in the first column and save the file as file.csv, then run parse_csv.php and it'll output the results to file-processed.csv.


     
  7. Thesiege84

    Thesiege84 Regular Member

    Thanks anyway, but I completed the project.

    I used PHP with the API of this website: http://detectlanguage.com/

    It worked perfectly for what I wanted. I created a web app where I paste in the lines from Excel, press output, and it outputs ONLY the English ones.

    Then I used the VLOOKUP function in Excel to match them back up to the rest of the columns!

    That DetectLanguage site is REALLY cheap too: 1 million requests for $15 per month :)

    Thanks for everyone's help!
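
For anyone scripting the final matching step instead of using VLOOKUP, a rough Python equivalent is a set lookup on the title column. This is only a sketch under stated assumptions: the function name `keep_matching_rows` is hypothetical, titles are assumed to sit in the first column, and matching is exact apart from surrounding whitespace.

```python
def keep_matching_rows(rows, english_titles, title_col=0):
    """Return the full rows whose title cell appears in `english_titles`."""
    # A set gives constant-time lookups, which matters with ~50k rows.
    wanted = {title.strip() for title in english_titles}
    return [row for row in rows if row[title_col].strip() in wanted]

# Usage: full spreadsheet rows plus the titles the detector reported as English.
rows = [["Cheap flights", "a.com"], ["Vols pas chers", "b.fr"]]
english_rows = keep_matching_rows(rows, ["Cheap flights "])
```

Because the lookup keeps whole rows, the other columns stay attached to each English title without a separate matching pass.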