
Creating a Selenium Web Scraper (Help)

Discussion in 'Black Hat SEO Tools' started by Iamhere123, Jul 28, 2017.

  1. Iamhere123

    Iamhere123 Newbie

    Joined:
    Jul 28, 2017
    Messages:
    13
    Likes Received:
    0
    Gender:
    Male
    Hi :). I am trying to build a simple web scraper that extracts data and then writes it to Excel. Can anyone help me create this? What I've got so far is at the link below.

    I simply want it to go to https://en.wikipedia.org/wiki/Trinity_Seven#Anime, get the episode number and title of each anime episode, and write them to Excel columns 1 and 2.

    Thanks :D


    --www dot dropbox.com/s/dmyym3swhapuno7/Selenium%20Scraper.txt?dl=0--
     
  2. mihai1497

    mihai1497 Jr. VIP Jr. VIP

    Joined:
    Jul 22, 2013
    Messages:
    801
    Likes Received:
    117
    Home Page:
    Try to find the title by class name instead of XPath (the code may be a little different in Java; this is the C# version):
    Code:
    By.ClassName("summary")
    Instead of:
    Code:
    By.xpath("//th[text()='Original air date']//following::td[contains(text(), '20')]")
    Also, you don't have to scrape the ep number, because the titles are in order.
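
    A minimal sketch of that last point in plain Java, assuming the titles have already been collected into a list in page order (in Selenium's Java binding that would typically be driver.findElements(By.className("summary")) followed by getText() on each element). The class name EpisodeRows and the method numberTitles are made up for illustration. The episode number is just the 1-based position in the list, so each title becomes a two-column row:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EpisodeRows {
    // Pairs each title with its 1-based position, which serves as the episode number.
    // Each returned array is one spreadsheet row: [episode number, title].
    static List<String[]> numberTitles(List<String> titles) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < titles.size(); i++) {
            rows.add(new String[] { String.valueOf(i + 1), titles.get(i) });
        }
        return rows;
    }

    public static void main(String[] args) {
        // Placeholder titles; real ones would come from the scraped page.
        List<String> titles = Arrays.asList("Title A", "Title B", "Title C");
        for (String[] row : numberTitles(titles)) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

    Because the loop runs over the whole list, it produces one row per title however many the page has. This only holds if the page lists every episode in order with none skipped.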
     
  3. Iamhere123

    Iamhere123 Newbie

    Joined:
    Jul 28, 2017
    Messages:
    13
    Likes Received:
    0
    Gender:
    Male
    Something I'll consider when getting elements. Atm I'm trying to get the data into an excel spreadsheet.
    -- //www dot dropbox.com/home?preview=the+Writing+to+excel+range.txt

    It only writes the first lines tho :S. I'm looking up how to write out a range of unknown size. I'm sure there will be answers hidden on Stack Overflow or some website somewhere. Undoubtedly a simple fix which I am overcomplicating :D

    Atm I'm trying to find the first free row to see if it can write to that.

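    Code:
    // Row and Cell come from org.apache.poi.ss.usermodel (Apache POI).
    // Returns true when the row is missing or holds only blank cells.
    public static boolean isRowEmpty(Row row) {
        if (row == null) return true;
        for (int c = row.getFirstCellNum(); c < row.getLastCellNum(); c++) {
            Cell cell = row.getCell(c);
            if (cell != null && cell.getCellType() != Cell.CELL_TYPE_BLANK) {
                return false;
            }
        }
        return true;
    }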
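
    One way to get from an empty-row check to the first free row is to scan rows from the top and stop at the first empty one. The sketch below shows that idea in plain Java, with the sheet modeled as a list of string arrays so it runs without any library. This model is an assumption for illustration, not the POI API, and the names FirstFreeRow, firstFreeRow and appendRow are made up. With POI the same loop would presumably run from 0 to sheet.getLastRowNum() and call the empty-row check on sheet.getRow(i).

```java
import java.util.ArrayList;
import java.util.List;

class FirstFreeRow {
    // A row counts as empty when it is missing (null) or every cell is null or blank.
    static boolean isRowEmpty(String[] row) {
        if (row == null) return true;
        for (String cell : row) {
            if (cell != null && !cell.trim().isEmpty()) return false;
        }
        return true;
    }

    // Index of the first empty row, or rows.size() when every existing row holds data.
    static int firstFreeRow(List<String[]> rows) {
        for (int i = 0; i < rows.size(); i++) {
            if (isRowEmpty(rows.get(i))) return i;
        }
        return rows.size();
    }

    // Writes the values at the first free row, growing the sheet when there is none.
    static void appendRow(List<String[]> rows, String[] values) {
        int target = firstFreeRow(rows);
        if (target == rows.size()) {
            rows.add(values);
        } else {
            rows.set(target, values);
        }
    }

    public static void main(String[] args) {
        List<String[]> sheet = new ArrayList<>();
        sheet.add(new String[] { "1", "First" });
        sheet.add(new String[] { "", null });   // blank row left behind
        appendRow(sheet, new String[] { "2", "Second" });
        System.out.println("Next free row index: " + firstFreeRow(sheet));
    }
}
```

    Calling appendRow once per scraped item writes every item instead of only the first lines, since each call looks up the free row again.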
     
    Last edited: Jul 30, 2017
  4. bartosimpsonio

    bartosimpsonio Jr. VIP Jr. VIP Premium Member

    Joined:
    Mar 21, 2013
    Messages:
    12,487
    Likes Received:
    11,187
    Occupation:
    CHEAP
    Location:
    DATASETS
    Home Page:
  5. Iamhere123

    Iamhere123 Newbie

    Joined:
    Jul 28, 2017
    Messages:
    13
    Likes Received:
    0
    Gender:
    Male
    Oh I plan on scraping lots of pages, but this is just me getting used to Selenium WebDriver and Apache actions :). I'll have to modify the above so I can get the first free row or an unknown range I guess.