Cool link lock feature.
Is link lock limited to 3 keywords, or can I just put it in the form without loading a file, like this:
?Code:http://www.website.com {kw1|kw2|kw3|kw4|kw5|...|kw20}
It seems that SJ is now integrated with Captcha Sniper.
But how do you set it up? SJ has a checkbox that activates when it detects CS.
The question is, what settings do we choose in CS (beta):
There are 3 possible settings:
- use host file redirect
- use clean image algorithms
- use CSSE
Stop making these advertising threads. The one you made last time got deleted; didn't you get the message?
It's a common misconception that you need private proxies for scraping; you just have to do it right. Keep your connections at 20% or less of your proxy count and you can pretty much scrape forever. It's faster than checking public proxies and then dealing with all the failure rates and low speeds.
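The 20% rule can be expressed as a simple connection cap. A minimal Python sketch, assuming a plain proxy list; the addresses and the `scrape` stub are hypothetical, not anything SJ exposes:

```python
import threading

def max_connections(proxies, ratio=0.2):
    """Cap concurrent connections at 20% of the proxy pool (at least 1)."""
    return max(1, int(len(proxies) * ratio))

proxies = [f"10.0.0.{i}:8080" for i in range(50)]  # hypothetical public proxy pool
limit = max_connections(proxies)                   # 50 proxies -> cap of 10 connections
gate = threading.Semaphore(limit)

def scrape(url):
    with gate:  # at most `limit` scrape threads connect at once
        pass    # issue the request through a rotated proxy here
```

The semaphore just enforces the ratio; the scraping itself would rotate through the pool as usual.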
In tests using CS 1.5 Beta, we did not enable any of the 3 you mentioned.
Jet communicates with CS directly, without the host file redirect.
Also note that there have been new platforms added (downloadable).
It seems the Captcha Sniper integration does not work for me, as no captchas are being solved. But that's when I resume projects. Do I have to start a project from the beginning for Captcha Sniper to work?
Make sure you have the corresponding platforms checked in CS, like WP and Lifetype (there's a typo in CS, I think it says "lifetime" there).
Does ScrapeJet recognize CS and enable the checkbox on the config screen?
CS integration might not work with some platforms yet, because the definition files have not been updated to include the captcha identifier.
The WordPress def currently includes sicaptcha and recaptcha; other captcha types will be added over time (as soon as they are encountered).
The same is valid for BlogEngine and the other platforms: there might be no captcha identifier in the def file yet, but they will be added over time (captchas that CS supports and can solve).
The important thing was to integrate the CS communication first; adding captchas is then just a matter of tuning the def files.
In some tests yesterday, using LifeType platform, we could double the success rate due to CS integration.
Yes, SJ recognized CS and the checkbox is checked.
And the correct checkboxes for blog and pixelpost are checked. Maybe it's the def files not having a captcha identifier.
I assume it is the def file. You can check it by looking for "imagecaptchaidentifier" in the def file.
For a simple test, try just the lifetype platform. Having CS and Jet side by side, you should see how the captchas get submitted.
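That identifier check can also be scripted across all def files at once. A minimal Python sketch, assuming the defs are plain-text files in a local folder (the folder layout and the `.def` extension are assumptions, not documented SJ behavior):

```python
from pathlib import Path

def defs_with_captcha_id(folder, marker="imagecaptchaidentifier"):
    """Return the def files in `folder` that contain the captcha identifier."""
    hits = []
    for path in Path(folder).glob("*.def"):
        # case-insensitive search, since the key's casing may vary per file
        if marker in path.read_text(errors="ignore").lower():
            hits.append(path.name)
    return hits
```

Any platform whose def file is missing from the result would silently skip CS, which matches the behavior described above.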
I have it only on wordpress which is not blogengine or movable. Do I have to re-download the updated .def files from somewhere?
But even WordPress hasn't solved any captchas after running SJ overnight.
Maybe it doesn't work when resuming project? Maybe I need to start the project from beginning?
No, it does not matter whether you resume or restart.
Whenever there's a new def file, it will appear in the list that pops up when you click "Install definitions" on the config page.
Just do the following test:
Download the lifetype definition file
Enable only the lifetype def file on the config screen, uncheck all others
Make sure CS is running and detected by Scrapejet
Create a project, use a simple keyword like "test"
Add the project and run it. Having CS and Jet side by side, you should see lots of captchas sent to CS.
We will add a "Captcha Test" function soon, which allows sending some test captchas to CS to verify the integration is working.
I know the ScrapeJet developers already have so many things on their to-do list, and I just want to make their life a little harder.
So... Feature request:
It would be great to have an "Assign a folder with lists to post" option. Instead of using a 500,000-URL list, we could split it into 1,000-URL lists and ScrapeJet would post/verify links for each 1,000 list.
Advantages: fewer crashes with small lists, quicker results in the "Backlinks" folder.
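Until something like that exists, the splitting can be done outside SJ. A minimal Python sketch, assuming the list is one URL per line; the file names and chunk size are just examples:

```python
from pathlib import Path

def split_list(src, out_dir, chunk_size=1000):
    """Split a big URL list into numbered files of `chunk_size` lines each."""
    lines = Path(src).read_text().splitlines()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i in range(0, len(lines), chunk_size):
        part = out / f"list_{i // chunk_size + 1:04d}.txt"
        part.write_text("\n".join(lines[i:i + chunk_size]) + "\n")
```

Each resulting file could then be loaded as its own small project until a folder option is built in.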
Thank you!
Edit:
Ahh.. I forgot another one: add a little tiny option to use the "bad words" list only to filter URLs, not page content. I try to avoid links from adult sites to my child-related sites, but I don't really care about neighboring links.
Now if I experience hiccups I'll know why!
You are great!
For bad words I'm a little confused too. I can't remember where I read that this option skips pages whose content contains "bad words". If it just filters URLs containing "bad words", that's exactly what I need.