HttpWebRequest (C#): capturing a captcha

knockoutlocal

I've seen some of the other posts on here about grabbing the captcha, but they were either in VB, which I found a bit hard to follow, or they grabbed the image directly from its URL.

Example:
With outlook.com, when I sign up there is a captcha at the bottom. If I pull the image URL and try to access it directly, it throws a server error. I know that when I use Fiddler the image is sent (I can see it in the response headers).

I'm not looking for a full solution with all the bells and whistles, just a technique to pull that one image. When I use HttpWebRequest it just pulls down the page, and I suspect that may be because of the Content-Type I'm specifying?

Been reading around and googling up and down trying to figure this out. Any help would be appreciated.

Again, I just want to figure out how to grab the image that's sent back from the server (the one I see in Fiddler). I know how to do the rest of it, i.e. sending it to DBC and submitting the POST request.
Thanks
 
It's a bit tricky. I don't have any code to show you, but here is what needs to be done:
1) Navigate to https://signup.live.com/signup.aspx?lic=1
2)
Code:
string hipUrl = Regex.Unescape(Regex.Match(html, "\"HipUrl\":\"(.+?)\"").Groups[1].Value);
3) Navigate to hipUrl
4)
Code:
string captchaUrl = Regex.Match(html, "\"imageurl\":\"(.+?)\"").Groups[1].Value;

I didn't test it, but it should work. Good luck.
 
Thank you, but when you go to the URL, nothing shows up. I would think the captcha needs to be pulled from the response headers, no?
 
Do you mean when you go to the URL in your browser? That's because the image is single-use: as soon as you request the web page, your browser automatically fetches the image, and then it gets deleted from the server. That's why you can't access it after you've requested the page.

Now, I don't know much about C# (I'm mainly a Java guy), but I assume that HttpWebRequest only fetches the HTML source of that page and nothing else (i.e. no images or other resources). This means that once you have the HTML, you can still access the captcha image from their server, because it hasn't been served to you yet. Using theMagicNumber's code, you can get the URL of that image and use HttpWebRequest to fetch it.
 
A few things to check:

1) Are your cookies set?
2) Are you setting the Referer header?

Is the call you are making, when viewed in Fiddler, EXACTLY the same as the browser's? If so, it will work.

If it isn't working, the answer is in your headers. You may think it isn't, but it is. All the time, without fail.
 