Search Engines :: Winsock
Hi.
I will begin work on a feature in a program to do a search on search engines such as Google and Yahoo. For example, the user will type in whatever phrase he/she wants such as "race & cars," and the program will do a search via Google and print the responses.
I am not sure what message I need to send and where to send it if I want to search from, for example, Google. Do I send this:
http://www.google.com/search?hl=en&lr=&ie=UTF8&oe=UTF8&q=race+%26+cars
I pasted the line above directly from the browser's address bar after doing a search at Google.
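For what it's worth, the `race+%26+cars` part is the phrase "race & cars" URL-encoded: spaces become `+` and `&` becomes `%26`. A minimal Python sketch of producing that query string, assuming the parameter names copied from the URL above (what each one means is not documented in this thread, and `build_search_url` is a hypothetical helper name):

```python
from urllib.parse import urlencode

def build_search_url(phrase):
    """Return a search URL with the phrase safely URL-encoded."""
    # Parameter names and values are copied from the pasted URL;
    # only "q" (the search phrase) changes between searches.
    params = {"hl": "en", "lr": "", "ie": "UTF8", "oe": "UTF8", "q": phrase}
    # urlencode turns spaces into "+" and "&" into "%26".
    return "http://www.google.com/search?" + urlencode(params)

if __name__ == "__main__":
    print(build_search_url("race & cars"))
```

Encoding the phrase this way keeps a literal `&` in the user's input from being read as a parameter separator.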
Secondly, what data will the search engine return? For example, the search above came back with more than ten pages at Google. Will Google send back all pages or one page at a time?
Please share any experience you have dealing with search engines and/or communicating with websites in general.
Thanks,
Kuphryn
First of all, the protocol for getting that search page is: connect to the server (www.google.com) on port 80, then send the following over the connection:
GET /search?hl=en&lr=&ie=UTF8&oe=UTF8&q=race+%26+cars HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel
Accept-Language: en-gb
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)
Host: www.google.com:80
Connection: Keep-Alive
Please note: every line ends with a carriage return and line feed (CRLF), and the message ends with two of them in a row, i.e. a blank line after the last header.
The server will then send you the resulting search page, but it doesn't provide just the info you want. It returns the HTML for that page. I.e., go to http://www.google.com/search?hl=en&lr=&ie=UTF8&oe=UTF8&q=race+%26+cars, right-click the page, and choose View Source; that is what you will receive.
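A minimal Python sketch of the steps above (connect on port 80, send the GET, read the reply), under some assumptions of mine rather than anything stated in this post: it sends `Connection: close` instead of Keep-Alive so the end of the reply can be detected by the server closing the socket, it leaves out `Accept-Encoding` so the body is not compressed, and the `User-Agent` value and the helper names `build_get_request` and `fetch` are made up for illustration. Whether a given server still answers plain HTTP on port 80 like this is not something the thread establishes.

```python
import socket

def build_get_request(host, path_and_query, user_agent="example-client/0.1"):
    """Build the raw bytes of an HTTP/1.1 GET request."""
    lines = [
        f"GET {path_and_query} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {user_agent}",  # placeholder value, not from the thread
        "Accept: text/html",
        "Connection: close",          # assumption: simpler than Keep-Alive
    ]
    # Each line ends with CRLF; the extra CRLF is the blank line that
    # tells the server the headers are finished.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

def fetch(host, path_and_query, port=80, timeout=10.0):
    """Send the request and return everything the server sends back."""
    request = build_get_request(host, path_and_query)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

The bytes returned by `fetch` start with a status line and response headers, then a blank line, then the HTML. With HTTP/1.1 the body may also arrive chunk-encoded, which this sketch does not decode.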
good luck
~ wrathgame
~ Tim
I.e., look up the HTTP protocol.
EDIT:
Does that make sense? The HTT Protocol protocol?
Rolls off the tongue better than 'the HTTP'.
[edited by - JuNC on June 17, 2002 2:00:01 PM]
Nice! Thanks.
I have questions.
-----
GET /search?hl=en&lr=&ie=UTF8&oe=UTF8&q=race+%26+cars HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel
Accept-Language: en-gb
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)
Host: www.google.com:80
Msg=Connection: Keep-Alive
-----
Should the message above be sent all at once, or one line at a time? I assume all at once, but I want to make certain. I take it the blank lines at the end are the "2 carriage returns."
Also, will the search engine send all pages back or one page at a time? If it sends one page at a time, then you will have to send another message for the next page.
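On the first question: TCP delivers a stream of bytes, so the receiver sees the same data whether the request goes out in one send or one line per send, as long as every line ends in CRLF and a blank line follows the last header. A small Python sketch that checks this locally with a connected socket pair (no real server involved; the shortened header list and the function names are just for illustration):

```python
import socket

REQUEST_LINES = [
    "GET /search?q=race+%26+cars HTTP/1.1",
    "Host: www.google.com",
    "Connection: close",
]

def send_whole(sock, lines):
    """Send the entire request with a single sendall call."""
    sock.sendall(("\r\n".join(lines) + "\r\n\r\n").encode("ascii"))

def send_line_by_line(sock, lines):
    """Send one header line per call, then the terminating blank line."""
    for line in lines:
        sock.sendall((line + "\r\n").encode("ascii"))
    sock.sendall(b"\r\n")

def received_bytes(send_fn, lines):
    """Return what the other end of a local socket pair receives."""
    sender, receiver = socket.socketpair()
    try:
        send_fn(sender, lines)
        sender.shutdown(socket.SHUT_WR)  # signal end of stream
        chunks = []
        while True:
            data = receiver.recv(4096)
            if not data:
                break
            chunks.append(data)
        return b"".join(chunks)
    finally:
        sender.close()
        receiver.close()

if __name__ == "__main__":
    same = received_bytes(send_whole, REQUEST_LINES) == received_bytes(
        send_line_by_line, REQUEST_LINES
    )
    print("identical byte streams:", same)
```

Building the whole request first and sending it once is usually the simpler choice, since it is one call to check for errors.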
Kuphryn
If you look at a Google query in IE, you'll notice that the URLs for the different pages have this form:
(page 2)
http://www.google.com/search?q=something&hl=en&lr=&ie=UTF8&oe=UTF8&start=10&sa=N
(page 3)
http://www.google.com/search?q=something&hl=en&lr=&ie=UTF8&oe=UTF8&start=20&sa=N
etc. So to retrieve the various sets of results, it looks like you just need to requery the system with a different start value.
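A short Python sketch of that requerying, assuming (from the two URLs above only) that `start` advances by 10 per page and that page 1 corresponds to `start=0`. The helper name `page_url` is hypothetical, and the other parameters are copied verbatim without knowing what they do:

```python
from urllib.parse import urlencode

RESULTS_PER_PAGE = 10  # inferred from start=10 (page 2) and start=20 (page 3)

def page_url(phrase, page):
    """Return the search URL for a 1-based result page number."""
    params = {
        "q": phrase,
        "hl": "en",
        "lr": "",
        "ie": "UTF8",
        "oe": "UTF8",
        "start": (page - 1) * RESULTS_PER_PAGE,
        "sa": "N",
    }
    return "http://www.google.com/search?" + urlencode(params)

if __name__ == "__main__":
    for page in (2, 3):
        print(page_url("something", page))
```

Each URL produced this way would be fetched with its own GET request.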
To answer the question about the GET packet, you really need to find some references to basic HTTP implementations (random searches revealed this and this, I'm sure there are better resources). Maybe someone will have specific answers but understanding at least the basics of HTTP will be vital.
I found out Google offers the Google API, which will help with the search and make the programming process much, much easier and quicker. The only drawback is that under the license agreement, you can only make 1,000 queries. I think I will end up having to use the Google API because of the simplicity.
I really want full control over the search in my own code. In other words, I would like to do any searches by programming against the Win32 API instead of going through Google's API.
Kuphryn
[edited by - kuphryn on June 18, 2002 2:42:53 PM]
I'm also pretty sure you can't use the Google API in a commercial application. So as long as you give away whatever you make for free, you're set :)
-me
Success! Thanks, wrathgame, for the nice HTTP GET code and the other HTTP handshake details.
The program I am working on can now do a search on Google and filter out sites. The only problem I face now is determining when to stop sending GETs for additional pages. Google has a limit on the number of results in a search. For example, if you do a search for "car," I believe Google returns about 800 or so results. Google sends ten hits per GET command, so you send the GET command 80 times. In code, I am not sure how to determine the last page Google will send back.
Please post if there is some HTTP code that Google sends back to indicate that it is the last page for a particular search key.
Thanks,
Kuphryn
June 21, 2002 05:36 PM
From google.com:
Searched the web for dog. Results 811 - 820 of about 14,100,000
So pull that 14,100,000 number off the page, divide by 10, and you're good to go.
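A rough Python sketch of pulling those numbers out, assuming the page text has already had its HTML tags stripped and that the wording matches the line quoted above (the real markup may differ; the regular expression and function names are illustrative only):

```python
import re

# Matches e.g. "Results 811 - 820 of about 14,100,000".
RESULT_LINE = re.compile(
    r"Results\s+([\d,]+)\s*-\s*([\d,]+)\s+of\s+(?:about\s+)?([\d,]+)"
)

def parse_result_line(text):
    """Return (first, last, total) as ints, or None if no match is found."""
    match = RESULT_LINE.search(text)
    if match is None:
        return None
    first, last, total = (int(g.replace(",", "")) for g in match.groups())
    return first, last, total

def estimated_pages(total, per_page=10):
    """Number of pages needed for `total` results, rounding up."""
    return (total + per_page - 1) // per_page

if __name__ == "__main__":
    line = "Searched the web for dog. Results 811 - 820 of about 14,100,000"
    first, last, total = parse_result_line(line)
    print(first, last, total, estimated_pages(total))
```

Since the total is only "about" right and, as noted earlier in the thread, Google seems to stop well short of it, it is probably safer to also stop once a page's last result number no longer advances past the previous page's.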