web crawler options

Discussions about design and architecture principles, including native JADE systems and JADE interoperating with other technologies
ConvertFromOldNGs
Posts: 5321
Joined: Wed Aug 05, 2009 5:19 pm

Postby ConvertFromOldNGs » Fri Aug 07, 2009 11:31 am

by bitsnz >> Thu, 12 Aug 2004 22:14:39 GMT

Hi there.

I posted this in another newsgroup but it might be more appropriate to post it here.

I wrote a web crawler in Java about 3 years ago, and I now have to write one in JADE. So I have something of a template to start with, but as those who know Java will appreciate, the differences from JADE are numerous.

So, my question is this: what options do I have for starting this web crawler? That is, is there a specific schema I can play with to create it, or do I start from scratch?

Also, what pitfalls should I be aware of?

I need some guidance on which connectivity classes to use, etc.

Cheers for any help, people.
From what I've read in other posts, we truly have a great system in place with this newsgroup.

Re: web crawler options

by rob >> Thu, 12 Aug 2004 22:34:01 GMT

I have used the CardSchema CnHttp class as a basis for a simple web crawler. The methods on this class write GET results to a file, but you can reimplement these or call the external functions directly. Those functions are in karma.dll; I understand they are essentially wrappers for wininet.dll calls. Once you can 'call' a URL using these functions, you should be able to use your template fairly easily.
Rob
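
For anyone following along, here is a rough sketch of the point above: once something can turn a URL into page text, the crawl loop itself is independent of how the page was fetched. This is Python, not JADE, and nothing here comes from CardSchema or karma.dll. The `crawl` function, the `fetch` parameter, the naive `HREF_RE` pattern and the `FAKE_SITE` dictionary are all made up for illustration; in a real port `fetch` would be whatever wraps the CnHttp GET call.

```python
import re
from collections import deque
from typing import Callable, List, Set
from urllib.parse import urldefrag, urljoin

# Deliberately naive link pattern; good enough for a sketch, not for production.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 50) -> List[str]:
    """Breadth-first crawl. `fetch` is any callable that maps a URL to page text."""
    queue = deque([start_url])
    seen: Set[str] = {start_url}
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        visited.append(url)
        for href in HREF_RE.findall(html):
            # Resolve relative links and drop any #fragment.
            link, _fragment = urldefrag(urljoin(url, href))
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical stand-in for the real HTTP call, so the sketch runs offline.
FAKE_SITE = {
    "http://example.test/": '<a href="/a">A</a> <a href="b.html#top">B</a>',
    "http://example.test/a": '<a href="/">home</a>',
    "http://example.test/b.html": "no links here",
}

if __name__ == "__main__":
    print(crawl("http://example.test/", FAKE_SITE.__getitem__))
```

Keeping the fetch step behind a single callable means the same loop can be tested against canned pages first and pointed at the real HTTP wrapper later.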

Re: web crawler options

by bitsnz >> Mon, 16 Aug 2004 0:56:03 GMT

Cheers Rob.

I'll give it a go and see what develops. Using DLL files is fairly new to me, but I'll see what I can work through.

Re: web crawler options

by Steve >> Tue, 17 Aug 2004 1:31:47 GMT

Isn't it great how Jade normally hides all this complexity from you? :)

Re: web crawler options

by rob >> Thu, 19 Aug 2004 3:42:41 GMT

I dug out an old schema that used the CnHttp class to GET and POST internet data, and loaded this into a Jade 6.0.18 environment with CardSchema 3. The calls still work for internal URLs, but not for external ones; I suspect the calls don't handle proxies. If this is an issue, then I suspect you may be stuck unless you rewrite the wininet wrapper calls.
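
To illustrate the suspected proxy issue: an HTTP client that is never told about a proxy can typically reach internal hosts but not external ones on a proxied network. The sketch below uses Python's standard `urllib.request`, not JADE or wininet, and only shows the general idea of routing requests through an explicit proxy. The helper names `make_opener` and `fetch` and the proxy address in the usage example are assumptions for illustration; whether proxies really are the cause here is only a suspicion in the post above.

```python
import urllib.request
from typing import Optional


def make_opener(proxy_url: Optional[str] = None) -> urllib.request.OpenerDirector:
    """Build an opener that uses `proxy_url` for HTTP(S), or no proxy at all."""
    if proxy_url:
        handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    else:
        # An empty mapping disables proxy auto-detection entirely.
        handler = urllib.request.ProxyHandler({})
    return urllib.request.build_opener(handler)


def fetch(url: str, opener: urllib.request.OpenerDirector, timeout: float = 10.0) -> str:
    """GET `url` through the given opener and return the decoded body."""
    with opener.open(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Hypothetical proxy address; substitute whatever your network uses.
    opener = make_opener("http://proxy.example.test:8080")
    print([type(h).__name__ for h in opener.handlers])
```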

I've attached the schema anyway in case it's of any use: there's a JadeScript that fetches a web page. There is also an additional method (on CnHttp) that allows a POST, which can be used to send theoretically unlimited amounts of data from a client to an HTTP server.
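
As a rough picture of what a POST involves compared with a GET: the data travels in the request body rather than in the URL, which is why the amount is not bounded by URL length. This is a Python sketch with the standard library, not the CnHttp method from the attached schema. The helper name `make_post_request`, the URL and the form fields are invented for illustration, and the request is only built, never sent.

```python
import urllib.request
from urllib.parse import urlencode


def make_post_request(url: str, fields: dict) -> urllib.request.Request:
    """Build (but do not send) a form-encoded POST request."""
    body = urlencode(fields).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,  # supplying a body is what makes urllib issue a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    request = make_post_request("http://example.test/submit", {"q": "web crawler", "page": 1})
    print(request.get_method())  # POST
    print(request.data)          # b'q=web+crawler&page=1'
```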

btw, my web crawler was written for HTML3, before JS became de rigueur, so it doesn't find most of the links on *modern* web pages.
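
A small sketch of why that happens: a crawler that only reads the static markup sees anchors written directly in the HTML, but not links that a script builds at run time. This is Python using the standard `html.parser` module, not the original crawler; `LinkCollector`, `extract_links` and the sample `PAGE` are invented for illustration.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags present in the static markup."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> List[str]:
    """Return the href of every <a> tag found in `html`, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# One link is plain markup; the other only exists after the script runs.
PAGE = """
<a href="/static.html">plain link</a>
<script>
  var a = document.createElement("a");
  a.href = "/scripted" + ".html";
  document.body.appendChild(a);
</script>
"""

if __name__ == "__main__":
    print(extract_links(PAGE))  # ['/static.html']
```

Finding the scripted link would need something that actually executes the page's JavaScript, which is a much bigger job than parsing markup.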

Cheers, Rob
Attachments
3615_1.zip (4.15 KiB)

Re: web crawler options

by bitsnz >> Thu, 26 Aug 2004 1:29:17 GMT

Many thank-yous and military salutes to you, Rob.
Any help is better than no help.

Re: web crawler options

by bitsnz >> Tue, 17 Aug 2004 4:29:31 GMT

Mmmm... my progress so far is not good. I've created what I need from the CardSchema classes and methods, but I can't seem to get it all working together.

Any suggestions? Any examples?
