Directory local/webscraper/

Directory Created: 2009-11-28 22:57
Total Files: 5
Deleted Files: 1
Lines of Code: 651

[root]/local/webscraper
    rss (12 files, 498 lines)
    sites (12 files, 1487 lines)

Lines of Code

[Chart: local/webscraper/ Lines of Code]

Developers

Author   Changes       Lines of Code   Lines per Change
helder   46 (100.0%)   783 (100.0%)    17.0

Most Recent Commits

helder 2010-05-07 16:26 Rev.: 742

Fixed Correio da Manhã

1 line of code changed in 1 file:

  • local/webscraper: scraper.py (+1 -1)
helder 2010-03-24 16:37 Rev.: 641

Wait on empty article

3 lines of code changed in 1 file:

  • local/webscraper: scrapsite.py (+3 -1)
helder 2010-03-24 16:30 Rev.: 640

When scraping do not wait if we already have the article.

6 lines of code changed in 2 files:

  • local/webscraper: scraper.py (+3 -3), scrapsite.py (+3 -3)
helder 2010-03-23 16:24 Rev.: 636

Economico RSS feed parser

2 lines of code changed in 1 file:

  • local/webscraper: rsslib.py (+2 -1)
helder 2010-03-10 12:33 Rev.: 612

Remove a stupid fucking debug print

0 lines of code changed in 1 file:

  • local/webscraper: scraper.py (-1)
helder 2010-03-10 12:32 Rev.: 611

Fix a crash on the publico feed; feeds without terms are now accepted;
Test for an empty article even when it comes from an RSS feed;
Fix a JN when an article is empty.

16 lines of code changed in 2 files:

  • local/webscraper: rsslib.py (+5 -2), scraper.py (+11 -2)
helder 2010-02-25 19:48 Rev.: 584

Don't log repeated articles or waiting periods to file.

1 line of code changed in 1 file:

  • local/webscraper: scraper.py (+1 -1)
helder 2010-02-25 02:28 Rev.: 581

Fixed rss source CM. Started escaping html entities.

11 lines of code changed in 1 file:

  • local/webscraper: scraper.py (+11 -5)
helder 2010-01-30 23:31 Rev.: 535

- New source 'Oje';
- How-to create a scraper;
- RSS reader for 'JN';
- Small Fixes.

9 lines of code changed in 2 files:

  • local/webscraper: rsslib.py (+3 -4), scraper.py (+6 -5)
helder 2010-01-30 02:48 Rev.: 531

Identify duplicated articles as existing.

2 lines of code changed in 1 file:

  • local/webscraper: scraper.py (+2)
helder 2010-01-30 02:40 Rev.: 530

Use numeric urlid in source Expresso.

1 line of code changed in 1 file:

  • local/webscraper: scraper.py (+1 -1)
helder 2010-01-30 01:44 Rev.: 529

News RSS feed reader working. NOTE needs testing.

7 lines of code changed in 1 file:

  • local/webscraper: scraper.py (+7 -7)
helder 2010-01-29 17:57 Rev.: 527

RSS feeders in the new format. Unicode problems remain to be solved before
using this on the production machine.

10 lines of code changed in 2 files:

  • local/webscraper: rsslib.py (+7 -3), scraper.py (+3 -2)
helder 2010-01-28 03:08 Rev.: 523

New RSS feed reader working. Example feed done. All the functionality from
the previous reader is duplicated. The individual feeds are still missing;
once they are done, we will be able to remove the old code.

165 lines of code changed in 3 files:

  • local/webscraper: rsslib.py (new 150), scraper.py (+14 -9), scrapsite.py (+1)
helder 2010-01-26 12:55 Rev.: 519

Continuation of the merge of the scraper code and the RSS feed reader code:
decoupling of the scraper_list construction from the sites module. This
way we'll be able to inherit the scraper class for each source directly,
without any side effects or extra baggage.

10 lines of code changed in 1 file:

  • local/webscraper: scrapers.py (+10 -1)
helder 2010-01-25 20:29 Rev.: 518

Started preparing the merge of the rss feed reader and the webscraper code.

50 lines of code changed in 4 files:

  • local/webscraper: README (del), scraper.py (+22), scrapers.py (+8 -1), scrapsite.py (+20)
helder 2010-01-18 21:42 Rev.: 505

Fixed the i-online scraper for bad 'alt' attr in img tags.

1 line of code changed in 1 file:

  • local/webscraper: scraper.py (+1 -1)
helder 2010-01-18 15:36 Rev.: 504

New debug option to use while creating scrapers.
Hack to make i-online work.

13 lines of code changed in 2 files:

  • local/webscraper: scraper.py (+12 -4), scrapsite.py (+1 -1)
helder 2010-01-16 19:16 Rev.: 503

I-Online scraper working.

35 lines of code changed in 1 file:

  • local/webscraper: scraper.py (+35 -6)
helder 2010-01-12 16:49 Rev.: 497

Exponent config on the webscraper

3 lines of code changed in 1 file:

  • local/webscraper: scrapsite.py (+3 -2)

(9 more)

Generated by StatSVN 0.4.0