[root]/local/webscraper
rss (12 files, 498 lines)
sites (12 files, 1487 lines)
Author | Changes | Lines of Code | Lines per Change |
---|---|---|---|
helder | 46 (100.0%) | 783 (100.0%) | 17.0 |
Fixed Correio da Manhã
1 line of code changed in 1 file:
Wait on empty article
3 lines of code changed in 1 file:
When scraping, do not wait if we already have the article.
6 lines of code changed in 2 files:
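The two commits above (waiting on an empty article, and skipping the wait when the article is already stored) can be sketched roughly as below. This is a hypothetical illustration, not the project's code. `scrape`, `fetch`, `store`, `delay` and `retries` are assumed names, and the sleep function is passed in so the sketch can be exercised without actually waiting.

```python
import time
from typing import Callable, Dict, Iterable


def scrape(urls: Iterable[str],
           fetch: Callable[[str], str],
           store: Dict[str, str],
           delay: float = 1.0,
           retries: int = 2,
           sleep: Callable[[float], None] = time.sleep) -> int:
    """Fetch articles not already in `store`; return how many waits happened."""
    waits = 0
    for url in urls:
        if url in store:
            # Already have this article: skip both the fetch and the wait.
            continue
        body = fetch(url)
        attempt = 0
        while not body.strip() and attempt < retries:
            # Empty article: wait, then fetch again.
            sleep(delay)
            waits += 1
            body = fetch(url)
            attempt += 1
        if body.strip():
            store[url] = body
        # Politeness delay, only after a real request was made.
        sleep(delay)
        waits += 1
    return waits
```

Known articles cost neither a request nor a delay. Only new or empty ones pay the wait.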
Economico RSS feed parser
2 lines of code changed in 1 file:
Remove a stupid fucking debug print
0 lines of code changed in 1 file:
Fix a crash on the Público feed; feeds without terms are now accepted;
Test for an empty article even when it comes from an RSS feed;
Fix JN when an article is empty.
16 lines of code changed in 2 files:
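A minimal sketch of the first two fixes, assuming each feed entry is a plain dict with an optional `"tags"` list of `{"term": ...}` dicts. This shape is loosely modeled on what the Python feedparser library produces. The function names are hypothetical. A missing terms list becomes an empty list instead of a crash, and an empty body is rejected even though it came from the feed.

```python
from typing import Any, Dict, List, Optional


def entry_terms(entry: Dict[str, Any]) -> List[str]:
    """Return category terms, or [] when the entry carries none."""
    # .get() with a default avoids a KeyError on feeds that omit terms.
    return [tag["term"] for tag in entry.get("tags", []) if tag.get("term")]


def parse_entry(entry: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Turn one feed entry into an article dict, or None if it is empty."""
    body = (entry.get("summary") or "").strip()
    if not body:
        # Empty article, even though it came from the RSS feed.
        return None
    return {
        "title": entry.get("title", ""),
        "body": body,
        "terms": entry_terms(entry),
    }
```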
Don't log repeated articles or waiting periods to file.
1 line of code changed in 1 file:
Fixed rss source CM. Started escaping html entities.
11 lines of code changed in 1 file:
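The commit message does not say which direction "escaping" means here. Scrapers often need both directions, and Python's standard `html` module covers each one. In this sketch, `clean_title` decodes entities that arrive in feed text, and `to_html_text` escapes scraped text before it is placed back into HTML. Both function names are hypothetical.

```python
import html


def clean_title(raw: str) -> str:
    # Decode entities such as &atilde; that feeds often embed in titles.
    return html.unescape(raw).strip()


def to_html_text(text: str) -> str:
    # Escape &, <, >, " and ' before inserting scraped text into HTML.
    return html.escape(text, quote=True)
```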
- New source 'Oje';
- How-to for creating a scraper;
- RSS reader for 'JN';
- Small fixes.
9 lines of code changed in 2 files:
Identify duplicated articles as existing.
2 lines of code changed in 1 file:
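One way to report duplicates as "existing" is to key each article on its source plus a per-source identifier. This is a hypothetical sketch: `ArticleStore`, `add`, and the choice of a SHA-1 key over `(source, urlid)` are assumptions, not the project's schema. A fixed-length hash simply makes a convenient database key.

```python
import hashlib
from typing import Dict


class ArticleStore:
    """Hypothetical in-memory store keyed by (source, urlid)."""

    def __init__(self) -> None:
        self._articles: Dict[str, str] = {}

    @staticmethod
    def key(source: str, urlid: str) -> str:
        # Stable key for an article, independent of the scraped text.
        return hashlib.sha1(f"{source}:{urlid}".encode("utf-8")).hexdigest()

    def add(self, source: str, urlid: str, body: str) -> str:
        """Store a new article; report a duplicate as 'existing'."""
        k = self.key(source, urlid)
        if k in self._articles:
            return "existing"
        self._articles[k] = body
        return "new"

    def __len__(self) -> int:
        return len(self._articles)
```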
Use numeric urlid in source Expresso.
1 line of code changed in 1 file:
News RSS feed reader working. NOTE: needs testing.
7 lines of code changed in 1 file:
RSS feeders in the new format. Unicode problems must be solved before using
them on the production machine.
10 lines of code changed in 2 files:
New RSS feed reader working. Example feed done. All the functionality from
the previous reader is duplicated. The individual feeds are still missing;
once they are done, we will be able to remove the old code.
165 lines of code changed in 3 files:
Continuation of the merger of the scraper code and the RSS feed reader code:
decoupling of the scraper_list construction from the sites module. This
way we'll be able to inherit the scraper class for each source directly
without any side effects or extra baggage.
10 lines of code changed in 1 file:
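The decoupling described above might look roughly like this. It is a hypothetical sketch, not the repository's layout. `Scraper`, `ExpressoScraper`, `JNScraper` and `build_scraper_list` are assumed names. Each source subclasses the base class directly, and the list of scrapers is built explicitly by the caller rather than as a side effect of importing the module that defines the sources.

```python
from typing import Iterable, List, Type


class Scraper:
    """Shared plumbing; each source subclasses this directly."""

    name = "base"

    def parse(self, raw: str) -> str:
        # Default clean-up; sources override only what differs.
        return raw.strip()


class ExpressoScraper(Scraper):
    name = "expresso"

    def parse(self, raw: str) -> str:
        # Example override: collapse runs of whitespace.
        return " ".join(raw.split())


class JNScraper(Scraper):
    name = "jn"  # inherits parse() unchanged


def build_scraper_list(classes: Iterable[Type[Scraper]]) -> List[Scraper]:
    # Built explicitly by the caller, not at import time of the sources module.
    return [cls() for cls in classes]
```

Because construction is explicit, a test or a one-off run can build a list holding a single source without pulling in the others.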
Started preparing the merge of the rss feed reader and the webscraper code.
50 lines of code changed in 4 files:
Fixed the i-online scraper for bad 'alt' attr in img tags.
1 line of code changed in 1 file:
New debug option to use while creating scrapers.
Hack to make i-online work.
13 lines of code changed in 2 files:
I-Online scraper working.
35 lines of code changed in 1 file:
Exponent config on the webscraper
3 lines of code changed in 1 file:
(9 more)