Changelog for python-Scrapy-1.0.1-11.1.i586.rpm :
Sat Jul 4 14:00:00 2015 jacobwinski@gmail.com
- Update to 1.0.1

* Unquote request path before passing to FTPClient; it already escapes paths

* include tests/ to source distribution in MANIFEST.in
- Update to 1.0.0

* New Features & Enhancements
+ Python logging
+ FEED_EXPORT_FIELDS option
+ DNS cache size and timeout options
+ support namespace prefix in xmliter_lxml
+ Reactor threadpool max size setting
+ Allow spiders to return dicts.
+ Add Response.urljoin() helper (both features appear in the sketch after this list)
+ look in ~/.config/scrapy.cfg for user config
+ handle TLS SNI
+ SelectorList.extract_first()
+ Added JmesSelect
+ add gzip compression to filesystem http cache backend
+ CSS support in link extractors
+ httpcache dont_cache meta #19 #689
+ add signal to be sent when request is dropped by the scheduler
+ avoid downloading large responses
+ Allow to specify the quotechar in CSVFeedSpider
+ Add referer to “Spider error processing” log message
+ process robots.txt once
+ GSoC Per-spider settings
+ Add project name validation
+ GSoC API cleanup
+ Be more responsive with IO operations
+ Do leveldb compaction for httpcache on closing
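
The dict items and Response.urljoin() additions above can be combined as in
this minimal, hedged sketch (spider name, start URL and CSS selector are
hypothetical, not taken from the release notes):

    # Sketch of two Scrapy 1.0 features: spiders may yield plain dicts
    # instead of Item objects, and Response.urljoin() resolves relative
    # links against the response URL.
    import scrapy

    class LinkSpider(scrapy.Spider):
        name = 'link-demo'                    # hypothetical spider name
        start_urls = ['http://example.com/']  # hypothetical start URL

        def parse(self, response):
            for href in response.css('a::attr(href)').extract():
                yield {'url': response.urljoin(href)}  # plain dict item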

* Deprecations & Removals
+ Deprecate htmlparser link extractor
+ remove deprecated code from FeedExporter
+ a leftover for 0.15 compatibility
+ drop support for CONCURRENT_REQUESTS_PER_SPIDER
+ Drop old engine code
+ Deprecate SgmlLinkExtractor

* Relocations
+ Move exporters/__init__.py to exporters.py
+ Move base classes to their packages
+ Module relocation
+ rename SpiderManager to SpiderLoader
+ Remove djangoitem
+ remove scrapy deploy command
+ dissolve contrib_exp
+ Deleted bin folder from root, fixes #913
+ Remove jsonrpc based webservice
+ Move Test cases under project root dir
+ Fix backward incompatibility for relocated paths in settings

* Bugfixes
+ Item multi inheritance fix
+ ItemLoader.load_item: iterate over copy of fields
+ Fix Unhandled error in Deferred
+ Force to read DOWNLOAD_TIMEOUT as int
+ scrapy.utils.misc.load_object should print full traceback
+ Fix bug for ”.local” host name
+ Fix for Enabled extensions, middlewares, pipelines info not printed anymore
+ fix dont_merge_cookies bad behaviour when set to False on meta (see the sketch below)
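
A minimal hedged sketch of the meta flag that bugfix concerns (the URL is a
placeholder):

    # dont_merge_cookies in Request.meta tells CookiesMiddleware not to
    # merge stored cookies for this request; the fix above addressed its
    # behaviour when the key is explicitly set to False.
    from scrapy.http import Request

    req = Request('http://example.com/', meta={'dont_merge_cookies': True})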

* Python 3 In Progress Support
+ disable scrapy.telnet if twisted.conch is not available
+ fix Python 3 syntax errors in ajaxcrawl.py
+ more python3 compatibility changes for urllib
+ assertItemsEqual was renamed to assertCountEqual in Python 3.
+ Import unittest.mock if available.
+ updated deprecated cgi.parse_qsl to use six’s parse_qsl
+ Prevent Python 3 port regressions
+ PY3: use MutableMapping for python 3
+ PY3: use six.BytesIO and six.moves.cStringIO
+ PY3: fix xmlrpclib and email imports
+ PY3: use six for robotparser and urlparse (see the sketch after this list)
+ PY3: use six.iterkeys, six.iteritems, and tempfile
+ PY3: fix has_key and use six.moves.configparser
+ PY3: use six.moves.cPickle
+ PY3 make it possible to run some tests in Python3
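
A hedged illustration of the six-based compatibility imports referenced in
this list (not the exact Scrapy diffs, just the general pattern):

    # six.moves aliases resolve to the right module on both Python 2 and 3.
    from six import BytesIO
    from six.moves import cPickle, configparser
    from six.moves.urllib.parse import urlparse, urlencode

    parsed = urlparse('http://example.com/path?q=1')   # works on py2 and py3
    blob = cPickle.dumps({'answer': 42})               # pickled bytes
    buf = BytesIO(b'raw bytes')                        # byte buffer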

* Tests
+ remove unnecessary lines from py3-ignores
+ Fix remaining warnings from pytest while collecting tests
+ Add docs build to travis
+ TST don’t collect tests from deprecated modules.
+ install service_identity package in tests to prevent warnings
+ Fix deprecated settings API in tests
+ Add test for webclient with POST method and no body given
+ py3-ignores.txt supports comments
+ modernize some of the asserts
+ selector.__repr__ test

* Code refactoring
+ CSVFeedSpider cleanup: use iterate_spider_output
+ remove unnecessary check from scrapy.utils.spider.iter_spider_output
+ Pydispatch pep8
+ Removed unused ‘load=False’ parameter from walk_modules()
+ For consistency, use job_dir helper in SpiderState extension.
+ rename “sflo” local variables to less cryptic “log_observer”
- update to 0.24.6

* encode invalid xpath with unicode_escape under PY2 (commit 07cb3e5)

* fix IPython shell scope issue and load IPython user config (commit 2c8e573)

* Fix small typo in the docs (commit d694019)

* Fix small typo (commit f92fa83)

* Converted sel.xpath() calls to response.xpath() in Extracting the data (commit c2c6d15)
- update to 0.24.5

* Support new _getEndpoint Agent signatures on Twisted 15.0.0 (commit 540b9bc)

* DOC a couple more references are fixed (commit b4c454b)

* DOC fix a reference (commit e3c1260)

* t.i.b.ThreadedResolver is now a new-style class (commit 9e13f42)

* S3DownloadHandler: fix auth for requests with quoted paths/query params (commit cdb9a0b)

* fixed the variable types in mailsender documentation (commit bb3a848)

* Reset items_scraped instead of item_count (commit edb07a4)

* Tentative attention message about what document to read for contributions (commit 7ee6f7a)

* mitmproxy 0.10.1 needs netlib 0.10.1 too (commit 874fcdd)

* pin mitmproxy 0.10.1 as >0.11 does not work with tests (commit c6b21f0)

* Test the parse command locally instead of against an external url (commit c3a6628)

* Patches Twisted issue while closing the connection pool on HTTPDownloadHandler (commit d0bf957)

* Updates documentation on dynamic item classes. (commit eeb589a)

* Merge pull request #943 from Lazar-T/patch-3 (commit 5fdab02)

* typo (commit b0ae199)

* pywin32 is required by Twisted. closes #937 (commit 5cb0cfb)

* Update install.rst (commit 781286b)

* Merge pull request #928 from Lazar-T/patch-1 (commit b415d04)

* comma instead of fullstop (commit 627b9ba)

* Merge pull request #885 from jsma/patch-1 (commit de909ad)

* Update request-response.rst (commit 3f3263d)

* SgmlLinkExtractor - fix for parsing tag with Unicode present (commit 49b40f0)

Mon Mar 2 13:00:00 2015 toddrme2178@gmail.com
- Add python-pyasn1 requirement

Thu Sep 4 14:00:00 2014 toddrme2178@gmail.com
- Update to 0.24.4

* pem file is used by mockserver and required by scrapy bench

* scrapy bench needs scrapy.tests
- Update to 0.24.3

* no need to waste travis-ci time on py3 for 0.24

* Update installation docs

* There is a trove classifier for Scrapy framework!

* update other places where w3lib version is mentioned

* Update w3lib requirement to 1.8.0

* Use w3lib.html.replace_entities() (remove_entities() is
deprecated; see the sketch at the end of this list)

* set zip_safe=False

* do not ship tests package

* scrapy.bat is not needed anymore

* Modernize setup.py

* headers can not handle non-string values

* fix ftp test cases

* The sum up of travis-ci builds are taking like 50min to complete

* Update shell.rst typo

* removes weird indentation in the shell results

* improved explanations, clarified blog post as source, added link
for XPath string functions in the spec

* renamed UserTimeoutError and ServerTimeouterror #583

* adding some xpath tips to selectors docs

* fix tests to account for https://github.com/scrapy/w3lib/pull/23

* get_func_args maximum recursion fix #728

* Updated input/output processor example according to #560.

* Fixed Python syntax in tutorial.

* Add test case for tunneling proxy

* Bugfix for leaking Proxy-Authorization header to remote host when
using tunneling

* Extract links from XHTML documents with MIME-Type
"application/xml"

* Merge pull request #793 from roysc/patch-1

* Fix typo in commands.rst

* better testcase for settings.overrides.setdefault

* Using CRLF as line marker according to http 1.1 definition
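
A hedged one-liner showing the w3lib helper mentioned above:

    # replace_entities() supersedes the deprecated remove_entities() and
    # decodes HTML entities into the corresponding characters.
    from w3lib.html import replace_entities

    print(replace_entities('Bread &amp; butter &copy; 2014'))  # Bread & butter © 2014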
- Update to 0.24.2

* Use a mutable mapping to proxy deprecated settings.overrides and
settings.defaults attribute

* there is no support for python3 yet

* Update python compatible version set to debian packages

* DOC fix formatting in release notes
- Update to 0.24.1

* Fix deprecated CrawlerSettings and increase backwards
compatibility with .defaults attribute
- Update to 0.24.0

* Enhancements
+ Improve Scrapy top-level namespace
+ Add selector shortcuts to responses (see the sketch after this list)
+ Add new lxml based LinkExtractor to replace unmaintained
SgmlLinkExtractor
+ Cleanup settings API - part of the per-spider settings GSoC project
+ Add UTF8 encoding header to templates
+ Telnet console now binds to 127.0.0.1 by default
+ Update debian/ubuntu install instructions
+ Disable smart strings in lxml XPath evaluations
+ Restore filesystem based cache as default for http
cache middleware
+ Expose current crawler in Scrapy shell
+ Improve testsuite comparing CSV and XML exporters
+ New `offsite/filtered` and `offsite/domains` stats
+ Support process_links as generator in CrawlSpider
+ Verbose logging and new stats counters for DupeFilter
+ Add a mimetype parameter to `MailSender.send()`
+ Generalize file pipeline log messages
+ Replace unencodeable codepoints with html entities in
SGMLLinkExtractor
+ Converted SEP documents to rst format
+ Tests and docs for clickdata's nr index in FormRequest
+ Allow to disable a downloader handler just like any other
component
+ Log when a request is discarded after too many redirections
+ Log error responses if they are not handled by spider callbacks
+ Add content-type check to http compression mw
+ Run pypy tests using latest pypi from ppa
+ Run test suite using pytest instead of trial
+ Build docs and check for dead links in tox environment
+ Make scrapy.version_info a tuple of integers
+ Infer exporter's output format from filename extensions
+ Support case-insensitive domains in `url_is_from_any_domain()`
+ Remove pep8 warnings in project and spider templates
+ Tests and docs for `request_fingerprint` function
+ Update SEP-19 for GSoC project `per-spider settings`
+ Set exit code to non-zero when contracts fail
+ Add a setting to control what class is instantiated as
Downloader component
+ Pass response in `item_dropped` signal
+ Improve `scrapy check` contracts command
+ Document `spider.closed()` shortcut
+ Document `request_scheduled` signal
+ Add a note about reporting security issues
+ Add LevelDB http cache storage backend
+ Sort spider list output of `scrapy list` command
+ Multiple documentation enhancements and fixes
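
The response selector shortcuts mentioned at the top of this list are
sketched below (spider name, URL and selectors are hypothetical):

    # Sketch of the 0.24 selector shortcuts: .xpath() and .css() can be
    # called on the response directly instead of wrapping it in Selector().
    import scrapy

    class TitleSpider(scrapy.Spider):
        name = 'title-demo'                   # hypothetical spider name
        start_urls = ['http://example.com/']  # hypothetical start URL

        def parse(self, response):
            title = response.xpath('//title/text()').extract()
            links = response.css('a::attr(href)').extract()
            self.log('title=%r, %d links' % (title, len(links)))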

* Bugfixes
+ Encode unicode URL value when creating Links in
RegexLinkExtractor
+ Ignore None values in ItemLoader processors
+ Fix link text when there is an inner tag in SGMLLinkExtractor
and HtmlParserLinkExtractor
+ Fix wrong checks on subclassing of deprecated classes
+ Handle errors caused by inspect.stack() failures
+ Fix a reference to a nonexistent engine attribute
+ Fix dynamic itemclass example usage of type()
+ Use lucasdemarchi/codespell to fix typos
+ Fix default value of attrs argument in SgmlLinkExtractor to be
tuple
+ Fix XXE flaw in sitemap reader
+ Fix engine to support filtered start requests
+ Fix offsite middleware case on urls with no hostnames
+ Testsuite doesn't require PIL anymore
- Update to 0.22.2

* fix a reference to nonexistent engine.slots. closes #593

* downloaderMW doc typo (spiderMW doc copy remnant)

* Correct typos
- Update to 0.22.1

* localhost666 can resolve under certain circumstances

* test inspect.stack failure

* Handle cases when inspect.stack() fails

* Fix wrong checks on subclassing of deprecated classes. closes #581

* Docs: 4-space indent for final spider example

* Fix HtmlParserLinkExtractor and tests after #485 merge

* BaseSgmlLinkExtractor: Fixed the missing space when the link has
an inner tag

* BaseSgmlLinkExtractor: Added unit test of a link with an inner tag

* BaseSgmlLinkExtractor: Fixed unknown_endtag() so that it only sets
current_link=None when the end tag matches the opening tag

* Fix tests for Travis-CI build

* replace unencodeable codepoints with html entities.

* RegexLinkExtractor: encode URL unicode value when creating Links

* Updated the tutorial crawl output with latest output.

* Updated shell docs with the crawler reference and fixed the actual
shell output.

* PEP8 minor edits.

* Expose current crawler in the scrapy shell.

* Unused re import and PEP8 minor edits.

* Ignore None values when using the ItemLoader.

* DOC Fixed HTTPCACHE_STORAGE typo in the default value, which is now
Filesystem instead of Dbm.

* show ubuntu setup instructions as literal code

* Update Ubuntu installation instructions

* Merge pull request #550 from stray-leone/patch-1

* modify the version of scrapy ubuntu package

* fix 0.22.0 release date

* fix typos in news.rst and remove (not released yet) header
- Update to 0.22.0

* Enhancements
+ [Backwards incompatible] Switched HTTPCacheMiddleware backend to
filesystem. To restore the old backend set `HTTPCACHE_STORAGE` to
`scrapy.contrib.httpcache.DbmCacheStorage` (see the settings sketch
after this list)
+ Proxy https:// urls using CONNECT method
+ Add a middleware to crawl ajax crawleable pages as defined by
google
+ Rename scrapy.spider.BaseSpider to scrapy.spider.Spider
+ Selectors register EXSLT namespaces by default
+ Unify item loaders similar to selectors renaming
+ Make `RFPDupeFilter` class easily subclassable
+ Improve test coverage and forthcoming Python 3 support
+ Promote startup info on settings and middleware to INFO level
+ Support partials in `get_func_args` util
+ Allow running individual tests via tox
+ Update extensions ignored by link extractors
+ Add middleware methods to get files/images/thumbs paths
+ Improve offsite middleware tests
+ Add a way to skip default Referer header set by
RefererMiddleware
+ Do not send `x-gzip` in default `Accept-Encoding` header
+ Support defining http error handling using settings
+ Use modern python idioms wherever you find legacies
+ Improve and correct documentation
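
The cache backend switch flagged as backwards incompatible above boils down
to a settings change; a hedged settings.py sketch (the dotted path is the
one quoted in the release note):

    # settings.py: 0.22 defaults to the filesystem HTTP cache backend.
    # To restore the previous DBM backend, point HTTPCACHE_STORAGE back at it.
    HTTPCACHE_ENABLED = True
    HTTPCACHE_STORAGE = 'scrapy.contrib.httpcache.DbmCacheStorage'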

* Fixes
+ Update Selector class imports in CrawlSpider template
+ Fix nonexistent reference to `engine.slots`
+ Do not try to call `body_as_unicode()` on a non-TextResponse
instance
+ Warn when subclassing XPathItemLoader, previously it only warned
on instantiation.
+ Warn when subclassing XPathSelector, previously it only warned
on instantiation.
+ Multiple fixes to memory stats
+ Fix overriding url in `FormRequest.from_response()`
+ Fix tests runner under pip 1.5
+ Fix logging error when spider name is unicode
- Update to 0.20.2

* Update CrawlSpider Template with Selector changes

* fix method name in tutorial. closes GH-480
- Update to 0.20.1

* include_package_data is required to build wheels from published
sources

* process_parallel was leaking the failures on its internal
deferreds.
- Update to 0.20.0

* Enhancements
+ New Selector's API including CSS selectors
+ Request/Response url/body attributes are now immutable
(modifying them had been deprecated for a long time)
+ :setting:`ITEM_PIPELINES` is now defined as a dict instead of a
list (see the settings sketch after this list)
+ Sitemap spider can fetch alternate URLs
+ `Selector.remove_namespaces()` now removes namespaces from
element's attributes.
+ Paved the road for Python 3.3+
+ New item exporter using native python types with nesting support
+ Tune HTTP1.1 pool size so it matches concurrency defined by
settings
+ scrapy.mail.MailSender now can connect over TLS or upgrade using
STARTTLS
+ New FilesPipeline with functionality factored out from
ImagesPipeline
+ Recommend Pillow instead of PIL for image handling
+ Added debian packages for Ubuntu quantal and raring
+ Mock server (used for tests) can listen for HTTPS requests
+ Remove multi spider support from multiple core components
+ Travis-CI now tests Scrapy changes against development versions
of `w3lib` and `queuelib` python packages.
+ Add pypy 2.1 to continuous integration tests
+ Pylinted, pep8 and removed old-style exceptions from source
+ Use importlib for parametric imports
+ Handle a regression introduced in Python 2.7.5 that affects
XmlItemExporter
+ Bugfix crawling shutdown on SIGINT
+ Do not submit `reset` type inputs in FormRequest.from_response
+ Do not silence download errors when request errback raises an
exception
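
The ITEM_PIPELINES change noted above is a settings-format change; a hedged
settings.py sketch (the pipeline path is hypothetical):

    # settings.py: as of 0.20, ITEM_PIPELINES maps each pipeline's dotted
    # path to an integer order; lower numbers run first.
    ITEM_PIPELINES = {
        'myproject.pipelines.PricePipeline': 300,  # hypothetical pipeline
    }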

* Bugfixes
+ Fix tests under Django 1.6
+ Lot of bugfixes to retry middleware under disconnections using
HTTP 1.1 download handler
+ Fix inconsistencies among Twisted releases
+ Fix scrapy shell bugs
+ Fix invalid variable name in setup.py
+ Fix tutorial references
+ Improve request-response docs
+ Improve best practices docs
+ Improve django integration docs
+ Document `bindaddress` request meta
+ Improve `Request` class documentation

* Other
+ Dropped Python 2.6 support
+ Add `cssselect`_ python package as install dependency
+ Drop libxml2 and multi selector's backend support, `lxml`_ is
required from now on.
+ Minimum Twisted version increased to 10.0.0, dropped Twisted 8.0
support.
+ Running test suite now requires `mock` python library
- Update to 0.18.4

* IPython refuses to update the namespace. fix #396

* Fix AlreadyCalledError replacing a request in shell command.

* Fix start_requests laziness and early hangs
- Update to 0.18.3

* fix regression on lazy evaluation of start requests

* forms: do not submit reset inputs

* increase unittest timeouts to decrease travis false positive
failures

* backport master fixes to json exporter

* Fix permission and set umask before generating sdist tarball
- Update to 0.18.2

* Backport `scrapy check` command fixes and backward compatible
multi crawler process
- Update to 0.18.1

* remove extra import added by cherry picked changes

* fix crawling tests under twisted pre 11.0.0

* py26 can not format zero length fields {}

* test PotentialDataLoss errors on unbound responses

* Treat responses without content-length or Transfer-Encoding as
good responses

* do not include ResponseFailed if http11 handler is not enabled

* New HTTP client wraps connection losses in ResponseFailed
exception.

* limit travis-ci build matrix

* Merge pull request #375 from peterarenot/patch-1

* Fixed so it refers to the correct folder

* added quantal & raring to support ubuntu releases

* fix retry middleware which didn't retry certain connection errors
after the upgrade to http1 client, closes GH-373

* fix XmlItemExporter in Python 2.7.4 and 2.7.5

* minor updates to 0.18 release notes

* fix contributors list format
- Update to 0.18.0

* Lot of improvements to testsuite run using Tox, including a way to
test on pypi

* Handle GET parameters for AJAX crawleable urls

* Use lxml recover option to parse sitemaps

* Bugfix cookie merging by hostname and not by netloc

* Support disabling `HttpCompressionMiddleware` using a flag setting

* Support xml namespaces using `iternodes` parser in `XMLFeedSpider`

* Support `dont_cache` request meta flag (see the sketch at the end of this list)

* Bugfix `scrapy.utils.gz.gunzip` broken by changes in python 2.7.4

* Bugfix url encoding on `SgmlLinkExtractor`

* Bugfix `TakeFirst` processor shouldn't discard zero (0) value

* Support nested items in xml exporter

* Improve cookies handling performance

* Log dupe filtered requests once

* Split redirection middleware into status and meta based
middlewares

* Use HTTP1.1 as default downloader handler

* Support xpath form selection on `FormRequest.from_response`

* Bugfix unicode decoding error on `SgmlLinkExtractor`

* Bugfix signal dispatching on pypy interpreter

* Improve request delay and concurrency handling

* Add RFC2616 cache policy to `HttpCacheMiddleware`

* Allow customization of messages logged by engine

* Multiple improvements to `DjangoItem`

* Extend Scrapy commands using setuptools entry points

* Allow spider `allowed_domains` value to be set/tuple

* Support `settings.getdict`

* Simplify internal `scrapy.core.scraper` slot handling

* Added `Item.copy`

* Collect idle downloader slots

* Add `ftp://` scheme downloader handler

* Added downloader benchmark webserver and spider tools
:ref:`benchmarking`

* Moved persistent (on disk) queues to a separate project
(queuelib_) which scrapy now depends on

* Add scrapy commands using external libraries

* Added ``--pdb`` option to ``scrapy`` command line tool

* Added :meth:`XPathSelector.remove_namespaces` which allows to
remove all namespaces from XML documents for convenience (to work
with namespace-less XPaths). Documented in :ref:`topics-selectors`

* Several improvements to spider contracts

* New default middleware named MetaRefreshMiddleware that handles
meta-refresh html tag redirections

* MetaRefreshMiddleware and RedirectMiddleware have different
priorities to address #62

* added from_crawler method to spiders

* added system tests with mock server

* more improvements to Mac OS compatibility (thanks Alex Cepoi)

* several more cleanups to singletons and multi-spider support
(thanks Nicolas Ramirez)

* support custom download slots

* added --spider option to "shell" command.

* log overridden settings when scrapy starts
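
A minimal hedged sketch of the dont_cache request meta flag listed above
(the URL is a placeholder):

    # dont_cache=True asks HttpCacheMiddleware not to cache the response
    # for this particular request.
    from scrapy.http import Request

    req = Request('http://example.com/live-data', meta={'dont_cache': True})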
- Update to 0.16.5

* obey request method when scrapy deploy is redirected to a new
endpoint

* fix inaccurate downloader middleware documentation. refs #280

* doc: remove links to diveintopython.org, which is no longer
available.

* Find form nodes in invalid html5 documents

* Fix typo labeling attrs type bool instead of list
- Update to 0.16.4

* fixes spelling errors in documentation

* add doc about disabling an extension. refs #132

* Fixed error message formatting. log.err() doesn't support cool
formatting and when an error occurred, the message was: "ERROR: Error
processing %(item)s"

* lint and improve images pipeline error logging

* fixed doc typos

* add documentation topics: Broad Crawls & Common Practices

* fix bug in scrapy parse command when spider is not specified
explicitly.

* Update docs/topics/commands.rst
- Update dependencies
- Update package name to reflect python packaging guidelines

Mon Nov 4 13:00:00 2013 castedo@castedo.com
- Upgrade .spec dependencies to work with SLE 11 SP3

* python-twisted 8.0 from standard SLE11 repository not working, force >= 9.0

* use new "python-pyOpenSSL" name rather than old "python-openssl"

Mon Jan 21 13:00:00 2013 p.drouand@gmail.com
- Update to version 0.16.3:

* Remove concurrency limitation when using download delays and still
ensure inter-request delays are enforced (commit 487b9b5)

* Add error details when image pipeline fails (commit 8232569)

* Improve mac os compatibility (commit 8dcf8aa)

* Setup.py: use README.rst to populate long_description (commit 7b5310d)

* Doc: removed obsolete references to ClientForm (commit 80f9bb6)

* Correct docs for default storage backend (commit 2aa491b)

* Doc: removed broken proxyhub link from FAQ (commit bdf61c4)

* Fixed docs typo in SpiderOpenCloseLogging example (commit 7184094)

Wed May 23 14:00:00 2012 jfunk@funktronics.ca
- Update to 0.14.4

* added precise to supported ubuntu distros (commit b7e46df)

* fixed bug in json-rpc webservice reported in
https://groups.google.com/d/topic/scrapy-users/qgVBmFybNAQ/discussion.
also removed no longer supported 'run' command from extras/scrapy-ws.py
(commit 340fbdb)

* meta tag attributes for content-type http equiv can be in any order. #123
(commit 0cb68af)

* replace "import Image" by more standard "from PIL import Image". closes
[#88] (commit 4d17048)

* return trial status as bin/runtests.sh exit value. #118 (commit b7b2e7f)
- Run tests
- Add missing dependencies

Fri Apr 27 14:00:00 2012 jfunk@funktronics.ca
- Update to 0.14.3

* forgot to include pydispatch license. #118

* include egg files used by testsuite in source distribution. #118 (c897793)

* update docstring in project template to avoid confusion with genspider
command, which may be considered as an advanced feature. refs #107
(2548dcc)

* added note to docs/topics/firebug.rst about google directory being shut
down (668e352)

* Merge branch '0.14' of github.com:scrapy/scrapy into 0.14 (835d082)

* dont discard slot when empty, just save in another dict in order to
recycle if needed again. (8e9f607)

* do not fail handling unicode xpaths in libxml2 backed selectors (b830e95)

* fixed minor mistake in Request objects documentation (bf3c9ee)

* fixed minor defect in link extractors documentation (ba14f38)

* removed some obsolete remaining code related to sqlite support in scrapy
(0665175)
- Rebuild spec with current conventions
- Build docs

Tue Mar 13 13:00:00 2012 jfunk@funktronics.ca
- Update to 0.14.2

* move buffer pointing to start of file before computing checksum. refs #92
(6a5bef2)

* Compute image checksum before persisting images. closes #92 (9817df1)

* remove leaking references in cached failures (673a120)

* fixed bug in MemoryUsage extension: get_engine_status() takes exactly 1
argument (0 given) (11133e9)

* Merge branch '0.14' of github.com:scrapy/scrapy into 0.14 (1627320)

* fixed struct.error on http compression middleware. closes #87 (1423140)

* ajax crawling wasn't expanding for unicode urls (0de3fb4)

* Catch start_requests iterator errors. refs #83 (454a21d)

* Speed-up libxml2 XPathSelector (2fbd662)

* updated versioning doc according to recent changes (0a070f5)

* scrapyd: fixed documentation link (2b4e4c3)

* extras/makedeb.py: no longer obtaining version from git (caffe0e)

Thu Jan 12 13:00:00 2012 jfunk@funktronics.ca
- Update to 0.14.1:

* extras/makedeb.py: no longer obtaining version from git (caffe0e)

* bumped version to 0.14.1 (6cb9e1c)

* fixed reference to tutorial directory (4b86bd6)

* doc: removed duplicated callback argument from Request.replace() (1aeccdd)

* fixed formatting of scrapyd doc (8bf19e6)

* Dump stacks for all running threads and fix engine status dumped by
StackTraceDump extension (14a8e6e)

* added comment about why we disable ssl on boto images upload (5223575)

* SSL handshaking hangs when doing too many parallel connections to S3
(63d583d)

* change tutorial to follow changes on dmoz site (bcb3198)

* Avoid _disconnectedDeferred AttributeError exception in Twisted>=11.1.0
(98f3f87)

* allow spider to set autothrottle max concurrency (175a4b5)

Thu Dec 8 13:00:00 2011 jfunk@funktronics.ca
- Update to 0.14.0.2841
- New features and settings
- Support for AJAX crawleable urls
- New persistent scheduler that stores requests on disk, allowing to
suspend and resume crawls (r2737)
- added -o option to scrapy crawl, a shortcut for dumping scraped items
into a file (or standard output using -)
- Added support for passing custom settings to Scrapyd schedule.json api
(r2779, r2783)
- New ChunkedTransferMiddleware (enabled by default) to support chunked
transfer encoding (r2769)
- Add boto 2.0 support for S3 downloader handler (r2763)
- Added marshal to formats supported by feed exports (r2744)
- In request errbacks, offending requests are now received in
failure.request attribute (r2738)
- Big downloader refactoring to support per domain/ip concurrency limits
(r2732)
- CONCURRENT_REQUESTS_PER_SPIDER setting has been deprecated and replaced
by CONCURRENT_REQUESTS, CONCURRENT_REQUESTS_PER_DOMAIN and
CONCURRENT_REQUESTS_PER_IP; check the documentation for more details
(see the settings sketch at the end of this list)
- Added builtin caching DNS resolver (r2728)
- Moved Amazon AWS-related components/extensions (SQS spider queue,
SimpleDB stats collector) to a separate project: scaws (r2706, r2714)
- Moved spider queues to scrapyd: scrapy.spiderqueue ->
scrapyd.spiderqueue (r2708)
- Moved sqlite utils to scrapyd: scrapy.utils.sqlite -> scrapyd.sqlite
(r2781)
- Real support for returning iterators on start_requests() method. The
iterator is now consumed during the crawl when the spider is getting
idle (r2704)
- Added REDIRECT_ENABLED setting to quickly enable/disable the redirect
middleware (r2697)
- Added RETRY_ENABLED setting to quickly enable/disable the retry
middleware (r2694)
- Added CloseSpider exception to manually close spiders (r2691)
- Improved encoding detection by adding support for HTML5 meta charset
declaration (r2690)
- Refactored close spider behavior to wait for all downloads to finish and
be processed by spiders, before closing the spider (r2688)
- Added SitemapSpider (see documentation in Spiders page) (r2658)
- Added LogStats extension for periodically logging basic stats (like
crawled pages and scraped items) (r2657)
- Make handling of gzipped responses more robust (#319, r2643). Now Scrapy
will try and decompress as much as possible from a gzipped response,
instead of failing with an IOError.
- Simplified MemoryDebugger extension to use stats for dumping memory
debugging info (r2639)
- Added new command to edit spiders: scrapy edit (r2636) and -e flag to
genspider command that uses it (r2653)
- Changed default representation of items to pretty-printed dicts.
(r2631). This improves default logging by making log more readable in
the default case, for both Scraped and Dropped lines.
- Added spider_error signal (r2628)
- Added COOKIES_ENABLED setting (r2625)
- Stats are now dumped to Scrapy log (default value of STATS_DUMP setting
has been changed to True). This is to make Scrapy users more aware of
Scrapy stats and the data that is collected there.
- Added support for dynamically adjusting download delay and maximum
concurrent requests (r2599)
- Added new DBM HTTP cache storage backend (r2576)
- Added listjobs.json API to Scrapyd (r2571)
- CsvItemExporter: added join_multivalued parameter (r2578)
- Added namespace support to xmliter_lxml (r2552)
- Improved cookies middleware by making COOKIES_DEBUG nicer and
documenting it (r2579)
- Several improvements to Scrapyd and Link extractors
- Code rearranged and removed
- Merged item passed and item scraped concepts, as they have often proved
confusing in the past. This means: (r2630)
- original item_scraped signal was removed
- original item_passed signal was renamed to item_scraped
- old log lines Scraped Item... were removed
- old log lines Passed Item... were renamed to Scraped Item... lines and
downgraded to DEBUG level
- Reduced Scrapy codebase by stripping part of Scrapy code into two new
libraries:
- w3lib (several functions from
scrapy.utils.{http,markup,multipart,response,url}, done in r2584)
- scrapely (was scrapy.contrib.ibl, done in r2586)
- Removed unused function: scrapy.utils.request.request_info() (r2577)
- Removed googledir project from examples/googledir. There's now a new
example project called dirbot available on github:
https://github.com/scrapy/dirbot
- Removed support for default field values in Scrapy items (r2616)
- Removed experimental crawlspider v2 (r2632)
- Removed scheduler middleware to simplify architecture. Duplicates filter
is now done in the scheduler itself, using the same dupe filtering class
as before (DUPEFILTER_CLASS setting) (r2640)
- Removed support for passing urls to scrapy crawl command (use scrapy
parse instead) (r2704)
- Removed deprecated Execution Queue (r2704)
- Removed (undocumented) spider context extension (from
scrapy.contrib.spidercontext) (r2780)
- removed CONCURRENT_SPIDERS setting (use scrapyd maxproc instead) (r2789)
- Renamed attributes of core components: downloader.sites ->
downloader.slots, scraper.sites -> scraper.slots (r2717, r2718)
- Renamed setting CLOSESPIDER_ITEMPASSED to CLOSESPIDER_ITEMCOUNT (r2655).
Backwards compatibility kept.
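
A hedged settings.py sketch of the concurrency settings that replaced
CONCURRENT_REQUESTS_PER_SPIDER (values are illustrative):

    # settings.py: global, per-domain and per-IP request concurrency limits.
    CONCURRENT_REQUESTS = 16
    CONCURRENT_REQUESTS_PER_DOMAIN = 8
    CONCURRENT_REQUESTS_PER_IP = 0   # 0 leaves the per-IP limit disabled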

Thu Mar 10 13:00:00 2011 jfunk@funktronics.ca
- Initial release


 