Relative URL beginning with '?' problem #7

Description

@fredxiong

DefaultNormalizer should resolve a relative URL that begins with '?' against the URL of the page that contains it, not against the base URL.
For example, if the base URL is http://www.some.com and the crawler follows the relative URL "?pageno=3" found on the page http://www.some.com/sample, DefaultNormalizer returns http://www.some.com?pageno=3 instead of the expected http://www.some.com/sample?pageno=3.
I solved this by changing the interface method signature from String LinkNormalizer#normalize(final String relativeUrl) to String normalize(final String urlToCrawl, final String relativeUrl), and by invoking normalizer.normalize(urlToCrawl.link(), l) in PageCrawlerExecutor#run().
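
For illustration, here is a minimal sketch of what a two-argument normalizer along these lines might look like. It is not the project's actual code: the class name QueryRelativeNormalizer is hypothetical, and only the parameter names urlToCrawl and relativeUrl come from the signature above. The query-only case is handled by hand so that the last path segment of the containing page is kept; all other relative links fall back to java.net.URI#resolve.

```java
import java.net.URI;

// Hypothetical stand-in for a LinkNormalizer implementation with the
// two-argument signature proposed in this issue.
class QueryRelativeNormalizer {

    static String normalize(final String urlToCrawl, final String relativeUrl) {
        if (relativeUrl.startsWith("?")) {
            // Query-only link: keep the containing page's scheme, host and
            // full path, but drop its own fragment and query first.
            String page = urlToCrawl;
            final int hash = page.indexOf('#');
            if (hash >= 0) {
                page = page.substring(0, hash);
            }
            final int query = page.indexOf('?');
            if (query >= 0) {
                page = page.substring(0, query);
            }
            return page + relativeUrl;
        }
        // Any other link: standard resolution against the containing page.
        return URI.create(urlToCrawl).resolve(relativeUrl).toString();
    }

    public static void main(String[] args) {
        // prints http://www.some.com/sample?pageno=3
        System.out.println(normalize("http://www.some.com/sample", "?pageno=3"));
    }
}
```

The key design point is that the URL of the page being crawled, rather than the site's base URL, is passed in as the resolution context.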
