2017-07-24

Parse HTML for links

I want to parse a local HTML file for links using jsoup, but it still doesn't work. The code is:

public static Set<String> getAllLinksFromPage(String file) throws IOException{ 
    final Set<String> result = new HashSet<String>(); 
    File input = new File(file); 

    Document doc = Jsoup.parse(file); 

    Elements links = doc.select("a[href]"); 
    for(Element link : links) { 
     result.add(links.attr("abs:href")); 
    } 

    return result; 

} 

and the output is: []

So what is the problem?

Answer


There are a couple of mistakes in the code you posted:

  1. Document doc = Jsoup.parse(file); calls the Jsoup.parse(String html) method instead of Jsoup.parse(File in, String charset). You passed the wrong variable: file is a String (the file name, I assume), while the reference to the actual file is held in the variable input. Because the file name itself is parsed as HTML, the document contains no links and the result is empty. It should be Document doc = Jsoup.parse(input, "UTF-8");

  2. You have a typo in result.add(links.attr("abs:href")); you read the "abs:href" attribute from the whole list of links instead of from the single link currently being visited while iterating over the links list. It should be result.add(link.attr("abs:href"));
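Both points can be reproduced without any file. The following is a minimal sketch, assuming jsoup is on the classpath; the class name ParseMistakesDemo, the method names, and the two example.-style URLs are made up for illustration and are not part of the question.

```java
import java.util.LinkedHashSet;
import java.util.Set;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class ParseMistakesDemo {
    // Hypothetical sample markup with two distinct links.
    static final String SAMPLE =
        "<a href=\"https://a.example/\">A</a><a href=\"https://b.example/\">B</a>";

    // Mistake 1: Jsoup.parse(String) treats its argument as markup, not as a path,
    // so a file name yields a document without any <a> elements.
    static int linksFoundInPathString(String path) {
        return Jsoup.parse(path).select("a[href]").size();
    }

    // Mistake 2: Elements.attr reads from the first matching element only,
    // so every iteration adds the same value.
    static Set<String> collectViaList(String html) {
        Elements links = Jsoup.parse(html).select("a[href]");
        Set<String> result = new LinkedHashSet<>();
        for (Element link : links) {
            result.add(links.attr("href"));
        }
        return result;
    }

    // Fixed loop body: read the attribute from the current element.
    static Set<String> collectViaElement(String html) {
        Elements links = Jsoup.parse(html).select("a[href]");
        Set<String> result = new LinkedHashSet<>();
        for (Element link : links) {
            result.add(link.attr("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(linksFoundInPathString("/tmp/test.html"));
        System.out.println(collectViaList(SAMPLE));
        System.out.println(collectViaElement(SAMPLE));
    }
}
```

So even with the parse call fixed, the original loop would have collected at most one distinct link.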

After applying these changes, your method should look like this:

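import java.io.File;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public static Set<String> getAllLinksFromPage(String file) throws IOException {
    final Set<String> result = new HashSet<String>();
    File input = new File(file);

    // Parse the file's contents, not the file name string.
    Document doc = Jsoup.parse(input, "UTF-8");

    Elements links = doc.select("a[href]");
    for (Element link : links) {
        // Read the attribute from the current element, not from the whole list.
        result.add(link.attr("abs:href"));
    }

    return result;
}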

I tested it with an HTML file that mirrors this page (I simply saved it as /tmp/test.html and called your function with that path), and this is the result I get:

[, https://www.facebook.com/officialstackoverflow/, https://codegolf.stackexchange.com/questions/135102/formic-functions-ant-queen-of-the-hill-contest, https://www.stackoverflowbusiness.com/advertise?utm_source=so-footer&utm_medium=referral&utm_campaign=brand-activation, https://scifi.stackexchange.com/questions/165505/what-were-mad-eye-and-neville-talking-about-after-the-unforgivable-curses-scene, https://worldbuilding.stackexchange.com/questions/86289/a-believable-place-for-your-secret-lair, https://www.stackoverflowbusiness.com/enterprise?utm_source=so-footer&utm_medium=referral&utm_campaign=brand-activation, https://stackoverflow.blog, http://stackoverflow.com/election, https://mathoverflow.net/questions/277069/what-is-homology-anyway, https://math.stackexchange.com/questions/2369011/what-is-the-best-rest-position-for-two-elevators-in-a-10-story-building, https://data.stackexchange.com, https://stackexchange.com/users/2524916/?tab=accounts, https://worldbuilding.stackexchange.com/questions/86712/what-kind-of-apocalyptic-event-can-be-predicted-years-before-it-happens, https://stackexchange.com/sites#lifearts, https://stackoverflow.com/company/work-here, https://mechanics.stackexchange.com/questions/46467/how-dangerous-is-it-to-drive-on-country-roads-with-a-potentially-failing-wheel-b, https://stackoverflow.blog/2017/07/21/trends-cloud-computing-uses-aws-uses-azure/, https://physics.stackexchange.com/questions/348279/why-does-spin-arise-in-non-relativistic-quantum-mechanics, https://serverfault.com, https://stackexchange.com/sites#science, https://dba.stackexchange.com, https://www.stackoverflowbusiness.com/?ref=topbar_help, https://meta.stackoverflow.com/questions/352065/introducing-channels-qa-for-engineering-teams, https://stackoverflow.com/company/contact, https://creativecommons.org/licenses/by-sa/3.0/, https://askubuntu.com/questions/938849/how-to-remove-text-after, https://stackexchange.com/legal, https://stackoverflow.com/users/logout, 
https://stackoverflow.com/company/about, https://unix.stackexchange.com/questions/381282/difference-copy-contents-folder-between-and-in-linux, https://ell.stackexchange.com/questions/137089/i-often-buy-fruits-when-i-go-to-the-supermarket-illogical, https://workplace.stackexchange.com/questions/95543/colleague-shared-pirated-material-is-it-appropriate-for-our-manager-to-make-me, https://www.stackoverflowbusiness.com/talent?utm_source=so-footer&utm_medium=referral&utm_campaign=brand-activation, https://stackoverflow.com, https://stackoverflow.com/help, https://stackexchange.com, https://stackexchange.com/questions?tab=hot, https://chemistry.stackexchange.com/questions/80287/mo-theory-why-do-hydrogen-and-lithium-bond-but-hydrogen-and-helium-dont, https://linkedin.com/company/stack-overflow, https://meta.stackoverflow.com, https://api.stackexchange.com, https://www.facebook.com/sharer.php?u=https%3a%2f%2fstackoverflow.com%2fq%2f45274717%2f2194470%3fsfb%3d2, https://tex.stackexchange.com/questions/382967/using-cmbright-only-as-a-math-font, https://movies.stackexchange.com/questions/77763/was-ras-al-ghul-nearly-immortal-in-the-dark-knight-trilogy, https://stackoverflow.blog?blb=1, https://stackoverflow.com/company/press, https://stackoverflow.com/jobs/directory/developer-jobs, https://worldbuilding.stackexchange.com/questions/86914/what-is-a-logical-explanation-for-why-a-race-of-stone-humanoids-wouldnt-become, https://ell.stackexchange.com/questions/137104/what-does-it-mean-by-a-cheery-lot, https://stackexchange.com/sites#culturerecreation, https://stackoverflow.com/questions/45273948/what-happens-when-setstate-function-is-called, https://twitter.com/stackoverflow, https://stackexchange.com/sites#technology, https://politics.stackexchange.com/questions/22003/is-identity-politics-vs-freedom-of-speech-a-valid-dichotomy, https://stackoverflow.blog/2009/06/25/attribution-required/, https://stackoverflow.com/questions/45268467/is-131-well-defined-in-c-when-sizeofint-4, 
https://workplace.stackexchange.com/questions/95638/how-can-i-increase-focus-when-doing-something-boring, https://chat.stackoverflow.com, https://travel.stackexchange.com/questions/98642/which-safety-features-of-the-german-autobahn-make-it-possible-to-have-no-speed-l, https://meta.stackoverflow.com/questions/352386/2017-moderator-election-qa-questionnaire, https://academia.stackexchange.com/questions/93408/what-is-the-point-of-being-the-head-of-the-department, https://www.stackoverflowbusiness.com/?utm_source=so-footer&utm_medium=referral&utm_campaign=brand-activation, mailto:?subject=Stack%20Overflow%20Question&body=Parse%20HTML%20for%20Links%0Ahttps%3a%2f%2fstackoverflow.com%2fq%2f45274717%2f2194470%3fsem%3d2, https://stackexchange.com/legal/privacy-policy, https://stackexchange.com/sites, https://www.stackoverflowbusiness.com/insights?utm_source=so-footer&utm_medium=referral&utm_campaign=brand-activation, https://twitter.com/share?url=https%3a%2f%2fstackoverflow.com%2fq%2f45274717%2f2194470%3fstw%3d2, https://plus.google.com/share?url=https%3a%2f%2fstackoverflow.com%2fq%2f45274717%2f2194470%3fsgp%3d2, https://mathoverflow.net/questions/277113/polynomials-leaving-invariant-the-gaussian-integers] 
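The empty first entry in the result above probably comes from links whose href is not an absolute URL: "abs:href" returns an empty string when a relative link cannot be resolved against the document's base URI, and a local file path is not a usable base URL. If those links matter, a base URI can be supplied when parsing; jsoup also offers Jsoup.parse(File in, String charsetName, String baseUri) for files. Below is a minimal in-memory sketch of the effect, assuming jsoup is on the classpath; the class name BaseUriDemo, the sample markup, and the choice of https://stackoverflow.com/ as base are assumptions for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class BaseUriDemo {
    // Hypothetical sample markup: one relative link, one absolute link.
    static final String SAMPLE =
        "<a href=\"/help\">Help</a><a href=\"https://stackexchange.com\">SE</a>";

    // Collects abs:href for every link, resolving relative links against baseUri.
    static Set<String> absoluteLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Set<String> result = new LinkedHashSet<>();
        for (Element link : doc.select("a[href]")) {
            result.add(link.attr("abs:href"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Without a base URI the relative link resolves to an empty string.
        System.out.println(absoluteLinks(SAMPLE, ""));
        // With a base URI the relative link becomes absolute.
        System.out.println(absoluteLinks(SAMPLE, "https://stackoverflow.com/"));
    }
}
```

Alternatively, empty strings can simply be skipped before adding to the result set.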

Hi, thank you very much for your help. It works :)


I'm glad I could help :) Feel free to accept my answer in return. Thanks!