Matúš Svrček - Hadoop Nutch


    Assignment: get Nutch running on a Hadoop cluster

     

    Done:

    • 2x Gentoo under VMware
    • nodes connected via VPN
    • sun-jdk, ant, other dependencies
    • Tomcat installed
    • Nutch built
      • build problems: a file ending in '.template' had to be created in the build directory (a bug/feature in build.xml)
      • after that, 'ant package' does what it should; otherwise it fails with an error
    • Nutch deployed
    • SSH public-key authentication
    • static entries in the hosts file
    • Nutch installation
    • Nutch runs on 1 node
    • Nutch runs on the cluster
    • Hadoop runs both on 1 node and on the cluster
    • search works through an index copied from DFS to the local disk
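The SSH setup and the static hosts entries above only help if every node resolves the other nodes' names to the correct VPN addresses. Below is a minimal Java sketch for checking this on each node. The hostnames `hadoop1` and `hadoop2` and the 10.8.0.x addresses are hypothetical placeholders; replace them with whatever the hosts file actually contains.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.Map;

public class HostsCheck {
    /** True if any address that host resolves to equals expectedIp. */
    static boolean resolvesTo(String host, String expectedIp) {
        try {
            for (InetAddress a : InetAddress.getAllByName(host)) {
                if (a.getHostAddress().equals(expectedIp)) {
                    return true;
                }
            }
            return false;
        } catch (UnknownHostException e) {
            return false; // name not found at all (missing hosts entry?)
        }
    }

    public static void main(String[] args) {
        // Hypothetical cluster map: replace with the names/IPs from your hosts file.
        Map<String, String> nodes = new LinkedHashMap<>();
        nodes.put("hadoop1", "10.8.0.1");
        nodes.put("hadoop2", "10.8.0.2");
        for (Map.Entry<String, String> e : nodes.entrySet()) {
            boolean ok = resolvesTo(e.getKey(), e.getValue());
            System.out.println(e.getKey() + " -> " + (ok ? "OK" : "mismatch or unresolved"));
        }
    }
}
```

Run it on every node. A name that resolves on one machine but not on another is a typical cause of daemons that start but cannot reach each other.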

     

    Why:

    • where the name secondarynamenode comes from (resolved)

     

    Hints:

    • see
      • hadoop1:50030 - manager (JobTracker web UI)
      • hadoop1:50060 - node status (TaskTracker web UI)
      • hadoop1:50070 - DFS (NameNode web UI)
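Checking all three web UIs from a small program is handy after (re)starting the daemons. Below is a minimal Java sketch. It assumes the default ports listed above and that `hadoop1` resolves from the machine running it. It only reports the HTTP status code, or -1 when the port is unreachable; that shows whether the UI is up, not whether the daemon behind it is healthy.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class WebUiCheck {
    // Web UI ports as listed in the hints (Hadoop 0.x defaults).
    static Map<String, Integer> defaultPorts() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("jobtracker", 50030);
        m.put("tasktracker", 50060);
        m.put("namenode (dfs)", 50070);
        return m;
    }

    static String uiUrl(String host, int port) {
        return "http://" + host + ":" + port + "/";
    }

    /** Returns the HTTP status code, or -1 if the UI cannot be reached. */
    static int probe(String url, int timeoutMs) {
        try {
            HttpURLConnection c = (HttpURLConnection) URI.create(url).toURL().openConnection();
            c.setConnectTimeout(timeoutMs);
            c.setReadTimeout(timeoutMs);
            int code = c.getResponseCode();
            c.disconnect();
            return code;
        } catch (IOException e) {
            return -1; // connection refused, timeout, etc.
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "hadoop1";
        for (Map.Entry<String, Integer> e : defaultPorts().entrySet()) {
            String url = uiUrl(host, e.getValue());
            System.out.println(e.getKey() + " " + url + " -> " + probe(url, 2000));
        }
    }
}
```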

     

    Sources:

    http://wiki.apache.org/nutch/NutchHadoopTutorial

    http://www.mail-archive.com/nutch-co.../msg01951.html

    http://lucene.apache.org/nutch/tutorial.html

    http://www-scf.usc.edu/%7Ecsci572/20...tallation.html

    Newer version:

    http://wiki.apache.org/nutch/Nutch0....t=%28hadoop%29|(tutorial)

    Index in ramfs:

    http://www.mail-archive.com/nutch-us.../msg10088.html

    Another howto:

    http://mail-archives.apache.org/mod_....apache.org%3E

    A few questions about Hadoop:

    http://www.nabble.com/Nutch-and-Hado...d15136744.html

    Problem:

    http://markmail.org/message/f4ht6h3b...+state:results

     

     

    Bug in search.jsp

    Line 172 is <jsp:include page="<%= language + "/include/header.html"%>"/>

    and it should be <jsp:include page="<%= language + \"/include/header.html\"%>"/>

    The same bug is present in the other .jsp files. According to the JSP specification, double quotes inside an expression in a double-quoted attribute must be backslash-escaped.
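Because the same pattern recurs in several .jsp files, the escaping can be applied mechanically. Below is a minimal Java sketch with the following assumptions:

* Each offending attribute is double-quoted and its whole value is a single `<%= ... %>` expression on one line.
* The files are UTF-8.

The class name is illustrative, and the rewritten files should be reviewed (e.g. with diff) before redeploying.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class JspQuoteFix {
    // A double-quoted attribute value of the form "<%= ... %>" (non-greedy, single line).
    private static final Pattern ATTR_EXPR = Pattern.compile("\"<%=(.*?)%>\"");
    // A double quote that is not already preceded by a backslash.
    private static final Pattern BARE_QUOTE = Pattern.compile("(?<!\\\\)\"");

    /** Escapes bare double quotes inside "<%= ... %>" attribute values; idempotent. */
    static String escapeExpressionQuotes(String line) {
        Matcher m = ATTR_EXPR.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String inner = BARE_QUOTE.matcher(m.group(1)).replaceAll("\\\\\"");
            m.appendReplacement(out, Matcher.quoteReplacement("\"<%=" + inner + "%>\""));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Usage: java JspQuoteFix search.jsp other.jsp ... (rewrites the files in place). */
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path p = Paths.get(arg);
            List<String> fixed = Files.readAllLines(p, StandardCharsets.UTF_8).stream()
                    .map(JspQuoteFix::escapeExpressionQuotes)
                    .collect(Collectors.toList());
            Files.write(p, fixed, StandardCharsets.UTF_8);
        }
    }
}
```

Already-escaped quotes are skipped, so running the tool twice does no harm.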

    At the start, DFS has to be formatted on all nodes, not only on the master as the manual claims. Otherwise it appears to run, but after some time (a few hours) it throws an error when accessing the filesystem.

    High server load: a bug/feature, or a VM problem? A crawl with depth 4 pushes the load above 24 on a VM with 1 CPU; with 2 CPUs it stays around 8-9.
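Load averages are easier to compare across the two VMs when divided by the number of CPUs. A load of 24 on 1 CPU is 24 per CPU, while 8-9 on 2 CPUs is roughly 4-4.5 per CPU. On Linux the load average also counts tasks blocked on disk I/O, so a high value does not necessarily mean the CPU itself is saturated.

Below is a minimal Java sketch for sampling the per-CPU load while a crawl runs, using the standard `OperatingSystemMXBean`. The 10-second interval and six samples are arbitrary choices.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class LoadPerCpu {
    /** Normalises a load average by the number of CPUs. */
    static double perCpu(double loadAverage, int cpus) {
        if (cpus <= 0) {
            throw new IllegalArgumentException("cpus must be positive");
        }
        return loadAverage / cpus;
    }

    public static void main(String[] args) throws InterruptedException {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        int cpus = os.getAvailableProcessors(); // CPUs visible to this JVM (the VM's vCPUs)
        for (int i = 0; i < 6; i++) {
            double load = os.getSystemLoadAverage(); // 1-minute average; negative if unavailable
            if (load < 0) {
                System.out.println("load average not available on this OS");
                return;
            }
            System.out.printf("load %.2f on %d CPU(s) = %.2f per CPU%n", load, cpus, perCpu(load, cpus));
            Thread.sleep(10_000);
        }
    }
}
```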

    High memory usage on the master node: many Java processes.
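Each Hadoop daemon runs in its own JVM with its own heap, which would explain the many Java processes. On the master these are NameNode, SecondaryNameNode and JobTracker, plus DataNode and TaskTracker if the master also works as a slave. Tomcat and the Nutch crawl add further JVMs.

Below is a minimal Java 9+ sketch that lists the Java processes on a node and guesses each one's main class from its command line; the JDK's own `jps -l` tool gives similar output. Two caveats:

* The `org.apache.` prefix heuristic is an assumption.
* Arguments of other users' processes may not be visible, in which case `?` is printed.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class JavaProcs {
    /** True if the executable path looks like a java launcher. */
    static boolean isJavaCommand(String command) {
        return command.equals("java") || command.endsWith("/java");
    }

    /** Picks the last argument that looks like a Hadoop/Nutch/Tomcat main class, if any. */
    static Optional<String> mainClassHint(String[] args) {
        for (int i = args.length - 1; i >= 0; i--) {
            if (args[i].startsWith("org.apache.")) {
                return Optional.of(args[i]);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] argv) {
        List<ProcessHandle> javas = ProcessHandle.allProcesses()
                .filter(p -> p.info().command().map(JavaProcs::isJavaCommand).orElse(false))
                .collect(Collectors.toList());
        for (ProcessHandle p : javas) {
            String hint = p.info().arguments().flatMap(JavaProcs::mainClassHint).orElse("?");
            System.out.println(p.pid() + "  " + hint);
        }
        System.out.println(javas.size() + " java process(es)");
    }
}
```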
