improve report-status.sh

Bug #667013 reported by siznax
This bug affects 1 person
Affects: Archive Widecrawl
Status: Fix Committed
Importance: Low
Assigned to: Kenji Nagahashi
Milestone: (none listed)

Bug Description

I had intended the bash script to be a working model, to be rewritten in Python.

Either 1) apply Kenji's bash suggestions, or 2) rewrite in Python.

Tags: telemetry
siznax (siznax) wrote :

On 10/25/10 2:04 PM, Kenji Nagahashi wrote:
> Hi Steve,
>
> Reading report-status*.sh, I came up with some ideas to simplify the code.
> As it is easier to present actual code, I attached a patch for those files.
>
> Changes also include adding the "--max-time 30" option to the curl invocation,
> as reading the job status page seems to take very long while a crawl is active.
>
> Hope you find this useful.
> --Kenji
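
For readers who want to try the `--max-time` idea on its own, here is a minimal sketch. `fetch_status` is a hypothetical helper, not something from report-status.sh; the curl options are the ones visible in the patch, and the 30-second default is taken from the email (the attached patch itself uses 20). curl exits with status 28 when `--max-time` is exceeded, which is what the helper checks for.

```shell
# Sketch only: fetch_status is a hypothetical helper, not part of report-status.sh.
# $1 = status URL, $2 = user:password, $3 = time limit in seconds (default 30)
fetch_status() {
    url=$1 auth=$2 limit=${3:-30}
    # --max-time caps the whole transfer, so a busy crawler cannot
    # stall the report indefinitely.
    curl -kLs -u "$auth" --anyauth -H 'Accept: application/xml' \
         --max-time "$limit" "$url"
    rc=$?
    # curl uses exit status 28 for "operation timed out"
    if [ "$rc" -eq 28 ]; then
        echo "timed out after ${limit}s: $url" >&2
    fi
    return "$rc"
}

# Hypothetical usage:
#   fetch_status "https://${host}:${port}/engine/job/${job}" "$auth" 30 > "$tmpfile"
```

Passing the options directly, rather than through a single string variable, also keeps the quoting of the `Accept` header intact.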


siznax (siznax) wrote :

diff --git a/report-status-cluster.sh b/report-status-cluster.sh
index 7b7e339..d7a8e01 100755
--- a/report-status-cluster.sh
+++ b/report-status-cluster.sh
@@ -2,18 +2,14 @@
 # siznax 2010

 cluster=/home/steve/crawling/live/cluster.txt
-status=/home/steve/crawling/live/report-status.sh
-nodes=`cut -d ' ' -f 1 $cluster`
+status=`dirname $0`/report-status.sh

-self=`echo $0 | tr '/' ' ' | awk '{print $NF}'`
+self=`basename $0`

 echo $self `date`

-for node in $nodes
-do
- host=`grep $node $cluster | awk '{print $1}'`
- port=`grep $node $cluster | awk '{print $2}'`
- job=` grep $node $cluster | awk '{print $3}'`
- auth=`grep $node $cluster | awk '{print $4}'`
- $status $host $port $job $auth 0
-done
+exec 3<$cluster
+while read -u 3 host port job auth; do
+ $status $host $port $job $auth 0
+done
+exec 3<&-
diff --git a/report-status.sh b/report-status.sh
index ec978e8..0470905 100755
--- a/report-status.sh
+++ b/report-status.sh
@@ -17,6 +17,8 @@ version="r0"
 LC_ALL=en_US.UTF-8 # for thousands separator
 hs_errs=/home/steve/crawling/live/hs_errs.txt

+verbose=false
+
 if [ $# -lt 5 ]
 then
     echo "Usage: $script host port job auth verbose"
@@ -26,19 +28,19 @@ else
     port=$2
     job=$3
     auth=$4
- verbose=$5
+ [ $5 == 0 ] || verbose=true
 fi

-tags=('<statusDescription>'\
- '<totalUriCount>'\
- '<downloadedUriCount>'\
- '<novel>'\
- '<currentDocsPerSecond>'\
- '<averageDocsPerSecond>'\
- '<currentKiBPerSec>'\
- '<averageKiBPerSec>'\
- '<elapsedPretty>'\
- '<launchCount>')
+tags=('statusDescription'\
+ 'totalUriCount'\
+ 'downloadedUriCount'\
+ 'novel'\
+ 'currentDocsPerSecond'\
+ 'averageDocsPerSecond'\
+ 'currentKiBPerSec'\
+ 'averageKiBPerSec'\
+ 'elapsedPretty'\
+ 'launchCount')
 keys=(status\
       total\
       downloaded\
@@ -53,18 +55,20 @@ keys=(status\
 #---------------------------------------------------------------

 function get_tag_value {
- cmd="grep '$tag' $tmpfile\
- | tr '<' ' '\
- | tr '>' ' '\
- | tr -s ' '\
- | cut -d ' ' -f 3"
- value=`eval $cmd`
+ value=`sed -ne "\%.*<$tag>\(.*\)</$tag>.*%{s//\1/p;q}" $tmpfile`
+# cmd="grep '$tag' $tmpfile | tr -s '<>' ' '
+# | tr '<' ' '\
+# | tr '>' ' '\
+# | tr -s ' '\
+# | cut -d ' ' -f 3"
+# value=`eval $cmd`
 }
 function get_status {
- value=`grep statusDescription $tmpfile\
- | grep -o '>[^\<]*'\
- | awk '{print $NF}'\
- | head -1`
+ value=`sed -ne '/.*<statusDescription>.*: \([^<]*\).*/{s//\1/p;q}' $tmpfile`
+# value=`grep statusDescription $tmpfile\
+# | grep -o '>[^\<]*'\
+# | awk '{print $NF}'\
+# | head -1`
 }

 # -- COUNT CRASHES ---------------------------------------------
@@ -80,7 +84,7 @@ fi
 date_str=`date "+%Y-%m-%dT%TZ"`
 h=`echo $host | cut -d '.' -f 1`
 url="https://${host}:${port}/engine/job/${job}"
-copts="-kLs -u $auth --anyauth -H 'Accept: application/xml' --location"
+copts="-kLs -u $auth --anyauth -H 'Accept: application/xml' --location --max-time 20"
 tmpfile=/var/tmp/status.$$
 c...
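
The sed-based extraction from the patch can be tried in isolation. The sketch below is not the script itself: the functions take the tag and file as arguments instead of using the `$tag` and `$tmpfile` globals, the sample XML is made up for illustration, and a `;` is added before the closing `}` so the expression also works with non-GNU sed. The empty pattern in `s//\1/` reuses the address regex, so the captured group is printed for the first matching line and `q` stops the scan.

```shell
# Sketch only: argument-taking variants of the patch's functions.
# $1 = tag name, $2 = XML file; prints the text of the first <tag>...</tag>
get_tag_value() {
    sed -ne "\%.*<$1>\(.*\)</$1>.*%{s//\1/p;q;}" "$2"
}

# $1 = XML file; prints what follows the last ": " in <statusDescription>
get_status() {
    sed -ne '/.*<statusDescription>.*: \([^<]*\).*/{s//\1/p;q;}' "$1"
}

# Made-up sample; the real job status page may differ.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<job>
  <statusDescription>Active: RUNNING</statusDescription>
  <totalUriCount>1200</totalUriCount>
  <downloadedUriCount>800</downloadedUriCount>
</job>
EOF

get_tag_value totalUriCount "$sample"       # prints 1200
get_tag_value downloadedUriCount "$sample"  # prints 800
get_status "$sample"                        # prints RUNNING
rm -f "$sample"
```

Compared with the original grep/tr/cut pipeline, this runs one process per tag and does not depend on counting whitespace-separated fields.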


siznax (siznax) wrote :

hm, looks like attaching a patch just puts it in another comment. sorry.

siznax (siznax) wrote :

may want to move this entirely into cacti if possible

Kenji Nagahashi (knagahashi) wrote :

Re-implemented report-status.sh in Python as report-status.py.
As of Jan 3, 2011, report-status-cluster.sh runs report-status.py instead of report-status.sh.
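
For reference, the cluster wrapper's loop from the patch can be reduced to a runnable sketch. `report_all` and the sample cluster file are hypothetical; the per-node command is passed in as an argument, so swapping report-status.sh for report-status.py would presumably be a change at the call site only (an assumption, since the actual wrapper is not shown here). Reading the records on file descriptor 3 leaves stdin free for the per-node command, which is a common reason for this pattern.

```shell
# Sketch only: report_all is a hypothetical stand-in for the wrapper's loop.
# $1 = cluster file, one "host port job auth" record per line
# remaining arguments = command to run once per record
report_all() {
    cluster=$1
    shift
    exec 3<"$cluster"
    # "read ... <&3" is the POSIX spelling of bash's "read -u 3"
    while read -r host port job auth <&3; do
        "$@" "$host" "$port" "$job" "$auth" 0
    done
    exec 3<&-
}

# Made-up cluster file for illustration.
cluster_file=$(mktemp)
cat > "$cluster_file" <<'EOF'
crawl01.example.org 8443 job-a admin:secret
crawl02.example.org 8443 job-b admin:secret
EOF

# echo stands in for the real status script
report_all "$cluster_file" echo
rm -f "$cluster_file"
```

Each line of the file is split into four fields by `read`, replacing the four `grep | awk` calls per node in the original loop.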

siznax (siznax)
Changed in archivewidecrawl:
assignee: nobody → Kenji Nagahashi (knagahashi)
importance: Undecided → Low
status: New → Fix Committed