I recently ran into a problem: I was reading a lot of rows from an HBase table and filtering out the majority of them in the first steps of my Scalding job. As a result, the Hadoop counters didn't change and the job timed out after 10 minutes.
Would it be possible to add a counter that counts rows read (or hundreds of rows read) and publishes the value to a Hadoop counter, so that the job doesn't time out?
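
To make the request more concrete, here is a rough sketch of what I have in mind, written in plain Java so it runs without Hadoop on the classpath. `BatchedRowCounter` and its callback are names I made up for illustration; they are not part of any existing library. The idea is to count rows locally and only publish every N rows, so the counter update stays cheap.

```java
import java.util.function.LongConsumer;

/** Hypothetical helper: counts rows locally and publishes them in batches. */
final class BatchedRowCounter {
    private final long batchSize;
    // Assumption: in a real job this callback would wrap a Hadoop counter increment.
    private final LongConsumer publish;
    private long pending = 0;

    BatchedRowCounter(long batchSize, LongConsumer publish) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.publish = publish;
    }

    /** Call once per row read from the source, before any filtering happens. */
    void rowRead() {
        pending++;
        if (pending >= batchSize) {
            flush();
        }
    }

    /** Publishes whatever is still pending; call this when the reader is closed. */
    void flush() {
        if (pending > 0) {
            publish.accept(pending);
            pending = 0;
        }
    }
}
```

In the record reader, the callback could be something like `n -> reporter.incrCounter("HBase", "ROWS_READ", n)` with the old `mapred` API (the group and counter names are placeholders). As far as I understand, incrementing a counter counts as task progress, so this should keep the task from hitting the task timeout even when every row is filtered out downstream.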