Video recordings of construction operations provide readily understandable data that can be used to analyze and improve construction performance. Despite these benefits, manual stopwatch studies of previously recorded videos are labor-intensive, may suffer from observer bias, and become impractical over substantial observation periods. To address these limitations, this paper presents a new vision-based method for automated action recognition of construction equipment from different camera viewpoints. This is a particularly challenging task because construction equipment can be partially occluded and usually comes in a wide variety of sizes and appearances. The scale and pose of an equipment action can also vary significantly with the camera configuration. In the proposed method, a video is first represented as a collection of spatio-temporal features by extracting space-time interest points and describing each feature with a histogram of oriented gradients (HOG). The algorithm then automatically learns the probability distributions of the spatio-temporal features and action categories using a multiple binary Support Vector Machine (SVM) classifier. This strategy handles the noisy feature points that arise from typical dynamic backgrounds. Given a novel video sequence, the multiple binary SVM classifier recognizes and localizes the multiple equipment actions contained in long and dynamic video sequences. We have extensively tested our algorithm on 1,200 videos from earthmoving operations. An average accuracy of 85% across all categories of equipment actions reflects the promise of the proposed method for automated performance monitoring.
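
As a rough illustration of the classification stage only, the following Python sketch trains one binary linear SVM per action category (one-vs-rest) on per-video feature histograms. It is a minimal toy under stated assumptions, not the implementation evaluated here. The fixed codebook and histogram quantization, the linear kernel, the sub-gradient training loop, the action names (`digging`, `hauling`, `dumping`), and all data and hyperparameters are hypothetical. Interest-point detection and HOG extraction are omitted, and toy 2-D points stand in for real descriptors.

```python
# Toy sketch: per-video feature histograms + one-vs-rest linear SVMs.
# Codebook, action names, data, and hyperparameters are illustrative assumptions.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def quantize(descriptors, codebook):
    """Assign each descriptor to its nearest codeword; return a normalized histogram."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        dists = [sum((a - b) ** 2 for a, b in zip(d, c)) for c in codebook]
        hist[dists.index(min(dists))] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def train_binary_svm(X, y, lr=0.1, lam=0.001, epochs=500):
    """Linear SVM trained by sub-gradient descent on the L2-regularized hinge loss.
    Labels y must be +1 or -1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * (dot(w, x) + b)
            w = [(1.0 - lr * lam) * wj for wj in w]  # shrinkage from the L2 term
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + lr * label * xj for wj, xj in zip(w, x)]
                b += lr * label
    return w, b

def train_one_vs_rest(histograms, labels):
    """Train one binary SVM per action: that action versus all others."""
    models = {}
    for action in sorted(set(labels)):
        y = [1 if lab == action else -1 for lab in labels]
        models[action] = train_binary_svm(histograms, y)
    return models

def predict(models, hist):
    """Return the action whose binary SVM gives the highest decision score."""
    scores = {a: dot(w, hist) + b for a, (w, b) in models.items()}
    return max(scores, key=scores.get)

# Hypothetical 3-word codebook over toy 2-D descriptors.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Each toy video has one stray descriptor, mimicking background noise.
TRAIN_VIDEOS = [
    ("digging", [(0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.9, 0.1)]),
    ("digging", [(0.0, 0.0), (0.2, 0.1), (0.1, 0.9), (0.1, 0.2)]),
    ("hauling", [(1.0, 0.1), (0.9, 0.0), (0.8, 0.1), (0.1, 0.1)]),
    ("hauling", [(1.1, 0.0), (0.9, 0.2), (0.0, 0.9), (1.0, 0.0)]),
    ("dumping", [(0.0, 1.0), (0.1, 0.9), (0.2, 1.1), (0.9, 0.0)]),
    ("dumping", [(0.1, 1.0), (0.0, 0.8), (0.1, 0.1), (0.0, 1.1)]),
]

labels = [label for label, _ in TRAIN_VIDEOS]
histograms = [quantize(desc, CODEBOOK) for _, desc in TRAIN_VIDEOS]
models = train_one_vs_rest(histograms, labels)

def classify_video(descriptors):
    return predict(models, quantize(descriptors, CODEBOOK))

if __name__ == "__main__":
    novel = [(0.9, 0.1), (1.0, 0.0), (1.1, 0.1), (0.9, 0.0)]
    print(classify_video(novel))
```

Because each binary classifier scores only its own action against the rest, a stray descriptor shifts a small fraction of the histogram mass rather than changing the decision outright. To localize actions in a long sequence, one possible extension (again an assumption) is to apply `classify_video` to the descriptors inside a sliding temporal window.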