Full Waveform Inversion (FWI) is an iterative method that determines subsurface parameters from data observed at the surface and an initial model. The main drawback of a 3D time-domain FWI implementation is its computational cost, since both the memory requirements and the execution time are very high. On serial platforms, the main constraint of a 3D FWI implementation is the execution time; on parallel platforms such as GPUs, the main constraint is the available memory. In this paper, we designed and implemented a strategy that exploits both serial and parallel platforms, using MPI and CUDA-C, to address both constraints. The new implementation achieves a speedup of 1.84x and a 76% reduction in required memory. This methodology makes a 3D FWI implementation feasible on a GPU cluster.