Abstract. This work studies the feasibility of using Reinforcement Learning and Policy Based Optimization, on a continuous action space, for straight-line path generation with four-bar linkages. The optimization method uses a reward function based on the radius of curvature of the path, computed from the inflection circle and the Euler-Savary equation, to train policy-based agents. Two experiments are defined, each running 200 agents, and their results are compared to demonstrate the process and outcome of this design method. In experiment one, every agent starts from a fixed point at the center of the design space. In experiment two, each agent starts from a random point in the design space. We conclude that agents trained in both experimental settings are capable of creating linkages that generate quasi-straight line segments, that is, path segments with high radii of curvature. Across the 400 distinct agents, more than 50% of the generated designs trace reasonably straight paths, making these linkages suitable for applications that require quasi-straight-line path generation.
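To make the curvature-based reward concrete, the following is a minimal sketch, not the authors' implementation. It uses one standard form of the Euler-Savary equation, rho = r^2 / (r - delta * sin(psi)), where r is the distance from the instantaneous pole to the coupler point, psi is the angle between that ray and the pole tangent, and delta is the inflection circle diameter. The function names, the sign convention, and the choice of negative curvature magnitude as the reward are all assumptions made for illustration.

```python
import math


def radius_of_curvature(r, psi, delta):
    """Radius of curvature of a coupler-point path via Euler-Savary.

    r     : distance from the instantaneous pole P to the coupler point (> 0)
    psi   : angle between ray P->point and the pole tangent, in radians
    delta : diameter of the inflection circle
    Sign convention (an assumption): positive r measured along the ray.
    """
    if r <= 0:
        raise ValueError("r must be positive; the pole itself is a cusp point")
    # Distance from the inflection point on this ray to the coupler point.
    denom = r - delta * math.sin(psi)
    if math.isclose(denom, 0.0, abs_tol=1e-12):
        # The point lies on the inflection circle: locally straight path.
        return math.inf
    return r * r / denom


def straightness_reward(r, psi, delta):
    """Hypothetical reward: negative curvature magnitude (0 is the best value)."""
    rho = radius_of_curvature(r, psi, delta)
    return -abs(1.0 / rho)
```

As a sanity check, a unit circle rolling on a straight line has delta = 1. The top rim point (r = 2, psi = pi/2) yields a radius of curvature of 4, which matches the known apex curvature of a cycloid, while the circle's center (r = 1) lies on the inflection circle and moves along a straight line.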