In future combat scenarios involving cooperation between manned and unmanned aerial vehicles, real-time and accurate air combat decision-making is the basis for winning. For such scenarios, this paper abstracts a characteristic model of a single agent and proposes an algorithm based on proximal policy optimization that obtains the air combat decision sequence through reward and punishment incentives during real-time interaction with the environment. Simulation results show that, after training, the proposed algorithm adapts to complex battlefield situations and produces a reasonable decision-making strategy.
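For readers unfamiliar with proximal policy optimization (PPO), the following is a minimal sketch of the standard clipped surrogate objective on which PPO-based methods are generally built. It is not the implementation used in this paper: the function name `ppo_clip_objective`, its arguments, and the clipping value `clip_eps=0.2` are illustrative assumptions, and the advantage estimates are taken as given rather than derived from any air combat reward.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    All names and the default clip_eps are hypothetical choices for
    illustration; they are not taken from the paper.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the updated policy equals the old one, every ratio is 1 and the objective reduces to the mean advantage. The clipping removes the incentive to move the ratio far from 1 in a single update, which is the mechanism that keeps policy updates conservative.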