2023

Journal article Open Access

Towards a fully-observable Markov decision process with generative models for integrated 6G-non-terrestrial networks

Machumilane A., Cassara P., Gotta A.

5G mobile communication; 6G mobile communication; Actor-critic; Bandwidth; Generative Models (GMs); Markov processes; Multipath; NTN; Reinforcement learning; Reliability; Satellite; Satellite broadcasting; Satellites; Traffic splitting

The upcoming sixth generation (6G) mobile networks require integration between terrestrial mobile networks and non-terrestrial networks (NTN) such as satellites and high altitude platforms (HAPs) to ensure wide and ubiquitous coverage, high connection density, reliable communications and high data rates. The main challenge in this integration is the requirement for line-of-sight (LOS) communication between the user equipment (UE) and the satellite. In this paper, we propose a framework based on actor-critic reinforcement learning and generative models for LOS estimation and traffic scheduling on multiple links connecting a user equipment to multiple satellites in 6G-NTN integrated networks. The agent learns to estimate the LOS probabilities of the available channels and schedules traffic on appropriate links to minimise end-to-end losses with minimal bandwidth. The learning process is modelled as a partially observable Markov decision process (POMDP), since the agent can only observe the state of the channels it has just accessed. As a result, the learning agent requires a longer convergence time compared to the satellite visibility period at a given satellite elevation angle. To counteract this slow convergence, we use generative models to transform a POMDP into a fully observable Markov decision process (FOMDP). We use generative adversarial networks (GANs) and variational autoencoders (VAEs) to generate synthetic channel states of the channels that are not selected by the agent during the learning process, allowing the agent to have complete knowledge of all channels, including those that are not accessed, thus speeding up the learning process. The simulation results show that our framework enables the agent to converge in a short time and transmit with an optimal policy for most of the satellite visibility period, which significantly reduces end-to-end losses and saves bandwidth. We also show that it is possible to train generative models in real time without requiring prior knowledge of the channel models and without slowing down the learning process or affecting the accuracy of the models.
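
As a rough illustration of the POMDP-to-FOMDP idea described in the abstract, the following minimal Python sketch fills in the states of channels the agent did not access with synthetic samples. It is not the authors' implementation: the class `BernoulliChannelModel` is a hypothetical, deliberately simple stand-in for the GAN or VAE used in the paper, and the binary LOS encoding (1 for LOS, 0 for no LOS), the function names and the prior value are all assumptions made for this example.

```python
import random


class BernoulliChannelModel:
    """Toy stand-in for a generative model (assumption, not a GAN/VAE).

    Keeps a running estimate of each channel's LOS probability and
    samples synthetic binary states from it.
    """

    def __init__(self, n_channels, prior=0.5):
        # One pseudo-observation per channel, worth `prior` LOS outcomes.
        self.counts = [1.0] * n_channels
        self.los = [prior] * n_channels

    def update(self, channel, state):
        # Learn online from a state the agent actually observed.
        self.counts[channel] += 1
        self.los[channel] += state

    def sample(self, channel, rng):
        # Draw a synthetic state (1 = LOS, 0 = no LOS) for one channel.
        p = self.los[channel] / self.counts[channel]
        return 1 if rng.random() < p else 0


def observe(true_states, accessed):
    """Partial observation: only accessed channels reveal their state."""
    return [s if ch in accessed else None for ch, s in enumerate(true_states)]


def complete_observation(partial, model, rng):
    """Replace unobserved entries (None) with synthetic samples."""
    return [
        s if s is not None else model.sample(ch, rng)
        for ch, s in enumerate(partial)
    ]


rng = random.Random(0)
model = BernoulliChannelModel(n_channels=3)
partial = observe(true_states=[1, 0, 1], accessed={0})
model.update(0, partial[0])
full = complete_observation(partial, model, rng)
```

Here `partial` contains a real state only for channel 0, while `full` has an entry for every channel, so a learning agent could treat it as a complete state vector. In the paper this role is played by trained generative models rather than a running frequency estimate.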

Source: IEEE open journal of the Communications Society 4 (2023): 1913–1930. doi:10.1109/OJCOMS.2023.3307209

Publisher: IEEE, New York, United States of America

Cite as

BibTeX entry

@article{oai:it.cnr:prodotti:486573,
	title = {Towards a fully-observable Markov decision process with generative models for integrated 6G-non-terrestrial networks},
	author = {Machumilane, A. and Cassara, P. and Gotta, A.},
	publisher = {IEEE, New York, United States of America},
	doi = {10.1109/ojcoms.2023.3307209},
	journal = {IEEE open journal of the Communications Society},
	volume = {4},
	pages = {1913--1930},
	year = {2023}
}