Unsupervised method for video action segmentation through spatio-temporal and positional-encoded embeddings