Uses a few different commands, in this specific instance to target a phrase that is repeated on the same line. I was not going for accuracy, but for speed. There is probably a lot that can be eliminated from this, including the initial cat command, which could possibly be replaced by input redirection.
#!/bin/bash
cat tags.txt | cut -d' ' -f2- | grep -o -v '\ +.+' | tr ' ' '-' | grep -o -E '[a-zA-Z0-9\-]+[a-zA-Z0-9]$' | grep -v '^-' | tr '[A-Z]' '[a-z]' | tr '-' ' ' | sort -u > newtags.txt
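
A rough sketch of the input redirection idea mentioned above, not a drop-in replacement for the full pipeline. The sample lines written to the temporary tags.txt are made up for illustration (the real layout of tags.txt is not shown here), and only the cut, lowercase, and sort steps are kept. The `[:upper:]` and `[:lower:]` classes are swapped in for `'[A-Z]'` and `'[a-z]'`, which should behave the same for plain ASCII letters.

```shell
#!/bin/bash
# Hypothetical sample input; the real contents of tags.txt are an assumption.
tmpdir=$(mktemp -d)
printf '%s\n' '1 Linux Kernel' '2 linux kernel' '3 Bash' > "$tmpdir/tags.txt"

# Original style: cat feeds the file into the first command through a pipe.
with_cat=$(cat "$tmpdir/tags.txt" | cut -d' ' -f2- | tr '[:upper:]' '[:lower:]' | sort -u)

# Same steps, but cut reads the file directly via input redirection.
with_redirect=$(cut -d' ' -f2- < "$tmpdir/tags.txt" | tr '[:upper:]' '[:lower:]' | sort -u)

# Both variables hold the same two lines: "bash" and "linux kernel".
printf '%s\n' "$with_redirect"

rm -rf "$tmpdir"
```

Since cut only reads standard input here, redirecting the file saves one process and one pipe; the remaining stages do not need to change.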