Python to the Rescue
I’m evaluating whether I should move this blog to another CMS platform so I can start building a community around it again, like I had before.
Right now this blog runs on Hugo and AWS Amplify, and it’s freaking awesome. I push new posts to GitHub, AWS pulls the changes and rebuilds the site, and then I can look at it and make sure it looks fine before I merge into master.
CloudFront caches my images, so load times are wicked fast (an SEO must), and Hugo is a dream to work with. In my humble opinion, it’s the best static CMS out there. Pelican might come in second, but Hugo just rocks, and you can use it in a headless way too.
However, it’s static, and there’s little I can do to integrate features like member profiles or to build a BuddyPress-style social networking site.
That said, I could do something like that with a separate technology like phpBB in the future. I could just create a subdomain like social.neuralmarkettreds.com and install a social networking platform there.
No matter what I choose to do (and Hugo still outweighs everything else I’m looking at), I’ve been writing a lot of Python scripts to format my posts so I can upload them to a dev instance of whatever CMS I choose.
Migrating between CMS platforms is always a tricky and pain-in-the-ass endeavor.
Going from WordPress to WordPress is easy. Going from WordPress to Hugo is easy. Going from WordPress to Ghost is easy, but migrating from Hugo or Ghost to WordPress is a bit harder.
Migrating to ExpressionEngine is just nuts, but the DataGrab plugin helps a lot (when it works).
The nice thing about Hugo is that I can expose my posts as JSON or RSS (XML format). I can limit the number of posts in my feeds to prevent scrapers from extracting everything from my site.
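If you’d rather work from the RSS feed than the JSON one, Python’s standard library can read it without any extra packages. Here’s a minimal sketch, assuming an RSS 2.0 feed where each item carries the usual title, link, and pubDate elements; the helper name parse_rss_items and the SAMPLE feed are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

def parse_rss_items(rss_xml: str) -> list[dict]:
    """Return the title, link, and pubDate of every <item> in an RSS 2.0 feed.

    Hypothetical helper: assumes standard RSS 2.0 element names.
    """
    root = ET.fromstring(rss_xml)
    posts = []
    for item in root.iter("item"):  # walks every <item>, wherever it sits
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "date": item.findtext("pubDate", default=""),
        })
    return posts

# Made-up sample feed, standing in for a downloaded index.xml
SAMPLE = """<rss version="2.0"><channel>
<title>Example</title>
<item><title>Hello</title><link>https://example.com/hello/</link>
<pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate></item>
</channel></rss>"""

if __name__ == "__main__":
    for post in parse_rss_items(SAMPLE):
        print(post["title"], post["link"])
```

Keep in mind that if you cap the number of posts in the feed, this only sees that capped subset, which is exactly the point of limiting it for scrapers.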
Once I have that JSON, I can do a lot with it, especially in Python.
For example, I downloaded my posts in JSON format (saved as index.json) and then wrote a quick Python script to write them out in CSV format.
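```python
import pandas as pd

BASE_PATH = "index.json"      # the posts exposed by Hugo as JSON, downloaded locally
OUT_PATH = "export_json.csv"  # the CSV file to hand to the import plugins

df_j = pd.read_json(BASE_PATH)      # reads the JSON array of posts into the df_j DataFrame
df_j.to_csv(OUT_PATH, index=False)  # index=False skips pandas' extra row-number column
```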
That’s it: five lines of code to turn my exposed posts into a CSV file. I was then able to import my posts into WordPress via a CSV plugin and into ExpressionEngine via DataGrab.
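If you don’t want pandas as a dependency, or you only want to send certain fields to the import plugin, the standard library’s json and csv modules can do the same job. This is a sketch, not my actual script: the field names in FIELDS are hypothetical, since Hugo’s JSON output contains whatever your own index.json template emits, and it assumes the file holds a JSON array of post objects.

```python
import csv
import json

# Hypothetical field names: match these to what your Hugo JSON template outputs.
FIELDS = ["title", "date", "permalink", "contents"]

def json_posts_to_csv(json_path: str, csv_path: str, fields=FIELDS) -> int:
    """Write the chosen fields of each post to CSV and return the row count."""
    with open(json_path, encoding="utf-8") as f:
        posts = json.load(f)  # assumes a JSON array of objects
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" drops keys not listed in fields;
        # keys missing from a post are written as empty cells
        writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        for post in posts:
            writer.writerow(post)
    return len(posts)
```

Calling json_posts_to_csv("index.json", "export_json.csv") produces the same kind of file as the pandas version, but with a fixed column order, which some CSV import plugins expect when you map columns to post fields.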
I use Python for so many data munging operations, and I use it for Data Science and Machine Learning too. It’s a wonderful and very flexible language, and I highly recommend that anyone learn it; it’s an easy and fast way to become more productive.