After months of putzing around with toy code and simple example scripts, I decided to get started on some real code I will actually use: extending my file name filter collection, but in Python (instead of Perl).
Python, I see, has a few gotchas, like being finicky about the backslash and requiring extras where I am not used to them. \. doesn't seem to work to escape a dot! Need \\. And '.'+foo seemed to want to produce '.(space)/foo' for file operations.
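For what it is worth, here is a minimal sketch of the backslash behaviour. Python processes backslashes in ordinary string literals before re ever sees the pattern, so the usual habit is to write patterns as raw strings (r"..."). An unrecognised escape such as "\." in an ordinary string passes through unchanged, but it may trigger a warning in recent Python versions. The file stem below is made up for illustration.

```python
import re

name = "my.holiday.photo"  # hypothetical file stem, not from the real collection

# Raw string: the regex engine receives \. and treats it as a literal dot.
raw = re.sub(r"\.", " ", name)

# Ordinary string: the backslash is doubled so one backslash survives
# Python's own string escaping and reaches the regex engine.
doubled = re.sub("\\.", " ", name)

# Unescaped dot: matches any character, so every character is replaced.
unescaped = re.sub(".", " ", name)

print(raw)
print(doubled)
print(repr(unescaped))
```

The first two calls give the same result; the third shows what happens when the dot is not escaped at all, which may be what looked like the escape "not working".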
One potential issue here is that these scripts are disastrous if used in the wrong directories.
In Perl I can force the scripts to run at a minimum of 3-4 levels deep in the directory structure. I don't know how to do this in Python (or how to count the bloody backslashes!). I can limit them to a specific directory, but that is not practical here, as these are run from a bunch of directories.
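One possible way to do the depth check without counting backslashes is to let pathlib split the path into parts. This is only a sketch: directory_depth, require_depth and MIN_DEPTH are names invented here, the threshold of 3 is an assumption, and the demo path is made up.

```python
from pathlib import Path, PurePath, PureWindowsPath

MIN_DEPTH = 3  # assumed threshold; the post mentions 3-4 levels


def directory_depth(path: PurePath) -> int:
    """Number of directory levels below the drive or filesystem root."""
    # path.anchor is 'C:\\' on Windows or '/' on POSIX and occupies parts[0];
    # a relative path has an empty anchor, so nothing is subtracted.
    return len(path.parts) - (1 if path.anchor else 0)


def require_depth(path: PurePath, min_depth: int = MIN_DEPTH) -> None:
    """Raise RuntimeError if path is shallower than min_depth levels."""
    depth = directory_depth(path)
    if depth < min_depth:
        raise RuntimeError(f"Refusing to run in {path}: only {depth} level(s) deep")


demo = PureWindowsPath(r"C:\Users\me\files\incoming")  # made-up example path
print(directory_depth(demo))

# Depth of wherever this happens to be running from.
print(directory_depth(Path.cwd().resolve()))
```

Calling require_depth(Path.cwd().resolve()) at the top of a rename script would abort it before any file is touched when the working directory is too close to the root.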
Is re the best library for regexes? In normal practice I often use them in place of split, but I don't see a simple way of doing that with re.sub.
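On splitting: re is the standard library module (there is also a third-party package called regex on PyPI with extra features), and it has its own re.split, which takes a pattern as the separator. A small sketch follows; the file stem is invented for the example.

```python
import re

stem = "2019-07_trip.to--the_coast"  # hypothetical file stem

# Split on any run of dashes, underscores, dots or whitespace.
words = re.split(r"[-_.\s]+", stem)

# Drop empty strings (left behind by leading or trailing separators) and rejoin.
cleaned = " ".join(w for w in words if w)

print(words)
print(cleaned)
```

Splitting and rejoining like this does in one step what several chained re.sub calls plus a space-collapsing pass would do.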
Any pointers on fixing or cleaning up the code would be appreciated, as would non-trivial links on doing some heavy lifting with real-world regexes. The basic tutorials are often so simplistic as to be confusing.
Until I get up to speed, I will be over-commenting my scripts. A screwup can cause, and has caused, the loss of hundreds of files here.
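Given that risk, one option is a dry run: work out every rename first, print the plan, and only touch the disk when explicitly told to. The sketch below is an illustration under assumptions, not the script from this post. plan_renames and apply_renames are invented names, and the demo runs in a throwaway temporary directory.

```python
import os
import re
import tempfile


def plan_renames(directory, transform):
    """Return (old, new) name pairs without touching the disk."""
    existing = set(os.listdir(directory))
    plan = []
    for name in sorted(existing):
        if not os.path.isfile(os.path.join(directory, name)):
            continue
        stem, ext = os.path.splitext(name)
        new_name = transform(stem) + ext
        # Skip unchanged names and anything that would clobber another file,
        # including a target already claimed earlier in this plan.
        if new_name == name or new_name in existing:
            continue
        plan.append((name, new_name))
        existing.add(new_name)
    return plan


def apply_renames(directory, plan, dry_run=True):
    """Print each planned rename; perform it only when dry_run is False."""
    for old, new in plan:
        print(f"{old} -> {new}")
        if not dry_run:
            os.rename(os.path.join(directory, old), os.path.join(directory, new))


# Demo in a temporary directory so nothing real is at risk.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a-b.txt", "a_b.txt", "plain.txt"):
        open(os.path.join(tmp, name), "w").close()
    plan = plan_renames(tmp, lambda s: re.sub(r"[-_]", " ", s))
    apply_renames(tmp, plan, dry_run=True)
    remaining = sorted(os.listdir(tmp))
```

In the demo both "a-b.txt" and "a_b.txt" would map to "a b.txt", so only the first is planned and the second is left alone instead of overwriting it.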
I am over the 'hump' now. Python has passed the *acid test* and I do love the simplicity of file and directory access here. And parsing arrays like strings!
#!c:\python38\python38.exe
"""
This is designed to be a general filter for renaming files.
All lines in FILTER section are meant to be replaceable for the situation at hand.
DANGER! DANGER!!! WILL ROBINSON!!!!!
This script is designed to be run ONLY in an isolated directory with select files.
It *WILL* likely kill a system directory.
"""
import os
import re
import time

path = '.'
for filename in os.listdir(path):                         # Bare file names, no leading ".\"
    if not os.path.isfile(os.path.join(path, filename)):  # Only if file
        continue
    base = os.path.splitext(filename)                     # split name[0] and extension[1]
    extension = base[1]                                   # extension
    if extension == '.py':                                # dont do .py files
        continue
    newfile = base[0]                                     # file name. Remove .extensions for now.
    ###################################### FILTERS
    newfile = re.sub(r'-', ' ', newfile)                  # substitutions here, with mutation
    newfile = re.sub(r'\d{5,}', ' ', newfile)             # Kill digit runs of 5 or more (longer than a 4-digit date)
    newfile = re.sub(r'_', ' ', newfile)
    newfile = re.sub(r'(?!^)\.', ' ', newfile)            # Kill dots. Negative lookahead keeps a dot at the start of the name.
    newfile = re.sub(r'\s+', ' ', newfile)                # Kill extra spaces
    ######################################
    newfilext = newfile + extension
    # print(filename, newfilext)                          # For testing
    if newfilext == filename:                             # nothing changed, nothing to do
        continue
    old = os.path.join(path, filename)
    new = os.path.join(path, newfilext)
    if os.path.exists(new):                               # dont overwrite existing
        print(f"{new} already exists, skipping {old}")
        continue
    print(f"{old} is being moved to {new}")
    os.rename(old, new)                                   # rename old to new
    # time.sleep(3)                                       # For testing
The code works, albeit with some rough edges that need to be ironed out as this expands to a couple of hundred lines.
