The Prodos log processor

prodos_log_processor.py is a Python script that is part of the SiteController software. It processes memodata (log) files downloaded from the Prodos web server, following the steps described below. The log processor must be launched by the JobProcessor of the SiteController; otherwise required environment settings are missing.

Once started, the log processor scans a directory for downloaded memodata files. Each memodata file is a zip archive that contains several other log files. Normally there is exactly one (zipped) memodata file. If there are more, they are all processed one after another in a loop. Within this loop the following processing steps are performed:

  1. uncompress the (zipped) memodata file
  2. select the file(s) whose filenames match a specific pattern
  3. store the selected files in a destination directory
  4. remove the downloaded file
  5. purge the destination directory (keep at most a maximum number of stored files in the destination directory - delete the oldest files that exceed this maximum)

The prodos_log_processor.py script terminates after these steps (or, if there was more than one downloaded file, after the loop over all of them has finished).
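
The five steps can be illustrated with a minimal Python sketch. This is not the code of prodos_log_processor.py; the function names process_downloads and purge_oldest are hypothetical, and judging "oldest" by file modification time is an assumption. Archive members are stored under their base name only, so nested paths inside the zip cannot escape the destination directory.

```python
import fnmatch
import os
import zipfile
from glob import glob


def purge_oldest(dst, target, keep):
    """Delete the oldest files matching `target` in `dst` beyond `keep`."""
    matches = [os.path.join(dst, name) for name in os.listdir(dst)
               if fnmatch.fnmatch(name, target)]
    # Oldest first, judged by modification time (an assumption).
    matches.sort(key=os.path.getmtime)
    excess = matches[:max(0, len(matches) - keep)]
    for path in excess:
        os.remove(path)
    return excess


def process_downloads(src, pattern, target, dst, keep):
    """Run steps 1-5 for every download file found in `src`."""
    processed = []
    for archive in sorted(glob(os.path.join(src, pattern))):
        with zipfile.ZipFile(archive) as zf:                # step 1: open the zip
            for member in zf.namelist():
                name = os.path.basename(member)
                if name and fnmatch.fnmatch(name, target):  # step 2: select
                    with open(os.path.join(dst, name), "wb") as out:
                        out.write(zf.read(member))          # step 3: store
        os.remove(archive)                                  # step 4: remove download
        purge_oldest(dst, target, keep)                     # step 5: purge
        processed.append(archive)
    return processed
```

Called with the documented defaults, this would be process_downloads("/opt/azeti/SiteController/tmp", "dl-*.zip", "*_MDH3_*.txt", "/home/azeti", 300).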

Parameters of the log processor script

The log processor accepts the following parameters. Each one falls back to its default value if it is not specified.

parameter | description | default
-s, --src | The directory to search for download files. | /opt/azeti/SiteController/tmp
-p, --pattern | Search pattern to identify the download files. | dl-*.zip
-t, --target | Search pattern to identify the files to extract from the zipped download file. | *_MDH3_*.txt
-d, --dst | Destination directory to store the extracted files. | /home/azeti
-k, --keep | The maximum number of extracted files to keep in the destination directory. All files whose filenames match the target search pattern are counted. | 300
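
For illustration, the parameters and defaults listed above could be declared with Python's argparse module as follows. Whether the real script uses argparse is an assumption; build_parser is a hypothetical name. Only the option names and default values are taken from this document.

```python
import argparse


def build_parser():
    """Declare the documented options with their documented defaults."""
    parser = argparse.ArgumentParser(
        description="Process downloaded memodata files (illustrative sketch).")
    parser.add_argument("-s", "--src", default="/opt/azeti/SiteController/tmp",
                        help="directory to search for download files")
    parser.add_argument("-p", "--pattern", default="dl-*.zip",
                        help="search pattern to identify the download files")
    parser.add_argument("-t", "--target", default="*_MDH3_*.txt",
                        help="search pattern for the files to extract")
    parser.add_argument("-d", "--dst", default="/home/azeti",
                        help="destination directory for the extracted files")
    parser.add_argument("-k", "--keep", type=int, default=300,
                        help="maximum number of extracted files to keep")
    return parser


# With no arguments every option takes its default value.
args = build_parser().parse_args([])
print(args.src, args.pattern, args.target, args.dst, args.keep)
```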

Search pattern

Search patterns use Unix shell-style wildcards:

* | matches everything
? | matches any single character
[seq] | matches any character in seq
[!seq] | matches any character not in seq
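
These wildcard rules are the same ones implemented by Python's standard fnmatch module, so their effect can be tried out with a short sketch. That the script itself relies on fnmatch is an assumption; the helper select and the file names in the list are made up for the example.

```python
from fnmatch import fnmatchcase


def select(names, pattern):
    """Return the names that match a Unix shell-style pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical file names, used only to exercise the wildcards.
names = ["dl-1.zip", "dl-12.zip", "dl-a.zip", "notes.txt"]

print(select(names, "dl-*.zip"))       # * matches any run of characters
print(select(names, "dl-?.zip"))       # ? matches exactly one character
print(select(names, "dl-[0-9].zip"))   # [seq] matches one character in seq
print(select(names, "dl-[!0-9].zip"))  # [!seq] matches one character not in seq
```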

Configuration of the prodos log processor

The prodos_log_processor.py script requires the Python environment of the SiteController. It is also designed to be launched as a job during the execution of a SiteController action. For this reason it is configured in SiteController.cfg, in the section [remote_exec_calls]:

Example snippet of the SiteController.cfg
...
[remote_exec_calls]
# provide key=value pairs to define remote commands that could be executed on
# this system via a job
...
process_memodata=/opt/azeti/SiteController/src/prodos_log_processor.py --src=/opt/azeti/SiteController/tmp --pattern=dl-*.zip --target=*_MDH3_*.txt --dst=/home/azeti --keep=300
...

All parameters in the example snippet above are set to their default values, so a configuration like this has the same result:

[remote_exec_calls]
process_memodata=/opt/azeti/SiteController/src/prodos_log_processor.py
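
The following sketch shows one way such an entry could be read and turned into an argument list. It assumes that SiteController.cfg can be parsed by Python's configparser and that the value is split like a shell command line; the JobProcessor's actual implementation is not shown in this document, and load_remote_exec_call is a hypothetical name.

```python
import configparser
import shlex

SAMPLE_CFG = """\
[remote_exec_calls]
# provide key=value pairs to define remote commands
process_memodata=/opt/azeti/SiteController/src/prodos_log_processor.py --src=/opt/azeti/SiteController/tmp --pattern=dl-*.zip --target=*_MDH3_*.txt --dst=/home/azeti --keep=300
"""


def load_remote_exec_call(cfg_text, name):
    """Return the argument list configured for the remote_exec call `name`."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # shlex.split only tokenises the value; wildcards such as dl-*.zip
    # stay literal because no shell is involved.
    return shlex.split(parser["remote_exec_calls"][name])


cmd = load_remote_exec_call(SAMPLE_CFG, "process_memodata")
print(cmd)
```

Because the patterns reach the script as literal strings, they are interpreted by the log processor itself and not expanded by a shell beforehand.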

Debug information about the prodos log processor

Because prodos_log_processor.py is started by the JobProcessor, each launch can be observed in the log file of that module.

/opt/azeti/SiteController/log/JobProcessor.log
...
2019-04-10 03:11:47,694:7604:[JobProcessor.py:227]:INFO:-------- Job Started (mqtt) ---------
2019-04-10 03:11:47,695:7604:[JobProcessor.py:228]:DEBUG:Received job from mqtt - topic: cloud/AluPress_ProDos_1/jobs/remote_exec
2019-04-10 03:11:47,696:7604:[JobProcessor.py:364]:DEBUG:HandleSimpleJob()
2019-04-10 03:11:47,697:7604:[JobProcessor.py:293]:INFO:job is a remote_exec call "process_memodata"
2019-04-10 03:11:47,699:7604:[JobProcessor.py:307]:DEBUG:cmd: ['/opt/azeti/SiteController/src/prodos_log_processor.py', '--src=/opt/azeti/SiteController/tmp', '--pattern=dl-*.zip', '--target=*_MDH3_*.txt', '--dst=/home/azeti', '--keep=20']
2019-04-10 03:11:48,031:7604:[JobProcessor.py:56]:DEBUG:process 8247 finished with status 0
2019-04-10 03:11:48,033:7604:[JobProcessor.py:318]:DEBUG:output: successfully executed
...


The prodos_log_processor.py script also writes its own log file.

/opt/azeti/SiteController/log/prodos_log_processor.log
...
2019-04-16 11:12:48,015:23478:[prodos_log_processor.py:157]:DEBUG:------------------------------------------
2019-04-16 11:12:48,016:23478:[prodos_log_processor.py:158]:DEBUG:Process started
2019-04-16 11:12:48,016:23478:[prodos_log_processor.py:159]:DEBUG:Source directory to process: /opt/azeti/SiteController/tmp
2019-04-16 11:12:48,017:23478:[prodos_log_processor.py:160]:DEBUG:Source file pattern to process: dl-*.zip
2019-04-16 11:12:48,017:23478:[prodos_log_processor.py:161]:DEBUG:files to keep: 20
2019-04-16 11:12:48,018:23478:[prodos_log_processor.py:107]:DEBUG:About to process dl-rest_test-2019-04-16T11:12:47.435Z.zip
2019-04-16 11:12:48,020:23478:[prodos_log_processor.py:81]:DEBUG:['12136_650_MDH3_2019-04-16_12-24-18.txt']
2019-04-16 11:12:48,024:23478:[prodos_log_processor.py:116]:DEBUG:processed dl-rest_test-2019-04-16T11:12:47.435Z.zip
2019-04-16 11:12:48,024:23478:[prodos_log_processor.py:118]:DEBUG:files to keep: 20
2019-04-16 11:12:48,025:23478:[prodos_log_processor.py:33]:DEBUG:21 files with pattern *_MDH3_*.txt in /home/azeti
2019-04-16 11:12:48,026:23478:[prodos_log_processor.py:55]:DEBUG:removed /home/azeti/12136_650_MDH3_2019-04-16_11-23-12.txt
...
2019-04-16 12:34:41,092:28021:[prodos_log_processor.py:157]:DEBUG:------------------------------------------
2019-04-16 12:34:41,093:28021:[prodos_log_processor.py:158]:DEBUG:Process started
2019-04-16 12:34:41,093:28021:[prodos_log_processor.py:159]:DEBUG:Source directory to process: /opt/azeti/SiteController/tmp
2019-04-16 12:34:41,093:28021:[prodos_log_processor.py:160]:DEBUG:Source file pattern to process: dl-*.zip
2019-04-16 12:34:41,094:28021:[prodos_log_processor.py:161]:DEBUG:files to keep: 20
2019-04-16 12:34:41,095:28021:[prodos_log_processor.py:170]:WARNING:0 files to process, should be one!
2019-04-16 14:35:20,655:1613:[prodos_log_processor.py:157]:DEBUG:------------------------------------------
...

The second block of the log shows a run in which the processor was launched without a download file being present, which is reported as a warning.
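
To spot such runs without reading the whole file, the log lines can be filtered by level. The sketch below is a helper written for this page, not part of the SiteController; the regular expression is derived only from the line layout visible in the excerpts above (timestamp, process id, source file and line, level, message), and the names parse_line and find_problems are hypothetical.

```python
import re

# Layout taken from the excerpts: timestamp:pid:[file:line]:LEVEL:message
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
    r":(?P<pid>\d+)"
    r":\[(?P<source>[^:\]]+):(?P<lineno>\d+)\]"
    r":(?P<level>[A-Z]+)"
    r":(?P<message>.*)$"
)


def parse_line(line):
    """Split one log line into its fields, or return None if it does not fit."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def find_problems(lines, levels=("WARNING", "ERROR", "CRITICAL")):
    """Return the parsed entries whose level is one of `levels`."""
    entries = (parse_line(line) for line in lines)
    return [entry for entry in entries if entry and entry["level"] in levels]
```

A typical use would be to open /opt/azeti/SiteController/log/prodos_log_processor.log and pass the file object to find_problems.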

The ram disk behavior of the temp directory

In practice, a download of ~250 KB takes place approximately every minute, and each download is processed and deleted in less than a second. To avoid straining the storage drive, and to limit its usage in case of misconfiguration, the standard temp directory of the SiteController, /opt/azeti/SiteController/tmp, is configured as a 'ram disk' with a maximum size of 256 MB. On each SiteController start, stop, or restart, the temp directory and its contents are destroyed. To switch off this ram disk behavior, SiteController.cfg needs an entry in the section [SiteController.conf]:

Example snippet of the SiteController.cfg
...
[SiteController.conf]
...
ramdisk_size=0
...
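
As a rough sanity check of the 256 MB limit, the following sketch estimates how many downloads would fit into the ram disk if they were never processed, for example because the job is misconfigured. It uses only the figures from this section (~250 KB per download, roughly one download per minute) and assumes 1 MB = 1024 KB; the function name downloads_until_full is hypothetical.

```python
def downloads_until_full(ramdisk_mb=256, download_kb=250):
    """Number of whole downloads that fit into the ram disk."""
    return (ramdisk_mb * 1024) // download_kb


count = downloads_until_full()
# At roughly one download per minute, this many minutes pass before
# the ram disk is full.
hours = count / 60
print("about %d downloads, roughly %.1f hours" % (count, hours))
```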