git commit -m "SM-17 <test>"

2024-12-04 17:09:06 +01:00 · 2024-12-04 17:09:06 +01:00 · 98fb313a6f
parent f03325a6a2
commit 98fb313a6f
2 changed files with 0 additions and 332 deletions
--- a/firmware/opt/innovenergy/scripts/ExtractS3README.txt
+++ b/firmware/opt/innovenergy/scripts/ExtractS3README.txt
@@ -1,127 +0,0 @@
-This README file provides a comprehensive guide to utilizing a Python script for interacting with S3 storage,
-specifically designed for downloading and processing data files based on a specified time range and key parameters.
-The script requires Python3 installed on your system and makes use of the s3cmd tool for accessing data in cloud storage.
-It also illustrates the process of configuring s3cmd by creating a .s3cfg file with your access credentials.
-
-
-############ Create the .s3cfg file in home directory ################
-
-nano .s3cfg
-
-Copy these lines into the file:
-
-[default]
-host_base = sos-ch-dk-2.exo.io
-host_bucket = %(bucket)s.sos-ch-dk-2.exo.io
-access_key = EXO4d838d1360ba9fb7d51648b0
-secret_key = _bmrp6ewWAvNwdAQoeJuC-9y02Lsx7NV6zD-WjljzCU
-use_https = True
-
-
-############ S3cmd installation ################
-
-Please install s3cmd for retrieving data from our Cloud storage.
-
-sudo apt install s3cmd
-
-############ Python3 installation ################
-
-To check whether you already have python3, run this command:
-
-    python3 --version
-
-
-To install it, use these commands:
-
-1)  sudo apt update
-
-2)  sudo apt install python3
-
-3)  python3 --version (to check that python3 installed correctly)
-
-
-############ Run extractRange.py  ################
-
-usage: extractRange.py [-h] --key KEY --bucket-number BUCKET_NUMBER start_timestamp end_timestamp
-
-KEY: the key can be a single word or a path
-
-    for example: /DcDc/Devices/2/Status/Dc/Battery/voltage   ==> this returns the DC battery voltage of DcDc device 2.
-    example: Dc/Battery/voltage ==> this returns the voltage of every DcDc device (including the average voltage across all DcDc devices)
-    example: voltage ==> this returns every voltage of every device in the Salimax
-
-BUCKET_NUMBER: the number in the bucket name of the installation
-
-    List of bucket numbers / installations:
-        1: Prototype
-        2: Marti Technik (Bern)
-        3: Schreinerei Schönthal (Thun)
-        4: Wittmann Kottingbrunn
-        5: Biohof Gubelmann (Walde)
-        6: Steakhouse Mettmenstetten
-        7: Andreas Ballif / Lerchenhof
-        8: Weidmann Oberwil (ZG)
-        9: Christian Huber (EBS Elektrotechnik)
-
-
-start_timestamp end_timestamp: each must be a valid 10-digit timestamp.
-The start_timestamp must be smaller than the end_timestamp.
-
-PS: The data will be downloaded to a folder named S3cmdData_{Bucket_Number}. If this folder does not exist, it will be created.
-If the folder exists but contains no files, the script will try to download the data.
-If the folder exists and contains at least one file, the script will skip the download and only perform data extraction.
-
-Example command:
-
-python3 extractRange.py 1707087500 1707091260 --key ActivePowerImportT2 --bucket-number 1
-
-
-################################  EXTENDED FEATURES FOR MORE ADVANCED USAGE ################################
-
-1) Multiple Keys Support:
-
-The script supports the extraction of data using multiple keys. Users can specify one or multiple keys separated by commas with the --keys parameter.
-This feature allows for more granular data extraction, catering to diverse data analysis requirements. For example, users can extract data for different
-metrics or parameters from the same or different CSV files within the specified range.
-
-2) Exact Match for Keys:
-
-With the --exact_match flag, the script offers an option to enforce exact matching of keys. This means that only the rows containing a key that exactly
-matches the specified key(s) will be considered during the data extraction process. This option enhances the precision of the data extraction, making it
-particularly useful when dealing with CSV files that contain similar but distinct keys.
-
-3) Dynamic Header Generation:
-
-The script dynamically generates headers for the output CSV file based on the keys provided. This ensures that the output file accurately reflects the
-extracted data, providing a clear and understandable format for subsequent analysis. The headers correspond to the keys used for data extraction, making
-it easy to identify and analyze the extracted data.
-
-4) Advanced Data Processing Capabilities:
-
-i) Booleans as Numbers: The --booleans_as_numbers flag allows users to convert boolean values (True/False) into numeric representations (1/0). This feature
-is particularly useful for analytical tasks that require numerical data processing.
-
-ii) Sampling Stepsize: The --sampling_stepsize parameter enables users to define the granularity of the time range for data extraction. By specifying the number
-of 2-second intervals, users can adjust the sampling interval, allowing for flexible data retrieval based on time.
-
-Example Command:
-
-python3 extractRange.py 1707087500 1707091260 --keys ActivePowerImportT2,Soc --bucket-number 1 --exact_match --booleans_as_numbers
-
-
-This command extracts data for the ActivePowerImportT2 and Soc keys from bucket number 1, between the specified timestamps, with exact
-matching of keys and boolean values converted to numbers.
-
-Visualization and Data Analysis:
-
-After data extraction, the script facilitates data analysis by:
-
-i) Providing a visualization function to plot the extracted data. Users can modify this function to suit their specific analysis needs, adjusting
-plot labels, titles, and other matplotlib parameters.
-
-ii) Saving the extracted data in a CSV file, with dynamically generated headers based on the specified keys. This file can be used for further
-analysis or imported into data analysis tools.
-
-This Python script streamlines the process of data retrieval from S3 storage, offering flexible and powerful options for data extraction, visualization,
-and analysis. Its support for multiple keys, exact match filtering, and advanced processing capabilities make it a valuable tool for data analysts and
-researchers working with time-series data or any dataset stored in S3 buckets.
--- a/firmware/opt/innovenergy/scripts/extractS3data.py
+++ b/firmware/opt/innovenergy/scripts/extractS3data.py
@@ -1,205 +0,0 @@
-import os
-import csv
-import subprocess
-import argparse
-import matplotlib.pyplot as plt
-from collections import defaultdict
-import zipfile
-import base64
-import shutil
-
-def extract_timestamp(filename):
-    timestamp_str = filename[:10]
-    try:
-        timestamp = int(timestamp_str)
-        return timestamp
-    except ValueError:
-        return 0
-
-def extract_values_by_key(csv_file, key, exact_match):
-    matched_values = defaultdict(list)
-    with open(csv_file, 'r') as file:
-        reader = csv.reader(file)
-        for row in reader:
-            if row:
-                columns = row[0].split(';')
-                if len(columns) > 1:
-                    first_column = columns[0].strip()
-                    path_key = first_column.split('/')[-1]
-                    for key_item in key:
-                        if exact_match:
-                            if key_item.lower() == row[0].split('/')[-1].split(';')[0].lower():
-                                matched_values[path_key].append(row[0])
-                        else:
-                            if key_item.lower() in first_column.lower():
-                                matched_values[path_key].append(row[0])
-    final_key = ''.join(matched_values.keys())
-    combined_values = []
-    for values in matched_values.values():
-        combined_values.extend(values)
-    final_dict = {final_key: combined_values}
-    return final_dict
-
-def list_files_in_range(start_timestamp, end_timestamp, sampling_stepsize):
-    filenames_in_range = [f"{timestamp:10d}" for timestamp in range(start_timestamp, end_timestamp + 1, 2*sampling_stepsize)]
-    return filenames_in_range
-
-def download_files(bucket_number, filenames_to_download, product_type):
-    if product_type == 0:
-        hash = "3e5b3069-214a-43ee-8d85-57d72000c19d"
-    elif product_type == 1:
-        hash = "c0436b6a-d276-4cd8-9c44-1eae86cf5d0e"
-    else:
-        raise ValueError("Invalid product type option. Use 0 or 1")
-    output_directory = f"S3cmdData_{bucket_number}"
-
-    if not os.path.exists(output_directory):
-        os.makedirs(output_directory)
-        print(f"Directory '{output_directory}' created.")
-
-    for filename in filenames_to_download:
-        stripfilename = filename.strip()
-        local_path = os.path.join(output_directory, stripfilename + ".csv")
-        if not os.path.exists(local_path):
-            s3cmd_command = f"s3cmd get s3://{bucket_number}-{hash}/{stripfilename}.csv {output_directory}/"
-            try:
-                subprocess.run(s3cmd_command, shell=True, check=True)
-                downloaded_files = [file for file in os.listdir(output_directory) if file.startswith(filename)]
-                if not downloaded_files:
-                    print(f"No matching files found for prefix '{filename}'.")
-                else:
-                    print(f"Files with prefix '{filename}' downloaded successfully.")
-            except subprocess.CalledProcessError as e:
-                print(f"Error downloading files: {e}")
-                continue
-        else:
-            print(f"File '{filename}.csv' already exists locally. Skipping download.")
-
-def decompress_file(compressed_file, output_directory):
-    base_name = os.path.splitext(os.path.basename(compressed_file))[0]
-
-    with open(compressed_file, 'rb') as file:
-        compressed_data = file.read()
-
-    # Decode the base64 encoded content
-    decoded_data = base64.b64decode(compressed_data)
-
-    zip_path = os.path.join(output_directory, 'temp.zip')
-    with open(zip_path, 'wb') as zip_file:
-        zip_file.write(decoded_data)
-
-    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
-        zip_ref.extractall(output_directory)
-
-    # Rename the extracted data.csv file to the original timestamp-based name
-    extracted_csv_path = os.path.join(output_directory, 'data.csv')
-    new_csv_path = os.path.join(output_directory, f"{base_name}.csv")
-    if os.path.exists(extracted_csv_path):
-        os.rename(extracted_csv_path, new_csv_path)
-
-    os.remove(zip_path)
-    #os.remove(compressed_file)
-    print(f"Decompressed and renamed '{compressed_file}' to '{new_csv_path}'.")
-
-
-def get_last_component(path):
-    path_without_slashes = path.replace('/', '')
-    return path_without_slashes
-
-def download_and_process_files(bucket_number, start_timestamp, end_timestamp, sampling_stepsize, key, booleans_as_numbers, exact_match, product_type):
-    output_directory = f"S3cmdData_{bucket_number}"
-
-    if os.path.exists(output_directory):
-        shutil.rmtree(output_directory)
-
-    if not os.path.exists(output_directory):
-        os.makedirs(output_directory)
-        print(f"Directory '{output_directory}' created.")
-
-    filenames_to_check = list_files_in_range(start_timestamp, end_timestamp, sampling_stepsize)
-    existing_files = [filename for filename in filenames_to_check if os.path.exists(os.path.join(output_directory, f"{filename}.csv"))]
-    files_to_download = set(filenames_to_check) - set(existing_files)
-
-    if os.listdir(output_directory):
-        print("Files already exist in the local folder. Skipping download.")
-    else:
-        if files_to_download:
-            download_files(bucket_number, files_to_download, product_type)
-
-    # Decompress all downloaded .csv files (which are actually compressed)
-    compressed_files = [os.path.join(output_directory, file) for file in os.listdir(output_directory) if file.endswith('.csv')]
-    for compressed_file in compressed_files:
-        decompress_file(compressed_file, output_directory)
-
-    csv_files = [file for file in os.listdir(output_directory) if file.endswith('.csv')]
-    csv_files.sort(key=extract_timestamp)
-
-
-    keypath = ''
-    for key_item in key:
-        keypath += get_last_component(key_item)
-    output_csv_filename = f"{keypath}_{start_timestamp}_{bucket_number}.csv"
-    with open(output_csv_filename, 'w', newline='') as csvfile:
-        csv_writer = csv.writer(csvfile)
-        header = ['time']
-        add_header = True
-
-        for csv_file in csv_files:
-            file_path = os.path.join(output_directory, csv_file)
-            extracted_values = extract_values_by_key(file_path, key, exact_match)
-            if add_header:
-                add_header = False
-                for values in extracted_values.values():
-                    first_value = values
-                    for first_val in first_value:
-                        header.append(first_val.split(';')[0].strip())
-                    break
-                csv_writer.writerow(header)
-            if extracted_values:
-                for first_column, values in extracted_values.items():
-                    # Keep only the value column of each matched "path;value" row
-                    values = [value.split(';')[1].strip() for value in values]
-                    if booleans_as_numbers:
-                        # Map "True"/"False" to 1/0; leave every other value as text
-                        values = [1 if v == "True" else 0 if v == "False" else v for v in values]
-                    # First column of each output row is the timestamp from the file name
-                    values_list = []
-                    values_list.append(csv_file.replace(".csv", ""))
-                    values_list.extend(values)
-                    csv_writer.writerow(values_list)
-
-    print(f"Extracted data saved in '{output_csv_filename}'.")
-
-def parse_keys(input_string):
-    keys = [key.strip() for key in input_string.split(',')]
-    return keys
-
-def main():
-    parser = argparse.ArgumentParser(description='Download files from S3 using s3cmd and extract specific values from CSV files.')
-    parser.add_argument('start_timestamp', type=int, help='The start timestamp for the range (even number)')
-    parser.add_argument('end_timestamp', type=int, help='The end timestamp for the range (even number)')
-    parser.add_argument('--keys', type=parse_keys, required=True, help='The part to match from each CSV file, can be a single key or a comma-separated list of keys')
-    parser.add_argument('--bucket-number', type=int, required=True, help='The number of the bucket to download from')
-    parser.add_argument('--sampling_stepsize', type=int, required=False, default=1, help='The number of 2sec intervals, which define the length of the sampling interval in S3 file retrieval')
-    parser.add_argument('--booleans_as_numbers', action="store_true", required=False, help='If key used, then booleans are converted to numbers [0/1], if key not used, then booleans maintained as text [False/True]')
-    parser.add_argument('--exact_match', action="store_true", required=False, help='If key used, then key has to match exactly "=", else it is enough that key is found "in" text')
-    parser.add_argument('--product_type', required=True, help='Use 0 for Salimax and 1 for Salidomo')
-
-    args = parser.parse_args()
-    start_timestamp = args.start_timestamp
-    end_timestamp = args.end_timestamp
-    keys = args.keys
-    bucket_number = args.bucket_number
-    sampling_stepsize = args.sampling_stepsize
-    booleans_as_numbers = args.booleans_as_numbers
-    exact_match = args.exact_match
-    # new arg for product type
-    product_type = int(args.product_type)
-
-    if start_timestamp >= end_timestamp:
-        print("Error: start_timestamp must be smaller than end_timestamp.")
-        return
-    download_and_process_files(bucket_number, start_timestamp, end_timestamp, sampling_stepsize, keys, booleans_as_numbers, exact_match, product_type)
-
-if __name__ == "__main__":
-    main()
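
Note on the removed script: the two behaviours the README describes in most detail are the sampling step (one file every 2 * sampling_stepsize seconds) and the difference between substring and exact key matching. The sketch below restates both in isolation. It is an illustration under assumptions, not the removed code itself: the helper names sampled_timestamps and key_matches and the sample line are hypothetical, and the "path;value" layout is inferred from how the script splits each row.

```python
def sampled_timestamps(start, end, sampling_stepsize=1):
    """Return the timestamps from start to end (inclusive),
    spaced 2 * sampling_stepsize seconds apart."""
    return list(range(start, end + 1, 2 * sampling_stepsize))


def key_matches(line, key, exact_match=False):
    """Check one 'path;value' line against a key.

    With exact_match=True the key must equal the last path component;
    otherwise a case-insensitive substring match on the path is enough.
    """
    path = line.split(';')[0].strip()
    if exact_match:
        return key.lower() == path.split('/')[-1].lower()
    return key.lower() in path.lower()


# Hypothetical sample line in the assumed 'path;value' layout
line = "/DcDc/Devices/2/Status/Dc/Battery/Voltage;751.2"

print(sampled_timestamps(1707087500, 1707087508, sampling_stepsize=2))
print(key_matches(line, "voltage"))
print(key_matches(line, "Battery/Voltage", exact_match=True))
```

With exact matching, a partial path such as Battery/Voltage does not match, because only the last path component is compared. That is why the README recommends --exact_match when several keys share a common substring.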