Innovenergy_trunk/firmware/opt/innovenergy/scripts/ExtractS3README.txt

2024-05-07 12:35:45 +00:00
This README file provides a comprehensive guide to utilizing a Python script for interacting with S3 storage,
specifically designed for downloading and processing data files based on a specified time range and key parameters.
The script requires Python 3 on your system and uses the s3cmd tool to access data in cloud storage.
This README also explains how to configure s3cmd by creating a .s3cfg file with your access credentials.
############ Create the .s3cfg file in home directory ################
nano ~/.s3cfg
Copy these lines into the file:
[default]
host_base = sos-ch-dk-2.exo.io
host_bucket = %(bucket)s.sos-ch-dk-2.exo.io
access_key = EXO4d838d1360ba9fb7d51648b0
secret_key = _bmrp6ewWAvNwdAQoeJuC-9y02Lsx7NV6zD-WjljzCU
use_https = True
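To confirm that the file was written as intended, the options listed above can be checked from Python. This is only a minimal sketch: the function name missing_s3cfg_options and the REQUIRED tuple are hypothetical helpers, not part of extractRange.py or s3cmd. Note that the section is called "default" in lower case, and that host_bucket contains the s3cmd placeholder %(bucket)s, so configparser interpolation is switched off.

```python
import configparser
from pathlib import Path

# Option names taken from the .s3cfg listing in this README.
REQUIRED = ("host_base", "host_bucket", "access_key", "secret_key", "use_https")


def missing_s3cfg_options(path):
    """Return the required options that are absent from the [default] section."""
    # interpolation=None: "%(bucket)s" is an s3cmd placeholder, not a
    # configparser reference, so values must never be interpolated.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file is silently skipped by read()
    if not parser.has_section("default"):
        return list(REQUIRED)
    return [name for name in REQUIRED if not parser.has_option("default", name)]


if __name__ == "__main__":
    # An empty list means every expected option is present.
    print(missing_s3cfg_options(Path.home() / ".s3cfg"))
```

The check only looks at option names; it never prints the key values themselves.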
############ S3cmd installation ################
Install s3cmd to retrieve data from our cloud storage:
sudo apt install s3cmd
############ Python3 installation ################
To check whether python3 is already installed, run this command:
python3 --version
To install it, run these commands:
1) sudo apt update
2) sudo apt install python3
3) python3 --version (to check that python3 was installed correctly)
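Both prerequisites can also be checked in one step from Python itself. The sketch below is an illustration, not part of extractRange.py; check_prerequisites is a hypothetical helper name, and the only assumption is that s3cmd should be reachable through PATH.

```python
import shutil
import sys


def check_prerequisites(tool="s3cmd", min_python=(3, 0)):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    # sys.version_info compares like a tuple, e.g. (3, 10, 12, ...) >= (3, 0).
    if sys.version_info < min_python:
        problems.append("Python %d.%d or newer is required" % min_python)
    # shutil.which returns None when the executable is not found on PATH.
    if shutil.which(tool) is None:
        problems.append(tool + " was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```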
############ Run extractRange.py ################
usage: extractRange.py [-h] --key KEY --bucket-number BUCKET_NUMBER start_timestamp end_timestamp
KEY: the key can be a single word or a path.
example: /DcDc/Devices/2/Status/Dc/Battery/voltage ==> this returns the DC battery voltage of DcDc device 2.
example: Dc/Battery/voltage ==> this returns the voltage of every DcDc device (including the average voltage of all DcDc devices).
example: voltage ==> this returns every voltage of every device in the Salimax.
BUCKET_NUMBER: the number that identifies the bucket of the installation.
List of bucket numbers / installations:
1: Prototype
2: Marti Technik (Bern)
3: Schreinerei Schönthal (Thun)
4: Wittmann Kottingbrunn
5: Biohof Gubelmann (Walde)
6: Steakhouse Mettmenstetten
7: Andreas Ballif / Lerchenhof
8: Weidmann Oberwil (ZG)
9: Christian Huber (EBS Elektrotechnik)
start_timestamp end_timestamp: each must be a valid 10-digit Unix timestamp (in seconds).
The start_timestamp must be smaller than the end_timestamp.
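The two rules above (10 digits, start smaller than end) are easy to check before launching a long download. The following is a minimal sketch of such a check, not the validation code of extractRange.py; validate_range and to_timestamp are hypothetical helper names. The second helper shows how to obtain a timestamp from a calendar date.

```python
import re
from datetime import datetime, timezone


def validate_range(start_timestamp, end_timestamp):
    """Check both command-line strings and return them as integers."""
    for value in (start_timestamp, end_timestamp):
        # Exactly ten ASCII digits, nothing before or after.
        if not re.fullmatch(r"[0-9]{10}", value):
            raise ValueError("not a 10-digit timestamp: %r" % value)
    start, end = int(start_timestamp), int(end_timestamp)
    if start >= end:
        raise ValueError("start_timestamp must be smaller than end_timestamp")
    return start, end


def to_timestamp(moment):
    """Convert a timezone-aware datetime to whole Unix seconds."""
    return int(moment.timestamp())


if __name__ == "__main__":
    print(validate_range("1707087500", "1707091260"))
    print(to_timestamp(datetime(2024, 2, 4, 22, 58, 20, tzinfo=timezone.utc)))
```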
PS: The data will be downloaded to a folder named S3cmdData_{Bucket_Number}. If this folder does not exist, it will be created.
If the folder exists but contains no files, the script will try to download the data.
If the folder exists and contains at least one file, the script will skip the download and only perform the data extraction.
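The folder rule described above can be sketched as follows. This is an illustration of the described behaviour under the stated folder name, not the code of extractRange.py; plan_for_bucket and the two action strings are hypothetical names.

```python
from pathlib import Path


def plan_for_bucket(bucket_number, base_dir="."):
    """Create the data folder if needed and say what should happen next."""
    folder = Path(base_dir) / ("S3cmdData_%s" % bucket_number)
    # Created on first use; an existing folder is left untouched.
    folder.mkdir(parents=True, exist_ok=True)
    has_files = any(entry.is_file() for entry in folder.iterdir())
    # At least one file present: skip the download, only extract.
    action = "extract_only" if has_files else "download_then_extract"
    return folder, action


if __name__ == "__main__":
    print(plan_for_bucket(1))
```

If the script behaves as described, a folder left over from an earlier run with a different time range would prevent a new download, so it may be necessary to empty or remove the folder first.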
Example command:
python3 extractRange.py 1707087500 1707091260 --key ActivePowerImportT2 --bucket-number 1
################################ EXTENDED FEATURES FOR MORE ADVANCED USAGE ################################
1) Multiple Keys Support:
The script supports the extraction of data using multiple keys. Users can specify one or multiple keys separated by commas with the --keys parameter.
This feature allows for more granular data extraction, catering to diverse data analysis requirements. For example, users can extract data for different
metrics or parameters from the same or different CSV files within the specified range.
2) Exact Match for Keys:
With the --exact_match flag, the script offers an option to enforce exact matching of keys. This means that only the rows containing a key that exactly
matches the specified key(s) will be considered during the data extraction process. This option enhances the precision of the data extraction, making it
particularly useful when dealing with CSV files that contain similar but distinct keys.
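The difference between the default matching and --exact_match can be illustrated with a small sketch. It assumes that the default is substring matching against the full key path (which is what the "voltage" example earlier in this README suggests) and that exact matching compares the whole key; the real script may compare differently. parse_keys and key_matches are hypothetical helper names.

```python
def parse_keys(text):
    """Split a comma-separated --keys value into a clean list."""
    return [part.strip() for part in text.split(",") if part.strip()]


def key_matches(row_key, wanted, exact_match=False):
    """Assumed rule: substring match by default, equality with exact_match."""
    if exact_match:
        return row_key == wanted
    return wanted in row_key


if __name__ == "__main__":
    row_key = "/DcDc/Devices/2/Status/Dc/Battery/voltage"
    print(parse_keys("ActivePowerImportT2, Soc"))
    print(key_matches(row_key, "voltage"))                    # partial match
    print(key_matches(row_key, "voltage", exact_match=True))  # whole key differs
```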
3) Dynamic Header Generation:
The script dynamically generates headers for the output CSV file based on the keys provided. This ensures that the output file accurately reflects the
extracted data, providing a clear and understandable format for subsequent analysis. The headers correspond to the keys used for data extraction, making
it easy to identify and analyze the extracted data.
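A header built from the keys could look like the sketch below. The leading "timestamp" column, the helper names build_header and write_extract, and the sample values are assumptions made for illustration; the actual column layout produced by extractRange.py may differ.

```python
import csv
import io


def build_header(keys):
    """Header row: one leading timestamp column, then one column per key."""
    return ["timestamp"] + list(keys)


def write_extract(stream, keys, samples):
    """Write samples shaped as {timestamp: {key: value}} to stream as CSV."""
    writer = csv.writer(stream)
    writer.writerow(build_header(keys))
    for timestamp in sorted(samples):
        values = samples[timestamp]
        # A key without a value at this timestamp becomes an empty cell.
        writer.writerow([timestamp] + [values.get(key, "") for key in keys])


if __name__ == "__main__":
    buffer = io.StringIO()
    # Made-up sample values, for illustration only.
    write_extract(buffer, ["ActivePowerImportT2", "Soc"],
                  {1707087500: {"ActivePowerImportT2": 12.5, "Soc": 80}})
    print(buffer.getvalue())
```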
4) Advanced Data Processing Capabilities:
i) Booleans as Numbers: The --booleans_as_numbers flag allows users to convert boolean values (True/False) into numeric representations (1/0). This feature
is particularly useful for analytical tasks that require numerical data processing.
ii) Sampling Stepsize: The --sampling_stepsize parameter enables users to define the granularity of the time range for data extraction. By specifying the number
of 2-second intervals, users can adjust the sampling interval, allowing for flexible data retrieval based on time.
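Both options can be pictured with a short sketch. It assumes that boolean values appear as the text True/False and that the end timestamp is included in the range; neither detail is confirmed by this README. boolean_to_number and sampled_timestamps are hypothetical helper names.

```python
def boolean_to_number(value):
    """Map the text True/False to 1/0 and leave every other value unchanged."""
    text = str(value).strip().lower()
    if text == "true":
        return 1
    if text == "false":
        return 0
    return value


def sampled_timestamps(start, end, sampling_stepsize=1, base_interval=2):
    """Timestamps from start to end, one every sampling_stepsize * 2 seconds."""
    if sampling_stepsize < 1:
        raise ValueError("sampling_stepsize must be at least 1")
    step = sampling_stepsize * base_interval
    # end + 1 so that end itself is included when it falls on the grid (assumption).
    return list(range(start, end + 1, step))


if __name__ == "__main__":
    print(boolean_to_number("True"), boolean_to_number("12.5"))
    # A stepsize of 2 means one sample every 4 seconds.
    print(sampled_timestamps(1707087500, 1707087510, sampling_stepsize=2))
```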
Example Command:
python3 extractRange.py 1707087500 1707091260 --keys ActivePowerImportT2,Soc --bucket-number 1 --exact_match --booleans_as_numbers
This command extracts data for the ActivePowerImportT2 and Soc keys from bucket number 1, between the specified timestamps, with exact
matching of keys and boolean values converted to numbers.
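When the script has to be called repeatedly, for example once per installation, the command line can be assembled in Python and passed to subprocess. This is a sketch that only uses the options named in this README; build_command is a hypothetical helper, and it assumes that extractRange.py is in the current directory and that --sampling_stepsize takes a number.

```python
import subprocess


def build_command(start, end, keys, bucket_number, exact_match=False,
                  booleans_as_numbers=False, sampling_stepsize=None):
    """Return the argument list for one extractRange.py run."""
    command = ["python3", "extractRange.py", str(start), str(end),
               "--keys", ",".join(keys),
               "--bucket-number", str(bucket_number)]
    if exact_match:
        command.append("--exact_match")
    if booleans_as_numbers:
        command.append("--booleans_as_numbers")
    if sampling_stepsize is not None:
        command += ["--sampling_stepsize", str(sampling_stepsize)]
    return command


def run_extract(command):
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    print(" ".join(build_command(1707087500, 1707091260,
                                 ["ActivePowerImportT2", "Soc"], 1,
                                 exact_match=True, booleans_as_numbers=True)))
```

Passing the arguments as a list avoids shell quoting problems with keys that contain slashes or spaces.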
Visualization and Data Analysis:
After data extraction, the script facilitates data analysis by:
i) Providing a visualization function to plot the extracted data. Users can modify this function to suit their specific analysis needs, adjusting
plot labels, titles, and other matplotlib parameters.
ii) Saving the extracted data in a CSV file, with dynamically generated headers based on the specified keys. This file can be used for further
analysis or imported into data analysis tools.
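For a custom plot outside the script, the output CSV can be loaded and drawn with matplotlib along the following lines. This is a sketch, not the script's own visualization function: load_columns and plot_columns are hypothetical names, the file name extracted.csv is a placeholder, and the code assumes a header row followed by numeric cells, with the first column used as the x axis. matplotlib is a third-party package and has to be installed separately.

```python
import csv


def load_columns(path):
    """Read a CSV with a header row into {column name: list of floats}."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        columns = {name: [] for name in header}
        for row in reader:
            for name, cell in zip(header, row):
                # Empty cells become NaN so that matplotlib leaves a gap.
                columns[name].append(float(cell) if cell else float("nan"))
    return columns


def plot_columns(columns, x_name):
    """Plot every column against x_name; labels and title can be adjusted here."""
    import matplotlib.pyplot as plt  # imported here so loading works without it

    fig, ax = plt.subplots()
    for name, values in columns.items():
        if name != x_name:
            ax.plot(columns[x_name], values, label=name)
    ax.set_xlabel(x_name)
    ax.set_ylabel("value")
    ax.legend()
    plt.show()


# Usage (file name is a placeholder):
# columns = load_columns("extracted.csv")
# plot_columns(columns, x_name=next(iter(columns)))
```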
This Python script streamlines the process of data retrieval from S3 storage, offering flexible and powerful options for data extraction, visualization,
and analysis. Its support for multiple keys, exact match filtering, and advanced processing capabilities make it a valuable tool for data analysts and
researchers working with time-series data or any dataset stored in S3 buckets.