This README provides a comprehensive guide to a Python script for interacting
with S3 storage, designed to download and process data files based on a
specified time range and key parameters. The script requires Python 3 and uses
the s3cmd tool to access data in cloud storage. It also explains how to
configure s3cmd by creating a .s3cfg file with your access credentials.

############ Create the .s3cfg file in home directory ################

nano ~/.s3cfg

Copy these lines into the file:

[default]
host_base = sos-ch-dk-2.exo.io
host_bucket = %(bucket)s.sos-ch-dk-2.exo.io
access_key = EXO4d838d1360ba9fb7d51648b0
secret_key = _bmrp6ewWAvNwdAQoeJuC-9y02Lsx7NV6zD-WjljzCU
use_https = True

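As an optional sanity check (not part of the original instructions), the file can be parsed with Python's configparser. The snippet below parses an inline copy of the config with placeholder credentials for illustration; point it at ~/.s3cfg to check the real file.

```python
import configparser

# Inline copy of the .s3cfg contents (placeholder credentials) for illustration.
S3CFG = """\
[default]
host_base = sos-ch-dk-2.exo.io
host_bucket = %(bucket)s.sos-ch-dk-2.exo.io
access_key = PLACEHOLDER_ACCESS_KEY
secret_key = PLACEHOLDER_SECRET_KEY
use_https = True
"""

# interpolation=None keeps configparser from treating %(bucket)s as a variable.
cfg = configparser.ConfigParser(interpolation=None)
cfg.read_string(S3CFG)

required = ["host_base", "host_bucket", "access_key", "secret_key", "use_https"]
missing = [opt for opt in required if not cfg.has_option("default", opt)]
print("missing options:", missing)  # an empty list means the file is complete
```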
############ S3cmd installation ################

Install s3cmd to retrieve data from our cloud storage:

sudo apt install s3cmd

############ Python3 installation ################

To check whether python3 is already installed, run:

python3 --version

To install it, use these commands:

1) sudo apt update

2) sudo apt install python3

3) python3 --version (to check that python3 installed correctly)

############ Run extractRange.py ################

usage: extractRange.py [-h] --key KEY --bucket-number BUCKET_NUMBER start_timestamp end_timestamp

KEY: the key can be a single word or a path.

for example: /DcDc/Devices/2/Status/Dc/Battery/voltage ==> returns the DC battery voltage of DcDc device 2.

example: Dc/Battery/voltage ==> returns the battery voltage of every DcDc device (including the average voltage across all DcDc devices).

example: voltage ==> returns every voltage of every device in the Salimax installation.

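The matching behaviour in the examples above can be sketched as a substring test on the full key path. This is an illustrative sketch with made-up sample rows, not the script's actual code:

```python
# Hypothetical (key path, value) rows; the real CSV layout may differ.
rows = [
    ("/DcDc/Devices/1/Status/Dc/Battery/voltage", 51.9),
    ("/DcDc/Devices/2/Status/Dc/Battery/voltage", 52.1),
    ("/Ac/Devices/1/Status/voltage", 230.2),
]

def match(rows, key):
    # A key matches any row whose full path contains it: a bare word like
    # "voltage" matches every device, while a longer path narrows the result.
    return [(path, value) for path, value in rows if key in path]

print(match(rows, "/DcDc/Devices/2/Status/Dc/Battery/voltage"))  # one device
print(match(rows, "Dc/Battery/voltage"))                         # all DcDc devices
print(match(rows, "voltage"))                                    # everything
```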
BUCKET_NUMBER: the number of the bucket for the installation.

List of bucket numbers / installations:

1: Prototype
2: Marti Technik (Bern)
3: Schreinerei Schönthal (Thun)
4: Wittmann Kottingbrunn
5: Biohof Gubelmann (Walde)
6: Steakhouse Mettmenstetten
7: Andreas Ballif / Lerchenhof
8: Weidmann Oberwil (ZG)
9: Christian Huber (EBS Elektrotechnik)

start_timestamp end_timestamp: each must be a valid 10-digit Unix timestamp,
and start_timestamp must be smaller than end_timestamp.

PS: The data will be downloaded to a folder named S3cmdData_{Bucket_Number}. If this folder does not exist, it will be created.
If the folder exists but contains no files, the script will download the data.
If the folder exists and contains at least one file, the script will skip the download and only run the data extraction.

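A 10-digit timestamp is a Unix epoch time in seconds. One way to derive it from a human-readable UTC date (a convenience sketch, not part of the script):

```python
from datetime import datetime, timezone

def to_unix(date_string):
    # Convert a "YYYY-MM-DD HH:MM:SS" UTC date to a 10-digit Unix timestamp.
    dt = datetime.strptime(date_string, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

start = to_unix("2024-02-04 22:58:20")
end = to_unix("2024-02-05 00:01:00")
print(start, end)  # 1707087500 1707091260
```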
Example command:

python3 extractRange.py 1707087500 1707091260 --key ActivePowerImportT2 --bucket-number 1

################################ EXTENDED FEATURES FOR MORE ADVANCED USAGE ################################

1) Multiple Keys Support:

The script supports the extraction of data using multiple keys. Users can specify one or multiple keys separated by commas with the --keys parameter.
This feature allows for more granular data extraction, catering to diverse data analysis requirements. For example, users can extract data for different
metrics or parameters from the same or different CSV files within the specified range.

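How the comma-separated --keys value might be split is sketched below with argparse. This mirrors the documented options but is an assumption, not the script's actual parser:

```python
import argparse

# Minimal parser mirroring the documented usage line (illustrative only).
parser = argparse.ArgumentParser(prog="extractRange.py")
parser.add_argument("start_timestamp", type=int)
parser.add_argument("end_timestamp", type=int)
parser.add_argument("--keys", required=True, help="one key, or several separated by commas")
parser.add_argument("--bucket-number", type=int, required=True)

args = parser.parse_args(
    ["1707087500", "1707091260", "--keys", "ActivePowerImportT2,Soc", "--bucket-number", "1"]
)
keys = [k.strip() for k in args.keys.split(",")]
print(keys)  # ['ActivePowerImportT2', 'Soc']
```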
2) Exact Match for Keys:

With the --exact_match flag, the script offers an option to enforce exact matching of keys. This means that only the rows containing a key that exactly
matches the specified key(s) will be considered during the data extraction process. This option enhances the precision of the data extraction, making it
particularly useful when dealing with CSV files that contain similar but distinct keys.

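The difference between substring and exact matching can be sketched as follows; the (key, value) row format and sample values are assumptions for illustration:

```python
# Similar but distinct keys, as described above (sample values are made up).
rows = [("Soc", 87), ("SocLimit", 90), ("MinSoc", 20)]

def filter_rows(rows, key, exact_match=False):
    if exact_match:
        return [r for r in rows if r[0] == key]  # only rows whose key is identical
    return [r for r in rows if key in r[0]]      # any row whose key contains the text

print(filter_rows(rows, "Soc"))                    # all three keys contain "Soc"
print(filter_rows(rows, "Soc", exact_match=True))  # only ("Soc", 87)
```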
3) Dynamic Header Generation:

The script dynamically generates headers for the output CSV file based on the keys provided. This ensures that the output file accurately reflects the
extracted data, providing a clear and understandable format for subsequent analysis. The headers correspond to the keys used for data extraction, making
it easy to identify and analyze the extracted data.

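Dynamic header generation can be sketched like this; the timestamp column name and the sample row values are assumptions, not the script's actual output:

```python
import csv
import io

# Build the output header from the requested keys, then write a sample row.
keys = ["ActivePowerImportT2", "Soc"]
header = ["timestamp"] + keys

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerow([1707087500, 1250.5, 87])

print(buf.getvalue())
```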
4) Advanced Data Processing Capabilities:

i) Booleans as Numbers: The --booleans_as_numbers flag allows users to convert boolean values (True/False) into numeric representations (1/0). This feature
is particularly useful for analytical tasks that require numerical data processing.

ii) Sampling Stepsize: The --sampling_stepsize parameter enables users to define the granularity of the time range for data extraction. By specifying the number
of 2-second intervals, users can adjust the sampling interval, allowing for flexible data retrieval based on time.

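Both options can be sketched in a few lines; these helpers illustrate the described behaviour and are not the script's actual implementation:

```python
def booleans_as_numbers(value):
    # Map "True"/"False" strings to 1/0, leaving other values untouched.
    return {"True": 1, "False": 0}.get(value, value)

def downsample(samples, sampling_stepsize=1):
    # Keep every Nth sample: with 2-second raw data, a stepsize of 30
    # yields one sample per minute.
    return samples[::sampling_stepsize]

readings = ["True", "False", "12.5"]
print([booleans_as_numbers(v) for v in readings])  # [1, 0, '12.5']

samples = list(range(0, 20, 2))  # timestamps at 2-second spacing
print(downsample(samples, 5))    # [0, 10]
```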
Example Command:

python3 extractRange.py 1707087500 1707091260 --keys ActivePowerImportT2,Soc --bucket-number 1 --exact_match --booleans_as_numbers

This command extracts data for the ActivePowerImportT2 and Soc keys from bucket number 1, between the specified timestamps, with exact
matching of keys and boolean values converted to numbers.

Visualization and Data Analysis:

After data extraction, the script facilitates data analysis by:

i) Providing a visualization function to plot the extracted data. Users can modify this function to suit their specific analysis needs, adjusting
plot labels, titles, and other matplotlib parameters.

ii) Saving the extracted data in a CSV file, with dynamically generated headers based on the specified keys. This file can be used for further
analysis or imported into data analysis tools.

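Loading the extracted CSV back into x/y series for plotting (e.g. with matplotlib.pyplot.plot) can be sketched as below; the column names and values are assumptions for illustration:

```python
import csv
import io

# Stand-in for the extracted output file; replace with open(<output>.csv).
CSV_DATA = """timestamp,Soc
1707087500,87
1707087502,88
1707087504,88
"""

reader = csv.DictReader(io.StringIO(CSV_DATA))
rows = list(reader)
x = [int(r["timestamp"]) for r in rows]   # x axis: Unix timestamps
y = [float(r["Soc"]) for r in rows]       # y axis: the extracted key's values
print(x, y)
```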
This Python script streamlines the process of data retrieval from S3 storage, offering flexible and powerful options for data extraction, visualization,
and analysis. Its support for multiple keys, exact match filtering, and advanced processing capabilities makes it a valuable tool for data analysts and
researchers working with time-series data or any dataset stored in S3 buckets.