A data engineer's connection configuration model for data analysis work

As a data engineer or analyst, I work with many types of data, and data acquisition is an unavoidable part of the process. In this post, I'll share the data connection configuration model I use in my daily work, to help others set up and manage database connections efficiently.

**MySQL Database**

MySQL is one of the most commonly used relational databases. In my workflow I mainly read and write data; deletion and update operations are typically handled by developers, so I didn't include those functions in my class. To connect to MySQL, I use `MySQLdb` for reading and `sqlalchemy` for writing. Here's a sample implementation:

```python
import MySQLdb
import pandas as pd
from sqlalchemy import create_engine


class Conn_Analyze:
    """Data analysis platform connection."""

    def __init__(self, database='myanalyze'):
        self.database = database
        self.conn = None

    def connect(self):
        self.conn = MySQLdb.connect(host='***', user='root', passwd='***',
                                    db=self.database, charset='utf8')

    def query(self, sql):
        try:
            data = pd.read_sql(sql, self.conn)
        except (AttributeError, MySQLdb.OperationalError):
            self.connect()  # reconnect if the connection is missing or stale
            data = pd.read_sql(sql, self.conn)
        return data

    def store(self, mydataframe, table_name, if_exists='replace'):
        conn2 = "mysql+mysqldb://root:***@***:3306/%s" % self.database
        local_engine = create_engine(conn2)
        mydataframe.to_sql(table_name, local_engine, if_exists=if_exists,
                           index=False, chunksize=10000)
```

This class connects lazily: nothing happens until a query is executed, and the connection is re-established if it has gone stale. For writing large datasets, the `chunksize` parameter ensures the data is written in manageable batches.

**MongoDB**

MongoDB is a NoSQL database that stores semi-structured data in JSON-like documents. When querying MongoDB, I usually use `pymongo` together with `pandas` to convert results into DataFrames for easier analysis.
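One aside on the `store` method of the MySQL class above: the SQLAlchemy connection string it builds follows the `dialect+driver://user:password@host:port/database` pattern. A small helper (hypothetical, not part of the original class) makes that layout explicit:

```python
def build_mysql_url(user, password, host, database, port=3306):
    """Assemble a SQLAlchemy URL for the mysql+mysqldb driver,
    matching the string Conn_Analyze.store builds inline."""
    return "mysql+mysqldb://{}:{}@{}:{}/{}".format(user, password, host, port, database)

# With placeholder credentials:
# build_mysql_url('root', 'secret', 'localhost', 'myanalyze')
# → 'mysql+mysqldb://root:secret@localhost:3306/myanalyze'
```

Pulling the string into a helper also keeps credentials in one place instead of scattered through format strings.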
```python
import pymongo
import pandas as pd


class Conn_Mongo:
    """MongoDB connection."""

    def __init__(self):
        # "utoken" is the database that holds the user table
        self.mongo_utoken = pymongo.MongoClient('mongodb://***:27000').utoken

    def get_user_data_mongo(self, list_id):
        # Fetch all user documents whose FToken appears in list_id
        user_data = pd.DataFrame(list(
            self.mongo_utoken.userinfo.find({'FToken': {'$in': list(list_id)}})))
        return user_data
```

This method retrieves user data for a list of IDs, using the `$in` operator to filter the records in a single round trip.

**Flurry API**

If your work involves mobile app analytics, Flurry is a popular platform for tracking user behavior. To fetch data from Flurry, you need an API token and you construct the appropriate report URLs.

```python
import json

import pandas as pd
import requests


class Conn_Flurry:
    """Flurry API data fetcher."""

    api_token = "******.****.****"
    headers = {'Authorization': 'Bearer {}'.format(api_token)}
    url = ("https://api-metrics.flurry.com/public/v1/data/appEvent/day/app"
           "?metrics=activeDevices,newDevices,averageTimePerDevice"
           "&dateTime=2017-05-23/2017-05-24")

    def get_results(self, url=url):
        data = requests.get(url, headers=self.headers)
        # json.loads no longer accepts an encoding argument in Python 3
        cleaned = json.loads(data.text)
        return pd.DataFrame(cleaned['rows'])

    def get_url(self, table='appEvent', timegrain='day', dimensions='app/event',
                metrics='occurrences', dateTime='2017-05-23/2017-05-24', filters=""):
        endpoint = "https://api-metrics.flurry.com/public/v1/data"
        url = "{}/{}/{}/{}?metrics={}&dateTime={}&filters={}".format(
            endpoint, table, timegrain, dimensions, metrics, dateTime, filters)
        return url
```

The `get_url` method builds custom API calls from parameters such as `table`, `timegrain`, `dimensions`, and `filters`, which gives you flexibility in retrieving different kinds of data from Flurry. In summary, building a reusable connection class for each data source improves code maintainability and reduces redundancy. You can place these classes in a separate configuration file and import them wherever needed.
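Because the Flurry `get_url` logic is plain string formatting, its output can be sanity-checked without calling the API at all. Here is a standalone sketch of the same construction (written as a free function rather than a method, purely for illustration):

```python
def build_flurry_url(table='appEvent', timegrain='day', dimensions='app/event',
                     metrics='occurrences', date_range='2017-05-23/2017-05-24',
                     filters=''):
    """Assemble a Flurry reporting-API URL the same way Conn_Flurry.get_url does."""
    endpoint = "https://api-metrics.flurry.com/public/v1/data"
    return "{}/{}/{}/{}?metrics={}&dateTime={}&filters={}".format(
        endpoint, table, timegrain, dimensions, metrics, date_range, filters)

# build_flurry_url(metrics='activeDevices') yields a URL containing
# 'metrics=activeDevices' and the default dateTime range.
```

Testing URL builders in isolation like this catches malformed paths and parameter typos before they turn into opaque HTTP errors.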
By following such a structure, you ensure that your data workflows are consistent, efficient, and easy to debug. Whether you're working with relational databases, NoSQL systems, or third-party APIs, having a well-defined connection model is key to successful data analysis.

Cold Shrink Tube

Cold-shrinkable cable accessories are made from elastomeric materials (commonly silicone rubber and ethylene-propylene rubber) that are injection-molded and vulcanized in the factory, then expanded and lined with plastic spiral supports to form the various components of the accessory. During field installation, these pre-expanded parts are placed over the prepared cable ends or joints; pulling out the internal plastic spiral support lets them contract tightly onto the cable insulation, forming the finished accessory. Because they shrink at room temperature by elastic retraction force, rather than requiring heating the way heat-shrinkable accessories do, they are commonly known as cold-shrinkable cable accessories. Early cold-shrink cable termination heads used silicone-rubber cold-shrink parts only as additional insulation, while electric-field control still relied on a stress cone or wrapped stress tape.
Cold-shrink stress-control tubes are now in universal use at voltage levels from 10 kV to 35 kV. For cold-shrink cable joints, the 1 kV level uses a cold-shrink insulation tube as reinforced insulation, while the 10 kV level uses cold-shrink insulation parts with inner and outer semi-conductive shielding layers. The bifurcation of three-core cable terminals uses a cold-shrink branch sleeve.


Mianyang Dongyao New Material Co., https://www.mydyxc.com
