Archive for February, 2023
MySQL @SQL_MODE
Installing MySQL Workbench 8 on Windows, we discovered that the default configuration no longer sets ONLY_FULL_GROUP_BY as part of the default SQL_MODE parameter value. While I’ve written a stored function to set the SQL_MODE parameter value for a session, some students didn’t understand that such a call is only valid in the scope of a connection to the database server. They felt the function didn’t work because they didn’t understand the difference between connecting to the MySQL CLI and clicking the lightening bolt in MySQL Workbench.
So, here are the instructions to reset the default SQL_MODE parameter value for Windows. You need to edit the setting in the my.ini file, which is in the C:\ProgramData\MySQL\MySQL Server 8.0 directory. The default installation will have the following:
# Set the SQL mode to strict sql-mode="STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION" |
You need to change it to the following in an editor with Administrative privileges:
# Set the SQL mode to strict sql-mode="STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION,ONLY_FULL_GROUP_BY" |
Then, you need to connect to the services by launching services.msc from the command prompt. In the list of services find MYSQL80 service and restart it. You can verify it by connecting to the MySQL 8.0.* server and running the following SQL query:
SELECT @@SQL_MODE: |
That’s how you convert Windows to use only traditional group by behaviors in SQL. As always, I hope this helps those looking for a solution.
AlmaLinux Libraries
I discovered a dependency for MySQL Workbench on AlmaLinux 8 installation. I neglected to fully cover it when I documented the installation in a VM of AlmaLinux 9. I go back later and update that entry but for now you need the following dependencies:
proj-6.3.2-4.el8.x86_64.rpm proj-datumgrid-1.8-6.3.2.4.el8.noarch.rpm proj-devel-6.3.2-4.el8.x86_64.rpm |
Install like this:
sudo dnf install -y *.rpm |
Log file:
Last metadata expiration check: 3:01:53 ago on Fri 10 Feb 2023 03:37:49 AM UTC. Dependencies resolved. ========================================================================================== Package Architecture Version Repository Size ========================================================================================== Installing: proj x86_64 6.3.2-4.el8 @commandline 2.0 M proj-datumgrid noarch 1.8-6.3.2.4.el8 @commandline 5.4 M proj-devel x86_64 6.3.2-4.el8 @commandline 89 k Transaction Summary ========================================================================================== Install 3 Packages Total size: 7.5 M Installed size: 17 M Downloading Packages: Running transaction check Transaction check succeeded. Running transaction test Transaction test succeeded. Running transaction Preparing : 1/1 Installing : proj-datumgrid-1.8-6.3.2.4.el8.noarch 1/3 Installing : proj-6.3.2-4.el8.x86_64 2/3 Installing : proj-devel-6.3.2-4.el8.x86_64 3/3 Running scriptlet: proj-devel-6.3.2-4.el8.x86_64 3/3 Verifying : proj-6.3.2-4.el8.x86_64 1/3 Verifying : proj-datumgrid-1.8-6.3.2.4.el8.noarch 2/3 Verifying : proj-devel-6.3.2-4.el8.x86_64 3/3 Installed: proj-6.3.2-4.el8.x86_64 proj-datumgrid-1.8-6.3.2.4.el8.noarch proj-devel-6.3.2-4.el8.x86_64 Complete! |
A quick update while installing AlmaLinux for AWS.
Data Engineer?
Students often ask me about data engineering. I try to explain some of the aspects, and how the tasks can be organized but I never laid out all the titles. I really like this illustration (click on image for larger size) from the Gartner Group because it does that. You can download the full “What Are the Essential Roles for Data and Analytics” paper here).
An excerpt from Gartner’s paper:
Data Engineer
Data engineering is the practice of making the appropriate data available to various data consumers (including data scientists, data and business analysts, citizen integrators, and line-of-business users). It is a discipline that involves collaboration across business and IT units. This key discipline requires skilled data engineers to support both IT and business teams.
Data engineers are primarily responsible for building, managing and operationalizing data pipelines in support of key D&A use cases. They are also primarily responsible for leading the tedious (and often complex) task of:
- Curating datasets and data pipelines created by nontechnical users (e.g., through self-service data preparation tools), data scientists or even IT resources.
- Operationalizing data delivery for production-level deployments.
I hope the summary is helpful and Gartner’s paper interesting.