
Thursday, January 20, 2022

SQL is your friend

Today I had a minor task to implement.  The task was to initialise some new data in a file.

Typically, I would write a program as an EXCEXTFUN and do my processing via standard 2E code.  The problem with little data update programs is that unless you manage them well (i.e. remove them, label them so you can ignore them etc), they start to clutter the model.  On the other hand they are handy to have around in case someone needs to do something similar in the future.

However, unless you are a software house, you probably don't care too much that this 'product field' was initialised 8 years ago by xyz user.

That all said and done, my task was to insert a data record. There were some fields that needed to be initialised with the business data, and there was also an AUDIT STAMP for the INSERT.  If you don't mind me saying, it was quite a convoluted one at that, with numerous fields.  This AUDIT STAMP required a User, Date, Time, Job Number and Job Name among other fields.  With 2E all of these are simple to MAP and are available to us via the JOB context.

My task was to implement the data without adding to the setup/fix program maintenance burden and on this occasion we have a few choices.

  1. Write a document explaining to the implementer how I want them to add a record. Either by a YWRKF hack or perhaps a user screen.
  2. Give them a small data file and ask them to CPYF the records into the target file as *ADD.
  3. Write that 2E program I spoke of above and get them to call it.

Or, as I did in this instance.  

    4. Simply give them a line of SQL I wanted them to execute.

Sweet, job done.  But how do I initialise the 2E formatted date, or get the current system date and time from the machine?  And where do I get the JOB details that are so easily available within 2E code via the JOB context?

The answer is as follows: SQL has what it calls global variables and functions, and it just turns out that some of these are exactly what I need to populate the fields in my AUDIT STAMP.

In order to initialise my fields I just substitute my insert data with the following:-

User Profile = USER
Date = DEC(CURDATE())-19000000
Time = DEC(CURTIME())
Job Number = LEFT(JOB_NAME,6)
Job Name = SUBSTRING(JOB_NAME, LOCATE('/',JOB_NAME,8)+1,10)

You will note that to get the 2E DTE format we simply subtract 19000000.  For example, on 20 January 2022 DEC(CURDATE()) gives 20220120; take away 19000000 and you are left with 1220120, the CYYMMDD value 2E expects.  For the standard TIME and USER fields we just use the variables without any manipulation.

For Job Number we need to extract the data from the JOB_NAME variable.  Usually this is formatted as 123456/LDARE/QPADEV0001, for example.  However, for people with longer user profiles it could be 654321/PLONGNAME/EOD, and the job running the task may not be an interactive screen if, for example, it was submitted to batch.

This means that to get the job name data from the JOB_NAME SQL global variable we need a couple of nested functions that locate and substring everything after the 2nd '/'.  Given that the job number is always six characters long, I've just arbitrarily started the LOCATE from the 8th character.  I could have embedded more LOCATE calls to replace this hardcoded value, but as always, time was against me.
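
Put together, the line of SQL I handed over looked something along these lines.  This is just a sketch: the file and field names are invented for illustration, but the audit columns show the global variables and functions doing the work.

    INSERT INTO MYLIB.PRODUCT                 -- hypothetical file and field names
           (PRDCDE, PRDDSC,                   -- business data
            AUDUSR, AUDDTE, AUDTIM,           -- audit stamp: user, date, time
            AUDJBN, AUDJBNM)                  -- audit stamp: job number, job name
    VALUES ('NEWPRD', 'New product',
            USER,                                                   -- current user profile
            DEC(CURDATE()) - 19000000,                              -- today in 2E DTE (CYYMMDD) format
            DEC(CURTIME()),                                         -- current time as HHMMSS
            LEFT(JOB_NAME, 6),                                      -- job number
            SUBSTRING(JOB_NAME, LOCATE('/', JOB_NAME, 8) + 1, 10)) -- job name after the 2nd '/'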

Hopefully, if in the future you need to update a table via SQL but were put off by not knowing how to get at some of these global variables, the above will stick in your mind.

Thanks for reading. 

Lee.

Monday, July 11, 2011

Data Generation

We’ve had the ‘Ice Age’, ‘Bronze Age’ and ‘Iron Age’, and now we are firmly entrenched in the technology era otherwise known as the ‘Information Age’. We’ve had Generation X and Generation Y, and even the ‘Generation Game’ with a plethora of celebrity hosts.

I guess I could affectionately refer to ‘now’ as the ‘Data Generation’.

This is a subject I have spent quite a bit of time on recently. Everyone knows how hard it is to create the datasets for testing our application(s). It’s quite okay if we have a mature application or scrambled ‘Production system’ data we can draw upon.

But!

What about those new systems and new modules? Really, do you want to manually input 200 locations for that new website you have created? Do you want to try and come up with 50 unique customer names?

I bet $100 that everyone has test data like ‘Mr Smith’ and ‘Tom and Jerry’, who live in ‘My Street’ or ‘The White House’. Sure, some of us may have written a program or two to create datasets like ‘Location 001’ to ‘Location 999’. These approaches are all well and good, but what about when you want to demonstrate your new functionality to the CEO, or perhaps to a prospective client if you are a software house?

Sylvester Stallone, 15 Kick Arse Street, Disneyland, BH 90210

is not going to cut the mustard, and nor is

Mr Me, 21a My Street, My Town, My City, ZP12345.

“Just use the live data”, I hear you say.

Well this would work, as the data would survive most (if not all) of your validations, look good and be 100% realistic. But this doesn’t help with your marketing screen prints or online and in-house demonstrations, not to mention if someone presses the ‘send email’ button and Mr Smith from Browns Bay, Auckland, New Zealand gets a letter about his account overdraft or a product that hasn’t been ordered.

So there are just as many pitfalls with real-world datasets too, and I’ve only brushed over privacy laws and data etiquette.

Then there is application stress testing and performance!

Do you really want to enter 10m records into a database table? I needed to achieve this the other week to test out some SQL. Are you really going to write programs to populate these datasets and have expensive programming resources create your data for you? Well I hope not. Are you going to have multiple people performing this role in your organisation?

• Imagine having a centralised tool that could help you roll out data to multiple databases via direct connections or ODBC.
• Imagine creating realistic datasets from pre-defined lists or ‘Customer lists’.
• Imagine being able to put 10m records into a database table in under 5 minutes.

Well this is not Hollywood fantasy or a pipe dream from a developer living on the edge of reason.

It’s reality.

After quite a few years away from day-to-day programming I recently had a need to enter test data into a simple table. I wanted to enter 70 records into a table which had a numeric surrogate key, description, date and amount. This was to test some basic functionality I had written. The 70-record limit was to ensure I had gone beyond the default of 64 records for CA Plex BlockFetch functions.

Using SQL Server Management Studio, and wanting to key in sequential, meaningful data, it took me the best part of 15 to 20 minutes to enter records like:-

1, Grid 1 - Description 01, 2001/01/01, 1.11
2, Grid 1 - Description 02, 2001/01/02, 2.22

Etc
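
With hindsight, a few lines of T-SQL could have churned those rows out. A rough sketch, using made-up table and column names and a recursive common table expression, might look like this:

    -- Sketch only: generate 70 sequential test rows for a made-up Grid1Data table.
    WITH Numbers (n) AS
    (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM Numbers WHERE n < 70
    )
    INSERT INTO Grid1Data (Id, Description, EntryDate, Amount)
    SELECT n,
           'Grid 1 - Description ' + RIGHT('0' + CAST(n AS VARCHAR(2)), 2),
           DATEADD(DAY, n - 1, '2001-01-01'),
           n * 1.11
    FROM Numbers
    OPTION (MAXRECURSION 70);

At the time I didn’t know that, and the wider point stands: a dedicated generator gives you this, and far more varied data, without writing any SQL at all.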

I then needed to do this in Grid 2 as I was testing some dual grid code I had written.

1, Grid 2 - Description 01, 2001/01/01, 1.11
2, Grid 2 - Description 02, 2001/01/02, 2.22

So, I copied the data to the second table and manually changed the Grid 1 to Grid 2. Perhaps if I had better MS SQL knowledge (certainly I intend to improve it) then I might have been able to do this with a relatively simple UPDATE statement. In fact, the answer is yes, I now can. However, the point is that I was dealing with a database that was new to me. CA Plex allows us to target many platforms, so I could have come unstuck in MS SQL Server as easily as on mySQL via ODBC or any other database, including IBM i DB2.
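
For the record, the sort of UPDATE I had in mind is a one-liner thanks to REPLACE, which works in much the same way on SQL Server, DB2 and mySQL. Again, the table and column names below are invented, so treat it as a sketch:

    -- Sketch only: switch the copied descriptions from 'Grid 1' to 'Grid 2' in one pass.
    UPDATE Grid2Data
    SET Description = REPLACE(Description, 'Grid 1', 'Grid 2')
    WHERE Description LIKE 'Grid 1%';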

Do I want to become an expert on all these DBMSs? The simple answer is Yes and No.

• Yes. I want to become more familiar with the DBMSs, how they perform, how to tune application code and how to support the day-to-day administration and running of the systems.
• No. I don’t want to manually create data, use SQL scripts and imports, and know the syntax for many DBMS types just for entering some ‘poxy’ test data.

So, I want to channel my learning into what is important to me.

So, how did I solve my issue?

Well I asked a dear friend of mine who I chat to most days: Mr Google. I entered ‘Data Generators’ and downloaded 3 or 4 of the tools that appeared on the first page. Nobody goes to page 2 anymore, do they?

I looked at three products. (Note: these are the home pages, as I know that over time products sometimes get rebranded or the links change and the blog post goes out of date.) Suffice to say they all do data generation, among other things.


I quickly discarded datanamic’s tool. Compared to the others it just didn’t stack up functionality-wise. I’d go as far as saying it wasn’t worth the effort of getting my corporate firewall opened for it.

http://www.redgate.com/ and http://www.sqledit.com/ were different stories. Both tools are excellent and easy to use. I would say that Red-gate’s tool looks more polished, and its flow and interface were easier to understand. SQLEdit’s tool catered for a larger number of databases natively and many more via ODBC. Red-gate’s is for MS SQL Server only. If you are only targeting MS SQL Server I’d go for that one, as they will be able to integrate tightly with SQL Server versions, and as they have a whole host of products they will be kept honest by Microsoft’s changes.

But!!!!! I use CA Plex and I needed to target other DBMSs as well (two currently): MS SQL Server and of course IBM i (DB2/400). I am at a 2E shop (originally) and therefore need to reach this platform also. I have also recently worked on mySQL with Plex via ODBC, so the need to hit these DBMSs was real and present. Therefore, I purchased the SQLEdit tool.

With both tools I quickly spotted a bug or two, or had questions for the support teams about functionality and use. Considering I was on the trial version (not that this should matter), the support and service I received from both companies was absolutely first class. I feel like I can add Paul and Lynda of Red-gate, and Igor of SQLEdit, to my Christmas card list (the personal one).

Fixes were created and sent to me within days (by both companies), or, in the case of CYYMMDD data generation support for my traditional 2E DTE formats, in about 5 hours from Igor and his team. I was simply blown away by the agility of both companies.

The tools are priced within $100 USD of each other for the top-end features, and at $249 USD each for the comparable versions they are a steal, helped along by the exchange rate here in NZ.

I will never ever ever ever ever ever ever (just getting my point across) manually create datasets again.

For me these tools have the following benefits, and I haven’t even fully explored some of the other functionality within them, i.e. the generation of referentially sound datasets.

• Quick creation of large datasets
• Pattern-, import- and rules-based generation
• Cheaper than doing it manually
• Can leverage existing data (Scramble – SQLEdit)
• Ability to create a centralised data pool for multiple developers, systems etc

Remember that realistic datasets also help us to identify system limits (overflowing arrays etc) and performance bottlenecks during the development cycle and not post-implementation, where the cost is significantly higher in terms of ‘budget to fix’ and ‘reputation’.

These tools (or something similar) should be on every developer’s tool belt, and if you haven’t got one, or are hiring someone who hasn’t, think again. If you are having software developed for you and your provider doesn’t have these tools in their development methodology, get a new supplier!

Perhaps you should contact me!

Thanks for reading.
Lee.

Post Edit:  I have also just discovered another mainstream generator called Data Maker from http://www.grid-tools.com/.  I have had to request pricing, which to me means that it'll be outside of the scope of the tools mentioned in this blog and many of our budgets.  If I can get an evaluation and form a meaningful opinion, I will post it in the future.