Irish Social Science Data Archive
Q. On looking at the dataset in detail, we found the following errors:
* Day 452 has 2 missing entries numbered 2 and 3
* Day 669 has 2 extra entries numbered 49 and 50
* Day 298 has 2 extra entries numbered 49 and 50
Can you please tell us how we can correct these data entries?
Q. I have a question concerning the daylight saving days. I realized that the number of readings per day could be different on those days, and I would like to be sure which convention you chose in the data file.
A. Yes, daylight saving time changes will need to be taken into account when you are analysing the data. You should have 2 fewer data intervals on one day and 2 additional data intervals on another day; you will need to check the relevant dates that fell within the dataset period to identify the days.
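One way to locate the affected days is to count the readings per day for a single meter and list the days whose count differs from 48. The sketch below is illustrative only: it assumes the DAYHH codes can be read as integers whose last two digits are the half-hour slot, and `irregular_days` is a hypothetical helper name, not part of any published tool.

```python
from collections import Counter

def irregular_days(dayhh_codes, expected=48):
    """Return {day_number: reading_count} for days without `expected` readings.

    Assumes `dayhh_codes` holds the DAYHH values of ONE meter.
    """
    # Integer division by 100 drops the two-digit half-hour slot.
    counts = Counter(int(code) // 100 for code in dayhh_codes)
    return {day: n for day, n in sorted(counts.items()) if n != expected}
```

Days reported with 46 or 50 readings are candidates for the daylight saving changes; other counts may point to missing or extra records.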
Q. The documentation for the Smart Electricity Meter data states that the data format should include a field DAYHH which contains a half-hour value from 1-48 in the last 2 digits. However, we have found many records containing HH values of 49 and 50. Since this is invalid, we are unsure how to handle this data.
A. We have discovered the cause of the 49/50 readings. These refer to Sunday October 31st 2010, which was the end of Daylight Saving Time; this date therefore had 25 hours associated with it (50 meter reads). Since meter reads are consecutive, and the DST fall back occurred at 02:00, reading number 4 is for 2am, reading number 6 is for the next 2am (the end of DST), reading number 8 is for 3am, reading number 10 is for 4am, etc.
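A DAYHH code can be split into a calendar date and a half-hour slot as sketched below. The day-numbering origin is an assumption: the sketch takes day 1 to be 1 January 2009, inferred from a user report elsewhere in this FAQ that day 195 corresponds to 14 July 2009. Under that assumption day 669 falls on 31 October 2010, which agrees with the answer above. The name `decode_dayhh` is hypothetical.

```python
from datetime import date, timedelta

# Assumption: day 1 is 1 January 2009 (inferred, not confirmed by CER).
DAY_ONE = date(2009, 1, 1)

def decode_dayhh(code):
    """Split a DAYHH code into (calendar date, half-hour slot)."""
    # The last two digits are the slot; the leading digits are the day number.
    day, slot = divmod(int(code), 100)
    return DAY_ONE + timedelta(days=day - 1), slot
```

For example, `decode_dayhh(66950)` yields the date 31 October 2010 with slot 50.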
Q. I wonder whether it may be possible to be told which meters are on the same feeders in the trial.
A. We are unable to publish any additional data from the Smart Metering Trials above & beyond that which we have already published via the ISSDA. All personalised data was removed in order to preserve anonymity.
Q. I understand that in the interests of privacy and security we can't possibly be given the exact location of the smart meter data but is there a more low resolution location we could be given instead? For instance, knowing the town (or even region) would be most helpful and it wouldn't compromise the anonymity of the data.
Q. There are no postal codes associated with the IDs, so we were wondering what part of Ireland was used in the survey – we are combining them with the relevant weather data of the time.
A. Unfortunately meter location data was not included in the published data set and we are unable to publish any additional data from the Smart Metering Trials above & beyond that which we have already published via the ISSDA. All personalised data including meter location information was removed in order to preserve anonymity. Hence making meter locations data, or any other data removed from the original data sets, available is not viable.
Q. We found out that there are some residents who filled out a questionnaire but have code 3 ("other") in the user allocations file. Similarly, some users have code 1 ("residential") in the user allocations file but no questionnaire exists. What does "other" mean in that context? Are we right assuming that each customer who filled out a residential questionnaire is a residential customer?
A. You are correct in stating that every respondent who completed the residential questionnaire is a residential customer. Code 3 is used for participants who did not complete the trial (attrited, excluded for technical reasons, etc.). Some of these may have completed the pre-trial survey and others may not; some who completed the pre-trial survey attrited afterwards and hence are not included in the trial analysis (i.e. code 3).
Q. On your website you mention that "over 5,000 Irish homes and businesses" were participating in the Electricity Customer Behaviour Trial. Could you please tell me how many of these were residential?
A. 5029 in total
Q. Is there any information on which households participated in both the electricity and the gas trial? Is there a mapping available from households of each trial? Or are the two trials completely independent (e.g., since they took place in different regions)?
A. The electricity & gas behavioural trials were completely independent – if there was any household that participated in both trials (we are not aware of any) then it would have been random.
Q. Q410, Q420, Q430 are not clear to us - does Q420 ask for the number of additional adults or the total number of adults (i.e. is the interviewee included in or excluded from Q420 and Q430)? We found 40 interviewees where Q420 > Q410, which confuses us in this context. It would be very helpful to obtain some more information on the questions regarding the number of persons in a household.
A. In some cases, there may be a valid reason for additional adults to be present during the day (child minders, care workers, secondary school children etc). The approach during cleaning/analysis was to cap the adults present during the day to the number of adults reported as living in the house. To calculate the total number of persons in the household, use the following formula:
IF Q410=1 THEN 1
ELSE IF Q410=2 THEN Q420
ELSE Q420+Q43111
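The formula above can be transcribed directly into Python, as in the following sketch. It assumes the three survey answers have already been read as numbers with missing codes removed; the function name `household_size` is hypothetical.

```python
def household_size(q410, q420, q43111):
    """Total persons in the household, following the IF/ELSE rule above."""
    if q410 == 1:
        return 1
    if q410 == 2:
        return q420
    # Any other Q410 value: add the Q43111 count to the Q420 count.
    return q420 + q43111
```

For instance, a respondent with Q410=3, Q420=2 and Q43111=2 would be counted as a household of 4.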
Q. I have a question about how to interpret answers to questions 402 and 4021 in the questionnaire (Residential, PRE-Trial):
As a response to these two questions, I find that either one of these two questions has been filled out for each household. In case question 402 has been answered, it only contains numbers between 1 and 6 instead of the actual income (e.g., 50000).
How can I interpret these answers to question 402? Did you anonymize these exact values according to the scale provided in Q4021?
Question 4021 is only asked [IF Q402=9999999]
A. A row full of 9’s means the data is missing, i.e. the respondent did not answer this question. If the respondent did answer Question 402, the surveyor did not proceed to Q4021 as it would be redundant. Therefore it was skipped and the surveyor proceeded to Question 403. People were prompted for their income, and this was then assigned to one of the six categories as described below. If they did not answer, the categories were suggested to them. The best way to utilise this data is to merge the two columns into one, removing the blank entries.
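The merge suggested above might be sketched as follows. This is an illustration under stated assumptions, not the cleaning routine actually used: it assumes each answer arrives as an integer or a string, that blanks appear as `None` or empty strings, and that any value consisting of several 9s is the missing code. Both function names are hypothetical.

```python
def clean(value):
    """Return the answer as an int, or None for blanks and all-9 missing codes."""
    text = "" if value is None else str(value).strip()
    if text == "" or (len(text) > 1 and set(text) == {"9"}):
        return None
    return int(float(text))

def merged_income_band(q402, q4021):
    """Prefer the Q402 answer; fall back to Q4021 when Q402 is missing."""
    first = clean(q402)
    return first if first is not None else clean(q4021)
```

A household with Q402=9999999 and Q4021=2 would end up in band 2, while one with neither answer yields `None`.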
Q. Some residents have floor size "999999999" in their questionnaire. Currently we treat them as invalid entries. Beyond that, there are 38 residents whose floor size does not make sense (between 1,200 and 10,000 square meters). All of them entered "Square meters" in the question that follows the question regarding the floor size. Thus we assume the interviewers accidentally entered square meters here, because when converting these values into square feet they absolutely make sense.
A. Multiple '9' is the code for refused/don't know. The 38 large floor size figures are in square feet, and this was adjusted as part of the data cleaning routine.
Q. I have some queries about the CER Electricity CBT data, revised version of March 2012. With regard to the data recorded from the smart meters, recording seems to commence for most meters on day 195 of the trial, corresponding to 14th July 2009, and continues to day 218 (6th August 2009) for each meter. Is this the case? Were the trials of this length only, or is it a case of MS Excel importing the data erroneously?
A. The electricity trial officially started in July 2009 with a ‘benchmark’ period running to 31 Dec 2009. The ‘active’ period of the trial then ran from 1 Jan to 31 Dec 2010. The files you received should contain data for all the meters for this entire period July 2009 – Dec 2010. Refer to the File Manifest that accompanies the data for further details. Also the trial findings reports published on www.cer.ie, & referenced on the ISSDA Web page, will provide you with further descriptive information on the trial design & implementation.
Q. Is it possible to find out what the residential tariffs were at the time of the study, in terms of pricing, tiers, etc.?
A. There are a number of reports on the CER website relating to the Smart Metering Trials. This one specifies the tariffs during the trial: Submission on The Time-of-Use Tariffs
Q. Can you point us to websites or other places where we can find Irish tariff information?
A. You can try the following sites:
energycustomers.ie (CER's own site)
Q. Understandably the files are quite large in size and this may be the problem. We are attempting to work with this data using MS Excel. The import wizard will only recognise 1,048,576 rows of data at once. The import wizard is then run again to get the remainder of the information from the text file; however, this does not result in additional information. Is all the relevant data for a meter (say meter ID1000) in the same .text file?
A. The data for each meter is spread across several date files, and the problem you are having is likely due to the row limit in Excel. In their experience Excel cannot cope with a file of that size - you will need to use some other software package, such as SAS, designed for handling very large data sets. Once the data is summarised, it can be manipulated very well in Excel.
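As an alternative to a statistics package, the files can be summarised by streaming them line by line, so that no row limit applies. The following Python sketch is a hedged illustration, not an official method. It assumes each line holds three whitespace-separated columns (meter ID, DAYHH code, and a consumption value taken here to be in kWh); the function name and the file name in the usage note are hypothetical.

```python
from collections import defaultdict

def total_kwh_per_meter(lines):
    """Sum the third column per meter ID from an iterable of text lines."""
    totals = defaultdict(float)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        meter_id, _dayhh, kwh = parts
        totals[meter_id] += float(kwh)
    return dict(totals)
```

An open file object can be passed directly, for example `with open("File1.txt") as fh: totals = total_kwh_per_meter(fh)`. Because a meter's records are spread across several files, the per-file totals need to be added together before exporting a summary to Excel.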
Q. I would like to use real consumption profile traces in order to classify the consumers, creating models of normal behaviour, and simulate tampering attacks to be detected by my methodology. Would that use be in compliance with the agreement?
A. That should be no problem at all.
Q. Could you please confirm that the scope of "intellectual property rights relating to the data" does not extend to intellectual property derived from the use of the data. In other words, for example, it doesn't apply to analysis techniques that we may apply, adapt and/or validate on the data.
A. The CER does not intend to have any intellectual property claims over the work that is produced by third parties utilising the anonymised smart metering trial data.
Q. I have acknowledged you, “The Commission for Energy Regulation (CER)” as the source in my reference list. I would like to mention you in the paper’s acknowledgement section as well.
A. Acknowledgement of CER and ISSDA is sufficient for any data used. Please refer to Clauses 8 and 10 in the End User Licence.
Q. Is there a maximum retention period for this dataset?
A. No - there is no maximum retention period for this dataset. However, please note that on the Data Request Form supplied by ISSDA you do need to state what the data is used for and a projected end date. Re-use of the data for other purposes requires you to re-apply under a new agreement. You also need to maintain a log of all users of the data (as per the agreement). We would like to know the number of users of the data, for statistical purposes.
Q. Having input each of the 6 files containing 3 columns of data, I have discovered that some entries in the second column (day and half-hour) are unusual. While each entry in the second column has to follow the structure specified in the description (DAYHH, where DAY is the day number and HH=1…50 corresponds to the half-hour, including DST), I found some entries where the last two digits HH run up to HH=95 instead of stopping at HH=48 or HH=50 as one would expect. Regarding the time code, I discovered that 6313 meters exceeded the value of 48. Does this mean that there was something like a maintenance week around day 300, or were there other irregularities? And what about day 669?
A. Response from CER: Unfortunately we do not know why this has happened.
Q. Can I gain access to the Research Microdata File (RMF) for this dataset?
A. Unfortunately, the RMF of this dataset is not available to users. The data sent to users is complete in that there is no further CER data available to users at this point in time. Should there be any update to this dataset this shall be announced through the ISSDA website.