Daily Bulletin Archive

Feb. 14, 2018

Data stories could be told by anyone who could understand and work with data, and the stories could be about any issues that are pertinent to the storyteller. The diversity of the data being used by the broad range of data users is a key factor that makes data stories engaging.

It is important to note that a storyteller is also a data user, and before anyone can be a data user, the data must first be shared and made accessible. The more types of data that are made available, the higher the possibility that someone can create a compelling story by using data.

The DASH Search system from the Digital Asset Services Hub (DASH) is NCAR’s new metadata registry that facilitates the discovery, identification, and understanding of the research products and output from NCAR labs via a centralized system. The DASH Search system uses the NCAR Dialect to describe and record the resources that are available from NCAR. Once the metadata records of the available resources are submitted to DASH Search, a potential user can effectively and efficiently locate the desired data using the information in the metadata records. Continuing to increase access to NCAR’s data via the DASH Search system will help in communicating our science to our community and beyond, including through data stories.

To learn more about DASH Search, please visit https://data.ucar.edu/ or if you would like to submit a metadata record of your data to DASH Search, please contact us at datahelp@ucar.edu.

Day 4’s post will discuss “Connected conversations.”

Feb. 13, 2018

Before using data to tell a story, the data should be evaluated for its quality. Although data quality can be difficult to measure, quality attributes of the data, including completeness, accuracy, credibility, and consistency, are key for building a trustworthy story. Without high-quality data, readers could easily lose confidence in the story, or worse yet, quickly dismiss the story and its data as hearsay.

In order to achieve high-quality data and reduce the chance that the data will be misused, it is critical to also have high-quality documentation or metadata for the data. At NCAR, the NCAR Dialect is the designated metadata standard used by the Digital Asset Services Hub (DASH) services, including the DASH Search system. The NCAR Dialect is a customized metadata schema that is designed based on international metadata standards for scientific data. The NCAR Dialect is capable of recording in-depth descriptions to assist with data understandability as well as capturing information that is essential for identification and discovery of the assets. The DASH Search Request to Submit Form demonstrates the elements that are included in the NCAR Dialect.

To learn more about the NCAR Dialect or if you would like to submit a metadata record of your data to DASH Search, please contact us at datahelp@ucar.edu.

Coming up for Day 3 is a post on “Telling Stories with Data.”

Feb. 12, 2018

While sharing research results with one’s identified discipline(s) is crucial for advancing specific studies, communicating scientific discoveries outside of one’s immediate science community can often bring major breakthroughs beyond the initial design or intent of the original research. In particular, allowing the public to understand and even participate in science could help promote support for science, including the development of new policies, funds, and education programs.

Among the different options for communicating science, using data to tell a story or telling a story that is backed up by data can help a scientific issue to become more personal and relatable, and therefore, more actionable. In order to begin telling a data story, however, one needs to know what data are available, and data management is a vital method for organizing data and allowing data to be preserved for use/re-use.

The Digital Asset Services Hub (DASH) offers a variety of data management services, including the DMP Preparation Guidance and Template Document and DMP Checklist for Awarded Proposals. The Data Curation & Stewardship Coordinator could also help in providing consultation for data management questions or issues. Please contact us at datahelp@ucar.edu if you are interested in learning more.

Stay tuned for the Day 2 post, “Stories about data”!

Feb. 9, 2018

Love Your Data (LYD) Week 2018 is an international event coordinated by academic libraries and data archives to promote research data as being “the foundation of the scholarly record and crucial for advancing our knowledge of the world around us.”

This year’s theme for LYD week is “data stories.” In support of LYD week (Monday, February 12, to Friday, February 16), NCAR’s Data Curation & Stewardship Coordinator will share one post a day discussing how the Digital Asset Services Hub (DASH) as well as its resources and services could help with the following topics:

  • Monday: Why data stories?

  • Tuesday: Stories about data

  • Wednesday: Telling stories with data

  • Thursday: Connected conversations

  • Friday: We are data

Stay tuned for these LYD posts next week, and please feel welcome to get in touch with DASH at datahelp@ucar.edu if you have any questions, need additional information, or would like to talk more about data and data-related topics with the Data Curation & Stewardship Coordinator.

Feb. 8, 2018

What’s the difference between running Cheyenne jobs efficiently and inefficiently? The CISL Consulting Services Group (CSG) recently encountered a case where revising a batch script select statement made a huge difference.

A WRF user was running simulations on 60 Cheyenne nodes, intending to use all 36 cores of each node with 4 MPI processes and 9 OpenMP threads per process. The following select statement likely would have been fine if the user hadn’t compiled WRF with the dmpar option, which enables only distributed-memory MPI support, instead of dm+sm, which enables both MPI and OpenMP support:

#PBS -l select=60:ncpus=36:mpiprocs=4:ompthreads=9

With an assist from CSG, the user modified the select statement as follows to use 36 MPI processes, and jobs that ran at 10.8% efficiency now run at more than 99%:

#PBS -l select=60:ncpus=36:mpiprocs=36:ompthreads=1

Improvements like that can make your allocation go a lot farther. Ask yourself if some of your jobs run significantly slower than you think they should. Do you unexpectedly run out of wall-clock time? Take another look at how you’re requesting resources in your job script (and how you compiled your code), and don’t hesitate to contact CSG for assistance.
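As a rough illustration of why the select statement mattered, the sketch below estimates what fraction of the requested cores on each node has work to do. It is not how CSG measured efficiency; the function names are hypothetical, and it assumes that a build without OpenMP support runs one thread per MPI process regardless of the ompthreads value. Under that assumption the original request keeps about 11% of the cores busy, which is in line with the 10.8% efficiency reported above.

```python
import re


def parse_select(directive):
    """Extract the node count and per-node settings from a PBS select statement."""
    match = re.search(r"select=(\d+):(\S+)", directive)
    if match is None:
        raise ValueError("no select statement found")
    settings = {"nodes": int(match.group(1))}
    for field in match.group(2).split(":"):
        key, value = field.split("=")
        settings[key] = int(value)
    return settings


def busy_core_fraction(directive, openmp_enabled):
    """Estimate the fraction of requested cores per node that have work to run.

    Assumption: without OpenMP support, each MPI process runs a single
    thread, whatever value ompthreads requests.
    """
    s = parse_select(directive)
    threads = s.get("ompthreads", 1) if openmp_enabled else 1
    return s["mpiprocs"] * threads / s["ncpus"]


if __name__ == "__main__":
    original = "#PBS -l select=60:ncpus=36:mpiprocs=4:ompthreads=9"
    revised = "#PBS -l select=60:ncpus=36:mpiprocs=36:ompthreads=1"
    # MPI-only (dmpar) build: OpenMP threads are not available.
    print(f"{busy_core_fraction(original, openmp_enabled=False):.1%}")  # 11.1%
    print(f"{busy_core_fraction(revised, openmp_enabled=False):.1%}")   # 100.0%
```

The same check with openmp_enabled=True returns 1.0 for the original statement, which matches the observation that it likely would have been fine for a dm+sm build.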

Feb. 8, 2018

CISL has installed new versions of Python (2.7.14 and 3.6.4) for users of the Cheyenne system, with new functionality for loading NCAR-provided Python packages. Users now load all of the latest packages at once by running a new ncar_pylib script that activates the NCAR package library in a virtual environment. Packages for earlier versions of Python can be loaded only with module load commands.

Implementing virtual environments enables users to quickly access multiple versions of their package-development codes. Users who want to customize their Python environment can simply clone the package environment as a starting point, then make modifications. The new approach also will help users avoid errors when installing their own packages by using the virtual environment rather than home directories on GLADE.

Python 2.7.14 and 3.6.4 and the NCAR package library methodology will become the default on the Cheyenne system in February, on a date to be announced. The CISL Python documentation page has been updated to describe the new procedures.
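For readers unfamiliar with virtual environments, the sketch below shows the general idea using Python's standard venv module. It is not the ncar_pylib script and does not reproduce its behavior; the function names and the demo_env directory are hypothetical, and the venv module is available only in Python 3. See the CISL Python documentation for the actual procedures on Cheyenne.

```python
import os
import sys
import tempfile
import venv


def create_env(env_dir):
    """Create a bare virtual environment (no pip) and return its config file path."""
    venv.create(env_dir, with_pip=False, system_site_packages=False)
    return os.path.join(env_dir, "pyvenv.cfg")


def in_virtual_env():
    """Return True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    # Build the environment in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(os.path.join(tmp, "demo_env"))
        print("environment created:", os.path.exists(cfg))
    print("running inside a virtual environment:", in_virtual_env())
```

Packages installed while such an environment is active go into the environment's own directory rather than a shared or home location, which is the isolation the post describes.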

Feb. 6, 2018

Cheyenne will be unavailable on Tuesday, February 6, starting at approximately 7 a.m. MST to allow CISL staff to update system software components. The outage is expected to last until approximately 6 p.m. but every effort will be made to return the system to service as soon as possible.

A system reservation will prevent batch jobs from executing after 7 a.m. All batch queues will be suspended and Cheyenne’s login nodes will be unavailable throughout the update period. All batch jobs and interactive processes that are still executing when the outage begins will be killed.

Jobs and interactive sessions that are running on the Geyser and Caldera clusters when the update period begins will not be interrupted but users will not be able to log in to or submit new jobs to those systems until Cheyenne is returned to service. Users who need access to the Geyser or Caldera systems on Tuesday are advised to initiate an interactive session before 7 a.m. on Tuesday.

CISL will inform users through the Notifier service when Cheyenne is restored. We apologize to all users for the inconvenience this will cause and thank you for your patience.

Feb. 6, 2018

Cheyenne downtime: Feb. 6, 8 a.m. to 8 p.m.

No downtime: HPSS, GLADE, Geyser_Caldera

Feb. 2, 2018

Registration is now open for a free, four-hour tutorial on wrf-python from 8 a.m. to noon on Wednesday, March 7, at the NCAR/UCAR Corporate Technical Training Center (CG-2) in Boulder. The tutorial is a beginner-friendly introduction to wrf-python for users of the Python programming language. Seating is limited to 16 students. The deadline for registration is February 21. See this link for more information and registration.

Jan. 30, 2018

HPSS downtime: Tuesday, January 30 from 7:00 am - 11:00 am MST

No downtime: Cheyenne, GLADE, Geyser_Caldera