Bug #445

threaded file I/O bug

Added by Berk Hess over 9 years ago. Updated over 9 years ago.

Status: Closed
Priority: Normal
Assignee: Erik Lindahl
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

When multiple threads open the same trr file through read_first_frame,
the trnheader gets corrupted. On my machine, on one test system, I consistently
get either correct output or corrupted sh->pres_size and sh->v_size in do_trnheader.

This is no longer a critical issue, since I just removed the only place
in the code where multiple threads would open the same trr file.

Berk

md.c (95.9 KB): md.c that reads the rerun trajectory on all nodes. Berk Hess, 06/23/2010 01:46 PM

History

#1 Updated by Rossen Apostolov over 9 years ago

Sander,

Did your latest patches fix that?

#2 Updated by Sander Pronk over 9 years ago

(In reply to comment #0)

This should have been fixed now; is there a way to test concurrent I/O operations with the current git/master?

#3 Updated by Berk Hess over 9 years ago

I checked it with the old md.c rerun code that reads the -rerun trajectory
on all nodes and it fails on 4 threads 50% of the time with:
xdrclose: no such open xdr file
through the close_trj call in md.c.
In all cases the results are 100% correct, so the problem seems to be only
in the closing of the file.

I think this is a bug in threaded i/o.
I have attached the md.c that reads the trajectory on all nodes.
You should be able to take a random system with a few trr frames
and run mdrun -rerun on it to reproduce the error.

Berk.
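
For illustration only, here is one way a close call can fail with a "no such open file" style error even though all data was read correctly. The sketch assumes the open files are tracked in a global table that is updated without any locking; the table and the names open_table, find_free_slot, claim_slot, close_handle and replay_lost_registration are hypothetical and are not taken from the GROMACS source. It replays one bad interleaving of two threads on a single thread so that the outcome is deterministic.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OPEN 4

/* Hypothetical global table of open handles, shared by all threads. */
static const void *open_table[MAX_OPEN];

/* Step 1 of an unsynchronized open: look for a free slot. */
static int find_free_slot(void)
{
    for (int i = 0; i < MAX_OPEN; i++) {
        if (open_table[i] == NULL) {
            return i;
        }
    }
    return -1;
}

/* Step 2 of an unsynchronized open: store the handle in the chosen slot. */
static void claim_slot(int slot, const void *handle)
{
    open_table[slot] = handle;
}

/* Close: returns 0 if the handle was registered, -1 if it was not found. */
static int close_handle(const void *handle)
{
    for (int i = 0; i < MAX_OPEN; i++) {
        if (open_table[i] == handle) {
            open_table[i] = NULL;
            return 0;
        }
    }
    return -1;
}

/* Replays the interleaving "A finds slot, B finds slot, A claims, B claims".
 * Returns the result of closing A's handle. */
int replay_lost_registration(void)
{
    int file_a = 0, file_b = 0;      /* stand-ins for two open files */

    int slot_a = find_free_slot();   /* thread A sees slot 0 as free */
    int slot_b = find_free_slot();   /* thread B also sees slot 0 as free */
    claim_slot(slot_a, &file_a);     /* A registers in slot 0 */
    claim_slot(slot_b, &file_b);     /* B overwrites A's registration */

    int result_b = close_handle(&file_b);
    assert(result_b == 0);           /* B's close finds its entry */

    return close_handle(&file_a);    /* A's entry is gone */
}
```

In this replay thread A can still read its file, because the handle itself is intact; only the bookkeeping entry is lost, so the failure shows up at close time. Whether this is the exact interleaving behind the error reported above is an assumption.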

#4 Updated by Berk Hess over 9 years ago

Created an attachment (id=481)
md.c that reads the rerun trajectory on all nodes

#5 Updated by Sander Pronk over 9 years ago

The trouble was XDR I/O operations: they relied on some global variables. The latest git master fixes this.
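
As a rough sketch of the kind of change that removes such a dependency on unprotected globals, the code below guards a shared table of open handles with a mutex, so that finding a free slot and claiming it happen as one step. This is not the actual patch: the names open_table, table_lock, register_handle and unregister_handle are hypothetical, and the real fix may instead have removed the global state altogether.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_OPEN 4

/* Hypothetical global table of open handles and the lock that guards it. */
static const void *open_table[MAX_OPEN];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Registers a handle; returns its slot index, or -1 if the table is full.
 * The search and the store happen under one lock, so two threads can
 * never claim the same slot. */
int register_handle(const void *handle)
{
    int slot = -1;

    assert(handle != NULL);          /* NULL marks a free slot */
    pthread_mutex_lock(&table_lock);
    for (int i = 0; i < MAX_OPEN; i++) {
        if (open_table[i] == NULL) {
            open_table[i] = handle;
            slot = i;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return slot;
}

/* Unregisters a handle; returns 0 on success, -1 if it was not registered. */
int unregister_handle(const void *handle)
{
    int status = -1;

    pthread_mutex_lock(&table_lock);
    for (int i = 0; i < MAX_OPEN; i++) {
        if (open_table[i] == handle) {
            open_table[i] = NULL;
            status = 0;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return status;
}
```

A mutex is the smallest change that keeps a global table; keeping all state inside each file handle avoids the shared table entirely and is often the cleaner design when threads never need to look up each other's files.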
