Skip to content

CLOUDSTACK-10140: Fix for when template is created from snapshot template.properties are corrupted#2322

Merged
yadvr merged 1 commit into
apache:masterfrom
bwsw:master
Nov 28, 2017
Merged

CLOUDSTACK-10140: Fix for when template is created from snapshot template.properties are corrupted#2322
yadvr merged 1 commit into
apache:masterfrom
bwsw:master

Conversation

@bwsw
Copy link
Copy Markdown
Contributor

@bwsw bwsw commented Nov 13, 2017

No description provided.

@rafaelweingartner
Copy link
Copy Markdown
Member

@bwsw can you provide more details about this issue?
I read the #CLOUDSTACK-10140 Jira ticket and also the code, but I am still not understanding what is being done here.

I have some doubts here, for instance in NfsSecondaryStorageResource.java you stopped writing the size to the property, but I do not understand why.

@bwsw
Copy link
Copy Markdown
Contributor Author

bwsw commented Nov 17, 2017

Hi, @rafaelweingartner
Basically, I started from this one:

https://github.com/apache/cloudstack/pull/2320/files#diff-71901db20e3d4adb6711e69c1c1ca2ba

it was found by @rhtyd and I fixed Id from 1 to actual code. I don't include it into PR because master already includes it. Next I went to debugging and opened the issue (CLOUDSTACK-10140) you saw in Jira.

root@cs2-secstorage-nfs1:/secondary/template/tmpl/2/209# cat template.properties 
uniquename=209-2-3accb15b-8647-356b-9791-9e2f1ec36b6b
filename=bbab2601-050d-4c18-ae35-00f2ba44de28.qcow2
size=3254517760
qcow2.filename=bbab2601-050d-4c18-ae35-00f2ba44de28.qcow2
qcow2.virtualsize=10737418240
public=true
id=1

ID is fixed by code from 1st diff, but other information is incorrect including size (I don't see it in my snippet but AFAIK "size" field was incorrect and included something like UUID-like with "-").

Next, https://issues.apache.org/jira/secure/ViewProfile.jspa?name=weizhou user provided the patch and everything started to work fine. Might be the problem in buffered IO which leads to corrupted data? Closing of try leads to sync and processor is able to append correctly?

Frankly speaking, I didn't dive into details and it's a shame but It takes a lot of time to move across all the ACS code, because I'm not a "real" java developer. But this patch really works. I patched the 4.10, compiled, deployed into Agent, recreated SSVM and templates started to work properly. Basically, I recommend to consult Wei Zhou about his patch.

@rafaelweingartner
Copy link
Copy Markdown
Member

Thanks for the information.
Can you also explain something else? What do you mean with template.properties?
Is that it was not being created? Or maybe the information contained in the file was not accurate?

@ustcweizhou
Copy link
Copy Markdown
Contributor

@rafaelweingartner yes.
the content of template.properties is not correct, if the template is create from snapshot.
After restarting ssvm, the template will be removed from database but still exist on secondary storage, because the template.properties is not correct.

We had this issue a year ago and fixed it by the patch I posted.
It is a regression bug caused by commit bb2c02d

In short, the template.properties should be saved before going to the next step.

+                }
+                try {

@rafaelweingartner
Copy link
Copy Markdown
Member

@ustcweizhou thanks!
Now thinks are a little bit more clear.

Copy link
Copy Markdown
Member

@rafaelweingartner rafaelweingartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the code seems to be ok.
The classes that are being changes need a little bit of love.. but that is the case for some other PR.

LGTM

@yadvr
Copy link
Copy Markdown
Member

yadvr commented Nov 22, 2017

@bwsw do a git pull --rebase <asf/upstream remo> master please

@rafaelweingartner
Copy link
Copy Markdown
Member

@bwsw your rebase did not work because you applied a rebase on a merge that you had already done previously, right?

It is easier to checrry-pick the first two commits on top of a clean master branch of yours and then push it here.

Conflicts:
	services/secondary-storage/server/src/org/apache/cloudstack/storage/resource/NfsSecondaryStorageResource.java
@bwsw
Copy link
Copy Markdown
Contributor Author

bwsw commented Nov 23, 2017

@rhtyd done
@rafaelweingartner thanks a lot.

bufferWriter.write("filename=" + fileName);
bufferWriter.write("\n");
long size = _storage.getSize(destFileFullPath);
bufferWriter.write("size=" + size);
Copy link
Copy Markdown
Member

@yadvr yadvr Nov 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bwsw should we not write size? Is it because, we're going to save size at line 560/556?

Copy link
Copy Markdown
Member

@yadvr yadvr Nov 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems a non issue, ignore.

@yadvr
Copy link
Copy Markdown
Member

yadvr commented Nov 27, 2017

LGTM
@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-1306

@yadvr
Copy link
Copy Markdown
Member

yadvr commented Nov 27, 2017

@blueorangutan test

@blueorangutan
Copy link
Copy Markdown

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link
Copy Markdown

Trillian test result (tid-1702)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 35316 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr2322-t1702-kvm-centos7.zip
Intermitten failure detected: /marvin/tests/smoke/test_deploy_vms_with_varied_deploymentplanners.py
Intermitten failure detected: /marvin/tests/smoke/test_host_annotations.py
Intermitten failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermitten failure detected: /marvin/tests/smoke/test_router_dnsservice.py
Intermitten failure detected: /marvin/tests/smoke/test_ssvm.py
Intermitten failure detected: /marvin/tests/smoke/test_volumes.py
Intermitten failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Test completed. 63 look OK, 4 have error(s)

Test Result Time (s) Test File
test_01_vpc_remote_access_vpn Failure 55.93 test_vpc_vpn.py
test_07_resize_fail Failure 15.39 test_volumes.py
test_04_rvpc_privategw_static_routes Failure 203.14 test_privategw_acl.py
test_03_vpc_privategw_restart_vpc_cleanup Failure 97.48 test_privategw_acl.py
test_02_vpc_privategw_static_routes Failure 117.59 test_privategw_acl.py
test_01_vpc_privategw_acl Failure 121.95 test_privategw_acl.py
test_05_add_annotation_for_invalid_entityType Error 0.06 test_host_annotations.py
test_change_service_offering_for_vm_with_snapshots Skipped 0.00 test_vm_snapshots.py
test_09_copy_delete_template Skipped 0.01 test_templates.py
test_06_copy_template Skipped 0.00 test_templates.py
test_static_role_account_acls Skipped 0.02 test_staticroles.py
test_11_ss_nfs_version_on_ssvm Skipped 0.02 test_ssvm.py
test_01_scale_vm Skipped 0.00 test_scale_vm.py
test_01_primary_storage_iscsi Skipped 0.11 test_primary_storage.py
test_vm_nic_adapter_vmxnet3 Skipped 0.00 test_nic_adapter_type.py
test_03_nic_multiple_vmware Skipped 1.05 test_nic.py
test_nested_virtualization_vmware Skipped 0.00 test_nested_virtualization.py
test_06_copy_iso Skipped 0.00 test_iso.py
test_list_ha_for_host_valid Skipped 0.02 test_hostha_simulator.py
test_list_ha_for_host_invalid Skipped 0.02 test_hostha_simulator.py
test_list_ha_for_host Skipped 0.02 test_hostha_simulator.py
test_hostha_enable_feature_without_setting_provider Skipped 0.01 test_hostha_simulator.py
test_hostha_enable_feature_valid Skipped 0.02 test_hostha_simulator.py
test_hostha_disable_feature_valid Skipped 0.02 test_hostha_simulator.py
test_hostha_configure_invalid_provider Skipped 0.02 test_hostha_simulator.py
test_hostha_configure_default_driver Skipped 0.04 test_hostha_simulator.py
test_ha_verify_fsm_recovering Skipped 0.02 test_hostha_simulator.py
test_ha_verify_fsm_fenced Skipped 0.01 test_hostha_simulator.py
test_ha_verify_fsm_degraded Skipped 0.01 test_hostha_simulator.py
test_ha_verify_fsm_available Skipped 0.02 test_hostha_simulator.py
test_ha_multiple_mgmt_server_ownership Skipped 0.02 test_hostha_simulator.py
test_ha_list_providers Skipped 0.01 test_hostha_simulator.py
test_ha_enable_feature_invalid Skipped 0.02 test_hostha_simulator.py
test_ha_disable_feature_invalid Skipped 0.01 test_hostha_simulator.py
test_ha_configure_enabledisable_across_clusterzones Skipped 0.02 test_hostha_simulator.py
test_configure_ha_provider_valid Skipped 0.02 test_hostha_simulator.py
test_configure_ha_provider_invalid Skipped 0.02 test_hostha_simulator.py
test_deploy_vgpu_enabled_vm Skipped 0.03 test_deploy_vgpu_enabled_vm.py
test_3d_gpu_support Skipped 0.03 test_deploy_vgpu_enabled_vm.py

@yadvr
Copy link
Copy Markdown
Member

yadvr commented Nov 28, 2017

Lgtm, ignoring intermittent known failures.

@yadvr yadvr merged commit bec48c6 into apache:master Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants